feat: add manual retry state apis #405

NiveditJain · 2025-09-16T10:53:00Z

Introduced a new API endpoint for manual state retries, allowing users to trigger retries for specific states.
Implemented request and response models for manual retry operations.
Added error handling for duplicate retry states and invalid API keys.
Enhanced logging for better traceability of retry operations.

This feature improves the system's ability to manage state retries effectively.


coderabbitai · 2025-09-16T10:53:07Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Summary by CodeRabbit

New Features
- Added manual retry endpoint for states: POST /v0/namespace/{namespace_name}/state/{state_id}/manual-retry.
- Accepts a fanout_id and returns the new state’s id and status.
- Uses existing API key authentication and request ID tracing.
Tests
- Introduced comprehensive unit tests for models, route, and controller covering success, not found, duplicate detection, validation, and auth scenarios.
Documentation
- Updated testing README to include coverage details for manual retry behavior and edge cases.

Walkthrough

Adds manual retry capability: new Pydantic request/response models, an async controller to create retry states, and a POST route at /v0/namespace/{namespace_name}/state/{state_id}/manual-retry. Includes unit tests for models, route behavior, and controller scenarios (not found, duplicate key, DB errors).

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Models: Manual Retry DTOs `state-manager/app/models/manual_retry.py` | New Pydantic models: `ManualRetryRequestModel` (`fanout_id: str`) and `ManualRetryResponseModel` (`id: str`, `status: StateStatusEnum`). |
| Controller: Manual Retry `state-manager/app/controller/manual_retry_state.py` | New async `manual_retry_state`: finds original State, constructs & inserts a retry State (status=CREATED, fanout_id from request), updates original to `RETRY_CREATED`, returns `ManualRetryResponseModel`; handles 404 and DuplicateKeyError (returns 409), logs and re-raises unexpected errors. |
| Routing: Manual Retry Endpoint `state-manager/app/routes.py` | Adds POST route `/state/{state_id}/manual-retry` under namespace router. Validates API key, extracts or generates `x_exosphere_request_id`, converts `state_id` to `PydanticObjectId`, calls controller, and returns the response model. |
| Tests: Model Validation `state-manager/tests/unit/models/test_manual_retry.py` | New unit tests for request/response models: validation, error cases, and serialization (dict/JSON). |
| Tests: Route Behavior `state-manager/tests/unit/test_routes.py` | Adds imports and tests for manual-retry route: path existence, API-key success/failure, request-id generation/forwarding, and controller invocation/return behavior. |
| Tests: Controller Behavior `state-manager/tests/unit/controller/test_manual_retry_state.py` | New controller unit tests covering success, not-found, duplicate-key (409), fanout_id behavior, DB errors, empty inputs/parents, namespace mismatch, field preservation/reset, and logging verification. |
| Docs: Tests README `state-manager/tests/README.md` | Updated test coverage sections to include manual_retry_state test descriptions and reorganized entries. |
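The routing row says the handler "extracts or generates `x_exosphere_request_id`". A minimal sketch of that get-or-generate step follows. It assumes the id is stored on `request.state` and that a UUID4 string is an acceptable fallback; `resolve_request_id` is a hypothetical helper name, not something taken from the PR.

```python
import uuid
from types import SimpleNamespace


def resolve_request_id(request) -> str:
    """Return the id stored on request.state, or mint a new one.

    Assumption: the middleware stores the id as
    request.state.x_exosphere_request_id when the caller supplied one.
    """
    existing = getattr(request.state, "x_exosphere_request_id", None)
    if existing:
        return existing
    return str(uuid.uuid4())


# Stand-ins for a framework request object (hypothetical, for illustration only).
with_id = SimpleNamespace(state=SimpleNamespace(x_exosphere_request_id="req-123"))
without_id = SimpleNamespace(state=SimpleNamespace())
```

Using `getattr` with a default keeps the handler safe when the attribute was never set, which is the case the "request-id generation" route tests exercise.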

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor Client
    participant API as Routes (FastAPI)
    participant C as Controller: manual_retry_state
    participant DB as Database (Beanie/Mongo)

    Client->>API: POST /v0/namespace/{ns}/state/{state_id}/manual-retry { fanout_id }
    API->>API: Validate API key
    API->>API: Get or gen x_exosphere_request_id
    API->>C: manual_retry_state(ns, PydanticObjectId(state_id), body, req_id)
    C->>DB: Find original State by id
    alt State not found
        C-->>API: HTTP 404
        API-->>Client: 404 Not Found
    else Found
        C->>DB: Insert retry State (status=CREATED, fanout_id from body)
        alt Duplicate key
            C-->>API: HTTP 409 Duplicate retry
            API-->>Client: 409 Conflict
        else Inserted
            C->>DB: Update original State status=RETRY_CREATED
            C-->>API: ManualRetryResponseModel { id, status }
            API-->>Client: 200 OK with { id, status }
        end
    end
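The branches in the sequence diagram can be sketched with an in-memory stand-in for the database. This is an illustration under stated assumptions, not the PR's controller: `FakeStore`, `HTTPError`, and `manual_retry` are hypothetical names, the unique key is reduced to three fields, and a `KeyError` plays the role of `DuplicateKeyError`.

```python
from dataclasses import dataclass, field
from itertools import count


class HTTPError(Exception):
    """Stand-in for fastapi.HTTPException (status code plus detail)."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


@dataclass
class FakeStore:
    """In-memory stand-in for the State collection and its unique index."""

    states: dict = field(default_factory=dict)     # state_id -> state dict
    unique_keys: set = field(default_factory=set)  # simplified unique index
    _ids: count = field(default_factory=lambda: count(1))

    def insert(self, state: dict) -> str:
        key = (state["identifier"], state["run_id"], state["fanout_id"])
        if key in self.unique_keys:
            raise KeyError("duplicate key")  # plays the role of DuplicateKeyError
        self.unique_keys.add(key)
        state_id = f"s{next(self._ids)}"
        self.states[state_id] = state
        return state_id


def manual_retry(store: FakeStore, namespace: str, state_id: str, fanout_id: str) -> dict:
    """Follow the diagram: find, insert retry, mark original, return id and status."""
    original = store.states.get(state_id)
    if original is None or original["namespace_name"] != namespace:
        raise HTTPError(404, "State not found")
    # Copy the original, reset the run-specific fields, take fanout_id from the body.
    retry = {**original, "status": "CREATED", "outputs": {}, "error": None, "fanout_id": fanout_id}
    try:
        new_id = store.insert(retry)
    except KeyError:
        raise HTTPError(409, "Duplicate retry state detected") from None
    original["status"] = "RETRY_CREATED"
    return {"id": new_id, "status": retry["status"]}
```

Calling `manual_retry` twice with the same `fanout_id` hits the duplicate branch on the second call, which is how the request body doubles as an idempotency key in this sketch.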

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Add prune and re-enqueue signal functionality #322 — Touches StateStatusEnum and state retry-related enums; may conflict with status names like RETRY_CREATED.
Implement retry policy and enhance errored state handling #326 — Adds retry-related fields/logic (fanout_id, retry_count) and automatic retry creation; overlaps on retry construction and fanout_id usage.

Suggested reviewers

nk-ag

Poem

I hop through states with careful cheer,
A retry trail, now crystal-clear.
A fanout twig, a fresh new start,
CREATED beats within my heart.
If keys collide, I pause—then scurry—
RETRY_CREATED, onward, hurry! 🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 75.86% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title is concise and accurately summarizes the primary change — adding manual retry state APIs — which aligns with the new models, controller, route, and tests introduced in the changeset.
Description Check	✅ Passed	The PR description is directly related to the changeset: it documents the new manual retry endpoint, the request/response models, duplicate-retry and API key error handling, and enhanced logging, which match the added controller, route, models, and tests.


gemini-code-assist

Summary of Changes

Hello @NiveditJain, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new feature that enables manual retries for specific states within the system. It provides a dedicated API endpoint, along with the necessary backend logic, data models, and robust error handling to facilitate effective management and traceability of these retry operations.

Highlights

New API Endpoint: Introduced a new API endpoint /state/{state_id}/manual-retry to allow users to trigger manual retries for specific states.
Manual Retry Logic: Implemented core logic for manual state retries, including creating a new state with CREATED status, updating the original state, and handling duplicate retry attempts.
Data Models: Defined ManualRetryRequestModel and ManualRetryResponseModel for structured input and output of the manual retry API.
Error Handling & Logging: Added error handling for cases like state not found, duplicate retry states, and invalid API keys, along with enhanced logging for traceability.


codecov · 2025-09-16T10:54:06Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


gemini-code-assist

Code Review

This pull request introduces a new API endpoint for manually retrying a state. The implementation is well-structured, with a new controller, models, and route. My review includes several suggestions for improvement. I've pointed out a critical correctness issue where states in any status can be retried, which could lead to data inconsistencies; I recommend restricting retries to error states. Other feedback includes improving exception handling to preserve stack traces, fixing a typo, ensuring URL path consistency, and suggesting a refactor to reduce duplicated code for API key validation.
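The recommendation to restrict retries to error states could take the shape of a small eligibility check run before the retry state is built. The sketch below is hypothetical: only `CREATED` and `RETRY_CREATED` are named in this thread, so `ERRORED` and the contents of `RETRYABLE` are assumptions, and `ensure_retryable` is an illustrative name.

```python
from enum import Enum


class Status(str, Enum):
    # Hypothetical subset of the status enum; ERRORED is an assumed member.
    CREATED = "CREATED"
    ERRORED = "ERRORED"
    RETRY_CREATED = "RETRY_CREATED"


# Assumption: only errored states are eligible for a manual retry.
RETRYABLE = frozenset({Status.ERRORED})


def ensure_retryable(current: Status) -> None:
    """Raise ValueError when a state is not eligible for manual retry."""
    if current not in RETRYABLE:
        raise ValueError(f"State in status {current.value} cannot be retried")
```

In a controller the `ValueError` would presumably be translated into a 4xx response; which status code fits best is a design choice the review leaves open.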

state-manager/app/controller/manul_retry_state.py

state-manager/app/routes.py

coderabbitai

Actionable comments posted: 8

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e5337e and 844f13a.

📒 Files selected for processing (3)

state-manager/app/controller/manul_retry_state.py (1 hunks)
state-manager/app/models/manual_retry.py (1 hunks)
state-manager/app/routes.py (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (3)

state-manager/app/controller/manul_retry_state.py (4)

state-manager/app/models/manual_retry.py (2)

ManualRetryRequestModel (5-6)

ManualRetryResponseModel (9-11)

state-manager/app/singletons/logs_manager.py (2)

LogsManager (9-66)

get_logger (65-66)

state-manager/app/models/state_status_enum.py (1)

StateStatusEnum (4-20)

state-manager/app/models/db/state.py (1)

State (13-104)

state-manager/app/models/manual_retry.py (1)

state-manager/app/models/state_status_enum.py (1)

StateStatusEnum (4-20)

state-manager/app/routes.py (3)

state-manager/app/models/manual_retry.py (2)

ManualRetryRequestModel (5-6)

ManualRetryResponseModel (9-11)

state-manager/app/controller/manul_retry_state.py (1)

manual_retry_state (12-49)

state-manager/app/utils/check_secret.py (1)

check_api_key (11-16)

🔇 Additional comments (1)

state-manager/app/models/manual_retry.py (1)

9-11: Response model looks good.

Fields align with controller return; no issues.

state-manager/app/controller/manul_retry_state.py

state-manager/app/models/manual_retry.py

state-manager/app/routes.py

…nition

- Fixed the import path for the manual retry state controller in routes.py.
- Updated the route definition for manual retry to include a leading slash for consistency.
- Added a new controller file for manual retry state functionality, implementing the logic for handling manual retries.
- Introduced unit tests for the manual retry request and response models, ensuring validation and functionality.
- Enhanced unit tests for the manual retry state route, covering various scenarios including valid and invalid API keys.

These changes improve the structure and reliability of the manual retry feature.

- Updated the query for fetching the state to include the namespace name, ensuring accurate state retrieval.
- Changed the HTTP status code for duplicate retry state errors from 400 to 409 to better reflect the conflict nature of the error.
- Simplified exception handling by removing the unnecessary re-raise of the caught exception.

These changes enhance the reliability and clarity of the manual retry state functionality.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

state-manager/app/controller/manual_retry_state.py (1)

12-50: Add input validation for state_id and consider retry count tracking.

The implementation looks good overall, but consider these improvements:

The function should validate that state_id is a valid ObjectId before querying
Consider incrementing retry_count on the new retry state to track retry attempts
The error message in line 48 is missing the exception details

Apply this diff to address the issues:

 async def manual_retry_state(namespace_name: str, state_id: PydanticObjectId, body: ManualRetryRequestModel, x_exosphere_request_id: str):
     try:
+        # Validate state_id format
+        if not state_id:
+            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid state ID")
+            
         logger.info(f"Manual retry state {state_id} for namespace {namespace_name}", x_exosphere_request_id=x_exosphere_request_id)

         state = await State.find_one(State.id == state_id)
         if not state:
             raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="State not found")
         
         try:
             retry_state = State(
                 node_name=state.node_name,
                 namespace_name=state.namespace_name,
                 identifier=state.identifier,
                 graph_name=state.graph_name,
                 run_id=state.run_id,
                 status=StateStatusEnum.CREATED,
                 inputs=state.inputs,
                 outputs={},
                 error=None,
                 parents=state.parents,
                 does_unites=state.does_unites,
+                retry_count=state.retry_count + 1,  # Track retry attempts
                 fanout_id=body.fanout_id # this will ensure that multiple unwanted retries are not formed because of index in database
             )
             retry_state = await retry_state.insert()
             logger.info(f"Retry state {retry_state.id} created for state {state_id}", x_exosphere_request_id=x_exosphere_request_id)

             state.status = StateStatusEnum.RETRY_CREATED
             await state.save()

             return ManualRetryResponseModel(id=str(retry_state.id), status=retry_state.status)
         except DuplicateKeyError:
             logger.info(f"Duplicate retry state detected for state {state_id}. A retry state with the same unique key already exists.", x_exosphere_request_id=x_exosphere_request_id)
             raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Duplicate retry state detected")


     except Exception as e:
-        logger.error(f"Error manual retry state {state_id} for namespace {namespace_name}", x_exosphere_request_id=x_exosphere_request_id)
+        logger.error(f"Error manual retry state {state_id} for namespace {namespace_name}: {str(e)}", x_exosphere_request_id=x_exosphere_request_id, exc_info=True)
         raise e

♻️ Duplicate comments (3)

state-manager/app/routes.py (3)
183-189: Route decorator path is correct.

The route path /state/{state_id}/manual-retry is properly formatted with the leading slash. The previous review comments about the missing leading slash appear to be outdated as the current implementation is correct.

186-186: Consider using HTTP 201 CREATED for new retry state creation.

Since this endpoint creates a new retry state resource, HTTP 201 CREATED would be more semantically appropriate than HTTP 200 OK.
-    status_code=status.HTTP_200_OK,
+    status_code=status.HTTP_201_CREATED,
187-187: Fix grammatical error in response description.

The response description has a grammatical error.
-    response_description="State manual retry successfully",
+    response_description="State manual retry triggered successfully",

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 844f13a and 3d68362.

📒 Files selected for processing (4)

state-manager/app/controller/manual_retry_state.py (1 hunks)
state-manager/app/routes.py (2 hunks)
state-manager/tests/unit/models/test_manual_retry.py (1 hunks)
state-manager/tests/unit/test_routes.py (6 hunks)

🧰 Additional context used

🧬 Code graph analysis (4)

state-manager/tests/unit/models/test_manual_retry.py (2)

state-manager/app/models/manual_retry.py (2)

ManualRetryRequestModel (5-6)

ManualRetryResponseModel (9-11)

state-manager/app/models/state_status_enum.py (1)

StateStatusEnum (4-20)

state-manager/app/controller/manual_retry_state.py (4)

state-manager/app/models/manual_retry.py (2)

ManualRetryRequestModel (5-6)

ManualRetryResponseModel (9-11)

state-manager/app/singletons/logs_manager.py (2)

LogsManager (9-66)

get_logger (65-66)

state-manager/app/models/state_status_enum.py (1)

StateStatusEnum (4-20)

state-manager/app/models/db/state.py (1)

State (13-104)

state-manager/tests/unit/test_routes.py (3)

state-manager/app/models/manual_retry.py (2)

ManualRetryRequestModel (5-6)

ManualRetryResponseModel (9-11)

state-manager/app/models/state_status_enum.py (1)

StateStatusEnum (4-20)

state-manager/app/routes.py (2)

get_node_run_details_route (372-381)

manual_retry_state_route (190-199)

state-manager/app/routes.py (3)

state-manager/app/models/manual_retry.py (2)

ManualRetryRequestModel (5-6)

ManualRetryResponseModel (9-11)

state-manager/app/controller/manual_retry_state.py (1)

manual_retry_state (12-49)

state-manager/app/utils/check_secret.py (1)

check_api_key (11-16)

🔇 Additional comments (9)

state-manager/tests/unit/models/test_manual_retry.py (1)

1-241: Comprehensive test coverage!

The test suite thoroughly validates both ManualRetryRequestModel and ManualRetryResponseModel with excellent edge case coverage including empty strings, very long strings, special characters, and validation errors. The tests properly verify Pydantic's validation behavior and serialization.

state-manager/app/routes.py (2)

53-56: LGTM!

The imports for manual retry models and controller are correctly added.

190-199: Route implementation follows established patterns.

The route handler correctly:

Validates the API key and logs appropriately

Extracts or generates the request ID

Converts the state_id string to PydanticObjectId

Delegates to the controller function

The implementation is consistent with other route handlers in the file.

state-manager/tests/unit/test_routes.py (6)

11-11: LGTM!

The import for manual retry models is correctly added to support the new test cases.

36-36: Route path assertion correctly validates the manual retry endpoint.

The test properly verifies that the manual retry route is registered with the expected path.

278-296: Well-structured model validation tests.

The tests properly validate both request and response models with appropriate test data and assertions.

320-321: Route handler imports correctly updated.

The manual_retry_state_route is properly added to the imports verification.

339-339: Route handler callable check correctly added.

The test properly verifies that manual_retry_state_route is callable.

1062-1120: Comprehensive route handler tests!

The test suite thoroughly covers the manual retry route with:

Valid API key scenario with proper argument forwarding

Invalid API key handling with 401 response

Request ID generation when not present

Proper PydanticObjectId conversion verification

The tests follow the established patterns and provide excellent coverage.

- Introduced a new test file for the manual retry state, covering various scenarios including successful state creation, error handling for not found states, and duplicate key errors.
- Enhanced tests to verify logging, database error handling, and preservation of original state fields during retries.
- Updated the README to include the new test file and detailed coverage of the manual retry state functionality.

These changes improve the test coverage and reliability of the manual retry state feature.

coderabbitai

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

state-manager/tests/unit/test_routes.py (1)

354-358: Fixture may raise AttributeError; ensure request.state exists instead of delattr.
Creating .state explicitly avoids deleting a non-existent attribute on a MagicMock.

Apply:

-    def mock_request_no_id(self):
-        """Mock request object without request_id"""
-        request = MagicMock()
-        delattr(request.state, 'x_exosphere_request_id')
-        return request
+    def mock_request_no_id(self):
+        """Mock request object without request_id"""
+        request = MagicMock()
+        request.state = MagicMock()  # state exists but has no x_exosphere_request_id
+        return request
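For context on this fixture comment, the standard library's `unittest.mock` documents that deleting an attribute on a mock "blocks" it, so later access raises `AttributeError`. The sketch below compares three ways of building a request without an id. It assumes the route reads the id with `hasattr` or `getattr` plus a default, which this thread does not show.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock

# Deleting a never-accessed attribute on a MagicMock does not raise;
# it marks the attribute as deleted so later lookups fail.
blocked = MagicMock()
delattr(blocked.state, "x_exosphere_request_id")
blocked_has_id = hasattr(blocked.state, "x_exosphere_request_id")

# A fresh MagicMock fabricates any attribute on access, so the id "exists".
fresh = MagicMock()
fresh.state = MagicMock()
fresh_has_id = hasattr(fresh.state, "x_exosphere_request_id")

# A plain namespace has exactly the attributes it was given.
plain = MagicMock()
plain.state = SimpleNamespace()
plain_has_id = hasattr(plain.state, "x_exosphere_request_id")
```

Under that assumption, a bare `MagicMock()` for `state` would make the "no request id" test see an auto-created mock attribute, while the `delattr` and `SimpleNamespace` variants both leave the attribute absent.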

♻️ Duplicate comments (1)

state-manager/app/controller/manual_retry_state.py (1)

21-34: Enqueue timing: preserve or make configurable.
Consider preserving original enqueue_after or allowing a delay in the request model. Otherwise all manual retries enqueue “now” by default.

Example:

-            retry_state = State(
+            retry_state = State(
                 ...
-                does_unites=state.does_unites,
+                does_unites=state.does_unites,
+                enqueue_after=state.enqueue_after,  # or compute based on optional body.retry_delay_ms
                 fanout_id=body.fanout_id
             )

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d68362 and 19b3a65.

📒 Files selected for processing (4)

state-manager/app/controller/manual_retry_state.py (1 hunks)
state-manager/tests/README.md (2 hunks)
state-manager/tests/unit/controller/test_manual_retry_state.py (1 hunks)
state-manager/tests/unit/test_routes.py (7 hunks)

🧰 Additional context used

🧬 Code graph analysis (3)

state-manager/app/controller/manual_retry_state.py (4)

state-manager/app/models/manual_retry.py (2)

ManualRetryRequestModel (5-6)

ManualRetryResponseModel (9-11)

state-manager/app/singletons/logs_manager.py (2)

LogsManager (9-66)

get_logger (65-66)

state-manager/app/models/state_status_enum.py (1)

StateStatusEnum (4-20)

state-manager/app/models/db/state.py (1)

State (13-104)

state-manager/tests/unit/controller/test_manual_retry_state.py (3)

state-manager/app/controller/manual_retry_state.py (1)

manual_retry_state (12-49)

state-manager/app/models/manual_retry.py (2)

ManualRetryRequestModel (5-6)

ManualRetryResponseModel (9-11)

state-manager/app/models/state_status_enum.py (1)

StateStatusEnum (4-20)

state-manager/tests/unit/test_routes.py (3)

state-manager/app/models/manual_retry.py (2)

ManualRetryRequestModel (5-6)

ManualRetryResponseModel (9-11)

state-manager/app/models/state_status_enum.py (1)

StateStatusEnum (4-20)

state-manager/app/routes.py (2)

get_node_run_details_route (372-381)

manual_retry_state_route (190-199)

🪛 markdownlint-cli2 (0.17.2)

state-manager/tests/README.md

84-84: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

85-85: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)

98-98: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

🔇 Additional comments (4)

state-manager/tests/README.md (1)

17-17: Docs entry for new tests: LGTM.
Accurately lists the new controller test file.

state-manager/tests/unit/test_routes.py (3)

11-11: Imports for manual retry models: LGTM.
Models are referenced in new tests below.

36-36: Route path assertion: LGTM.
Covers the new manual retry endpoint path.

1062-1120: Manual retry route tests: LGTM.
Validate API key gating, request-id propagation, and arg forwarding including state_id casting.

coderabbitai · 2025-09-16T11:57:52Z

state-manager/app/controller/manual_retry_state.py

+            retry_state = State(
+                node_name=state.node_name,
+                namespace_name=state.namespace_name,
+                identifier=state.identifier,
+                graph_name=state.graph_name,
+                run_id=state.run_id,
+                status=StateStatusEnum.CREATED,
+                inputs=state.inputs,
+                outputs={},
+                error=None,
+                parents=state.parents,
+                does_unites=state.does_unites,
+                fanout_id=body.fanout_id # this will ensure that multiple unwanted retries are not formed because of index in database
+            )


⚠️ Potential issue

Blocker: fingerprint collision for does_unites states; set retry_count on the new retry state.
Without bumping retry_count, the retry state’s fingerprint can equal the original (unique index uniq_state_fingerprint_unites), causing DuplicateKeyError even on first retry. Also aligns with the unique (node, ns, graph, identifier, run_id, retry_count, fanout_id) index.

Apply:

             retry_state = State(
                 node_name=state.node_name,
                 namespace_name=state.namespace_name,
                 identifier=state.identifier,
                 graph_name=state.graph_name,
                 run_id=state.run_id,
                 status=StateStatusEnum.CREATED,
                 inputs=state.inputs,
                 outputs={},
                 error=None,
                 parents=state.parents,
                 does_unites=state.does_unites,
+                retry_count=(getattr(state, "retry_count", 0) + 1),
                 fanout_id=body.fanout_id # this will ensure that multiple unwanted retries are not formed because of index in database
             )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-            retry_state = State(
-                node_name=state.node_name,
-                namespace_name=state.namespace_name,
-                identifier=state.identifier,
-                graph_name=state.graph_name,
-                run_id=state.run_id,
-                status=StateStatusEnum.CREATED,
-                inputs=state.inputs,
-                outputs={},
-                error=None,
-                parents=state.parents,
-                does_unites=state.does_unites,
-                fanout_id=body.fanout_id # this will ensure that multiple unwanted retries are not formed because of index in database
-            )
+            retry_state = State(
+                node_name=state.node_name,
+                namespace_name=state.namespace_name,
+                identifier=state.identifier,
+                graph_name=state.graph_name,
+                run_id=state.run_id,
+                status=StateStatusEnum.CREATED,
+                inputs=state.inputs,
+                outputs={},
+                error=None,
+                parents=state.parents,
+                does_unites=state.does_unites,
+                retry_count=(getattr(state, "retry_count", 0) + 1),
+                fanout_id=body.fanout_id # this will ensure that multiple unwanted retries are not formed because of index in database
+            )

🤖 Prompt for AI Agents

In state-manager/app/controller/manual_retry_state.py around lines 21 to 34, the newly created retry State does not set retry_count which can produce a fingerprint collision against the uniq_state_fingerprint_unites index; set retry_count on the new State to (state.retry_count or 0) + 1 so the retry fingerprint differs from the original (handle None by treating as 0), e.g. assign retry_count before persisting the new State.
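The collision described in this comment can be illustrated with the compound key it lists (node, namespace, graph, identifier, run_id, retry_count, fanout_id). The sketch models only that tuple. How the `does_unites` fingerprint is computed is not shown in the thread, and `UniqueKey` and `retry_key` are hypothetical names.

```python
from typing import NamedTuple


class UniqueKey(NamedTuple):
    """Fields of the compound unique index as listed in the review comment."""

    node_name: str
    namespace_name: str
    graph_name: str
    identifier: str
    run_id: str
    retry_count: int
    fanout_id: str


def retry_key(original: UniqueKey, fanout_id: str, bump_retry_count: bool) -> UniqueKey:
    """Build the key of a retry state derived from the original state."""
    retry_count = original.retry_count + 1 if bump_retry_count else original.retry_count
    return original._replace(retry_count=retry_count, fanout_id=fanout_id)


original = UniqueKey("node", "ns", "graph", "id-1", "run-1", 0, "fan-a")
```

If the caller happens to send the same `fanout_id` as the original state and `retry_count` is not bumped, the retry key equals the original key and the insert would be rejected; bumping `retry_count` makes the keys differ regardless of `fanout_id`.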

coderabbitai · 2025-09-16T11:57:52Z

state-manager/app/controller/manual_retry_state.py

+            logger.info(f"Retry state {retry_state.id} created for state {state_id}", x_exosphere_request_id=x_exosphere_request_id)
+
+            state.status = StateStatusEnum.RETRY_CREATED
+            await state.save()
+
+            return ManualRetryResponseModel(id=str(retry_state.id), status=retry_state.status)


🧹 Nitpick

Consider atomicity between insert and original state update.
If save() fails after insert(), you end up with a retry state but original not marked RETRY_CREATED. Prefer a Mongo transaction or a compensating update with retry/backoff.

🤖 Prompt for AI Agents

In state-manager/app/controller/manual_retry_state.py around lines 36 to 41, the code inserts a retry_state then sets state.status and calls state.save() separately, which can leave data inconsistent if save() fails; wrap both the insert and the original state update in a single MongoDB transaction (start a client session and use with_transaction to insert the retry document and update the original state status to RETRY_CREATED within the same transaction) so both succeed or fail together; if transactions are not available in the deployment, implement a compensating flow: after insert() attempt the state.save() with an exponential backoff retry loop, and if all retries fail, delete the created retry_state (or mark it as failed) to restore consistency, and ensure all operations log errors with context and surface failures to the caller.
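The compensating flow suggested for deployments without transactions could look roughly like the following. It is a synchronous, storage-agnostic sketch: the three callables stand in for the insert, the status update, and the cleanup, and `insert_then_mark` and its parameters are hypothetical names rather than code from the PR.

```python
import time
from typing import Callable


def insert_then_mark(
    insert_retry: Callable[[], str],
    mark_original: Callable[[], None],
    delete_retry: Callable[[str], None],
    attempts: int = 3,
    base_delay: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Insert the retry state, then update the original with exponential backoff.

    If every update attempt fails, the inserted retry state is deleted
    again and the last error is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    retry_id = insert_retry()
    last_error = None
    for attempt in range(attempts):
        try:
            mark_original()
            return retry_id
        except Exception as exc:  # broad on purpose: any failure triggers another try
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 1x, 2x, 4x, ... base_delay
    delete_retry(retry_id)  # compensate: remove the orphaned retry state
    raise last_error
```

Injecting `sleep` keeps the backoff testable without real delays. A MongoDB transaction, where available, would replace this whole function with a single atomic unit.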

coderabbitai · 2025-09-16T11:57:52Z

state-manager/app/controller/manual_retry_state.py

+    except Exception as _:
+        logger.error(f"Error manual retry state {state_id} for namespace {namespace_name}", x_exosphere_request_id=x_exosphere_request_id)
+        raise


⚠️ Potential issue

Don’t log HTTPException as server errors; use logger.exception for unexpected failures.
Current blanket except re-logs 404/409 as errors. Keep HTTPException untouched and add stacktrace for real failures.

Apply:

-    except Exception as _:
-        logger.error(f"Error manual retry state {state_id} for namespace {namespace_name}", x_exosphere_request_id=x_exosphere_request_id)
-        raise
+    except HTTPException:
+        # propagate expected HTTP errors without error severity logging
+        raise
+    except Exception:
+        logger.exception(
+            f"Error creating manual retry state {state_id} for namespace {namespace_name}",
+            x_exosphere_request_id=x_exosphere_request_id,
+        )
+        raise

📝 Committable suggestion


Suggested change

-    except Exception as _:
-        logger.error(f"Error manual retry state {state_id} for namespace {namespace_name}", x_exosphere_request_id=x_exosphere_request_id)
-        raise
+    except HTTPException:
+        # propagate expected HTTP errors without error severity logging
+        raise
+    except Exception:
+        logger.exception(
+            f"Error creating manual retry state {state_id} for namespace {namespace_name}",
+            x_exosphere_request_id=x_exosphere_request_id,
+        )
+        raise

🤖 Prompt for AI Agents

In state-manager/app/controller/manual_retry_state.py around lines 47 to 49, the current blanket except catches HTTPException and logs 404/409 as server errors; change the handler to let HTTPException pass through unmodified (re-raise immediately) and for all other exceptions use logger.exception to log the error with stacktrace and context (include state_id, namespace_name, x_exosphere_request_id), then re-raise the exception.
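The effect of the two-clause handler can be checked in isolation. This sketch uses the standard library `logging` module and a stand-in exception class instead of FastAPI's `HTTPException` and the project's own logger (which, judging by the calls above, accepts an `x_exosphere_request_id` keyword); `HTTPError` and `run_guarded` are hypothetical names.

```python
import logging

logger = logging.getLogger("manual_retry_sketch")


class HTTPError(Exception):
    """Stand-in for fastapi.HTTPException."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code


def run_guarded(operation, state_id: str):
    """Run operation; pass HTTP errors through, log anything else with a traceback."""
    try:
        return operation()
    except HTTPError:
        raise  # expected outcome (404/409): no error-level log entry
    except Exception:
        # logger.exception logs at ERROR level and attaches the active traceback.
        logger.exception("Error creating manual retry state %s", state_id)
        raise
```

Because `except HTTPError` comes first, a 404 or 409 never reaches the generic clause, so error-level logs only record genuinely unexpected failures.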

coderabbitai · 2025-09-16T11:57:52Z

state-manager/tests/README.md

+### 8. `manual_retry_state.py`
+- ✅ Successful manual retry state creation
+- ✅ State not found scenarios
+- ✅ Duplicate retry state detection (DuplicateKeyError)
+- ✅ Different fanout_id handling
+- ✅ Complex inputs and multiple parents preservation
+- ✅ Database errors during state lookup
+- ✅ Database errors during state save
+- ✅ Database errors during retry state insert
+- ✅ Empty inputs and parents handling
+- ✅ Namespace mismatch scenarios
+- ✅ Field preservation and reset logic
+- ✅ Logging verification
+


🧹 Nitpick

Fix markdownlint errors around the new section.
Add blank lines around the heading and the list to satisfy MD022/MD032.

Apply:

-### 8. `manual_retry_state.py`
+
+### 8. `manual_retry_state.py`
+
 - ✅ Successful manual retry state creation
 - ✅ State not found scenarios
 - ✅ Duplicate retry state detection (DuplicateKeyError)
 - ✅ Different fanout_id handling
 - ✅ Complex inputs and multiple parents preservation
 - ✅ Database errors during state lookup
 - ✅ Database errors during state save
 - ✅ Database errors during retry state insert
 - ✅ Empty inputs and parents handling
 - ✅ Namespace mismatch scenarios
 - ✅ Field preservation and reset logic
 - ✅ Logging verification
+

Suggested change

### 8. `manual_retry_state.py`

- ✅ Successful manual retry state creation
- ✅ State not found scenarios
- ✅ Duplicate retry state detection (DuplicateKeyError)
- ✅ Different fanout_id handling
- ✅ Complex inputs and multiple parents preservation
- ✅ Database errors during state lookup
- ✅ Database errors during state save
- ✅ Database errors during retry state insert
- ✅ Empty inputs and parents handling
- ✅ Namespace mismatch scenarios
- ✅ Field preservation and reset logic
- ✅ Logging verification

🧰 Tools

🪛 markdownlint-cli2 (0.17.2)

84-84: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

85-85: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)

🤖 Prompt for AI Agents

In state-manager/tests/README.md around lines 84 to 97, the new "### 8. `manual_retry_state.py`" section triggers markdownlint errors MD022 and MD032 because the heading and the list that follows it have no blank lines around them. Add one blank line immediately before the heading, one between the heading and the first list item, and one after the last list item.
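
The two rules can be approximated with a short check, shown here as a rough sketch. `blank_line_issues` is a hypothetical helper, not part of markdownlint, and it ignores fenced code blocks, nested lists, and other cases the real linter handles.

```python
import re

HEADING = re.compile(r"^#{1,6}\s")
LIST_ITEM = re.compile(r"^\s*([-*+]|\d+\.)\s")


def blank_line_issues(text: str) -> list[str]:
    """Return simplified MD022/MD032-style findings for `text`."""
    lines = text.split("\n")
    issues = []
    for i, line in enumerate(lines):
        prev_line = lines[i - 1] if i > 0 else ""
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if HEADING.match(line):
            if prev_line.strip():
                issues.append(f"{i + 1}: MD022 heading needs a blank line above")
            if next_line.strip():
                issues.append(f"{i + 1}: MD022 heading needs a blank line below")
        elif LIST_ITEM.match(line):
            # Only the first and last items of a list run need a blank neighbour.
            if prev_line.strip() and not LIST_ITEM.match(prev_line):
                issues.append(f"{i + 1}: MD032 list needs a blank line above")
            if next_line.strip() and not LIST_ITEM.match(next_line):
                issues.append(f"{i + 1}: MD032 list needs a blank line below")
    return issues
```

Running it on the README before and after the fix should show the findings disappear once the heading and the list are each separated by an empty line.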

coderabbitai · 2025-09-16T11:57:53Z

state-manager/tests/README.md

+- ✅ Field preservation and reset logic
+- ✅ Logging verification
+
+### 9. `upsert_graph_template.py`


🧹 Nitpick

Add a blank line before this heading.
Prevents MD022 violation.

Apply:

-### 9. `upsert_graph_template.py`
+
+### 9. `upsert_graph_template.py`

Suggested change

### 9. `upsert_graph_template.py`

🧰 Tools

🪛 markdownlint-cli2 (0.17.2)

98-98: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

🤖 Prompt for AI Agents

In state-manager/tests/README.md around line 98, there is a missing blank line immediately before the "### 9. `upsert_graph_template.py`" heading which triggers an MD022 lint violation; insert a single empty line above that heading so it is separated from the previous content.

coderabbitai · 2025-09-16T11:57:53Z