-
Notifications
You must be signed in to change notification settings - Fork 42
Enhance state management and logging functionality #232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance state management and logging functionality #232
Conversation
- Updated `serve` function in `run.py` to utilize the `workers` argument in development mode. - Improved `executed_state` function in `executed_state.py` to handle multiple new states more efficiently, including error handling for state insertion. - Modified `LogsManager` in `logs_manager.py` to dynamically set logging level based on the application mode (development or production). - Added logging for state creation in `create_next_state.py` to improve traceability. These changes aim to enhance the application's performance and maintainability in different environments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @NiveditJain, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the application's state management and logging capabilities. It optimizes the creation and insertion of multiple states by leveraging bulk database operations and introduces robust error handling for state insertion. Furthermore, the logging system is improved by dynamically adjusting log levels based on the application's run mode, reducing verbosity in development while maintaining detailed logs in production. The changes also add crucial traceability for state creation and ensure consistent worker configuration across different environments, ultimately boosting performance and maintainability.
Highlights
- State Management Optimization: Implemented bulk insertion for new states in executed_state.py using insert_many to improve efficiency when handling multiple state creations.
- Robust State Insertion: Added error handling in executed_state.py to verify that all new states are successfully inserted, raising an exception if a discrepancy is found.
- Dynamic Logging Configuration: Modified LogsManager to automatically set the logging level to WARNING in development mode (to reduce verbosity) and INFO in production, based on command-line arguments or environment variables.
- Enhanced State Traceability: Introduced an INFO level log entry in create_next_state.py to record when a new state is being created, improving debugging and monitoring.
- Consistent Worker Configuration: Updated run.py to ensure the workers argument is applied consistently to uvicorn.run in both development and production modes, streamlining application startup.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces several valuable enhancements, notably improving performance in state creation by using bulk inserts and making the logging configuration more flexible. The changes are well-structured. My review includes a couple of suggestions to further improve the code's robustness by using more specific exceptions and simplifying the environment detection logic to make it less fragile.
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. Caution Review failedThe pull request is closed. 📝 WalkthroughSummary by CodeRabbit
WalkthroughImplements bulk insertion and verification for multi-output state creation and defers next-state dispatch to inserted children; adds development-mode log-level detection, an early info log in create_next_state, and consistent CLI worker usage for uvicorn.run. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant Controller as executed_state controller
participant DB as State DB
participant Tasks as Background Task Queue
participant Worker as create_next_state worker
Client->>Controller: executed_state(base_state, outputs)
alt no outputs
Controller->>DB: update base state (status=Executed, outputs={}, parents)
Controller-->>Tasks: enqueue create_next_state(base_state)
else outputs exist
Controller->>DB: insert_many(new child states)
DB-->>Controller: inserted_ids
Controller->>DB: find In(State.id, inserted_ids)
DB-->>Controller: inserted_states
alt mismatch count
Controller-->>Client: raise RuntimeError
else all inserted
loop each inserted_state
Controller-->>Tasks: enqueue create_next_state(inserted_state)
end
end
end
Tasks-->>Worker: process create_next_state(state)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Assessment against linked issues
Assessment against linked issues: Out-of-scope changes
Possibly related PRs
Suggested reviewers
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. 📜 Recent review detailsConfiguration used: CodeRabbit UI 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (4)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limits.
🔭 Outside diff range comments (3)
state-manager/app/tasks/create_next_state.py (1)
16-21: Logging before validation can raise AttributeError when state is NoneYou access state.identifier before validating state. This can crash and mask the intended ValueError.
@@ -async def create_next_state(state: State): - logger.info(f"Creating next state for {state.identifier}") - graph_template = None - - if state is None or state.id is None: - raise ValueError("State is not valid") +async def create_next_state(state: State): + if state is None or state.id is None: + raise ValueError("State is not valid") + logger.info("Creating next state", state_identifier=state.identifier, run_id=state.run_id, graph_name=state.graph_name) + graph_template = Nonestate-manager/app/controller/executed_state.py (2)
75-77: Preserve traceback and avoid error logs for expected HTTP errorsUse a dedicated branch for HTTPException and re-raise without logging it as an error. When logging unexpected exceptions, prefer logger.exception and re-raise with plain raise to preserve the original traceback.
- except Exception as e: - logger.error(f"Error executing state {state_id} for namespace {namespace_name}", x_exosphere_request_id=x_exosphere_request_id, error=e) - raise e + except HTTPException: + # Expected/handled errors: let FastAPI propagate without error-level logging. + raise + except Exception: + logger.exception( + f"Error executing state {state_id} for namespace {namespace_name}", + x_exosphere_request_id=x_exosphere_request_id, + ) + raise
42-59: Avoid recomputing parents mapping for each child statestate.parents already includes {state.identifier: state.id} at this point. Reuse it to reduce per-iteration dict allocations and ensure consistency.
- new_states = [] - for output in body.outputs[1:]: - new_states.append(State( + new_states = [] + parents_for_children = state.parents # already updated above + for output in body.outputs[1:]: + new_states.append(State( node_name=state.node_name, namespace_name=state.namespace_name, identifier=state.identifier, graph_name=state.graph_name, run_id=state.run_id, status=StateStatusEnum.EXECUTED, inputs=state.inputs, outputs=output, error=None, - parents={ - **state.parents, - state.identifier: state.id - } + parents=parents_for_children, ))
♻️ Duplicate comments (2)
state-manager/app/singletons/logs_manager.py (1)
47-64: Stop parsing sys.argv; rely on MODE env var set by entrypointManually scanning sys.argv is brittle and was already flagged previously. It won’t handle forms like --mode=development and couples a library with CLI concerns. Have run.py set MODE, and read only from env here.
@@ - def _is_development_mode(self) -> bool: - """ - Check if the application is running in development mode. - Development mode is determined by checking if '--mode' 'development' - is in the command line arguments. - """ - # Check command line arguments for development mode - if '--mode' in sys.argv: - try: - mode_index = sys.argv.index('--mode') - if mode_index + 1 < len(sys.argv) and sys.argv[mode_index + 1] == 'development': - return True - except (ValueError, IndexError): - pass - - # Fallback: check environment variable - return os.getenv('MODE', '').lower() == 'development' + def _is_development_mode(self) -> bool: + """ + Decide development mode from environment only. + The entrypoint is responsible for setting MODE. + """ + return os.getenv("MODE", "production").lower() == "development"Note: After this change, you can drop the
import sysat the top of the file. Also consider aligning this logger with api-server/app/singletons/logs_manager.py (single source of truth for logging config).state-manager/app/controller/executed_state.py (1)
67-69: Raise a more specific exception on partial insertsRaising a generic Exception harms debuggability and was flagged earlier.
- if len(inserted_states) != len(new_states): - raise Exception(f"Failed to insert all new states. Expected {len(new_states)} states, but only {len(inserted_states)} were inserted") + if len(inserted_states) != len(new_states): + raise RuntimeError( + f"Failed to insert all new states. Expected {len(new_states)} states, " + f"but only {len(inserted_states)} were inserted" + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
state-manager/app/controller/executed_state.py(3 hunks)state-manager/app/singletons/logs_manager.py(2 hunks)state-manager/app/tasks/create_next_state.py(1 hunks)state-manager/run.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
state-manager/app/singletons/logs_manager.py (1)
api-server/app/singletons/logs_manager.py (3)
LogsManager(7-36)__init__(12-33)get_logger(35-36)
state-manager/run.py (1)
api-server/run.py (1)
serve(12-21)
state-manager/app/controller/executed_state.py (2)
state-manager/tests/unit/controller/test_executed_state.py (3)
test_executed_state_success_single_output(51-87)test_executed_state_empty_outputs(203-239)test_executed_state_success_multiple_outputs(91-143)state-manager/app/models/db/state.py (1)
State(8-18)
🪛 GitHub Actions: State Manager Unit Tests
state-manager/app/controller/executed_state.py
[error] 61-61: TypeError: object MagicMock can't be used in 'await' expression. The code awaits State.insert_many(new_states) but in tests State.insert_many is not mocked as an AsyncMock, causing the failure in test_executed_state.py::TestExecutedState::test_executed_state_success_multiple_outputs.
🔇 Additional comments (2)
state-manager/app/controller/executed_state.py (2)
60-66: Good use of bulk insert for new statesUsing insert_many followed by a single fetch is a clear improvement for fanout scenarios and aligns with the PR objective to optimize bulk state creation.
70-72: BackgroundTasks with async functions: verify execution semanticsFastAPI/Starlette BackgroundTasks are designed for sync callables. Passing an async function may schedule a coroutine that is never awaited. If this is the case in your stack version, wrap the coroutine in a sync shim or schedule via asyncio.create_task inside a sync function.
Two options to consider:
- Wrap with a sync launcher:
def _launch_create_next_state(s: State): import asyncio asyncio.create_task(create_next_state(s))Then:
background_tasks.add_task(_launch_create_next_state, inserted_state)
- Or if your Starlette version supports awaiting callables in BackgroundTasks, confirm this behavior and keep as-is. Please verify the runtime behavior in your environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limits.
🔭 Outside diff range comments (2)
state-manager/app/controller/executed_state.py (2)
42-59: Simplify fan-out state construction and avoid dict aliasing
- Avoid recomputing the parents dict in each iteration.
- Prevent accidental shared-mutation by copying inputs.
- Reduce boilerplate via a list comprehension.
Apply this diff within the selected range:
- new_states = [] - for output in body.outputs[1:]: - new_states.append(State( - node_name=state.node_name, - namespace_name=state.namespace_name, - identifier=state.identifier, - graph_name=state.graph_name, - run_id=state.run_id, - status=StateStatusEnum.EXECUTED, - inputs=state.inputs, - outputs=output, - error=None, - parents={ - **state.parents, - state.identifier: state.id - } - )) + parents_map = state.parents # already updated above + new_states = [ + State( + node_name=state.node_name, + namespace_name=state.namespace_name, + identifier=state.identifier, + graph_name=state.graph_name, + run_id=state.run_id, + status=StateStatusEnum.EXECUTED, + inputs=state.inputs.copy(), + outputs=output, + error=None, + parents=parents_map, + ) + for output in body.outputs[1:] + ]
75-77: Preserve traceback; differentiate HTTP 4xx from 5xx and log with stack
- Use a bare
raiseto preserve the original traceback.- Log client errors at warning, server errors at exception level.
Apply this diff:
- except Exception as e: - logger.error(f"Error executing state {state_id} for namespace {namespace_name}", x_exosphere_request_id=x_exosphere_request_id, error=e) - raise e + except HTTPException: + logger.warning( + f"State {state_id} execution failed with client error", + x_exosphere_request_id=x_exosphere_request_id, + ) + raise + except Exception: + logger.exception( + f"Error executing state {state_id} for namespace {namespace_name}", + x_exosphere_request_id=x_exosphere_request_id, + ) + raise
♻️ Duplicate comments (1)
state-manager/app/controller/executed_state.py (1)
60-60: Idiomatic emptiness checkPrefer truthiness over length comparisons for containers.
Apply this diff:
- if len(new_states) > 0: + if new_states:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
state-manager/app/controller/executed_state.py(3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
state-manager/app/controller/executed_state.py (2)
state-manager/app/models/db/state.py (1)
State(8-18)state-manager/app/tasks/create_next_state.py (1)
create_next_state(16-157)
🪛 GitHub Actions: State Manager Unit Tests
state-manager/app/controller/executed_state.py
[error] 61-61: TypeError: object MagicMock can't be used in 'await' expression while awaiting State.insert_many(new_states) in executed_state.py:61.
🔇 Additional comments (3)
state-manager/app/controller/executed_state.py (3)
2-2: Import of BeanieInoperator is correct and appropriately scopedUsage aligns with the filter on State.id later. No issues.
63-66: Consider avoiding the extra round-trip if Beanie can return inserted documentsCurrent pattern is correct. If your Beanie version supports returning inserted documents (or populating ids on passed models), you could drop the subsequent find. Otherwise, keep as-is.
Would you like me to check Beanie version constraints and propose a variant that returns inserted docs directly?
67-69: Good: specific exception type for partial insert validationRaising RuntimeError here is appropriate and improves debuggability over a generic Exception.
…fanout-apis' of https://github.com/NiveditJain/exospherehost into 230-improve-process-of-bulk-state-creation-in-terms-of-fanout-apis
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
state-manager/tests/unit/controller/test_executed_state.py (4)
125-129: Remove leftover per-document construction mockingWith the bulk
insert_manypath, the code should not be instantiatingState()per child. These lines are now dead and make the test couple to an implementation detail that should not exist anymore.- # Mock State.save() for new states - mock_new_state = MagicMock() - mock_new_state.save = AsyncMock() - mock_state_class.return_value = mock_new_state
141-147: Replace brittle constructor-call assertion with intent-focused checks; fix misleading commentAsserting
mock_state_class.call_count == 2ties the test to a non-bulk flow. Verify the bulk insert and task scheduling instead, and update the comment to reflect the singlefind_onelookup.- # Should create 2 additional states (3 outputs total, 1 for main state, 2 new states) - assert mock_state_class.call_count == 2 - # Should add 3 background tasks (1 for main state + 2 for new states) - assert mock_background_tasks.add_task.call_count == 3 - # State.find_one should be called multiple times: once for finding, once for updating main state, and twice in the loop - assert mock_state_class.find_one.call_count == 1 + # Should bulk-insert 2 new states (3 outputs total: 1 reuses main state, 2 are new) + mock_state_class.insert_many.assert_awaited_once() + # Should add 3 background tasks (1 for main state + 2 for fetched children) + assert mock_background_tasks.add_task.call_count == 3 + mock_background_tasks.add_task.assert_any_call(mock_create_next_state, mock_state) + mock_background_tasks.add_task.assert_any_call(mock_create_next_state, child_state_1) + mock_background_tasks.add_task.assert_any_call(mock_create_next_state, child_state_2) + # State.find_one only for the main-state lookup + assert mock_state_class.find_one.call_count == 1 + mock_state_class.find.assert_called_once()
145-146: Inline comment contradicts the assertionThe comment claims
find_oneshould be called multiple times, but the assertion expects 1 call. Update the comment as suggested in the previous diff to reflect the actual expectation.
245-269: Add coverage for bulk-insert error handlingPR claims improved error handling for state insertion. Add a test that simulates
insert_manyraising and asserts the surfaced error/HTTP status.Example test to add (outside this block):
@patch('app.controller.executed_state.State') @patch('app.controller.executed_state.create_next_state') async def test_executed_state_insert_many_failure( self, mock_create_next_state, mock_state_class, mock_namespace, mock_state_id, mock_state, mock_background_tasks, mock_request_id ): executed_request = ExecutedRequestModel( outputs=[{"ok": 1}, {"ok": 2}] # triggers bulk insert path ) mock_state_class.find_one = AsyncMock(return_value=mock_state) mock_state.save = AsyncMock() mock_state_class.insert_many = AsyncMock(side_effect=Exception("insert failed")) with pytest.raises(Exception) as exc_info: await executed_state( mock_namespace, mock_state_id, executed_request, mock_request_id, mock_background_tasks ) assert "insert failed" in str(exc_info.value) # Ensure no background tasks scheduled when insert fails mock_background_tasks.add_task.assert_not_called()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
state-manager/tests/unit/controller/test_executed_state.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
state-manager/tests/unit/controller/test_executed_state.py (1)
state-manager/tests/unit/controller/test_create_states.py (2)
test_create_states_multiple_states(239-301)test_create_states_success(111-155)
|
@cursoragent I see the coverage has decreased, can you please add tests to improve coverage |
servefunction inrun.pyto utilize theworkersargument in development mode.executed_statefunction inexecuted_state.pyto handle multiple new states more efficiently, including error handling for state insertion.LogsManagerinlogs_manager.pyto dynamically set logging level based on the application mode (development or production).create_next_state.pyto improve traceability.These changes aim to enhance the application's performance and maintainability in different environments.
Fixes #230