
Add TTL index creation for runs collection in state-manager (Issue #432) #565

Closed
Brijesh-Thakkar wants to merge 6 commits into exospherehost:main from Brijesh-Thakkar:TTL_Indexing

Conversation

@Brijesh-Thakkar

Added ensure_ttl_creation function

To prevent database bloat and improve state manager performance

It is currently applied only to "runs"

@safedep

safedep bot commented Nov 30, 2025

SafeDep Report Summary

Badges: Malicious Packages (green), Vulnerable Packages (green), Risky License (green)

Package Details

| Package | Source | Malware | Vulnerability | Risky License |
|---|---|---|---|---|
| @mongodb-js/saslprep @ 1.3.2 | package-lock.json | ok | ok | ok |
| @types/webidl-conversions @ 7.0.3 | package-lock.json | ok | ok | ok |
| @types/whatwg-url @ 13.0.0 | package-lock.json | ok | ok | ok |
| bson @ 7.0.0 | package-lock.json | ok | ok | ok |
| memory-pager @ 1.5.0 | package-lock.json | ok | ok | ok |
| mongodb @ 7.0.0 | package-lock.json | ok | ok | ok |
| mongodb-connection-string-url @ 7.0.0 | package-lock.json | ok | ok | ok |
| punycode @ 2.3.1 | package-lock.json | ok | ok | ok |
| sparse-bitfield @ 3.0.3 | package-lock.json | ok | ok | ok |
| tr46 @ 5.1.1 | package-lock.json | ok | ok | ok |
| webidl-conversions @ 7.0.0 | package-lock.json | ok | ok | ok |
| whatwg-url @ 14.2.0 | package-lock.json | ok | ok | ok |

This report is generated by SafeDep Github App

@coderabbitai

coderabbitai bot commented Nov 30, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added Docker Compose configuration for local development with state manager and dashboard services
    • Implemented automatic time-to-live (TTL) index management for data retention policies
  • Documentation

    • Added step-by-step Quick Start guide with environment setup instructions and security notes
    • Created environment configuration template file with required variables
  • Chores

    • Updated .gitignore for Node.js and Python projects
    • Added MongoDB driver dependency


Walkthrough

Adds Docker Compose for state manager and dashboard, a .env example and README updates, introduces TTL index management during state-manager startup, adds pydantic-settings and a Node mongodb dependency, and updates .gitignore.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Compose & env & docs: docker-compose.yml, README.md, .env.example, .gitignore | Adds docker-compose.yml defining exosphere-state-manager and exosphere-dashboard services, network, ports, environment variables, healthchecks, and pull/build rules; introduces .env.example and README instructions for local/cloud Mongo setup and env handling; replaces top-level temp ignores with expanded ignore patterns. |
| State manager (startup & TTL): state-manager/app/main.py, state-manager/app/config/settings.py, state-manager/pyproject.toml | Adds ensure_ttl_indexes(...) to create/update TTL indexes (defaults: 30 days, runs collection) and integrates TTL setup into lifespan/startup (beanie init → TTL setup → init tasks); exposes ttl_days and run_ttl_days settings and reads TTL_DAYS/RUN_TTL_DAYS env vars; adds pydantic-settings>=2.0.0. |
| Node manifest: package.json | Adds package.json declaring dependency mongodb: ^7.0.0. |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Dev as Developer (docker-compose)
    participant Dash as Exosphere Dashboard
    participant SM as Exosphere State Manager
    participant DB as MongoDB

    Dev->>Dash: Start services (docker-compose)
    Dev->>SM: Start services (docker-compose)
    Dash->>SM: Wait for health (depends_on condition)
    SM->>DB: Initialize DB connection (beanie / pymongo)
    SM->>DB: ensure_ttl_indexes(collections=["runs"], ttl_days=RUN_TTL_DAYS)
    alt TTL index exists with correct TTL
        DB-->>SM: OK
    else TTL missing or mismatch
        DB-->>SM: create/collMod index → result
    end
    SM-->>Dash: Become healthy
    Dash->>SM: API requests (runtime)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

  • Review ensure_ttl_indexes logic: index detection, collMod usage, recreate-on-failure.
  • Verify startup ordering and error handling in state-manager/app/main.py.
  • Confirm env precedence and defaults in Settings.from_env for RUN_TTL_DAYS vs TTL_DAYS.
  • Validate docker-compose healthchecks, environment variable requirements, and network configuration.

Possibly related PRs

  • Docs update for setup #347 — Related Docker Compose and local-setup documentation changes that align with this PR's compose/docs edits.

Poem

🐇 I hopped through compose and env tonight,
I nudged the TTLs so data stays light.
Indexes hummed as startup began,
Containers warmed under my rabbit plan.
🥕 Cheers — logs green, all set and right!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly reflects the main change: adding TTL index creation for the runs collection in state-manager, which aligns with the core functionality introduced across multiple files. |
| Description check | ✅ Passed | The description relates to the changeset by mentioning the ensure_ttl_creation function and its purpose of preventing database bloat, though it contains minor typos and lacks comprehensive detail about all changes. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c5a070 and ece253c.

📒 Files selected for processing (1)
  • docker-compose.yml (1 hunks)
🔇 Additional comments (1)
docker-compose.yml (1)

1-58: ✅ All prior security and configuration issues resolved.

The docker-compose.yml correctly addresses all issues flagged in previous reviews:

  • Security: Hardcoded secrets removed; required secrets (MONGO_URI, STATE_MANAGER_SECRET, SECRETS_ENCRYPTION_KEY, EXOSPHERE_API_KEY) use ${VAR:?error} syntax and must be explicitly provided.
  • TTL Configuration: TTL_DAYS and RUN_TTL_DAYS are now both exposed to the state-manager service, enabling TTL index management as described in the PR objective (line 16–17).
  • Healthchecks: Both services use portable curl-based checks with appropriate timeouts and retries.
  • Service readiness: Dashboard correctly depends on state-manager health before starting (lines 41–43), ensuring the TTL index creation completes before the API is accessed.
  • Guidance: Comments at the top document the .env requirement and security best practices.

The file is well-structured for local development and aligns with the PR goal of adding TTL index creation to the runs collection.



@gemini-code-assist

Summary of Changes

Hello @Brijesh-Thakkar, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a crucial database maintenance feature by adding a Time-To-Live (TTL) index to the runs collection in the state manager. This mechanism ensures that old run data is automatically purged after 30 days, directly addressing potential database bloat and contributing to improved overall performance. Alongside this functional enhancement, the PR also includes new docker-compose.yml, package.json, and package-lock.json files, suggesting a foundational setup for the project's development and deployment environment, integrating the state manager with a dashboard and specifying MongoDB dependencies.

Highlights

  • Automatic Data Pruning: Implemented a Time-To-Live (TTL) index on the 'runs' collection in MongoDB to automatically delete documents older than 30 days, effectively preventing database bloat.
  • Performance Improvement: By ensuring the 'runs' collection remains lean through automatic data expiration, the state manager's overall performance is expected to improve.
  • New Docker Compose Setup: Introduced a docker-compose.yml file to define and orchestrate the exosphere-state-manager and exosphere-dashboard services, streamlining local development and deployment.
  • MongoDB Driver Integration: Added mongodb as a dependency in package.json and package-lock.json, indicating a new or updated integration with MongoDB from a Node.js context.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a TTL (Time-To-Live) index on the runs collection to automatically delete old documents, which is a great way to prevent database bloat. The implementation in state-manager/app/main.py is well-structured. My feedback focuses on improving configuration and reusability by avoiding hardcoded values. Additionally, I've pointed out a security concern in the new docker-compose.yml regarding default secrets, which should be addressed. Overall, this is a valuable addition for maintaining the health of the database.

Comment on lines 11 to 13
- STATE_MANAGER_SECRET=${STATE_MANAGER_SECRET:-exosphere@123}
- MONGO_DATABASE_NAME=${MONGO_DATABASE_NAME:-exosphere}
- SECRETS_ENCRYPTION_KEY=${SECRETS_ENCRYPTION_KEY:-YTzpUlBGLSwm-3yKJRJTZnb0_aQuQQHyz64s8qAERVU=}

high

Committing default secrets, even for local development, is a security risk. These secrets could be inadvertently used in other environments. It's recommended to remove the default values. For local development, you can use a .env file (which should be in .gitignore) and provide a .env.example file with placeholder values to guide users.

      - STATE_MANAGER_SECRET=${STATE_MANAGER_SECRET}
      - MONGO_DATABASE_NAME=${MONGO_DATABASE_NAME:-exosphere}
      - SECRETS_ENCRYPTION_KEY=${SECRETS_ENCRYPTION_KEY}

environment:
# Server-side secure configuration (NOT exposed to browser)
- EXOSPHERE_STATE_MANAGER_URI=${EXOSPHERE_STATE_MANAGER_URI:-http://exosphere-state-manager:8000}
- EXOSPHERE_API_KEY=${EXOSPHERE_API_KEY:-exosphere@123}

high

Similar to the other secrets, this default API key should be removed to avoid security risks. It's better to rely on environment variables provided via a .env file for local setup.

      - EXOSPHERE_API_KEY=${EXOSPHERE_API_KEY}

logger = logging.getLogger(__name__)


async def ensure_ttl_indexes(db, ttl_days: int = 30, collections: Optional[List[str]] = None):

medium

For better type safety and code clarity, consider adding a type hint for the db parameter. You would need to import Database from pymongo.database and then change the signature to async def ensure_ttl_indexes(db: Database, ...).

collections = ["runs"]

ttl_seconds = int(ttl_days) * 24 * 3600
timestamp_field = "created_at"

medium

The timestamp_field is hardcoded to "created_at". This makes the function less reusable if you want to apply TTL indexes on other collections with different timestamp fields in the future. Consider making this a parameter of the ensure_ttl_indexes function, e.g., timestamp_field: str = "created_at".
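A minimal sketch of that suggestion, using only the standard library: the index specification is derived from parameters instead of constants. The helper name `ttl_index_spec` and the `ttl_<field>_index` naming scheme are assumptions made for illustration, not code from this PR.

```python
from typing import Any, Dict, List, Tuple

SECONDS_PER_DAY = 24 * 3600


def ttl_index_spec(
    ttl_days: int = 30,
    timestamp_field: str = "created_at",
) -> Tuple[List[Tuple[str, int]], Dict[str, Any]]:
    """Return (keys, options) describing a single-field TTL index."""
    keys = [(timestamp_field, 1)]  # ascending index on the timestamp field
    options = {
        "expireAfterSeconds": int(ttl_days) * SECONDS_PER_DAY,
        "name": f"ttl_{timestamp_field}_index",  # hypothetical naming scheme
    }
    return keys, options


keys, options = ttl_index_spec(ttl_days=30)
print(keys, options)
```

With PyMongo, the returned pair could then be passed along as `collection.create_index(keys, **options)`, so a different collection only needs a different `timestamp_field` argument.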

logger = LogsManager().get_logger()
logger.info("server starting")
# beginning of the server
log = LogsManager().get_logger()

medium

You've introduced a module-level logger on line 53, which is great practice. However, inside the lifespan function, you are getting another logger instance via LogsManager().get_logger() and assigning it to log. This is inconsistent. For consistency, you should use the module-level logger throughout the lifespan function as well.

# Start with 'runs' only to be safe. Add other collections once you confirm.
log.info("Starting TTL index creation...")
try:
await ensure_ttl_indexes(db, ttl_days=30, collections=["runs"])

medium

The TTL value of 30 days is hardcoded. It's better to make this configurable via environment variables and read it from settings, similar to trigger_retention_hours. This would allow changing the retention period without code changes. For example, you could add a RUN_RETENTION_DAYS setting.

@coderabbitai coderabbitai bot added the enhancement label (New feature or request) on Nov 30, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3db1e6f and a0ec478.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (3)
  • docker-compose.yml (1 hunks)
  • package.json (1 hunks)
  • state-manager/app/main.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
state-manager/app/main.py (4)
state-manager/app/tasks/init_tasks.py (1)
  • init_tasks (16-20)
state-manager/app/models/db/trigger.py (1)
  • DatabaseTriggers (9-52)
state-manager/app/config/settings.py (1)
  • get_settings (34-38)
state-manager/app/utils/check_database_health.py (1)
  • check_database_health (3-19)
🔇 Additional comments (7)
docker-compose.yml (2)

18-23: Healthcheck configurations look good.

The healthchecks have appropriate intervals, timeouts, retries, and start periods for both services. The dashboard's dependency on service_healthy ensures proper startup ordering.

Also applies to: 43-48


50-53: Network configuration is appropriate.

The bridge network with attachable: true allows additional containers to join if needed, which is useful for debugging and development.

state-manager/app/main.py (4)

126-131: Good resilient error handling for TTL setup.

The approach of logging errors but continuing startup is appropriate—TTL index failures shouldn't prevent the application from serving requests. The log.exception captures the full traceback for debugging.


9-10: Imports look appropriate for the new functionality.

The logging and typing imports support the new ensure_ttl_indexes function properly.


183-189: Helpful middleware ordering documentation.

The comments clearly explain the execution order, which is valuable for maintainability.


72-72: Verify that the Run model has a created_at field.

The TTL index relies on created_at existing on documents. If this field is missing or named differently, documents won't be expired.
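MongoDB's TTL monitor only expires documents whose indexed field holds a date (or an array containing dates); documents that lack the field or store another type are never removed. A rough way to check a sample of documents before relying on the index is sketched below. The function name `non_expiring_docs` and the sample data are hypothetical, and plain dictionaries stand in for documents fetched from the collection.

```python
from datetime import datetime, timezone
from typing import Any, Iterable, List, Mapping


def _is_ttl_eligible(value: Any) -> bool:
    """True if the value is a datetime, or a list holding at least one datetime."""
    if isinstance(value, datetime):
        return True
    return isinstance(value, list) and any(isinstance(v, datetime) for v in value)


def non_expiring_docs(
    docs: Iterable[Mapping[str, Any]],
    timestamp_field: str = "created_at",
) -> List[Mapping[str, Any]]:
    """Return the documents a TTL index on `timestamp_field` would never expire."""
    return [d for d in docs if not _is_ttl_eligible(d.get(timestamp_field))]


sample = [
    {"_id": 1, "created_at": datetime(2025, 11, 1, tzinfo=timezone.utc)},
    {"_id": 2},                              # field missing
    {"_id": 3, "created_at": "2025-11-01"},  # string, not a date
]
print([d["_id"] for d in non_expiring_docs(sample)])
```

Any document reported by such a check would stay in the collection indefinitely regardless of the configured TTL.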

package.json (1)

1-5: Clarify the purpose and location of this package.json.

The state-manager is a Python application using pymongo, not the Node.js MongoDB driver. This package.json at the repository root seems out of place. If it's for the dashboard service, it should be in the dashboard's directory. If this serves a different purpose (e.g., workspace root configuration, separate tooling), add documentation explaining its role.

# Start with 'runs' only to be safe. Add other collections once you confirm.
log.info("Starting TTL index creation...")
try:
await ensure_ttl_indexes(db, ttl_days=30, collections=["runs"])

🧹 Nitpick | 🔵 Trivial

Make TTL days configurable via settings.

The 30-day TTL is hardcoded. For operational flexibility, consider making this configurable through environment variables similar to other settings.

In settings.py, add:

ttl_days: int = Field(default=30, description="TTL in days for runs collection")

Then use it here:

-        await ensure_ttl_indexes(db, ttl_days=30, collections=["runs"])
+        await ensure_ttl_indexes(db, ttl_days=settings.ttl_days, collections=["runs"])
🤖 Prompt for AI Agents
In state-manager/app/main.py around line 127, the call to ensure_ttl_indexes
uses a hardcoded ttl_days=30; change it to use the TTL setting from settings
(e.g., settings.ttl_days). Add the new Field ttl_days:int with default 30 in
settings.py as suggested, import or access the settings object in main.py, and
pass settings.ttl_days into ensure_ttl_indexes so the TTL becomes configurable
via environment-backed settings.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
state-manager/app/config/settings.py (1)

17-28: Validate ttl_days to guard against invalid TTL configuration

The new ttl_days setting is wired correctly and matches the compose default, but currently any integer (including 0 or negative) from TTL_DAYS will be accepted and passed into TTL index creation. That can cause index creation/update errors or immediate expiry.

Consider enforcing a sensible lower bound, for example:

ttl_days: int = Field(gt=0, default=30, description="TTL in days for TTL indexes")

and/or validating the parsed value in from_env before returning Settings, so bad configuration fails fast instead of only surfacing via TTL index errors at runtime.
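The second option, validating the parsed value before it reaches index creation, might look roughly like the following. This is a standard-library sketch, not the project's `Settings.from_env`; the function name `parse_ttl_days` and its error messages are assumptions.

```python
import os
from typing import Mapping, Optional


def parse_ttl_days(env: Optional[Mapping[str, str]] = None, default: int = 30) -> int:
    """Read TTL_DAYS, falling back to `default`, and reject non-positive values."""
    env = os.environ if env is None else env
    raw = env.get("TTL_DAYS")
    if raw is None or raw.strip() == "":
        return default
    try:
        value = int(raw)
    except ValueError as exc:
        raise ValueError(f"TTL_DAYS must be an integer, got {raw!r}") from exc
    if value <= 0:
        raise ValueError(f"TTL_DAYS must be greater than 0, got {value}")
    return value


print(parse_ttl_days({"TTL_DAYS": "7"}))
```

Raising at configuration time means a bad value stops startup with a clear message instead of surfacing later as an index error.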

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a0ec478 and 85db717.

📒 Files selected for processing (5)
  • .gitignore (1 hunks)
  • README.md (2 hunks)
  • docker-compose.yml (1 hunks)
  • state-manager/app/config/settings.py (2 hunks)
  • state-manager/app/main.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
state-manager/app/main.py (7)
state-manager/app/tasks/trigger_cron.py (1)
  • trigger_cron (97-101)
state-manager/app/tasks/init_tasks.py (1)
  • init_tasks (16-20)
state-manager/app/models/db/trigger.py (1)
  • DatabaseTriggers (9-52)
state-manager/app/singletons/logs_manager.py (2)
  • LogsManager (9-66)
  • get_logger (65-66)
state-manager/app/config/settings.py (1)
  • get_settings (36-40)
state-manager/app/middlewares/unhandled_exceptions_middleware.py (1)
  • UnhandledExceptionsMiddleware (10-30)
state-manager/app/middlewares/request_id_middleware.py (1)
  • RequestIdMiddleware (10-54)
🔇 Additional comments (6)
docker-compose.yml (2)

9-14: Secrets handling in compose looks good now

Using ${VAR:?VAR must be set} for MONGO_URI, STATE_MANAGER_SECRET, SECRETS_ENCRYPTION_KEY, and EXOSPHERE_API_KEY removes insecure defaults and forces explicit configuration via .env or environment, which matches the new README guidance. This resolves the earlier security concerns about baked‑in secrets.

Also applies to: 31-37
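The `${VAR:?message}` form makes Compose abort when a variable is unset or empty. An application-side equivalent, for runs that do not go through Compose, could be sketched as follows. The helper names are hypothetical; the variable list simply repeats the four names mentioned above.

```python
import os
from typing import List, Mapping, Optional

REQUIRED_VARS = (
    "MONGO_URI",
    "STATE_MANAGER_SECRET",
    "SECRETS_ENCRYPTION_KEY",
    "EXOSPHERE_API_KEY",
)


def missing_required(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def require_env(env: Optional[Mapping[str, str]] = None) -> None:
    """Raise if any required variable is missing, mirroring ${VAR:?...}."""
    missing = missing_required(env)
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))


print(missing_required({"MONGO_URI": "mongodb://localhost:27017"}))
```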


12-13: Unify default MONGO_DATABASE_NAME between compose and settings

The container env var specifies MONGO_DATABASE_NAME=${MONGO_DATABASE_NAME:-exosphere}, while Settings.from_env() in state-manager/app/config/settings.py may use a different default. If these defaults differ, compose-based runs and local/non-compose runs will behave differently. Consider standardizing on a single default value across both configuration sources.

README.md (2)

238-256: Environment and secrets setup docs are clear and aligned with the compose config

The new step‑by‑step .env workflow (copying .env.example, generating strong secrets, and the explicit security note) lines up well with the updated docker-compose.yml that no longer bakes in default secrets. This should significantly reduce the risk of accidental weak credentials in local or production deployments.

Also applies to: 270-274, 280-280


259-268: No action needed — compose file paths are correct

The docker-compose files are confirmed to exist in the docker-compose/ directory on the main branch, matching the URLs in the README. The curl commands on lines 259-268 correctly reference:

  • docker-compose/docker-compose.yml
  • docker-compose/docker-compose-with-mongodb.yml

No path adjustments are required.

Likely an incorrect or invalid review comment.

state-manager/app/main.py (2)

56-174: TTL index helper is robust and matches the new configuration model

The ensure_ttl_indexes helper cleanly encapsulates TTL setup:

  • Defaults to the runs collection and created_at field, but allows overriding collections and the timestamp field.
  • Computes expireAfterSeconds from ttl_days, checks existing indexes by name, and:
    • Uses collMod to update TTL when the value changes.
    • Falls back to drop+recreate on collMod failure.
  • Logs per-collection success/failure and completes without crashing the server, aligning with a best-effort indexing strategy.

Combined with settings.ttl_days and the new TTL_DAYS env wiring, this should keep the runs collection under control while staying flexible for future collections or fields.
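The decision at the heart of that flow can be illustrated as a pure function over the mapping that PyMongo's `index_information()` returns (index name to a dictionary of index options). The function name `plan_ttl_action` and the returned labels are assumptions for this sketch, not the PR's actual code.

```python
from typing import Any, Mapping


def plan_ttl_action(
    existing: Mapping[str, Mapping[str, Any]],
    index_name: str,
    ttl_seconds: int,
) -> str:
    """Return 'create', 'update', or 'noop' for the named TTL index."""
    info = existing.get(index_name)
    if info is None:
        return "create"  # no index with this name yet
    if info.get("expireAfterSeconds") != ttl_seconds:
        return "update"  # same index, different TTL
    return "noop"        # already configured as requested


existing = {
    "_id_": {"key": [("_id", 1)]},
    "ttl_created_at_index": {"key": [("created_at", 1)], "expireAfterSeconds": 2592000},
}
print(plan_ttl_action(existing, "ttl_created_at_index", 7 * 24 * 3600))
```

In the described helper, "update" corresponds to the collMod call with the drop-and-recreate fallback, and "create" to a plain `create_index` with `expireAfterSeconds`.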


192-202: Decide whether TTL index creation failures should be fatal

ensure_ttl_indexes(...) is wrapped in a try-except that logs errors but allows the app to continue. If TTL creation fails (e.g., bad TTL_DAYS, insufficient privileges), the app starts without TTL enforcement, potentially allowing the 'runs' collection to grow unbounded—contrary to this PR's stated goal.

Consider either:

  • Failing fast (re-raising) when running in production, or
  • Making this configurable (e.g., FAIL_ON_TTL_SETUP_ERROR setting)

At minimum, log clearly whether TTL enforcement is active or was skipped.
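A configurable variant might be shaped like the wrapper below, which logs whether TTL enforcement is active and re-raises only when a fail-fast flag is set. The name `run_ttl_setup` and the `fail_fast` parameter are hypothetical; in the application the flag could come from a setting such as the FAIL_ON_TTL_SETUP_ERROR example above.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("ttl_setup")


async def run_ttl_setup(
    setup: Callable[[], Awaitable[None]],
    fail_fast: bool = False,
) -> bool:
    """Run `setup`; return True if TTL enforcement is active afterwards."""
    try:
        await setup()
    except Exception:
        logger.exception("TTL index setup failed; TTL enforcement is NOT active")
        if fail_fast:
            raise  # strict mode: abort startup
        return False
    logger.info("TTL index setup succeeded; TTL enforcement is active")
    return True


async def _demo_setup() -> None:
    """Stand-in for the real TTL setup coroutine."""
    return None


print(asyncio.run(run_ttl_setup(_demo_setup)))
```

Returning a boolean also lets a health endpoint or startup log report the TTL state explicitly.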


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 85db717 and 90c21d8.

📒 Files selected for processing (4)
  • .gitignore (1 hunks)
  • docker-compose.yml (1 hunks)
  • state-manager/app/config/settings.py (2 hunks)
  • state-manager/app/main.py (4 hunks)
🔇 Additional comments (5)
state-manager/app/main.py (4)

221-225: Lifespan logging and shutdown sequencing look good

The added lifecycle logs (server_starting, beanie_initialized, init_tasks_*, secret_initialized, server_stopped) and the explicit secret check plus scheduler wiring make startup/shutdown behavior easier to observe and reason about. The order (init DB/Beanie → TTL setup → init_tasks → secret validation → health check → cron scheduling → yield → DB close + scheduler shutdown) is coherent and should be operationally friendly.

Also applies to: 227-231, 235-244, 250-253


285-286: Router inclusion order is fine

Including global_router before the main router is a reasonable choice and avoids any surprise overrides; no issues here.


8-10: Async PyMongo types and methods are used correctly, but they require a recent PyMongo release

AsyncDatabase is importable from pymongo.asynchronous.database, and AsyncMongoClient is exposed by the top-level pymongo package. Note that the asynchronous API is not present in every PyMongo 4.x release: it first shipped as a beta in PyMongo 4.9, so a "4.0+" requirement is too loose. The methods db.command(), collection.index_information(), collection.create_index(), and collection.drop_index() are all awaitable in the PyMongo async API.

However, verify that:

  • Your project pins a PyMongo version that includes the async API (not Motor, which is deprecated),
  • All usages of these methods at lines 56–61 and 201–207 properly include await statements,
  • The client is initialized with AsyncMongoClient and not accidentally synchronous, and
  • The database and collection instances are obtained from the async client (not mixed with sync clients).

56-61: TTL index management is robust; be explicit about limitations and env naming

The ensure_ttl_indexes flow (check existing index, collMod on TTL mismatch, drop+recreate on failure, and structured logging) is a solid, defensive implementation and will handle most operational cases well.

Two small follow‑ups to consider:

  • Existing TTL indexes with a different name/key pattern: MongoDB's collMod cannot modify an index's keyPattern—it only updates expireAfterSeconds on an existing index. If an earlier version created a TTL index with a different name or key pattern, this code's approach of creating a new ttl_created_at_index and relying on IndexOptionsConflict handling is appropriate. However, if you expect pre‑existing TTL indexes in production, document this conflict behavior clearly or consider a one‑time migration path so operators understand what happens when index names or patterns differ.

  • Env naming in comments: verify that the lifespan comment accurately reflects the actual environment variable precedence (particularly whether RUN_TTL_DAYS or TTL_DAYS is the primary override) and that it matches the current docker‑compose configuration, so operators know which variable to set.

Behaviorally, the "log and continue" strategy on TTL failures is reasonable for avoiding startup breaks; if you ever need a stricter mode, a future TTL_FAIL_FAST/similar flag could toggle between "warn and continue" vs "raise".
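To make the first follow-up concrete, a pre-flight check could look for an existing index that already covers the same key pattern under another name, since that is the situation in which creating `ttl_created_at_index` would conflict. This is only a sketch over the `index_information()` mapping; the function name and the sample index `created_at_1` are assumptions.

```python
from typing import Any, List, Mapping, Sequence, Tuple


def conflicting_indexes(
    existing: Mapping[str, Mapping[str, Any]],
    keys: Sequence[Tuple[str, int]],
    index_name: str,
) -> List[str]:
    """Names of indexes with the same key pattern as `keys` but a different name."""
    wanted = [tuple(k) for k in keys]
    return [
        name
        for name, info in existing.items()
        if name != index_name and [tuple(k) for k in info.get("key", [])] == wanted
    ]


existing = {
    "_id_": {"key": [("_id", 1)]},
    "created_at_1": {"key": [("created_at", 1)], "expireAfterSeconds": 86400},
}
print(conflicting_indexes(existing, [("created_at", 1)], "ttl_created_at_index"))
```

A non-empty result would be a signal to log a clear message or run a one-time migration rather than rely on the conflict error alone.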

docker-compose.yml (1)

1-27: Compose service config and secret handling look solid

Moving secrets to required env vars with ${VAR:?...} and pointing users at .env/.env.example is a good security improvement. TTL_DAYS with a default of 30 aligns with run_ttl_days's TTL_DAYS fallback in Settings.from_env, so operators can tune TTL per deployment without code changes.

If you want to surface the more granular RUN_TTL_DAYS in compose as well (for future per‑collection TTLs), you could add it alongside TTL_DAYS, but the current setup already works correctly with the fallback chain.

Comment on lines +17 to +19
ttl_days: int = Field(default=30, description="TTL in days for TTL indexes")
# Specific TTL for runs collection; overrides ttl_days when used for runs
run_ttl_days: int = Field(30, env="RUN_TTL_DAYS", description="TTL in days for runs collection")

🧹 Nitpick | 🔵 Trivial

TTL settings and env fallbacks look correct; optional Pydantic simplification

ttl_days and run_ttl_days with the RUN_TTL_DAYS -> TTL_DAYS -> 30 precedence match the intended flexibility and line up with the compose defaults. The direct int(os.getenv(...)) parsing also matches how trigger_workers and trigger_retention_hours are handled.

If you later consolidate config handling, consider letting Pydantic drive env parsing (e.g., using env="TTL_DAYS" / env="RUN_TTL_DAYS" plus a custom @classmethod or validator for fallback) to centralize defaults and get more consistent error messages on bad env values, but current behavior is fine.

Also applies to: 29-31

🤖 Prompt for AI Agents
In state-manager/app/config/settings.py around lines 17-19 and 29-31,
consolidate TTL env parsing into Pydantic-driven logic: declare both fields with
env names (env="TTL_DAYS" and env="RUN_TTL_DAYS") and implement a validator or
root_validator that applies the precedence RUN_TTL_DAYS -> TTL_DAYS -> 30,
converting and validating values as ints; this centralizes parsing, avoids
manual os.getenv int casts elsewhere, and ensures clear validation errors when
env values are invalid.
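The RUN_TTL_DAYS -> TTL_DAYS -> 30 precedence can be expressed compactly without any framework. The sketch below uses only the standard library; the function name `resolve_run_ttl_days` is hypothetical, and treating an empty value as unset is an assumption rather than confirmed behavior of the PR.

```python
import os
from typing import Mapping, Optional

DEFAULT_TTL_DAYS = 30


def resolve_run_ttl_days(env: Optional[Mapping[str, str]] = None) -> int:
    """Apply the precedence RUN_TTL_DAYS -> TTL_DAYS -> default."""
    env = os.environ if env is None else env
    for name in ("RUN_TTL_DAYS", "TTL_DAYS"):
        raw = env.get(name)
        if raw:  # unset or empty values fall through to the next source
            return int(raw)
    return DEFAULT_TTL_DAYS


print(resolve_run_ttl_days({"TTL_DAYS": "14", "RUN_TTL_DAYS": "7"}))
```

The same chain could live inside a Pydantic validator; the point is that the order of lookups is written down in exactly one place.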


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 90c21d8 and fb05217.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (5)
  • .env.example (1 hunks)
  • .gitignore (1 hunks)
  • docker-compose.yml (1 hunks)
  • state-manager/app/main.py (4 hunks)
  • state-manager/pyproject.toml (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
state-manager/app/main.py (6)
state-manager/app/models/db/run.py (1)
  • Run (7-25)
state-manager/app/singletons/logs_manager.py (2)
  • LogsManager (9-66)
  • get_logger (65-66)
state-manager/app/config/settings.py (1)
  • get_settings (39-43)
state-manager/app/utils/check_database_health.py (1)
  • check_database_health (3-19)
state-manager/app/middlewares/request_id_middleware.py (1)
  • RequestIdMiddleware (10-54)
state-manager/app/middlewares/unhandled_exceptions_middleware.py (1)
  • UnhandledExceptionsMiddleware (10-30)
🪛 dotenv-linter (4.0.0)
.env.example

[warning] 25-25: [UnorderedKey] The RUN_TTL_DAYS key should go before the TTL_DAYS key

(UnorderedKey)

🔇 Additional comments (17)
state-manager/pyproject.toml (1)

21-21: LGTM! Dependency addition supports environment-based TTL configuration.

The pydantic-settings>=2.0.0 dependency is necessary for the new TTL-related settings fields and environment loading introduced in this PR.

.gitignore (1)

1-26: Comprehensive ignore patterns are well-scoped.

The ignore patterns appropriately cover Node modules, environment files, Python caches, IDE folders, and logs. The explicit comment about keeping .env.example tracked is helpful.

.env.example (4)

1-4: Clear security guidance in template header.

The header appropriately warns users to never commit real secrets and provides clear instructions for local usage.


5-9: MongoDB configuration examples are clear and helpful.

The MongoDB connection string example and database name default align with the docker-compose configuration.


11-20: Secret placeholders with helpful generation guidance.

The placeholder values ("changeme") are safe for a template, and the openssl rand -hex 32 examples provide clear guidance for generating secure values.


22-25: TTL configuration with sensible defaults and clear override explanation.

The 30-day default and the explanation of how RUN_TTL_DAYS overrides TTL_DAYS for runs-specific TTL are clear and appropriate.

state-manager/app/main.py (6)

8-10: Correct async type hints for MongoDB usage.

The imports properly use AsyncDatabase from pymongo.asynchronous.database, which aligns with the async/await patterns used throughout the code.


52-53: Consistent structured logging setup.

The module-level logger now uses LogsManager for consistent structured logging across the application, addressing previous inconsistencies.


56-74: Well-documented TTL index management function.

The function signature is properly typed with AsyncDatabase, and the comprehensive docstring clearly explains the purpose, parameters, and behavior including the collMod update strategy.


78-191: Robust TTL index management with proper fallback handling.

The implementation correctly:

  • Detects existing indexes and compares TTL values
  • Updates via collMod when TTL differs
  • Falls back to drop/recreate if collMod fails
  • Uses structured logging with key-value pairs throughout
  • Handles errors per collection without crashing the server

The logic addresses all edge cases and aligns with MongoDB best practices.
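
The detect/compare step in the list above can be sketched as a pure decision function. This is a hedged illustration rather than the PR's implementation: `plan_ttl_action` and the `created_at` field name used below are hypothetical, and the input is assumed to have the shape PyMongo's `index_information()` returns (index name mapped to a dict with a `key` list and, for TTL indexes, `expireAfterSeconds`). An `update` result would correspond to attempting `collMod` first and falling back to drop/recreate if that fails.

```python
from typing import Any, Mapping, Optional, Tuple

SECONDS_PER_DAY = 86_400


def plan_ttl_action(
    index_info: Mapping[str, Mapping[str, Any]],
    field: str,
    ttl_days: int,
) -> Tuple[str, Optional[str]]:
    """Return (action, index_name); action is 'create', 'noop', or 'update'."""
    desired = ttl_days * SECONDS_PER_DAY
    for name, spec in index_info.items():
        keys = [key for key, _direction in spec.get("key", [])]
        if keys != [field]:
            continue  # not a single-field index on the TTL field
        if spec.get("expireAfterSeconds") == desired:
            return ("noop", name)  # TTL already matches the setting
        return ("update", name)  # TTL differs or is missing on this index
    return ("create", None)  # no index on the field yet
```

Keeping the decision separate from the database calls makes the branching testable without a running MongoDB instance.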


209-220: TTL index setup properly integrated into application lifecycle.

The TTL setup is conservatively placed after Beanie initialization but before init_tasks, uses configurable settings.run_ttl_days, and includes appropriate error handling that logs failures without crashing the server. The structured logging provides good visibility into the initialization process.


271-277: Middleware ordering is correct with clear documentation.

The registration order properly makes UnhandledExceptionsMiddleware the outermost layer, allowing it to catch exceptions from both RequestIdMiddleware and CORSMiddleware. The comment clearly explains the inner-to-outer ordering.
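
The inner-to-outer ordering can be illustrated with a small, framework-free simulation. It assumes Starlette-style registration, where the most recently added middleware wraps the ones added earlier. The helper names are hypothetical, and the relative order of CORSMiddleware and RequestIdMiddleware shown here is an assumption; only UnhandledExceptionsMiddleware being outermost is stated in the review.

```python
from typing import Callable, List

Handler = Callable[[List[str]], None]


def make_middleware(name: str) -> Callable[[Handler], Handler]:
    """Build a wrapper that records when it is entered and exited."""
    def wrap(inner: Handler) -> Handler:
        def handler(trace: List[str]) -> None:
            trace.append(f"enter {name}")
            inner(trace)
            trace.append(f"exit {name}")
        return handler
    return wrap


def build_stack(registered: List[str], endpoint: Handler) -> Handler:
    """Wrap the endpoint so the LAST registered middleware is outermost."""
    app = endpoint
    for name in registered:  # registration order: inner first, outer last
        app = make_middleware(name)(app)
    return app


def endpoint(trace: List[str]) -> None:
    trace.append("endpoint")


registered = [
    "CORSMiddleware",
    "RequestIdMiddleware",
    "UnhandledExceptionsMiddleware",  # registered last, so it runs first
]
trace: List[str] = []
build_stack(registered, endpoint)(trace)
```

After the call, `trace` starts and ends with the UnhandledExceptionsMiddleware entries, which is what lets that layer observe failures raised by everything inside it.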

docker-compose.yml (5)

1-10: Clear security guidance and proper local build configuration.

The security warning appropriately directs users to .env for local secrets, and the build configuration correctly uses the local state-manager context.


21-26: Health check properly configured with curl.

The health check correctly uses curl -f against the /health endpoint with reasonable timing parameters.


28-45: Dashboard service properly configured with health dependency.

The dashboard configuration correctly:

  • Uses the published image with configurable tag
  • Requires EXOSPHERE_API_KEY via :? syntax
  • Depends on state-manager being healthy before starting
  • Separates server-side and client-side environment variables with clear comments

46-51: Dashboard health check correctly uses curl.

The health check properly uses curl -f to verify the dashboard is responding on port 3000.


53-56: Network configuration is appropriate for local development.

The bridge network with attachable flag allows both services to communicate and enables manual attachment of other containers if needed.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fb05217 and 8c5a070.

📒 Files selected for processing (1)
  • docker-compose.yml (1 hunks)
🔇 Additional comments (3)
docker-compose.yml (3)

12-28: Environment and healthcheck configuration for state-manager looks good overall.

The required secrets use ?MUST_SET syntax (lines 12–15), TTL configuration is present (lines 16–17), and the healthcheck uses portable curl with adequate timing for TTL index creation (lines 23–28). Assuming the nested variable substitution issue on line 17 is resolved, the state-manager service is well-configured for the TTL feature.


30-53: Dashboard service configuration is sound.

The dashboard correctly depends on state-manager with service_healthy condition (lines 41–43), ensuring the state-manager's TTL setup completes before the dashboard connects. The healthcheck uses portable curl (line 49), and environment variables are properly scoped (server-side vs. client-side, lines 36–40).


1-58: Note: No MongoDB service in compose — verify this is intentional for this PR.

The compose file does not include a MongoDB service; instead, it expects an external MongoDB via MONGO_URI (line 12). If the intent of this PR is to provide a complete local development setup with TTL indexing, consider whether a mongo service should be included. If MongoDB is intentionally managed separately, ensure this is documented in setup instructions.

Comment on lines 16 to 17
- TTL_DAYS=${TTL_DAYS:-30}
- RUN_TTL_DAYS=${RUN_TTL_DAYS:-${TTL_DAYS:-30}}
Contributor

⚠️ Potential issue | 🔴 Critical

Fix nested variable substitution syntax — RUN_TTL_DAYS fallback will not work as intended.

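Nested variable expansion is not supported by every Compose implementation: legacy docker-compose v1 does not handle it, although Compose v2 (following the Compose Specification) does. The current syntax: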

- RUN_TTL_DAYS=${RUN_TTL_DAYS:-${TTL_DAYS:-30}}

If RUN_TTL_DAYS is unset, this will literally set RUN_TTL_DAYS to the string "${TTL_DAYS:-30}" instead of evaluating it to TTL_DAYS or 30. This breaks the intended fallback logic for the TTL feature.

Simplify to:

- RUN_TTL_DAYS=${RUN_TTL_DAYS}

Then ensure that upstream .env or deployment automation explicitly sets RUN_TTL_DAYS (or omit it entirely if the state-manager defaults suffice). Alternatively, if you need a per-compose default, set it at the top level of the compose file or document the .env requirement clearly.

🤖 Prompt for AI Agents
In docker-compose.yml around lines 16-17, the RUN_TTL_DAYS environment line uses
unsupported nested variable expansion which will set the literal string
"${TTL_DAYS:-30}" when RUN_TTL_DAYS is unset; replace the nested fallback with a
single variable reference (e.g. set RUN_TTL_DAYS=${RUN_TTL_DAYS}) and ensure the
desired default is provided either via the top-level compose "env_file"/".env"
or by the deployment automation, or document that RUN_TTL_DAYS must be set so
the service receives the correct TTL value.

@NiveditJain
Member

I cannot merge this PR:
1 - The issue was already resolved (we will work to improve this on our side).
2 - This PR doesn't actually address the issue well.

@NiveditJain NiveditJain closed this Dec 4, 2025
Labels

enhancement New feature or request

2 participants