Skip to content

Misc v4 checkpoint fixes #1859

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Apr 1, 2025
Merged

Misc v4 checkpoint fixes #1859

merged 25 commits into from
Apr 1, 2025

Conversation

nicktrn
Copy link
Collaborator

@nicktrn nicktrn commented Apr 1, 2025

Biggest change: image refs are now optional on TaskRunCheckpoint - we'll default to the deploy image

Other changes:

  • ability to set more env vars for new runners
  • node 22 was having checkpoint issues, so switched back to 21 for now while investigating

Summary by CodeRabbit

  • New Features
    • Enhanced environment configuration with options for heartbeat and snapshot intervals and support for additional environment variables.
    • Standardized feature flag usage for connecting to default worker groups and updated CLI runtime selection.
  • Refactor
    • Consolidated workload manager settings into a unified configuration object.
  • Tests
    • Introduced comprehensive tests for validating environment variable parsing.
  • Chores
    • Updated the database schema to allow optional checkpoint image references.

Copy link

changeset-bot bot commented Apr 1, 2025

⚠️ No Changeset found

Latest commit: 405f5e9

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Copy link
Contributor

coderabbitai bot commented Apr 1, 2025

Walkthrough

This pull request refactors environment variable handling by removing an inline boolean conversion function and introducing a shared utility for parsing booleans and additional key-value environment variables. It also consolidates workload manager configuration options into a single object, updating both Docker and Kubernetes implementations. Additionally, feature flag references in the webapp have been standardized, the database schema has been updated to allow a nullable image reference, and runtime selection logic has been modified. Schema validations also reflect these changes for environment and workload configurations.

Changes

File(s) Change Summary
apps/supervisor/src/env.ts, apps/supervisor/src/envUtil.ts, apps/supervisor/src/envUtil.test.ts Removed local BoolEnv declaration in favor of an imported version; added RUNNER_HEARTBEAT_INTERVAL_SECONDS, RUNNER_SNAPSHOT_POLL_INTERVAL_SECONDS, and RUNNER_ADDITIONAL_ENV_VARS; introduced corresponding tests and utility definitions.
apps/supervisor/src/index.ts, apps/supervisor/src/workloadManager/docker.ts, apps/supervisor/src/workloadManager/kubernetes.ts, apps/supervisor/src/workloadManager/types.ts Updated workload manager configuration by consolidating individual environment variables into a WorkloadManagerOptions object; added options for heartbeat intervals, snapshot poll intervals, and additional environment variables.
apps/webapp/app/v3/featureFlags.server.ts, apps/webapp/app/v3/services/worker/workerGroupService.server.ts Introduced a FEATURE_FLAG constant for default worker instance group ID and refactored code to reference this constant instead of string literals.
internal-packages/database/prisma/migrations/20250331105838_.../migration.sql, internal-packages/database/prisma/schema.prisma Modified the TaskRunCheckpoint.imageRef column to be nullable by dropping the NOT NULL constraint and updating the Prisma schema type.
packages/cli-v3/src/config.ts, packages/core/src/v3/schemas/runEngine.ts Changed the default runtime selection (from "node-22" to "node") based on a feature flag and updated Zod schemas to allow imageRef as optional (nullable).

Sequence Diagram(s)

sequenceDiagram
    participant Env as Environment Utility
    participant Supervisor as Managed Supervisor
    participant WM as Workload Manager
    participant Container as Docker/Kubernetes Container

    Supervisor->>Env: Request boolean and env var parsing
    Env-->>Supervisor: Return parsed BoolEnv and AdditionalEnvVars
    Supervisor->>WM: Construct WorkloadManagerOptions (heartbeat, snapshot intervals, additional vars)
    WM->>Container: Pass environment configurations
    Container-->>WM: Launch container with updated env variables
Loading

Possibly related PRs

Suggested reviewers

  • ericallam
  • matt-aitken

Poem

I'm a rabbit, hopping with glee,
New env vars bloom just like a tree.
Workload configs now streamlined and neat,
Containers and flags all joyfully meet.
With parsed booleans and schema so clear,
I celebrate these changes with a cheerful cheer!
🐇🌟

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
packages/cli-v3/src/config.ts (1)

176-176: Change addresses Node.js 22 compatibility issue.

This change modifies the default runtime from what was likely "node-22" to the more generic "node" when run_engine_v2 feature flag is enabled. This aligns with the PR objective of addressing issues encountered with Node.js 22 by temporarily reverting to an earlier version.

Consider adding a comment explaining this change is temporary while investigating Node.js 22 issues:

-  const defaultRuntime: BuildRuntime = features.run_engine_v2 ? "node" : DEFAULT_RUNTIME;
+  // Temporarily using generic node runtime instead of node-22 due to checkpoint compatibility issues
+  const defaultRuntime: BuildRuntime = features.run_engine_v2 ? "node" : DEFAULT_RUNTIME;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9daa397 and 405f5e9.

📒 Files selected for processing (13)
  • apps/supervisor/src/env.ts (2 hunks)
  • apps/supervisor/src/envUtil.test.ts (1 hunks)
  • apps/supervisor/src/envUtil.ts (1 hunks)
  • apps/supervisor/src/index.ts (3 hunks)
  • apps/supervisor/src/workloadManager/docker.ts (1 hunks)
  • apps/supervisor/src/workloadManager/kubernetes.ts (1 hunks)
  • apps/supervisor/src/workloadManager/types.ts (1 hunks)
  • apps/webapp/app/v3/featureFlags.server.ts (1 hunks)
  • apps/webapp/app/v3/services/worker/workerGroupService.server.ts (3 hunks)
  • internal-packages/database/prisma/migrations/20250331105838_make_checkpoint_image_ref_optional/migration.sql (1 hunks)
  • internal-packages/database/prisma/schema.prisma (1 hunks)
  • packages/cli-v3/src/config.ts (1 hunks)
  • packages/core/src/v3/schemas/runEngine.ts (2 hunks)
🧰 Additional context used
🧬 Code Definitions (3)
apps/webapp/app/v3/services/worker/workerGroupService.server.ts (1)
apps/webapp/app/v3/featureFlags.server.ts (2)
  • FEATURE_FLAG (4-6)
  • makeSetFlags (38-55)
apps/supervisor/src/envUtil.test.ts (1)
apps/supervisor/src/envUtil.ts (2)
  • BoolEnv (3-9)
  • AdditionalEnvVars (11-39)
apps/supervisor/src/env.ts (1)
apps/supervisor/src/envUtil.ts (1)
  • AdditionalEnvVars (11-39)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: units / 🧪 Unit Tests
🔇 Additional comments (23)
apps/webapp/app/v3/featureFlags.server.ts (2)

4-6: Good addition of a centralized feature flag constant

Creating a centralized FEATURE_FLAG constant is a great practice that improves maintainability by following the DRY principle. This makes it easier to add, modify, or reference feature flags throughout the codebase while reducing the risk of typos in string literals.


9-9: Consistent usage of feature flag reference

Good update to use the constant reference instead of a string literal. This ensures consistency and makes future refactoring easier.

apps/webapp/app/v3/services/worker/workerGroupService.server.ts (4)

5-5: Good import of the feature flag constant

Properly importing the FEATURE_FLAG constant from the feature flags module ensures consistent usage across the codebase.


52-53: Consistent use of feature flag constant

Replacing the string literal with the constant reference improves maintainability and reduces the risk of typos. Good refactoring.


59-60: Consistent use of feature flag constant

Good use of the constant reference here as well, maintaining consistency with the other usage.


172-173: Consistent use of feature flag constant

Good final update to use the constant reference. All instances of this feature flag reference have been properly updated throughout the file.

internal-packages/database/prisma/migrations/20250331105838_make_checkpoint_image_ref_optional/migration.sql (1)

1-2: LGTM! This migration makes the imageRef field optional.

The SQL migration correctly alters the "TaskRunCheckpoint" table to drop the NOT NULL constraint from the "imageRef" column, making it optional as intended in the PR objectives.

internal-packages/database/prisma/schema.prisma (1)

2067-2067: LGTM! Prisma schema change matches the migration.

The change to make the imageRef field nullable in the Prisma schema is consistent with the SQL migration and properly implements the requirement for optional image references on the TaskRunCheckpoint model.

packages/core/src/v3/schemas/runEngine.ts (2)

164-164: Schema validation updated to allow nullable imageRef.

The CheckpointInput Zod schema has been correctly updated to make the imageRef field optional, which aligns with the database changes.


220-220: Schema validation updated to allow nullable imageRef for dequeued messages.

The DequeueMessageCheckpoint Zod schema has been correctly updated to make the imageRef field optional. This ensures consistent validation throughout the application for the checkpoint image reference.

apps/supervisor/src/workloadManager/types.ts (1)

9-11: Well-structured extension of the WorkloadManagerOptions interface.

The newly added optional properties (heartbeatIntervalSeconds, snapshotPollIntervalSeconds, and additionalEnvVars) enhance configurability for workloads, aligning with the PR's objective of improving environment variable configuration for runners.

apps/supervisor/src/envUtil.test.ts (2)

1-36: Thorough unit tests for BoolEnv utility.

The tests comprehensively cover various input scenarios for boolean parsing, including different string representations of "true" and "false", case sensitivity, whitespace handling, direct boolean values, and invalid inputs.


37-80: Comprehensive test coverage for AdditionalEnvVars utility.

The tests for AdditionalEnvVars thoroughly validate parsing of key-value pairs from strings, handling whitespace, managing invalid formats, skipping invalid pairs, and handling edge cases like empty values. The test coverage ensures robust parsing of additional environment variables, which aligns with the PR objective of configuring more environment variables for runners.

apps/supervisor/src/workloadManager/docker.ts (1)

48-64: Well-implemented conditional addition of environment variables.

The code properly checks for the presence of the optional configuration values before adding them to the Docker container environment. This implementation is consistent with the existing pattern and aligns with the PR objective of configuring more environment variables for runners.

apps/supervisor/src/workloadManager/kubernetes.ts (1)

137-158: Consistent implementation of environment variables in Kubernetes workload manager.

The code correctly implements the same optional environment variables as in the Docker workload manager, maintaining consistency across different workload managers. The spread operator with conditional expressions is a clean approach for adding optional environment variables to the existing array.

apps/supervisor/src/envUtil.ts (2)

3-9: Well-implemented environment variable boolean parser.

The BoolEnv implementation correctly handles common conventions for boolean values in environment variables, accepting both "true" and "1" (case-insensitive) as true values.


11-39: Good implementation for parsing additional environment variables.

The AdditionalEnvVars utility effectively parses comma-separated key-value pairs from environment variables with proper error handling and edge case management. The implementation:

  • Properly handles non-string or empty inputs
  • Gracefully skips invalid entries
  • Provides helpful warning logs on parsing failures
  • Returns undefined when appropriate

Note that if the same key appears multiple times in the input string, the last occurrence will take precedence.

apps/supervisor/src/env.ts (2)

4-4: Good modularization of environment variable utilities.

Importing the utilities from a dedicated module improves code organization and reusability.


26-29: Enhanced runner configuration with additional environment variables support.

The new configuration options increase the flexibility of the runner settings:

  • RUNNER_HEARTBEAT_INTERVAL_SECONDS allows configuring the interval for heartbeat operations
  • RUNNER_SNAPSHOT_POLL_INTERVAL_SECONDS provides control over snapshot polling frequency
  • RUNNER_ADDITIONAL_ENV_VARS enables passing custom environment variables to runners

These additions align with the PR objective of allowing more environment variables for new runners.

apps/supervisor/src/index.ts (4)

5-5: Appropriate type import for the workload manager options.

Importing the WorkloadManagerOptions type provides better type safety for the options object.


59-69: Good consolidation of workload manager configuration.

Consolidating the configuration into a single options object improves code organization and maintainability. The use of satisfies ensures type safety while providing better IDE support.

The addition of the new runner configuration options (heartbeatIntervalSeconds, snapshotPollIntervalSeconds, and additionalEnvVars) successfully implements the PR objective of configuring more environment variables for new runners.


102-102: Simplified Kubernetes workload manager initialization.

Using the consolidated options object creates cleaner, more maintainable code.


105-105: Simplified Docker workload manager initialization.

Using the consolidated options object creates cleaner, more maintainable code.

@nicktrn nicktrn merged commit bed834b into main Apr 1, 2025
15 checks passed
@nicktrn nicktrn deleted the fix/v4-checkpoints branch April 1, 2025 09:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants