-
-
Notifications
You must be signed in to change notification settings - Fork 700
Managed run controller revamp #1927
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This reverts commit e03f417.
🦋 Changeset detectedLatest commit: df845e1 The changes in this PR will be included in the next version bump. Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
WalkthroughThis update introduces a comprehensive refactor of the managed run controller system. The legacy implementation of the Changes
Sequence Diagram(s)sequenceDiagram
participant Worker as Worker Process
participant ManagedRunController
participant Env as RunnerEnv
participant Execution as RunExecution
participant Supervisor as Supervisor Socket
participant HTTP as HTTP Client
participant WarmStart as WarmStart Client
participant Metadata as MetadataClient
Worker->>Env: Initialize environment configuration
Worker->>ManagedRunController: Start controller with manifest and env
ManagedRunController->>Supervisor: Connect WebSocket
ManagedRunController->>WarmStart: (If enabled) Connect for warm start
ManagedRunController->>HTTP: Request next run
HTTP-->>ManagedRunController: Run execution data
ManagedRunController->>Execution: Prepare and start run execution
Execution->>HTTP: Start attempt, send heartbeat, poll snapshot
Execution->>Supervisor: Subscribe to run notifications
Supervisor-->>Execution: Run/snapshot updates
Execution->>Metadata: (If needed) Fetch env overrides
Execution->>HTTP: Complete run, send results
Execution-->>ManagedRunController: Notify completion
ManagedRunController->>Supervisor: Unsubscribe, wait for next run
Possibly related PRs
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (2)
🔇 Additional comments (6)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (20)
packages/core/src/v3/utils/interval.ts (2)
8-25
: Consider consistent naming convention for private membersThe class mixes two different private member naming conventions: some members use the underscore prefix (
_onInterval
), while some methods use the private field syntax (#doInterval
). Consider standardizing on one approach for better maintainability.- private _onInterval: () => Promise<void>; - private _onError?: (error: unknown) => Promise<void>; - - private _intervalMs: number; - private _nextInterval: NodeJS.Timeout | undefined; - private _leadingEdge: boolean; - private _isEnabled: boolean; + #onInterval: () => Promise<void>; + #onError?: (error: unknown) => Promise<void>; + + #intervalMs: number; + #nextInterval: NodeJS.Timeout | undefined; + #leadingEdge: boolean; + #isEnabled: boolean;Then update all references accordingly throughout the class.
64-84
: Consider more robust error handling strategyThe error handling in
#doInterval
catches and processes errors from the main callback, but only logs errors from the error handler itself. Consider defining a clearer error handling strategy for nested errors.try { await this._onInterval(); } catch (error) { if (this._onError) { try { await this._onError(error); - } catch (error) { - console.error("Error during interval error handler", error); + } catch (handlerError) { + console.error("Error during interval error handler", { + originalError: error, + handlerError, + }); } } }internal-packages/run-engine/src/engine/db/worker.ts (1)
267-267
: Make the TODO comment more descriptiveThe generic "TODO: fixme" comment doesn't provide any context about what needs to be fixed or why. Consider adding more details to make it actionable for future developers.
- // TODO: fixme + // TODO: Improve worker retrieval from promoted deployment - current method may not handle all edge casespackages/core/src/v3/runEngineWorker/workload/http.ts (1)
168-168
: Consider including replacement information in deprecation noticeWhile the deprecation notice is good, it would be more helpful if it also directed developers to the recommended alternative (if one exists).
- /** @deprecated Not currently used */ + /** @deprecated Not currently used. Use RunExecution and RunExecutionHeartbeat services instead. */apps/supervisor/src/workloadServer/index.ts (1)
455-455
: Optimized socket connection behaviorThis change prevents unnecessary disconnections when the same run ID is being reconnected. Now, the socket only disconnects an existing run if the new run ID is different from the currently connected one.
packages/cli-v3/src/entryPoints/managed/heartbeat.ts (1)
81-83
: Inconsistent interval units between methods.The
updateInterval
method acceptsintervalMs
(milliseconds) while the constructor usesheartbeatIntervalSeconds
(seconds). This inconsistency could lead to confusion when calling the update method.-updateInterval(intervalMs: number) { +updateInterval(intervalSeconds: number) { + const intervalMs = intervalSeconds * 1000; this.heartbeat.updateInterval(intervalMs); }packages/cli-v3/src/entryPoints/managed/logger.ts (1)
1-52
: Well-structured new RunLogger classThis new RunLogger class is well-organized and provides a centralized approach to debug logging for run executions. The implementation includes proper context enrichment with run identifiers.
Consider adding error handling for the HTTP request in the
sendDebugLog
method, as it's currently fire-and-forget:sendDebugLog({ runId, message, date, properties }: SendDebugLogOptions) { if (!runId) { runId = this.env.TRIGGER_RUN_ID; } if (!runId) { return; } const mergedProperties = { ...properties, runId, runnerId: this.env.TRIGGER_RUNNER_ID, workerName: this.env.TRIGGER_WORKER_INSTANCE_NAME, }; console.log(message, mergedProperties); - this.httpClient.sendDebugLog(runId, { + this.httpClient.sendDebugLog(runId, { message, time: date ?? new Date(), properties: mergedProperties, - }); + }).catch(error => { + console.error("Failed to send debug log", { error, runId, message }); + }); }packages/cli-v3/src/entryPoints/managed-run-controller.ts (1)
9-12
: Validate environment usage consistency with the new configuration class.Currently,
env: stdEnv
is passed directly toManagedRunController
. Consider migrating to a stricter typed object (e.g.,RunnerEnv
) if you need thorough validation or override capabilities, ensuring parity with other modules that leverage the new environment handling.packages/cli-v3/src/entryPoints/managed/env.ts (1)
168-208
: Override logic is straightforward; consider a single object approach if expansions become frequent.The current pattern individually updates each environment variable. If you foresee more fields in the future, consider a single-iteration approach or a dynamic patch mechanism for simpler code handling.
packages/cli-v3/src/entryPoints/managed/poller.ts (2)
42-99
: Consider handling repeated failures with a fallback strategy.If repeated polls fail for an extended period, you might implement an exponential backoff or a max retry limit to minimize excessive log noise and repeated overhead.
102-120
: Add tests for these methods.Methods like
resetCurrentInterval()
,updateSnapshotId()
,updateInterval()
,start()
, andstop()
are critical for system flow. Consider adding dedicated unit tests to verify their behavior and catch regressions.packages/cli-v3/src/dev/devSupervisor.ts (2)
416-416
: Consider lazy or offline-safe socket creation.If the environment does not require a live socket at all times, implementing lazy or retry-based creation may improve resiliency.
478-478
: Clarify return of newly created socket.Consider documenting that
#createSocket()
always returns a fresh socket instance. If the method is re-invoked, ensure existing connections are gracefully handled.packages/cli-v3/src/entryPoints/managed/controller.ts (3)
16-19
: Add explicit documentation for controller options.Clarify each field’s usage and behavior in
ManagedRunControllerOptions
, especially regarding environment assumptions, to ensure new contributors can onboard quickly.
369-379
: Process exit may interrupt asynchronous tasks.Exiting the process directly can halt tasks (e.g., logging or HTTP calls). Consider a graceful shutdown approach, such as emitting an event for cleanup and then calling
process.exit
.
509-529
: Unify immediate run start vs. warm start logic.You trigger
startRunExecution
if IDs exist, otherwise youwaitForNextRun
. Consider a single path with conditions to reduce branching and future maintenance overhead.packages/cli-v3/src/entryPoints/managed/execution.ts (4)
269-283
: Clarify concurrent snapshot updates.The code places multiple checks for snapshot changes after cleanup and suspend stages. While this handles common race conditions, consider documenting precisely how multiple concurrent snapshot events are queued and processed. Verifying each snapshot event carefully ensures no unexpected skip or override happens partway through the queue.
325-326
: Remove or rephrase humor in debug logs.Logging messages like “Suspending, any day now 🚬” may be confusing in production or user-facing logs. Recommend using clearer, professional messaging to maintain consistent log style.
- this.sendDebugLog("Suspending, any day now 🚬", snapshotMetadata); + this.sendDebugLog("Suspending run; awaiting further instructions.", snapshotMetadata);
483-493
: Use a more robust pattern to ensure services are stopped.You repeatedly call
this.stopServices()
after both success and error scenarios. Consolidating service cleanup in atry/finally
block or a single top-level handler can reduce duplication, ensure consistent invocation, and simplify troubleshooting.
49-923
: Refactor theRunExecution
class for maintainability.At ~900 lines, this class handles various workflow stages, environment manipulations, concurrency controls, and run attempt logic. Splitting out concerns—e.g., snapshot handling, service control, or run attempt orchestration—into smaller modules will improve readability and make testing easier.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (31)
.changeset/tricky-houses-invite.md
(1 hunks).configs/tsconfig.base.json
(1 hunks)apps/supervisor/src/env.ts
(1 hunks)apps/supervisor/src/index.ts
(1 hunks)apps/supervisor/src/services/podCleaner.ts
(4 hunks)apps/supervisor/src/workloadManager/docker.ts
(1 hunks)apps/supervisor/src/workloadManager/types.ts
(1 hunks)apps/supervisor/src/workloadServer/index.ts
(1 hunks)apps/webapp/app/v3/authenticatedSocketConnection.server.ts
(3 hunks)internal-packages/run-engine/src/engine/db/worker.ts
(2 hunks)packages/cli-v3/e2e/utils.ts
(4 hunks)packages/cli-v3/src/dev/devSupervisor.ts
(8 hunks)packages/cli-v3/src/entryPoints/dev-run-controller.ts
(5 hunks)packages/cli-v3/src/entryPoints/managed-run-controller.ts
(1 hunks)packages/cli-v3/src/entryPoints/managed/controller.ts
(1 hunks)packages/cli-v3/src/entryPoints/managed/env.ts
(1 hunks)packages/cli-v3/src/entryPoints/managed/execution.ts
(1 hunks)packages/cli-v3/src/entryPoints/managed/heartbeat.ts
(1 hunks)packages/cli-v3/src/entryPoints/managed/logger.ts
(1 hunks)packages/cli-v3/src/entryPoints/managed/overrides.ts
(1 hunks)packages/cli-v3/src/entryPoints/managed/poller.ts
(1 hunks)packages/cli-v3/src/executions/taskRunProcess.ts
(3 hunks)packages/core/src/utils.ts
(1 hunks)packages/core/src/v3/index.ts
(1 hunks)packages/core/src/v3/machines/index.ts
(1 hunks)packages/core/src/v3/runEngineWorker/supervisor/http.ts
(1 hunks)packages/core/src/v3/runEngineWorker/supervisor/session.ts
(5 hunks)packages/core/src/v3/runEngineWorker/workload/http.ts
(1 hunks)packages/core/src/v3/schemas/common.ts
(1 hunks)packages/core/src/v3/utils/heartbeat.ts
(1 hunks)packages/core/src/v3/utils/interval.ts
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (11)
apps/supervisor/src/env.ts (1)
apps/supervisor/src/envUtil.ts (1)
BoolEnv
(3-9)
apps/supervisor/src/services/podCleaner.ts (1)
packages/core/src/v3/utils/interval.ts (1)
IntervalService
(8-95)
packages/cli-v3/src/entryPoints/managed/heartbeat.ts (3)
packages/core/src/v3/runEngineWorker/workload/http.ts (1)
WorkloadHttpClient
(21-181)packages/cli-v3/src/entryPoints/managed/logger.ts (1)
RunLogger
(19-52)packages/core/src/v3/utils/interval.ts (1)
IntervalService
(8-95)
packages/cli-v3/e2e/utils.ts (1)
packages/core/src/v3/schemas/common.ts (2)
MachinePreset
(117-122)MachinePreset
(124-124)
packages/cli-v3/src/executions/taskRunProcess.ts (1)
packages/core/src/v3/schemas/common.ts (1)
MachinePresetResources
(126-126)
packages/core/src/v3/machines/index.ts (1)
packages/core/src/v3/schemas/common.ts (1)
MachinePresetResources
(126-126)
packages/core/src/v3/runEngineWorker/supervisor/session.ts (1)
packages/core/src/v3/utils/interval.ts (1)
IntervalService
(8-95)
packages/cli-v3/src/entryPoints/managed/env.ts (1)
packages/cli-v3/src/entryPoints/managed/overrides.ts (1)
Metadata
(1-11)
packages/cli-v3/src/entryPoints/managed/poller.ts (3)
packages/core/src/v3/runEngineWorker/workload/http.ts (1)
WorkloadHttpClient
(21-181)packages/cli-v3/src/entryPoints/managed/logger.ts (1)
RunLogger
(19-52)packages/core/src/v3/utils/interval.ts (1)
IntervalService
(8-95)
packages/cli-v3/src/entryPoints/managed/logger.ts (2)
packages/core/src/v3/runEngineWorker/workload/http.ts (1)
WorkloadHttpClient
(21-181)packages/cli-v3/src/entryPoints/managed/env.ts (1)
RunnerEnv
(54-223)
apps/webapp/app/v3/authenticatedSocketConnection.server.ts (1)
packages/core/src/v3/utils/interval.ts (1)
IntervalService
(8-95)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: units / 🧪 Unit Tests
🔇 Additional comments (52)
packages/core/src/v3/schemas/common.ts (1)
126-127
: Great type refinement!This new
MachinePresetResources
type creates a precise subset ofMachinePreset
with just the resource-related properties. This aligns well with the PR's strict typing improvements.apps/supervisor/src/env.ts (1)
30-30
: Good addition for Docker container cleanup!Adding
RUNNER_DOCKER_AUTOREMOVE
with a default oftrue
will help prevent resource leaks by automatically removing completed containers. This is a sensible default that aligns with Docker best practices.packages/core/src/v3/utils/interval.ts (1)
1-63
: Well-designed interval service with robust configuration options!The new
IntervalService
class provides a flexible and robust replacement for the deprecatedHeartbeatService
. The design allows for customizable intervals, error handling, and execution timing..configs/tsconfig.base.json (1)
13-13
: Great improvement for type safety!Enabling
strictPropertyInitialization
ensures all class properties are explicitly initialized, preventing potential runtime errors. This aligns well with the overall goal of enhancing reliability in the managed run controller revamp.apps/supervisor/src/workloadManager/docker.ts (1)
46-48
: Good improvement for Docker resource managementAdding the
--rm
flag whendockerAutoremove
is enabled implements the automatic removal of completed Docker containers mentioned in the PR objectives. This prevents resource leaks from accumulating unused containers and improves system reliability.apps/supervisor/src/workloadManager/types.ts (1)
13-13
: Clean type definition for Docker container cleanupThe addition of the optional
dockerAutoremove
property to theWorkloadManagerOptions
interface provides a type-safe way to configure automatic container removal. This works in conjunction with the implementation in the Docker workload manager and maintains backward compatibility..changeset/tricky-houses-invite.md (1)
1-6
: Changeset accurately reflects the scope of changesThis changeset correctly specifies patch updates for the affected packages, matching the non-breaking improvements made to the managed run controller. The message is concise but captures the essence of the extensive refactoring work.
internal-packages/run-engine/src/engine/db/worker.ts (1)
196-196
:❓ Verification inconclusive
Query method change might impact uniqueness assumptions
The change from
findUnique
tofindFirst
suggests that there might be multiple records with the same worker ID, or that ordering is important. Ensure this change is intentional and aligns with your database schema constraints.
🏁 Script executed:
#!/bin/bash # Check if there are any unique constraints on backgroundWorker.id # and if there are any other usages of findUnique with this ID # Check for unique constraint on id in model definitions rg -A 5 -B 5 "model BackgroundWorker" --glob "*.prisma" # Check for other findUnique calls with worker ID rg "backgroundWorker.findUnique.*id: workerId" --type tsLength of output: 2347
Attention: Verify Query Method Semantics for BackgroundWorker.lookup
The change from using
findUnique
tofindFirst
merits a manual re‐evaluation to ensure it aligns with the intended filtering logic. The Prisma schema forBackgroundWorker
shows that theid
field is uniquely constrained, which would normally justify usingfindUnique
. If the query is filtering by this uniqueid
(or any other unique field), consider reverting tofindUnique
for semantic clarity and to leverage built-in uniqueness guarantees. Conversely, if the filter intentionally targets a non-unique field or a composite condition, then usingfindFirst
is appropriate—but please document the reasoning.
- Verify that:
- The filter condition in this query uses a unique field (such as the primary key
id
or the uniquefriendlyId
) so that usingfindUnique
would be more precise.- Or, if filtering by a non-unique field is intentional, confirm that returning the first matching entry is the desired behavior.
packages/core/src/v3/index.ts (1)
70-70
: LGTM: Good addition of new interval service exportsThis change correctly exposes the new
IntervalService
functionality, which replaces the deprecatedHeartbeatService
for managing periodic tasks.apps/supervisor/src/index.ts (1)
69-69
:✅ Verification successful
Good improvement for Docker container management
Adding the
dockerAutoremove
option helps prevent resource leaks by automatically removing completed containers. This is a good practice for production environments.
🏁 Script executed:
#!/bin/bash # Check how this option is used in the Docker workload manager # Find where the dockerAutoremove flag is being used rg "dockerAutoremove" --type ts # Examine Docker run arguments construction rg "docker run.*--rm" --type ts -A 5 -B 5Length of output: 1217
Docker Auto-Removal Integration Confirmed
The
dockerAutoremove
flag is correctly integrated. Inapps/supervisor/src/index.ts
the flag is assigned fromenv.RUNNER_DOCKER_AUTOREMOVE
, its type is declared inapps/supervisor/src/workloadManager/types.ts
, and it is used in the Docker setup logic inapps/supervisor/src/workloadManager/docker.ts
. Additionally, tests inpackages/core
show that Docker is invoked with the--rm
flag, reinforcing that auto-removal is properly managed.This is a good improvement for Docker container management and production environments.
packages/core/src/v3/runEngineWorker/supervisor/http.ts (1)
84-84
: Appropriate deprecation noticeGood addition of the deprecation notice for
dequeueFromVersion
. This aligns with the broader refactoring of the managed run controller system where this method is no longer being used.packages/core/src/v3/utils/heartbeat.ts (1)
8-10
: Clear deprecation with alternative specifiedThe deprecation notice correctly guides developers to use the new
IntervalService
instead ofHeartbeatService
. This is an essential part of a clean migration path.apps/webapp/app/v3/authenticatedSocketConnection.server.ts (3)
2-4
: Updated import for new serviceCorrectly updated the import to use
IntervalService
instead of the deprecatedHeartbeatService
.
22-22
: Updated type declarationType declaration correctly updated to reflect the new service type.
78-79
: Migrated service instantiationSuccessfully migrated from
HeartbeatService
toIntervalService
, properly changing the callback property fromheartbeat
toonInterval
while maintaining the same functionality.packages/cli-v3/src/executions/taskRunProcess.ts (3)
4-4
: Type import update aligns with machine resource representation.The import change from
MachinePreset
toMachinePresetResources
reflects the refactoring to use a more focused type for machine resources.
53-53
: Property type updated to use specialized resource type.This change properly updates the
TaskRunProcessOptions
interface to use the more specificMachinePresetResources
type, which contains only the necessarymemory
andcpu
properties needed for this context.
128-128
: Effective local variable renaming maintains backward compatibility.Renaming the destructured property from
machineResources
tomachine
locally is a clean approach that preserves the existing method implementation while adapting to the new naming convention in the options interface.packages/cli-v3/e2e/utils.ts (4)
11-11
: Updated import to support machine resource changes.Adding the
MachinePreset
import supports the machine resource definition changes.
299-305
: Improved code through resource consolidation.Creating a centralized
machine
constant reduces duplication and ensures consistent machine resource configuration throughout the execution flow. The use of thesatisfies
operator is a good practice that preserves literal type information while ensuring type compatibility.
325-325
: Property name updated to align with TaskRunProcess changes.This line correctly updates the property name from
machine
tomachineResources
to match the changes in theTaskRunProcess
class.
378-378
: Reused machine constant for execution configuration.Using the same
machine
constant in the execution configuration ensures consistency and reduces the risk of configuration mismatches.packages/cli-v3/src/entryPoints/managed/overrides.ts (1)
1-11
: Well-structured Metadata type with comprehensive environment properties.The
Metadata
type clearly defines the structure for environment-related properties with proper typing, making all fields optional which provides flexibility for partial metadata.packages/cli-v3/src/entryPoints/managed/heartbeat.ts (5)
1-4
: Good dependency imports for heartbeat functionality.The imports are well-structured, bringing in the required dependencies for interval management, HTTP client functionality, and logging.
5-12
: Clear options interface for RunExecutionHeartbeat.The
RunExecutionHeartbeatOptions
type clearly defines the required parameters for the heartbeat service.
13-21
: Well-structured class properties with clear types.The class properties are well-defined with appropriate types, and the separation between configuration values and service instances is clear.
22-71
: Constructor with comprehensive initialization and logging.The constructor properly initializes all properties, sets up logging to record the heartbeat configuration, and configures the
IntervalService
with appropriate callbacks.
73-92
: Well-designed public interface for heartbeat management.The class provides a clean and comprehensive public API for managing the heartbeat functionality, including methods to reset, update, start, and stop the heartbeat service.
apps/supervisor/src/services/podCleaner.ts (1)
4-4
: Good job replacing HeartbeatService with IntervalServiceThe replacement of HeartbeatService with IntervalService is a positive change that aligns with the broader refactoring effort. The IntervalService provides more robust interval management with error handling capabilities.
Also applies to: 22-22, 35-39, 60-60, 64-64
packages/core/src/v3/runEngineWorker/supervisor/session.ts (1)
11-11
: Good migration to IntervalService with proper error handlingThe replacement of the heartbeat service with IntervalService includes proper error handling and maintains the same functionality. This aligns with the project's goal to improve reliability in the managed run controller.
Also applies to: 32-32, 53-69, 185-185, 199-199
packages/cli-v3/src/entryPoints/dev-run-controller.ts (1)
4-4
: Good migration to IntervalService and improved type safetyThe changes in this file show a consistent approach to replacing HeartbeatService with IntervalService for both run heartbeat and snapshot polling. The implementation includes proper error handling and maintains the same functionality.
The change from
machine
tomachineResources
in the TaskRunProcess constructor properly aligns with the standardized machine resource typing mentioned in the PR objectives.Also applies to: 47-49, 81-151, 622-622
packages/cli-v3/src/entryPoints/managed-run-controller.ts (2)
3-4
: Use of the refactoredManagedRunController
import looks good.Great job delegating the logic to the new module and keeping this file minimal and focused. This separation improves maintainability and clarity.
6-7
: Consider adding error handling for JSON retrieval.If
index.json
is missing or malformed,readJSONFile
could throw an error. For production resilience, wrap this in atry/catch
or check whether the file exists before parsing.Here is an optional snippet:
+try { const manifest = await readJSONFile("./index.json"); const workerManifest = WorkerManifest.parse(manifest); +} catch (err) { + console.error("Failed to read or parse index.json", err); + process.exit(1); +}packages/core/src/v3/machines/index.ts (3)
1-7
: Good move towards narrowerMachinePresetResources
.Switching from
MachinePreset
toMachinePresetResources
helps ensure onlycpu
andmemory
are passed to these functions. This enhances type safety and avoids handling irrelevant fields.
22-22
: Ensure overhead usage is consistent across all references.Both
maxOldSpaceSizeFlag
andmaxOldSpaceSizeForMachine
rely on the same overhead default and logic. Confirm there's no scenario in which overhead or memory is negative and might cause undesired flags.Also applies to: 26-29
36-36
: JSDoc updates reflect the narrower type usage.Using
@param {MachinePresetResources}
clarifies that only CPU and memory resources matter here. This consistency across your codebase is beneficial.Also applies to: 42-42
packages/cli-v3/src/entryPoints/managed/env.ts (3)
6-9
: Confirm valid numeric strings for date conversions.The transformation from string to date using
parseInt
might fail if the value is not purely numeric or empty. Ensure your environment or upstream code provides valid numeric strings for scheduling timestamps.
38-40
: Check for potential initialization issues with date fields.
TRIGGER_POD_SCHEDULED_AT_MS
andTRIGGER_DEQUEUED_AT_MS
expect valid timestamps. If these are missing, Zod will throw an error. Ensure your deployment or run process always sets these values.
210-222
:gatherProcessEnv
is minimal; confirm additional env needs.Only a handful of environment variables (like
NODE_ENV
andOTEL_EXPORTER_OTLP_ENDPOINT
) are passed along. Validate if your tasks need other environment variables (e.g.,TRIGGER_PROJECT_ID
). If so, extend this helper to avoid missing configurations in subprocesses.packages/cli-v3/src/entryPoints/managed/poller.ts (2)
5-12
: Validate or document Poller options.Consider clarifying validation requirements or providing defaults for fields like
runFriendlyId
andsnapshotFriendlyId
, ensuring they're never empty at runtime.
14-41
: Initialization and logging approach looks solid.Your constructor effectively initializes all required properties and logs helpful metadata. This sets a good example for maintainable code.
packages/cli-v3/src/dev/devSupervisor.ts (5)
52-52
: Confirm nullability of config property.Declaring
config
as optional implies call sites must handle undefined. Verify all usage carefully or consider throwing earlier if config is mandatory.
96-96
: Handle potential socket creation errors.If
#createSocket()
fails or returns undefined, your property assignment remains incomplete. Ensure you handle or log errors in socket creation routines.
190-192
: Good defensive check for config property.Throwing an error here ensures you fail fast if
config
is missing, preventing more obscure downstream issues.
423-423
: Validate concurrency on run notifications.Multiple run notifications could arrive close together. Verify if the code can handle race conditions or partial concurrency (e.g., notifications for the same run).
466-466
: Check socket.io version for 'socket.active' support.In some older socket.io releases or alternative TypeScript definitions,
active
may not exist. Confirm that this property is indeed reliable.packages/cli-v3/src/entryPoints/managed/controller.ts (4)
23-88
: Verify constructor incomplete-initialization scenarios.If an error is thrown mid-constructor, partially initialized properties may remain in an inconsistent state. Consider a guarded approach or explicit cleanup for robust error handling.
90-115
: Getter-based environment references are well-structured.Exposing environment-based data through getters clarifies usage and avoids direct mutable access patterns.
499-507
: Check partial cancellation edge cases.If multiple attempts are queued or in partial states,
cancelAttempt
might not cover them all. Clarify or expand your logic to ensure a safe rollback for partially completed steps.
541-551
: Enhanced logging with appended runner metrics looks good.Attaching additional run metrics in each debug log fosters transparency and helps in post-incident analysis.
packages/cli-v3/src/entryPoints/managed/execution.ts (2)
210-216
: Double-check string comparison forsnapshot.friendlyId
.You're comparing
snapshot.friendlyId
andthis.currentSnapshotId
with<
(line 210). If these IDs are strings (e.g., UUIDs, timestamps, or any non-numeric format), the comparison may be lexicographical, not numeric. Confirm thatfriendlyId
is guaranteed to be sequentially sortable as a string format, or convert it to a number (if valid) to ensure correct ordering.
670-714
: Ensure test coverage for all run attempt statuses.Multiple statuses (e.g.,
RUN_FINISHED
,RUN_PENDING_CANCEL
,RETRY_QUEUED
, etc.) can produce different code paths. Confirm that unit tests or integration tests exist for each case to ensure correct handling of edge conditions, such as unexpected fallback or race conditions.
* update nypm to support text-based bun lockfiles * fix retry spans * only download debug logs if admin * add nypm changeset * pull out env override logic * use runner env gather helper * handle dev flushing failures gracefully * fix path normalization for init.ts * add logger * add execution heartbeat service * add snapshot poller service * fix poller * add changesets * create socket in constructor * enable strictPropertyInitialization * deprecate dequeue from version * start is not async * dependency injection in prep for tests * add warm start count to all controller logs * add restore count * pull out run execution logic * temp disable pre * add a controller log when starting an execution * refactor execution and squash some bugs * cleanup completed docker containers by default * execution fixes and logging improvements * don't throw afet abort cleanup * poller should use private interval * rename heartbeat service file * rename HeartbeatService to IntervalService * restore old heartbeat service but deprecate it * use the new interval service everywhere * Revert "temp disable pre" This reverts commit e03f417. * add changeset * replace all run engine find uniques with find first
* WIP on secret env vars * Editing individual env var values is working * Sort the env vars by the key * Deleting values * Allowing setting secret env vars * Added medium switch style * Many style changes to the env var form * “Copy text” -> “Copy” * Draw a divider between hidden buttons * Env var tweaks * Don’t show Dev:you anymore * Grouping the same env var keys together * Styles improved * Improved styling of edit panel * Fix bun detection, dev flushing, and init command (#1914) * update nypm to support text-based bun lockfiles * add nypm changeset * handle dev flushing failures gracefully * fix path normalization for init.ts * add changesets * chore: remove pre.json after exiting pre mode * init command to install v4-beta packages * Revert "chore: remove pre.json after exiting pre mode" This reverts commit f5694fd. * make init default to cli version for all packages * Release 4.0.0-v4-beta.1 (#1916) * chore: Update version for release (v4-beta) * Release 4.0.0-v4-beta.1 --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: nicktrn <55853254+nicktrn@users.noreply.github.com> * Both run engines will only lock to versions they can handle (#1922) * run engine v1 will only lock to v1 deployments * run engine v2 will only lock to managed v2 deployments * test: create background worker and deployment with correct engine version * Add links to and from deployments (#1921) * link from deployments tasks to filtered runs view * jump to deployment * don't add version links for dev (yet) * Fix current worker deployment getter (#1924) * only return last v1 deployment in the shared queue consumer * be explicit about only returning managed deployments * Add a docs page for the human-in-the-loop example project (#1919) * Add a docs page for the human-in-the-loop example project * Order guides, example projects and example tasks alphabetically in the docs list * Managed run controller revamp (#1927) * update nypm to support text-based bun lockfiles * fix retry spans * only download debug logs if admin * add nypm changeset * pull out env override logic * use runner env gather helper * handle dev flushing failures gracefully * fix path normalization for init.ts * add logger * add execution heartbeat service * add snapshot poller service * fix poller * add changesets * create socket in constructor * enable strictPropertyInitialization * deprecate dequeue from version * start is not async * dependency injection in prep for tests * add warm start count to all controller logs * add restore count * pull out run execution logic * temp disable pre * add a controller log when starting an execution * refactor execution and squash some bugs * cleanup completed docker containers by default * execution fixes and logging improvements * don't throw afet abort cleanup * poller should use private interval * rename heartbeat service file * rename HeartbeatService to IntervalService * restore old heartbeat service but deprecate it * use the new interval service everywhere * Revert "temp disable pre" This reverts commit e03f417. * add changeset * replace all run engine find uniques with find first * Release 4.0.0-v4-beta.2 (#1928) * chore: Update version for release (v4-beta) * Release 4.0.0-v4-beta.2 --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: nicktrn <55853254+nicktrn@users.noreply.github.com> * Remove batch ID carryover for non-batch waits (#1930) * add failing test case * do not carry over previous batch id when blocking with waitpoint * delete irrelevant test * Delete project (#1913) * Delete project - Don’t schedule tasks if the project is deleted - Delete queues from the master queues - Add the old delete project UI back in * Mark the project as deleted last * Fix for overriding local variable * Added a todo for deleting env queues * Remove todo * Improve usage flushing (#1931) * add flush to global usage api * enable controller debug logs * initialize usage manager after env overrides * add previous run id to more debug logs * add changeset * For secret env vars, don’t return the value * Added a new env var repository function for getting secrets with redactions * Test task for env vars * Delete heartbeat file, merge mess up --------- Co-authored-by: nicktrn <55853254+nicktrn@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Saadi Myftija <saadi.myftija@gmail.com>
This is a major rework of the managed run controller to improve reliability and performance. It has been tested extensively in our test environment. Unit tests and porting relevant parts to the dev controller will follow.
Bonus:
strictPropertyInitialization: true
Summary by CodeRabbit