-
-
Notifications
You must be signed in to change notification settings - Fork 700
Fix managed run controller edge cases #1987
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🦋 Changeset detectedLatest commit: aef9ccb The changes in this PR will be included in the next version bump. Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
Caution Review failedThe pull request is closed. WalkthroughThis set of changes enhances the execution lifecycle management for run executions, focusing on preventing duplicate polling intervals, handling attempt number mismatches, and avoiding orphaned or "zombie" executions after warm starts. The updates introduce new state-tracking properties and methods to classes responsible for execution and polling, ensuring idempotent operations and robust cleanup. Debug logging is standardized and improved for better traceability. The interval service is updated to prevent overlapping handler executions. No public API signatures are altered, but several new internal methods and properties are added to support these behaviors. Changes
Sequence Diagram(s)sequenceDiagram
participant Controller
participant Execution
participant Poller
participant TaskRunProcess
Controller->>Execution: Start new run execution
alt Existing execution not ready
Execution->>TaskRunProcess: Kill existing execution
Execution-->>Controller: Execution killed
end
Controller->>Execution: Create new execution if canExecute
Poller->>Execution: Poll for snapshot changes
Execution->>TaskRunProcess: Validate attempt number
alt Attempt number mismatch
Execution->>TaskRunProcess: Suspend process
Execution-->>Poller: Abort processing
else
Execution->>TaskRunProcess: Continue execution
end
Execution->>TaskRunProcess: Start/Retry attempt
TaskRunProcess-->>Execution: Update attempt state
Possibly related PRs
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
packages/cli-v3/src/entryPoints/managed/poller.ts (1)
65-75
: Minor: duplicatedrunId
insideproperties
runId
is already sent at the top level of the log payload. Duplicating it insideproperties
increases payload size and may confuse downstream log processors.If you need it only once, consider:
- properties: { - ...properties, - runId: this.runFriendlyId, + properties: { + ...properties,
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
.changeset/tidy-books-smell.md
(1 hunks)packages/cli-v3/src/entryPoints/managed/controller.ts
(3 hunks)packages/cli-v3/src/entryPoints/managed/execution.ts
(10 hunks)packages/cli-v3/src/entryPoints/managed/poller.ts
(3 hunks)packages/cli-v3/src/executions/taskRunProcess.ts
(3 hunks)packages/core/src/v3/utils/interval.ts
(5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/cli-v3/src/entryPoints/managed/controller.ts (2)
packages/cli-v3/src/entryPoints/managed/execution.ts (2)
runFriendlyId
(953-959)runFriendlyId
(961-963)packages/cli-v3/src/entryPoints/dev-run-controller.ts (1)
runFriendlyId
(262-268)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: typecheck / typecheck
- GitHub Check: units / 🧪 Unit Tests
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (16)
packages/core/src/v3/utils/interval.ts (5)
16-16
: Added execution guard to prevent concurrent interval handler executions.The new
_isExecuting
flag is introduced to track whether the interval handler is currently running, initialized tofalse
in the constructor. This prevents overlapping executions of the same interval handler.Also applies to: 26-26
57-59
: Early return during execution prevents duplicate interval scheduling.This protection in
resetCurrentInterval()
prevents scheduling a new interval while the handler is still executing, avoiding potential race conditions that could create duplicate intervals.
81-86
: Prevent concurrent execution of interval handler.This check prevents the interval handler from running multiple times in parallel, which could lead to race conditions or unexpected behavior when the execution time exceeds the interval duration.
98-100
: Ensure scheduler runs in finally block and reset execution flag.Moving the interval scheduling to the
finally
block ensures it always happens, regardless of success or failure. Resetting the_isExecuting
flag tofalse
is also crucial to allow future executions.
70-72
: Added getter for interval duration.The new
intervalMs
getter exposes the current interval duration, which is useful for debugging and logging purposes..changeset/tidy-books-smell.md (1)
1-8
: Clear and concise changeset for this fix.The changeset properly documents the patches applied to the related packages, listing the three main issues being addressed:
- Fixing duplicate intervals in polling
- Adding protection against unexpected attempt number changes
- Preventing run execution zombies after warm starts
These align well with the code changes observed in the other files.
packages/cli-v3/src/executions/taskRunProcess.ts (3)
92-92
: Added tracking of preparation state for next attempt.The new
_isPreparedForNextAttempt
property distinguishes between being prepared for a completely new run versus being prepared for a retry attempt of the current run. This initialization in the constructor ensures it starts in a ready state.Also applies to: 96-96
103-105
: Added getter for attempt preparation state.This public getter exposes the attempt preparation state, allowing other components (like RunExecution) to check if the process is ready for a new attempt before deciding to reuse or kill it.
232-232
: Execution lifecycle tracking for attempt management.Setting
_isPreparedForNextAttempt
tofalse
at the start of execution and back totrue
after completion ensures proper tracking of whether the process can accept a new attempt. This helps prevent issues with immediately retried executions.Also applies to: 276-276
packages/cli-v3/src/entryPoints/managed/controller.ts (5)
181-190
: Prevent zombie executions by killing stale ones.This new logic explicitly kills any existing execution that is not ready to execute before creating a new one. This prevents resource leaks and ensures clean execution states, especially after warm starts.
191-191
: Update condition to use new execution readiness check.Changed from checking
isPreparedForNextRun
to usingcanExecute
, which aligns with the new execution readiness logic implemented in the RunExecution class.
279-280
: Improved debug message and added clarifying comment.The debug message now more accurately describes what's happening - creating a fresh execution for the next run rather than just recreating the task run process. The added comment makes the intention clearer for future maintainers.
Also applies to: 285-285
500-520
: Better socket disconnect error handling and logging.The new
parseDescription
helper function normalizes error descriptions from socket disconnects, providing structured data for logging instead of just calling toString(). This improves debugging capabilities by ensuring consistent log formats regardless of the error type.
525-525
: Use structured error data in disconnect logs.Spreading the parsed description properties into the log ensures all relevant context is captured in a structured way, making it easier to analyze disconnect events in logs.
packages/cli-v3/src/entryPoints/managed/execution.ts (2)
692-695
: 🛠️ Refactor suggestionReturn a
Promise<void>
fromkill()
for consistent async usageDown-stream callers use
await this.kill()
.
Returningvoid
whentaskRunProcess
isundefined
forces them to add extra guards (see previous bug).
Always return aPromise
to simplify callers:- public async kill() { - await this.taskRunProcess?.kill("SIGKILL"); - } + public kill(): Promise<void> { + return this.taskRunProcess?.kill("SIGKILL") ?? Promise.resolve(); + }Likely an incorrect or invalid review comment.
620-627
:⚠️ Potential issue
this.kill()
may returnundefined
, causing a runtime errorWhen
taskRunProcess
isundefined
,this.kill()
returnsvoid
.
Calling.catch(...)
onvoid
throws aTypeError
, masking the original intent.- this.sendDebugLog("killing existing task run process before executing next attempt"); - await this.kill().catch(() => {}); + this.sendDebugLog("killing existing task run process before executing next attempt"); + if (this.taskRunProcess) { + await this.kill().catch(() => {}); + }Likely an incorrect or invalid review comment.
Summary by CodeRabbit
Bug Fixes
New Features
Refactor