-
-
Notifications
You must be signed in to change notification settings - Fork 711
Fix stalled run detection #1934
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Caution Review failedThe pull request is closed. WalkthroughThe changes replace the previous heartbeat management approach in the run execution workflow. The heartbeat mechanism, previously handled by a dedicated service, is now triggered directly by task run process events, with heartbeat signals sent via HTTP calls. The worker's heartbeat message now uses the run ID instead of the attempt ID. Additionally, a new changeset file documents a patch update for stalled run detection. The default heartbeat interval environment variable was adjusted from 30 to 20 seconds. No public APIs or exported entity declarations are altered, aside from internal class property and method adjustments and removal of the heartbeat service class. Changes
Sequence Diagram(s)sequenceDiagram
participant TaskRunProcess
participant RunExecution
participant HttpClient
TaskRunProcess->>RunExecution: emit onTaskRunHeartbeat(runId)
RunExecution->>RunExecution: validate runId
RunExecution->>HttpClient: heartbeatRun(runId, snapshotId)
HttpClient-->>RunExecution: heartbeat response
RunExecution->>RunExecution: log heartbeat, update lastHeartbeat
Possibly related PRs
Poem
Tip ⚡💬 Agentic Chat (Pro Plan, General Availability)
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
packages/cli-v3/src/entryPoints/managed/execution.ts (2)
127-146
: Consider handling repeated heartbeat failures.
The event subscription and check for mismatched run ID are solid. Ifthis.onHeartbeat()
fails too often, you may want to retry or escalate beyond just logging the error.
871-891
: Enhance error handling inonHeartbeat
method.
Even though a failed heartbeat logs an error, consider designing a fallback or retry mechanism to detect multiple consecutive failures and prevent silent run stalls.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.changeset/tiny-buckets-teach.md
(1 hunks)packages/cli-v3/src/entryPoints/managed-run-worker.ts
(1 hunks)packages/cli-v3/src/entryPoints/managed/execution.ts
(5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/cli-v3/src/entryPoints/managed/execution.ts (2)
packages/cli-v3/src/executions/taskRunProcess.ts (1)
TaskRunProcess
(64-495)packages/core/src/utils.ts (1)
tryCatch
(5-18)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: units / 🧪 Unit Tests
🔇 Additional comments (5)
packages/cli-v3/src/entryPoints/managed/execution.ts (3)
67-68
: No concerns with introducing alastHeartbeat
field.
Tracking the last heartbeat timestamp is a sensible addition that aligns with the new approach.
108-108
: Initialization ofTaskRunProcess
looks correct.
Merging environment variables and then initializing is cohesive and follows a clear pattern.
907-907
: AppendinglastHeartbeat
to debug log is clear and helpful.
UsingtoISOString()
ensures a standardized UTC timestamp, which is typically best for distributed logging.packages/cli-v3/src/entryPoints/managed-run-worker.ts (1)
579-579
: Switching heartbeat messages to use the run ID is consistent.
This aligns perfectly with the new run-based heartbeat approach, ensuring the correct ID is tracked..changeset/tiny-buckets-teach.md (1)
1-5
: Changeset accurately reflects the fix for stalled run detection.
The patch update is succinct and aligned with the new heartbeat strategy.
🦋 Changeset detectedLatest commit: 1c7160a The changes in this PR will be included in the next version bump. Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
packages/cli-v3/src/entryPoints/managed/execution.ts (3)
108-127
: Validate heartbeat interval environment variable.Passing
HEARTBEAT_INTERVAL_MS
is a clear way to communicate the interval to the child process. However, consider validating thatTRIGGER_HEARTBEAT_INTERVAL_SECONDS
is non-zero and positive to avoid unexpected behaviors if the environment variable is missing or misconfigured.
128-149
: Attach heartbeat event listener to run checks.This block appropriately verifies the incoming run ID before triggering the heartbeat. As a minor improvement, you might consider early-returning if the run is already canceled/aborted to skip sending extraneous heartbeats. Otherwise, this logic appears robust.
872-893
: Handle failed heartbeat attempts with optional retries.The new
onHeartbeat()
implementation works well for sending run heartbeat signals. For added resilience against transient network issues, consider implementing a lightweight retry or backoff strategy in the event of repeated failures, rather than stopping at a log message.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
packages/cli-v3/src/entryPoints/managed/env.ts
(1 hunks)packages/cli-v3/src/entryPoints/managed/execution.ts
(5 hunks)packages/cli-v3/src/entryPoints/managed/heartbeat.ts
(0 hunks)
💤 Files with no reviewable changes (1)
- packages/cli-v3/src/entryPoints/managed/heartbeat.ts
✅ Files skipped from review due to trivial changes (1)
- packages/cli-v3/src/entryPoints/managed/env.ts
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/cli-v3/src/entryPoints/managed/execution.ts (2)
packages/cli-v3/src/executions/taskRunProcess.ts (1)
TaskRunProcess
(64-495)packages/core/src/utils.ts (1)
tryCatch
(5-18)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: units / 🧪 Unit Tests
🔇 Additional comments (2)
packages/cli-v3/src/entryPoints/managed/execution.ts (2)
67-68
: Storing lastHeartbeat for improved diagnostics.Introducing a dedicated
lastHeartbeat
property helps track heartbeat timestamps in a straightforward way. This looks good and should enhance run troubleshooting.
908-908
: IncludelastHeartbeat
in debug logs.Adding
lastHeartbeat
to the log properties makes it easier to pinpoint the most recent successful heartbeat. This is a beneficial addition for monitoring and debugging.
Run heartbeats are now driven via the worker child process again.
Summary by CodeRabbit