Runlock telemetry #1974

matt-aitken · 2025-04-24T09:47:56Z

Added spans to the RunLock

Summary by CodeRabbit

New Features
- Improved system reliability and observability with enhanced locking mechanisms that now include detailed logging and tracing.
Bug Fixes
- Enhanced error handling and resource management during run processing and queue operations.
Tests
- Expanded test coverage for locking behavior, including error propagation, nested locks, and timeout scenarios.
Chores
- Updated TypeScript test configuration for improved compatibility.

coderabbitai · 2025-04-24T09:48:04Z

Walkthrough

This set of changes refactors the locking mechanism within the run-engine package by updating the RunLocker API and its usage across multiple systems. The RunLocker class now requires a logger and tracer, and its lock method signature includes a lock name string as the first argument, followed by resource identifiers and timeout. All internal calls to the locking mechanism throughout the engine and its subsystems have been updated to use the new signature. The locking logic is now instrumented for tracing and enhanced error logging. Corresponding tests have been expanded to cover error propagation and nested locking scenarios.

Changes

File(s)	Change Summary
internal-packages/run-engine/src/engine/index.ts	Updated `RunLocker` instantiation to include logger and tracer; updated lock calls to use string key as first argument.
internal-packages/run-engine/src/engine/locking.ts	`RunLocker` constructor now requires logger and tracer; `lock` method signature updated to include a name parameter; locking logic wrapped in tracing span; enhanced error handling and logging.
internal-packages/run-engine/src/engine/systems/checkpointSystem.ts	Updated lock calls in `createCheckpoint` and `continueRunExecution` to include a descriptive string key as first argument.
internal-packages/run-engine/src/engine/systems/delayedRunSystem.ts	Updated lock call in `rescheduleDelayedRun` to include a string key as first argument.
internal-packages/run-engine/src/engine/systems/dequeueSystem.ts	Updated lock calls to include a string key; refactored error handling, resource accounting, and snapshot management; improved maxAttempts logic and concurrency release logic.
internal-packages/run-engine/src/engine/systems/enqueueSystem.ts	Updated lock call in `enqueueRun` to include a string key as first argument.
internal-packages/run-engine/src/engine/systems/releaseConcurrencySystem.ts	Updated lock call in `executeReleaseConcurrencyForSnapshot` to include a string key; added logic to determine concurrency release method based on queue properties.
internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts	Updated all lock calls to include descriptive string keys as first argument across multiple methods.
internal-packages/run-engine/src/engine/systems/ttlSystem.ts	Updated lock call in `expireRun` to include a string key as first argument.
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts	Updated lock calls in `blockRunWithWaitpoint` and `continueRunIfUnblocked` to include string keys as first argument.
internal-packages/run-engine/src/engine/tests/locking.test.ts	Updated tests for new lock API; added tests for error propagation, nested locks, lock timeout, and nested lock state reporting.
internal-packages/run-engine/tsconfig.test.json	Added `customConditions` compiler option for test configuration.

Sequence Diagram(s)

sequenceDiagram
    participant System as Engine System (e.g., DequeueSystem)
    participant RunLocker
    participant Logger
    participant Tracer

    System->>RunLocker: lock("operationName", [resourceId], timeout, callback)
    RunLocker->>Tracer: startSpan("RunLocker.lock", {name, resources, timeout})
    alt Lock is nested
        RunLocker->>Tracer: setAttribute("nested", true)
        RunLocker->>callback: execute with existing lock signal
    else New lock needed
        RunLocker->>Tracer: setAttribute("nested", false)
        RunLocker->>Logger: log lock attempt
        RunLocker->>callback: execute with new lock signal
        alt Error occurs
            RunLocker->>Logger: log error details
            RunLocker->>Tracer: record error
            RunLocker->>System: throw error
        end
    end
    RunLocker->>Tracer: endSpan
    RunLocker->>System: return result
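
The flow in the diagram can be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not the actual RunLocker: `SketchSpan`, `RecordingTracer` and `SketchLocker` are hypothetical names, the tracer is a stand-in rather than OpenTelemetry's API, and nested-lock detection is left out.

```typescript
// Hypothetical stand-ins for a tracing API; these are NOT OpenTelemetry's real types.
interface SketchSpan {
  name: string;
  attributes: Record<string, unknown>;
  error?: unknown;
  ended: boolean;
}

class RecordingTracer {
  readonly spans: SketchSpan[] = [];

  startSpan(name: string, attributes: Record<string, unknown>): SketchSpan {
    const span: SketchSpan = { name, attributes: { ...attributes }, ended: false };
    this.spans.push(span);
    return span;
  }
}

// In-memory illustration only; a real locker would acquire a distributed lock.
class SketchLocker {
  private readonly held = new Set<string>();
  private readonly tracer: RecordingTracer;

  constructor(tracer: RecordingTracer) {
    this.tracer = tracer;
  }

  async lock<T>(
    name: string,
    resources: string[],
    timeoutMs: number,
    fn: () => Promise<T>
  ): Promise<T> {
    const key = [...resources].sort().join(",");
    const span = this.tracer.startSpan("RunLocker.lock", {
      name,
      resources: key,
      timeout: timeoutMs,
    });
    try {
      if (this.held.has(key)) {
        // A real lock would wait up to timeoutMs; the sketch fails immediately.
        throw new Error(`Resource already locked: ${key}`);
      }
      this.held.add(key);
      try {
        return await fn();
      } finally {
        this.held.delete(key); // always release, even when fn throws
      }
    } catch (error) {
      span.error = error; // record the failure on the span, then rethrow unchanged
      throw error;
    } finally {
      span.ended = true; // the span ends on both success and failure
    }
  }
}
```

The point of the shape is that the span brackets both acquisition and the callback, so a failed lock and a failed callback show up under the same operation name.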

Possibly related PRs

triggerdotdev/trigger.dev#1761: Introduces a new Redis client factory with error handling and options, which is directly related as the main PR's locking improvements depend on robust Redis client management.

Suggested reviewers

ericallam

Poem

🐇
Locks now have names, and tracing is here,
With loggers and spans, intentions are clear.
Nested or single, the locks do their dance,
Errors are logged, no bugs left to chance.
From tests to the engine, the code is refined—
A hop, and a skip, and the locks are aligned!

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (3)

internal-packages/run-engine/src/engine/index.ts (2)
95-99: Reuse the engine’s tracer instead of instantiating a separate one

RunEngine receives an externally-configured Tracer in options.tracer (assigned later on line 179). Creating a second tracer here breaks span correlation and doubles the instrumentation overhead.
-    this.runLock = new RunLocker({
-      redis: this.runLockRedis,
-      logger: this.logger,
-      tracer: trace.getTracer("RunLocker"),
-    });
+    // Re-use the same tracer that the rest of the engine will use
+    this.runLock = new RunLocker({
+      redis: this.runLockRedis,
+      logger: this.logger,
+      tracer: options.tracer ?? trace.getTracer("RunLocker"),
+    });
If options.tracer isn’t available yet, consider moving the this.tracer assignment above this block so you can pass this.tracer directly.

498-498: Minor: consider extracting lock name strings to enums/constants

Several callers now pass literal strings ("trigger", "dequeueFromMasterQueue", …). A central enum (e.g. RunLockSpan) prevents typos and eases refactors/searchability.

Not critical, just a maintainability suggestion.
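
One way this could look, as a minimal sketch: a `const` object plus a derived union type, so that a misspelled lock name fails type checking. `RunLockName` and `describeLock` are hypothetical names, and the listed entries are only the lock names quoted in this review.

```typescript
// Hypothetical central registry of lock names (not part of the PR).
const RunLockName = {
  trigger: "trigger",
  dequeueFromMasterQueue: "dequeueFromMasterQueue",
  startRunAttempt: "startRunAttempt",
  expireRun: "expireRun",
} as const;

// Union of the string values: "trigger" | "dequeueFromMasterQueue" | ...
type RunLockName = (typeof RunLockName)[keyof typeof RunLockName];

// Stand-in for a lock call site; only accepts registered names.
function describeLock(name: RunLockName, resources: string[]): string {
  return `${name}:${resources.join(",")}`;
}

const label = describeLock(RunLockName.expireRun, ["run_1"]);
// describeLock("expierRun", ["run_1"]) would be rejected by the type checker.
```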
internal-packages/run-engine/src/engine/tests/locking.test.ts (1)
130-143: Make the timeout test assertion implementation-agnostic

The test expects the error message to literally contain “unable to achieve a quorum”, which couples the suite to the wording of the underlying redlock library. Any library upgrade or customisation will break the test even though functionality is intact. Loosen the match on the message instead:
-).rejects.toThrow("unable to achieve a quorum");
+).rejects.toThrowError(/quorum|Lock/i);
or, better, check for a custom LockTimeoutError class thrown by RunLocker.
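
A custom error type along those lines might be sketched like this. Everything here is an assumption for illustration: `LockTimeoutError` and `toLockError` are hypothetical, RunLocker is not known to export them, and the message pattern used for detection is the one quoted in the test above.

```typescript
// Hypothetical error type for lock acquisition timeouts.
class LockTimeoutError extends Error {
  readonly resources: string[];
  readonly timeoutMs: number;
  readonly originalError: unknown;

  constructor(resources: string[], timeoutMs: number, originalError?: unknown) {
    super(`Could not acquire lock on [${resources.join(", ")}] within ${timeoutMs}ms`);
    this.name = "LockTimeoutError";
    this.resources = resources;
    this.timeoutMs = timeoutMs;
    this.originalError = originalError;
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, LockTimeoutError.prototype);
  }
}

// Hypothetical translation step: the wording check lives in exactly one place,
// so tests can assert on the class instead of the library's message.
function toLockError(error: unknown, resources: string[], timeoutMs: number): unknown {
  if (error instanceof Error && /quorum/i.test(error.message)) {
    return new LockTimeoutError(resources, timeoutMs, error);
  }
  return error; // anything else passes through untouched
}
```

With something like this in place, a Jest- or Vitest-style test could assert `rejects.toBeInstanceOf(LockTimeoutError)` rather than matching message text.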

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 56ec052 and 1b0f6dc.

📒 Files selected for processing (12)

internal-packages/run-engine/src/engine/index.ts (3 hunks)
internal-packages/run-engine/src/engine/locking.ts (3 hunks)
internal-packages/run-engine/src/engine/systems/checkpointSystem.ts (2 hunks)
internal-packages/run-engine/src/engine/systems/delayedRunSystem.ts (1 hunks)
internal-packages/run-engine/src/engine/systems/dequeueSystem.ts (2 hunks)
internal-packages/run-engine/src/engine/systems/enqueueSystem.ts (1 hunks)
internal-packages/run-engine/src/engine/systems/releaseConcurrencySystem.ts (1 hunks)
internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts (5 hunks)
internal-packages/run-engine/src/engine/systems/ttlSystem.ts (1 hunks)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (2 hunks)
internal-packages/run-engine/src/engine/tests/locking.test.ts (3 hunks)
internal-packages/run-engine/tsconfig.test.json (1 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (2)

internal-packages/run-engine/src/engine/systems/delayedRunSystem.ts (1)

internal-packages/run-engine/src/engine/index.ts (1)

runId (1159-1313)

internal-packages/run-engine/src/engine/systems/releaseConcurrencySystem.ts (2)

internal-packages/run-engine/src/engine/systems/executionSnapshotSystem.ts (1)

getLatestExecutionSnapshot (28-100)

internal-packages/run-engine/src/engine/statuses.ts (1)

canReleaseConcurrency (49-52)

⏰ Context from checks skipped due to timeout of 90000ms (7)

GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
GitHub Check: typecheck / typecheck
GitHub Check: units / 🧪 Unit Tests
GitHub Check: Analyze (javascript-typescript)

🔇 Additional comments (21)

internal-packages/run-engine/src/engine/locking.ts (6)

6-8: Good addition of necessary imports

The imports for tryCatch, Logger, and tracing utilities provide the foundation for enhanced observability of locking operations.

18-19: Well-structured class property additions

Adding the logger and tracer as private properties with proper initialization in the constructor follows good OOP practices and enables the tracing enhancements.

Also applies to: 30-31

36-36: API extension with descriptive lock name parameter

Adding the name parameter is a good practice for better identification and debugging of locks. This makes lock usage more explicit and traceable.

44-77: Excellent implementation of tracing spans for lock operations

The lock method implementation has several well-designed features:

Proper wrapping of locking logic in a tracing span

Detailed span attributes for observability

Efficient handling of nested locks with the same resources

Robust error handling with logging

Preservation of the original behavior while adding observability

This approach will significantly improve debugging and monitoring of lock-related issues.

57-70: Well-implemented error handling with tryCatch

The use of tryCatch to capture and log errors before rethrowing them is an excellent pattern that enhances observability without changing behavior.

21-21:
✅ Verification successful

Breaking change to constructor signature handled appropriately

Expanding the constructor parameters to require logger and tracer instances is a breaking change, but necessary for the enhanced functionality. Make sure all instantiations of RunLocker across the codebase have been updated.

🏁 Script executed:
#!/bin/bash
# Find all instances where RunLocker is instantiated to verify they include logger and tracer
echo "Looking for RunLocker instantiations..."
rg -A 3 "new RunLocker" --type ts
Length of output: 3373
All RunLocker instances updated to include logger and tracer – no further action required.

Verification confirmed that every new RunLocker({...}) call in

internal-packages/run-engine/src/engine/index.ts

internal-packages/run-engine/src/engine/tests/locking.test.ts

already supplies redis, logger, and tracer.
internal-packages/run-engine/tsconfig.test.json (1)

19-19: Compiler configuration update

Adding customConditions for @triggerdotdev/source supports the test environment requirements for the new features.

internal-packages/run-engine/src/engine/systems/delayedRunSystem.ts (1)

40-40: Consistent usage of the updated lock API

The lock call has been properly updated to include the descriptive name "rescheduleDelayedRun" as the first parameter, following the new API signature.
internal-packages/run-engine/src/engine/systems/ttlSystem.ts (2)

27-27: Consistent usage of the updated lock API

The lock call has been correctly updated to include the descriptive name "expireRun" as the first parameter, maintaining consistency with the updated API signature.

25-118:
✅ Verification successful

Verify all lock calls across systems are updated

The changes to include a descriptive lock name are consistent in the files reviewed. However, we should verify all lock calls across the codebase have been updated to maintain consistency.

🏁 Script executed:
#!/bin/bash
# Find all calls to runLock.lock to verify they include the name parameter
echo "Looking for runLock.lock calls..."
rg "runLock\.lock\(" --type ts
Length of output: 4511
All runLock.lock calls include descriptive names
Verified that every call to runLock.lock across the codebase specifies a descriptive lock name as the first argument.
internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts (5)

79-79: Updated lock method call with descriptive name

The lock call now includes a descriptive string identifier "startRunAttempt" as the first parameter, which improves traceability.

415-415: Consistent lock naming for improved observability

The lock call now includes a descriptive name "attemptSucceeded" that clearly identifies the operation context.

549-549: Enhanced lock telemetry

Adding the descriptive identifier "attemptFailed" to the lock operation provides better context for tracing and debugging.

853-853: Clear lock operation labeling

The lock call now includes a meaningful operation name "tryNackAndRequeue" which will help with telemetry and error tracking.

929-929: Improved lock traceability

The addition of "cancelRun" as a lock name provides explicit context for this critical operation in telemetry data.

internal-packages/run-engine/src/engine/systems/enqueueSystem.ts (1)

57-57: Added descriptive lock name for telemetry

The lock method call now includes "enqueueRun" as the first parameter, consistent with the updated RunLocker API and improving observability across the system.

internal-packages/run-engine/src/engine/systems/checkpointSystem.ts (2)

56-56: Enhanced lock telemetry with descriptive name

Adding "createCheckpoint" as the lock name provides clear context for this operation in traces and logs, consistent with the new locking API pattern.

241-241: Named lock for better observability

The lock call now includes "continueRunExecution" as an identifier, which improves tracing and debugging capabilities.

internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (2)

361-361: Improved lock operation identification

The lock method call now includes "blockRunWithWaitpoint" as a descriptive identifier, enhancing observability and fitting with the updated locking API.

512-512: Consistent lock naming pattern

The addition of "continueRunIfUnblocked" as a lock name maintains the consistent pattern of named locks across the codebase for better telemetry.
internal-packages/run-engine/src/engine/systems/releaseConcurrencySystem.ts (1)

200-205:
❓ Verification inconclusive

Callback may outlive 5 s lock TTL – use signal or increase TTL

The callback performs several network calls (DB fetch, queue ops). If latency spikes, the 5 000 ms TTL may lapse, causing a second worker to enter the same critical section.

Either
• raise the TTL to something less brittle (e.g. 15 000 ms), or
• call signal.throwIfAborted() (and/or extend()) after every awaited statement to guarantee exclusive ownership.
-  [snapshot.runId],
-  5_000,
+  [snapshot.runId],
+  15_000,              // or keep 5 000 and add periodic signal checks
🏁 Script executed:
#!/bin/bash
# Show the top of releaseConcurrencySystem.ts to understand imports and context
sed -n '1,100p' internal-packages/run-engine/src/engine/systems/releaseConcurrencySystem.ts

# Find all occurrences of runLock.lock to see usage patterns
rg -n "runLock\.lock" -n internal-packages

# Locate the runLock implementation file to inspect its lock() signature
RUNLOCK_FILE=$(fd runLock.ts internal-packages | head -n1)
echo "Found runLock implementation at: $RUNLOCK_FILE"
sed -n '1,200p' "$RUNLOCK_FILE"
Length of output: 7621
Callback may outlive 5s lock TTL – verify signal support or increase TTL
The lock callback in releaseConcurrencySystem performs multiple awaited operations (DB fetch, queue ops). If its execution exceeds the 5 000 ms TTL, the lock will expire and a second worker can enter the same critical section.

• Increase the TTL to a more robust value (e.g. 15 000 ms)
• Or leverage the signal passed into the callback—call signal.throwIfAborted() after each await (and/or use lock.extend() if your implementation supports it)
• Confirm the runLock.lock signature on SystemResources actually passes an AbortSignal and exposes an extension API before relying on signal checks

coderabbitai · 2025-04-24T09:55:09Z

internal-packages/run-engine/src/engine/index.ts

+    return await this.runLock.lock("handleStalledSnapshot", [runId], 5_000, async () => {
      const latestSnapshot = await getLatestExecutionSnapshot(prisma, runId);


💡 Verification agent

🧩 Analysis chain

Long-running callback risks lock expiry; propagate signal checks

#handleStalledSnapshot performs multiple DB round-trips and conditional branches that can easily exceed the 5 s lock TTL. Unlike the trigger path, the callback never calls signal.throwIfAborted() nor refreshes the lock, so a competing worker could acquire the same lock mid-execution.

Add periodic checks or extend the lock inside the callback, e.g.:

await this.runLock.lock("handleStalledSnapshot", [runId], 10_000, async (signal) => {
  const latestSnapshot = await getLatestExecutionSnapshot(prisma, runId);
  signal.throwIfAborted(); // ✅ after IO

  // … extra logic …
  if (complexCondition) {
    signal.throwIfAborted(); // ✅ before continuing
  }
});

Also consider a slightly larger TTL (e.g. 10–15 s) to cover worst-case DB latency.

🏁 Script executed:

#!/bin/bash
set -e
echo "----- Searching runLock usage in index.ts -----"
rg -n "runLock" internal-packages/run-engine/src/engine/index.ts
echo "----- Searching import of runLock -----"
rg -n "import.*runLock" internal-packages/run-engine/src/engine/index.ts
echo "----- Searching lock method signature usage -----"
rg -n "runLock.lock" -R internal-packages
echo "----- Searching class definitions for ‘Lock’ -----"
rg -n "class .*Lock" -R internal-packages

Length of output: 988

🏁 Script executed:

#!/bin/bash
set -e
echo "----- Locating RunLocker class -----"
FILE=$(rg -l "class RunLocker" -R internal-packages | head -n1)
if [[ -z "$FILE" ]]; then
  echo "RunLocker class not found."
  exit 1
fi
echo "Found RunLocker in $FILE"
echo "----- Displaying lock method signature -----"
rg -n "lock(" "$FILE" | head -n20

Length of output: 650

Ensure handleStalledSnapshot lock callback handles abort signal and extends TTL

The callback passed to
internal-packages/run-engine/src/engine/index.ts:1169
currently omits the signal parameter, preventing any signal.throwIfAborted() checks and risking lock expiry during its multiple DB round-trips. To prevent a competing worker from stealing the lock mid-execution and to handle timeouts:

Change the callback signature to accept signal: AbortSignal and bump the TTL:

// before (lines 1169–1170)
await this.runLock.lock("handleStalledSnapshot", [runId], 5_000, async () => {
  const latestSnapshot = await getLatestExecutionSnapshot(prisma, runId);

// after
await this.runLock.lock("handleStalledSnapshot", [runId], 10_000, async (signal) => {
  const latestSnapshot = await getLatestExecutionSnapshot(prisma, runId);
  signal.throwIfAborted(); // ✅ after DB read

  // … additional logic …
  if (complexCondition) {
    signal.throwIfAborted(); // ✅ before continuing
  }
});

Insert signal.throwIfAborted() after each heavy I/O or before key branches.

Consider a TTL of 10 000–15 000 ms to cover worst-case DB latency.
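
The "check after each await" advice can be sketched generically as below. This assumes the lock callback receives a standard `AbortSignal` that is aborted when the lock is lost, which this thread suggests but does not confirm; `runWhileLocked` is a hypothetical helper, and an `AbortController` stands in for the lock.

```typescript
// Hypothetical helper: runs async steps in order, checking the lock's signal
// before each step and once more after the last one.
async function runWhileLocked(
  signal: AbortSignal,
  steps: Array<() => Promise<void>>
): Promise<number> {
  let completed = 0;
  for (const step of steps) {
    signal.throwIfAborted(); // stop before doing more work without the lock
    await step();
    completed += 1;
  }
  signal.throwIfAborted(); // the lock may have been lost during the final step
  return completed;
}

// Simulate a lock that expires while the first step is running.
const controller = new AbortController();
const executed: string[] = [];

const outcome = runWhileLocked(controller.signal, [
  async () => {
    executed.push("read snapshot");
    controller.abort(new Error("lock expired")); // stand-in for TTL expiry
  },
  async () => {
    executed.push("write snapshot"); // never reached once the signal is aborted
  },
]);
```

Checks like these only narrow the window in which work continues after expiry; they do not extend the lock, so a realistic TTL still matters.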

coderabbitai · 2025-04-24T09:55:10Z

internal-packages/run-engine/src/engine/systems/dequeueSystem.ts

+                  const parsedConfig = RetryOptions.nullable().safeParse(retryConfig);

-                this.$.logger.debug(
-                  "RunEngine.dequeueFromMasterQueue(): maxAttempts not set, using task's retry config",
-                  {
-                    runId,
-                    task: result.task.id,
-                    rawRetryConfig: retryConfig,
+                  if (!parsedConfig.success) {
+                    this.$.logger.error(
+                      "RunEngine.dequeueFromMasterQueue(): Invalid retry config",
+                      {
+                        runId,
+                        task: result.task.id,
+                        rawRetryConfig: retryConfig,
+                      }
+                    );
                  }
-                );
-
-                const parsedConfig = RetryOptions.nullable().safeParse(retryConfig);

-                if (!parsedConfig.success) {
-                  this.$.logger.error("RunEngine.dequeueFromMasterQueue(): Invalid retry config", {
-                    runId,
-                    task: result.task.id,
-                    rawRetryConfig: retryConfig,
-                  });
+                  maxAttempts = parsedConfig.data?.maxAttempts;
                }


⚠️ Potential issue

Type-safety bug: accessing .data on failed Zod parse

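When parsedConfig.success === false, no parsed data is available. Depending on the Zod version's typings, accessing .data without first narrowing on success may be rejected by the type checker in strict builds; either way the value is undefined on that branch, so the assignment should be guarded.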

-const parsedConfig = RetryOptions.nullable().safeParse(retryConfig);
-
-if (!parsedConfig.success) {
-  this.$.logger.error(/* … */);
-}
-
-maxAttempts = parsedConfig.data?.maxAttempts;
+const retryParse = RetryOptions.nullable().safeParse(retryConfig);
+
+if (retryParse.success) {
+  maxAttempts = retryParse.data?.maxAttempts;
+} else {
+  this.$.logger.error(/* … */);
+}

This guarantees both sound typings and correct runtime behaviour.
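
The narrowing pattern itself can be shown without Zod. In this sketch `ParseResult` is a hand-rolled stand-in for the shape of a safeParse result, not Zod's actual type definitions, and `parseRetryConfig` and `resolveMaxAttempts` are hypothetical helpers.

```typescript
// Hand-rolled discriminated union; only the success branch carries data.
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: string };

type RetryConfig = { maxAttempts: number | undefined } | null;

// Hypothetical parser standing in for a schema's safeParse.
function parseRetryConfig(raw: unknown): ParseResult<RetryConfig> {
  if (raw === null || raw === undefined) return { success: true, data: null };
  if (typeof raw !== "object") return { success: false, error: "expected an object or null" };
  const candidate = (raw as { maxAttempts?: unknown }).maxAttempts;
  if (typeof candidate === "number") return { success: true, data: { maxAttempts: candidate } };
  if (candidate !== undefined) return { success: false, error: "maxAttempts must be a number" };
  return { success: true, data: { maxAttempts: undefined } };
}

function resolveMaxAttempts(raw: unknown, log: (message: string) => void): number | undefined {
  const parsed = parseRetryConfig(raw);
  if (!parsed.success) {
    // On this branch the compiler knows there is no data to read.
    log(`Invalid retry config: ${parsed.error}`);
    return undefined;
  }
  return parsed.data?.maxAttempts; // data may still be null, hence the ?.
}
```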

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai · 2025-04-24T09:55:10Z

internal-packages/run-engine/src/engine/systems/dequeueSystem.ts

+                consumedResources.cpu += machinePreset.cpu;
+                consumedResources.memory += machinePreset.memory;
+
+                //are we under the limit?
+                if (maxResources) {
+                  if (
+                    consumedResources.cpu > maxResources.cpu ||
+                    consumedResources.memory > maxResources.memory
+                  ) {
+                    this.$.logger.debug(


⚠️ Potential issue

Resource accounting can exceed limits by one message

consumedResources is incremented before you validate the totals. If the next run tips the totals over maxResources, the function nacks the message but leaves the over-counted values in the accumulator, potentially blocking subsequent (valid) runs in the same loop iteration.

Compute the prospective totals first:

-consumedResources.cpu += machinePreset.cpu;
-consumedResources.memory += machinePreset.memory;
-
-//are we under the limit?
-if (maxResources) {
-  if (consumedResources.cpu > maxResources.cpu ||
-      consumedResources.memory > maxResources.memory) {
+const prospectiveCpu = consumedResources.cpu + machinePreset.cpu;
+const prospectiveMem = consumedResources.memory + machinePreset.memory;
+
+if (maxResources &&
+    (prospectiveCpu > maxResources.cpu || prospectiveMem > maxResources.memory)) {
   // …
   return null;
 }
 }
+
+consumedResources.cpu = prospectiveCpu;
+consumedResources.memory = prospectiveMem;

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-consumedResources.cpu += machinePreset.cpu;
-consumedResources.memory += machinePreset.memory;
-//are we under the limit?
-if (maxResources) {
-  if (
-    consumedResources.cpu > maxResources.cpu ||
-    consumedResources.memory > maxResources.memory
-  ) {
-    this.$.logger.debug(
+// compute what the totals would be *if* we accepted this message
+const prospectiveCpu = consumedResources.cpu + machinePreset.cpu;
+const prospectiveMem = consumedResources.memory + machinePreset.memory;
+// reject early if we’d exceed our budget
+if (maxResources &&
+    (prospectiveCpu > maxResources.cpu || prospectiveMem > maxResources.memory)) {
+  // …preserve existing debug logging
+  this.$.logger.debug(
+    `Resource limit exceeded: CPU ${prospectiveCpu}/${maxResources.cpu}, ` +
+    `Memory ${prospectiveMem}/${maxResources.memory}`
+  );
+  return null;
+}
+// now that we know it's safe, commit the consumption
+consumedResources.cpu = prospectiveCpu;
+consumedResources.memory = prospectiveMem;
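
The check-then-commit idea in this suggestion can be isolated into a small helper, sketched here with hypothetical names (`Resources`, `tryReserve`) that do not appear in the PR. The accumulator is only mutated once the prospective totals are known to fit.

```typescript
type Resources = { cpu: number; memory: number };

// Hypothetical helper: returns true and updates `consumed` only if the request fits.
function tryReserve(consumed: Resources, request: Resources, max?: Resources): boolean {
  const prospectiveCpu = consumed.cpu + request.cpu;
  const prospectiveMemory = consumed.memory + request.memory;

  if (max && (prospectiveCpu > max.cpu || prospectiveMemory > max.memory)) {
    return false; // rejected: `consumed` is left exactly as it was
  }

  consumed.cpu = prospectiveCpu;
  consumed.memory = prospectiveMemory;
  return true;
}

const consumed: Resources = { cpu: 0, memory: 0 };
const limit: Resources = { cpu: 2, memory: 4 };

const first = tryReserve(consumed, { cpu: 1, memory: 2 }, limit);  // fits
const second = tryReserve(consumed, { cpu: 2, memory: 1 }, limit); // would exceed the CPU limit
const third = tryReserve(consumed, { cpu: 1, memory: 2 }, limit);  // still fits, since the rejected request left no trace
```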

matt-aitken added 5 commits April 23, 2025 11:21

Added error throwing tests to runlock

11e1cfc

Added some more run lock tests

16e89f4

Added customConditions so the test can use the Logger

b4bc069

Fix for JSON error

30712b0

Added a span to the run lock with a name

1b0f6dc

coderabbitai bot reviewed Apr 24, 2025

View reviewed changes

ericallam approved these changes Apr 24, 2025

View reviewed changes

matt-aitken merged commit 18f7766 into main Apr 24, 2025
11 of 13 checks passed

matt-aitken deleted the runlock branch April 24, 2025 12:37
