Run Engine 2.0 trigger idempotency #1613

matt-aitken · 2025-01-14T12:22:57Z

In Run Engine 1.0, idempotency doesn't work well with triggerAndWait or batchTriggerAndWait. We did have support for this, but there were some edge cases where the parent run wouldn't continue, so we disabled it.

This branch adds full idempotency support.

Dashboard

We now show idempotent runs in the dashboard when using triggerAndWait or batchTriggerAndWait.

Batch idempotencyKey change

There is one change to the behaviour, which we think is an improvement. It applies when you specify an idempotencyKey on a batch (not on an individual run inside a batch).

Batch idempotency example:

You specify an idempotencyKey on the batch itself.
You give some of the runs in the batch their own idempotencyKey.

In this situation, the runs that have their own idempotencyKey will use it. Any runs in the batch without an idempotencyKey will use the batch's idempotencyKey plus their index in the batch.

Previously the batches themselves had their own idempotencyKey. In practice this didn't work well, because idempotencyKeys exist to prevent work from being done twice, and the unit of work is the run.
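
To make the rule above concrete, here is a minimal sketch of how the effective per-run keys could be resolved. It is an illustration only, not the run engine's code: the `BatchItem` type, the `resolveRunIdempotencyKeys` function, and the `-` separator between the batch key and the index are all assumptions, since the PR only says "the batch's idempotencyKey plus the index".

```typescript
// Hypothetical shape for illustration; not the SDK's or the run engine's actual type.
type BatchItem = {
  payload: unknown;
  idempotencyKey?: string;
};

// Resolve the effective idempotencyKey for each run in a batch.
// - A run with its own idempotencyKey keeps it.
// - A run without one gets a key derived from the batch key and its index.
// - With neither a run key nor a batch key, the run has no idempotencyKey.
// The "-" separator is an assumption; the PR does not specify the format.
function resolveRunIdempotencyKeys(
  items: BatchItem[],
  batchIdempotencyKey?: string
): (string | undefined)[] {
  return items.map((item, index) => {
    if (item.idempotencyKey !== undefined) return item.idempotencyKey;
    if (batchIdempotencyKey === undefined) return undefined;
    return `${batchIdempotencyKey}-${index}`;
  });
}

const keys = resolveRunIdempotencyKeys(
  [
    { payload: { id: 1 } },
    { payload: { id: 2 }, idempotencyKey: "custom-key" },
    { payload: { id: 3 } },
  ],
  "batch-key"
);
console.log(keys); // ["batch-key-0", "custom-key", "batch-key-2"]
```

Because each key ends up on a run rather than on the batch, triggering the same batch again with the same batch key maps every item to the same per-run key as before, which is what allows the individual runs to be deduplicated.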

changeset-bot · 2025-01-14T12:23:02Z

⚠️ No Changeset found

Latest commit: 9bf52cc

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

coderabbitai · 2025-01-14T12:23:05Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

pkg-pr-new · 2025-01-15T12:41:37Z

@trigger.dev/react-hooks

npm i https://pkg.pr.new/triggerdotdev/trigger.dev/@trigger.dev/react-hooks@1613

@trigger.dev/rsc

npm i https://pkg.pr.new/triggerdotdev/trigger.dev/@trigger.dev/rsc@1613

@trigger.dev/build

npm i https://pkg.pr.new/triggerdotdev/trigger.dev/@trigger.dev/build@1613

@trigger.dev/sdk

npm i https://pkg.pr.new/triggerdotdev/trigger.dev/@trigger.dev/sdk@1613

trigger.dev

npm i https://pkg.pr.new/triggerdotdev/trigger.dev@1613

@trigger.dev/core

npm i https://pkg.pr.new/triggerdotdev/trigger.dev/@trigger.dev/core@1613

commit: 9bf52cc

@Map

* bump worker version * Suggested glossary for the RunEngine, TBC * Removed BatchTaskRun changes from this branch, they were done in main * Set the BatchTaskRun status to completed when all runs are completed * When dequeuing respect passed in maxResources * Ported over the new run props: idempotencyKeyExpiresAt, versions, oneTimeUseToken, maxDurationInSeconds * Didn’t hit save… the new props when triggering tasks passed through * Idempotency expiration + waitpoint edge case * WIP on creating checkpoint, parking for now * fix worker routes * upgrade webapp node types to support generic event emitter * separate event bus handler singleton and run failure alerts * duration waits * fix execution snapshot debug spans * task waits * fix event bus types * temporary fix for react hook run handle type * disable run notifications for now * convert any typecasts to expect errors to more easily fix later * fix webapp types after node types upgrade * updateEnvConcurrencyLimits across marqs and the runqueue * Pass proper values into the run engine * RunQueue settings and removed unused rebalancing workers * Remove rebalancing prop * Tidied more things up * Update/remove queue limits for MARQS and RunQueue * taskQueue/concurrencyLimit changes ported back into the RunEngine * Reworked completing waitpoints to improve performance and reduce race conditions * Improved test robustness * Down to a single run lock only when a run is totally unblocked and ready to continue * warm starts, worker notifications, wait fixes * Fix for Run Engine poll interval env var * Expect the waitpoint to be completed quickly * If a run is locked then it’s too late to expire it * Added VALKEY_ env vars and plugged them into the run engine * Extracted and updated the guard queue function so it can be used when batching * Added logging and universal concurrency changes to trigger task v1 * Added notes back in * Bump @trigger.dev/worker to 3.3.7 * reportInvocationUsage for the runAttemptStarted event * 
improve execution snapshot span debug span start times * Unfriendly IDs * update lockfile * Created a shared determineEngineVersion function * disable unfinished commands * save new cli config to different location, misc fixes * add basic engine version check via current deploy * new run engine will default to node 22 runtime * block some actions for projects on previous run engine * fix worker group tests * fix triggerAndWait test * one typescript version to rule them all * redlock type patch * fix type issues caused by ts-reset * improve cleanup scripts * add missing socket.io dep * fix run notification handler type * fix worker group test again * generate prisma client for e2e tests * remove worker group tests for now * prevent image pull rate limits during unit tests * increase timeout for queue concurrency limit test * generate prisma client for preview release * same node types everywhere * Updated engine readme, removed legacy system notes * use default machine preset from platform package * worker instances plural in schema * disable pnpm update notifications * return worker group details from connect call * add workers admin route * fix heartbeat route return type * move deployment labels to core apps * refactor run controller env schema * Add firstAttemptStartedAt to TaskRun * RunEngine 2.0 batch trigger support (#1581) * Make it clear when BatchTriggerV2Service is used * Copy of BatchTriggerV2Service * WIP batch triggering * Allow blocking a run with multiple waitpoints at once. 
Made it atomic * Removed unused param * New batch service * Pass through the parentRunId and resumeParentOnCompletion * Use the new batch service, and correct trigger task version * Force V1 engine if using BatchTriggerV2Service, we’ve already done the check at this point * Removed the $transaction and early exit if nothing changed * Adedd a simple batch task to the hello world reference catalog * Fix for batch waits not working * Added parentRunId in a couple more places * Removed waitForBatch log * Added another parentRunId * Expanded the example to include all the different triggers * More changes to blocking to support continuing after idempotent completed runs * Fix for the wrong type when blocking a run * remove @Map * optimise worker auth query * add engine version header to core api client requests * remove unique constraint for default group id * consolidate migrations * the first managed worker becomes the global default * Debug events off by default, added an admin toggle to show them * worker group name can't be an empty string * add exec helper to core * move machine resources to core * add pre-dequeue callback to determine max resources * optionally skip dequeue * bump worker package * move worker to core * fix ReadableStream type error * fix another type issue * update a few more tsconfigs * add metadata changes introduced in #1563 * Run Engine 2.0 trigger idempotency (#1613) * Return isCached from the trigger API endpoint * Fix for the wrong type when blocking a run * Render the idempotent run in the inspector * Event repository for idempotency * Debug events off by default, added an admin toggle to show them * triggerAndWait idempotency span * Some improvements to the reference idempotency task * Removed the cached tracing from the SDK * Server-side creating cached span * Improved idempotency test task * Create cached task spans in a better way * Idempotency span support inc batch trigger * Simplified how the spans are done, using more of the 
existing code * Improved the idempotency test task * Added Waitpoint Batch type, add to TaskRunWaitpoint with order * Pass batch ids through to the run engine when triggering * Added batchIndex * Better batch support in the run engine * Added settings to batch trigger service, before major overhaul * Allow the longer run/batch ids in the filters * Changed how batching works, includes breaking changes in CLI * Removed batch idempotency because it gets put on the runs instead * Added `runs` to the batch.retrieve call/API * Set firstAttemptStartedAt when creating the first attempt * Do nothing when receiving a BATCH waitpoint * Some fixes in the new batch trigger service… mostly just passing missing optional params through * Tweaked the idempotency test task for more situations * Only block with a batch if it’s a batchTriggerAndWait… 🤦‍♂️ * Added another case to the idempotency test task: multiple of the same idempotencyKey in a single batch * Support for the same run multiple times in the same batch * Small tweaks * Make sure to complete batches, even if they’re not andWait ones * Export RunDuplicateIdempotencyKeyError from the run engine * Latest lockfile * Trigger with a machine (old run engine) * RE2, allow setting machine when triggering * Fix for new glob patterns * add max run count to dequeue from version route * add worker instance name env var and header * queue consumer pre skip callback * poll for more runs after final execution errors * fix dequeue search param schema * add shortcut to debug switch * expose run engine timeouts as env vars * make warm start durations configurable * add optional status to json reply helper * fix preSkip hook, add debug logs * BLOCKED_BY_WAITPOINTS -> SUSPENDED * exit controller when run suspended * check if already replied before http reply * run controller will wait for next run after the current one is suspended * cancel run button shortcut * minimal event repository environment type * fix update metadata call * run 
suspension and misc fixes wip * change debug shortcut to shift + D * Started work on the Dev supervisor * Formatting * Fix for bad imports * Before rebuilding SSE * Presence updating from the CLI working via SSE * add worker notification debug logs * send run:stop when exiting run phase * skip current snapshot poll on worker notification * add more logs and route to submit run debug logs * add worker and runner ids to snapshots * improve run notification debug logs * add workload debug log route * misc run controller fixes and refactor * prevent parallel execution of critical functions * update bun to 1.2.1 * WIP with dev dequeuing * Method to convert friendlyIds to non-friendly, do nothing with actual ids * Set the engine on BackgroundWorker, lazily upgrade projects to engine V2 * Runs with ttls were getting immediately expired… oops. * Pass the Waiting for deploy reason through, so we have it on the execution snapshots * Fixed the logic for getting the right background worker for a run * Use the correct ID when dequeuing… * determineEngineVersion is now fully functional * Rate limiter ignores the dev endpoints * Retrieving a batch gives you the runIds * Set a unique version for the RE2 BatchTaskRun * add provisional changeset * The start of dev run execution is working * First dev run working * Moved the dev run controller closer to what Nick did with the managed one * export exec output type * Heartbeat fix: don’t heartbeat if _isHeartbeating == false * Dev runs get notifications, some dev bug fixes * Improved logging or dequeuing * We need to dequeue runs from the latest version too, for triggerAndWait * Ported Eric’s validateWorkerManifest with nicer errors * When flattening an idempotency key if part is undefined, return undefined * Dev logging fixes * Remove sigterm listener * Deprecating workers. Don’t specify a BackgroundWorker when dequeuing an environment * Deleted some old files. 
Renamed “managed” to “deploy” * When a build finishes, always copy the build dir (otherwise the first one gets trampled on by the 2nd) * Dev master queues should work differently * Deleting old workers * Added debounce function to core * Improvement to canceling * WIP on debounce canceling on socket disconnection * Added environment data to execution snapshots * Dev runs that have stalled get “Canceled” with a reason explaining why * Show CLI messaged when a connection to the platform is lost/restored * Fix TriggerTask after merge * Add trigger task v2 max attempts, replace some findUniques * Port the new queue logic to the run engine * More fixes post-merge * We weren’t setting a `retryConfig` up for the tests… it’s now required * Start the Redis worker inside the Run Engine… 🤦‍♂️ * Trying to make the testcontainers more reliable * Added keyPrefix: "engine:” * Badly placed bracket in trigger task * Better Redis namespacing * Fix for expired run not getting removed from the queue * Don’t create a redis client in the testcontainers, return the redisOptions instead * Cleanup redis client in the run lock tests * Fix for the RunQueue not supporting keyPrefix * Updated more of the RunQueue scripts rebalancing * Trying to make Redis more robust in the tests… * Improved test resiliciency more * Fix for delays (checkpoint check) * Increase the timeout slightly to fix ttl test * Added priority support when triggering * More wip trying to make test containers more reliable * batchTriggerAndWait test is still failing… some wip to try fix it * Fixed redis tests now we’re not providing a client * Separate Redis clients for the run engine worker/queue/runlock * Made the wait for duration test more resilient * Added idempotencyKeyExpiresAt to Waitpoints * Waitpoint timeouts and idempotency expiry * Use finishWaitpoint, removed extra worker job * Added waitpoint idempotency tests * Creating resume tokens is working * Some improvements to the resume tokens * Moved resumeTokens to 
just be wait functions 🥳 * Delete old RuntimeManagers * Wait for token is working * Better test for the wait tokens * Improved the test task some more * Hide the accessories in the span inspector * WIP on waitpoint inspector * WIP on complete waitpoint form * Span overview panel can be changed based on the entity type * Improved the waitpoint display * WIP on completing waitpoint form * Use the existing CodeBlock for the tip * Style improvements * Complete waitpoint * All waitpoint sidebar variants * Waits now use a pause icon * Durations waits use the API to create/block with a waitpoint, not the runtime * Fix for engine.blockRunWithWaitpoint required org id * Removed old wait code from the run controllers/task run process * Form action for skipping a datetime waitpoint * Move testDockerCheckpoint to a separate core package export (it can’t be bundled on the client) * Fix for glitchy hourglass animation * Completed waitpoints display better * Increase Redis maxRetriesPerRequest to 20 (default) * Completing and skipping waitpoints is working * Remove the database prisma dev command, since we need to use create only now. Updated docs * Added skip timeout, reworked the UI * Tweaked spacing * Added payload limit to waitpoint token completion from dashboard * Test idempotency works on wait.for and wait.until * Moved the worker-actions to /engine/ from /api/ * Moved dev engine endpoints to /engine/ from /api/ * Separate /engine/ rate limiter * Added parallel wait prevention, it’s working for duration waits but not well for triggerAndWait yet * WIP post-merge conflicts * Set taskEventStore column in the new engine * Remove duplicate keys * Post-merge fixes * Fix for span merge layout * Use executedAt instead of firstAttemptStartedAt --------- Co-authored-by: Matt Aitken <matt@mattaitken.com>

matt-aitken added 26 commits January 3, 2025 16:07

Return isCached from the trigger API endpoint

f1d5481

Fix for the wrong type when blocking a run

8642497

Render the idempotent run in the inspector

8d882d8

Event repository for idempotency

67c9085

Debug events off by default, added an admin toggle to show them

0e958e7

triggerAndWait idempotency span

c175ed9

Some improvements to the reference idempotency task

cf83a53

Merge remote-tracking branch 'origin/run-engine-2' into engine-2-idem…

3064c2a

…potency

Removed the cached tracing from the SDK

d3043da

Server-side creating cached span

1149c6e

Improved idempotency test task

4f9344a

Create cached task spans in a better way

59b30eb

Idempotency span support inc batch trigger

93c3d23

Simplified how the spans are done, using more of the existing code

fdfd064

Improved the idempotency test task

8a8aaac

Added Waitpoint Batch type, add to TaskRunWaitpoint with order

2d4c67c

Pass batch ids through to the run engine when triggering

4e9f8ca

Added batchIndex

bf6946a

Better batch support in the run engine

7038f4c

Added settings to batch trigger service, before major overhaul

041cd87

Allow the longer run/batch ids in the filters

641edd2

Changed how batching works, includes breaking changes in CLI

151a50a

Removed batch idempotency because it gets put on the runs instead

efc83c9

Added runs to the batch.retrieve call/API

1230871

Set firstAttemptStartedAt when creating the first attempt

5f0d45a

Do nothing when receiving a BATCH waitpoint

509438d

matt-aitken added 2 commits January 14, 2025 14:15

Some fixes in the new batch trigger service… mostly just passing miss…

7ec2726

…ing optional params through

Tweaked the idempotency test task for more situations

5dfa932

matt-aitken added 7 commits January 14, 2025 14:52

Only block with a batch if it’s a batchTriggerAndWait… 🤦‍♂️

bd6ee2c

Added another case to the idempotency test task: multiple of the same…

b73ca8b

… idempotencyKey in a single batch

Support for the same run multiple times in the same batch

6d0927e

Small tweaks

5ac9673

Make sure to complete batches, even if they’re not andWait ones

84ee4d3

Merge remote-tracking branch 'origin/run-engine-2' into engine-2-idem…

67060db

…potency # Conflicts: # internal-packages/run-engine/src/engine/eventBus.ts # packages/core/src/v3/runtime/devRuntimeManager.ts

Export RunDuplicateIdempotencyKeyError from the run engine

9bf52cc

matt-aitken merged commit c8b835a into run-engine-2 Jan 15, 2025
9 checks passed

matt-aitken deleted the engine-2-idempotency branch January 15, 2025 12:50
