
Run Engine 2.0 trigger idempotency #1613


Merged: matt-aitken merged 35 commits into run-engine-2 from engine-2-idempotency on Jan 15, 2025


matt-aitken (Member) commented Jan 14, 2025

In Run Engine 1.0, idempotency doesn't work well with triggerAndWait or batchTriggerAndWait. We did have support for this, but there were edge cases where the parent run wouldn't continue, so we disabled it.

This branch adds full idempotency support.
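The parent-run problem can be illustrated with a small in-memory model. This is a minimal sketch, not the Run Engine's implementation: `IdempotentRunStore`, `trigger`, `complete` and `triggerAndWait` are hypothetical names, and key expiry is ignored. It shows two properties the branch needs (the commit history mentions returning `isCached` from the trigger endpoint and "continuing after idempotent completed runs"): a repeated idempotencyKey returns the existing run flagged as cached instead of creating a new one, and a parent waiting on a cached run that has already completed is not left blocked.

```typescript
// Hypothetical in-memory model of idempotent triggering; not the Run Engine's real code.
type RunStatus = "PENDING" | "COMPLETED";

interface Run {
  id: string;
  idempotencyKey?: string;
  status: RunStatus;
  output?: unknown;
}

interface TriggerResult {
  run: Run;
  isCached: boolean; // true when an existing run was returned instead of creating one
}

class IdempotentRunStore {
  private runs = new Map<string, Run>();
  private runIdByKey = new Map<string, string>();
  private nextId = 1;

  trigger(idempotencyKey?: string): TriggerResult {
    if (idempotencyKey !== undefined) {
      const existingId = this.runIdByKey.get(idempotencyKey);
      if (existingId !== undefined) {
        // Same key seen before: hand back the original run, do no new work.
        return { run: this.runs.get(existingId)!, isCached: true };
      }
    }
    const run: Run = { id: `run_${this.nextId++}`, idempotencyKey, status: "PENDING" };
    this.runs.set(run.id, run);
    if (idempotencyKey !== undefined) this.runIdByKey.set(idempotencyKey, run.id);
    return { run, isCached: false };
  }

  complete(runId: string, output: unknown): void {
    const run = this.runs.get(runId);
    if (!run) throw new Error(`Unknown run ${runId}`);
    run.status = "COMPLETED";
    run.output = output;
  }

  // triggerAndWait: if the (possibly cached) child already completed, the parent
  // must continue immediately instead of waiting for a completion that already happened.
  triggerAndWait(idempotencyKey?: string): { result: TriggerResult; parentBlocked: boolean } {
    const result = this.trigger(idempotencyKey);
    return { result, parentBlocked: result.run.status !== "COMPLETED" };
  }
}

// Usage: the second call with the same key reuses the completed run and does not block.
const store = new IdempotentRunStore();
const first = store.triggerAndWait("order-123");
store.complete(first.result.run.id, { ok: true });
const second = store.triggerAndWait("order-123");
console.log(second.result.isCached, second.parentBlocked); // true false
```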

Dashboard

We now show idempotent runs when using triggerAndWait or batchTriggerAndWait. They appear like this:

[Screenshot: CleanShot 2025-01-14 at 12.23.34]

Batch idempotencyKey change

There is one behaviour change, which we think is an improvement. It applies when you specify an idempotencyKey on a batch (rather than on an individual run inside a batch).

Batch idempotency example:

  • You specify an idempotencyKey on the batch itself
  • You give some of the runs in the batch their own idempotencyKey

In this situation, runs that have their own idempotencyKey use it. Any runs in the batch without an idempotencyKey use the batch's idempotencyKey combined with their index in the batch.
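The key-resolution rule just described can be sketched as a pure function. This is a hypothetical helper (`resolveRunIdempotencyKeys` and `BatchItem` are illustrative names), not the engine's code, and the `${batchKey}-${index}` format is an assumption: the PR only says the batch key is combined with the item's index. A run with its own key keeps it, the rest get a derived key, and with no batch-level key they get none.

```typescript
// Hypothetical helper illustrating the batch idempotency rule; not the engine's actual code.
interface BatchItem<TPayload> {
  payload: TPayload;
  idempotencyKey?: string;
}

function resolveRunIdempotencyKeys<T>(
  items: BatchItem<T>[],
  batchKey?: string
): (string | undefined)[] {
  return items.map((item, index) => {
    if (item.idempotencyKey !== undefined) return item.idempotencyKey; // per-run key wins
    if (batchKey !== undefined) return `${batchKey}-${index}`; // assumed format: batch key + index
    return undefined; // no key at all: the run is not idempotent
  });
}

const keys = resolveRunIdempotencyKeys(
  [{ payload: 1 }, { payload: 2, idempotencyKey: "custom" }, { payload: 3 }],
  "batch-abc"
);
console.log(JSON.stringify(keys)); // ["batch-abc-0","custom","batch-abc-2"]
```

Because each derived key includes the item's index, re-sending the same batch with the same batch-level key maps every item back to the run it created the first time, which matches the reasoning that the run, not the batch, is the unit of work.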

Previously, the batch itself had its own idempotencyKey. In practice this didn't work well: idempotency keys exist to prevent work from being done twice, and the unit of work is the run.


changeset-bot bot commented Jan 14, 2025

⚠️ No Changeset found

Latest commit: 9bf52cc

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


coderabbitai bot (Contributor) commented Jan 14, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.



pkg-pr-new bot commented Jan 15, 2025

@trigger.dev/react-hooks

npm i https://pkg.pr.new/triggerdotdev/trigger.dev/@trigger.dev/react-hooks@1613

@trigger.dev/rsc

npm i https://pkg.pr.new/triggerdotdev/trigger.dev/@trigger.dev/rsc@1613

@trigger.dev/build

npm i https://pkg.pr.new/triggerdotdev/trigger.dev/@trigger.dev/build@1613

@trigger.dev/sdk

npm i https://pkg.pr.new/triggerdotdev/trigger.dev/@trigger.dev/sdk@1613

trigger.dev

npm i https://pkg.pr.new/triggerdotdev/trigger.dev@1613

@trigger.dev/core

npm i https://pkg.pr.new/triggerdotdev/trigger.dev/@trigger.dev/core@1613

commit: 9bf52cc

matt-aitken merged commit c8b835a into run-engine-2 on Jan 15, 2025
9 checks passed
matt-aitken deleted the engine-2-idempotency branch on January 15, 2025 12:50
ericallam pushed a commit that referenced this pull request on Mar 5, 2025
* bump worker version

* Suggested glossary for the RunEngine, TBC

* Removed BatchTaskRun changes from this branch, they were done in main

* Set the BatchTaskRun status to completed when all runs are completed

* When dequeuing respect passed in maxResources

* Ported over the new run props: idempotencyKeyExpiresAt, versions, oneTimeUseToken, maxDurationInSeconds

* Didn’t hit save… the new props when triggering tasks passed through

* Idempotency expiration + waitpoint edge case

* WIP on creating checkpoint, parking for now

* fix worker routes

* upgrade webapp node types to support generic event emitter

* separate event bus handler singleton and run failure alerts

* duration waits

* fix execution snapshot debug spans

* task waits

* fix event bus types

* temporary fix for react hook run handle type

* disable run notifications for now

* convert any typecasts to expect errors to more easily fix later

* fix webapp types after node types upgrade

* updateEnvConcurrencyLimits across marqs and the runqueue

* Pass proper values into the run engine

* RunQueue settings and removed unused rebalancing workers

* Remove rebalancing prop

* Tidied more things up

* Update/remove queue limits for MARQS and RunQueue

* taskQueue/concurrencyLimit changes ported back into the RunEngine

* Reworked completing waitpoints to improve performance and reduce race conditions

* Improved test robustness

* Down to a single run lock only when a run is totally unblocked and ready to continue

* warm starts, worker notifications, wait fixes

* Fix for Run Engine poll interval env var

* Expect the waitpoint to be completed quickly

* If a run is locked then it’s too late to expire it

* Added VALKEY_ env vars and plugged them into the run engine

* Extracted and updated the guard queue function so it can be used when batching

* Added logging and universal concurrency changes to trigger task v1

* Added notes back in

* Bump @trigger.dev/worker to 3.3.7

* reportInvocationUsage for the runAttemptStarted event

* improve execution snapshot span debug span start times

* Unfriendly IDs

* update lockfile

* Created a shared determineEngineVersion function

* disable unfinished commands

* save new cli config to different location, misc fixes

* add basic engine version check via current deploy

* new run engine will default to node 22 runtime

* block some actions for projects on previous run engine

* fix worker group tests

* fix triggerAndWait test

* one typescript version to rule them all

* redlock type patch

* fix type issues caused by ts-reset

* improve cleanup scripts

* add missing socket.io dep

* fix run notification handler type

* fix worker group test again

* generate prisma client for e2e tests

* remove worker group tests for now

* prevent image pull rate limits during unit tests

* increase timeout for queue concurrency limit test

* generate prisma client for preview release

* same node types everywhere

* Updated engine readme, removed legacy system notes

* use default machine preset from platform package

* worker instances plural in schema

* disable pnpm update notifications

* return worker group details from connect call

* add workers admin route

* fix heartbeat route return type

* move deployment labels to core apps

* refactor run controller env schema

* Add firstAttemptStartedAt to TaskRun

* RunEngine 2.0 batch trigger support (#1581)

* Make it clear when BatchTriggerV2Service is used

* Copy of BatchTriggerV2Service

* WIP batch triggering

* Allow blocking a run with multiple waitpoints at once. Made it atomic

* Removed unused param

* New batch service

* Pass through the parentRunId and resumeParentOnCompletion

* Use the new batch service, and correct trigger task version

* Force V1 engine if using BatchTriggerV2Service, we’ve already done the check at this point

* Removed the $transaction and early exit if nothing changed

* Added a simple batch task to the hello world reference catalog

* Fix for batch waits not working

* Added parentRunId in a couple more places

* Removed waitForBatch log

* Added another parentRunId

* Expanded the example to include all the different triggers

* More changes to blocking to support continuing after idempotent completed runs

* Fix for the wrong type when blocking a run

* remove @map

* optimise worker auth query

* add engine version header to core api client requests

* remove unique constraint for default group id

* consolidate migrations

* the first managed worker becomes the global default

* Debug events off by default, added an admin toggle to show them

* worker group name can't be an empty string

* add exec helper to core

* move machine resources to core

* add pre-dequeue callback to determine max resources

* optionally skip dequeue

* bump worker package

* move worker to core

* fix ReadableStream type error

* fix another type issue

* update a few more tsconfigs

* add metadata changes introduced in #1563

* Run Engine 2.0 trigger idempotency (#1613)

* Return isCached from the trigger API endpoint

* Fix for the wrong type when blocking a run

* Render the idempotent run in the inspector

* Event repository for idempotency

* Debug events off by default, added an admin toggle to show them

* triggerAndWait idempotency span

* Some improvements to the reference idempotency task

* Removed the cached tracing from the SDK

* Server-side creating cached span

* Improved idempotency test task

* Create cached task spans in a better way

* Idempotency span support inc batch trigger

* Simplified how the spans are done, using more of the existing code

* Improved the idempotency test task

* Added Waitpoint Batch type, add to TaskRunWaitpoint with order

* Pass batch ids through to the run engine when triggering

* Added batchIndex

* Better batch support in the run engine

* Added settings to batch trigger service, before major overhaul

* Allow the longer run/batch ids in the filters

* Changed how batching works, includes breaking changes in CLI

* Removed batch idempotency because it gets put on the runs instead

* Added `runs` to the batch.retrieve call/API

* Set firstAttemptStartedAt when creating the first attempt

* Do nothing when receiving a BATCH waitpoint

* Some fixes in the new batch trigger service… mostly just passing missing optional params through

* Tweaked the idempotency test task for more situations

* Only block with a batch if it’s a batchTriggerAndWait… 🤦‍♂️

* Added another case to the idempotency test task: multiple of the same idempotencyKey in a single batch

* Support for the same run multiple times in the same batch

* Small tweaks

* Make sure to complete batches, even if they’re not andWait ones

* Export RunDuplicateIdempotencyKeyError from the run engine

* Latest lockfile

* Trigger with a machine (old run engine)

* RE2, allow setting machine when triggering

* Fix for new glob patterns

* add max run count to dequeue from version route

* add worker instance name env var and header

* queue consumer pre skip callback

* poll for more runs after final execution errors

* fix dequeue search param schema

* add shortcut to debug switch

* expose run engine timeouts as env vars

* make warm start durations configurable

* add optional status to json reply helper

* fix preSkip hook, add debug logs

* BLOCKED_BY_WAITPOINTS -> SUSPENDED

* exit controller when run suspended

* check if already replied before http reply

* run controller will wait for next run after the current one is suspended

* cancel run button shortcut

* minimal event repository environment type

* fix update metadata call

* run suspension and misc fixes wip

* change debug shortcut to shift + D

* Started work on the Dev supervisor

* Formatting

* Fix for bad imports

* Before rebuilding SSE

* Presence updating from the CLI working via SSE

* add worker notification debug logs

* send run:stop when exiting run phase

* skip current snapshot poll on worker notification

* add more logs and route to submit run debug logs

* add worker and runner ids to snapshots

* improve run notification debug logs

* add workload debug log route

* misc run controller fixes and refactor

* prevent parallel execution of critical functions

* update bun to 1.2.1

* WIP with dev dequeuing

* Method to convert friendlyIds to non-friendly, do nothing with actual ids

* Set the engine on BackgroundWorker, lazily upgrade projects to engine V2

* Runs with ttls were getting immediately expired… oops.

* Pass the Waiting for deploy reason through, so we have it on the execution snapshots

* Fixed the logic for getting the right background worker for a run

* Use the correct ID when dequeuing…

* determineEngineVersion is now fully functional

* Rate limiter ignores the dev endpoints

* Retrieving a batch gives you the runIds

* Set a unique version for the RE2 BatchTaskRun

* add provisional changeset

* The start of dev run execution is working

* First dev run working

* Moved the dev run controller closer to what Nick did with the managed one

* export exec output type

* Heartbeat fix: don’t heartbeat if _isHeartbeating == false

* Dev runs get notifications, some dev bug fixes

* Improved logging of dequeuing

* We need to dequeue runs from the latest version too, for triggerAndWait

* Ported Eric’s validateWorkerManifest with nicer errors

* When flattening an idempotency key if part is undefined, return undefined

* Dev logging fixes

* Remove sigterm listener

* Deprecating workers. Don’t specify a BackgroundWorker when dequeuing an environment

* Deleted some old files. Renamed “managed” to “deploy”

* When a build finishes, always copy the build dir (otherwise the first one gets trampled on by the 2nd)

* Dev master queues should work differently

* Deleting old workers

* Added debounce function to core

* Improvement to canceling

* WIP on debounce canceling on socket disconnection

* Added environment data to execution snapshots

* Dev runs that have stalled get “Canceled” with a reason explaining why

* Show CLI messages when a connection to the platform is lost/restored

* Fix TriggerTask after merge

* Add trigger task v2 max attempts, replace some findUniques

* Port the new queue logic to the run engine

* More fixes post-merge

* We weren’t setting a `retryConfig` up for the tests… it’s now required

* Start the Redis worker inside the Run Engine… 🤦‍♂️

* Trying to make the testcontainers more reliable

* Added keyPrefix: "engine:"

* Badly placed bracket in trigger task

* Better Redis namespacing

* Fix for expired run not getting removed from the queue

* Don’t create a redis client in the testcontainers, return the redisOptions instead

* Cleanup redis client in the run lock tests

* Fix for the RunQueue not supporting keyPrefix

* Updated more of the RunQueue scripts rebalancing

* Trying to make Redis more robust in the tests…

* Improved test resiliency more

* Fix for delays (checkpoint check)

* Increase the timeout slightly to fix ttl test

* Added priority support when triggering

* More wip trying to make test containers more reliable

* batchTriggerAndWait test is still failing… some wip to try fix it

* Fixed redis tests now we’re not providing a client

* Separate Redis clients for the run engine worker/queue/runlock

* Made the wait for duration test more resilient

* Added idempotencyKeyExpiresAt to Waitpoints

* Waitpoint timeouts and idempotency expiry

* Use finishWaitpoint, removed extra worker job

* Added waitpoint idempotency tests

* Creating resume tokens is working

* Some improvements to the resume tokens

* Moved resumeTokens to just be wait functions 🥳

* Delete old RuntimeManagers

* Wait for token is working

* Better test for the wait tokens

* Improved the test task some more

* Hide the accessories in the span inspector

* WIP on waitpoint inspector

* WIP on complete waitpoint form

* Span overview panel can be changed based on the entity type

* Improved the waitpoint display

* WIP on completing waitpoint form

* Use the existing CodeBlock for the tip

* Style improvements

* Complete waitpoint

* All waitpoint sidebar variants

* Waits now use a pause icon

* Durations waits use the API to create/block with a waitpoint, not the runtime

* Fix for engine.blockRunWithWaitpoint required org id

* Removed old wait code from the run controllers/task run process

* Form action for skipping a datetime waitpoint

* Move testDockerCheckpoint to a separate core package export (it can’t be bundled on the client)

* Fix for glitchy hourglass animation

* Completed waitpoints display better

* Increase Redis maxRetriesPerRequest to 20 (default)

* Completing and skipping waitpoints is working

* Remove the database prisma dev command, since we need to use create only now. Updated docs

* Added skip timeout, reworked the UI

* Tweaked spacing

* Added payload limit to waitpoint token completion from dashboard

* Test idempotency works on wait.for and wait.until

* Moved the worker-actions to /engine/ from /api/

* Moved dev engine endpoints to /engine/ from /api/

* Separate /engine/ rate limiter

* Added parallel wait prevention, it’s working for duration waits but not well for triggerAndWait yet

* WIP post-merge conflicts

* Set taskEventStore column in the new engine

* Remove duplicate keys

* Post-merge fixes

* Fix for span merge layout

* Use executedAt instead of firstAttemptStartedAt

---------

Co-authored-by: Matt Aitken <matt@mattaitken.com>