Run Engine 2.0 trigger idempotency #1613
In Run Engine 1.0, idempotency doesn't work well with `triggerAndWait` or `batchTriggerAndWait`. We did have support for this, but there were some edge cases where the parent run wouldn't continue, so we disabled it. This branch adds full idempotency support.
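The sketch below is an in-memory model of the guarantees that idempotent `triggerAndWait` has to provide. A second trigger with the same key returns the existing run instead of starting new work. A waiting parent is resumed whether that existing run is still executing or has already finished. Everything here is hypothetical and is not the Run Engine or SDK API, including `IdempotentRunStore`, `trigger`, `triggerAndWait`, the `isCached` flag, and the `Run` shape. The Run Engine does not work by holding promises in memory.

```typescript
// Hypothetical model of idempotent triggering; NOT the Run Engine API.
interface Run {
  id: string;
  status: "EXECUTING" | "COMPLETED";
  output?: unknown;
}

class IdempotentRunStore {
  private runsByKey = new Map<string, Run>();
  private pending = new Map<string, Promise<unknown>>(); // run id -> result
  private nextId = 1;
  executions = 0; // how many times real work was started

  // Return the existing run for this key, or create and start a new one.
  trigger(key: string | undefined, work: () => Promise<unknown>): { run: Run; isCached: boolean } {
    if (key !== undefined) {
      const existing = this.runsByKey.get(key);
      if (existing) return { run: existing, isCached: true };
    }
    const run: Run = { id: `run_${this.nextId++}`, status: "EXECUTING" };
    if (key !== undefined) this.runsByKey.set(key, run);
    this.executions++;
    const done = work().then((output) => {
      run.status = "COMPLETED";
      run.output = output;
      return output;
    });
    this.pending.set(run.id, done);
    return { run, isCached: false };
  }

  // Awaiting the stored promise works whether the run is still executing or
  // already completed, so the parent always continues with the run's output.
  async triggerAndWait(key: string | undefined, work: () => Promise<unknown>): Promise<unknown> {
    const { run } = this.trigger(key, work);
    return this.pending.get(run.id)!;
  }
}

// Usage: the second call with the same key reuses the first run.
const store = new IdempotentRunStore();
const work = async () => "child output";
store.triggerAndWait("order-123", work).then(async (first) => {
  const second = await store.triggerAndWait("order-123", work);
  console.log(first, second, store.executions); // prints: child output child output 1
});
```

The hypothetical "parent run wouldn't continue" case corresponds to a cached run that has already completed. A wait that only listens for a future completion event would never resume in that case.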
Dashboard
We now show idempotent runs in the dashboard when using `triggerAndWait` or `batchTriggerAndWait`. They appear like this:
Batch idempotencyKey change
There is one change to the behaviour, which we think is an improvement. It applies when you specify an `idempotencyKey` on a batch (not on an individual run inside a batch).
Batch idempotency example:
- An `idempotencyKey` is set on the batch itself.
- Some of the runs in the batch also have their own `idempotencyKey`.
In this situation, the runs that have their own `idempotencyKey` will use it. Any runs in the batch without an `idempotencyKey` will use the batch's `idempotencyKey` plus the index of where they sit in the batch.

Previously, batches had their own `idempotencyKey`. In practice this didn't work well, because idempotency keys exist to prevent work from being done twice, and the unit of work is the run.
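The per-run key rule for batches can be written as a small pure function. This is an illustration under assumptions. `resolveRunIdempotencyKeys` and the `BatchItem` shape are hypothetical. The `${batchKey}:${index}` join format is also an assumption: the PR only says the batch key is combined with the run's index, not how.

```typescript
// Hypothetical shape of one item in a batch trigger.
interface BatchItem<TPayload> {
  payload: TPayload;
  idempotencyKey?: string;
}

// Resolve the key each run in a batch will use:
// 1. a run's own key wins;
// 2. otherwise the batch key plus the run's index is used;
// 3. with neither, the run has no idempotency key.
// The ":" separator is an assumption made for illustration only.
function resolveRunIdempotencyKeys<TPayload>(
  items: BatchItem<TPayload>[],
  batchIdempotencyKey?: string
): (string | undefined)[] {
  return items.map((item, index) => {
    if (item.idempotencyKey !== undefined) return item.idempotencyKey;
    if (batchIdempotencyKey !== undefined) return `${batchIdempotencyKey}:${index}`;
    return undefined;
  });
}

// Usage: one run with its own key, two runs falling back to the batch key.
const keys = resolveRunIdempotencyKeys(
  [
    { payload: { userId: "a" }, idempotencyKey: "user-a" },
    { payload: { userId: "b" } },
    { payload: { userId: "c" } },
  ],
  "nightly-sync"
);
console.log(keys); // → [ 'user-a', 'nightly-sync:1', 'nightly-sync:2' ]
```

Each run ends up with its own key, so re-sending the same batch with the same batch key deduplicates run by run. Because the fallback uses the index, derived keys depend on item order: reordering the items in a batch changes them.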