-
-
Notifications
You must be signed in to change notification settings - Fork 711
Retry batch item completion #1675
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
WalkthroughThe changes introduce a new connection timeout property in both the database configuration and the environment settings. A new job, completeBatchTaskRunItem, has been added to the worker’s job catalog for handling batch task completions. The batch processing service has been enhanced with an additional retry parameter and refined error handling using new helper functions. Moreover, error handling in database transactions has been improved by exporting error-checking functions, and a new batchTask has been added to initiate batch processing via the trigger module. Changes
Sequence Diagram(s)sequenceDiagram
participant W as Worker
participant S as BatchTriggerV3 Service
participant DB as Database Transaction
participant EH as Error Helpers
W->>S: Initiate completeBatchTaskRunItem job
S->>DB: Begin transaction for batch item update
alt Successful transaction
DB-->>S: Commit changes
S-->>W: Job completed
else Error encountered
DB-->>S: Return error
S->>EH: Check if error is retriable or race condition
alt Error is retriable
EH-->>S: Confirm retriable error
S-->>W: Request retry scheduling
else Non-retriable error
S-->>W: Report failure
end
end
sequenceDiagram
participant U as User/Trigger
participant BT as batchTask Function
participant CT as Child Task
U->>BT: Trigger batchTask with payload
BT->>BT: Build items array based on count
BT->>CT: Invoke childTask.batchTriggerAndWait with items
CT-->>BT: Return processing results
BT-->>U: Return batch count and results
Possibly related PRs
Suggested reviewers
Poem
Warning There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure. 🔧 ESLint
apps/webapp/app/env.server.tsOops! Something went wrong! :( ESLint: 8.45.0 ESLint couldn't find the config "custom" to extend from. Please check that the name of the config is correct. The config "custom" was referenced from the config file in "/.eslintrc.js". If you still have problems, please stop by https://eslint.org/chat/help to chat with the team. apps/webapp/app/v3/legacyRunEngineWorker.server.tsOops! Something went wrong! :( ESLint: 8.45.0 ESLint couldn't find the config "custom" to extend from. Please check that the name of the config is correct. The config "custom" was referenced from the config file in "/.eslintrc.js". If you still have problems, please stop by https://eslint.org/chat/help to chat with the team. internal-packages/database/src/transaction.tsOops! Something went wrong! :( ESLint: 8.45.0 ESLint couldn't find the config "custom" to extend from. Please check that the name of the config is correct. The config "custom" was referenced from the config file in "/.eslintrc.js". If you still have problems, please stop by https://eslint.org/chat/help to chat with the team.
Tip 🌐 Web search-backed reviews and chat
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
internal-packages/database/src/transaction.ts (1)
22-35
: Improve error code management using TypeScript enums or constant objects.Consider using TypeScript enums or constant objects for better maintainability and type safety of error codes.
-const retryCodes = ["P2024", "P2028", "P2034"]; +enum PrismaRetryableErrorCode { + CONNECTION_TIMEOUT = "P2024", + TRANSACTION_TIMEOUT = "P2028", + TRANSACTION_CONFLICT = "P2034", +} + +const retryCodes = Object.values(PrismaRetryableErrorCode); export function isPrismaRetriableError(error: unknown): boolean { if (!isPrismaKnownError(error)) { return false; } return retryCodes.includes(error.code); }apps/webapp/app/v3/services/batchTriggerV3.server.ts (2)
924-931
: Consider adding JSDoc documentation for the new parameters.While the function signature changes are good, adding JSDoc documentation would help clarify:
- The purpose of
retryAttempt
- The relationship between
taskRunAttemptId
and retry logic+/** + * Completes a batch task run item with retry support. + * @param itemId - The ID of the batch task run item + * @param batchTaskRunId - The ID of the batch task run + * @param tx - The Prisma transaction + * @param scheduleResumeOnComplete - Whether to schedule a resume on completion + * @param taskRunAttemptId - Optional task run attempt ID + * @param retryAttempt - Optional retry attempt number, used for tracking retry progress + */
1011-1047
: Well-structured error handling with retry logic.The error handling implementation effectively:
- Distinguishes between retriable and non-retriable errors
- Implements exponential backoff (2s delay)
- Provides detailed error logging
- Handles both initial attempts and retries differently
However, consider extracting the retry delay into a constant for easier configuration.
+const RETRY_DELAY_MS = 2_000; + async function completeBatchTaskRunItemV3( // ... existing parameters ) { // ... existing code - availableAt: new Date(Date.now() + 2_000), + availableAt: new Date(Date.now() + RETRY_DELAY_MS),
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
apps/webapp/app/db.server.ts
(2 hunks)apps/webapp/app/env.server.ts
(1 hunks)apps/webapp/app/v3/legacyRunEngineWorker.server.ts
(3 hunks)apps/webapp/app/v3/services/batchTriggerV3.server.ts
(3 hunks)internal-packages/database/src/transaction.ts
(2 hunks)references/hello-world/src/trigger/example.ts
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: typecheck / typecheck
- GitHub Check: units / 🧪 Unit Tests
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (6)
apps/webapp/app/v3/legacyRunEngineWorker.server.ts (1)
39-51
: LGTM! Well-configured batch completion job.The configuration is well-aligned with the PR objectives:
- Max retry attempts of 10 for handling retriable Prisma errors
- Reasonable visibility timeout of 60 seconds
- Complete schema definition for batch completion payload
apps/webapp/app/db.server.ts (1)
114-114
: LGTM! Consistent connection timeout configuration.The connection timeout is well-configured:
- Applied consistently to both main and replica clients
- Integrated properly with existing connection parameters
- Uses environment variable for configuration
Also applies to: 166-166
apps/webapp/app/env.server.ts (1)
16-16
: LGTM! The new database connection timeout setting is well-integrated.The addition of
DATABASE_CONNECTION_TIMEOUT
with a default of 20 seconds aligns well with the PR's objective to handle Prisma connection timeouts. The placement and implementation follow the established pattern for database configuration settings.apps/webapp/app/v3/services/batchTriggerV3.server.ts (3)
10-11
: LGTM! Good addition of error handling utilities.The new imports for
isPrismaRaceConditionError
andisPrismaRetriableError
support the PR's objective of handling specific Prisma errors more robustly.
944-1010
: Good implementation of transaction boundaries and timeout settings.The transaction implementation is robust with:
- Appropriate timeout settings (10s timeout, 4s max wait)
- Clear error boundaries
- Proper batch status management
985-1004
: Verify the batch completion logic.The batch completion logic has been updated to use multiple conditions:
- Status is PENDING
- Completed count equals expected count
- Batch is sealed
This looks correct but warrants verification to ensure it handles all edge cases.
✅ Verification successful
🏁 Script executed:
#!/bin/bash # Search for any other completion status checks that might need similar updates rg -A 5 'status === "COMPLETED"'Length of output: 12384
Batch completion logic verified as consistent
The conditions used in the batch trigger update—ensuring the batch run is in "PENDING" status, that the completedCount equals the expectedCount, and that the batch is sealed—are consistent with how completion is handled elsewhere in the codebase. Our repository scan (via the grep search) did not reveal any alternative or conflicting completion checks that would indicate unhandled edge cases. The logic to enqueue a resume operation when a dependent task exists also aligns with existing patterns.
It was possible for a Postgres transaction error to cause a batch to permanently fail in some rare cases.
We now retry this for these Prisma codes
We use the new Redis worker to reschedule the batch completion in the future. This is retried 10 times.
Other changes
Details
Error Handling & Retry Logic
isPrismaRetriableError()
helper function to identify retriable Prisma errorscompleteBatchTaskRunItem
on retriable Prisma errorsDatabase Improvements
Testing Notes
Summary by CodeRabbit