-
-
Notifications
You must be signed in to change notification settings - Fork 709
Improve checkpoint reliability and cleanup of temp files #1367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
WalkthroughThe pull request introduces significant changes across multiple files in the Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
commit: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 16
🧹 Outside diff range and nitpick comments (8)
apps/coordinator/src/util.ts (1)
1-19
: Overall assessment of util.tsThe introduction of
boolFromEnv
andnumFromEnv
utility functions is a positive step towards more consistent and safer handling of environment variables. These functions provide a good foundation for converting environment variables to boolean and numeric values, respectively.However, as noted in the individual function reviews, there's room for improvement in terms of robustness and versatility. Implementing the suggested enhancements will make these utilities more reliable and flexible, handling a wider range of input scenarios and providing better error handling.
To maximize the benefits of these new utilities:
- Consider implementing the suggested improvements for both functions.
- Use the verification script provided to identify areas in the codebase where these functions can replace direct
process.env
access.- Document these utilities and encourage their use across the project for consistent environment variable handling.
As the project grows, consider expanding this utility file with other commonly used functions for environment variable handling or configuration management. This can help maintain a centralized and consistent approach to dealing with environment-dependent values throughout the application.
apps/coordinator/package.json (1)
22-23
: Update project documentation to reflect dependency changes.Consider updating the project's README or documentation to reflect the replacement of
execa
withtinyexec
. This will help other developers understand the change and its implications for the project.apps/coordinator/src/chaosMonkey.ts (1)
60-60
: LGTM: Simplified implementation improves code clarity.The changes in the
call
method implementation are well-aligned with the PR objectives:
- Direct use of the
timeout
function simplifies the code and removes the dependency onexeca
.- Consistent use of
ChaosMonkey.Error
improves error handling uniformity.These changes contribute to a more straightforward and maintainable implementation.
Consider adding a brief comment explaining the purpose of the
timeout
function call, as it might not be immediately clear to new developers why a delay is being introduced. For example:// Introduce artificial delay to simulate network latency or processing time await timeout(this.delayInSeconds * 1000);Also applies to: 68-68
apps/coordinator/src/index.ts (4)
Line range hint
93-101
: Consider improving error handling in#returnValidatedExtraHeaders
The current implementation throws a generic Error when an extra header is undefined. Consider using a more specific error type or providing more context in the error message. This would make it easier to handle and debug issues related to header validation.
Here's a suggested improvement:
#returnValidatedExtraHeaders(headers: Record<string, string>) { for (const [key, value] of Object.entries(headers)) { if (value === undefined) { - throw new Error(`Extra header is undefined: ${key}`); + throw new HeaderValidationError(`Extra header '${key}' is undefined`); } } return headers; } +class HeaderValidationError extends Error { + constructor(message: string) { + super(message); + this.name = 'HeaderValidationError'; + } +}This change introduces a custom error type, making it easier to catch and handle specific header validation errors throughout the application.
Line range hint
359-397
: Improve error handling inREADY_FOR_LAZY_ATTEMPT
handlerThe current implementation catches all errors and only logs them before crashing the run. This approach might hide important errors or make debugging more difficult. Consider differentiating between expected and unexpected errors, and handle them accordingly.
Here's a suggested improvement:
socket.on("READY_FOR_LAZY_ATTEMPT", async (message) => { logger.log("[READY_FOR_LAZY_ATTEMPT]", message); try { const lazyAttempt = await this.#platformSocket?.sendWithAck("READY_FOR_LAZY_ATTEMPT", { ...message, envId: socket.data.envId, }); if (!lazyAttempt) { throw new Error("No lazy attempt ack"); } if (!lazyAttempt.success) { throw new Error("Failed to get lazy attempt payload"); } await chaosMonkey.call(); socket.emit("EXECUTE_TASK_RUN_LAZY_ATTEMPT", { version: "v1", lazyPayload: lazyAttempt.lazyPayload, }); } catch (error) { if (error instanceof ChaosMonkey.Error) { logger.error("ChaosMonkey error, won't crash run", { runId: socket.data.runId }); return; } - logger.error("Error", { error }); - - await crashRun({ - name: "ReadyForLazyAttemptError", - message: - error instanceof Error ? `Unexpected error: ${error.message}` : "Unexpected error", - }); + if (error instanceof Error) { + logger.error("ReadyForLazyAttemptError", { error: error.message, stack: error.stack }); + await crashRun({ + name: "ReadyForLazyAttemptError", + message: error.message, + stack: error.stack, + }); + } else { + logger.error("Unknown ReadyForLazyAttemptError", { error }); + await crashRun({ + name: "UnknownReadyForLazyAttemptError", + message: "An unknown error occurred", + }); + } return; } });This change provides more detailed error logging and differentiates between Error instances and other types of errors, which can help in debugging and error tracking.
Line range hint
1008-1022
: Enhance logging in#cancelCheckpoint
methodThe
#cancelCheckpoint
method currently logs only the final result. Consider adding more detailed logging to track the method's execution flow, which could be helpful for debugging and monitoring.Here's a suggested improvement:
#cancelCheckpoint(runId: string): boolean { + logger.debug("Attempting to cancel checkpoint", { runId }); const checkpointWait = this.#checkpointableTasks.get(runId); if (checkpointWait) { + logger.debug("Found checkpoint wait, rejecting", { runId }); // Stop waiting for task to reach checkpointable state checkpointWait.reject(new CheckpointCancelError()); + } else { + logger.debug("No checkpoint wait found", { runId }); } // Cancel checkpointing procedure const checkpointCanceled = this.#checkpointer.cancelCheckpoint(runId); - logger.log("cancelCheckpoint()", { runId, checkpointCanceled }); + logger.log("Checkpoint cancellation result", { runId, checkpointCanceled }); return checkpointCanceled; }This change adds more detailed logging throughout the method, which can help track the execution flow and make debugging easier.
Line range hint
1039-1043
: Add authentication to/checkpoint
endpointThe
/checkpoint
endpoint currently lacks authentication, which could pose a security risk. Consider adding an authentication mechanism to ensure that only authorized clients can trigger checkpoints.Here's a suggested improvement:
case "/checkpoint": { const body = await getTextBody(req); + const authToken = req.headers['authorization']; + + if (!authToken || !this.#isValidAuthToken(authToken)) { + return reply.empty(401); + } + // await this.#checkpointer.checkpointAndPush(body); return reply.text(`sent restore request: ${body}`); } + #isValidAuthToken(token: string): boolean { + // Implement your authentication logic here + // For example, compare against a stored secret or validate with an auth service + return token === process.env.CHECKPOINT_SECRET; + }This change adds a basic authentication check using an Authorization header. Make sure to implement proper token validation in the
#isValidAuthToken
method and securely manage theCHECKPOINT_SECRET
.apps/coordinator/src/cleaner.ts (1)
99-101
: Rename thex
getter for better readabilityThe getter
x
returns a bound function for executing commands, but the single-letter namex
is not descriptive.Consider renaming it to improve code clarity:
- private get x() { + private get execCommand() { return this.exec.x.bind(this.exec); }Update all references from
this.x
tothis.execCommand
:- const du = this.x("find", [...baseArgs, ...duArgs]); + const du = this.execCommand("find", [...baseArgs, ...duArgs]); - const rm = this.x("find", [...baseArgs, ...rmArgs]); + const rm = this.execCommand("find", [...baseArgs, ...rmArgs]);This change enhances readability and makes the purpose of the method clearer.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
pnpm-lock.yaml
is excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (7)
- apps/coordinator/package.json (1 hunks)
- apps/coordinator/src/chaosMonkey.ts (1 hunks)
- apps/coordinator/src/checkpointer.ts (12 hunks)
- apps/coordinator/src/cleaner.ts (1 hunks)
- apps/coordinator/src/exec.ts (1 hunks)
- apps/coordinator/src/index.ts (1 hunks)
- apps/coordinator/src/util.ts (1 hunks)
🧰 Additional context used
🪛 Biome
apps/coordinator/src/checkpointer.ts
[error] 583-583: Unsafe usage of 'return'.
'return' in 'finally' overwrites the control flow statements inside 'try' and 'catch'.
(lint/correctness/noUnsafeFinally)
apps/coordinator/src/exec.ts
[error] 276-276: Using this in a static context can be confusing.
this refers to the class.
Unsafe fix: Use the class name instead.(lint/complexity/noThisInStatic)
🔇 Additional comments (7)
apps/coordinator/package.json (1)
22-23
: LGTM! Dependency changes look good.The replacement of
execa
withtinyexec
aligns with the PR objectives to streamline binary execution. Thesocket.io
dependency remains unchanged, maintaining existing network functionality.To ensure a smooth transition from
execa
totinyexec
, please run the following script to verify the usage:This script will help identify any remaining
execa
imports, confirmtinyexec
usage, and highlight areas whereexeca
-like functionality might still be present.✅ Verification successful
Re-running verification with corrected commands.
The previous script encountered errors due to the unrecognized file type
typescript
. Please run the following corrected script to verify the replacement ofexeca
withtinyexec
:
Verified: execa has been successfully replaced with tinyexec. No remaining execa imports found in the codebase. All execution functionalities are appropriately handled using tinyexec.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the replacement of execa with tinyexec # Test 1: Check for any remaining imports of execa echo "Checking for remaining execa imports:" rg --type typescript 'import.*execa' apps/coordinator/src # Test 2: Verify tinyexec usage echo "Verifying tinyexec usage:" rg --type typescript 'import.*tinyexec' apps/coordinator/src # Test 3: Look for potential areas where execa might have been used echo "Checking for potential areas where execa might have been used:" rg --type typescript '(spawn|exec|execFile|execaCommand)' apps/coordinator/srcLength of output: 589
Script:
#!/bin/bash # Description: Verify the replacement of execa with tinyexec # Test 1: Check for any remaining imports of execa echo "Checking for remaining execa imports:" rg --type ts 'import.*execa' apps/coordinator/src # Test 2: Verify tinyexec usage echo "Verifying tinyexec usage:" rg --type ts 'import.*tinyexec' apps/coordinator/src # Test 3: Look for potential areas where execa might have been used echo "Checking for potential areas where execa might have been used:" rg --type ts '(spawn|exec|execFile|execaCommand)' apps/coordinator/srcLength of output: 3288
apps/coordinator/src/chaosMonkey.ts (2)
Line range hint
39-44
: LGTM: Method signature simplification aligns with PR objectives.The removal of the
$?: Execa$<string>
parameter from thecall
method signature is a positive change. It simplifies the method interface and aligns with the PR objective of replacing theexeca
library. This change contributes to reducing dependencies and streamlining the code.
Line range hint
1-92
: Overall, these changes improve code quality and align with PR objectives.The modifications to the
ChaosMonkey
class successfully simplify the implementation by removing theexeca
dependency and streamlining thecall
method. These changes contribute to:
- Reduced complexity in the codebase.
- Improved maintainability by removing external dependencies.
- Consistent error handling.
The changes align well with the PR objectives of improving reliability and streamlining execution processes.
apps/coordinator/src/index.ts (2)
17-17
: LGTM: Improved code organizationThe removal of
boolFromEnv
andnumFromEnv
functions and their import from./util
is a good refactoring. This change improves code organization and maintainability by centralizing utility functions in a separate file.
Line range hint
1-1068
: Overall code quality is good, with some areas for improvementThe TaskCoordinator class is well-structured and implements complex logic for managing tasks, checkpoints, and communication between workers and the platform. The code demonstrates good use of async/await, proper error handling, and extensive logging.
Main areas for improvement:
- Error handling: Consider using more specific error types and providing more context in error messages.
- Logging: Enhance logging in some methods to provide more detailed information for debugging.
- Security: Add authentication to sensitive endpoints, particularly the
/checkpoint
endpoint.These improvements will enhance the maintainability, debuggability, and security of the codebase.
apps/coordinator/src/cleaner.ts (1)
88-90
:⚠️ Potential issueEnsure safety when executing
rm -rf
commandsExecuting
rm -rf
commands with paths constructed from input can be dangerous if the inputs are not properly validated. There is a risk of deleting unintended files ifstartingPoint
is not securely validated.Consider adding validation to ensure that
startingPoint
is a safe and expected directory path. You might restrict it to a specific base directory or perform checks to prevent path traversal vulnerabilities.Run the following script to verify the usage of
startingPoint
and ensure it's derived from trusted sources:This script searches for all usages of
TempFileCleaner
and displays the paths being cleaned. Review the results to ensure they are safe.apps/coordinator/src/checkpointer.ts (1)
306-306
: Confirm thatdelay.milliseconds
is the correct propertyEnsure that the
delay
object provided byExponentialBackoff
has amilliseconds
property. Using an incorrect property may result in unexpected delays or runtime errors.Run this script to check if the
delay
object has amilliseconds
property:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
pnpm-lock.yaml
is excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (2)
- apps/coordinator/src/checkpointer.ts (13 hunks)
- apps/coordinator/src/cleaner.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/coordinator/src/cleaner.ts
🧰 Additional context used
📓 Learnings (1)
apps/coordinator/src/checkpointer.ts (1)
Learnt from: nicktrn PR: triggerdotdev/trigger.dev#1367 File: apps/coordinator/src/checkpointer.ts:588-631 Timestamp: 2024-09-30T11:35:37.554Z Learning: Errors in the `#createDockerCheckpoint` method are handled in the caller, so additional error logging within this method is unnecessary.
🪛 Biome
apps/coordinator/src/checkpointer.ts
[error] 571-571: Unsafe usage of 'return'.
'return' in 'finally' overwrites the control flow statements inside 'try' and 'catch'.
(lint/correctness/noUnsafeFinally)
commit 7d11123 Author: Eric Goldman <eric@sequin.io> Date: Mon Sep 30 17:54:06 2024 -0700 Add sequin guide (#1368) Co-authored-by: James Ritchie <james@trigger.dev> commit 8da495a Author: nicktrn <55853254+nicktrn@users.noreply.github.com> Date: Mon Sep 30 13:42:22 2024 +0100 Improve checkpoint reliability and cleanup of temp files (#1367) * improve cleanup reliability * improve logging * bye-bye execa * fix for trailing newlines * prettier errors * trim args and log output by default * fix archive cleanup * prevent potential memleak * more cleanup debug logs * ignore abort during cleanup * rename checkpoint dir env var and move to helper * add global never throw override * add tmp cleaner * also clean up checkpoint dir by default * split by any whitespace, not just tabs * only create tmp cleaner if paths to clean commit 69ec68e Author: Eric Allam <eallam@icloud.com> Date: Sun Sep 29 19:18:39 2024 -0700 Release 3.0.9 commit a6ea844 Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Date: Sun Sep 29 19:17:26 2024 -0700 chore: Update version for release (#1366) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> commit 4c1ee3d Author: Eric Allam <eallam@icloud.com> Date: Sun Sep 29 19:09:38 2024 -0700 fix: run metadata not working when using npx/pnpm dlx
* Added a new dropdown help and feedback menu to the side menu * Added a shortcut to the popover menu * Removed dev cli connected button for now * Contact us form uses original Feedback component to prevent broken links * Improved the messaging when selecting different options in the email form * buttons style tweak * SideMenuItem supports the trailingIconClassName * Adding a consistent focus-visible states * Removing tooltips for now * Squashed commit of the following: commit 7d11123 Author: Eric Goldman <eric@sequin.io> Date: Mon Sep 30 17:54:06 2024 -0700 Add sequin guide (#1368) Co-authored-by: James Ritchie <james@trigger.dev> commit 8da495a Author: nicktrn <55853254+nicktrn@users.noreply.github.com> Date: Mon Sep 30 13:42:22 2024 +0100 Improve checkpoint reliability and cleanup of temp files (#1367) * improve cleanup reliability * improve logging * bye-bye execa * fix for trailing newlines * prettier errors * trim args and log output by default * fix archive cleanup * prevent potential memleak * more cleanup debug logs * ignore abort during cleanup * rename checkpoint dir env var and move to helper * add global never throw override * add tmp cleaner * also clean up checkpoint dir by default * split by any whitespace, not just tabs * only create tmp cleaner if paths to clean commit 69ec68e Author: Eric Allam <eallam@icloud.com> Date: Sun Sep 29 19:18:39 2024 -0700 Release 3.0.9 commit a6ea844 Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Date: Sun Sep 29 19:17:26 2024 -0700 chore: Update version for release (#1366) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> commit 4c1ee3d Author: Eric Allam <eallam@icloud.com> Date: Sun Sep 29 19:09:38 2024 -0700 fix: run metadata not working when using npx/pnpm dlx * More support for custom-focus * More custom focus styles added * Support for focus-visible style for the Segmented control * Fixed table triple dot menu z-index issue * Improved help menu wording * When you submit the help form, close the modal * focus-visible style for radio buttons * button prop is now optional in the SideMenu component * focus styling for a text link * Deleted unused sequin files
Coordinator changes:
Summary by CodeRabbit
New Features
TempFileCleaner
for managing temporary file deletions.Improvements
ChaosMonkey
functionality by removing unnecessary parameters.Checkpointer
class with improved error handling and modular design.Bug Fixes
Chores
execa
and addingtinyexec
.