Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[K9VULN-2502] Decouple timeout watchdog from JsRuntime #596

Merged
merged 2 commits into from
Jan 16, 2025

Conversation

jasonforal
Copy link
Collaborator

What problem are you trying to solve?

We will be adding the ability to gracefully recover from a JavaScript execution that causes v8 to run out of memory.

To do this, we need to provide a callback (NearHeapLimitCallback) to v8, which is a fundamentally different model from the condvar/mutex we use for our timeout watchdog.

We should thus decouple the timeout logic from the JsRuntime so that we can handle the increased complexity in an encapsulated way. (But in my opinion, this would be a good refactor irrespective--it's a design smell that our scoped_execute function needs to manually deal with internal implementation details like notifying condvars).

What is your solution?

Remove all timeout logic from the ddsa JsRuntime and introduce a new, more generic "ResourceWatchdog". This will eventually contain the out-of-memory v8 callback implementation.

Note

There is no change to runtime behavior or thread synchronization behavior
This PR is just a structural refactor

Notable Implementation Details

Watchdog state now stores termination reason

struct WatchdogState {
    timeout: TimeoutState,
    /// This will be `Some` if there was a termination. Otherwise, it will be `None`.
    termination_err: Option<DDSAJsRuntimeError>,
}

We need to distinguish between a termination from the v8 callback (out of memory) and our thread (timeout). This requires a mutex to both perform the termination and signal the reason for termination. While channels are probably a better way to encapsulate that communication, I'd like to keep using a Mutex to take advantage of Condvar's wait_timeout api.

We now need to make sure that this WatchdogState is manually cleared after each execution. This is extensively tested with all possible state transition permutations here.

Caller is no longer concerned with implementation details
Calls needing resource limitation just need make a call through the watchdog. Simple:

let execution_result = self
    .v8_resource_watchdog
    .execute(timeout, tc_ctx_scope, |sc| bound_script.run(sc))?;

(See execute function signature here)

Alternatives considered

What the reviewer should know

  • For slightly less-intimidating diff, see d1836d2 to see how "JsExecutionState" was simplified.
  • spawn_timeout_thread is a copy/paste of the prior implementation with no behavior change.

@jasonforal jasonforal requested a review from juli1 January 4, 2025 01:03
@jasonforal jasonforal requested a review from a team as a code owner January 4, 2025 01:03
Copy link
Collaborator

@juli1 juli1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please review the comment below, fix if you think it's appropriate and ship

struct WatchdogState {
timeout: TimeoutState,
/// This will be `Some` if there was a termination. Otherwise, it will be `None`.
termination_err: Option<DDSAJsRuntimeError>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why Optional? We should have only DDSAJsRuntimeError. If we do not know the error, then, we have an unknown type that can wrap some error text or error code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has to be Option<DDSAJsRuntimeError> because after an execution, when we lock the mutex and read the WatchdogState, it's possible that the execution succeeded without any termination at all, so this option will be None.

Otherwise, if there was a termination from a watchdog, it will be Some(DDSAJsRuntimeError::JavaScriptTimeout) (or soon-to-be Some(DDSAJsRuntimeError::JavaScriptMemoryExceeded))

So here, it's not about "known" vs "unknown" error, but rather whether a termination occurred or not.

@jasonforal
Copy link
Collaborator Author

Thanks for reviewing -- rebasing and merging

…t field. (Using presence of an `Option` instead of manually synchronizing a boolean).
@jasonforal jasonforal merged commit 615111f into main Jan 16, 2025
71 checks passed
@jasonforal jasonforal deleted the jf/K9VULN-2502-2 branch January 16, 2025 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants