[K9VULN-2502] Decouple timeout watchdog from `JsRuntime` #596

jasonforal · 2025-01-04T01:03:19Z

What problem are you trying to solve?

We will be adding the ability to gracefully recover from a JavaScript execution that causes v8 to run out of memory.

To do this, we need to provide a callback (NearHeapLimitCallback) to v8, which is a fundamentally different model from the condvar/mutex we use for our timeout watchdog.

We should thus decouple the timeout logic from the JsRuntime so that we can handle the increased complexity in an encapsulated way. (In my opinion, this would be a good refactor regardless: it's a design smell that our scoped_execute function needs to manually deal with internal implementation details like notifying condvars.)
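
To illustrate why the callback model differs from a thread blocking on a condvar, here is a minimal, std-only sketch. The callback type mirrors the shape of V8's C++ `NearHeapLimitCallback` (data pointer, current limit, initial limit, returning the new limit). Everything else (`Shared`, `TerminationReason`, `on_near_heap_limit`, `simulate_engine_near_limit`, and the choice to double the limit) is hypothetical and not taken from this PR.

```rust
use std::ffi::c_void;
use std::sync::Mutex;

/// Same shape as V8's C++ `NearHeapLimitCallback`:
/// (data, current_heap_limit, initial_heap_limit) -> new heap limit.
type NearHeapLimitCallback = extern "C" fn(*mut c_void, usize, usize) -> usize;

/// Hypothetical termination reason, for illustration only.
#[derive(Debug, PartialEq)]
enum TerminationReason {
    MemoryExceeded,
}

/// Hypothetical shared state; the `data` pointer handed to the callback points here.
struct Shared {
    termination: Mutex<Option<TerminationReason>>,
}

/// Invoked by the engine, on the engine's own thread, when the heap is nearly full.
/// Nothing waits on a condvar: the engine calls us synchronously.
extern "C" fn on_near_heap_limit(data: *mut c_void, current: usize, _initial: usize) -> usize {
    // SAFETY: the caller registered `data` as a pointer to a live `Shared`.
    let shared = unsafe { &*(data as *const Shared) };
    *shared.termination.lock().unwrap() = Some(TerminationReason::MemoryExceeded);
    // Assumption: return a larger limit so the engine has headroom to unwind.
    current.saturating_mul(2)
}

/// Stand-in for the engine: calls the registered callback synchronously.
fn simulate_engine_near_limit(cb: NearHeapLimitCallback, data: *mut c_void, limit: usize) -> usize {
    cb(data, limit, limit)
}
```

The point of the sketch is the control flow: the timeout watchdog is a thread that we own and that sleeps until notified, whereas the memory callback is code the engine calls into, so both need a common place to record why execution was terminated.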

What is your solution?

Remove all timeout logic from the ddsa JsRuntime and introduce a new, more generic "ResourceWatchdog". This will eventually contain the out-of-memory v8 callback implementation.

Note

There is no change to runtime behavior or thread synchronization behavior
This PR is just a structural refactor

Notable Implementation Details

Watchdog state now stores termination reason

struct WatchdogState {
    timeout: TimeoutState,
    /// This will be `Some` if there was a termination. Otherwise, it will be `None`.
    termination_err: Option<DDSAJsRuntimeError>,
}

We need to distinguish between a termination from the v8 callback (out of memory) and our thread (timeout). This requires a mutex to both perform the termination and signal the reason for termination. While channels are probably a better way to encapsulate that communication, I'd like to keep using a Mutex to take advantage of Condvar's wait_timeout api.
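
As a rough, std-only sketch of this pattern (not the code in this PR): a watchdog thread waits on a `Condvar` with a timeout and, if the timeout elapses first, records the reason inside the mutex-protected state. `State`, `TerminationReason`, and `run_with_timeout` are hypothetical names. The sketch uses `wait_timeout_while`, the predicate-based variant of `wait_timeout`, to tolerate spurious wakeups, and it only records the reason; a real watchdog would also ask the engine to terminate the running script.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the two termination reasons discussed here.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum TerminationReason {
    Timeout,
    MemoryExceeded,
}

#[derive(Default)]
struct State {
    /// Set by the caller once the guarded work has finished.
    finished: bool,
    /// `Some` once a watchdog has terminated the execution.
    termination: Option<TerminationReason>,
}

/// Runs `work` on the current thread while a watchdog thread waits on the condvar.
fn run_with_timeout<T>(
    timeout: Duration,
    work: impl FnOnce() -> T,
) -> (T, Option<TerminationReason>) {
    let shared = Arc::new((Mutex::new(State::default()), Condvar::new()));
    let watchdog_shared = Arc::clone(&shared);
    let watchdog = thread::spawn(move || {
        let (lock, cvar) = &*watchdog_shared;
        let guard = lock.lock().unwrap();
        // Sleep until notified that the work finished, or until the timeout elapses.
        let (mut guard, result) = cvar
            .wait_timeout_while(guard, timeout, |s| !s.finished)
            .unwrap();
        if result.timed_out() && guard.termination.is_none() {
            // A real watchdog would also terminate the engine's execution here.
            guard.termination = Some(TerminationReason::Timeout);
        }
    });

    let output = work();

    // Tell the watchdog the work is done, then wait for it to exit.
    let (lock, cvar) = &*shared;
    lock.lock().unwrap().finished = true;
    cvar.notify_one();
    watchdog.join().unwrap();

    // Read the reason (if any) under the same mutex that guarded the termination.
    let reason = lock.lock().unwrap().termination.take();
    (output, reason)
}
```

Because the reason is written under the same lock that the watchdog holds when it decides to terminate, the caller cannot observe a termination without also observing why it happened.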

We now need to make sure that this WatchdogState is manually cleared after each execution. This is extensively tested with all possible state transition permutations here.
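
One way to make that reset hard to forget is to read and clear the state in a single step. The following is a sketch under assumed names (`SketchState`, `timer_armed`, `finish_execution`), not the PR's implementation: `std::mem::take` swaps in the default state and hands back the previous one.

```rust
use std::sync::Mutex;

/// Hypothetical termination reason, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum TerminationReason {
    Timeout,
}

/// Hypothetical watchdog state; `Default` is the "clean" state between executions.
#[derive(Debug, Default, PartialEq)]
struct SketchState {
    timer_armed: bool,
    termination: Option<TerminationReason>,
}

/// Returns the termination reason of the execution that just ended and
/// resets the state, so nothing stale leaks into the next execution.
fn finish_execution(state: &Mutex<SketchState>) -> Option<TerminationReason> {
    let mut guard = state.lock().unwrap();
    // Replace the state with `SketchState::default()` and keep the old value.
    let previous = std::mem::take(&mut *guard);
    previous.termination
}
```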

Caller is no longer concerned with implementation details

Calls needing resource limitation just need to make a call through the watchdog. Simple:

let execution_result = self
    .v8_resource_watchdog
    .execute(timeout, tc_ctx_scope, |sc| bound_script.run(sc))?;

(See execute function signature here)
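
For readers without the link, the general shape of such a wrapper might look like the sketch below. This is an assumption about the shape only, not the actual signature: `ResourceWatchdogSketch`, `WatchdogError`, and the generic scope type `S` are hypothetical, and arming the timer is omitted.

```rust
use std::sync::{Arc, Mutex};
use std::time::Duration;

/// Hypothetical error type standing in for a watchdog termination.
#[derive(Debug, Clone, PartialEq)]
enum WatchdogError {
    Timeout,
}

/// Hypothetical watchdog that owns the termination state.
struct ResourceWatchdogSketch {
    termination: Arc<Mutex<Option<WatchdogError>>>,
}

impl ResourceWatchdogSketch {
    /// Runs `f` against `scope` and reports a termination as an `Err`.
    fn execute<S, T>(
        &self,
        _timeout: Duration,
        scope: &mut S,
        f: impl FnOnce(&mut S) -> T,
    ) -> Result<T, WatchdogError> {
        // A real implementation would arm the timeout here (omitted in this sketch).
        let value = f(scope);
        // `take()` both reads and clears the reason, so the caller never manages state.
        match self.termination.lock().unwrap().take() {
            Some(err) => Err(err),
            None => Ok(value),
        }
    }
}
```

The caller only passes a timeout, a scope, and a closure, and gets back a `Result`; condvars and state resets stay inside the watchdog.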

Alternatives considered

What the reviewer should know

For a slightly less intimidating diff, see d1836d2, which shows how "JsExecutionState" was simplified.
spawn_timeout_thread is a copy/paste of the prior implementation with no behavior change.

juli1

please review the comment below, fix if you think it's appropriate and ship

juli1 · 2025-01-04T16:35:06Z

crates/static-analysis-kernel/src/analysis/ddsa_lib/resource_watchdog.rs

+struct WatchdogState {
+    timeout: TimeoutState,
+    /// This will be `Some` if there was a termination. Otherwise, it will be `None`.
+    termination_err: Option<DDSAJsRuntimeError>,


Why `Option`? We should have only DDSAJsRuntimeError. If we do not know the error, then we have an unknown type that can wrap some error text or error code.

It has to be Option<DDSAJsRuntimeError> because after an execution, when we lock the mutex and read the WatchdogState, it's possible that the execution succeeded without any termination at all, so this option will be None.

Otherwise, if there was a termination from a watchdog, it will be Some(DDSAJsRuntimeError::JavaScriptTimeout) (or soon-to-be Some(DDSAJsRuntimeError::JavaScriptMemoryExceeded))

So here, it's not about "known" vs "unknown" error, but rather whether a termination occurred or not.

jasonforal · 2025-01-15T22:51:33Z

Thanks for reviewing -- rebasing and merging

jasonforal requested a review from juli1 January 4, 2025 01:03

jasonforal requested a review from a team as a code owner January 4, 2025 01:03

juli1 approved these changes Jan 15, 2025

jasonforal added 2 commits January 15, 2025 17:51

Consolidate state related to the timeout watchdog into a single struct field. (Using presence of an `Option` instead of manually synchronizing a boolean).

5ca024a

Refactor timeout watchdog thread into its own struct so the caller doesn't need to explicitly manage state.

5138f82

jasonforal force-pushed the jf/K9VULN-2502-2 branch from 21d5918 to 5138f82 Compare January 15, 2025 22:52

jasonforal merged commit 615111f into main Jan 16, 2025
71 checks passed

jasonforal deleted the jf/K9VULN-2502-2 branch January 16, 2025 14:09

jasonforal mentioned this pull request Jan 16, 2025

[K9VULN-2502] Gracefully recover from v8 running out of memory #611

Merged
