[K9VULN-2477] Fix (additional) PKU-related v8 segfaults #589
What problem are you trying to solve?
#560 fixed an issue with v8 initialization that caused segfaults on Linux with PKU support. However, that fix was only tested on 3rd-generation Intel Xeon CPUs.
We're still seeing segfaults on 1st/2nd-generation Xeon CPUs, which some cloud providers use (for example, the t3 instance family on AWS EC2). The screenshot below shows a reproduction:
What is your solution?
It appears that when rayon is initialized before v8, v8 segfaults while trying to initialize its default platform, so this PR reorders startup to initialize v8 first. I haven't dug in enough to determine whether this is inherent to what rayon does, or whether it's simply a timing effect masking a concurrency bug elsewhere (perhaps initializing a rayon pool takes long enough that something in v8 happens to be sequenced correctly).
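One way to make this ordering explicit, rather than relying on call-site order, is to funnel all process-wide startup through a single init function that always brings v8 up before rayon. The sketch below illustrates only the pattern, not the actual datadog-static-analyzer code: `init_v8_platform` and `init_rayon_pool` are hypothetical stand-ins for the real `v8::V8::initialize_platform`/`rayon::ThreadPoolBuilder` calls.

```rust
use std::sync::{Mutex, OnceLock};

// Records the order in which subsystems were initialized, so the
// ordering invariant can be checked.
static INIT_ORDER: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());
static INIT: OnceLock<()> = OnceLock::new();

// Hypothetical stand-in for initializing the v8 platform
// (e.g. v8::V8::initialize_platform + v8::V8::initialize).
fn init_v8_platform() {
    INIT_ORDER.lock().unwrap().push("v8");
}

// Hypothetical stand-in for building the global rayon pool
// (e.g. rayon::ThreadPoolBuilder::new().build_global()).
fn init_rayon_pool() {
    INIT_ORDER.lock().unwrap().push("rayon");
}

/// Single entry point for process-wide initialization. `OnceLock`
/// guarantees the closure runs exactly once, and its statement order
/// guarantees v8 is up before any rayon pool exists.
fn init_runtime() {
    INIT.get_or_init(|| {
        init_v8_platform(); // must happen first
        init_rayon_pool();
    });
}

fn main() {
    // Even if multiple call sites race to initialize, the order is fixed.
    init_runtime();
    init_runtime();
    let order = INIT_ORDER.lock().unwrap();
    assert_eq!(&*order, &["v8", "rayon"]);
    println!("init order: {:?}", *order);
}
```

Funneling through one function also means that if the ordering requirement ever changes, there is exactly one place to update.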
Before
(Condensed) output from GDB:
After
(Condensed) output from GDB:
GDB interpretation
Breakpoints were set on:

- `v8::platform::Platform::new` (initialization of the v8 platform)
- `rayon_core::registry::Registry::new` (creation of a new rayon pool)
- `v8::internal::wasm::WasmCodePointerTable::AllocateUninitializedEntry` (a proxy for the creation of a v8 isolate)

In the before, our attempts to create v8 isolates (threads 104, 105) occur before the "V8 DefaultWorker" threads are created. My interpretation is that the presence of all v8 worker threads indicates that v8 has "properly" initialized.
In the after, the threads that create v8 isolates (threads 106, 107) are created after the "V8 DefaultWorker" threads (104, 105):
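For anyone reproducing this, breakpoints like the ones above can be set with a GDB script along these lines; this is a sketch rather than the exact commands used, and it assumes a build of datadog-static-analyzer with v8 symbols available:

```gdb
# run inside gdb attached to datadog-static-analyzer
set breakpoint pending on
break v8::platform::Platform::new
break rayon_core::registry::Registry::new
break v8::internal::wasm::WasmCodePointerTable::AllocateUninitializedEntry
run
info threads
```

`info threads` after each breakpoint hit is what shows whether the "V8 DefaultWorker" threads exist yet when isolate creation begins.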
Testing
We currently don't have the infrastructure to test this automatically. However, you can confirm it manually by spinning up an Amazon EC2 instance from the t3 family (running either Amazon Linux or Ubuntu) and running `datadog-static-analyzer` before and after this PR.
Alternatives considered
What the reviewer should know