[release/7.0] [QUIC] Fix native crashes and heap corruption via "generated-like" interop #75192
Backport of #74669 to release/7.0
Fixes #72696
/cc @CarnaViire
Customer Impact
HTTP/3 or QUIC applications crashed with either "Aborted" or "Segmentation Fault" due to native heap corruption. The native crash occurred anywhere from several minutes to several hours into a run, depending on how often the user scenario hit the race between Dispose and other QUIC calls (e.g. cancelling or disposing the stream while sending data).
The root cause of the native heap corruption was incorrect and unsynchronized usage of native pointers and arrays, which under multithreaded access led to use-after-free and other native memory access issues. The resulting heap corruption did not fail immediately; it manifested as a crash some time later.
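For readers unfamiliar with this class of bug, here is a minimal, generic sketch of one common way to guard native memory against a racing Dispose in .NET: take a reference on a `SafeHandle` for the duration of each native access. It is an illustration only and is not taken from this PR's diff. `NativeBufferHandle` and `TryWrite` are hypothetical names invented for the example; `SafeHandle.DangerousAddRef`, `DangerousRelease`, `DangerousGetHandle`, and the `Marshal` calls are standard .NET APIs.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

// Hypothetical handle type for illustration; not a type from System.Net.Quic.
sealed class NativeBufferHandle : SafeHandle
{
    public NativeBufferHandle(int size) : base(IntPtr.Zero, ownsHandle: true)
    {
        SetHandle(Marshal.AllocHGlobal(size));
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    // Runs only after Dispose was called AND every outstanding reference was released.
    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        return true;
    }
}

static class Program
{
    // Touches the native buffer only while holding a reference on the handle,
    // so a concurrent Dispose cannot free the memory in the middle of the write.
    static bool TryWrite(NativeBufferHandle buffer, byte value)
    {
        bool addedRef = false;
        try
        {
            // Throws ObjectDisposedException if the handle is already closed.
            buffer.DangerousAddRef(ref addedRef);
            Marshal.WriteByte(buffer.DangerousGetHandle(), value);
            return true;
        }
        catch (ObjectDisposedException)
        {
            return false;
        }
        finally
        {
            if (addedRef) buffer.DangerousRelease();
        }
    }

    static void Main()
    {
        var buffer = new NativeBufferHandle(16);

        // Writer thread races with Dispose on the main thread.
        Task writer = Task.Run(() =>
        {
            for (int i = 0; i < 100_000; i++)
            {
                if (!TryWrite(buffer, (byte)i)) break;
            }
        });

        buffer.Dispose();
        writer.Wait();

        // After Dispose, further access is rejected instead of touching freed memory.
        Console.WriteLine(TryWrite(buffer, 0));
    }
}
```

Without the `DangerousAddRef`/`DangerousRelease` pair, the writer could read a raw pointer, lose the race to `Dispose`, and then write into freed memory, which is the use-after-free pattern described above.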
There are 2 main parts of the fix:
Discovered in HTTP/3 stress runs.
Testing
Multiple 10+ hour runs of the general HTTP/3 stress test, and multiple ~3h runs of a targeted stress scenario with high race probability (POST Duplex Dispose with a 100% cancel rate).
Before the fix, the issue would almost always manifest within ~1h in the general HTTP/3 stress test run, and within ~5min in the targeted stress scenario.
Risk
Low, System.Net.Quic is still in preview.