[HTTP/3] SIGABRT in stress tests #72696

CarnaViire · 2022-07-22T20:19:45Z

Today's stress test run crashed with a segmentation fault after 28 minutes: https://dev.azure.com/dnceng/public/_build/results?buildId=1897576&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=1451f5f3-0108-5a08-5b92-e984b2a85bbd&l=1570

Funnily enough, this run was scheduled with commit ed5aa3d on top 😄 @rzikm (might be totally unrelated)

ghost · 2022-07-22T20:19:54Z

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

ManickaP · 2022-07-27T11:35:59Z

I'm not able to reproduce it, and the crash dumps from the pipeline are useless since the native library is a release build.
I'm keeping the draft PR open in case anyone wants to have a go at this. It builds msquic in Debug.

karelz · 2022-08-09T17:23:11Z

@CarnaViire can you please write down when the first hit was, and how often it happens / happened?

CarnaViire · 2022-08-09T19:32:40Z

It turned out that only the first occurrence was a segfault (exit code 139); all the others are SIGABRT (exit code 134). So it is still happening and it was not fixed by Mana's copying.
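For readers unfamiliar with these numbers: both exit codes follow the usual shell convention that a process killed by signal N is reported as 128 + N. A minimal sketch of the decoding, assuming Linux signal numbers (the helper name `signal_from_exit_code` and the lookup table are made up for illustration):

```python
from typing import Optional

# Linux signal numbers for the two signals seen in this issue.
LINUX_SIGNALS = {6: "SIGABRT", 11: "SIGSEGV"}


def signal_from_exit_code(exit_code: int) -> Optional[str]:
    """Return the signal name for a 128 + N exit code, or None otherwise."""
    if exit_code <= 128:
        return None  # ordinary exit status, not a signal
    number = exit_code - 128
    return LINUX_SIGNALS.get(number, f"signal {number}")


for code in (139, 134):
    print(code, "->", signal_from_exit_code(code))
# 139 -> SIGSEGV
# 134 -> SIGABRT
```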

Between 7/18 and 8/9 we've had ~32 non-crashing runs (30 min) and ~~5~~ 6 crashing runs ~~most of them gathered around 7/22~~ (see below)

| Date | Link | Exit code | Crashing after |
|---|---|---|---|
| 7/22 | Run #20220722.1 | 139 | 28 min |
| 7/22 | Run #20220722.3 | 134 | 1 min |
| 7/23 | Run #20220723.2 | 134 | 2 min |
| 8/3 | Run #20220803.6 | 134 | 8 min |
| 8/5 | Run #20220805.1 | 134 | 13 min |
| 8/9 | Run #20220809.3 | 134 | 7 min |

UPD: new occurrences since 8/9

| Date | Link | Exit code | Crashing after |
|---|---|---|---|
| 8/12 | Run #20220812.4 | 134 | 23 min |
| 8/12 | Run #20220812.7 | 134 | 13 min |
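As a rough summary of the numbers above, here is a small sketch that computes the crash rate for the 7/18-8/9 window and the median time to crash across all eight listed runs. The run counts are the approximate figures quoted in this comment (the "~32" is not exact), and the variable names are only illustrative:

```python
from statistics import median

# Minutes until crash, copied from the two tables above.
crash_minutes = [28, 1, 2, 8, 13, 7, 23, 13]

# First window only (7/18-8/9): ~32 non-crashing runs and 6 crashing runs.
clean_runs, crashing_runs = 32, 6
crash_rate = crashing_runs / (clean_runs + crashing_runs)

print(f"crash rate 7/18-8/9: {crash_rate:.0%}")             # 16%
print(f"median time to crash: {median(crash_minutes)} min")  # 10.5 min
```

With so few samples this only gives an order of magnitude: roughly one run in six crashed, typically within the first quarter of an hour.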

wfurt · 2022-08-09T19:35:57Z

SIGABRT is likely coming from an assert or an unhandled exception. That should be visible from the dump, IMHO.

CarnaViire · 2022-08-12T16:43:18Z

The abort comes from an MsQuic assert in MsQuicStreamSend: https://github.com/microsoft/msquic/blob/d50ee4b831fca4f057031c50d0b31a7484e7208b/src/core/api.c#L1089

It seems that pthread_mutex_lock returned a non-zero error code: https://github.com/microsoft/msquic/blob/a7e98a6bbc609efb5fb2f7ef484827a0f453e816/src/inc/quic_platform_posix.h#L346

CarnaViire · 2022-08-18T10:18:56Z

Update: SIGABRT is most likely a result of native heap corruption. pthread_mutex_lock returns EINVAL, meaning "The value specified by mutex does not refer to an initialized mutex object."

AddressSanitizer has caught a heap-use-after-free on Send buffers (they are allocated in native memory) that were freed by .NET threads.

==47659==ERROR: AddressSanitizer: heap-use-after-free on address 0x60200267b510 at pc 0x7ebf82297fd3 bp 0x7ebf7e6aa9f0 sp 0x7ebf7e6aa9e0
READ of size 4 at 0x60200267b510 thread T16
    #0 0x7ebf82297fd2 in QuicStreamSendBufferRequest /home/lia/dev/git/msquic/src/core/stream_send.c:449
    #1 0x7ebf82391c33 in QuicSendBufferFill /home/lia/dev/git/msquic/src/core/send_buffer.c:181
    #2 0x7ebf8229bf67 in QuicStreamSendFlush /home/lia/dev/git/msquic/src/core/stream_send.c:594
    #3 0x7ebf82307502 in QuicConnProcessApiOperation /home/lia/dev/git/msquic/src/core/connection.c:7205
    #4 0x7ebf82307fe2 in QuicConnDrainOperations /home/lia/dev/git/msquic/src/core/connection.c:7340
    #5 0x7ebf822afd82 in QuicWorkerProcessConnection /home/lia/dev/git/msquic/src/core/worker.c:510
    #6 0x7ebf822b1342 in QuicWorkerLoop /home/lia/dev/git/msquic/src/core/worker.c:668
    #7 0x7ebf822b1d5e in QuicWorkerThread /home/lia/dev/git/msquic/src/core/worker.c:733
    #8 0x7f00a0862608 in start_thread /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:477
    #9 0x7f00a0433132 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x11f132)

0x60200267b510 is located 0 bytes inside of 16-byte region [0x60200267b510,0x60200267b520)
freed by thread T194 (.NET ThreadPool) here:
    #0 0x7f00a099d40f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:122
    #1 0x7f001d7d3433  (/usr/share/dotnet/shared/Microsoft.NETCore.App/7.0.0-rc.1.22403.8/System.Private.CoreLib.dll+0xf3433)
    ........

previously allocated by thread T212 (.NET ThreadPool) here:
    #0 0x7f00a099d808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0x7f001fe394c6  (/memfd:doublemapper (deleted)+0x96a4c6)
    ........

The same heap corruption most likely manifests as INVALID_PARAMETER in #73688.

I am investigating further.
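The report above has the shape of a lifetime bug: one thread frees a buffer while another thread still intends to read it. The sketch below is a toy Python model of that ordering problem only. It is not MsQuic or .NET code, `NativeBuffer`, `AsyncSender` and both helper functions are hypothetical names, and it makes no claim about the actual root cause in this issue.

```python
class NativeBuffer:
    """Toy stand-in for a natively allocated send buffer."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self.freed = False

    def read(self) -> bytes:
        if self.freed:
            raise RuntimeError("use-after-free: buffer read after free()")
        return self._data

    def free(self) -> None:
        self.freed = True


class AsyncSender:
    """Hypothetical sender that reads queued buffers later, in drain()."""

    def __init__(self) -> None:
        self._pending = []

    def send(self, buf, on_complete) -> None:
        # The buffer is only queued here; nothing is read yet.
        self._pending.append((buf, on_complete))

    def drain(self) -> list:
        sent = []
        pending, self._pending = self._pending, []
        for buf, on_complete in pending:
            sent.append(buf.read())  # the "worker" touches the buffer here
            on_complete(buf)         # only now may the caller release it
        return sent


def send_then_free_early(sender, data):
    buf = NativeBuffer(data)
    sender.send(buf, on_complete=lambda b: None)
    buf.free()  # BUG: freed while the send is still pending
    return buf


def send_and_free_on_completion(sender, data):
    buf = NativeBuffer(data)
    sender.send(buf, on_complete=lambda b: b.free())
    return buf
```

In this model, calling `drain()` after `send_then_free_early` raises, which is the toy equivalent of the READ in the report; with `send_and_free_on_completion` the buffer stays valid until the sender is done with it.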

carlossanlop · 2022-08-18T16:52:21Z

Does this fix meet the bar to get backported to RC1? One of the backport PRs hit this failure there.

CarnaViire · 2022-08-18T17:24:01Z

I believe it does @carlossanlop -- this is a significant reliability issue

CarnaViire · 2022-08-24T16:14:12Z

While I think I have caught all the send-buffer-related problems (I'll put up a PR shortly), some problems remain that also result in crashes. AddressSanitizer catches this:

/home/lia/dev/git/msquic/src/inc/quic_platform.h:395:36: runtime error: member access within misaligned address 0x000000000005 for type 'struct CXPLAT_SLIST_ENTRY', which requires 8 byte alignment
0x000000000005: note: pointer points here
<memory cannot be printed>
AddressSanitizer:DEADLYSIGNAL
=================================================================
==19205==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000005 (pc 0x7f1c6792a3f4 bp 0x7f1c5de95010 sp 0x7f1c5de94f30 T61)
==19205==The signal is caused by a READ memory access.
==19205==Hint: address points to the zero page.
    #0 0x7f1c6792a3f3 in CxPlatListPopEntry /home/lia/dev/git/msquic/src/inc/quic_platform.h:395
    #1 0x7f1c6792a3f3 in CxPlatPoolAlloc /home/lia/dev/git/msquic/src/inc/quic_platform_posix.h:521
    #2 0x7f1c6792a3f3 in QuicStreamInitialize /home/lia/dev/git/msquic/src/core/stream.c:35
    #3 0x7f1c6795a2ca in MsQuicStreamOpen /home/lia/dev/git/msquic/src/core/api.c:661
    #4 0x7f5d04298a5e  (/memfd:doublemapper (deleted)+0x219a5e)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /home/lia/dev/git/msquic/src/inc/quic_platform.h:395 in CxPlatListPopEntry

@nibanks do you possibly have any insights/hints on what could have caused this?

nibanks · 2022-08-24T16:20:31Z

My initial guess is that you're calling StreamOpen after ConnectionClose.

CarnaViire · 2022-09-06T18:11:11Z

Reopening for 7.0 backport

karelz · 2022-09-08T06:53:18Z

Fixed in 8.0 (main) in PR #74669 and in 7.0 (RC2) in PR #75192.

CarnaViire added the area-System.Net.Http label Jul 22, 2022

ghost added the untriaged New issue has not been triaged by the area owner label Jul 22, 2022

ManickaP self-assigned this Jul 25, 2022

ManickaP removed the untriaged New issue has not been triaged by the area owner label Jul 25, 2022

ManickaP added this to the 7.0.0 milestone Jul 25, 2022

ManickaP mentioned this issue Jul 25, 2022

[QUIC] Stress crash WIP #72769

Closed

ManickaP removed their assignment Jul 27, 2022

CarnaViire self-assigned this Aug 10, 2022

CarnaViire changed the title ~~[HTTP/3] Segmentation fault in stress tests~~ [HTTP/3] SIGABRT in stress tests Aug 12, 2022

CarnaViire mentioned this issue Aug 18, 2022

[QUIC] Abort on cancellation throws QUIC_STATUS_INVALID_PARAMETER #73688

Closed

CarnaViire mentioned this issue Aug 18, 2022

Ignore LoopbackServer exceptions in MaxHeadersLength test #73937

Closed

karelz assigned ManickaP and unassigned CarnaViire Aug 25, 2022

karelz added tenet-reliability Reliability/stability related issue (stress, load problems, etc.) bug labels Aug 25, 2022

CarnaViire mentioned this issue Aug 25, 2022

[QUIC] Fix native crashes and heap corruption #74611

Closed

ghost added the in-pr There is an active PR which will close this issue when it is merged label Aug 25, 2022

CarnaViire mentioned this issue Aug 26, 2022

[QUIC] Fix native crashes and heap corruption via "generated-like" interop #74669

Merged

ManickaP mentioned this issue Sep 2, 2022

[QUIC] Consider implementing send buffering on .NET side #73691

Closed

karelz assigned CarnaViire and unassigned ManickaP Sep 6, 2022

antonfirsov mentioned this issue Sep 6, 2022

Http Stress Status Report #42211

Open

CarnaViire closed this as completed in #74669 Sep 6, 2022

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Sep 6, 2022

CarnaViire reopened this Sep 6, 2022

CarnaViire mentioned this issue Sep 7, 2022

[release/7.0] [QUIC] Fix native crashes and heap corruption via "generated-like" interop #75192

Merged

ghost added in-pr There is an active PR which will close this issue when it is merged and removed in-pr There is an active PR which will close this issue when it is merged labels Sep 7, 2022

karelz closed this as completed Sep 8, 2022

ghost locked as resolved and limited conversation to collaborators Oct 8, 2022
