QUIC failures on linux #82771

Closed
AndyAyersMS opened this issue Feb 28, 2023 · 10 comments
Assignees
Labels
area-System.Net.Quic, blocking-clean-ci, Known Build Error
Milestone

Comments

@AndyAyersMS
Member

AndyAyersMS commented Feb 28, 2023

I have seen various QUIC failures on arm64 linux in different PRs and in different pipelines. Symptoms vary, but here is a crash.

Build Information

Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=187390
Build error leg or test failing: System.Net.Quic.Functional.Tests.WorkItemExecution
Pull request: #82752

Error Message

Fill the error message using known issues guidance.

{
  "ErrorMessage": "Segmentation fault      (core dumped) \"$RUNTIME_PATH/dotnet\" exec --runtimeconfig System.Net.Quic.Functional.Tests",
  "BuildRetry": false,
  "ErrorPattern": "",
  "ExcludeConsoleLog": false
}
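
As a rough illustration of how a known-issue entry like the one above could be matched against a console log, here is a minimal Python sketch. It assumes `ErrorMessage` is treated as a plain substring and `ErrorPattern` as a regular expression. That behavior, the `matches` helper, and the sample log line are assumptions for illustration, not the actual Known Issues implementation.

```python
import json
import re

# The known-issue entry from this report, embedded verbatim.
KNOWN_ISSUE = r'''
{
  "ErrorMessage": "Segmentation fault      (core dumped) \"$RUNTIME_PATH/dotnet\" exec --runtimeconfig System.Net.Quic.Functional.Tests",
  "BuildRetry": false,
  "ErrorPattern": "",
  "ExcludeConsoleLog": false
}
'''


def matches(issue, log_text):
    """Hypothetical matcher: substring for ErrorMessage, regex for ErrorPattern."""
    message = issue.get("ErrorMessage") or ""
    pattern = issue.get("ErrorPattern") or ""
    if message:
        return message in log_text
    if pattern:
        return re.search(pattern, log_text) is not None
    return False


issue = json.loads(KNOWN_ISSUE)

# Made-up console line, built around the message so the spacing is identical.
sample_log = "RunTests.sh: line 168: 4242 " + issue["ErrorMessage"] + ".runtimeconfig.json"
print(matches(issue, sample_log))
```

Under the substring assumption, the run of spaces inside the message matters: a log line with collapsed whitespace would not match.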

Report

| Build | Definition | Test | Pull Request |
| --- | --- | --- | --- |
| 225373 | dotnet/runtime | System.Net.Quic.Functional.Tests.WorkItemExecution | #84207 |
| 220949 | dotnet/runtime | System.Net.Quic.Functional.Tests.WorkItemExecution | |

Summary

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
| --- | --- | --- |
| 0 | 0 | 2 |

AndyAyersMS added the blocking-clean-ci and Known Build Error labels Feb 28, 2023
ghost added the untriaged label Feb 28, 2023
@ghost

ghost commented Feb 28, 2023

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Author: AndyAyersMS
Assignees: -
Labels:

blocking-clean-ci, untriaged, area-System.Net.Quic, Known Build Error

Milestone: -

@wfurt
Member

wfurt commented Feb 28, 2023

@AndyAyersMS
Member Author

AndyAyersMS changed the title from "QUIC failures on arm64 linux" to "QUIC failures on linux" Feb 28, 2023
@AndyAyersMS
Member Author

@wfurt
Member

wfurt commented Feb 28, 2023

(lldb) bt
* thread #1, name = 'dotnet', stop reason = signal SIGSEGV
  * frame #0: 0x00007eff8a02c85b libmsquic.so.2`CxPlatRefDecrement at platform_posix.c:339:5
    frame #1: 0x00007eff8a02c83c libmsquic.so.2`CxPlatRefDecrement(RefCount=0x0000001100000124) at platform_posix.c:325
    frame #2: 0x00007ebe44073ee0

It does seem like an MsQuic or QUIC issue.

@wfurt
Member

wfurt commented Mar 1, 2023

And the msquic version:

(lldb) dumpobj 00007ebefe0c9368
Name:        System.String
MethodTable: 00007eff13f5aed8
EEClass:     00007eff13fe0760
Size:        184(0xb8) bytes
File:        /datadisks/disk1/work/A2890920/p/shared/Microsoft.NETCore.App/8.0.0/System.Private.CoreLib.dll
String:      libmsquic.so version=2.1.7.329711 commit=2db78a2cd72d0c95111d5f4884fbc17307a4238d

So this is not the updated msquic, @ManickaP.

CarnaViire added this to the 8.0.0 milestone Mar 14, 2023
ghost removed the untriaged label Mar 14, 2023
@AndyAyersMS
Member Author

@wfurt
Member

wfurt commented Mar 20, 2023

I don't. I tried to reproduce it by running the tests in a loop, but it did not happen. It is interesting that it seems to happen on coreclr runs. Is there something special about them?
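
For anyone wanting to try the same kind of loop, here is a minimal Python sketch of the idea: rerun a test command until it dies with a segmentation fault. The `run_until_crash` helper and the command passed to it are hypothetical placeholders, not the actual script used here.

```python
import signal
import subprocess
import sys

SHELL_SEGFAULT_EXIT = 128 + signal.SIGSEGV  # 139, as reported by a shell wrapper


def run_until_crash(cmd, max_runs):
    """Run cmd repeatedly; return the attempt number of the first segfault, or None."""
    for attempt in range(1, max_runs + 1):
        code = subprocess.run(cmd).returncode
        # On POSIX, subprocess reports death by signal N as -N; a wrapping
        # shell script would instead exit with 128 + N.
        if code == SHELL_SEGFAULT_EXIT or code == -signal.SIGSEGV:
            return attempt
    return None


if __name__ == "__main__":
    # Stand-in command that always succeeds; replace it with the real test invocation.
    placeholder_cmd = [sys.executable, "-c", "pass"]
    print(run_until_crash(placeholder_cmd, max_runs=3))
```

A loop like this only helps if the crash reproduces in the local environment, which, as noted above, it did not.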

@AndyAyersMS
Member Author

There is some customization -- I believe we use a checked runtime and perhaps run in a different machine pool? I don't know for sure.

At least we aren't setting any crazy DOTNET env vars.

@ManickaP
Member

I looked for recent crashes to get my hands on a dump, but all of them are already gone.
Query:

WorkItems
| where FriendlyName == "System.Net.Quic.Functional.Tests" and Status == "BadExit" and ConsoleUri has "main" and ExitCode == 139
| join kind=leftouter Jobs on JobId
| order by Finished desc 
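
The `ExitCode == 139` filter relies on the shell convention that a process killed by signal N is reported with exit status 128 + N, and SIGSEGV is signal 11. A small Python sketch of that arithmetic follows; the `describe_exit_code` helper is purely illustrative.

```python
import signal


def describe_exit_code(code):
    """Interpret a shell-style exit status; values above 128 mean 'killed by a signal'."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "signal {}".format(signum)
        return "killed by {}".format(name)
    return "exited with status {}".format(code)


print(describe_exit_code(139))
```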

Also, some of the linked PRs have different failures, e.g.: System.Net.Quic.QuicException : Stream aborted by peer (654321).
I'll look into that problem, but I'm closing this for now. If you encounter the segfault again, feel free to reopen.

ghost locked as resolved and limited conversation to collaborators May 20, 2023