Host Test failures in rolling CI #69114

Closed
jakobbotsch opened this issue May 10, 2022 · 16 comments
Labels: area-Host, blocking-clean-ci (Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms')


@jakobbotsch
Member

In the latest rolling runtime pipeline run, two "Installer Build and Test" jobs failed. win-x86 timed out and linux-x64 segfaulted.
Job: https://dev.azure.com/dnceng/public/_build/results?buildId=1760633&view=results
win-x86 log: https://dev.azure.com/dnceng/9ee6d478-d288-47f7-aacc-f6e6d082ae6d/_apis/build/builds/1760633/logs/2460
linux-x64 log: https://dev.azure.com/dnceng/9ee6d478-d288-47f7-aacc-f6e6d082ae6d/_apis/build/builds/1760633/logs/1970

@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label May 10, 2022
@jakobbotsch
Member Author

I've seen similar hangs in both #69120 and #69117 so am marking this blocking.

@jakobbotsch jakobbotsch added the blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' label May 10, 2022
@jakobbotsch
Member Author

@NikolaMilosavljevic do you know of any installer related changes that could have caused this?

@NikolaMilosavljevic
Member

> @NikolaMilosavljevic do you know of any installer related changes that could have caused this?

There were no recent changes in installer build.

@NikolaMilosavljevic
Member

The failing tests are likely related to the dotnet host; we don't do installer testing in this job.

@jakobbotsch
Member Author

This seems to have cleared up so will close it.

@ghost ghost removed the untriaged New issue has not been triaged by the area owner label May 11, 2022
@jakobbotsch
Member Author

jakobbotsch commented May 11, 2022

Not quite, looks like this is still happening, e.g. segfault in https://dnceng.visualstudio.com/public/_build/results?buildId=1764534&view=results.

  Running tests: /root/runtime/artifacts/bin/AppHost.Bundle.Tests/Release/net7.0/AppHost.Bundle.Tests.dll [net7.0|x64]
  /tmp/tmpf96f353826b14c3d8569be3032ed3769.exec.cmd: line 2:  8879 Segmentation fault      (core dumped) "/root/runtime/.dotnet/dotnet" exec --depsfile "/root/runtime/artifacts/bin/AppHost.Bundle.Tests/Release/net7.0/AppHost.Bundle.Tests.deps.json" --runtimeconfig "/root/runtime/artifacts/bin/AppHost.Bundle.Tests/Release/net7.0/AppHost.Bundle.Tests.runtimeconfig.json" "/root/runtime/.packages/xunit.runner.console/2.4.2-pre.22/tools/netcoreapp2.0/xunit.console.dll" "/root/runtime/artifacts/bin/AppHost.Bundle.Tests/Release/net7.0/AppHost.Bundle.Tests.dll" -noautoreporters -xml "/root/runtime/artifacts/TestResults/Release/AppHost.Bundle.Tests_net7.0_x64.xml" -html "/root/runtime/artifacts/TestResults/Release/AppHost.Bundle.Tests_net7.0_x64.html" -notrait category=failing > "/root/runtime/artifacts/log/Release/AppHost.Bundle.Tests_net7.0_x64.log" 2>&1
XUnit : error : Tests failed: /root/runtime/artifacts/log/Release/AppHost.Bundle.Tests_net7.0_x64.log [net7.0|x64] [/root/runtime/src/installer/tests/Microsoft.NET.HostModel.Tests/AppHost.Bundle.Tests/AppHost.Bundle.Tests.csproj]
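
For triaging logs like the excerpt above, a minimal sketch of a scanner that pairs each shell-reported segfault with the most recent "Running tests:" line may help. This is not part of the runtime repo's tooling; the function name `find_segfaults` and both regular expressions are assumptions derived only from the two log line shapes shown in this comment.

```python
import re

# Matches the shell's crash report, e.g.
#   "...exec.cmd: line 2:  8879 Segmentation fault      (core dumped) ..."
CRASH_RE = re.compile(r"line \d+:\s+(?P<pid>\d+) Segmentation fault")
# Matches the test assembly named on a "Running tests:" line.
RUNNING_RE = re.compile(r"Running tests: (?P<assembly>\S+\.dll)")


def find_segfaults(log_lines):
    """Return (assembly, pid) pairs for each segfault reported in the log.

    `assembly` is None when no "Running tests:" line preceded the crash.
    """
    results = []
    current = None  # most recent assembly announced by "Running tests:"
    for line in log_lines:
        running = RUNNING_RE.search(line)
        if running:
            current = running.group("assembly")
            continue
        crash = CRASH_RE.search(line)
        if crash:
            results.append((current, int(crash.group("pid"))))
    return results
```

Feeding it the lines of a downloaded job log (for example `find_segfaults(open(path))`, where `path` is wherever the log was saved) would list which test assemblies crashed, which makes it easier to compare runs.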

@jakobbotsch jakobbotsch reopened this May 11, 2022
@ghost ghost added the untriaged New issue has not been triaged by the area owner label May 11, 2022
@ghost

ghost commented May 11, 2022

Tagging subscribers to this area: @vitek-karas, @agocke, @VSadov
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: jakobbotsch
Assignees: -
Labels:

blocking-clean-ci, area-Host, area-Infrastructure-installer, untriaged

Milestone: -

@jakobbotsch jakobbotsch changed the title Installer Build and Test failures in rolling CI Host Test failures in rolling CI May 11, 2022
@jakobbotsch
Member Author

We've now seen the "Installer Build and Test coreclr windows_x86 Release" job time out in two more rolling builds, 1764261 and 1765644. As mentioned above, the failures seem to happen while executing host tests. @vitek-karas any ideas here?

@vitek-karas
Member

@elinor-fung would you have some time to take a look?

@jakobbotsch
Member Author

The PR #68382 run https://dev.azure.com/dnceng/public/_build/results?buildId=1765490&view=results hit the timeout on win-x64, so it seems to be occurring on multiple platforms.

@elinor-fung
Member

We seem to have just about no information for host tests on timeout - no build logs, no test results, no test output. I'm looking at getting more info collected so we can actually investigate.
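
As a generic illustration of the problem described above (output lost when a test process is killed on timeout), here is a minimal sketch that redirects a child's output to a file so that whatever it wrote survives the kill. It is not how the runtime CI or its test runner works; the helper name `run_with_timeout` and the choice to merge stderr into stdout are assumptions made for the example.

```python
import subprocess
import sys
import tempfile


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and return (timed_out, output).

    Output written before a timeout is preserved, because the child writes
    to a file rather than to a pipe owned by this process.
    """
    with tempfile.TemporaryFile(mode="w+b") as log:
        try:
            subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT,
                           timeout=timeout_seconds)
            timed_out = False
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before re-raising the exception.
            timed_out = True
        log.seek(0)
        return timed_out, log.read().decode(errors="replace")


if __name__ == "__main__":
    # Hypothetical stand-in for a hanging test: prints once, then sleeps.
    child = [sys.executable, "-c",
             "import time; print('started', flush=True); time.sleep(30)"]
    timed_out, output = run_with_timeout(child, 3)
    print(timed_out, output)
```

Note that this only kills the direct child; a test host that spawns further processes would need process-group handling, which the sketch leaves out.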

I don't think we've changed anything in the host or host tests recently, but the start of this does seem to coincide with the merge of the toolset updates in #67771.

@agocke
Member

agocke commented May 12, 2022

Hmm, unfortunate -- I guess because we execute the host tests on the build machines instead of Helix.

Probably another reason to run these on Helix.

@elinor-fung
Member

Intermittent crashes look like another manifestation of #68443

  * frame #0: 0x00007f7816900390 ld-musl-x86_64.so.1
    frame #1: 0x00007f78164a7041 libcoreclr.so`sigsegv_handler(int, siginfo_t*, void*) [inlined] invoke_previous_action(action=<unavailable>, code=<unavailable>, siginfo=<unavailable>, context=<unavailable>, signalRestarts=<unavailable>) at signal.cpp:431:5
    frame #2: 0x00007f78164a6fa0 libcoreclr.so`sigsegv_handler(code=66983920, siginfo=0x00007f3703fe1838, context=0x00007f7810ca0340) at signal.cpp:638
    frame #3: 0x00007f78168f1c89 ld-musl-x86_64.so.1`sigwaitinfo + 7
    frame #4: 0x00007f781633929a libcoreclr.so`WKS::gc_heap::mark_through_cards_for_uoh_objects(void (*)(unsigned char**), int, int) [inlined] WKS::gc_heap::mark_through_cards_helper(poo=0x00007f379ec67048, cg_pointers_found=0x00007f3703143bb8, fn=(libcoreclr.so`WKS::gc_heap::mark_object_simple(unsigned char**) at gc.cpp:24007), nhigh=0x0000000000000000, next_boundary=0x0000000000000000, condemned_gen=0, current_gen=2)(unsigned char**), unsigned char*, unsigned char*, int, int) at gc.cpp:36755:9
    frame #5: 0x00007f781633924b libcoreclr.so`WKS::gc_heap::mark_through_cards_for_uoh_objects(fn=(libcoreclr.so`WKS::gc_heap::mark_object_simple(unsigned char**) at gc.cpp:24007), gen_num=<unavailable>, relocating=NO)(unsigned char**), int, int) at gc.cpp:42167
    frame #6: 0x00007f781632939d libcoreclr.so`WKS::gc_heap::mark_phase(condemned_gen_number=<unavailable>, mark_only_p=NO) at gc.cpp:25777:25
    frame #7: 0x00007f78163254d4 libcoreclr.so`WKS::gc_heap::gc1() at gc.cpp:20597:13
    frame #8: 0x00007f781633186b libcoreclr.so`WKS::gc_heap::garbage_collect(n=0) at gc.cpp:0:5
    frame #9: 0x00007f78163208fa libcoreclr.so`WKS::GCHeap::GarbageCollectGeneration(this=<unavailable>, gen=0, reason=reason_alloc_soh) at gc.cpp:45930:9
    frame #10: 0x00007f7816322075 libcoreclr.so`WKS::gc_heap::trigger_gc_for_alloc(gen_number=<unavailable>, gr=<unavailable>, msl=<unavailable>, loh_p=<unavailable>, take_state=<unavailable>) at gc.cpp:17322:14 [artificial]
    frame #11: 0x00007f7816322d30 libcoreclr.so`WKS::gc_heap::try_allocate_more_space(acontext=0x00007f37039189c8, size=32, flags=0, gen_number=<unavailable>) at gc.cpp:17472:21
    frame #12: 0x00007f781634a0b0 libcoreclr.so`WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) [inlined] WKS::gc_heap::allocate_more_space(acontext=0x00007f37039189c8, size=32, flags=0, alloc_generation_number=0) at gc.cpp:17943:18
    frame #13: 0x00007f781634a096 libcoreclr.so`WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) at gc.cpp:17974
    frame #14: 0x00007f781634a078 libcoreclr.so`WKS::GCHeap::Alloc(this=<unavailable>, context=0x00007f37039189c8, size=25, flags=0) at gc.cpp:44892
    frame #15: 0x00007f78161df56b libcoreclr.so`AllocateSzArray(MethodTable*, int, GC_ALLOC_FLAGS) at gchelpers.cpp:226:48
    frame #16: 0x00007f78161df503 libcoreclr.so`AllocateSzArray(pArrayMT=<unavailable>, cElements=1, flags=GC_ALLOC_NO_FLAGS) at gchelpers.cpp:0
    frame #17: 0x00007f78161fc992 libcoreclr.so`JIT_NewArr1(arrayMT=0x00007f779d047970, size=1) at jithelpers.cpp:2627:16

@elinor-fung
Member

We'll get the fix when the repo updates to preview5 - closing against #68443.

@ghost ghost locked as resolved and limited conversation to collaborators Jul 11, 2022