.NET 9 Arm32 tests are randomly failing from PowerShell hang #104106
Comments
Confirmed that I can still repro with this change reverted: dotnet/dotnet-docker#5587.
I've also tried reverting dotnet/dotnet-docker#5584 and the issue still repros.
Could this be coincident with some other infrastructure changes? The connect is handled by the kernel, so it seems like a network infrastructure problem to me.
The IP address in question is the IP address of the aspnet container being tested. In other words, the app container isn't responding.
I've attempted to isolate which set of tests was causing this: runtime, aspnet, sdk, etc. I ran jobs that tested each of those sets separately... and they all passed. 🤷
It could be timing as well, e.g. the server is still starting on a slow platform. I don't know if there is any synchronization for that.
It attempts to get a response from the container, retrying 5 times with a 2-second delay for each retry. This test code hasn't changed in years. If it was a machine thing, I would expect to see it in other .NET versions and in the arm64 jobs (since arm32 and arm64 use the same machines). But it's very specific to 9.0 arm32.
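For readers unfamiliar with the pattern, the retry behavior described above could look roughly like the sketch below. This is not the actual test code from dotnet-docker (which is not shown in this thread); the function name `wait_for_response`, its parameters, and the reading of "retrying 5 times" as 5 attempts in total are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_for_response(url, retries=5, delay_seconds=2.0, fetch=None, sleep=time.sleep):
    """Return the response body once the endpoint answers, or raise after the last attempt.

    Hypothetical helper: `fetch` and `sleep` are injectable so the loop can be
    exercised without a real container or real waiting.
    """
    if fetch is None:
        def fetch(target):
            # Default: a plain HTTP GET with a per-request timeout.
            with urllib.request.urlopen(target, timeout=10) as response:
                return response.read()

    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except (urllib.error.URLError, ConnectionError, TimeoutError) as error:
            last_error = error
            # Wait before the next attempt, but not after the final one.
            if attempt < retries:
                sleep(delay_seconds)
    raise RuntimeError(f"no response from {url} after {retries} attempts") from last_error
```

With these defaults the caller gives up after roughly 8 seconds of waiting plus the time spent in the requests themselves, so an app that needs longer than that to start on a slow machine would surface as a failure rather than a pass.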
Confirming this same behavior on internal builds. Example build (internal link)
I added some logging to the tests and was able to identify a test that hangs in two separate jobs. There hasn't been a new drop of PowerShell for 9.0 in quite a while, though. This is the last one: dotnet/dotnet-docker#5506. We would have seen this earlier if it was solely a PowerShell thing. My only guess is that it's related to the interaction between PowerShell and a new drop of .NET 9. @adaggarwal - are you aware of any behavior like this? To recap, it seems that execution of PowerShell sporadically hangs when running in an Arm32 Debian/Ubuntu container environment.
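As an aside, one generic way to tell a hung process apart from a slow one is to run it under a watchdog timeout. The sketch below is only an illustration of that idea, not the logging that was added to the tests; `run_with_watchdog` is a hypothetical helper name.

```python
import subprocess
import sys


def run_with_watchdog(command, timeout_seconds):
    """Run a command and report whether it finished within the timeout.

    Returns (finished, stdout). When the timeout expires, subprocess.run
    kills the child process and `finished` is False.
    """
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        return False, ""
    return True, completed.stdout


if __name__ == "__main__":
    # Stand-in command; in a container with PowerShell installed this could
    # be something like ["pwsh", "-Command", "Write-Output hello"].
    finished, output = run_with_watchdog([sys.executable, "-c", "print('hello')"], 30)
    print("finished" if finished else "timed out", output.strip())
```

Running the same command in a loop and counting how often it times out gives a rough measure of how sporadic the hang is.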
I just realized the reason this only occurs for Debian and Ubuntu and not Alpine or Azure Linux. For Alpine, we don't have PowerShell installed since PowerShell doesn't release binaries for Arm32 linux-musl (this is just further evidence that the issue is related to PowerShell). And for Azure Linux, we don't have any Arm32 images at all.
Repro Steps
I attempted to collect a dump using
Tagging subscribers to this area: @vitek-karas, @agocke, @VSadov
I've moved this to the runtime repo to investigate further.
Tagging subscribers to this area: @mangod9
Looks like the main thread is in
I ran some tests: dotnet/dotnet-docker#5643. This succeeded 3 times in a row. Normally that would for sure have run into this issue.
Awesome! Would you prefer I keep this open until you can consume the fix or close it as fixed?
I'm fine with closing it.
There have been random test jobs failing with one of two issues: a timeout or an HttpRequestException. I've only seen this happening in the Noble and Bookworm jobs, not Alpine.
This first popped up with this PR: dotnet/dotnet-docker#5587. But I can't imagine it was the cause. The most recent .NET change prior to that was this: dotnet/dotnet-docker#5584.
I've only seen this in public builds but there haven't been many internal builds yet to determine if it's only limited to public.
Example timeout build
Note
The rest of the information below is not relevant to this particular issue. Keeping it here because the conversation below references it. But this error is not a pervasive error like the timeout issue is.
Example HttpRequestException build
Microsoft.DotNet.Docker.Tests.SdkImageTests.VerifyBlazorWasmScenario