Unhandled exception causes docker container to hang on ARM64 #66707
Comments
I can confirm the bad behavior. I am investigating it. |
This is quite strange. The hang occurs in the abort implementation in the glibc here: https://code.woboq.org/userspace/glibc/stdlib/abort.c.html#107 |
@janvorli how does SIGTRAP come into play here? Or did you mean SIGABRT? Does it happen also with a simple C program that calls |
The abort instruction is an invalid instruction that the abort implementation in the standard C library seems to use as a last resort to make the process exit if SIGABRT had no effect. So it is handled by SIGTRAP. At least that's what I've figured from the |
ENV COMPlus_ZapDisable=1
UPDATE: Apologies, I had a cached layer that had an explicit
Thank you for your example repo @blaskoa, the |
Hi @AntonLapounov, since Jan is out, could you please check if this repros on .NET 6? If so, it might not block 7. |
From the original post:
|
ah ok, sorry missed that. We can leave in 7 to investigate but probably would not make it into 7. |
Any updates on this? Having difficulties running in production. We are trying to migrate to ARM64 servers. |
Trying this
Use:
|
I have investigated this issue. The reason for the different behavior between running a bash in the container and then running the .NET binary manually vs running the .NET binary via the entrypoint is interesting.
The difference is that in the case of the bash, there are two processes running in the container, the .NET one being the second. In the case of the entrypoint, there is just the .NET process and Linux considers it to be an init process that has different ways of handling signals like SIGABRT. The default for the init process is to ignore the signals while in the other case, the handlers are set to a default handler that tears down the process. See https://ddanilov.me/how-signals-are-handled-in-a-docker-container for a very nice description.
In the case this issue is about, the
I was going to fix it, but I've found that a recent change with a different purpose already fixed the process. Before the change, we were unregistering only our SIGABRT handler, so the SIGTRAP stayed in place. After that change, all signal handlers are unregistered. The change is #80474 by @mikem8361. It seems we want to backport this change to .NET 7.
@blaskoa as a workaround, you can add |
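The effect of an ignored SIGABRT described above can be approximated without Docker. The sketch below is only an illustration under stated assumptions: nothing in it runs as PID 1, and `trap "" ABRT` merely stands in for the kernel ignoring default-disposition signals for an init process. The function names are hypothetical and not taken from the thread or the runtime.

```sh
#!/bin/sh
# Illustration only: emulates "signal ignored" with trap, it does not run as PID 1.

# Child shell keeps the default SIGABRT disposition and signals itself.
# The default action terminates the child before `exit 0` is reached.
abrt_with_default_disposition() {
    sh -c 'kill -s ABRT "$$"; exit 0'
}

# Child shell ignores SIGABRT first (stand-in for the PID 1 case),
# so the same signal has no effect and the child reaches `exit 0`.
abrt_with_signal_ignored() {
    sh -c 'trap "" ABRT; kill -s ABRT "$$"; exit 0'
}
```

Running `abrt_with_default_disposition; echo $?` should report a status above 128 (typically 134, that is 128 plus the SIGABRT number on Linux), while `abrt_with_signal_ignored; echo $?` should report 0 because the ignored signal never terminates the child.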
@janvorli do you want to backport this or should I? |
@mikem8361 if you could do it, it would be great! |
Description
Dotnet applications running in docker containers on arm64 stop responding when an unhandled exception occurs and keep a single CPU core at 100% utilization (as measured by
docker stats
).
I tested the issue on the following environments:
The only difference between the environments is that with qemu an additional line is present in the output and the shell stops responding to
Ctrl+C
signals:
Issue is only present when the dotnet process is set as docker entrypoint e.g.
ENTRYPOINT [ "dotnet", "arm-dotnet-repro.dll" ]
Issue is NOT present when running the dotnet binaries natively outside of docker containers, directly on the host.
Issue is NOT present when wrapping the entrypoint with shell script (see reproduction repo).
Issue is NOT present when the dotnet binary is executed via interactive shell in docker container e.g.
docker run --rm -it --entrypoint bash ex-repro:arm64-1
and then inside the container shell call
dotnet arm-dotnet-repro.dll
The behavior is present on dotnet 5 and dotnet 6 runtimes (I just tested these ones).
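The shell-script wrapping mentioned above could look roughly like the following. This is a minimal sketch, not the script from the reproduction repo: the file name `entrypoint.sh` and the function `run_as_child` are hypothetical. The idea it relies on is that the command is started as a child of the shell (no `exec`), so the dotnet process is not PID 1 in the container.

```sh
#!/bin/sh
# entrypoint.sh (hypothetical name): run the given command as a child process
# so that the shell, not the command, ends up as PID 1 in the container.

run_as_child() {
    # Deliberately no `exec`: the command becomes a child of this shell.
    # The function returns the exit status of the command.
    "$@"
}

# When arguments are passed, run them and exit with the child's status, e.g.
#   ENTRYPOINT [ "/entrypoint.sh", "dotnet", "arm-dotnet-repro.dll" ]
if [ "$#" -gt 0 ]; then
    run_as_child "$@"
fi
```

One known trade-off of this design is that a plain shell acting as PID 1 does not forward signals such as SIGTERM to its child, so stopping the container gracefully may need extra handling.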
Reproduction Steps
see https://github.com/blaskoa/arm-dotnet-repro for minimal reproducer with some available workarounds I found
Expected behavior
On unhandled exception the container should exit with non-0 exit code.
Basically the same as it behaves on x64
Actual behavior
On unhandled exception the container keeps "running" and utilizes 1 CPU core to 100%
Regression?
No response
Known Workarounds
see https://github.com/blaskoa/arm-dotnet-repro
some of the workarounds have some side effects which might render them unsuitable
Configuration
No response
Other information
No response