aspnet app crash with core dump when using 7.0.11-bookworm-slim docker image (debian 12) #93205
Comments
@agjini can you please try to set the |
I've made two tests with I've set the environment variable in the env, before running the dotnet program (inside the docker container). But it doesn't give me any better output : only Do I have to do a specific manipulation to get malloc output ? |
I had a similar issue that I investigated recently which was resulting in the exact same error message on the same image as you when performing load testing on my .NET 7.0 WebAPI endpoint. I was able to resolve it by replacing the unmanaged memory operations in my hashing code with managed ones.
Here are some other things that I did that worked:
We're now using Ubuntu jammy and have not seen this come back up. Hope it helps. |
Thanks @Usualdosage for the comment. I can understand that using interop type operations as app developers we should be responsible for thread safety, however using a system library in a managed language (C#), I expect that thread safety issues should either be handled by the native dll (OpenSSL in this case) or the library (System.Security.Cryptography). In my case with SHA256 however, I think it is OpenSSL native dll related as it works fine on an older version of OpenSSL in buster image, so I think Microsoft need to address this otherwise trying to wrap system libraries for thread safety do not sound suitable for a managed language/runtime. @janvorli do you have any comments on this issue? |
@gokhansengun let me cc @karelz who should be able to pull in the right folks that work on this part of the openssl related code. |
Without memory dumps or a repro it is very hard to investigate. @gokhansengun, @agjini would it be possible to get either of those? A small self-contained repro project would be preferred. |
Given that you have an instance of Are you using an instance of If it worked on previous versions of OpenSSL or containers, it worked by "accident" as there is no thread safety guarantee of hash algorithm instances, either from managed code or within OpenSSL itself. OpenSSL's internal structures are not thread safe in this circumstance. You can use the static methods like Based on your screenshot, you should use var hashedText = Convert.ToBase64String(SHA256.HashData(Encoding.UTF8.GetBytes(text))); Then the |
Didn't they introduce excessive locking in OpenSSL 3.0 to make everything thread safe? (which also supposedly caused the performance regressions there) |
OpenSSL 3 does have some additional locking in place, but as far as I understand it, most of that is around how providers and algorithms get resolved, not primitives themselves. OpenSSL does not make any guarantees about thread safety in this circumstance.
In this case, EVP_DigestUpdate and EVP_MD_CTX are not documented as thread safe. I can pretty easily reproduce a crash by mutating an EVP_MD_CTX from multiple threads using EVP_DigestUpdate. This will crash almost instantly for me using OpenSSL 3.2.

    #include <assert.h>
    #include <openssl/evp.h>
    #include <pthread.h>

    /* Each thread hammers the SAME context, which is the misuse being shown. */
    static void* updater(void *data)
    {
        EVP_MD_CTX* ctx = (EVP_MD_CTX*)data;
        unsigned char foo[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
        for (int i = 0; i < 0x7FFFFFFF; i++)
        {
            EVP_DigestUpdate(ctx, foo, sizeof(foo));
        }
        return NULL;
    }

    int main(int argc, char *argv[])
    {
        EVP_MD* sha256 = EVP_MD_fetch(NULL, "SHA256", NULL);
        assert(sha256);
        EVP_MD_CTX* ctx = EVP_MD_CTX_new();
        assert(ctx);
        if (!EVP_DigestInit_ex2(ctx, sha256, NULL))
        {
            assert(0 && "init failed");
            return 1;
        }
        pthread_t thread1;
        pthread_t thread2;
        pthread_t thread3;
        /* All three threads share one EVP_MD_CTX. */
        pthread_create(&thread1, NULL, updater, ctx);
        pthread_create(&thread2, NULL, updater, ctx);
        pthread_create(&thread3, NULL, updater, ctx);
        pthread_join(thread1, NULL);
        pthread_join(thread2, NULL);
        pthread_join(thread3, NULL);
    }
|
To emphasize though: even if OpenSSL 3 did not crash, using a hash object from multiple threads can result in incorrect hash results, even if the method "looks" like it is doing everything at once like You should either use the statics, or create a new HashAlgorithm instance as needed. |
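The point about incorrect results can be illustrated without any threads at all. The following is a minimal sketch, assuming a toy FNV-1a streaming hash as a hypothetical stand-in for a real digest context (it is not cryptographic, and `toy_ctx`, `toy_hash`, and `toy_hash_interleaved` are illustrative names, not OpenSSL or .NET APIs). It simulates two callers whose updates interleave on one shared context: the shared context ends up hashing the merged byte stream, which is neither caller's message, while a fresh context per call stays correct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy streaming hash (FNV-1a, 64-bit). Hypothetical stand-in for a real
   digest context such as EVP_MD_CTX; it is NOT cryptographic. */
typedef struct { uint64_t state; } toy_ctx;

void toy_init(toy_ctx *c) { c->state = 0xcbf29ce484222325ULL; }

void toy_update(toy_ctx *c, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        c->state ^= p[i];
        c->state *= 0x100000001b3ULL;
    }
}

uint64_t toy_final(const toy_ctx *c) { return c->state; }

/* One-shot helper: a fresh context per call, so callers share no state. */
uint64_t toy_hash(const char *msg)
{
    toy_ctx c;
    toy_init(&c);
    toy_update(&c, msg, strlen(msg));
    return toy_final(&c);
}

/* Simulates two callers whose single-byte updates interleave on ONE
   shared context, as could happen with unsynchronized threads. */
uint64_t toy_hash_interleaved(const char *a, const char *b)
{
    toy_ctx shared;
    toy_init(&shared);
    size_t la = strlen(a), lb = strlen(b);
    size_t n = la > lb ? la : lb;
    for (size_t i = 0; i < n; i++) {
        if (i < la) toy_update(&shared, a + i, 1);
        if (i < lb) toy_update(&shared, b + i, 1);
    }
    return toy_final(&shared);
}
```

With inputs "ab" and "cd", the shared context produces the hash of "acbd", so neither caller gets the digest of its own data. A real race is worse than this deterministic interleaving, since the internal state itself can be torn mid-update.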
After chatting with @bartonjs we are going to make some changes so we can at least throw a managed exception instead of crashing the process. It won't make anything thread safe, but it will prevent a hard crash of the whole process. |
Thanks for the detailed analysis @vcsjones, preventing a hard crash with a managed exception will be a great improvement. I cannot explain how, but I am confident that it has something to do with the base image change to bookworm, which in turn changes the OpenSSL version from 1.1.1n-0 to 3.0.11-1. With the old Debian release, the same test with the same aspnet runtime version never reproduces the crash; with the new bookworm Debian release it reproduces 100% of the time. I will try to slim down the app and come up with a clean repro by next week. |
Reproducing this is as simple as

    using System.Security.Cryptography;
    using System.Threading;

    // One HashAlgorithm instance shared by two threads: this is the misuse.
    HashAlgorithm hash = SHA256.Create();
    Thread one = new(ThreadWork);
    Thread two = new(ThreadWork);
    one.Start(hash);
    two.Start(hash);
    one.Join();
    two.Join();

    static void ThreadWork(object obj)
    {
        HashAlgorithm hash = (HashAlgorithm)obj;
        byte[] data = "potato"u8.ToArray();
        for (int i = 0; i < 1_000_000; i++)
        {
            try
            {
                hash.ComputeHash(data);
            }
            catch // do not care about managed exceptions.
            {
            }
        }
    }

It is very likely something changed in OpenSSL 3 to make it more stateful because it has to juggle legacy engines and providers.
I do want to caution one more time though: just because it did not crash does not mean it worked correctly. It could have been producing incorrect hashes, even on OpenSSL 1.1.1. |
I think we can close this out, as the root cause appears to be using instance of
In .NET 9, we introduced a concurrency-misuse check that will prevent the process from crashing, and instead raise an exception with appropriate error text in #100371. This was backported to .NET 8.0.6 in #101737 for Linux platforms. |
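The idea of a concurrency-misuse check can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the implementation from #100371: `guarded_ctx`, `guarded_init`, and `guarded_update` are hypothetical names, and the hash state is a toy FNV-1a value standing in for native digest state. The guard does not make the context thread safe; it only turns overlapping use into a reported error instead of corrupted state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    atomic_bool in_use;  /* true while some caller is inside an operation */
    uint64_t state;      /* toy hash state, stand-in for native digest state */
} guarded_ctx;

void guarded_init(guarded_ctx *c)
{
    atomic_init(&c->in_use, false);
    c->state = 0xcbf29ce484222325ULL;
}

/* Returns false, leaving the state untouched, if another caller is
   already inside the context. Returns true after a normal update. */
bool guarded_update(guarded_ctx *c, const void *data, size_t len)
{
    if (atomic_exchange(&c->in_use, true))
        return false;  /* concurrent use detected: report it, do not proceed */

    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        c->state ^= p[i];
        c->state *= 0x100000001b3ULL;
    }

    atomic_store(&c->in_use, false);
    return true;
}
```

A caller that gets `false` back knows the instance is being shared across threads and can fail loudly, which is roughly the role the managed exception plays in the comment above.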
Thanks a ton, the solution looks great |
Description
We recently switched our base Docker image from 7.0.11-bullseye-slim to 7.0.11-bookworm-slim.
One of our containers based on this image (the auth service) crashes after a while with the following outputs. We encountered 3 different error messages, all causing a core dump crash.
Reproduction Steps
We build our docker image based on
Expected behavior
We expect the application to run normally without crashing (of course), as it does with Debian 11.
Actual behavior
The app crashes with the following outputs (alternately):
Regression?
It seems to be a bug introduced with the new Debian 12 based image. When we switch back to the previous Docker image (Debian 11), the problem disappears.
Known Workarounds
Rolling back to the Debian 11 based Docker image fixes the problem.
Configuration
.NET 7.0.11
Host OS : Ubuntu 22.04.2 LTS
docker info : Server Version: 23.0.1
x64 architecture
Other information
No response