JVM crash with grpc-java 1.42.x and alpine docker image #8751
Comments
Without a crash report or a reproduction, there's not much we can do. The crash report would let us verify whether the failure is within netty-tcnative or netty's epoll, for example. Without knowing the source of the crash, we can't narrow things down much. The problem may exist on non-Alpine systems as well and just not result in a crash there due to a slightly different memory layout or different library versions. It is interesting that C core also started seeing a recent crash, but there's not much in that report either. It is strange, though, that the program counter is at a very low address in both cases. I fired up openjdk:8-alpine (a3562aa0b991), installed gcompat and libc6-compat, and ran grpc's interop-client[1] to hit an HTTPS server. No crashes.
|
@ejona86 This makes sense. |
I've started seeing this also, but only on Alpine in combination with Kubernetes (specifically I used |
@Spikhalskiy, are you also seeing this with k8s? @cfredri4, do you know what OS your k8s Node is using? |
@ejona86 yeah. |
Could this be related to netty/netty#11701 as a project I'm working in has been experiencing similar issues which appear to indicate the netty build being pulled in by |
Interesting. Changing from |
Not likely. We have this issue on an Alpine image with glibc installed. Also, the image works with grpc-java (,1.42.0) and doesn't work with [1.42.0,). So the problem doesn't look like merely a missing glibc that netty fails to handle gracefully. |
TL;DR: Alpine doesn't have compatibility for the __strndup symbol. I don't know why the behavior is k8s-dependent, though. And it'll take some more research to determine appropriate next steps. Looking at objdump output, it looks like the problem is happening in
But I don't see any obvious places the stack could get corrupted in
The address displayed is relative to the .so, so isn't the problem itself. It jumps to the PLT:
And the indirect jump goes to the GOT which should be filled with the adjusted address of 3efe. But maybe something is broken in the linker and it didn't get adjusted?
Well, there we go... Older versions of epoll linked against strndup, not __strndup. This difference may have been caused by a glibc upgrade when compiling. |
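For anyone who wants to repeat the symbol check on their own copy of the native library, here is a rough sketch. It assumes GNU binutils' `nm` is installed and that the .so has already been extracted from the netty jar. The function names and the suspect list (taken from the symbols named in this thread) are illustrative and not part of netty or grpc-java.

```python
import subprocess

# Symbols this thread identifies as glibc-only aliases that musl lacks.
SUSPECTS = ("__strdup", "__strndup")


def suspect_symbols(nm_output, suspects=SUSPECTS):
    """Return suspect names that show up as undefined symbols in `nm -D` output."""
    found = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined entries look like "U name" (or "w name" for weak ones).
        if len(parts) < 2 or parts[-2] not in ("U", "w"):
            continue
        name = parts[-1].split("@", 1)[0]  # drop any "@GLIBC_x.y" version suffix
        if name in suspects and name not in found:
            found.append(name)
    return found


def check_library(path):
    """Hypothetical helper: run nm on a shared object (requires binutils)."""
    out = subprocess.run(
        ["nm", "-D", "--undefined-only", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return suspect_symbols(out)
```

Going by the analysis above, calling `check_library` on an affected epoll .so would be expected to report `__strndup`, while an older build that links against plain `strndup` should come back empty.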
TL;DR: Try setting the
But I think that isn't the full story. Looking at the older Alpine:
Here though, gcompat is providing the linker which loads lib/libgcompat.so.0. That means I don't think the ldd output is accurate. I see that gcompat 0.3.0 (Alpine 3.9) and 1.0.0 (Alpine 3.15) have __strndup, so I think the wrong linker on older Alpine versions is the trouble. Trying the gcompat linker approach "manually" on the old Alpine seems to work:
|
I had the same issue having
and
Thanks to @ejona86, what I've made:
Or Dockerfile should include just:
|
Thank you, @artemptushkin ! Your solution works. :-) But there's a typo in the path. It should be |
It sounds like people have confirmed my discoveries. I suggest users upgrade to newer Alpine versions, but if unable, then use the LD_PRELOAD trick. |
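As a sketch of the LD_PRELOAD trick for setups that start the JVM from a small launcher script rather than a Dockerfile ENV line: the helper below prepends the gcompat library to LD_PRELOAD without clobbering entries that are already there. The library path is an assumption based on the lib/libgcompat.so.0 name mentioned earlier in this thread; adjust it to wherever gcompat lives in your image. The function names are made up for illustration.

```python
import os

# Assumption: location of the gcompat shared library inside the image.
GCOMPAT_LIB = "/lib/libgcompat.so.0"


def preload_env(env, lib=GCOMPAT_LIB):
    """Return a copy of env with lib placed first in LD_PRELOAD."""
    new_env = dict(env)
    existing = new_env.get("LD_PRELOAD", "")
    # ld.so accepts both spaces and colons as LD_PRELOAD separators.
    entries = existing.replace(":", " ").split()
    if lib not in entries:
        entries.insert(0, lib)
    new_env["LD_PRELOAD"] = ":".join(entries)
    return new_env


def launch_java(args, env=None):
    """Replace the current process with `java`, with gcompat preloaded."""
    base = os.environ if env is None else env
    os.execvpe("java", ["java", *args], preload_env(base))
```

Something like `launch_java(["-jar", "app.jar"])` would then start the application with the preload in place (the jar name is only a placeholder). The sketch does not check that the library file exists, so a wrong path will only show up as a loader warning or error at startup.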
Just FYI, latest alpine without LD_PRELOAD does not fix my issues. I tried Dockerfile:
Which is the latest alpine, with the latest azul alpine jdk. I run a jar that uses micrometer 1.8.0 which has a shaded netty with native calls to glibc. adding I'm adding the head of my SIGSEGV crash so people researching will discover this thread:
|
@alexfeigin we hope for the one after |
Given what I saw with grpc/grpc#27995, I expect that if the binary you execute is musl-based then gcompat wouldn't be used automatically. That's just a deficiency of how gcompat linker works. libc6-compat could have provided symbols, but not with its symlink-to-musl approach. So I guess LD_PRELOAD is with us long-term. |
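To see which case applies to a given binary, one option is to read the program interpreter (the PT_INTERP entry) recorded in its ELF program headers. The sketch below is a simplification that only handles 64-bit little-endian ELF files, and the function names are invented for illustration.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def elf_interpreter(data):
    """Return the loader path requested by an ELF image, or None if absent."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from("<I", data, off)
        if p_type == PT_INTERP:
            (p_offset,) = struct.unpack_from("<Q", data, off + 0x08)
            (p_filesz,) = struct.unpack_from("<Q", data, off + 0x20)
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None


def uses_musl_loader(data):
    """True if the binary asks for a musl loader such as /lib/ld-musl-x86_64.so.1."""
    interp = elf_interpreter(data)
    return interp is not None and "ld-musl-" in interp
```

Reading the `java` executable from the image into bytes and passing it to `uses_musl_loader` would indicate whether the JVM itself is musl-based, which is the situation described above where the gcompat linker is not picked up automatically.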
When
|
Relying on glibc-compat is not a safe or wise thing to do... and I can confirm that using
adding the |
Motivation:
Loading of the native epoll transport might segfault the JVM if musl is used and no glibc-compat is installed.

Modifications:
- Move libraries to LIBS where they should be, avoiding need for -Wl,--no-as-needed.
- Use -O2 instead of -O3; there are no tight loops so -O3 just increases code size for no benefit.
- Add -pipe for faster compilation.
- Add -D_FORTIFY_SOURCE=2 and -Wl,-z,relro for security.
- Add -Wl,-z,now for security and to improve musl compatibility. musl does not implement __strdup and __strndup which old glibc aliases strdup and strndup to, but OpenJDK loads libraries with RTLD_LAZY, so this is not discovered until too late. See grpc/grpc-java#8751 (comment) for more information.
- Add -ffunction-sections -fdata-sections -Wl,--gc-sections to reduce output size and avoid linking librt when not really needed (only used for kqueue)

Result:
"Fixes" netty#11701 by making the native library load fail.

Co-authored-by: Alex Xu <351006+Hello71@users.noreply.github.com>
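The point about eager binding can be tried outside the JVM. The sketch below loads a shared object through Python's ctypes, which binds with RTLD_NOW on POSIX, so an unresolved symbol such as __strndup should surface as a load error instead of a crash at the first call. The function name is hypothetical, and this only approximates what a library built with -Wl,-z,now would do when the JVM loads it.

```python
import ctypes
import os


def try_eager_load(path):
    """Try to dlopen `path` with all symbols resolved up front.

    Returns (True, "") on success or (False, message) if loading fails,
    for example because the file or a required symbol is missing.
    """
    try:
        # ctypes always adds RTLD_NOW on POSIX; passing it makes the intent explicit.
        ctypes.CDLL(path, mode=os.RTLD_NOW)
    except OSError as exc:
        return False, str(exc)
    return True, ""
```

Running `try_eager_load` against the extracted epoll .so inside the affected image is a cheap way to see a clean error message from the loader, which is the behaviour the commit aims for by making the library load fail.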
Strange situation.
But if I try to execute some command from java
I have an error
Without |
There is an issue with the latest alpine versions and io.grpc used by google libs (check grpc/grpc-java#8751) Change-Id: I3b32e9963176725513d0e3ac99732e0cde796618
An attempt to upgrade from grpc-java 1.41.1 to 1.42.x ends with a JVM crash.
It looks like the problem is specific to Alpine Linux. It reproduces on openjdk:15-jdk-alpine and openjdk:8-alpine, and goes away after switching to openjdk:X-slim (Debian) images.
It may also be affected by the fact that openjdk:X-alpine images are no longer maintained and hence receive no new JDK updates.
The first version of grpc-java with the problem is 1.42.0; earlier versions work fine.
It may be related to grpc/grpc#27995
hs_err_pid372.log