JVM crash with grpc-java 1.42.x and alpine docker image #8751

Closed
Spikhalskiy opened this issue Dec 9, 2021 · 19 comments

@Spikhalskiy

Spikhalskiy commented Dec 9, 2021

An attempt to upgrade from grpc-java 1.41.1 to 1.42.x ends with a JVM crash.
It looks like the problem is specific to Alpine Linux. It reproduces on openjdk:15-jdk-alpine and openjdk:8-alpine and goes away after switching to openjdk:X-slim (Debian) images.
It may also be affected by the fact that openjdk:X-alpine images are no longer maintained and so receive no new JDK updates.
The first version of grpc-java with the problem is 1.42.0; earlier versions work fine.

It may be related to grpc/grpc#27995

    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x0000000000003efe, pid=372, tid=0x00007fbffbc9bb10
    #
    # JRE version: OpenJDK Runtime Environment (8.0_212-b04) (build 1.8.0_212-b04)
    # Java VM: OpenJDK 64-Bit Server VM (25.212-b04 mixed mode linux-amd64 compressed oops)
    # Derivative: IcedTea 3.12.0
    # Distribution: Custom build (Sat May  4 17:33:35 UTC 2019)
    # Problematic frame:
    # C  0x0000000000003efe
    #
    # Core dump written. Default location: /temporal-java-client/temporal-kotlin/core or core.372
    #
    # An error report file with more information is saved as:
    # /temporal-java-client/temporal-kotlin/hs_err_pid372.log
    #
    # If you would like to submit a bug report, please include
    # instructions on how to reproduce the bug and visit:
    #   https://icedtea.classpath.org/bugzilla
    #

hs_err_pid372.log

@ejona86
Member

ejona86 commented Dec 9, 2021

Without the crash report or reproduction, there's not much we can do. The crash report would let us verify the failure is within netty-tcnative or netty's epoll, for example. Without knowing the source of crash, we can't really narrow much down. The problem may exist for non-Alpine as well, just doesn't result in a crash due to slightly different memory layout/library versions.

It is interesting that C core also started seeing a recent crash, but there's not much in that report either. It is strange that the program counter is at a very low address though, in both cases.

I fired up openjdk:8-alpine (a3562aa0b991), installed gcompat and libc6-compat, and ran grpc's interop-client[1] to hit an HTTPS server. No crashes.

/grpc-interop-testing-1.42.1 # ./bin/test-client --server_host=grpc-test.sandbox.googleapis.com --server_port=443
Running test empty_unary
Test completed.
/grpc-interop-testing-1.42.1 # # Running ./bin/test-server --use_tls=false outside the container
/grpc-interop-testing-1.42.1 # ./bin/test-client --server_host=my.host.ip.address --use_tls=false
Running test empty_unary
Test completed.
  1. https://repo1.maven.org/maven2/io/grpc/grpc-interop-testing/1.42.1/grpc-interop-testing-1.42.1.tar

@Spikhalskiy
Author

Spikhalskiy commented Dec 9, 2021

@ejona86 This makes sense.
Here is the crash report: hs_err_pid372.log
Thank you for the quick turnaround and an attempt to reproduce.

@cfredri4
Contributor

I've started seeing this also, but only on Alpine in combination with Kubernetes (specifically I used azul/zulu-openjdk-alpine:17-jre-headless with no other changes, and Kubernetes 1.20). It works fine with an Ubuntu image on same Kubernetes environment, and it works fine with the Alpine image on a local Docker environment.

@ejona86
Member

ejona86 commented Dec 10, 2021

@Spikhalskiy, are you also seeing this with k8s?

@cfredri4, do you know what OS your k8s Node is using?

@Spikhalskiy
Author

Spikhalskiy commented Dec 11, 2021

@ejona86 yeah.
The crash log I attached is from a build container spawned by Buildkite CI/CD, which uses agents deployed on AWS k8s.
Another crash that we saw with openjdk:15-alpine is from a pod that uses the image (also on AWS k8s).

@djtuBIG-MaliceX

Could this be related to netty/netty#11701 as a project I'm working in has been experiencing similar issues which appear to indicate the netty build being pulled in by grpc-netty-shaded breaking in alpine Docker bases.

@cfredri4
Contributor

Could this be related to netty/netty#11701 as a project I'm working in has been experiencing similar issues which appear to indicate the netty build being pulled in by grpc-netty-shaded breaking in alpine Docker bases.

Interesting. Changing from azul/zulu-openjdk-alpine:17-jre-headless (musl) to bellsoft/liberica-openjre-alpine:17 (glibc) made the problem go away for me.

@Spikhalskiy
Author

Spikhalskiy commented Dec 14, 2021

Could this be related to netty/netty#11701 as a project I'm working in has been experiencing similar issues which appear to indicate the netty build being pulled in by grpc-netty-shaded breaking in alpine Docker bases.

Not likely. We have this issue on an alpine image with glibc installed. Also, the image works with grpc-java (,1.42.0) and doesn't work with [1.42.0,). So the problem doesn't look like just a missing glibc that netty fails to handle gracefully.

@ejona86
Member

ejona86 commented Dec 15, 2021

TL;DR: Alpine doesn't have compatibility for the __strndup symbol. I don't know why the behavior is k8s-dependent, though. And it'll take some more research to determine appropriate next steps.

Looking at objdump output, it looks like the problem is happening in parsePackagePrefix.

libio_grpc_netty_shaded_netty_transport_native_epoll_x86_642526953976876250345.so+0xb487:

    b47c:	48 89 de             	mov    %rbx,%rsi
    b47f:	4c 89 e7             	mov    %r12,%rdi
    b482:	e8 59 fe ff ff       	call   b2e0 <parsePackagePrefix>
    b487:	83 7d dc ff          	cmpl   $0xffffffff,-0x24(%rbp)

But I don't see any obvious places the stack could get corrupted in parsePackagePrefix and there's no callout to a passed function. I'm suspicious though that the linker is broken. This is in parsePackagePrefix:

    b395:	e8 5e 8b ff ff       	call   3ef8 <__strndup@plt>

The address displayed is relative to the .so, so isn't the problem itself. It jumps to the PLT:

0000000000003ef8 <__strndup@plt>:
    3ef8:       ff 25 52 c2 20 00       jmp    *0x20c252(%rip)        # 210150 <__strndup@GLIBC_2.2.5>
    3efe:       68 36 00 00 00          push   $0x36
    3f03:       e9 80 fc ff ff          jmp    3b88 <.plt>

And the indirect jump goes to the GOT which should be filled with the adjusted address of 3efe. But maybe something is broken in the linker and it didn't get adjusted?
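
As a quick sanity check of the PLT disassembly above, the slot targeted by the indirect jump can be recomputed by hand. This is only a minimal sketch: the helper name `rip_relative_target` is made up for illustration, and it relies on the x86-64 rule that a RIP-relative operand is resolved against the address of the next instruction (here 0x3efe, since the 6-byte jmp starts at 0x3ef8).

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): compute the target of an
# x86-64 RIP-relative operand.
# $1 = address of the *next* instruction, $2 = displacement from the disassembly.
rip_relative_target() {
    printf '%x\n' "$(( $1 + $2 ))"
}

# jmp *0x20c252(%rip) sits at 0x3ef8 and is 6 bytes long, so RIP is 0x3efe.
rip_relative_target 0x3efe 0x20c252   # prints 210150, the slot objdump annotates
```

The fault address in the original crash report (pc=0x0000000000003efe) is the same value as the offset of the push that follows the jmp, which would be consistent with that slot still holding its link-time value rather than an address adjusted for where the library was loaded.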

/ # apk add gcompat
/ # ldd libnetty_transport_native_epoll_x86_64-4.1.63.so 
	ldd (0x7fa4cfc2f000)
	librt.so.1 => ldd (0x7fa4cfc2f000)
	libdl.so.2 => ldd (0x7fa4cfc2f000)
	libc.so.6 => ldd (0x7fa4cfc2f000)
Error relocating libnetty_transport_native_epoll_x86_64-4.1.63.so: __strndup: symbol not found

Well, there we go... Older versions of epoll linked against strndup, not __strndup. This difference may have been caused by a glibc upgrade when compiling.
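
To run the same check against other native libraries, something like the following could pull the unresolved symbol names out of the ldd output. It is a sketch under assumptions: the function name `unresolved_symbols` is hypothetical, and the pattern only matches the exact "Error relocating ...: symbol not found" wording shown above.

```shell
#!/bin/sh
# Hypothetical helper: read ldd output on stdin and print each unresolved symbol.
# Assumes the "Error relocating <lib>: <symbol>: symbol not found" wording
# that musl's ldd printed in the output above.
unresolved_symbols() {
    sed -n 's/^Error relocating [^:]*: \([^:]*\): symbol not found$/\1/p'
}

# Usage (2>&1 in case the errors are written to stderr):
#   ldd libnetty_transport_native_epoll_x86_64-4.1.63.so 2>&1 | unresolved_symbols
```

An empty result would only mean ldd reported no relocation errors in that format; it is not proof that the library loads cleanly.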

@ejona86
Member

ejona86 commented Dec 20, 2021

TL;DR: Try setting the LD_PRELOAD=/lib/libgcompat.so.0 environment variable on old Alpine versions, so gcompat is actually used. Hopefully that won't break any of the musl binaries. Best approach is probably "upgrade Alpine to 3.13 or later" though. (Edit: New versions exhibit similar behavior, because this flow avoids libc6-compat glue)

openjdk:8-alpine is based on alpine 3.9.4, and using alpine:3.9.4 directly produces similar ldd results. Interestingly alpine:3.15.0 (latest) has more unresolved symbols:

/ # ldd libnetty_transport_native_epoll_x86_64-4.1.63.so 
	/lib/ld-musl-x86_64.so.1 (0x7f7bfe2f5000)
	librt.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f7bfe2f5000)
	libdl.so.2 => /lib/ld-musl-x86_64.so.1 (0x7f7bfe2f5000)
	libc.so.6 => /lib/ld-musl-x86_64.so.1 (0x7f7bfe2f5000)
Error relocating libnetty_transport_native_epoll_x86_64-4.1.63.so: __strdup: symbol not found
Error relocating libnetty_transport_native_epoll_x86_64-4.1.63.so: __strndup: symbol not found

But I think that isn't the full story. Looking at the older Alpine:

/ # cat /etc/os-release 
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.9.4
...
/ # apk info -L gcompat libc6-compat
gcompat-0.3.0-r0 contains:
lib/ld-linux-x86-64.so.2
lib/libgcompat.so.0

libc6-compat-1.1.20-r6 contains:
lib/libm.so.6
lib/libutil.so.1
lib/libc.so.6
lib/libpthread.so.0
lib/libcrypt.so.1
lib/librt.so.1
lib64/ld-linux-x86-64.so.2
/ # ls -l lib64/ld-linux-x86-64.so.2
lrwxrwxrwx    1 root     root            26 Dec 20 18:30 lib64/ld-linux-x86-64.so.2 -> /lib/libc.musl-x86_64.so.1

lib64/ld-linux-x86-64.so.2 is used for x86_64, not lib/ld-linux-x86-64.so.2. So I suspect gcompat is useless on this older version. Instead, libc6-compat is providing the linker which just forwards to the musl linker, and is missing the symbol. So I think ldd on this older Alpine is accurate.

/ # cat /etc/os-release
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.15.0
...
/ # apk info -L gcompat libc6-compat
gcompat-1.0.0-r4 contains:
lib/ld-linux-x86-64.so.2
lib/libgcompat.so.0
lib64/ld-linux-x86-64.so.2

libc6-compat-1.2.2-r7 contains:
lib/libc.so.6
lib/libcrypt.so.1
lib/libm.so.6
lib/libpthread.so.0
lib/librt.so.1
lib/libutil.so.1

/ # ls -l lib64/ld-linux-x86-64.so.2
lrwxrwxrwx    1 root     root            27 Dec 20 18:57 lib64/ld-linux-x86-64.so.2 -> ../lib/ld-linux-x86-64.so.2

Here though, gcompat is providing the linker which loads lib/libgcompat.so.0. That means I don't think the ldd output is accurate. I see that gcompat 0.3.0 (Alpine 3.9) and 1.0.0 (Alpine 3.15) have __strndup, so I think the wrong linker on older Alpine versions is the trouble. Trying the gcompat linker approach "manually" on the old Alpine seems to work:

/ # cat /etc/os-release 
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.9.4
...
/ # LD_PRELOAD=/lib/libgcompat.so.0 ldd libnetty_transport_native_epoll_x86_64-4.1.63.so 
	ldd (0x7f72fc25e000)
	/lib/libgcompat.so.0 => /lib/libgcompat.so.0 (0x7f72fc03c000)
	librt.so.1 => ldd (0x7f72fc25e000)
	libdl.so.2 => ldd (0x7f72fc25e000)
	libc.so.6 => ldd (0x7f72fc25e000)
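
The symlink comparison between the two Alpine versions above could be scripted roughly as follows. This is a sketch, not an official check: `loader_kind` is a hypothetical name, and labeling an `ld-linux` target as "gcompat" is an assumption based only on the two package listings in this comment.

```shell
#!/bin/sh
# Hypothetical helper: classify where the glibc loader symlink points.
# $1 = path to the loader symlink, e.g. /lib64/ld-linux-x86-64.so.2
loader_kind() {
    target=$(readlink "$1") || { echo "not-a-symlink"; return 1; }
    case "$target" in
        *musl*)     echo "musl" ;;      # as on Alpine 3.9.4: straight to the musl loader
        *ld-linux*) echo "gcompat" ;;   # as on Alpine 3.15.0: assumed to be gcompat's loader
        *)          echo "unknown" ;;
    esac
}

# Usage:
#   loader_kind /lib64/ld-linux-x86-64.so.2
```

If this reports "musl", the listings above suggest libgcompat.so.0 is not pulled in automatically, which is the case where LD_PRELOAD was needed.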

@artemptushkin

artemptushkin commented Jan 6, 2022

I had the same issue having grpc libraries in the classpath like:

io.grpc:grpc-stub:1.42.1

and alpine:3.15.0 with this JDK:

openjdk version "11.0.13" 2021-10-19
OpenJDK Runtime Environment (build 11.0.13+8-alpine-r0)
OpenJDK 64-Bit Server VM (build 11.0.13+8-alpine-r0, mixed mode)

Thanks to @ejona86, here is what I did:

  • Added the gcompat library:
apk add gcompat
  • Ran Java like this:
LD_PRELOAD=/lib/libgcompat.so.0 java -cp @/app/jib-classpath-file io.my.MyApplication

Or Dockerfile should include just:

RUN apk add gcompat

ENV LD_PRELOAD=/lib/libgcompat.so.0

@FelixSFD

FelixSFD commented Jan 7, 2022

Or Dockerfile should include just:

RUN apk add gcompat

ENV LD_PRELOAD=/lib/lib/libgcompat.so.0

Thank you, @artemptushkin ! Your solution works. :-) But there's a typo in the path. It should be /lib/libgcompat.so.0 like you used for the previous command.

@ejona86
Member

ejona86 commented Jan 11, 2022

It sounds like people have confirmed my discoveries. I suggest users upgrade to newer Alpine versions, but if unable, then use the LD_PRELOAD trick.
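
For anyone applying the LD_PRELOAD trick from a container entrypoint, one possible guard is to export the variable only when the library is really there, since a mistyped path is easy to miss. This is a minimal sketch: `enable_gcompat_preload` and the `/app/app.jar` path in the usage comment are hypothetical, and the default path is simply the one used in this thread.

```shell
#!/bin/sh
# Hypothetical entrypoint helper: export LD_PRELOAD only if the gcompat
# library exists at the given path.
# $1 = library path (defaults to the path used in this thread).
enable_gcompat_preload() {
    lib=${1:-/lib/libgcompat.so.0}
    if [ -f "$lib" ]; then
        export LD_PRELOAD="$lib"
    else
        echo "gcompat not found at $lib; run 'apk add gcompat' or check the path" >&2
        return 1
    fi
}

# Usage in an entrypoint script (jar path is a placeholder):
#   enable_gcompat_preload && exec java -jar /app/app.jar
```

Failing loudly when the file is absent keeps the JVM from starting with a preload path that does not exist.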

@ejona86 ejona86 closed this as completed Jan 11, 2022
@alexfeigin

It sounds like people have confirmed my discoveries. I suggest users upgrade to newer Alpine versions, but if unable, then use the LD_PRELOAD trick.

Just FYI, latest alpine without LD_PRELOAD does not fix my issues.

I tried Dockerfile:

FROM alpine:3.15

ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8

RUN wget --quiet https://cdn.azul.com/public_keys/alpine-signing@azul.com-5d5dc44c.rsa.pub -P /etc/apk/keys/ && \
    echo "https://repos.azul.com/zulu/alpine" >> /etc/apk/repositories && \
    apk --no-cache add zulu11-jdk gcompat

ENV JAVA_HOME=/usr/lib/jvm/zulu11-ca

Which is the latest alpine, with the latest azul alpine jdk. I run a jar that uses micrometer 1.8.0 which has a shaded netty with native calls to glibc.

Adding ENV LD_PRELOAD=/lib/libgcompat.so.0 fixes my problem, though.

I'm adding the head of my SIGSEGV crash so people researching will discover this thread:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000000000401e, pid=1, tid=8
#
# JRE version: OpenJDK Runtime Environment Zulu11.52+13-CA (11.0.13+8) (build 11.0.13+8-LTS)
# Java VM: OpenJDK 64-Bit Server VM Zulu11.52+13-CA (11.0.13+8-LTS, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  0x000000000000401e
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://www.azul.com/support/
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  S U M M A R Y ------------

Command Line: server-1.0.0-SNAPSHOT.jar

Host: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz, 6 cores, 1G, Alpine Linux v3.15
Time: Tue Jan 18 11:07:13 2022 UTC elapsed time: 9.056303 seconds (0d 0h 0m 9s)

---------------  T H R E A D  ---------------

Current thread (0x00007f91b5f38800):  JavaThread "main" [_thread_in_native, id=8, stack(0x00007f91b5fc7000,0x00007f91b60c7ad8)]

Stack: [0x00007f91b5fc7000,0x00007f91b60c7ad8],  sp=0x00007f91b60bdb88,  free space=986k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  0x000000000000401e
C  [libio_micrometer_shaded_netty_transport_native_epoll_x86_6411463712870926256624.so+0xbd97]  netty_jni_util_JNI_OnLoad+0x67
C  [libjava.so+0xfc3a]  Java_java_lang_ClassLoader_00024NativeLibrary_load0+0xba
j  java.lang.ClassLoader$NativeLibrary.load0(Ljava/lang/String;Z)Z+0 java.base@11.0.13
...

@artemptushkin

@alexfeigin we hope for the one after 3.15

@ejona86
Member

ejona86 commented Jan 18, 2022

Given what I saw with grpc/grpc#27995, I expect that if the binary you execute is musl-based then gcompat wouldn't be used automatically. That's just a deficiency of how the gcompat linker works. libc6-compat could have provided symbols, but not with its symlink-to-musl approach. So I guess LD_PRELOAD is with us long-term.

@Hello71

Hello71 commented Apr 3, 2022

But maybe something is broken in the linker and it didn't get adjusted?

When RTLD_LAZY is passed to dlopen, and the library is not compiled with -Wl,-z,now, functions imported by the library are resolved when they are called. In this case, resolution of __strndup is delayed until it is actually called. This is true on both glibc and musl; however, on glibc, when a lazy function load fails, an informative message is printed, and on musl, the program just segfaults. Example:

$ cat test.sh
#!/bin/sh

gcc -fPIC -shared $1 -o libfunc.so -x c - << EOF
#include <stdio.h>
void bar(void);
void func(void) {
    puts("1");
    fflush(stdout);
    bar();
    puts("2");
    fflush(stdout);
}
EOF

gcc -o prog -x c - << EOF
#include <dlfcn.h>
#include <stdio.h>
int main() {
    void *lib = dlopen("libfunc.so", RTLD_LAZY);
    if (!lib) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }
    void (*func)(void) = dlsym(lib, "func");
    if (!func) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        return 1;
    }
    func();
}
EOF

LD_LIBRARY_PATH=. ./prog
$ ./test.sh
1
./prog: symbol lookup error: ./libfunc.so: undefined symbol: bar
$ ./test.sh -Wl,-z,now
dlopen: ./libfunc.so: undefined symbol: bar
$ docker run -it --rm -v ./test.sh:/test.sh alpine:edge sh -c 'apk add gcc musl-dev && echo lazy: && ./test.sh -Wl,-z,lazy; echo now: && ./test.sh -Wl,-z,now'
fetch https://dl-cdn.alpinelinux.org/alpine/edge/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/edge/community/x86_64/APKINDEX.tar.gz
(1/11) Installing libgcc (11.2.1_git20220219-r1)
(2/11) Installing libstdc++ (11.2.1_git20220219-r1)
(3/11) Installing binutils (2.38-r1)
(4/11) Installing libgomp (11.2.1_git20220219-r1)
(5/11) Installing libatomic (11.2.1_git20220219-r1)
(6/11) Installing gmp (6.2.1-r1)
(7/11) Installing isl22 (0.22-r0)
(8/11) Installing mpfr4 (4.1.0-r0)
(9/11) Installing mpc1 (1.2.1-r0)
(10/11) Installing gcc (11.2.1_git20220219-r1)
(11/11) Installing musl-dev (1.2.2-r7)
Executing busybox-1.35.0-r6.trigger
OK: 119 MiB in 26 packages
lazy:
1
Segmentation fault (core dumped)
now:
dlopen: Error relocating ./libfunc.so: bar: symbol not found
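
To tell which of the two behaviors a given library will show, its dynamic section can be inspected for the markers that `-Wl,-z,now` leaves behind. The following is a sketch under assumptions: `has_bind_now` is a hypothetical name, it expects `readelf -d` output (binutils) on stdin, and the pattern covers the BIND_NOW tag and the NOW flag as readelf usually prints them.

```shell
#!/bin/sh
# Hypothetical helper: read `readelf -d <lib>` output on stdin and succeed
# (exit 0) if the library appears to have been linked with -z now.
has_bind_now() {
    grep -Eq '\(BIND_NOW\)|\((FLAGS|FLAGS_1)\).*NOW'
}

# Usage (needs binutils installed):
#   readelf -d libfunc.so | has_bind_now && echo "eager binding" || echo "lazy binding"
```

A library reported as lazy is the kind that, per the example above, only reveals a missing symbol when the function is first called.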

Hello71 added a commit to Hello71/netty that referenced this issue Apr 4, 2022
- Move libraries to LIBS where they should be, avoiding need for
  -Wl,--no-as-needed.
- Use -O2 instead of -O3; there are no tight loops so -O3 just increases
  code size for no benefit.
- Add -pipe for faster compilation.
- Add -D_FORTIFY_SOURCE=2 and -Wl,-z,relro for security.
- Add -Wl,-z,now for security and to improve musl
  compatibility. musl does not implement __strdup and __strndup which
  old glibc aliases strdup and strndup to, but OpenJDK loads libraries
  with RTLD_LAZY, so this is not discovered until too late. See
  grpc/grpc-java#8751 (comment)
  for more information.
Hello71 added a commit to Hello71/netty that referenced this issue Apr 10, 2022
- Move libraries to LIBS where they should be, avoiding need for
  -Wl,--no-as-needed.
- Use -O2 instead of -O3; there are no tight loops so -O3 just increases
  code size for no benefit.
- Add -pipe for faster compilation.
- Add -D_FORTIFY_SOURCE=2 and -Wl,-z,relro for security.
- Add -Wl,-z,now for security and to improve musl
  compatibility. musl does not implement __strdup and __strndup which
  old glibc aliases strdup and strndup to, but OpenJDK loads libraries
  with RTLD_LAZY, so this is not discovered until too late. See
  grpc/grpc-java#8751 (comment)
  for more information.
- Add -ffunction-sections -fdata-sections -Wl,--gc-sections to reduce
  output size and avoid linking librt when not really needed (only used
  for kqueue)

"Fixes" netty#11701 by making the native library load fail.
@briceburg

Relying on glibc-compat is not a safe or wise thing to do... and I can confirm that using -Dio.grpc.netty.shaded.io.netty.transport.noNative=true avoids the segfault. E.g.

# hello, I am an alpine 3.15
java -Dio.grpc.netty.shaded.io.netty.transport.noNative=true -jar my_grpc_app.jar

adding the grpc-java package did not seem to help.
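
If the same image or start script is shared between musl and glibc hosts, the flag could be applied only where it is needed. This is a rough sketch with assumptions: `netty_native_opts` is a hypothetical helper, musl is detected purely by the presence of a `/lib/ld-musl-*.so.1` loader, and the flag is passed through the standard `JAVA_TOOL_OPTIONS` environment variable.

```shell
#!/bin/sh
# Hypothetical helper: print the JVM flag that disables the native transport
# in grpc-netty-shaded when the root filesystem looks musl-based.
# $1 = root directory to inspect (defaults to /); a parameter added for testability.
netty_native_opts() {
    root=${1:-/}
    for loader in "${root%/}"/lib/ld-musl-*.so.1; do
        if [ -e "$loader" ]; then
            echo "-Dio.grpc.netty.shaded.io.netty.transport.noNative=true"
            return 0
        fi
    done
}

# Usage:
#   export JAVA_TOOL_OPTIONS="$(netty_native_opts)"
#   java -jar my_grpc_app.jar
```

On a glibc image the function prints nothing, so the native transport stays enabled there.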

normanmaurer added a commit to netty/netty that referenced this issue Apr 19, 2022
Motivation:

Loading of the native epoll transport might segfault the JVM if musl is used and no glibc-compat is installed:

Modifications:

- Move libraries to LIBS where they should be, avoiding need for
  -Wl,--no-as-needed.
- Use -O2 instead of -O3; there are no tight loops so -O3 just increases
  code size for no benefit.
- Add -pipe for faster compilation.
- Add -D_FORTIFY_SOURCE=2 and -Wl,-z,relro for security.
- Add -Wl,-z,now for security and to improve musl
  compatibility. musl does not implement __strdup and __strndup which
  old glibc aliases strdup and strndup to, but OpenJDK loads libraries
  with RTLD_LAZY, so this is not discovered until too late. See
  grpc/grpc-java#8751 (comment)
  for more information.
- Add -ffunction-sections -fdata-sections -Wl,--gc-sections to reduce
  output size and avoid linking librt when not really needed (only used
  for kqueue)

Result:
"Fixes" #11701 by making the native library load fail.

Co-authored-by: Alex Xu <351006+Hello71@users.noreply.github.com>
@varpa89

varpa89 commented Apr 28, 2022

Strange situation.
This workaround fixes the problem

RUN apk add gcompat
ENV LD_PRELOAD=/lib/lib/libgcompat.so.0

But if I try to execute some command from java

final ProcessBuilder builder = new ProcessBuilder("ls");
builder.start();

I have an error

Caused by: java.io.IOException: Cannot run program "ls": error=2, No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1143) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]

Without LD_PRELOAD builder.start does not throw any exceptions

streamsets-ci pushed a commit to streamsets/datacollector-docker that referenced this issue May 16, 2022
There is an issue with the latest alpine versions and io.grpc used by google libs (check grpc/grpc-java#8751)

Change-Id: I3b32e9963176725513d0e3ac99732e0cde796618
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 28, 2022