PubSub - 1.115.0 crashes JVM on JDK/JRE 17 #947

Closed
busches opened this issue Dec 7, 2021 · 19 comments
Labels
api: pubsub Issues related to the googleapis/java-pubsub API. priority: p2 Moderately-important priority. Fix may not be included in next release. 🚨 This issue needs some love. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@busches

busches commented Dec 7, 2021

Environment details

  1. OS type and version: Alpine 3.10-3.15
  2. Java version:
     • JRE version: OpenJDK Runtime Environment (17.0.1+12) (build 17.0.1+12-alpine-r0)
     • Java VM: OpenJDK 64-Bit Server VM (17.0.1+12-alpine-r0, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
  3. pubsub version(s): Starting with 1.115.0 (also 1.115.1, 1.115.2, 1.115.3, 1.115.4, 1.115.5)

Steps to reproduce

  1. Upgrade from com.google.cloud:google-cloud-pubsub:1.114.7 to 1.115.0
  2. Try to use the library (or as seen here https://github.com/busches/gcp-pubsub-crash just start the app with it on the path)

Stack trace

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000000003efe, pid=11333, tid=11339
#
# JRE version: OpenJDK Runtime Environment (17.0.1+12) (build 17.0.1+12-alpine-r0)
# Java VM: OpenJDK 64-Bit Server VM (17.0.1+12-alpine-r0, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  0x0000000000003efe
#
/src # ldd /tmp/libio_grpc_netty_shaded_netty_transport_native_epoll_x86_6415072523310110693670.so
        /lib/ld-musl-x86_64.so.1 (0x7f5c0dd4f000)
        librt.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f5c0dd4f000)
        libdl.so.2 => /lib/ld-musl-x86_64.so.1 (0x7f5c0dd4f000)
        libc.so.6 => /lib/ld-musl-x86_64.so.1 (0x7f5c0dd4f000)
Error relocating /tmp/libio_grpc_netty_shaded_netty_transport_native_epoll_x86_6415072523310110693670.so: __strdup: symbol not found
Error relocating /tmp/libio_grpc_netty_shaded_netty_transport_native_epoll_x86_6415072523310110693670.so: __strndup: symbol not found
/src # readelf -d /tmp/libio_grpc_netty_shaded_netty_transport_native_epoll_x86_6415072523310110693670.so

Dynamic section at offset 0xfd98 contains 23 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000e (SONAME)             Library soname: [libnetty_transport_native_epoll_x86_64-4.1.63.Final.so]
 0x000000000000000c (INIT)               0x3b70
 0x000000000000000d (FINI)               0xb618
 0x000000006ffffef5 (GNU_HASH)           0x1b8
 0x0000000000000005 (STRTAB)             0x938
 0x0000000000000006 (SYMTAB)             0x200
 0x000000000000000a (STRSZ)              885 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000003 (PLTGOT)             0x20ff88
 0x0000000000000002 (PLTRELSZ)           1488 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x35a0
 0x0000000000000007 (RELA)               0xdf8
 0x0000000000000008 (RELASZ)             10152 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0xd48
 0x000000006fffffff (VERNEEDNUM)         2
 0x000000006ffffff0 (VERSYM)             0xcae
 0x000000006ffffff9 (RELACOUNT)          415
 0x0000000000000000 (NULL)               0x0
@product-auto-label product-auto-label bot added the api: pubsub Issues related to the googleapis/java-pubsub API. label Dec 7, 2021
@yoshi-automation yoshi-automation added triage me I really want to be triaged. 🚨 This issue needs some love. labels Dec 8, 2021
@meredithslota meredithslota added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Dec 14, 2021
@hannahrogers-google hannahrogers-google removed their assignment Jan 20, 2022
@busches
Author

busches commented Jan 27, 2022

I can still reproduce this on 1.115.1, and I was also able to reproduce it on JDK 11.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000000003efe, pid=3563, tid=3568
#
# JRE version: OpenJDK Runtime Environment (11.0.4+4) (build 11.0.4+4-alpine-r1)
# Java VM: OpenJDK 64-Bit Server VM (11.0.4+4-alpine-r1, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  0x0000000000003efe

@busches
Author

busches commented Jan 28, 2022

To provide more detail, here's the code we're running:

    private var internalPublisher: Publisher?

    init {
        val channel = ManagedChannelBuilder.forTarget("localhost:7000").usePlaintext().build()
        val channelProvider = FixedTransportChannelProvider.create(GrpcTransportChannel.create(channel))
        val credentialsProvider = NoCredentialsProvider.create()

        val topicName = ProjectTopicName.of(project, topic)

        // The GCP docs say to create this client but never use the result; creating it or not makes no difference here
        TopicAdminClient.create(
            TopicAdminSettings.newBuilder()
                .setTransportChannelProvider(channelProvider)
                .setCredentialsProvider(credentialsProvider)
                .build()
        )

        internalPublisher = Publisher.newBuilder(topicName)
            .setChannelProvider(channelProvider)
            .setCredentialsProvider(credentialsProvider)
            .build()
    }

Based off this example: https://cloud.google.com/pubsub/docs/emulator#accessing_environment_variables
localhost:7000 is the Emulator from google/cloud-sdk:370.0.0-emulators docker image.

And here's the crash log: hs_err_pid329.log

Again, this happens only with 1.115.0 or higher; it doesn't happen with 1.114.7.
It happens on JDK 11 and 17.
It happens on Alpine 3.10 and 3.15 (latest).

If this is an Alpine issue, I can report it there, but since the change is in versions of the PubSub library, it seems related to this.

Ironically, whatever code Spring is using to talk to the emulator in the application works fine, as we can still run the application, but our integration tests connecting to the emulator all fail with the core dump. It is also using 1.115.0 of the library.

@busches
Author

busches commented Feb 9, 2022

Still broken with 1.115.3

@busches
Author

busches commented Feb 11, 2022

Still broken with 1.115.4

@busches
Author

busches commented Feb 15, 2022

Still broken with 1.115.5

@elefeint

These issues may be related: netty/netty#11879, googleapis/java-secretmanager#692

@busches Do you happen to have a more detailed stack trace from the crash dump?

@busches
Author

busches commented Feb 17, 2022

@elefeint Is this what you're after: hs_err_pid329.log? If not, can you send an example? There is no Java stack trace, just the JVM crash.

@elefeint

Sorry, I missed the link when reading. This is it.

  1. Are you using the Libraries BOM to manage Pub/Sub versions? That would be the first thing to try rather than hardcoding the specific client library version.

  2. Your crash dump matches this issue exactly: Version 2.0.38 and 2.0.39 Crash on Alpine Linux netty/netty-tcnative#649. Switching to alpine-slim image resolved the issue for them.

@busches
Author

busches commented Feb 18, 2022

@elefeint We're using the Spring BOM

extra["springCloudGcpVersion"] = "3.1.0"
extra["springCloudVersion"] = "2021.0.0"

dependencyManagement {
  imports {
    mavenBom("com.google.cloud:spring-cloud-gcp-dependencies:${property("springCloudGcpVersion")}")
    mavenBom("org.springframework.cloud:spring-cloud-dependencies:${property("springCloudVersion")}")
  }
}

Which then brings in the BOM you linked: https://github.com/GoogleCloudPlatform/spring-cloud-gcp/blob/main/spring-cloud-gcp-dependencies/pom.xml

We are not using the OpenJDK image, we're using Alpine as the base, so there is no Alpine Slim tag to swap to.

@busches
Author

busches commented Feb 18, 2022

Here is a simple reproduction: https://github.com/busches/gcp-pubsub-crash
Run:

docker-compose up --build

And you will see it immediately crashes.

Adding: implementation(platform("com.google.cloud:libraries-bom:24.3.0")) does not change anything. There's only one version of netty-shaded and google dependencies included. 🤷

Gradle: io.grpc:grpc-alts:1.43.2
Gradle: io.grpc:grpc-api:1.43.2
Gradle: io.grpc:grpc-auth:1.43.2
Gradle: io.grpc:grpc-context:1.43.2
Gradle: io.grpc:grpc-core:1.43.2
Gradle: io.grpc:grpc-grpclb:1.43.2
Gradle: io.grpc:grpc-netty-shaded:1.43.2
Gradle: io.grpc:grpc-protobuf:1.43.2
Gradle: io.grpc:grpc-protobuf-lite:1.43.2
Gradle: io.grpc:grpc-services:1.43.2
Gradle: io.grpc:grpc-stub:1.43.2
Gradle: io.grpc:grpc-xds:1.43.2
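
The "only one version" observation can be checked mechanically by listing the distinct versions of one artifact in a dependency listing. This is a minimal sketch that assumes lines in the `group:artifact:version` form shown above; `artifact_versions` is a hypothetical helper name, not part of Gradle or any library.

```shell
# Hypothetical helper: reads a dependency listing on stdin and prints each
# distinct version found for the given group:artifact coordinate, one per line.
artifact_versions() {
  awk -v coord="$1:" '
    {
      pos = index($0, coord)            # literal match, so dots are not wildcards
      if (pos == 0) next
      rest = substr($0, pos + length(coord))
      sub(/[^0-9A-Za-z._-].*$/, "", rest)   # drop anything after the version token
      if (rest != "" && !(rest in seen)) { seen[rest] = 1; print rest }
    }'
}

# Example use (deps.txt is a placeholder for a saved dependency listing):
#   artifact_versions io.grpc:grpc-netty-shaded < deps.txt
```

More than one line of output would mean several versions of the artifact are on the classpath; a single line matches what was reported here.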

@busches
Author

busches commented Feb 18, 2022

@busches
Author

busches commented Mar 7, 2022

Still present in 1.116.0

@KasperKiiskinen

I think I'm running into the same problem: adoptium/adoptium-support#465
error.txt

@elefeint

grpc/grpc-java#8751 seems to be the same issue. Try the LD_PRELOAD workaround described here.

@ncopa

ncopa commented Mar 17, 2022

0x0000000000000001 (NEEDED) Shared library: [libc.so.6]

The problem is that /tmp/libio_grpc_netty_shaded_netty_transport_native_epoll_x86_6415072523310110693670.so is linked to GNU libc (libc.so.6). It needs to be rebuilt with musl libc.
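
One way to check another extracted native library for the same problem is to look for `libc.so.6` among its `NEEDED` entries. The following is a minimal sketch; `needs_glibc` is a made-up helper name, and it only inspects text in the format that `readelf -d` printed earlier in this thread.

```shell
# Hypothetical helper: reads `readelf -d` output on stdin and succeeds
# (exit status 0) if the library declares a dependency on GNU libc.
needs_glibc() {
  grep -Eq '\(NEEDED\).*\[libc\.so\.6\]'
}

# Example use ($so_file is a placeholder for the extracted .so path):
#   readelf -d "$so_file" | needs_glibc && echo "linked against glibc"
```

A library built against musl would not list `libc.so.6` as `NEEDED`, so the function returns a non-zero status for it.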

@ivangrujic09

Still having issues on 1.120.0

@nishit130

Still facing the same issue. Is there a solid solution?

@kamalaboulhosn
Contributor

Based on grpc/grpc-java#8751, it looks like the long-term workaround remains the use of LD_PRELOAD. For follow-up on the issue, please use the grpc repo ticket.
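
For readers landing here, the workaround discussed in grpc/grpc-java#8751 amounts to installing a glibc compatibility shim on Alpine and preloading it into the JVM process. The sketch below assumes the shim comes from Alpine's `gcompat` package and is installed at `/lib/libgcompat.so.0`. Neither the package name nor the path is stated in this thread, so verify both against the linked issue and your own image. `preload_value` is a hypothetical helper name.

```shell
# Assumed location of the gcompat shim (installed with: apk add --no-cache gcompat).
# The path is an assumption, not something confirmed in this thread.
GCOMPAT_LIB="${GCOMPAT_LIB:-/lib/libgcompat.so.0}"

# Hypothetical helper: prints the given shim path if the file exists;
# otherwise prints nothing and returns a non-zero status.
preload_value() {
  if [ -f "$1" ]; then
    printf '%s\n' "$1"
  else
    return 1
  fi
}

# Example use in a container entrypoint (app.jar is a placeholder):
#   if lib="$(preload_value "$GCOMPAT_LIB")"; then export LD_PRELOAD="$lib"; fi
#   exec java -jar app.jar
```

Guarding on the file's existence keeps the same entrypoint usable on images where the shim is not installed, since `LD_PRELOAD` is then left unset.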

@kimnamcham

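Still crashes with the Docker image eclipse-temurin:17-jre-alpine

with this dependency:

    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>spring-cloud-gcp-pubsub</artifactId>
      <version>4.5.1</version>
      <scope>compile</scope>
    </dependency>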
