grpc-java libio_grpc_netty_shaded_netty_transport_native_epoll_x86_64 jvm crash after update to 1.45.1 #9083
Comments
This seems likely caused by tcnative or epoll, as those are both JNI components so can cause crashes when things go awry. It looks like your tests do use Netty; I see InProcessOrAlternativeChannelFactory creating a Netty channel. It is unlikely to be tcnative because I don't see it in the logs and you'd most likely use plaintext in a test. Importantly, you are using grpc-netty-shaded and I don't see any other Netty usages (at least no other binary components), which removes multiple classes of potential problems. Nothing jumps out immediately, but I've only glanced. I'll need to look deeper tomorrow. |
This is the test case that fails on Linux but not on Windows. I stripped the assert sections ... For now I added @DisabledOnOs(value={OS.LINUX}). I used plaintext for the JUnit test.
@ActiveProfiles("test")
@SpringBootTest(properties = {
"grpc.server.inProcessName=test",
"grpc.server.port=-1",
"grpc.client.inProcess.address=in-process:test"
})
@ExtendWith(SpringExtension.class)
@TestPropertySource(locations={"classpath:application-test.yml"})
@ContextConfiguration(initializers={ConfigDataApplicationContextInitializer.class},
classes = {CacheManager.class, GrpcClientAutoConfiguration.class, GrpcChannelConfigurer.class,
GrpcChannelFactory.class}
)
@EnableConfigurationProperties(value = {GrpcConfigurer.class, ConnectorProperties.class})
@DirtiesContext
@DisabledOnOs(value={OS.LINUX})
@SuppressWarnings({"PMD.CommentDefaultAccessModifier", "PMD.DefaultPackage",
"PMD.UnusedPrivateField", "PMD.UnnecessaryAnnotationValueElement"})
class GrpcConfigurerITTest {
private static final String SINGLE_SERVICE_DOUBLE_METHOD = "single-service-double-method";
private static final String SINGLE_SERVICE_SINGLE_METHOD = "single-service-single-method";
private static final String SINGLE_SERVICE_EMPTY_METHOD = "single-service-empty-method";
@GrpcClient(SINGLE_SERVICE_DOUBLE_METHOD)
private ConnectorServiceGrpc.ConnectorServiceBlockingStub connectorServiceBlockingStub;
@GrpcClient(SINGLE_SERVICE_SINGLE_METHOD)
private ConnectorServiceGrpc.ConnectorServiceBlockingStub connectorServiceBlockingStubTwo;
@GrpcClient(SINGLE_SERVICE_EMPTY_METHOD)
private ConnectorServiceGrpc.ConnectorServiceBlockingStub connectorServiceBlockingStubThree;
......
} |
I checked at runtime whether the shared library libio_grpc_netty_shaded_netty_transport_native_epoll_x86 gets loaded, and it looks like it does. But my runtime is an Ubuntu image. 7f06b1800000-7f06b1810000 r-xp 00000000 fd:00 655529 /tmp/libio_grpc_netty_shaded_netty_transport_native_epoll_x86_645818328392020533487.so (deleted) I used that Ubuntu image to build, and the JUnit test passes. I noticed on the alpine that: Thus I think the issue is with the Alpine image. |
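For anyone repeating the check above, here is a minimal sketch of how the mapped epoll library could be listed from inside the JVM by reading /proc/self/maps. The class and method names are hypothetical and not part of grpc-java or Netty; the marker substring is taken from the library name quoted in this comment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration; not part of grpc-java or Netty. */
final class LoadedEpollLibraryCheck {
    // Substring of the shaded epoll library name seen in this issue.
    static final String EPOLL_MARKER = "netty_transport_native_epoll";

    /** Returns the pathname column of one /proc/<pid>/maps line, if the line has one. */
    static Optional<String> mappedPath(String mapsLine) {
        // Columns: address range, permissions, offset, device, inode, pathname.
        String[] cols = mapsLine.trim().split("\\s+", 6);
        return cols.length == 6 ? Optional.of(cols[5]) : Optional.empty();
    }

    /** Keeps only the distinct mapped paths that mention the epoll library. */
    static List<String> epollMappings(List<String> mapsLines) {
        return mapsLines.stream()
                .map(LoadedEpollLibraryCheck::mappedPath)
                .filter(Optional::isPresent)
                .map(Optional::get)
                .filter(p -> p.contains(EPOLL_MARKER))
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        Path maps = Paths.get("/proc/self/maps");
        if (!Files.isReadable(maps)) {
            System.out.println("/proc/self/maps is not readable on this platform");
            return;
        }
        // Only inspects the current JVM; call epollMappings from inside the
        // application (after a channel has been built) to see the real mapping.
        epollMappings(Files.readAllLines(maps)).forEach(System.out::println);
    }
}
```

Run standalone, this prints nothing, because a bare JVM has not loaded the epoll library. The useful case is calling `epollMappings` from the process that actually uses gRPC.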
My issue is exactly the same as #8751: grpc-netty 1.40 ran fine under musl, but from at least 1.42 that compatibility has been lost. |
Ah, yeah, musl is being used. So you use musl during your test execution, but glibc for production? Note there was a solution/workaround for the musl issue: #8751 (comment) |
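When build and runtime images get mixed up like this, a quick probe can tell which libc an image is based on. The sketch below is only a heuristic and rests on one assumption: musl installs its dynamic loader as /lib/ld-musl-<arch>.so.1, as it does on Alpine. The class and method names are hypothetical.

```java
import java.io.File;
import java.util.Arrays;

/** Hypothetical heuristic for illustration; not part of grpc-java or Netty. */
final class LibcProbe {
    /** True if the file name looks like musl's dynamic loader, e.g. ld-musl-x86_64.so.1. */
    static boolean isMuslLoaderName(String fileName) {
        return fileName.startsWith("ld-musl-") && fileName.contains(".so");
    }

    /** True if any entry of the given directory listing looks like the musl loader. */
    static boolean looksLikeMusl(String[] libDirEntries) {
        return libDirEntries != null
                && Arrays.stream(libDirEntries).anyMatch(LibcProbe::isMuslLoaderName);
    }

    public static void main(String[] args) {
        // File.list() returns null if /lib is missing or unreadable.
        String[] entries = new File("/lib").list();
        System.out.println(looksLikeMusl(entries)
                ? "musl loader found in /lib"
                : "no musl loader found in /lib");
    }
}
```

Running this once in the build image and once in each runtime image would show where musl is actually in play.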
I actually have two runtime environments: one using Ubuntu, which has no issue, and one using Alpine. The build is done on an Alpine image for both environments. I confused the two for a few moments. Since version 1.40.1 was working with musl ... it would be 'nice' to keep that backward compatibility ... |
Well, we never officially supported musl to begin with. It used to just be broken and there were some community repositories that would build musl .so's. And then musl did some glibc compat stuff that made it work. And we were okay with that. And then glibc upgrades broke it again (although there is the environment variable workaround). Given what I learned about the musl linker in #8751, personally I think it is now dead to me (personally speaking); that was too time consuming to figure out the cause of a mundane issue. I saw the email notification where you mention |
I guess also I learned that the libc-compat glue in Alpine just won't help us, because it only helps if the main binary was built for glibc but in these cases people are trying to use a native-musl java with a native-glibc .so and that combination doesn't work with the compat stuff automatically. |
I can confirm that using |
I'm not sure about '-Dio.grpc.netty.shaded.io.netty.transport.noNative=true'. As you noticed, I deleted my comments regarding this setting. The code where the crash occurs is in a static block in Java, so I'm not sure that setting will work. Initially, because of my mixed environment, I thought that it worked, but after retries I realized that it actually does not. So I'll have to retry it myself once more and make sure that I use Alpine ): io/grpc/netty/Utils.java where:
So how can I disable Epoll and fall back to NioSocketChannel? Am I missing something? |
That's not an option with grpc-netty-shaded. Epoll is included directly in that artifact. You would need to swap to using grpc-netty instead. |
OK - thank you. |
Something does not add-up. Practically I can do:
In grpc-java, using GrpcChannelConfigurer. That should allow me to avoid having the Utils static block kick in, which forces Epoll. So even though I should have a choice between Nio and Epoll, I actually do not. That sounds like a defect to me, no? |
I agreed with everything up unto this part. It seems you forgot to say what broke? |
Well, on Alpine, even if I set channelType(NioSocketChannel.class), the Utils class with its static blocks somehow triggers a VM crash because Epoll is not available. So, to me, it looks like I cannot bypass Epoll even though I set Nio. Am I missing something? |
What is the point of having ".channelType(NioSocketChannel.class)" if I cannot set it? |
I do not think NettyChannelBuilder uses the builder pattern properly. The default Epoll/Nio event loop groups should be created ONLY if no one has called a channelType method. They should not be loaded in memory by default. So, let's say on Ubuntu, I get Epoll by default, and then if I want to use Nio, I end up with both classes, Epoll and Nio, loaded. I do not think I'm wrong. |
I second @lmcdasi's observation. NettyChannelBuilder statically refers to the Utils class. The Utils class has a static initializer that calls the isEpollAvailable() method. This method loads the Epoll class (io.netty.channel.epoll.Epoll) through reflection and invokes its isAvailable() method. This triggers the static initializer in Epoll, which attempts to load the native library. And this is where the JVM crashes, because loading the library crashes. There is not a single flag or system property that breaks this chain, nor a way to provide a configurable name for that Epoll class or a factory. The chain is static. |
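The chain described in this comment can be reproduced in miniature with plain Java. The sketch below uses stand-in classes (FakeEpoll, FakeUtils, FakeBuilder are hypothetical names, not the real grpc-java or Netty classes) and a direct reference instead of reflection. It only illustrates how class initialization order makes the probe run even when the caller picks Nio explicitly.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only; the Fake* classes are stand-ins, not grpc-java or Netty code. */
final class StaticInitChainDemo {
    static final List<String> EVENTS = new ArrayList<>();

    /** Stand-in for Epoll: the static initializer is where the native load would happen. */
    static final class FakeEpoll {
        static {
            EVENTS.add("FakeEpoll.<clinit>: native library would be loaded here");
        }
        static boolean isAvailable() {
            return false;
        }
    }

    /** Stand-in for Utils: a static field forces the probe at class-initialization time. */
    static final class FakeUtils {
        static final boolean EPOLL_AVAILABLE = probe();

        private static boolean probe() {
            EVENTS.add("FakeUtils.<clinit>: probing epoll");
            return FakeEpoll.isAvailable(); // first use initializes FakeEpoll
        }

        static String defaultTransport() {
            return EPOLL_AVAILABLE ? "epoll" : "nio";
        }
    }

    /** Stand-in for the builder: its field default touches FakeUtils eagerly. */
    static final class FakeBuilder {
        private String transport = FakeUtils.defaultTransport();

        FakeBuilder channelType(String t) {
            this.transport = t;
            return this;
        }

        String build() {
            return transport;
        }
    }

    public static void main(String[] args) {
        // The caller asks for "nio", yet both initializers have already run.
        String chosen = new FakeBuilder().channelType("nio").build();
        EVENTS.forEach(System.out::println);
        System.out.println("transport = " + chosen);
    }
}
```

In the demo the "load" is just a log entry; in the situation reported here that step is where the JVM crashes, before channelType is ever consulted.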
I see. There is a choice between Nio and Epoll threads and polling, but there isn't a way to avoid Epoll native library loading, except using grpc-netty instead of grpc-netty-shaded. It would be possible to delay initialization of the builder fields to avoid the epoll initialization, but it doesn't seem to buy us too much as it would require all Alpine users to manage their own loops, lest they get a runtime crash. A crash is too horrible of a failure mode. I think netty/netty#12272 will be the real fix here, as it does |
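To make the "delay initialization of the builder fields" idea concrete, here is a small plain-Java sketch. It is not how NettyChannelBuilder is written; LazyBuilder and probeDefaultTransport are hypothetical stand-ins that only show the default being resolved at build time, and only when no explicit channel type was set.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Illustration only; hypothetical names, not grpc-java code. */
final class LazyDefaultDemo {
    // Counts how often the (stand-in) native availability probe runs.
    static final AtomicInteger PROBES = new AtomicInteger();

    /** Stand-in for the probe that would load the native epoll library. */
    static String probeDefaultTransport() {
        PROBES.incrementAndGet();
        return "default-transport";
    }

    static final class LazyBuilder {
        private String explicitTransport; // stays null until channelType is called
        private final Supplier<String> defaultTransport;

        LazyBuilder(Supplier<String> defaultTransport) {
            this.defaultTransport = defaultTransport;
        }

        LazyBuilder channelType(String t) {
            this.explicitTransport = t;
            return this;
        }

        String build() {
            // The supplier, and therefore the probe, runs only on the default path.
            return explicitTransport != null ? explicitTransport : defaultTransport.get();
        }
    }

    public static void main(String[] args) {
        String explicit = new LazyBuilder(LazyDefaultDemo::probeDefaultTransport)
                .channelType("nio")
                .build();
        System.out.println(explicit + ", probes so far: " + PROBES.get());

        String fallback = new LazyBuilder(LazyDefaultDemo::probeDefaultTransport).build();
        System.out.println(fallback + ", probes so far: " + PROBES.get());
    }
}
```

As the comment above points out, this would only help users who always set the channel type and event loop themselves. Anyone relying on the defaults on Alpine would still hit the probe, and with it the crash.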
Seems like there's nothing more to do here. There will be a netty release containing the -z,now change and gRPC will upgrade to it in normal course. If there's something remaining, comment, and it can be reopened. |
@ejona86 could you clarify please, will it be just an exception instead of jvm crash or we could work without glibc (but with musl)? |
With the change to Netty, gRPC will fall back to other options. If you are running on OpenJDK 8u252 or later, then it will still work, although performance may be lower compared to a glibc system, especially on Java 8. |
@ejona86 when can we expect a new grpc release with updated netty please? |
Right now we are blocked from upgrading at least because of netty/netty-tcnative#716 . There was some java module stuff as well that we noticed, but I don't recall if that has already been fixed. #9027 is where we were going through iterations trying to upgrade. |
I'm having a weird issue when I execute a JUnit integration test after upgrading grpc versions. I get a JVM crash on Linux only, while executing the same test case on Windows causes no issue. And I do not understand the reason.
In theory, I would expect the same behavior. It is the same mvnw clean verify -Djacoco.skip=false command on both OSes.
Attached is the hs file.
At runtime it does not seem to cause any issue on a Linux OS. The JUnit test uses 'grpc inProcess for both client/server'.
When using grpc version 1.40.1, my test passes successfully.
After upgrade:
hs_err_pid1033.log
Any ideas of what is wrong ?