Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spring GCP subscribers will softlock against large queues with "Exactly Once" delivery enabled. #2491

Closed
chases2 opened this issue Dec 20, 2023 · 6 comments
Labels
priority: p3 pubsub type: bug Something isn't working

Comments

@chases2
Copy link

chases2 commented Dec 20, 2023

Describe the bug
If you boot a GCP PubSub consumer under these circumstances...

  • There are a lot (500 - 10,000) of messages already on the queue
  • "Exactly Once" delivery is enabled

... then the consumer will fail to consume with no errors, potentially for hours.

Exact versions are in the pom.xml of the example below.

Sample
A sample project with installation instructions can be found here: https://github.com/chases2/ahab-sturdy-pancake

@meltsufin
Copy link
Member

Do you feel this is caused by something in Spring Cloud GCP, or in the underlying Pub/Sub client library, or the Pub/Sub service itself?

@chases2
Copy link
Author

chases2 commented Dec 21, 2023

I suspect the Spring GCP libraries, but I don't know for certain yet. I'm planning on adding a Java subscriber that doesn't use any of these libraries to the example project.

@zhumin8 zhumin8 added type: bug Something isn't working pubsub priority: p3 labels Dec 27, 2023
@chases2
Copy link
Author

chases2 commented Jan 8, 2024

I've also been able to find the following error in the example code when it does fail. It will usually run for long hours at a time and not process messages, but will occasionally throw the following exception and restart:

textPayload: "2024-01-07T13:50:12.639Z ERROR 1 --- [bscriber-SE-1-1] c.g.c.util.concurrent.AbstractFuture     : RuntimeException while executing runnable CallbackListener{com.google.api.core.ApiFutures$1@2c27f549} with executor MoreExecutors.directExecutor()"

"java.util.ConcurrentModificationException: null
        at java.base/java.util.LinkedHashMap$LinkedHashIterator.nextNode(Unknown Source) ~[na:na]
        at java.base/java.util.LinkedHashMap$LinkedEntryIterator.next(Unknown Source) ~[na:na]
        at java.base/java.util.LinkedHashMap$LinkedEntryIterator.next(Unknown Source) ~[na:na]
        at com.google.cloud.pubsub.v1.MessageDispatcher.notifyAckSuccess(MessageDispatcher.java:423) ~[google-cloud-pubsub-1.125.8.jar:1.125.8]
        at com.google.cloud.pubsub.v1.StreamingSubscriberConnection$2.onSuccess(StreamingSubscriberConnection.java:530) ~[google-cloud-pubsub-1.125.8.jar:1.125.8]
        at com.google.cloud.pubsub.v1.StreamingSubscriberConnection$2.onSuccess(StreamingSubscriberConnection.java:523) ~[google-cloud-pubsub-1.125.8.jar:1.125.8]
        at com.google.api.core.ApiFutures$1.onSuccess(ApiFutures.java:89) ~[api-common-2.19.0.jar:2.19.0]
        at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1138) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1270) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1038) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:783) ~[guava-31.1-jre.jar:na]
        at com.google.api.gax.retrying.BasicRetryingFuture.handleAttempt(BasicRetryingFuture.java:203) ~[gax-2.36.0.jar:2.36.0]
        at com.google.api.gax.retrying.CallbackChainRetryingFuture$AttemptCompletionListener.handle(CallbackChainRetryingFuture.java:135) ~[gax-2.36.0.jar:2.36.0]
        at com.google.api.gax.retrying.CallbackChainRetryingFuture$AttemptCompletionListener.run(CallbackChainRetryingFuture.java:115) ~[gax-2.36.0.jar:2.36.0]
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1270) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1038) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:783) ~[guava-31.1-jre.jar:na]
        at com.google.api.core.AbstractApiFuture$InternalSettableFuture.set(AbstractApiFuture.java:87) ~[api-common-2.19.0.jar:2.19.0]
        at com.google.api.core.AbstractApiFuture.set(AbstractApiFuture.java:70) ~[api-common-2.19.0.jar:2.19.0]
        at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onSuccess(GrpcExceptionCallable.java:88) ~[gax-grpc-2.36.0.jar:2.36.0]
        at com.google.api.core.ApiFutures$1.onSuccess(ApiFutures.java:89) ~[api-common-2.19.0.jar:2.19.0]
        at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1138) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1270) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1038) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:783) ~[guava-31.1-jre.jar:na]
        at io.grpc.stub.ClientCalls$GrpcFuture.set(ClientCalls.java:563) ~[grpc-stub-1.58.0.jar:1.58.0]
        at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:536) ~[grpc-stub-1.58.0.jar:1.58.0]
        at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) ~[grpc-api-1.58.0.jar:1.58.0]
        at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) ~[grpc-api-1.58.0.jar:1.58.0]
        at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) ~[grpc-api-1.58.0.jar:1.58.0]
        at com.google.api.gax.grpc.ChannelPool$ReleasingClientCall$1.onClose(ChannelPool.java:570) ~[gax-grpc-2.36.0.jar:2.36.0]
        at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:567) ~[grpc-core-1.58.0.jar:1.58.0]
        at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:71) ~[grpc-core-1.58.0.jar:1.58.0]
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:735) ~[grpc-core-1.58.0.jar:1.58.0]
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:716) ~[grpc-core-1.58.0.jar:1.58.0]
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[grpc-core-1.58.0.jar:1.58.0]
        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) ~[grpc-core-1.58.0.jar:1.58.0]
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[na:na]
        at java.base/java.lang.Thread.run(Unknown Source) ~[na:na]"
timestamp: "2024-01-07T13:50:12.651112623Z"

These all seem to be in the GCP library or its dependencies (grpc, gax), but it's not clear to me if this is a bug in those dependencies or in the way that Spring GCP calls these dependencies. What do you think?

@burkedavison
Copy link
Member

This looks like an issue that has been resolved in google-cloud-pubsub 1.125.12, which is available in libraries-bom 26.28.0, which is available in Spring Cloud GCP 3.7.5, 4.9.0, and 5.0.0.

Could you verify which version you are using, and update if prior to one of the listed releases?

@chases2
Copy link
Author

chases2 commented Jan 18, 2024

As indicated in the original bug, the bug is reproduced with these versions:

<java.version>17</java.version>
<spring-cloud-gcp.version>4.8.3</spring-cloud-gcp.version>
<spring-cloud.version>2022.0.4</spring-cloud.version>

Updating to these versions with the exact same code make the issue not reproducible anymore:

<java.version>21</java.version>
<spring-cloud-gcp.version>5.0.0</spring-cloud-gcp.version>
<spring-cloud.version>2023.0.0</spring-cloud.version>

I'll be updating the demonstration code with the known fix(es).

@chases2
Copy link
Author

chases2 commented Jan 18, 2024

Demonstration code and fixes: https://github.com/chases2/ahab-sturdy-pancake

@chases2 chases2 closed this as completed Jan 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: p3 pubsub type: bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants