Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][client] TransactionCoordinatorClient support retry #23081

Merged
merged 4 commits into from
Jul 30, 2024

Conversation

chenhongSZ
Copy link
Contributor

@chenhongSZ chenhongSZ commented Jul 26, 2024

Motivation

we have two brokerUrls(one is dead) in PulsarClient serviceUrl, it is working well without enabling transactions.

            # broker test.pulsar.com:6651 is dead
            PulsarClient client = PulsarClient.builder()
                    //.enableTransaction(true)
                    .serviceUrl("pulsar://test.pulsar.com:6650,test.pulsar.com:6651")
                    .build();

If enabling transactions

            # broker test.pulsar.com:6651 is dead
            PulsarClient client = PulsarClient.builder()
                    .enableTransaction(true)
                    .serviceUrl("pulsar://test.pulsar.com:6650,test.pulsar.com:6651")
                    .build();

the following error occurs and the PulsarClient init fails, It will not retry the other brokerUrl

2024-07-27T00:52:27,787 - ERROR - [pulsar-client-io-1-1:TransactionMetaStoreHandler@120] - Transaction meta handler with transaction coordinator id 14 connection failed.
org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.CompletionException: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: test.pulsar.com/127.0.0.1:6651
	at org.apache.pulsar.client.impl.ConnectionPool.lambda$createConnection$14(ConnectionPool.java:311) ~[classes/:?]
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:566) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]
Caused by: java.util.concurrent.CompletionException: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: test.pulsar.com/127.0.0.1:6651
	at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:368) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:377) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1097) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[?:?]
	at org.apache.pulsar.common.util.netty.ChannelFutures.lambda$toCompletableFuture$0(ChannelFutures.java:58) ~[classes/:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:629) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:118) ~[netty-common-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:326) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:342) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	... 4 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: test.pulsar.com/127.0.0.1:6651
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
	at sun.nio.ch.Net.pollConnectNow(Net.java:672) ~[?:?]
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946) ~[?:?]
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:339) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[netty-transport-4.1.108.Final.jar:4.1.108.Final]
	... 4 more

The root cause is that the TransactionCoordinatorClient doesn't support retry.

Modifications

support retry in TransactionCoordinatorClient.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jul 26, 2024
@lhotari
Copy link
Member

lhotari commented Jul 29, 2024

Great work @chenhongSZ! Would it be possible to add a test case?

@lhotari lhotari added this to the 3.4.0 milestone Jul 29, 2024
@chenhongSZ
Copy link
Contributor Author

Great work @chenhongSZ! Would it be possible to add a test case?

This pr contains correlation test cases in TransactionCoordinatorClientTest.java

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@aloyszhang aloyszhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@chenhongSZ
Copy link
Contributor Author

/pulsarbot run-failure-checks

@codecov-commenter
Copy link

codecov-commenter commented Jul 29, 2024

Codecov Report

Attention: Patch coverage is 31.25000% with 11 lines in your changes missing coverage. Please review.

Project coverage is 73.41%. Comparing base (bbc6224) to head (c6b3c47).
Report is 603 commits behind head on master.

Files with missing lines Patch % Lines
...ulsar/client/impl/TransactionMetaStoreHandler.java 21.42% 8 Missing and 3 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #23081      +/-   ##
============================================
- Coverage     73.57%   73.41%   -0.16%     
- Complexity    32624    33206     +582     
============================================
  Files          1877     1917      +40     
  Lines        139502   144077    +4575     
  Branches      15299    15744     +445     
============================================
+ Hits         102638   105774    +3136     
- Misses        28908    30179    +1271     
- Partials       7956     8124     +168     
Flag Coverage Δ
inttests 27.44% <25.00%> (+2.86%) ⬆️
systests 24.78% <0.00%> (+0.45%) ⬆️
unittests 72.47% <31.25%> (-0.37%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
.../transaction/TransactionCoordinatorClientImpl.java 66.07% <100.00%> (+0.30%) ⬆️
...ulsar/client/impl/TransactionMetaStoreHandler.java 67.04% <21.42%> (-0.86%) ⬇️

... and 510 files with indirect coverage changes

@aloyszhang aloyszhang merged commit 6bbaec1 into apache:master Jul 30, 2024
61 checks passed
lhotari pushed a commit that referenced this pull request Aug 6, 2024
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Aug 8, 2024
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Aug 12, 2024
lhotari pushed a commit that referenced this pull request Aug 16, 2024
@chenhongSZ chenhongSZ deleted the tc-support-retry branch September 13, 2024 05:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants