Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][broker] Fix currently client retries until operation timeout if the topic does not exist #23530

Conversation

BewareMyPower
Copy link
Contributor

@BewareMyPower BewareMyPower commented Oct 30, 2024

Motivation

#23291 introduces a regression that when the authentication is enabled, if the topic does not exist and cannot be automatically created (e.g. namespace does not exist), broker will respond with a ServerError.MetadataError that is converted to a trivial PulsarClientException, which is not retriable. In this case, the client will loop in sending the PartitionMetadata request and failing until the operation timeout happens (30 seconds by default)

Besides, PulsarClientImpl#checkPartitions is incorrect. Even if forceNoPartitioned is false, checkPartitions.complete(0) will still be called.

            if (forceNoPartitioned && actEx instanceof PulsarClientException.NotFoundException
                    || actEx instanceof PulsarClientException.TopicDoesNotExistException
                    || actEx instanceof PulsarAdminException.NotFoundException) {
                checkPartitions.complete(0);

Because a && b || c || d is true when a and b are false while c is true. Without this fix, the producer creation on a topic whose namespace does not exist will fail with AuthorizationError because it will go to the topic lookup phase.

This is not an expected result because the case that the error is not retriable and the timeout error is confusing to users.

Modifications

  • Return ServerError.TopicNotFound when isTopicOperationAllowed fails with HTTP 404 error.
  • Fix the incorrect check in PulsarClientImpl#checkPartitions.
  • Add testTopicNotFound to cover the fixes above.
  • Fix the testNamespaceNotExist

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@BewareMyPower BewareMyPower self-assigned this Oct 30, 2024
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Oct 30, 2024
@BewareMyPower BewareMyPower changed the title [fix][broker] Fix client will retry for operation timeout if the topic does not exist [fix][broker] Fix currently client retries until operation timeout if the topic does not exist Oct 30, 2024
@codecov-commenter
Copy link

codecov-commenter commented Oct 31, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 74.36%. Comparing base (bbc6224) to head (280ce90).
Report is 704 commits behind head on master.

Files with missing lines Patch % Lines
...rg/apache/pulsar/client/impl/PulsarClientImpl.java 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #23530      +/-   ##
============================================
+ Coverage     73.57%   74.36%   +0.78%     
- Complexity    32624    34427    +1803     
============================================
  Files          1877     1943      +66     
  Lines        139502   147045    +7543     
  Branches      15299    16205     +906     
============================================
+ Hits         102638   109346    +6708     
- Misses        28908    29274     +366     
- Partials       7956     8425     +469     
Flag Coverage Δ
inttests 27.56% <0.00%> (+2.98%) ⬆️
systests 24.40% <0.00%> (+0.08%) ⬆️
unittests 73.73% <50.00%> (+0.88%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...va/org/apache/pulsar/broker/service/ServerCnx.java 73.20% <100.00%> (+1.06%) ⬆️
...rg/apache/pulsar/client/impl/PulsarClientImpl.java 75.20% <0.00%> (+0.90%) ⬆️

... and 646 files with indirect coverage changes

@BewareMyPower BewareMyPower merged commit 8eeb0e2 into apache:master Nov 6, 2024
74 of 78 checks passed
@BewareMyPower BewareMyPower deleted the bewaremypower/fix-ns-not-found-authorization-error branch November 6, 2024 09:01
lhotari pushed a commit that referenced this pull request Nov 13, 2024
… the topic does not exist (#23530)

(cherry picked from commit 8eeb0e2)
lhotari pushed a commit that referenced this pull request Nov 13, 2024
… the topic does not exist (#23530)

(cherry picked from commit 8eeb0e2)
@lhotari
Copy link
Member

lhotari commented Nov 13, 2024

@BewareMyPower There's a problem in cherry-picking to branch-3.3 and branch-3.0 since TokenAuthenticatedProducerConsumerTest fails.

reproducing issue:

mvn -DredirectTestOutputToFile=false -DtestRetryCount=0 test -pl pulsar-broker -Dtest=TokenAuthenticatedProducerConsumerTest
[ERROR] Tests run: 13, Failures: 1, Errors: 0, Skipped: 10, Time elapsed: 14.379 s <<< FAILURE! - in org.apache.pulsar.client.api.TokenAuthenticatedProducerConsumerTest
[ERROR] org.apache.pulsar.client.api.TokenAuthenticatedProducerConsumerTest.testTopicNotFound[true, false](4)  Time elapsed: 2.08 s  <<< FAILURE!
java.lang.AssertionError: expected [true] but found [false]
	at org.testng.Assert.fail(Assert.java:110)
	at org.testng.Assert.failNotEquals(Assert.java:1577)
	at org.testng.Assert.assertTrue(Assert.java:56)
	at org.testng.Assert.assertTrue(Assert.java:66)
	at org.apache.pulsar.client.api.TokenAuthenticatedProducerConsumerTest.testTopicNotFound(TokenAuthenticatedProducerConsumerTest.java:211)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:139)
	at org.testng.internal.invokers.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:47)
	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:76)
	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)

It seems to be some sort of isolation problem since the test passes when the single method testTopicNotFound is run

mvn -DredirectTestOutputToFile=false -DtestRetryCount=0 test -pl pulsar-broker -Dtest=TokenAuthenticatedProducerConsumerTest#testTopicNotFound

@BewareMyPower do you have a chance to fix the test in branch-3.3 and also cherry-pick this PR to branch-3.0 after addressing the problem? thanks

@BewareMyPower
Copy link
Contributor Author

Okay, I will take a look tomorrow.

@BewareMyPower
Copy link
Contributor Author

@lhotari It's weird that this test should also fail with the master branch. But the CI is not broken

lhotari pushed a commit that referenced this pull request Nov 18, 2024
… the topic does not exist (#23530)

(cherry picked from commit 8eeb0e2)
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 20, 2024
… the topic does not exist (apache#23530)

(cherry picked from commit 8eeb0e2)
(cherry picked from commit c676b95)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 21, 2024
… the topic does not exist (apache#23530)

(cherry picked from commit 8eeb0e2)
(cherry picked from commit c676b95)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants