Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] [broker] Fix compatibility issues for PIP-344 #23136

Merged

Conversation

poorbarcode
Copy link
Contributor

Motivation & Modifications

After PIP-344 add a public API pulsarClient.getPartitionsForTopic(String topic, boolean metadataAutoCreationEnabled), there are 3 compatibility issues:


1. Using DLQ( brokers’ version is <3.0.6, and clients’ version is >=3.0.6)

PIP-344 fixed a bug that the Retry/DLQ consumer will still trigger a Retry/DLQ topic with the old rule {namespace}/{subscription}-RETRY/DLQ, but the rule was changed to {namespace}/{topic}-{subscription}-RETRY/DLQ after 2.8.0. This behavior has been defined in the PIP-344, so it is fine.

Modifications for compatibility: modify the error message to let users know what happened.


2. Using Geo-Replication, the version of the local cluster and the remote cluster is not the same ( local clusters are >=3.0.6 , and remote clusters are <3.0.6 )

  1. This error will only occur when using Geo-Replication, and one version of the two clusters is larger or equal to 3.0.6 and "3.3.1" and another is smaller than 3.0.6 and 3.3.1.
  2. Reason why getting the not supported error when creating internal Replicator: The feature method above was supported at 3.0.6 and 3.3.1, before that the API "getPartitionedTopicMetadata" will trigger a creation for partitioned topic metadata automatically even if you just want to query it. So the brokers whose version is less than 3.0.6 and 3.3.1 do not support the new API.
  3. Modifications for compatibility: Skip the check of comparing of topic's partitions, and force connect to the non-partitioned topic, it may cause both the partitioned topic and non-partitioned topic to exist at the same time. But this is still better than the behavior before fix [fix][broker] Replication stuck when partitions count between two clusters is not the same #22983, without fix [fix] [broker] response not-found error if topic does not exist when calling getPartitionedTopicMetadata #22838, there is an issue that may cause replication stuck and topics to be created in confusion, see more details in [fix] [broker] response not-found error if topic does not exist when calling getPartitionedTopicMetadata #22838's motivation.

3. Using non-persistent topics, the versions of the brokers in the same cluster are different( some are less than 3.0.6 )

  1. Reason why getting the not supported error when creating a non-persistent consumer/producer: The feature method getPartitionedTopicMetadata(topic, false) was supported at 3.0.6 and 3.3.1, before that the API getPartitionedTopicMetadata(topic) will trigger a creation for partitioned topic metadata automatically even if you just want query it. So the brokers whose version is less than 3.0.6 and 3.3.1 do not support the new API.
  2. The conditions for this error occur: There are 2 brokers in a cluster, and the version is less than 3.0.1, rolling upgrade brokers to 3.0.6. After the first broker restarted, there was one broker with version 3.0.6 and another is 3.0.1, and when the internal client tries to call getPartitionedTopicMetadata(topic, false) to the broker with a lower version, it will get this error.
  3. Modifications for compatibility: Rollback to the original behavior before the fix [fix] [broker] response not-found error if topic does not exist when calling getPartitionedTopicMetadata #22838. Without fix [fix] [broker] response not-found error if topic does not exist when calling getPartitionedTopicMetadata #22838, there is an issue that may cause a non-partitioned non-persistent topic and a partitioned non-persistent topic with the same name to exist at the same time.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: x

@poorbarcode poorbarcode added this to the 3.4.0 milestone Aug 7, 2024
@poorbarcode poorbarcode self-assigned this Aug 7, 2024
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Aug 7, 2024
@poorbarcode
Copy link
Contributor Author

/pulsarbot rerun-failure-checks

@codecov-commenter
Copy link

codecov-commenter commented Aug 7, 2024

Codecov Report

Attention: Patch coverage is 11.11111% with 8 lines in your changes missing coverage. Please review.

Project coverage is 74.52%. Comparing base (bbc6224) to head (5a72f51).
Report is 501 commits behind head on master.

Files Patch % Lines
...ache/pulsar/broker/namespace/NamespaceService.java 25.00% 2 Missing and 1 partial ⚠️
...rg/apache/pulsar/client/impl/PulsarClientImpl.java 0.00% 2 Missing and 1 partial ⚠️
...apache/pulsar/client/impl/ConsumerBuilderImpl.java 0.00% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #23136      +/-   ##
============================================
+ Coverage     73.57%   74.52%   +0.94%     
- Complexity    32624    33566     +942     
============================================
  Files          1877     1919      +42     
  Lines        139502   144174    +4672     
  Branches      15299    15761     +462     
============================================
+ Hits         102638   107442    +4804     
+ Misses        28908    28499     -409     
- Partials       7956     8233     +277     
Flag Coverage Δ
inttests 27.73% <11.11%> (+3.14%) ⬆️
systests 24.74% <0.00%> (+0.41%) ⬆️
unittests 73.88% <11.11%> (+1.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...apache/pulsar/client/impl/ConsumerBuilderImpl.java 85.06% <0.00%> (-1.88%) ⬇️
...ache/pulsar/broker/namespace/NamespaceService.java 73.49% <25.00%> (+1.25%) ⬆️
...rg/apache/pulsar/client/impl/PulsarClientImpl.java 74.82% <0.00%> (+0.52%) ⬆️

... and 493 files with indirect coverage changes

Copy link
Contributor

@heesung-sn heesung-sn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I agree with this direction, have we discussed with the community about this backward compatibility resolutions?

@heesung-sn
Copy link
Contributor

It may cause both partitioned topic and non-partitioned topic to exist at the same time.

Can pulsar detect and fix this inconsistency automatically?

@poorbarcode
Copy link
Contributor Author

It may cause both partitioned topic and non-partitioned topic to exist at the same time.

Can pulsar detect and fix this inconsistency automatically?

After the producers/consumers that are connected to the non-partitioned are offline, every thing will be fine.

@poorbarcode poorbarcode requested a review from heesung-sn August 8, 2024 02:20
Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old behavior is not good and it looks like this change does not only resolve the compatibility issue. Instead, it is going to find a more "better" solution.

IMO, we can split it into two tasks.

  • Purely fix the compatibility issue. The logic should be simple and easy to understand, we only use the new API If the new API is supported by the broker.
  • Find a potential solution to continue to improve it.

So that we can make sure the both PIP-334 and this change will not break anything.

@heesung-sn heesung-sn self-requested a review August 8, 2024 20:13
@github-actions github-actions bot added the PIP label Aug 9, 2024
Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@poorbarcode poorbarcode merged commit 702c0b3 into apache:master Aug 10, 2024
51 checks passed
poorbarcode added a commit that referenced this pull request Aug 10, 2024
Co-authored-by: Lari Hotari <lhotari@apache.org>
(cherry picked from commit 702c0b3)
poorbarcode added a commit that referenced this pull request Aug 10, 2024
Co-authored-by: Lari Hotari <lhotari@apache.org>
(cherry picked from commit 702c0b3)
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Aug 14, 2024
Co-authored-by: Lari Hotari <lhotari@apache.org>
(cherry picked from commit 702c0b3)
(cherry picked from commit 617c110)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Aug 20, 2024
Co-authored-by: Lari Hotari <lhotari@apache.org>
(cherry picked from commit 702c0b3)
(cherry picked from commit 617c110)
grssam pushed a commit to grssam/pulsar that referenced this pull request Sep 4, 2024
Co-authored-by: Lari Hotari <lhotari@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants