Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][broker] Fix broken topic policy implementation compatibility with old pulsar version #22535

Merged
merged 2 commits into from
Apr 19, 2024

Conversation

rdhabalia
Copy link
Contributor

@rdhabalia rdhabalia commented Apr 18, 2024

Fixes #22534

Motivation

Modifications

Again we have a broken version compatibility change in Pulsar which is causing issues for systems which are running old Pulsar versions and upgrading them to >=2.10.
After upgrading from Pulsar-2.7 to >= Pulsar-2.10 broker, broker is not considering various broker configuration threshold and one of them is maxUnackedMessagesPerSubscription. We have broker configured with MaxUnackedMessagesPerSubscription=200000 and after sub reaches 200K unack messages, broker blocks the dispatching. It was working fine till broker-2.7 but it stopped working after upgrading to Pulsar-2.10 and higher version.

subscriptions" : {
    "UnackMsgConsumersSubscription" : {
      "msgRateOut" : 41.66667806597534,
      "msgThroughputOut" : 63924.01748854578,
      "bytesOutCounter" : 9813637567183,
      "msgOutCounter" : 6346168888,
      "msgRateRedeliver" : 1825.0005374626583,
      "chuckedMessageRate" : 0,
      "msgBacklog" : 63122123,
      "backlogSize" : 0,
      "msgBacklogNoDelayed" : 63122123,
      "blockedSubscriptionOnUnackedMsgs" : false,
      "msgDelayed" : 0,
      "unackedMessages" : 614260,
      "type" : "Shared",
      "msgRateExpired" : 0.0,
      "totalMsgExpired" : 0,
      "lastExpireTimestamp" : 0,
      "lastConsumedFlowTimestamp" : 1712610077925,
      "lastConsumedTimestamp" : 1712610077938,
      "lastAckedTimestamp" : 1712610088862,
      "lastMarkDeleteAdvancedTimestamp" : 1712609791878,
      "consumers" : [ {
        "msgRateOut" : 41.66667806597534,
        "msgThroughputOut" : 63924.01748854578,
        "bytesOutCounter" : 5106838658,
        "msgOutCounter" : 3324500,
        "msgRateRedeliver" : 1825.0005374626583,
        "chuckedMessageRate" : 0.0,
        "consumerName" : "0f455",
        "availablePermits" : 0,
        "unackedMessages" : 614260,
        "avgMessagesPerEntry" : 1,
        "blockedConsumerOnUnackedMsgs" : false,
        "lastAckedTimestamp" : 1712610088862,
        "lastConsumedTimestamp" : 1712610077938,
        "metadata" : { },
        "connectedSince" : "2024-04-08T20:52:27.525995Z",

It's happening because Pulsar considers disabling behavior of various threshold values if they are less than 0 and broker disregards those configurations. However, with TopicPolicy implementation, we made a broken change and those values were assumed disabled with only NULL value.
all the namespaces created in earlier version have default disabled value as -1 in namespace metadata and TopicPolicyImplementaion overrides over broker configuration values because those are NON-NULL value and because of that TopicPolicy implementation broke the compatibility and it stopped enforcing various broker threshold configurations.

This PR, makes sure that those old configurations which can be disabled with value < 0, must be normalized with TopicPolicyImplementation and they don't override Broker configuration.

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@rdhabalia rdhabalia added type/bug The PR fixed a bug or issue reported a bug area/broker doc-not-needed Your PR changes do not impact docs ready-to-test labels Apr 18, 2024
@rdhabalia rdhabalia added this to the 3.3.0 milestone Apr 18, 2024
@rdhabalia rdhabalia self-assigned this Apr 18, 2024
Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@merlimat
Copy link
Contributor

@rdhabalia There seems to be some mocked test issue:

  Error:    PulsarStandaloneTest.testMetadataInitialization:121 » ServerSideError 
   --- An unexpected error occurred in the server ---
  
  Message: Cannot invoke "java.lang.Integer.intValue()" because the return value of "org.apache.pulsar.common.policies.data.PolicyHierarchyValue.get()" is null
  
  Stacktrace:
  
  java.lang.NullPointerException: Cannot invoke "java.lang.Integer.intValue()" because the return value of "org.apache.pulsar.common.policies.data.PolicyHierarchyValue.get()" is null
  	at org.apache.pulsar.broker.service.persistent.PersistentTopic.checkReplication(PersistentTopic.java:1722)
  	at org.apache.pulsar.broker.service.BrokerService$2.lambda$openLedgerComplete$1(BrokerService.java:1718)
  	at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1150)
  	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
  	at java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:614)
  	at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:726)
  	at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482)
  	at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137)
  	at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:113)
  	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
  	at java.base/java.lang.Thread.run(Thread.java:840)

@rdhabalia
Copy link
Contributor Author

@merlimat @lhotari I have changed one of the test behavior in the last commit.

With previous broker version: namespace policy (eg: maxUnackMessagesPerSubscription) default value in metadata was <= 0 . It means if broker sees maxUnackMessagesPerSubscription=-1 in the namespace policy metadata then broker was falling back to broker's configuration which is 200K.
However, after TopicPolicy Implementation, it broke the expectation and if it sees policy value <=0, then it won't restrict. and that was also implemented as a part of the test AdminApiTest2::testMaxConsumersPerTopicUnlimited which I had to change. can you please take a look change and we can merge the PR if there are no concerns.

@rdhabalia
Copy link
Contributor Author

I found many tests which are having above behavior where if namespace-policy value is 0 then it means broker should override 0 value over broker-config and disable all restrictions.
Therefore, as a common ground, I am making change to normalize -1 value for namespace-policy as fallback to broker-config. and 0 means override namespace-config value and disable all restrictions. that will not require to make any test changes and will mostly address old version compatibility for at least few configurations: max_unacked_messages_per_consumer, max_unacked_messages_per_subscription because these config had default value -1 where as below config will be still impacted as earlier version had their default value 0
max_producers_per_topic,max_consumers_per_topic,max_consumers_per_subscription

Old Release policies
https://github.com/apache/pulsar/blob/branch-2.6/pulsar-common/src/main/java/org/apache/pulsar/common/policies/data/Policies.java

So, Users running old Pulsar release must update namespace policy manually with -1 for below policies before upgrading version to avoid breaking policy changes
max_producers_per_topic,max_consumers_per_topic,max_consumers_per_subscription

So, with current PR, it will not break current new behavior and will still address the default behavior for policies max_unacked_messages_per_consumer, max_unacked_messages_per_subscription.

@rdhabalia rdhabalia merged commit 59daac6 into apache:master Apr 19, 2024
49 of 50 checks passed
@rdhabalia rdhabalia deleted the dispatch_block branch April 19, 2024 17:30
lhotari pushed a commit that referenced this pull request Apr 22, 2024
lhotari pushed a commit that referenced this pull request Apr 22, 2024
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 23, 2024
…th old pulsar version (apache#22535)

(cherry picked from commit 59daac6)
(cherry picked from commit 8439082)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 23, 2024
…th old pulsar version (apache#22535)

(cherry picked from commit 59daac6)
(cherry picked from commit 8439082)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Broker is not blocking the dispatch even after reaching max unack-message limit per subscription
4 participants