Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] [broker] fix how ns-isolation-policy API works for replicated namespaces #23094

Closed
wants to merge 9 commits into from

Conversation

iosdev747
Copy link
Contributor

Fixes #23092

Motivation

  1. Problem is namespace unload operation happens on all the namespaces that matches the policy regex. Check should be done only on namespaces in current cluster.
  2. When there are too many namespaces in a cluster and adding a single namespace to the ns isolation policy data leads to too many bundle unload calls. This can exhaust broker resources (open connections). This should be controlled by a flag if bundle unload can be done post setPolicy call.

Modifications

  1. Fix namespace filter so only namespaces in current cluster should be used for unload operation.
  2. Adding a unload flag in set namespaceIsolationPolicies API so that applying a isolation policy on live cluster doesn't lead to too many unload calls.

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change added tests and can be verified as follows:

  • Multi cluster setup where namespace is not present in one cluster but is there in other.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: iosdev747#1

@github-actions github-actions bot added the doc-required Your PR changes impact docs and you will update later. label Jul 29, 2024
.filter(namespaceName -> adminClient.namespaces()
.getPoliciesAsync(namespaceName)
.thenApply(policies -> policies.replication_clusters.contains(cluster))
.join())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be a blocking operation. it would be better to make it asynchronous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the filter to work, I need to call join to wait for the future completion. The reason for doing it this way is that policy is required to decide whether to remove the namespace or not. Can you suggest how I can make it async? Nothing better comes to my mind that doesn't need too much code refactoring...

Copy link
Member

@lhotari lhotari Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you suggest how I can make it async?

There's a real problem already with the existing code, even without unloading calls. I've explained some of that in the previous comment in #23094 (comment) .
One of the problems is that all tenants and namespaces will be listed concurrently at once, without any concurrency limits. That alone will cause problems.

To fix the problem, the solution for making asynchronous calls will need concurrency limits. I'd suggest introducing a dependency to

    <dependency>
        <groupId>com.spotify</groupId>
        <artifactId>completable-futures</artifactId>
        <version>0.3.6</version>
    </dependency>

and using the https://github.com/spotify/completable-futures/blob/master/src/main/java/com/spotify/futures/ConcurrencyReducer.java class for controlling the concurrency.

This challenge is that this is a systemic problem at a higher level and solving this problem in this PR might feel overly complex. However, it's possible to handle it incrementally and refactor later.

For making the code asynchronous without blocking calls, composition is needed by using thenCompose/thenApply.
In this case, it's not trivial, so it requires a bit more thought than usual since the unlimited concurrency problem needs to also be solved.

@iosdev747
Copy link
Contributor Author

Hey @lhotari, while going through the tests, I realized that it will need more changes to add this flag in the admin client.
Is it better to add new methods in org.apache.pulsar.client.admin.Clusters to keep things backward compatible? Or modify existing methods for cleaner interface?

@lhotari
Copy link
Member

lhotari commented Jul 30, 2024

2. When there are too many namespaces in a cluster and adding a single namespace to the ns isolation policy data leads to too many bundle unload calls. This can exhaust broker resources (open connections). This should be controlled by a flag if bundle unload can be done post setPolicy call

Good catch!
This problem is related to the migration from synchronous calls to asynchronous calls in the Pulsar code base (/cc @mattisonchao). I have discussed the problem with @mattisonchao and there's some description in this comment.
The main problem is that with asynchronous calls there aren't concurrency limits in place. One of the possible solutions would be to use https://github.com/spotify/completable-futures/blob/master/src/main/java/com/spotify/futures/ConcurrencyReducer.java to limit the concurrency of asynchronous calls.
I wonder if we would need to also fix this problem although it seems like a useful solution to prevent unloading when policies are modified to reduce load on the system.

@iosdev747
Copy link
Contributor Author

Yes, agreed on this part, concurrency should be limited. And along with that we can have this flag that will prevent unloading in cases where there are too many namespaces to unload and/or unload bundles which are required (not on primary/secondary broker group).

For the bug fix, I'll move it out in a separate PR so it can be merged quickly.

@iosdev747
Copy link
Contributor Author

I have created #23100 PR for addressing first point only (Bug part). I'll continue on this PR for limiting the async calls and adding ConcurrencyReducer.java

@grssam
Copy link
Contributor

grssam commented Jul 30, 2024

  1. When there are too many namespaces in a cluster and adding a single namespace to the ns isolation policy data leads to too many bundle unload calls. This can exhaust broker resources (open connections). This should be controlled by a flag if bundle unload can be done post setPolicy call

Good catch! This problem is related to the migration from synchronous calls to asynchronous calls in the Pulsar code base (/cc @mattisonchao). I have discussed the problem with @mattisonchao and there's some description in this comment. The main problem is that with asynchronous calls there aren't concurrency limits in place. One of the possible solutions would be to use https://github.com/spotify/completable-futures/blob/master/src/main/java/com/spotify/futures/ConcurrencyReducer.java to limit the concurrency of asynchronous calls. I wonder if we would need to also fix this problem although it seems like a useful solution to prevent unloading when policies are modified to reduce load on the system.

The unbounded concurrent calls is only part of the problem.

IMO, the set call should only be touching the "changed/affected" namespaces and unloading those, if at all. It should be very straight forward to compare the list of namespaces matching before and after the change and only unload the delta (newly added and now-removed namespaces) so that their placement can be updated as per the policy.

Currently, it can unload 100s of namespaces in one go, practically making the cluster down as it struggles to cope up with so many placement calls. When this feature was added in #8976 , there was a flag to control it, but the async work done in #15527 forgot about that and broke the contract, then it was later removed in #22449 after being called "deprecated".. This feels like a weird loophole to ignore backward compatibility and bring in breaking changes easily :)

I believe that the flag was useful, but not at a broker level config. It should be part of the command itself. I am thinking a three way behavior that users may want based on their usecases:

  1. unload nothing as part of the set policy call
  2. unload every matching namespace as part of the policy set call
  3. unload only the changed namespaces (newly added + removed)

My goal is to tackle this, rather than tackling the general problem of unbounded concurrency, which really is a much wider problem not only limited to this particular "logical" flaw/regression.

What are everyone else's thoughts?

@lhotari
Copy link
Member

lhotari commented Jul 30, 2024

The unbounded concurrent calls is only part of the problem.

@grssam I agree.

IMO, the set call should only be touching the "changed/affected" namespaces and unloading those, if at all. It should be very straight forward to compare the list of namespaces matching before and after the change and only unload the delta (newly added and now-removed namespaces) so that their placement can be updated as per the policy.

I think that this part is now addressed as part of #23100

Currently, it can unload 100s of namespaces in one go, practically making the cluster down as it struggles to cope up with so many placement calls. When this feature was added in #8976 , there was a flag to control it, but the async work done in #15527 forgot about that and broke the contract, then it was later removed in #22449 after being called "deprecated".. This feels like a weird loophole to ignore backward compatibility and bring in breaking changes easily :)

That's unfortunate.

I believe that the flag was useful, but not at a broker level config. It should be part of the command itself. I am thinking a three way behavior that users may want based on their usecases:

  1. unload nothing as part of the set policy call
  2. unload every matching namespace as part of the policy set call
  3. unload only the changed namespaces (newly added + removed)

Makes sense.

My goal is to tackle this, rather than tackling the general problem of unbounded concurrency, which really is a much wider problem not only limited to this particular "logical" flaw/regression.

+1

@grssam
Copy link
Contributor

grssam commented Jul 31, 2024

I think that this part is now addressed as part of #23100

#23100 only fixes a bug so that the code doesn't try to unload non existent namespaces in the cluster.

I will create a PIP for adding the parameter to the policy set call.

@iosdev747
Copy link
Contributor Author

Closing this PR in favor of PIP raised: #23116

@iosdev747 iosdev747 closed this Aug 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-required Your PR changes impact docs and you will update later.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Breaking contract b/w 2.9 / 2.10 and 3.0 for ns-isolation-policy
3 participants