Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improve][broker]Ensure namespace deletion doesn't fail #22627

Merged
merged 5 commits into from
May 13, 2024

Conversation

eolivelli
Copy link
Contributor

@eolivelli eolivelli commented May 1, 2024

Motivation

Deleting a namespace sometimes fails and leaves znodes around, especially the "bundle owner" nodes /namespace/NAMESPACE/xxxxxx

Modifications

  • ensure that all the children of /namespace/NAMESPACE are deleted, especially the ephemeral nodes for ownership
  • ensure that on the code path that handles the deletion of the namespace we make progress, the main fact is about not relying on "cache.exists" before using a "delete": the new code is more pessimistic that goes with "deleteIfExists", handling "NotFound" gracefully
  • add more logs about the delete namespace execution path

With this patch

  • delete namespace seems to consistently work
  • there are no more spammy logs about errors around the topic policies event reader (changelog_events...)
  • in case we see an error we log at which point the procedure was (because we log all the paths that are deleted from zk)

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: eolivelli#27

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label May 1, 2024
@eolivelli
Copy link
Contributor Author

This script reproduces MANY problems with namespace deletions, just run it in a loop on a Pulsar cluster with a few brokers (like 6) and more than one ZK server (I am using 3 nodes)

reproduce.sh

#! /bin/bash
export PATH=$PATH:/pulsar/bin
set -x
pulsar-admin topics delete-partitioned-topic non-persistent://T1/N1/P1 -f
bin/pulsar zookeeper-shell -server pulsar-zookeeper-ca:2181 ls -R /T1/N1 | grep namespace
pulsar-admin namespaces delete T1/N1
pulsar-admin tenants delete T1

pulsar-admin tenants create T1
pulsar-admin namespaces create T1/N1 -b 64
pulsar-admin topics create-partitioned-topic non-persistent://T1/N1/P1 -p 20

exit 0

Execute it in a loop

while true ; do ./reproduce.sh ; done

@eolivelli eolivelli changed the title Ensure namespace deletion doesn't fail [improve][broker]Ensure namespace deletion doesn't fail May 1, 2024
@eolivelli eolivelli self-assigned this May 1, 2024
@eolivelli eolivelli added the type/bug The PR fixed a bug or issue reported a bug label May 1, 2024
@eolivelli eolivelli added this to the 3.3.0 milestone May 1, 2024
…sources/BaseResources.java

Co-authored-by: Andrey Yegorov <8622884+dlg99@users.noreply.github.com>
@eolivelli
Copy link
Contributor Author

@lhotari I think that it is worth to port this fix to all the active branches, maybe up to 3.0. Wdyt?

@lhotari
Copy link
Member

lhotari commented May 2, 2024

@lhotari I think that it is worth to port this fix to all the active branches, maybe up to 3.0. Wdyt?

@eolivelli makes sense. Btw. there's a rabbit hole in namespace deletion. Currently concurrency limits are missing and a namespace deletion will attempt to concurrency delete all possible topics in the namespace. This is something that would have to be addressed to improve the reliability of namespace deletion. More details in #22541 comments such as #22541 (comment) and #22541 (comment) .

@coderzc coderzc modified the milestones: 3.3.0, 3.4.0 May 8, 2024
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 43.13725% with 29 lines in your changes are missing coverage. Please review.

Project coverage is 32.47%. Comparing base (bbc6224) to head (23c39c1).
Report is 261 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #22627       +/-   ##
=============================================
- Coverage     73.57%   32.47%   -41.10%     
+ Complexity    32624       87    -32537     
=============================================
  Files          1877     1745      -132     
  Lines        139502   133886     -5616     
  Branches      15299    14673      -626     
=============================================
- Hits         102638    43482    -59156     
- Misses        28908    84033    +55125     
+ Partials       7956     6371     -1585     
Flag Coverage Δ
inttests 27.52% <43.13%> (+2.94%) ⬆️
systests 24.80% <5.88%> (+0.47%) ⬆️
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
.../service/SystemTopicBasedTopicPoliciesService.java 52.26% <100.00%> (-21.93%) ⬇️
...he/pulsar/metadata/impl/AbstractMetadataStore.java 55.31% <100.00%> (-29.25%) ⬇️
...ulsar/broker/resources/LocalPoliciesResources.java 44.82% <0.00%> (-31.04%) ⬇️
...he/pulsar/broker/resources/NamespaceResources.java 53.90% <25.00%> (-26.25%) ⬇️
...apache/pulsar/broker/resources/TopicResources.java 39.58% <33.33%> (-57.03%) ⬇️
.../org/apache/pulsar/metadata/api/MetadataStore.java 60.00% <50.00%> (-20.00%) ⬇️
.../apache/pulsar/broker/resources/BaseResources.java 55.85% <50.00%> (-16.88%) ⬇️
...pache/pulsar/broker/admin/impl/NamespacesBase.java 15.22% <0.00%> (-57.86%) ⬇️

... and 1517 files with indirect coverage changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants