Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improve][storage] Periodically rollover Cursor ledgers #22622

Merged
merged 2 commits into from
May 1, 2024

Conversation

eolivelli
Copy link
Contributor

Motivation

Each ManagedCursor stores its own status on a BookKeeper ledger and this ledger is rolled over when one of these conditions are met: some time passed or too many entries have been written.

The problem is that the condition is checked only while writing to the ledger: if no consumer sends acknowledgments then the ledger is never rolled over.
This may become a problem because there will be a long lived ledger that doesn't respect the configuration set by the administrator for "ledger rollover".
Creating a new ledger allows the Bookies to reclaim space, especially thanks to the GC (Garbage collection) mechanism, and then with the Compaction.
Having a long lived ledger prevents the Bookie to release disk space promplty with the GC, and so only Compaction can release space, but compaction is an expensive operation.
For this reason in some usecases you want to configure a frequent rollover of the ledgers for the topics, and thus you want that such rollover happens at the same frequency also for the cursor ledgers.

Modifications

This patch simply allows the broker to periodically check every ManagedCursor and apply the rollover configuration even when there is no consumer activity.

Verifying this change

This change added tests

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: eolivelli#26

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Apr 30, 2024
@thetumbled
Copy link
Member

LGTM.
I agree that we should find out the long live ledgers and shorten their lifecycle, so that we do not need to speed lots of effort on compaction.
According to my finding, the space reclaimed by compaction take up a large part of the total space reclaimed, which will spend lots of I/O resource. We should increase the ratio of space reclaimed by deleting the entire entry log.

@eolivelli eolivelli self-assigned this Apr 30, 2024
@eolivelli eolivelli added ready-to-test type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages area/broker labels Apr 30, 2024
@eolivelli eolivelli added this to the 3.3.0 milestone Apr 30, 2024
Copy link
Member

@michaeljmarshall michaeljmarshall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eolivelli eolivelli merged commit 084daf0 into apache:master May 1, 2024
54 of 56 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/broker doc-not-needed Your PR changes do not impact docs ready-to-test type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants