
[managed-ledger] Do not send duplicate reads to BK/offloaders #17241

Merged

Conversation

@eolivelli (Contributor) commented Aug 23, 2022

Motivation

When you have many subscriptions on the same topic and they are catching up (for instance, after some downtime of the consumer application), Pulsar will perform many reads from BK for the same entries.

Modifications

Prevent concurrent reads of the same entries.
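The core idea can be sketched as follows. This is a hedged, minimal illustration, not the PR's actual code — the class, field, and key names here are hypothetical (the PR uses a similar PendingReadKey): concurrent readers asking for the same entry range attach to a single in-flight read instead of each issuing a separate read to BookKeeper.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of de-duplicating concurrent reads of the same range.
class PendingReadsSketch {
    // Hypothetical key; the PR uses a similar PendingReadKey.
    record Key(long ledgerId, long firstEntry, long lastEntry) {}

    final AtomicInteger storageReads = new AtomicInteger(); // for illustration only
    private final Map<Key, CompletableFuture<List<String>>> pending = new ConcurrentHashMap<>();

    CompletableFuture<List<String>> read(Key key) {
        // computeIfAbsent guarantees only the first caller triggers the real read;
        // every other caller for the same range gets the same future.
        return pending.computeIfAbsent(key, k -> {
            storageReads.incrementAndGet();
            CompletableFuture<List<String>> f = new CompletableFuture<>();
            // a real implementation would now issue the BookKeeper read and
            // remove the map entry once the read completes
            return f;
        });
    }
}
```

In the real code the map entry is removed once the read completes, so later readers fall through to the entry cache instead of an already-finished pending read.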

Verifying this change

This change is already covered by existing tests.

I also verified manually that this fix greatly reduces the pressure on the bookies. This is something we cannot encode in an integration test, as it would use too many resources on CI.

In tests with only 64 subscriptions, we saw BK reads drop from 100K reads/s to 250 reads/s.

@eolivelli eolivelli added this to the 2.12.0 milestone Aug 23, 2022
@eolivelli eolivelli self-assigned this Aug 23, 2022
@eolivelli eolivelli requested a review from lhotari August 23, 2022 15:53
@lhotari lhotari requested a review from hangc0276 August 24, 2022 04:57
@lhotari (Member) left a comment

LGTM, Good work @eolivelli

@nicoloboschi (Contributor) left a comment

+1

@eolivelli eolivelli changed the title [managed-ledger] prevent sending duplicate reads to BK/offloaders [managed-ledger] Do not send duplicate reads to BK/offloaders Aug 24, 2022
@codelipenghui (Contributor) left a comment

I understand the problem this PR wants to resolve, and I think it is a good read-path improvement.

I have some questions:

  1. We are using the PendingReadKey to determine whether a new read is a duplicate. That looks like it only covers the case where all subscriptions have the same read position. It would be great to have a more general solution for duplicate reads that can also handle overlapping ranges such as [0, 20] and [2, 22].
  2. After this change, even when subscriptions have different read positions (e.g. sub-a at 1:10, sub-b at 3:20), we still introduce new heap memory overhead.

I haven't thought of a specific approach yet, but I think it is worth discussing a general solution for duplicate reads.

I think we can start the discussion on the mailing list, so that more contributors can share their ideas.

@zymap (Member) left a comment

Great work!

@zymap (Member) commented Aug 25, 2022

  1. We are using the PendingReadKey to determine whether a new read is a duplicate. That looks like it only covers the case where all subscriptions have the same read position. It would be great to have a more general solution for duplicate reads that can also handle overlapping ranges such as [0, 20] and [2, 22].

I have an idea about this: split the read range into small pieces, so that different reads can share the pieces to construct their result. There is a tradeoff: the smaller the pieces, the more the different reads can share.
Something like this:
[image: diagram of reads sharing smaller range pieces]
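The splitting idea could be sketched roughly like this (a hypothetical helper, not code from the PR): align every read range to fixed-size pieces, so overlapping reads such as [0, 20] and [2, 22] resolve to mostly the same pieces and can share the underlying reads.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of aligning a requested entry range to fixed-size pieces.
class RangePieces {
    // Returns the aligned pieces [start, end] covering the requested range.
    static List<long[]> pieces(long first, long last, long pieceSize) {
        List<long[]> out = new ArrayList<>();
        long aligned = (first / pieceSize) * pieceSize; // align down to a piece boundary
        for (long s = aligned; s <= last; s += pieceSize) {
            out.add(new long[] {s, s + pieceSize - 1});
        }
        return out;
    }
}
```

With a piece size of 8, both [0, 20] and [2, 22] map to the same three pieces, so the two reads could share all of their storage reads; smaller pieces increase sharing at the cost of more bookkeeping.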

@Jason918 (Contributor) left a comment

LGTM

@eolivelli (Author)

@zymap thank you for your idea.
I am also thinking about the same approach, in order to allow more chances to match the same range.
There is an open discussion on the mailing list; maybe you can post your idea there.

@eolivelli (Author)

@codelipenghui @Jason918 @zymap @lhotari @nicoloboschi
I have updated this patch.

We did more testing, and it turns out that looking for pending reads whose ranges "include" the requested range gives us many more "hits".

[image: match-rate graph]

In the picture:

  • the green line is the rate of "perfect matches" (the previous version of the patch)
  • the blue line is the rate of "partial matches" (excluding the perfect matches)
  • the yellow line is the rate of "no pending read to attach to"

The rate of "partial matches" is usually higher than or equal to the rate of "misses", both during tailing reads and catch-up reads.

Comment on lines 449 to 450
AtomicBoolean createdByThisThread = new AtomicBoolean();
CachedPendingRead cachedPendingRead = findBestCandidate(key,
Contributor left a comment

I think we can just make findBestCandidate() return null when there is no candidate, and then create a new one to continue the read operation.

That way we don't need AtomicBoolean createdByThisThread = new AtomicBoolean(); and the code is easier to read.

Contributor (Author) left a comment

findBestCandidate is kind of a computeIfAbsent method (before I started looking for "includes", it actually was computeIfAbsent on a ConcurrentHashMap).
So I have to create the object and put it into the map inside the "lock".

Contributor left a comment

Yes, I just want to avoid creating an AtomicBoolean here.
Can we use the callback size of the cachedPendingRead, or a boolean inside CachedPendingRead?

return result;
}

public void attach(CompletableFuture<List<EntryImpl>> handle) {
Contributor left a comment

Do we need to remove the CachedPendingRead from the cachedPendingReads?


}
}

@Override
public void clear() {
Pair<Integer, Long> removedPair = entries.clear();
manager.entriesRemoved(removedPair.getRight(), removedPair.getLeft());
cachedPendingReads.clear();
Contributor left a comment

I noticed the elements of cachedPendingReads are only removed here, and this method is only called when all the cached data should be cleaned up.

It's better to add a unit test for this to make sure we don't introduce a heap memory leak.

Contributor (Author) left a comment

The elements here are one entry per ledger; the per-ledger maps are evicted here:
https://github.com/apache/pulsar/pull/17241/files#diff-c55509e3ab1389d89a58fd564f2e318dbb95f50121ab33c729a7ca4a21d02ef1R340

We could remove the entry for a ledger in case of rollover, but I thought it requires more coordination. I will try to improve this.

Contributor left a comment

> We could remove the entry for a ledger in case of rollover, but I thought it requires more coordination. I will try to improve this.

Yes, we are on the same page.


private CachedPendingRead findBestCandidate(PendingReadKey key, Map<PendingReadKey, CachedPendingRead> ledgerCache,
AtomicBoolean created) {
synchronized (ledgerCache) {
Contributor left a comment

We can move the synchronized to the method.

Contributor (Author) left a comment

I want to synchronize only on the Map for the specific ledger, not on the whole RangeEntryCacheImpl.
This way the lock is more fine-grained.
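The fine-grained locking being described can be sketched as follows (illustrative names, not the PR's actual classes, assuming a per-ledger map guarded by its own monitor): the lock is the per-ledger map itself, so readers of different ledgers never contend on the same lock, unlike synchronizing on the whole cache object.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of per-ledger locking instead of one global lock.
class PerLedgerLocking {
    private final Map<Long, Map<String, Object>> byLedger = new ConcurrentHashMap<>();

    Object findOrCreate(long ledgerId, String rangeKey) {
        Map<String, Object> ledgerCache =
                byLedger.computeIfAbsent(ledgerId, id -> new HashMap<>());
        synchronized (ledgerCache) { // only readers of this ledger block here
            return ledgerCache.computeIfAbsent(rangeKey, k -> new Object());
        }
    }
}
```

Two callers with the same ledger and range get the same object; callers on a different ledger take a different monitor entirely, so they never wait on each other.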

Contributor left a comment

Oh, sorry. I read it wrong, I thought it was equal to synchronized(this).

Comment on lines 361 to 363
List<EntryImpl> copy = new ArrayList<>(entriesToReturn.size());
long callbackStartEntry = callback.startEntry;
long callbackEndEntry = callback.endEntry;
Contributor left a comment

Looks like we can change this to:

long callbackStartEntry = callback.startEntry;
long callbackEndEntry = callback.endEntry;
List<EntryImpl> copy = new ArrayList<>((int) (callbackEndEntry - callbackStartEntry + 1));

Contributor (Author) left a comment

Good point.

}, ml.getExecutor().chooseThread(ml.getName())).exceptionally(exception -> {
synchronized (CachedPendingRead.this) {
for (ReadEntriesCallbackWithContext callback : callbacks) {
if (exception instanceof BKException
Contributor left a comment

We should unwrap the exception

Contributor left a comment

And it looks like we can just use createManagedLedgerException directly, without the if check here.

    public static ManagedLedgerException createManagedLedgerException(Throwable t) {
        if (t instanceof org.apache.bookkeeper.client.api.BKException) {
            return createManagedLedgerException(((org.apache.bookkeeper.client.api.BKException) t).getCode());
        } else if (t instanceof CompletionException
                && !(t.getCause() instanceof CompletionException) /* check to avoid stack overflow */) {
            return createManagedLedgerException(t.getCause());
        } else {
            log.error("Unknown exception for ManagedLedgerException.", t);
            return new ManagedLedgerException("Other exception", t);
        }
    }

It already handles CompletionException and TooManyRequestsException.

Contributor (Author) left a comment

Sure, thanks. The previous code had an "invalidateLedger" that has been moved.

@eolivelli eolivelli force-pushed the impl/master-skip-duplicate-bk-reads branch from 5d07b94 to 64b8101 Compare August 30, 2022 12:33
@eolivelli (Author)

@zymap @codelipenghui @Jason918 @lhotari
I have added one last step to the patch and now it is really ready:

  • added the ability to reuse pending reads that partially overlap with the requested range
  • refactored the new code to a dedicated class, in order to reduce the complexity of the code (and to ease testability)

I tried something more sophisticated, like reusing overlapping pending reads only when the number of overlapping entries is big enough, but in my testing it seems that it is always a good idea to reuse pending reads.
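The two kinds of matches discussed in this thread can be illustrated with simple range predicates (hypothetical helpers, not the PR's actual code): a pending read can fully serve a request when its range includes the requested one, and can partially serve it whenever the two ranges overlap at all.

```java
// Illustrative predicates for matching a request [rStart, rEnd]
// against a pending read [pStart, pEnd].
class RangeMatch {
    static boolean includes(long pStart, long pEnd, long rStart, long rEnd) {
        return pStart <= rStart && rEnd <= pEnd; // perfect or "included" match
    }

    static boolean overlaps(long pStart, long pEnd, long rStart, long rEnd) {
        return pStart <= rEnd && rStart <= pEnd; // a partial match is possible
    }
}
```

For example, a pending read of [0, 20] includes a request for [2, 18], merely overlaps a request for [2, 22] (the remainder would need its own read), and is unrelated to [21, 30].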

@eolivelli (Author)

@Jason918 thanks for your suggestions. I have applied them.

@zymap (Member) left a comment

Nice improvement! Looks good to me.

Do we need to add a unit test for the PendingReadsManager?

@eolivelli (Author) commented Sep 1, 2022

@zymap

Do we need to add a unit test for the PendingReadsManager?

I haven't pushed this part of my patch. I will update this PR soon. Thanks for the reminder.

@eolivelli
Copy link
Contributor Author

@zymap tests added

@eolivelli eolivelli merged commit 3a3a993 into apache:master Sep 2, 2022
@eolivelli eolivelli deleted the impl/master-skip-duplicate-bk-reads branch September 2, 2022 13:09
@Technoboy- Technoboy- modified the milestones: 2.12.0, 2.11.0 Oct 13, 2022
@Technoboy- Technoboy- added type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages area/broker labels Oct 13, 2022
nodece added a commit to nodece/pulsar that referenced this pull request Sep 10, 2024
…#17241)

(cherry picked from commit 3a3a993)
Signed-off-by: Zixuan Liu <nodeces@gmail.com>