
[improve] [broker] replace HashMap with inner implementation ConcurrentLongLongPairHashMap in Negative Ack Tracker. #23582

Merged
merged 6 commits into apache:master on Nov 13, 2024

Conversation

thetumbled
Member

@thetumbled thetumbled commented Nov 11, 2024

Motivation

The negative ack feature needs to retain the message ID and timestamp of each unacknowledged message in the consumer client's memory, which leads to significant memory consumption.
This PR aims to replace the HashMap with the internal map implementation ConcurrentLongLongPairHashMap to reduce that memory consumption. Although HashMap is faster than ConcurrentLongLongPairHashMap in some cases, the dominant concern here is memory consumption, not speed.

Some test data are listed below:

Experiment 1


    import java.util.HashMap;

    import org.apache.pulsar.client.api.MessageId;
    import org.apache.pulsar.client.impl.MessageIdImpl;
    import org.apache.pulsar.common.util.collections.ConcurrentLongLongPairHashMap;

    public class MemoryComparison {
        public static void main(String[] args) {
            ConcurrentLongLongPairHashMap map1 = ConcurrentLongLongPairHashMap.newBuilder()
                    .autoShrink(true)
                    .concurrencyLevel(16)
                    .build();
            HashMap<MessageId, Long> map2 = new HashMap<>();
            long numMessages = 5000000;
            long ledgerId, entryId, partitionIndex, timestamp;
            for (long i = 0; i < numMessages; i++) {
                ledgerId = 10000 + i;
                entryId = i;
                partitionIndex = 0;
                timestamp = System.currentTimeMillis();
                // Store the same (ledgerId, entryId) -> timestamp entry in both maps.
                map1.put(ledgerId, entryId, partitionIndex, timestamp);
                map2.put(new MessageIdImpl(ledgerId, entryId, (int) partitionIndex), timestamp);
            }
            System.out.println("map1 size: " + map1.size());
            System.out.println("map2 size: " + map2.size());
            // Keep the process alive so heap usage can be inspected with a profiler.
            try {
                Thread.sleep(10000000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
  • 1,000,000 entries:
    HashMap: 178 MB
    ConcurrentLongLongPairHashMap: 64 MB

  • 5,000,000 entries:
    HashMap: 566 MB
    ConcurrentLongLongPairHashMap: 256 MB

  • 10,000,000 entries:
    HashMap: 1132 MB (approximately 1132 MB / 10,000,000 ≈ 118 bytes per entry)
    ConcurrentLongLongPairHashMap: 512 MB (approximately 512 MB / 10,000,000 ≈ 53 bytes per entry, close to the raw 32 bytes of the four stored longs plus the empty-slot overhead of the open-addressing table)

With this improvement, we can reduce memory consumption by more than 50%!

Experiment 2

This experiment compares three candidate data structures:

  • HashMap<LongPair, Long>
    org.apache.pulsar.common.util.collections.ConcurrentLongPairSet.LongPair
  • HashMap<LongLongPair, Long>
    it.unimi.dsi.fastutil.longs.LongLongPair
  • ConcurrentLongLongPairHashMap

Test code:

    import java.util.HashMap;

    import it.unimi.dsi.fastutil.longs.LongLongPair;
    import org.apache.pulsar.common.util.collections.ConcurrentLongLongPairHashMap;
    import org.apache.pulsar.common.util.collections.ConcurrentLongPairSet.LongPair;

    public class MemoryComparison2 {
        public static void main(String[] args) {
            ConcurrentLongLongPairHashMap map1 = ConcurrentLongLongPairHashMap.newBuilder()
                    .autoShrink(true)
                    .concurrencyLevel(16)
                    .build();
            HashMap<LongPair, Long> map4 = new HashMap<>();
            HashMap<LongLongPair, Long> map5 = new HashMap<>();
            long numMessages = 5000000, numLedgers = 100;
            long numEntries = numMessages / numLedgers;
            long ledgerId, entryId, partitionIndex, timestamp;
            for (long i = 0; i < numLedgers; i++) {
                ledgerId = 10000 + i;
                for (long j = 0; j < numEntries; j++) {
                    entryId = 10000 + j;
                    partitionIndex = 0;
                    timestamp = System.currentTimeMillis();
                    // Store the same (ledgerId, entryId) -> timestamp entry in all three maps.
                    map1.put(ledgerId, entryId, partitionIndex, timestamp);
                    map4.put(new LongPair(ledgerId, entryId), timestamp);
                    map5.put(LongLongPair.of(ledgerId, entryId), timestamp);
                }
            }

            System.out.println("map1 size: " + map1.size());
            System.out.println("map4 size: " + map4.size());
            System.out.println("map5 size: " + map5.size());
            // Keep the process alive so heap usage can be inspected with a profiler.
            try {
                Thread.sleep(10000000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

The results are as follows:

  • 1,000,000 entries:
    HashMap<LongPair, Long>: 91 MB
    HashMap<LongLongPair, Long>: 114 MB
    ConcurrentLongLongPairHashMap: 64 MB

  • 5,000,000 entries:
    HashMap<LongPair, Long>: 451 MB
    HashMap<LongLongPair, Long>: 566 MB
    ConcurrentLongLongPairHashMap: 256 MB

These results show that ConcurrentLongLongPairHashMap is still the best option for storing an enormous number of entries.

Modifications

Replace HashMap with ConcurrentLongLongPairHashMap in Negative Ack Tracker.
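
For context, here is a minimal sketch of what the replacement can look like inside the tracker; the class and method names below are illustrative, not the exact PR diff. The map is keyed by (ledgerId, entryId), and the two value slots carry the partition index and the redelivery timestamp as raw longs.

    import org.apache.pulsar.client.impl.MessageIdImpl;
    import org.apache.pulsar.common.util.collections.ConcurrentLongLongPairHashMap;

    class NackStorageSketch {
        // Before: HashMap<MessageId, Long>, which boxes every key and value.
        // After: a single map keyed by (ledgerId, entryId) whose two value
        // slots hold the partition index and the redelivery timestamp.
        private final ConcurrentLongLongPairHashMap nackedMessages =
                ConcurrentLongLongPairHashMap.newBuilder()
                        .autoShrink(true) // release memory after bulk removals
                        .build();

        void track(MessageIdImpl msgId, long redeliveryTimeMs) {
            nackedMessages.put(msgId.getLedgerId(), msgId.getEntryId(),
                    msgId.getPartitionIndex(), redeliveryTimeMs);
        }

        boolean untrack(MessageIdImpl msgId) {
            return nackedMessages.remove(msgId.getLedgerId(), msgId.getEntryId());
        }
    }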

Verifying this change

  • Make sure that the change passes the CI checks.

This change is already covered by existing tests.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: thetumbled#63


@lhotari
Member

lhotari commented Nov 12, 2024

HashMap<LongLongPair, Long>

btw. In Fastutil, there's also the Object2LongMap interface, which would be applicable in this case since the value is a long, for example using the Object2LongOpenHashMap implementation. Object2LongOpenHashMap has a trim method to reduce its size. I guess the benefit of ConcurrentLongLongPairHashMap is that it has the auto-shrink feature.
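
For reference, a minimal sketch of that alternative; the String key below is only an illustrative stand-in for whatever key type the tracker would use:

    import it.unimi.dsi.fastutil.objects.Object2LongOpenHashMap;

    public class TrimSketch {
        public static void main(String[] args) {
            // Values are stored as primitive longs, so there is no Long boxing.
            Object2LongOpenHashMap<String> timestamps = new Object2LongOpenHashMap<>();
            timestamps.put("10000:0", System.currentTimeMillis());
            timestamps.removeLong("10000:0");
            // No auto-shrink: after removals, the backing arrays must be
            // trimmed manually to give memory back.
            timestamps.trim();
        }
    }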

@thetumbled
Member Author

In Fastutil, there's also the Object2LongMap interface which would be applicable in this case when the value is a long, for example using the Object2LongOpenHashMap implementation. In Object2LongOpenHashMap, there's a trim method to reduce the size.

No, no shrink logic is triggered in the test code: I only add new items to the maps, without any deletion, and shrinking is only triggered by item deletion.
The reason ConcurrentLongLongPairHashMap is space efficient is that it uses open addressing with linear probing, which requires less space to implement, while HashMap needs extra space for its node-based structure; also, there are no wrapper objects in ConcurrentLongLongPairHashMap.
As for Object2LongOpenHashMap, I guess it takes up more space than ConcurrentLongLongPairHashMap too, since it wraps its keys in objects, whereas ConcurrentLongLongPairHashMap stores everything as raw longs.
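
To illustrate the layout argument, here is an illustrative-only sketch of open addressing with linear probing over a flat primitive array; it mirrors the idea behind ConcurrentLongLongPairHashMap, not its actual code:

    public class OpenAddressingSketch {
        // Each logical entry occupies four consecutive longs in one flat
        // array: [key1, key2, value1, value2]. There are no Entry nodes,
        // boxed Longs, or per-entry references at all.
        private static final int CAPACITY = 1024; // power of two
        private final long[] table = new long[CAPACITY * 4];

        // Linear probing: start at the hashed bucket and walk forward until
        // the matching key or an empty slot is found (key1 == 0 serves as
        // the empty marker in this sketch).
        int findSlot(long key1, long key2) {
            int bucket = (int) ((key1 ^ key2) & (CAPACITY - 1));
            while (table[bucket * 4] != 0
                    && (table[bucket * 4] != key1 || table[bucket * 4 + 1] != key2)) {
                bucket = (bucket + 1) & (CAPACITY - 1);
            }
            return bucket * 4; // index of key1 within the flat array
        }
    }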

@thetumbled thetumbled merged commit 9d65a85 into apache:master Nov 13, 2024
49 of 52 checks passed
lhotari pushed a commit that referenced this pull request Nov 13, 2024
…ntLongLongPairHashMap in Negative Ack Tracker. (#23582)

(cherry picked from commit 9d65a85)
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 20, 2024
…ntLongLongPairHashMap in Negative Ack Tracker. (apache#23582)

(cherry picked from commit 9d65a85)
(cherry picked from commit 431c232)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 21, 2024
…ntLongLongPairHashMap in Negative Ack Tracker. (apache#23582)

(cherry picked from commit 9d65a85)
(cherry picked from commit 431c232)