
fix: speed up ConcurrentHashMap#computeIfAbsent of JDK8 #1245

Closed
wants to merge 1 commit into from


SteNicholas
Member

@SteNicholas SteNicholas commented Jan 9, 2025

Which issue does this PR close?

Closes #1244.

Rationale for this change

Comet supports JDK 8, which is affected by the performance bug described in JDK-8161372. Therefore, we can check whether the key already exists before invoking computeIfAbsent.

What changes are included in this PR?

Introduce ConcurrentHashMapForJDK8, which checks whether the key exists before calling computeIfAbsent, to work around JDK-8161372 and speed up ConcurrentHashMap#computeIfAbsent.
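A minimal sketch of what such a class could look like, assuming it simply subclasses ConcurrentHashMap and overrides computeIfAbsent with a get-first fast path. The body below is illustrative, not the code in this PR. Because ConcurrentHashMap never stores null values, a null from get reliably means the key is absent.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative sketch only; the actual class in the PR may differ.
class ConcurrentHashMapForJDK8<K, V> extends ConcurrentHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    @Override
    public V computeIfAbsent(K key, Function<? super K, ? extends V> mappingFunction) {
        // Fast path: get() does not lock, so a key that is already present never
        // reaches the bin lock that JDK 8's computeIfAbsent takes (JDK-8161372).
        V value = get(key);
        if (value != null) {
            return value;
        }
        // Slow path: keep computeIfAbsent's atomic "apply at most once" guarantee.
        return super.computeIfAbsent(key, mappingFunction);
    }

    public static void main(String[] args) {
        ConcurrentHashMapForJDK8<String, Integer> map = new ConcurrentHashMapForJDK8<>();
        System.out.println(map.computeIfAbsent("a", k -> 1)); // prints 1
        System.out.println(map.computeIfAbsent("a", k -> 2)); // prints 1, the existing value wins
    }
}
```

Delegating the miss case to super.computeIfAbsent keeps the original atomicity; only the hit path changes.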

Backport apache/incubator-uniffle#519.

How are these changes tested?

CI.

@SteNicholas
Member Author

Ping @andygrove.

@codecov-commenter

Codecov Report

Attention: Patch coverage is 33.33333% with 8 lines in your changes missing coverage. Please review.

Project coverage is 34.68%. Comparing base (ca7b4a8) to head (3a060d1).
Report is 2 commits behind head on main.

Files with missing lines                                  Patch %   Lines
...src/main/java/org/apache/comet/util/JavaUtils.java     11.11%    7 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1245      +/-   ##
============================================
- Coverage     34.69%   34.68%   -0.01%     
- Complexity      991      992       +1     
============================================
  Files           116      117       +1     
  Lines         44885    44895      +10     
  Branches       9863     9864       +1     
============================================
+ Hits          15572    15574       +2     
- Misses        26165    26172       +7     
- Partials       3148     3149       +1     


@andygrove
Member

Thanks for the contribution, @SteNicholas, but I am not convinced this helps with performance. We only have a single use of computeIfAbsent and the "compute" part is just instantiating a class, so it should not block. Do I understand this correctly?

taskIdMapsForShuffle.computeIfAbsent(handle.shuffleId, _ => new OpenHashSet[Long](16))

@SteNicholas
Member Author

SteNicholas commented Jan 10, 2025

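@andygrove, when using ConcurrentHashMap on Java 8, pay attention to whether computeIfAbsent will be called concurrently on the same key. JDK 8's computeIfAbsent locks the hash bin even when the key is already present (JDK-8161372), so concurrent callers contend on that lock no matter how cheap the mapping function is. If that is the case, call get first and only fall back to computeIfAbsent when the key is missing; see apache/shardingsphere#13275.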
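The same advice can be applied at a call site without introducing a new map type. The sketch below uses a hypothetical `getOrCompute` helper and mirrors the shape of `taskIdMapsForShuffle` from the snippet above, with `java.util.HashSet` standing in for Spark's `OpenHashSet` (both are assumptions for illustration). It also checks that the get-first fast path keeps computeIfAbsent's at-most-once guarantee under 16 contending threads.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class GetFirst {
    // Hypothetical helper: non-locking get() first, atomic computeIfAbsent() only on a miss.
    static <K, V> V getOrCompute(ConcurrentMap<K, V> map, K key,
                                 Function<? super K, ? extends V> fn) {
        V value = map.get(key); // ConcurrentHashMap never stores null, so null means absent
        return value != null ? value : map.computeIfAbsent(key, fn);
    }

    public static void main(String[] args) throws InterruptedException {
        // Shuffle id -> task ids, shaped like taskIdMapsForShuffle (assumption).
        ConcurrentMap<Integer, Set<Long>> taskIdMapsForShuffle = new ConcurrentHashMap<>();
        AtomicInteger created = new AtomicInteger();
        Thread[] threads = new Thread[16];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) {
                    // Every thread asks for the same shuffle id, as in the benchmark.
                    getOrCompute(taskIdMapsForShuffle, 0, id -> {
                        created.incrementAndGet();
                        return new HashSet<>(16);
                    });
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // computeIfAbsent applies the mapping function at most once per key.
        System.out.println("sets created: " + created.get()); // prints "sets created: 1"
    }
}
```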

BTW, here is the JMH benchmark:

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@Fork(3)
@Warmup(iterations = 3, time = 5)
@Measurement(iterations = 3, time = 5)
@Threads(16)
@State(Scope.Benchmark)
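public class ConcurrentHashMapBenchmark {

    // A single key, so all 16 threads hit the same hash bin.
    private static final String KEY = "key";

    private static final Object VALUE = new Object();

    // initialCapacity = 1, loadFactor = 1
    private final Map<String, Object> concurrentMap = new ConcurrentHashMap<>(1, 1);

    // Start every measurement iteration with an empty map.
    @Setup(Level.Iteration)
    public void setup() {
        concurrentMap.clear();
    }

    // get() first; fall back to computeIfAbsent only when the key is missing.
    @Benchmark
    public Object benchGetBeforeComputeIfAbsent() {
        Object result = concurrentMap.get(KEY);
        if (null == result) {
            result = concurrentMap.computeIfAbsent(KEY, __ -> VALUE);
        }
        return result;
    }

    // computeIfAbsent on every call; on JDK 8 this locks the bin even when KEY is present.
    @Benchmark
    public Object benchComputeIfAbsent() {
        return concurrentMap.computeIfAbsent(KEY, __ -> VALUE);
    }
}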
  • JDK 8: The two approaches differ by more than two orders of magnitude. Calling computeIfAbsent directly reaches about 8.8 million ops/s, while calling get first reaches about 2.4 billion ops/s, both with 16 threads. In terms of resources, CPU utilization stayed around 20% during the benchComputeIfAbsent run, but around 100% during the benchGetBeforeComputeIfAbsent run.
# JMH version: 1.33
# VM version: JDK 1.8.0_311, Java HotSpot(TM) 64-Bit Server VM, 25.311-b11
# VM invoker: /usr/local/java/jdk1.8.0_311/jre/bin/java
# VM options: -Dvisualvm.id=172855224679674
# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
# Warmup: 3 iterations, 5 s each
# Measurement: 3 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: ConcurrentHashMapBenchmark.benchComputeIfAbsent

# Run progress: 0.00% complete, ETA 00:03:00
# Fork: 1 of 3
# Warmup Iteration   1: 11173878.242 ops/s
# Warmup Iteration   2: 8471364.065 ops/s
# Warmup Iteration   3: 8766401.960 ops/s
Iteration   1: 8776260.796 ops/s
Iteration   2: 8632907.974 ops/s
Iteration   3: 8557264.788 ops/s

# Run progress: 16.67% complete, ETA 00:02:33
# Fork: 2 of 3
# Warmup Iteration   1: 7757506.431 ops/s
# Warmup Iteration   2: 8176991.807 ops/s
# Warmup Iteration   3: 8795107.589 ops/s
Iteration   1: 8668883.337 ops/s
Iteration   2: 8866318.073 ops/s
Iteration   3: 8848517.540 ops/s

# Run progress: 33.33% complete, ETA 00:02:02
# Fork: 3 of 3
# Warmup Iteration   1: 8154698.571 ops/s
# Warmup Iteration   2: 8317945.491 ops/s
# Warmup Iteration   3: 8884286.732 ops/s
Iteration   1: 8912555.062 ops/s
Iteration   2: 8894750.001 ops/s
Iteration   3: 8780504.227 ops/s


Result "ConcurrentHashMapBenchmark.benchComputeIfAbsent":
  8770884.644 ±(99.9%) 210678.797 ops/s [Average]
  (min, avg, max) = (8557264.788, 8770884.644, 8912555.062), stdev = 125371.573
  CI (99.9%): [8560205.847, 8981563.442] (assumes normal distribution)


# JMH version: 1.33
# VM version: JDK 1.8.0_311, Java HotSpot(TM) 64-Bit Server VM, 25.311-b11
# VM invoker: /usr/local/java/jdk1.8.0_311/jre/bin/java
# VM options: -Dvisualvm.id=172855224679674
# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
# Warmup: 3 iterations, 5 s each
# Measurement: 3 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent

# Run progress: 50.00% complete, ETA 00:01:31
# Fork: 1 of 3
# Warmup Iteration   1: 1881091972.510 ops/s
# Warmup Iteration   2: 1843432746.197 ops/s
# Warmup Iteration   3: 2353506882.860 ops/s
Iteration   1: 2389458285.091 ops/s
Iteration   2: 2391001171.657 ops/s
Iteration   3: 2387181602.010 ops/s

# Run progress: 66.67% complete, ETA 00:01:01
# Fork: 2 of 3
# Warmup Iteration   1: 1872514017.315 ops/s
# Warmup Iteration   2: 1855584197.510 ops/s
# Warmup Iteration   3: 2342392977.207 ops/s
Iteration   1: 2378551289.692 ops/s
Iteration   2: 2374081014.168 ops/s
Iteration   3: 2389909613.865 ops/s

# Run progress: 83.33% complete, ETA 00:00:30
# Fork: 3 of 3
# Warmup Iteration   1: 1880210774.729 ops/s
# Warmup Iteration   2: 1804266170.900 ops/s
# Warmup Iteration   3: 2337740394.373 ops/s
Iteration   1: 2363741084.192 ops/s
Iteration   2: 2372565304.724 ops/s
Iteration   3: 2388015878.515 ops/s


Result "ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent":
  2381611693.768 ±(99.9%) 16356182.057 ops/s [Average]
  (min, avg, max) = (2363741084.192, 2381611693.768, 2391001171.657), stdev = 9733301.586
  CI (99.9%): [2365255511.711, 2397967875.825] (assumes normal distribution)


# Run complete. Total time: 00:03:03

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                                                  Mode  Cnt           Score          Error  Units
ConcurrentHashMapBenchmark.benchComputeIfAbsent           thrpt    9     8770884.644 ±   210678.797  ops/s
ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent  thrpt    9  2381611693.768 ± 16356182.057  ops/s
  • JDK 17: computeIfAbsent is somewhat slower than calling get first (about 1.54 billion vs. 2.21 billion ops/s), but both are in the same order of magnitude, and the CPU is fully loaded during both runs.
# JMH version: 1.33
# VM version: JDK 17.0.1, Java HotSpot(TM) 64-Bit Server VM, 17.0.1+12-LTS-39
# VM invoker: /usr/local/java/jdk-17.0.1/bin/java
# VM options: -Dvisualvm.id=173221627574053
# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
# Warmup: 3 iterations, 5 s each
# Measurement: 3 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: ConcurrentHashMapBenchmark.benchComputeIfAbsent

# Run progress: 0.00% complete, ETA 00:03:00
# Fork: 1 of 3
# Warmup Iteration   1: 1544327446.565 ops/s
# Warmup Iteration   2: 1475077923.449 ops/s
# Warmup Iteration   3: 1565544222.606 ops/s
Iteration   1: 1564346089.698 ops/s
Iteration   2: 1560062375.891 ops/s
Iteration   3: 1552569020.412 ops/s

# Run progress: 16.67% complete, ETA 00:02:33
# Fork: 2 of 3
# Warmup Iteration   1: 1617143507.004 ops/s
# Warmup Iteration   2: 1433136907.916 ops/s
# Warmup Iteration   3: 1527623176.866 ops/s
Iteration   1: 1522331660.180 ops/s
Iteration   2: 1524798683.186 ops/s
Iteration   3: 1522686827.744 ops/s

# Run progress: 33.33% complete, ETA 00:02:02
# Fork: 3 of 3
# Warmup Iteration   1: 1671732222.173 ops/s
# Warmup Iteration   2: 1462966231.429 ops/s
# Warmup Iteration   3: 1553792663.545 ops/s
Iteration   1: 1549840468.944 ops/s
Iteration   2: 1549245571.349 ops/s
Iteration   3: 1554801575.735 ops/s


Result "ConcurrentHashMapBenchmark.benchComputeIfAbsent":
  1544520252.571 ±(99.9%) 27953594.118 ops/s [Average]
  (min, avg, max) = (1522331660.180, 1544520252.571, 1564346089.698), stdev = 16634735.479
  CI (99.9%): [1516566658.453, 1572473846.689] (assumes normal distribution)


# JMH version: 1.33
# VM version: JDK 17.0.1, Java HotSpot(TM) 64-Bit Server VM, 17.0.1+12-LTS-39
# VM invoker: /usr/local/java/jdk-17.0.1/bin/java
# VM options: -Dvisualvm.id=173221627574053
# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
# Warmup: 3 iterations, 5 s each
# Measurement: 3 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent

# Run progress: 50.00% complete, ETA 00:01:31
# Fork: 1 of 3
# Warmup Iteration   1: 1813078468.960 ops/s
# Warmup Iteration   2: 1944438216.902 ops/s
# Warmup Iteration   3: 2232703681.960 ops/s
Iteration   1: 2233727123.664 ops/s
Iteration   2: 2233657163.983 ops/s
Iteration   3: 2229008772.953 ops/s

# Run progress: 66.67% complete, ETA 00:01:01
# Fork: 2 of 3
# Warmup Iteration   1: 1767187585.805 ops/s
# Warmup Iteration   2: 1900420998.518 ops/s
# Warmup Iteration   3: 2175122268.840 ops/s
Iteration   1: 2180409680.029 ops/s
Iteration   2: 2181398523.091 ops/s
Iteration   3: 2176454597.329 ops/s

# Run progress: 83.33% complete, ETA 00:00:30
# Fork: 3 of 3
# Warmup Iteration   1: 1822355551.990 ops/s
# Warmup Iteration   2: 1832618832.110 ops/s
# Warmup Iteration   3: 2225265888.631 ops/s
Iteration   1: 2240765668.888 ops/s
Iteration   2: 2225847700.599 ops/s
Iteration   3: 2232257415.965 ops/s


Result "ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent":
  2214836294.056 ±(99.9%) 45190341.578 ops/s [Average]
  (min, avg, max) = (2176454597.329, 2214836294.056, 2240765668.888), stdev = 26892047.412
  CI (99.9%): [2169645952.478, 2260026635.633] (assumes normal distribution)


# Run complete. Total time: 00:03:03

Benchmark                                                  Mode  Cnt           Score          Error  Units
ConcurrentHashMapBenchmark.benchComputeIfAbsent           thrpt    9  1544520252.571 ± 27953594.118  ops/s
ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent  thrpt    9  2214836294.056 ± 45190341.578  ops/s
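For a quick comparison, the average scores from the two summary tables can be turned into throughput ratios. This is plain arithmetic on the numbers above, not a new measurement:

```latex
\begin{aligned}
\text{JDK 8, get first vs. computeIfAbsent:}\quad & \frac{2381611693.768}{8770884.644} \approx 271.5 \\
\text{JDK 17, get first vs. computeIfAbsent:}\quad & \frac{2214836294.056}{1544520252.571} \approx 1.43 \\
\text{computeIfAbsent, JDK 17 vs. JDK 8:}\quad & \frac{1544520252.571}{8770884.644} \approx 176
\end{aligned}
```

The last ratio suggests the bin-locking behavior from JDK-8161372 no longer dominates on JDK 17, although these runs only cover one machine and these two JDK builds.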

@mbutrovich
Contributor

Thanks for raising this issue. I definitely learned something new about JDK 8's performance issue with ConcurrentHashMap, as your microbenchmark demonstrates. However, it's not clear to me that the gain in overall query performance justifies the amount of code being brought in to maintain, and I'm not sure we want to set a precedent of maintaining multiple code paths for different JDK versions.

@SteNicholas
Member Author

@mbutrovich, is it better to close this pull request and reopen if needed?

@parthchandra
Contributor

> @mbutrovich, is it better to close this pull request and reopen if needed?

Personally, I think so. Also, how long do we want to keep supporting JDK 8? (FWIW, Red Hat will support OpenJDK 8 until Nov 2026.)

@andygrove, @kazuyukitanimura, @viirya wdyt?

@andygrove
Member

> > @mbutrovich, is it better to close this pull request and reopen if needed?
>
> Personally, I think so. Also, do we want to continue supporting JDK8 for long (FWIW, redhat will support openjdk 8 till Nov 2026)?
>
> @andygrove, @kazuyukitanimura, @viirya wdyt?

I would say we should drop JDK 8 support if and when it becomes a burden to maintain (I'm not sure it is at the moment).

I will go ahead and close this PR. Thanks @SteNicholas for making us aware of the issue.

Development

Successfully merging this pull request may close these issues.

Speed up ConcurrentHashMap#computeIfAbsent of JDK8
5 participants