[fix][broker] Fix thread unsafe access on the bundle range cache for load manager #23217
…load managers

### Background Knowledge

A concurrent hash map (whether the `ConcurrentOpenHashMap` in Pulsar or the official `ConcurrentHashMap`) is not a synchronized hash map.

For example, given a `ConcurrentHashMap<Integer, Integer>` object `m`,

```java
synchronized (m) {
    m.computeIfAbsent(1, __ -> 100); // [1]
    m.computeIfAbsent(2, __ -> 200); // [2]
}
```

```java
m.computeIfAbsent(1, __ -> 300); // [3]
```

If these two code blocks are called in two threads, `[1]->[3]->[2]` is a possible case.

### Motivation

`SimpleLoadManagerImpl` and `ModularLoadManagerImpl` both maintain the bundle range cache:

```java
// The 1st key is broker, the 2nd key is namespace
private final ConcurrentOpenHashMap<String, ConcurrentOpenHashMap<String, ConcurrentOpenHashSet<String>>>
        brokerToNamespaceToBundleRange;
```

However, accessing the `namespace -> bundle` map still needs a synchronized code block:

https://github.com/apache/pulsar/blob/1c495e190b3c569e9dfd44acef2a697c93a1f771/pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/impl/ModularLoadManagerImpl.java#L591-L595

The code above is a composite operation of `clear()` and multiple `computeIfAbsent` operations on the `ConcurrentOpenHashMap<String, ConcurrentOpenHashSet<String>>` object. So the other places that access this map also require the same lock, even if the operation itself is thread safe:

https://github.com/apache/pulsar/blob/1c495e190b3c569e9dfd44acef2a697c93a1f771/pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/impl/ModularLoadManagerImpl.java#L882-L886

P.S. `SimpleLoadManagerImpl` does not apply the synchronized block.

However, the static methods of `LoadManagerShared` access `brokerToNamespaceToBundleRange` without synchronization. So the access to the `Map<String, Set<String>>` value is not thread safe.

### Modifications

Add a `BundleRangeCache` abstraction to provide some methods to support required operations on the bundle range cache.
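The `[1]->[3]->[2]` interleaving from the Background Knowledge section can be reproduced deterministically. The following is a minimal sketch, not code from this PR: the class name `InterleavingDemo` and the latches are assumptions added for illustration. One thread holds the monitor of `m` between `[1]` and `[2]`, and the other thread runs `[3]` without taking that monitor.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Pulsar.
class InterleavingDemo {
    /** Runs the two code blocks and returns the order in which [1], [2] and [3] completed. */
    static List<String> run() {
        ConcurrentHashMap<Integer, Integer> m = new ConcurrentHashMap<>();
        List<String> order = new CopyOnWriteArrayList<>();
        CountDownLatch firstDone = new CountDownLatch(1);
        CountDownLatch thirdDone = new CountDownLatch(1);

        Thread locker = new Thread(() -> {
            synchronized (m) {
                m.computeIfAbsent(1, k -> 100); // [1]
                order.add("[1]");
                firstDone.countDown();
                try {
                    // Keep holding the monitor of m while the other thread runs [3].
                    thirdDone.await(5, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                m.computeIfAbsent(2, k -> 200); // [2]
                order.add("[2]");
            }
        });
        locker.start();
        try {
            firstDone.await(5, TimeUnit.SECONDS);
            // No synchronized (m) here, so this call never waits for the monitor held above.
            m.computeIfAbsent(1, k -> 300); // [3]
            order.add("[3]");
            thirdDone.countDown();
            locker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The `synchronized (m)` block only excludes other code that also synchronizes on `m`. A composite operation is therefore protected only if every access path takes the same lock.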
@BewareMyPower By the way, this is not quite the same problem, but it might be useful information: some map implementations don't have an atomic `computeIfAbsent` implementation, for example `ConcurrentSkipListMap`. Issue #21301 was about that.
A more accurate description is that the … Given the following example:

Thread 1:

    final var result1 = map.computeIfAbsent("key", __ -> {
        System.out.println("value1");
        return "value1";
    });

Thread 2:

    final var result2 = map.computeIfAbsent("key", __ -> {
        System.out.println("value2");
        return "value2";
    });

There is a case that both "value1" and "value2" are printed but … However, it's allowed because the Java Language Specification only guarantees the happens-before relationship on concurrent collections, see https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/package-summary.html#Weakly
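The case where both mapping functions run can be forced rather than waited for. This is a hedged sketch, not code from this PR: the class name `SkipListComputeIfAbsentDemo` and the latch are assumptions. It relies on `ConcurrentSkipListMap.computeIfAbsent` not holding a lock while the mapping function runs, which is how OpenJDK behaves and what its Javadoc permits. Each function waits until the other has also started, so both are invoked, yet both callers end up with the same value.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of Pulsar.
class SkipListComputeIfAbsentDemo {
    static final AtomicInteger invocations = new AtomicInteger();
    static volatile String result1;
    static volatile String result2;

    static void run() {
        invocations.set(0);
        ConcurrentSkipListMap<String, String> map = new ConcurrentSkipListMap<>();
        CountDownLatch bothInside = new CountDownLatch(2);

        Thread t1 = new Thread(() ->
                result1 = map.computeIfAbsent("key", k -> compute("value1", bothInside)));
        Thread t2 = new Thread(() ->
                result2 = map.computeIfAbsent("key", k -> compute("value2", bothInside)));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Counts the invocation, then waits (with a timeout) until the other function has started too.
    static String compute(String value, CountDownLatch bothInside) {
        invocations.incrementAndGet();
        bothInside.countDown();
        try {
            bothInside.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return value;
    }

    public static void main(String[] args) {
        run();
        System.out.println(invocations.get() + " invocations, results equal: " + result1.equals(result2));
    }
}
```

The timeout on `await` is there so that a map which does lock during computation would slow the demo down instead of deadlocking it.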
Let's simplify and extend the example above.

Thread 1:

    final var result1 = map.computeIfAbsent("key", __ -> "value1"); // A
    final var result3 = map.get("key"); // C

Thread 2:

    final var result2 = map.computeIfAbsent("key", __ -> "value2"); // B
    final var result4 = map.get("key"); // D

There are only two possible cases for the …

The "concurrent" hash map only guarantees B could not return a value other than "value1". Because the last write operation before the read operation of B is A, B could only see "value1" written by A. Besides, the concurrent map only guarantees: …
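The A/B/C/D example can be run directly. A minimal sketch, assuming a `java.util.concurrent.ConcurrentHashMap` and a hypothetical class name `ComputeThenGetDemo`: whichever of A and B wins, all four results agree, because nothing in the example removes or replaces the mapping once it is written.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class, not part of Pulsar.
class ComputeThenGetDemo {
    /** Returns {result1, result2, result3, result4} from the example above. */
    static String[] run() {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        String[] results = new String[4];

        Thread t1 = new Thread(() -> {
            results[0] = map.computeIfAbsent("key", k -> "value1"); // A
            results[2] = map.get("key");                            // C
        });
        Thread t2 = new Thread(() -> {
            results[1] = map.computeIfAbsent("key", k -> "value2"); // B
            results[3] = map.get("key");                            // D
        });
        t1.start();
        t2.start();
        try {
            // join() makes the array writes of both threads visible to the caller.
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", run()));
    }
}
```

Which value wins depends on scheduling, so the sketch deliberately makes no claim about whether the output is "value1" or "value2".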
@BewareMyPower Yes, I agree. I mentioned it since #21301 has been an issue in the past.
BTW, the API document in …

    * The function
    * is <em>NOT</em> guaranteed to be applied once atomically only
    * if the value is not present.

It's required by the …

Here is the document for …

    * The supplied
    * function is invoked exactly once per invocation of this method
    * if the key is absent, else not at all. Some attempted update
    * operations on this map by other threads may be blocked while
    * computation is in progress, so the computation should be short
    * and simple.
    *
    * <p>The mapping function must not modify this map during computation.

I believe many calls of …
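The second Javadoc excerpt above matches the wording of `java.util.concurrent.ConcurrentHashMap.computeIfAbsent` in the JDK. The "exactly once" guarantee can be checked with a small sketch. The class name `ExactlyOnceDemo` and the thread count are assumptions for illustration. Several threads race on the same absent key and a counter records how often the mapping function runs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of Pulsar.
class ExactlyOnceDemo {
    /** Races the given number of threads on one absent key and returns the invocation count. */
    static int countInvocations(int threads) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        AtomicInteger calls = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];

        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all workers at roughly the same time
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                map.computeIfAbsent("key", k -> {
                    calls.incrementAndGet();
                    return "value";
                });
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(countInvocations(8)); // prints 1
    }
}
```

This guarantee covers a single `computeIfAbsent` call only. It says nothing about a sequence of calls, which is the composite-operation problem this PR addresses.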
LGTM.
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files

    @@             Coverage Diff              @@
    ##             master   #23217      +/-   ##
    ============================================
    + Coverage     73.57%   74.54%   +0.97%
    - Complexity    32624    34233    +1609
    ============================================
      Files          1877     1923      +46
      Lines        139502   144762    +5260
      Branches      15299    15838     +539
    ============================================
    + Hits         102638   107918    +5280
    + Misses        28908    28577     -331
    - Partials       7956     8267     +311
### Modifications
Add a `BundleRangeCache` abstraction that provides the operations required on the bundle range cache. Apply the `synchronized` keyword to every access to an internal map (`namespace -> bundle range set`) to guarantee thread safety.
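The idea of guarding every access to an inner `namespace -> bundle range set` map with the same lock could look roughly like the sketch below. This is an illustration under stated assumptions, not the `BundleRangeCache` class from this PR: the class name `BundleRangeCacheSketch`, the method names `reload`, `add` and `ranges`, and the use of plain `java.util` collections are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; the real BundleRangeCache in Pulsar may differ in every detail.
class BundleRangeCacheSketch {
    // broker -> (namespace -> bundle ranges). Each inner map is guarded by its own monitor.
    private final Map<String, Map<String, Set<String>>> data = new ConcurrentHashMap<>();

    /** Composite operation: clear the broker's inner map and refill it under one lock. */
    void reload(String broker, Map<String, Set<String>> fresh) {
        Map<String, Set<String>> inner = data.computeIfAbsent(broker, k -> new HashMap<>());
        synchronized (inner) {
            inner.clear();
            fresh.forEach((namespace, bundleRanges) -> inner.put(namespace, new HashSet<>(bundleRanges)));
        }
    }

    /** Single update: takes the same lock so it cannot interleave with reload(). */
    void add(String broker, String namespace, String bundleRange) {
        Map<String, Set<String>> inner = data.computeIfAbsent(broker, k -> new HashMap<>());
        synchronized (inner) {
            inner.computeIfAbsent(namespace, k -> new HashSet<>()).add(bundleRange);
        }
    }

    /** Read: returns a copy made under the lock, so callers never see a half-reloaded map. */
    Set<String> ranges(String broker, String namespace) {
        Map<String, Set<String>> inner = data.get(broker);
        if (inner == null) {
            return Collections.emptySet();
        }
        synchronized (inner) {
            Set<String> bundleRanges = inner.get(namespace);
            return bundleRanges == null ? Collections.emptySet() : new HashSet<>(bundleRanges);
        }
    }
}
```

Keeping the inner maps private to one class is what makes the locking rule enforceable: no caller can reach an inner map without going through a method that takes its monitor.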