Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure read-lock is not continuously held on a section while iterating over concurrent maps #9787

Merged
merged 2 commits into from
Mar 4, 2021

Conversation

merlimat
Copy link
Contributor

@merlimat merlimat commented Mar 3, 2021

Motivation

As discussed in #9764 the fact that we're potentially holding a read-lock while scanning through a section of the map has several implications:

  1. If the process functions is taking a long time (eg: making a blocking request to ZK that might even timeout), the writes operations on that section of the map are stalled during that time.
  2. It's deadlock prone:
    1. If a thread tries to use the map while scanning through it can deadlock itself
    2. If the processing operation waits for the completion of some operation from a different thread and that thread tries to use the same map, it can create a deadlock.

Instead of holding the lock throughout the scan of the section, we should instead release the read lock before calling the processing function, going back into the optimistic read mode.

This will not add any overhead (in terms of volatile reads) compared to the current implementation, but will avoid all the possible deadlock traps, since we're never going to be holding the lock while calling the user code.

@merlimat merlimat added the type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages label Mar 3, 2021
@merlimat merlimat added this to the 2.8.0 milestone Mar 3, 2021
@merlimat merlimat self-assigned this Mar 3, 2021
} finally {
if (acquiredReadLock) {
storedKey = (K) table[bucket];
storedValue = (V) table[bucket + 1];
unlockRead(stamp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try-finally for unlock? (Same comment to all unlock locations)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I think in all places here it should be guaranteed to not throw, but makes sense to do it as general practice.

@lhotari
Copy link
Member

lhotari commented Mar 3, 2021

Nice! Can't wait to try it out.

@merlimat merlimat merged commit c86f2f7 into apache:master Mar 4, 2021
@merlimat merlimat deleted the for-each branch March 4, 2021 05:26
@eolivelli
Copy link
Contributor

Sorry. I came too late to the party.
Btw I believe this approach is good

+1

mlyahmed pushed a commit to mlyahmed/pulsar that referenced this pull request Mar 5, 2021
…g over concurrent maps (apache#9787)

* Ensure read-lock is not continuously held on a section while iterating over concurrent maps

* Added try/finally
@lhotari
Copy link
Member

lhotari commented Mar 8, 2021

btw. I happened to come across #8877 which was a fix to a deadlock.
This PR would also prevent such problems.

merlimat added a commit that referenced this pull request May 22, 2021
…g over concurrent maps (#9787)

* Ensure read-lock is not continuously held on a section while iterating over concurrent maps

* Added try/finally
eolivelli pushed a commit to datastax/pulsar that referenced this pull request May 24, 2021
…g over concurrent maps (apache#9787)

* Ensure read-lock is not continuously held on a section while iterating over concurrent maps

* Added try/finally

(cherry picked from commit 4c369c9)
codelipenghui pushed a commit that referenced this pull request May 25, 2021
### Motivation

In several places in the code when iterating over the custom hashmaps, we are taking over a copy of the map. This was done every time the iteration could end up modifying the map, since there was a non-reentrant mutex taken during the iteration. Any modification would lead to a deadlock. 

Since the behavior was changed in #9787 to not hold the section mutex during the iteration, there's no more need to make a copy of the maps.
@codelipenghui codelipenghui added the cherry-picked/branch-2.7 Archived: 2.7 is end of life label May 27, 2021
yangl pushed a commit to yangl/pulsar that referenced this pull request Jun 23, 2021
### Motivation

In several places in the code when iterating over the custom hashmaps, we are taking over a copy of the map. This was done every time the iteration could end up modifying the map, since there was a non-reentrant mutex taken during the iteration. Any modification would lead to a deadlock. 

Since the behavior was changed in apache#9787 to not hold the section mutex during the iteration, there's no more need to make a copy of the maps.
BewareMyPower added a commit to streamnative/kop that referenced this pull request Jul 23, 2021
Fixes #618 

### Motivation

See #618 (comment) for the deadlock analysis.

### Modifications
- Use `ConcurrentHashMap` instead of `ConcurrentLongHashMap`. Though this bug may already be fixed in apache/pulsar#9787, the `ConcurrentHashMap` from Java standard library is more reliable. The possible performance enhancement brought by `ConcurrentLongHashMap` still needs to be proved.
- Use `AtomicBoolean` as `KafkaTopicConsumerManager`'s state instead of read-write lock to avoid `close()` method that tries to acquire write lock blocking.
- Run a single cursor expire task instead one task per channel, since #404 changed `consumerTopicManagers` to a static field, there's no reason to run a task for each connection.
BewareMyPower added a commit to BewareMyPower/kop that referenced this pull request Jul 25, 2021
Fixes streamnative#618 

### Motivation

See streamnative#618 (comment) for the deadlock analysis.

### Modifications
- Use `ConcurrentHashMap` instead of `ConcurrentLongHashMap`. Though this bug may already be fixed in apache/pulsar#9787, the `ConcurrentHashMap` from Java standard library is more reliable. The possible performance enhancement brought by `ConcurrentLongHashMap` still needs to be proved.
- Use `AtomicBoolean` as `KafkaTopicConsumerManager`'s state instead of read-write lock to avoid `close()` method that tries to acquire write lock blocking.
- Run a single cursor expire task instead one task per channel, since streamnative#404 changed `consumerTopicManagers` to a static field, there's no reason to run a task for each connection.
BewareMyPower added a commit to streamnative/kop that referenced this pull request Jul 25, 2021
Fixes #618 

### Motivation

See #618 (comment) for the deadlock analysis.

### Modifications
- Use `ConcurrentHashMap` instead of `ConcurrentLongHashMap`. Though this bug may already be fixed in apache/pulsar#9787, the `ConcurrentHashMap` from Java standard library is more reliable. The possible performance enhancement brought by `ConcurrentLongHashMap` still needs to be proved.
- Use `AtomicBoolean` as `KafkaTopicConsumerManager`'s state instead of read-write lock to avoid `close()` method that tries to acquire write lock blocking.
- Run a single cursor expire task instead one task per channel, since #404 changed `consumerTopicManagers` to a static field, there's no reason to run a task for each connection.
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this pull request Mar 18, 2022
### Motivation

In several places in the code when iterating over the custom hashmaps, we are taking over a copy of the map. This was done every time the iteration could end up modifying the map, since there was a non-reentrant mutex taken during the iteration. Any modification would lead to a deadlock. 

Since the behavior was changed in apache#9787 to not hold the section mutex during the iteration, there's no more need to make a copy of the maps.
Copy link
Contributor

@yws-tracy yws-tracy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi mrlimat @merlimat @eolivelli @lhotari :
I do not quite understand enhance for loop is replaced with "for (int i = 0...) " in this mr, as I know for array loop, enhanced for loop is quite equivalent to "for (int i = 0...) ", Is there any other purpose? look forward to your reply,
thanks a lot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-picked/branch-2.7 Archived: 2.7 is end of life release/2.7.3 type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants