Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-65821] Fix race condition by using ConcurrentHashMap #160

Merged
merged 2 commits into from
Jun 23, 2021

Conversation

bitwiseman
Copy link
Contributor

@bitwiseman bitwiseman commented Jun 21, 2021

JENKINS-65821 described a race condition resulting in high cpu usage. #153 attempted to address these using locking, but resulted in deadlocks.

This change goes the other direction and attempts to address the issue using ConcurrentHashMap instead of HashMap.

@twasyl

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests - that demonstrates feature works or fixes the issue

@bitwiseman bitwiseman requested a review from jglick June 21, 2021 21:59
@bitwiseman bitwiseman changed the title [JENKINS-65821] Fix race condition in by using ConcurrentHashMap [JENKINS-65821] Fix race condition by using ConcurrentHashMap Jun 21, 2021
@bitwiseman bitwiseman force-pushed the issue/JENKINS-65821 branch from 965dd97 to ea384e4 Compare June 21, 2021 22:03
@jglick jglick requested a review from dwnusbaum June 21, 2021 22:11
Copy link
Member

@jglick jglick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess? I trust you have examined the code flow carefully and verified that there is no risk of one thread having written to just one of these maps when another thread comes along and tries to read from them. I suppose it cannot be worse than using HashMap. @dwnusbaum mentioned that the usages here were idempotent and thus safe to use locklessly.

@twasyl
Copy link
Contributor

twasyl commented Jun 22, 2021

I was wondering if one possible fix would be, in each entry point of a StandardGraphLookupView instance, to create new maps from the instance ones and pass them along to all required methods. This way you only need to synchronize when creating those copies and then you ensure that even if the instance maps are updated in one thread, computations in other threads are based on the same data from their beginning to their end.

@bitwiseman
Copy link
Contributor Author

@jglick Yes, in my opinion it has to be better than HashMap.

@twasyl
That seems like a lot of extra work being done.

@bitwiseman
Copy link
Contributor Author

@jglick From what I can see, this is approach is safe. As you say, idempotent, the two caches do not depend on each other to update, and any read failure results in calling the "brute force" method the recalculate missing values.

@bitwiseman bitwiseman merged commit 7da343b into jenkinsci:master Jun 23, 2021
@bitwiseman bitwiseman deleted the issue/JENKINS-65821 branch June 23, 2021 18:13
bitwiseman added a commit to bitwiseman/workflow-api-plugin that referenced this pull request Jun 23, 2021
[JENKINS-65821] Fix race condition by using ConcurrentHashMap
@@ -25,10 +25,10 @@
static final String INCOMPLETE = "";

/** Map the blockStartNode to its endNode, to accellerate a range of operations */
HashMap<String, String> blockStartToEnd = new HashMap<>();
ConcurrentHashMap<String, String> blockStartToEnd = new ConcurrentHashMap<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fine since the value type is just String that already exists. Just a reminder that in general you need to be careful about having one thread add an entry to a map (or whatever) that another thread might pull off without the use of any synchronization barriers: if the object were recently constructed, its constructor might not have completed by the time the other thread uses it. One of those weird theoretical problems that probably does not happen often in reality but is hard to reason about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the object were recently constructed, its constructor might not have completed by the time the other thread uses it.

Wow, good to know. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants