
[JENKINS-65821] Introducing some synchronisation mechanisms to prevent some race condition #153

Merged

Conversation

@twasyl (Contributor) commented Jun 4, 2021

JENKINS-65821

On some big pipelines, the graph may take a long time to be generated because of possible concurrent updates to the Maps used as caches. This PR introduces some synchronisation mechanisms to prevent those race conditions.
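A minimal sketch of the kind of change being described, assuming the graph lookup view caches results in plain HashMaps guarded by a single lock; the class and method names are illustrative rather than the actual patch (only the field names blockStartToEnd and nearestEnclosingBlock come from the diff excerpt discussed later in this thread):

import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a cache guarded by a single lock so that
// concurrent readers and writers cannot corrupt the underlying HashMaps.
class GraphCacheSketch {
    private final Object lock = new Object();
    private final Map<String, String> blockStartToEnd = new HashMap<>();
    private final Map<String, String> nearestEnclosingBlock = new HashMap<>();

    void recordBlockEnd(String startId, String endId) {
        synchronized (lock) {
            blockStartToEnd.put(startId, endId);
        }
    }

    String findEnclosingBlock(String nodeId) {
        synchronized (lock) {
            return nearestEnclosingBlock.get(nodeId);
        }
    }
}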

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your master branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests that demonstrate the feature works or the issue is fixed

@twasyl (Contributor, Author) commented Jun 4, 2021

@jglick FYI

@amuniz (Member) commented Jun 4, 2021

Wouldn't it be better to use a read/write lock so reads do not block each other?

@twasyl (Contributor, Author) commented Jun 4, 2021

Wouldn't it be better to use a read/write lock so reads do not block each other?

I also thought about that, but was not really sure because of some discussions that were happening. I could probably follow this path.
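For comparison, a minimal sketch of the read/write-lock variant suggested above, using the same illustrative cache as before; this is an assumption about what such a change could look like, not code from the PR:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch only: readers share the read lock, so lookups from
// multiple UI threads do not block each other; writers take the write lock.
class RwLockCacheSketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> nearestEnclosingBlock = new HashMap<>();

    String findEnclosingBlock(String nodeId) {
        lock.readLock().lock();
        try {
            return nearestEnclosingBlock.get(nodeId);
        } finally {
            lock.readLock().unlock();
        }
    }

    void cacheEnclosingBlock(String nodeId, String enclosingId) {
        lock.writeLock().lock();
        try {
            nearestEnclosingBlock.put(nodeId, enclosingId);
        } finally {
            lock.writeLock().unlock();
        }
    }
}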

@dwnusbaum (Member) left a comment

Are you trying to improve performance, or is this intended to fix some concurrency issue you ran into? It's not clear to me how adding locks would increase performance. If you did see a concurrency issue, do you have stack traces or any more details about the issue?

(I don't see any specific problems with a patch like this, but depending on the details of the issue this might not be the best place to add synchronization. Also, since there are always fallback code paths here in case the caches are empty, and the computed values should always be the same for the same Pipeline graph, you might be able to just switch to ConcurrentHashMap to resolve any concurrency issues without needing to add explicit locks to this code, maybe with some minor updates to use methods like compute if you are seeing a lot of concurrent reads and want to make sure the cached values are used.)
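A minimal sketch of the lock-free alternative described here, assuming a missing entry can always be recomputed from the flow graph; the computeEnclosingBlock parameter is a hypothetical stand-in for that fallback path, not an existing method:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative sketch only: ConcurrentHashMap keeps individual reads and
// writes thread-safe without an explicit lock, and computeIfAbsent lets
// concurrent readers reuse the cached value instead of racing to
// recompute it. (Note: a null result is not cached by ConcurrentHashMap.)
class ConcurrentCacheSketch {
    private final Map<String, String> nearestEnclosingBlock = new ConcurrentHashMap<>();

    String findEnclosingBlock(String nodeId, Function<String, String> computeEnclosingBlock) {
        // computeEnclosingBlock is a hypothetical fallback that walks the
        // flow graph when the cache has no entry for this node.
        return nearestEnclosingBlock.computeIfAbsent(nodeId, computeEnclosingBlock);
    }
}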

@jglick (Member) commented Jun 4, 2021

I have seen the thread dumps and I suspect the bug was improper concurrent access to a HashMap leading to corruption and infinite loops. It would be best to file this in Jira and include (sanitized) dumps there.
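For context on that failure mode, a toy illustration (generic Java, not the plugin's code or the actual thread dumps): unsynchronized concurrent writes can corrupt a plain HashMap, and the symptom is often a thread spinning inside the map rather than a clean exception.

import java.util.HashMap;
import java.util.Map;

// Toy illustration only: two threads hammering the same unsynchronized
// HashMap. This is undefined behaviour; in practice it can throw, lose
// entries, or leave the map in a state where later operations never return.
public class HashMapRaceDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new HashMap<>();
        Runnable writer = () -> {
            for (int i = 0; i < 100_000; i++) {
                map.put(i, i); // unsynchronized concurrent writes
            }
        };
        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // With correctly synchronized writes this prints 100000; under the
        // race the size may be wrong, or the program may hang before here.
        System.out.println(map.size());
    }
}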

@jglick (Member) left a comment

Looks right.

Comment on lines 52 to 53
blockStartToEnd.put(((BlockEndNode) newHead).getStartNode().getId(), newHead.getId());
String overallEnclosing = nearestEnclosingBlock.get(((BlockEndNode) newHead).getStartNode().getId());
A Member commented:
The sort of thing that makes me dubious you could use a lockless data structure for this purpose.

A Member replied:
@dwnusbaum suggests that you probably could, if you wanted to review the logic carefully.

@jglick (Member) commented Jun 4, 2021

leading to corruption and infinite loops

Meaning that the PR description here is incorrect. The problem seems to have been a race condition and a full-blown spin hang, not just less-than-ideal performance.

@twasyl twasyl changed the title Introducing some synchronisation mechanisms to try improving performance Introducing some synchronisation mechanisms to prevent some race condition Jun 7, 2021
@twasyl twasyl changed the title Introducing some synchronisation mechanisms to prevent some race condition [JENKINS-65821] Introducing some synchronisation mechanisms to prevent some race condition Jun 7, 2021
@twasyl twasyl requested a review from jglick June 7, 2021 13:34
@jtnord (Member) commented Jun 7, 2021

What is the impact of this change? AFAICT it will force multiple read threads (the UI) to be single-threaded.
Would a ConcurrentHashMap be better than this level of synchronization here? Or, if you do not need re-entrant locks, a read/write lock?

@jglick (Member) commented Jun 7, 2021

@jtnord yes, perhaps. Unclear if concurrent access is common enough to be worth the extra risk & complexity, though. These methods should all complete quickly enough that I doubt we want to bother optimizing more.

@bitwiseman bitwiseman merged commit 927249b into jenkinsci:master Jun 7, 2021
@twasyl twasyl deleted the synchronization-for-optimisation branch June 7, 2021 18:35
@twasyl twasyl mentioned this pull request Jun 7, 2021
bitwiseman added a commit that referenced this pull request Jun 7, 2021
Backport pull request #153 from twasyl/synchronization-for-optimisation
@timja (Member) commented Jun 11, 2021

FYI we're seeing performance problems on ci.jenkins.io and this is a likely culprit

https://www.irccloud.com/pastebin/JGLey7jD/

https://www.irccloud.com/pastebin/zWjk17uH/

Yesterday @daniel-beck noticed:

more than 100 threads are blocked at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.getCurrentHeads for a generate-data build

ci.jenkins.io is down at the moment because its system log filled up with the stacktrace posted at the top.

@jglick (Member) commented Jun 14, 2021

this is a likely culprit

You mean, a regression from this PR, or the bug that this PR purports to fix?

@timja (Member) commented Jun 15, 2021

Regression from this PR; I'll see if ci.jenkins.io misbehaves again and get a thread dump.

@jglick (Member) commented Jun 15, 2021

I see. We can try switching to a ConcurrentHashMap.

@timja (Member) commented Jun 16, 2021

A few people are hitting this in Jira: https://issues.jenkins.io/browse/JENKINS-65885

@olamy (Member) commented Jun 16, 2021

Using a simple ConcurrentHashMap is definitely worth a try.

@jglick (Member) commented Jun 16, 2021

Oh, that is a deadlock, not just higher thread contention. In that case this must be reverted immediately and an emergency fix cut. @bitwiseman @twasyl @car-roll @dwnusbaum and whoever else may be listening.

@bitwiseman (Contributor) commented
I'll file as soon as I'm back at my desk.

@bitwiseman (Contributor) commented
FYI, StandardGraphLookupView implements GraphListener.Synchronous:

/**
* Listener which should be notified of events immediately as they occur.
* You must be very careful not to acquire locks or block.
* If you do not implement this marker interface, you will receive notifications in batched deliveries.
*/
interface Synchronous extends GraphListener {}

I'm not sure, but it seems like synchronized is not a good idea here.
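To illustrate the concern, a hypothetical sketch (an assumption about the hazard, not an analysis of the actual JENKINS-65885 deadlock): a Synchronous listener is invoked on the thread that appends nodes to the flow graph, so any lock taken in onNewHead is held on that critical thread and can interact badly with other threads taking the same lock.

import org.jenkinsci.plugins.workflow.flow.GraphListener;
import org.jenkinsci.plugins.workflow.graph.FlowNode;

// Hypothetical listener for illustration only, not the plugin's code:
// a Synchronous listener runs on the thread that is appending nodes to
// the flow graph, so any lock taken here is held on that critical thread.
class CachingListenerSketch implements GraphListener.Synchronous {

    private final Object lock = new Object();

    @Override
    public void onNewHead(FlowNode node) {
        synchronized (lock) {
            // Update caches here. If another thread (e.g. a UI request)
            // holds `lock` while waiting on the execution that is
            // delivering this event, neither side can make progress.
        }
    }

    public String lookup(String id) {
        synchronized (lock) {
            // A slow or blocking operation here stalls node creation,
            // which is exactly what the Synchronous Javadoc warns against.
            return null; // stub for illustration
        }
    }
}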

bitwiseman added a commit that referenced this pull request Jun 16, 2021
[JENKINS-65885] Revert "Merge pull request #153 from twasyl/synchronization-for-optimisation"
bitwiseman added a commit that referenced this pull request Jun 23, 2021
…isation"

This reverts commit 927249b, reversing
changes made to 620362f.