[JENKINS-65821] Introducing some synchronisation mechanisms to prevent some race condition #153
src/main/java/org/jenkinsci/plugins/workflow/graph/StandardGraphLookupView.java
@jglick FYI |
Wouldn't it be better to use a read-write lock so that reads do not block each other?
I also thought about that, but was not really sure because of some discussions that were happening. I could probably follow this path.
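For illustration only, a read-write lock around a map-backed cache could look like the sketch below. The class `RwLockedCache` and its methods are hypothetical and are not the plugin's code; only the field name `blockStartToEnd` is borrowed from the diff under review.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a lookup cache; NOT the actual StandardGraphLookupView.
class RwLockedCache {
    // Plain HashMap: every access must happen while holding one of the locks below.
    private final Map<String, String> blockStartToEnd = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Many readers may hold the read lock at the same time.
    String get(String startId) {
        lock.readLock().lock();
        try {
            return blockStartToEnd.get(startId);
        } finally {
            lock.readLock().unlock();
        }
    }

    // A writer waits for all readers to finish and excludes everyone else.
    void put(String startId, String endId) {
        lock.writeLock().lock();
        try {
            blockStartToEnd.put(startId, endId);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

One caveat with this approach: `ReentrantReadWriteLock` does not support upgrading a read lock to a write lock, so a method that reads the cache and then fills it on a miss has to release the read lock before taking the write lock.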
Are you trying to improve performance, or is this intended to fix some concurrency issue you ran into? It's not clear to me how adding locks would increase performance. If you did see a concurrency issue, do you have stack traces or any more details about the issue?
(I don't see any specific problems with a patch like this, but depending on the details of the issue this might not be the best place to add synchronization. Also, since there are always fallback code paths here in case the caches are empty, and the computed values should always be the same for the same Pipeline graph, you might be able to just switch to ConcurrentHashMap to resolve any concurrency issues without needing to add explicit locks to this code, maybe with some minor updates to use methods like compute if you are seeing a lot of concurrent reads and want to make sure the cached values are used.)
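A minimal sketch of the lock-free alternative suggested here, assuming the fallback lookup is a pure function of the node ID. `ConcurrentLookupCache`, `findEnclosing`, `slowLookup`, and `fallbackCalls` are hypothetical names for illustration; only `ConcurrentHashMap` and `computeIfAbsent` are real JDK APIs, and `nearestEnclosingBlock` is borrowed from the diff.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache wrapper; NOT the actual StandardGraphLookupView.
class ConcurrentLookupCache {
    private final Map<String, String> nearestEnclosingBlock = new ConcurrentHashMap<>();
    // Assumed fallback path: recomputes the answer when the cache has no entry.
    private final Function<String, String> slowLookup;
    // Counts fallback invocations, only so the behaviour can be observed.
    final AtomicInteger fallbackCalls = new AtomicInteger();

    ConcurrentLookupCache(Function<String, String> slowLookup) {
        this.slowLookup = slowLookup;
    }

    String findEnclosing(String nodeId) {
        // computeIfAbsent runs the fallback at most once per key and stores the result.
        // If the fallback returns null, no mapping is recorded and null is returned.
        return nearestEnclosingBlock.computeIfAbsent(nodeId, id -> {
            fallbackCalls.incrementAndGet();
            return slowLookup.apply(id);
        });
    }
}
```

Two constraints would need checking before applying this to the real code: `ConcurrentHashMap` rejects null keys and values, and the mapping function passed to `computeIfAbsent` must not modify the same map, which matters if the fallback path itself populates the cache.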
I have seen the thread dumps and I suspect the bug was improper concurrent access to |
Looks right.
src/main/java/org/jenkinsci/plugins/workflow/graph/StandardGraphLookupView.java
blockStartToEnd.put(((BlockEndNode) newHead).getStartNode().getId(), newHead.getId());
String overallEnclosing = nearestEnclosingBlock.get(((BlockEndNode) newHead).getStartNode().getId());
This is the sort of thing that makes me dubious that you could use a lockless data structure for this purpose.
@dwnusbaum suggests that you probably could, if you wanted to review the logic carefully.
Meaning that the PR description here is incorrect. The problem seems to have been a race condition and a full-blown spin hang, not just less-than-ideal performance. |
src/main/java/org/jenkinsci/plugins/workflow/graph/StandardGraphLookupView.java
What is the impact of this change? AFAICT it will force multiple read threads (the UI) to run single-threaded.
@jtnord yes perhaps. Unclear if concurrent access is common enough to be worth the extra risk & complexity though. These methods should all complete quickly enough that I doubt we want to bother optimizing more. |
Backport pull request #153 from twasyl/synchronization-for-optimisation
FYI we're seeing performance problems on ci.jenkins.io and this is a likely culprit https://www.irccloud.com/pastebin/JGLey7jD/ https://www.irccloud.com/pastebin/zWjk17uH/ Yesterday @daniel-beck noticed:
ci.jenkins.io is down atm because its system log filled up with the stacktrace posted at the top
You mean, a regression from this PR, or the bug that this PR purports to fix? |
Regression from this PR. I will see if ci.jenkins.io misbehaves again and get a thread dump.
I see. We can try switching to a |
A few people are hitting this on Jira: https://issues.jenkins.io/browse/JENKINS-65885
Definitely using a simple |
Oh that is a deadlock, not just higher thread contention. In such a case this must be reverted immediately and an emergency fix cut. @bitwiseman @twasyl @car-roll @dwnusbaum whoever else may be listening |
I'll file as soon as I'm back at my desk. |
FYI, workflow-api-plugin/src/main/java/org/jenkinsci/plugins/workflow/flow/GraphListener.java Lines 46 to 51 in ed2467f
I'm not sure, but it seems like |
[JENKINS-65885] Revert "Merge pull request #153 from twasyl/synchronization-for-optimisation"
JENKINS-65821
On some big pipelines, the graph may take some time to be generated due to possible synchronous updates to Maps. This change tries to introduce some synchronisation mechanisms to prevent those race conditions.