Call ResetTimer synchronously from eol and skip the Tick when channel is null #226
@jglick Could you please review? (I tried to keep the fix minimal so as not to break anything.)
OK I guess? I have no memory of writing that comment. Does it seem to work?
@@ -336,11 +336,13 @@ private Object readResolve() {
@Override
public OutputStream decorateLogger(@SuppressWarnings("rawtypes") Run build, final OutputStream logger)
throws IOException, InterruptedException {
// TODO if channel == null, we can safely ResetTimer.call synchronously from eol and skip the Tick
was from #74 FTR
The flakiness is gone with this change, and all other tests pass. Just in case, I also manually tested the case described in https://issues.jenkins.io/browse/JENKINS-54078?focusedCommentId=351432&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-351432. So yes, it seems to work.
After upgrading to v991 (which includes this PR), we are now facing CPU pressure, and eventually the Jenkins pod is terminated.
JavaMelody__jenkins-0_8_29_22.pdf Any suggestions? Many thanks!
Maybe worth adding: the jobs run with a timeout of 2h, so if I understand this correctly, the reset call previously happened only once every 1h (timeout / 2) and now happens on each log line.
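To make the "reset on each log line" concern concrete, here is a minimal sketch, not the plugin's actual code. The class names and the callback are hypothetical stand-ins for the log decorator and for ResetTimer.call; the sketch only counts how often a per-line callback would fire for a given amount of log output.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class EolResetSketch {
    /** Hypothetical stand-in for a log decorator: runs a callback at every end of line. */
    static class EolCallbackStream extends FilterOutputStream {
        private final Runnable onEol;

        EolCallbackStream(OutputStream out, Runnable onEol) {
            super(out);
            this.onEol = onEol;
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
            if (b == '\n') {
                onEol.run(); // in the real plugin this would be the timer reset (assumption)
            }
        }
    }

    /** Counts how many times the per-line callback fires for the given log text. */
    static int countResets(String log) {
        int[] resets = {0};
        try (OutputStream os = new EolCallbackStream(new ByteArrayOutputStream(), () -> resets[0]++)) {
            os.write(log.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return resets[0];
    }

    public static void main(String[] args) {
        System.out.println(countResets("line 1\nline 2\nline 3\n")); // prints 3
    }
}
```

With a chatty build, the callback count grows linearly with the number of log lines, whereas a Tick scheduled at timeout / 2 fires at most once per hour for a 2h timeout.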
via
diff --git src/test/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepTest.java src/test/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepTest.java
index f18f476..faea2f4 100644
--- src/test/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepTest.java
+++ src/test/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepTest.java
@@ -219,6 +219,7 @@ public class TimeoutStepTest {
WorkflowRun b = p.scheduleBuild2(0).getStartCondition().get();
SemaphoreStep.waitForStart("restarted/1", b);
});
+ Thread.sleep(10_000);
sessions.then(j -> {
WorkflowJob p = j.jenkins.getItemByFullName("restarted", WorkflowJob.class);
WorkflowRun b = p.getBuildByNumber(1);
@chwehrli @tarioch thanks for the report and sorry for any inconvenience. I believe #234 should fix the performance issue as well as address the original problem motivating this PR. If you want to test prior to release, that would be great; just download a
Thank you very much for that ultra-fast response, @jglick!
Thanks. Please use #234 for further comments.
CloudBees CI reported flakiness in TimeoutStepExecutionTest#activityRestart.
Logs:
activityRestart_stacktrace.txt
activityRestart_stderr.txt
The issue can be reproduced locally (the test fails quite often): whenever the timeout is set to expire in less than 7.5 sec after the restart, the test fails.
From what I see, the cause of the flake is a race condition between the Tick (which is scheduled with a delay of timeout / 2, in our particular case 7.5 sec) and the Killer's delay (which, after the restart, was 3.3 sec in our particular case). When delay < timeout / 2, no reset happens and TimeoutStepExecution cancels the rest of the pipeline after delay.
There is a comment in the code: TODO if channel == null, we can safely ResetTimer.call synchronously from eol and skip the Tick. If we follow this recommendation, we can reset the timer synchronously instead of scheduling the reset after timeout / 2, which in itself fixes the flakiness.
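The race can be illustrated with a small model of the two delays. This is only a sketch under stated assumptions: the class and method names are hypothetical, and the 15 sec timeout is inferred from the 7.5 sec Tick delay mentioned above, not read from the test.

```java
class TimeoutRaceSketch {
    /** Tick delay as described above: half of the timeout. */
    static long tickDelayMillis(long timeoutMillis) {
        return timeoutMillis / 2;
    }

    /** True when the Killer fires before the first Tick, so no reset can happen in time. */
    static boolean killerFiresFirst(long timeoutMillis, long killerDelayMillis) {
        return killerDelayMillis < tickDelayMillis(timeoutMillis);
    }

    public static void main(String[] args) {
        // Values from the description: Tick after 7.5 sec, Killer after 3.3 sec.
        System.out.println(killerFiresFirst(15_000, 3_300)); // prints true
        // Hypothetical non-flaky case: the Killer delay exceeds the Tick delay.
        System.out.println(killerFiresFirst(15_000, 9_000)); // prints false
    }
}
```

Resetting synchronously at each end of line removes the dependency on the Tick delay, so the outcome no longer hinges on which of the two timers fires first.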