-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better fix of timeout
activity tick after restart
#234
Conversation
Thanks @jglick for quick reaction on this regression, just in case I've setup 300 executions run in test selector https://gauntlet-2.cloudbees.com/rosie/job/playground/job/flakebusters/job/selectors/job/pct-test-selector/168/ |
Seems it occasionally fails in workflow-basic-steps-plugin/src/test/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepTest.java Line 233 in 85b6c33
Unclear if this is a bug in production code or just in the test. In any case, I am inclined to ship this amendment anyway, at least if @chwehrli @tarioch can confirm it addresses the performance regression. |
TimeoutStepTest#activityRestart still flaky, failed 3 times in 300 executions. |
This also looks good from our end with regards to performance issues we noticed with 226 - Jenkins has been running fine over several hours at regular load - thanks! |
Probably test changes are not needed? If I run ‘old’-versioned test against your fix the test does not flake. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting the reason for test changes.
@@ -71,6 +73,8 @@ public class TimeoutStepTest { | |||
|
|||
@Rule public GitSampleRepoRule git = new GitSampleRepoRule(); | |||
|
|||
@Rule public LoggerRule logging = new LoggerRule().record(TimeoutStepExecution.class, Level.FINE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to help debug.
@@ -212,13 +216,14 @@ public void activityRestart() throws Throwable { | |||
+ " echo 'NotHereYet';\n" | |||
+ " sleep 10;\n" | |||
+ " echo 'JustHere!';\n" | |||
+ " sleep 30;\n" | |||
+ " sleep 20;\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A pre-commit version the main patch contained a mistake—the build did not time out until 1½× the stated period; in this case sleep
would run for ~23s before being terminated. I was trying to strengthen the test here to verify that the 15s activity timeout really applies.
+ " echo 'ShouldNot!';\n" | ||
+ " }\n" | ||
+ "}\n", true)); | ||
WorkflowRun b = p.scheduleBuild2(0).getStartCondition().get(); | ||
SemaphoreStep.waitForStart("restarted/1", b); | ||
}); | ||
Thread.sleep(10_000); // restarting should count as activity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making the test reliably reproduce the originally reported problem.
Like what I did in 512d920? OK, if that is what it takes to address the regression and prevent test flakes, then we can go with that. I just want to ship this promptly. |
#226 correctly diagnosed a problem brought to light by a flaky test, but the fix was not correct (only applied to messages printed within the controller as opposed to from an agent) and apparently introduced a severe performance regression.
I think the root issue was that restarting the controller ought to count as “activity” for purposes of delaying the timeout (better to err on the side of caution) but it did not.