-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JENKINS-75067] Unexport step bodies when completed in synchronous mode #966
Conversation
Previously, the unexport of a step's body would only happen when scheduling the next execution (async mode). If the step was not run async (due to having an outcome set prior to the step starting), the body was not removed until the CpsThreadGroup was finishing. At this point the body's presence was treated as a resource leak and logged a warning. Doing an explicit unexport for the sync case at the proper time eliminates the warnings from the fall-back cleanup.
plugin/src/test/java/org/jenkinsci/plugins/workflow/StepListenerTest.java
Outdated
Show resolved
Hide resolved
@@ -359,6 +359,12 @@ private static boolean refersTo(Throwable t1, Throwable t2) { | |||
*/ | |||
private void scheduleNextRun() { | |||
if (syncMode) { | |||
// probably rare for a legit sync step to have a body, but it's possible for a (typically) async step |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be right, though TBH I am not entirely sure I know how this code works.
Besides the case you identified in the test (normally asynch block-scoped step like node
terminated in StepListener
), there are a couple of cases that may or may not currently occur in production code but which seem worth testing:
- A block-scoped step which throws an exception from its
start
method (rather than scheduling its body). For example, bad arguments. This is probably commonplace enough and might already have some test coverage here. - A block-scoped step which calls
onSuccess
fromstart
and thenreturn true
to complete in synch mode. (E.g.,retry(0) {…}
if that did not just throw an error saying that the count needed to be positive.)
It is not just rare, but illegal, for a step to return in synchronous mode from start
if it actually ran a body.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the first, I think I've seen such a test; I'll see if I can find it, though it should be trivial enough to have an (effective) duplicate for the sake of clarity. A test for the second shouldn't be a problem. I'll also update that comment with stronger language (around illegality of running a block but claiming sync).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the two additional tests for those scenarios, and all is well. However, they need a home. My initial test actually used StepListener
and was validating the function of its documented use-case, so StepListenerTest
was a logical home. The two new tests have nothing to do with StepListener
. The most logical place would be in a test class for either CpsThreadGroup
or CpsStepExecution
, but neither has a dedicated test class.
Thoughts? I'll check back later tonight, and if you haven't given a recommendation, I'll plan on reverting StepListenerTest
and move all three to a new CpsStepExecutionTest
class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think it matters too much where the tests go. I could see arguments for using CpsFlowExecutionTest
, CpsBodyExecutionTest
, DSLTest
, StepListenerTest
, and probably others depending on what exactly the tests are checking and where the relevant non-test logic lives. There is no real harm in creating a new test class either if you prefer.
Also, looking at code relevant to what is being discussed I am a bit surprised that we use assert
here instead of throwing an exception or otherwise doing something that will still happen when assertions are disabled:
workflow-cps-plugin/plugin/src/main/java/org/jenkinsci/plugins/workflow/cps/DSL.java
Line 337 in 80cad0f
assert context.withBodyInvokers(List::isEmpty) : "If a step claims synchronous completion, it shouldn't invoke body"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And FWIW, as far as the overall approach, the only thing that seems a bit unusual to me is the direct usage of the threadGroup
field, which until now was only checked in methods which would trigger a load if needed. Whether this matters in any practical scenario, I am not sure. One alternative could be to handle this as a special case in DSL
itself after we apply StepListener
s and then check to see if the step was aborted before starting:
workflow-cps-plugin/plugin/src/main/java/org/jenkinsci/plugins/workflow/cps/DSL.java
Lines 313 to 327 in 89b2432
for (StepListener sl : ExtensionList.lookup(StepListener.class)) { | |
try { | |
sl.notifyOfNewStep(s, context); | |
} catch (Throwable e) { | |
LOGGER.log(Level.WARNING, "failed to notify step listener before starting " + s.getDescriptor().getFunctionName(), e); | |
} | |
} | |
if (!context.isCompleted()) { | |
StepExecution e = s.start(context); | |
thread.setStep(e); | |
sync = e.start(); | |
} else { | |
s = null; | |
sync = true; | |
} |
Not sure what is clearest. The way you currently have things keeps all of the unexport logic in one place, but if the bug can only be observed with StepListener
maybe it makes sense to unexport the body, if it exists, in the else
block there. I don't feel strongly about it either way at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit surprised that we use assert here
Want that changed while I'm at it, or is that more of note-to-self for you guys?
Not sure what is clearest.
With those two new tests that were requested, I asserted the things I cared about, but didn't actually look at the behavior without my fix (i.e. if those scenarios result in the undesired warnings). When I refactor them, I'll do so. Eyeballing it, I think the first one ("exception thrown in start
") will trigger the bug as well. But I don't think your alternative approach will fix that scenario unless we put the unexport in two places... in the else
block, but also in the catch
block a few lines below.
I'll verify those assumptions, but I'll assume doing one (additional) export, (where I currently have it), is a small tiebreaker over having two unexports in DSL
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Want that changed while I'm at it, or is that more of note-to-self for you guys?
More of a note-to-self I think. Seems to date at least to jenkinsci/pipeline-plugin@efd3a64. Since this is checking something about external (plugin) code it is incorrect to use an assertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Both new tests proposed by Jesse did in fact trigger the bug (and are both covered by my original fix)
- It is possible to cover all three of those cases in
DSL
itself by moving the unexport a little lower, but would require either increasing the visibility ofCpsStepContext#body
or by making the assumption that raw closure will be stored verbatim inCpsThreadGroup#closures
, and find/remove theNamedArgsAndClosure#body
(relying on identity). That seemed like a bad/limiting assumption. But increasing the visibility, if you're willing, does work if put afterworkflow-cps-plugin/plugin/src/main/java/org/jenkinsci/plugins/workflow/cps/DSL.java
Lines 336 to 341 in 80cad0f
if (sync) { assert context.withBodyInvokers(List::isEmpty) : "If a step claims synchronous completion, it shouldn't invoke body"; if (context.getOutcome()==null) { context.onFailure(new AssertionError("Step "+s+" claimed to have ended synchronously, but didn't set the result via StepContext.onSuccess/onFailure")); }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the new tests all trigger the bug and the current fix handles all of them in one place, I would stick with your current approach.
There are two additional confirmed scenarios where closures were leaked, both of which are covered by this fix. As these new tests have nothing to do with `StepListener` (and to keep the tests together), they have all moved and `StepListenerTest` has been effectively reverted to its previous state.
plugin/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsStepContextTest.java
Outdated
Show resolved
Hide resolved
plugin/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsStepContextTest.java
Outdated
Show resolved
Hide resolved
plugin/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsStepContextTest.java
Outdated
Show resolved
Hide resolved
plugin/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsStepContextTest.java
Outdated
Show resolved
Hide resolved
plugin/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsStepContextTest.java
Show resolved
Hide resolved
plugin/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsStepContextTest.java
Outdated
Show resolved
Hide resolved
plugin/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsStepContextTest.java
Outdated
Show resolved
Hide resolved
plugin/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsStepContextTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks right.
…pContextTest.java Co-authored-by: Devin Nusbaum <dwnusbaum@users.noreply.github.com>
…pContextTest.java Co-authored-by: Devin Nusbaum <dwnusbaum@users.noreply.github.com>
…pContextTest.java Co-authored-by: Devin Nusbaum <dwnusbaum@users.noreply.github.com>
…pContextTest.java Co-authored-by: Devin Nusbaum <dwnusbaum@users.noreply.github.com>
…pContextTest.java Co-authored-by: Devin Nusbaum <dwnusbaum@users.noreply.github.com>
…pContextTest.java Co-authored-by: Jesse Glick <jglick@cloudbees.com>
…pContextTest.java Co-authored-by: Jesse Glick <jglick@cloudbees.com>
…pContextTest.java Co-authored-by: Jesse Glick <jglick@cloudbees.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(BTW GH lets you apply a batch of suggestions in a single commit.)
Ugh, sorry, I didn't see that 😄. It's my first PR with these magic GH-specific suggestions (as opposed to textual feedback) and didn't really know what to expect. Lesson learned. Also, I assume it's standard practice for you guys to squash these on merge? |
Depends. In this case, some commits undid effects of previous commits, so definitely. |
Previously, the unexport of a step's body would only happen when scheduling the next execution (async mode). If the step was not run async (due to having an outcome set prior to the step starting), the body was not removed until the CpsThreadGroup was finishing. At this point the body's presence was treated as a resource leak and logged a warning. Doing an explicit unexport for the sync case at the proper time eliminates the warnings from the fall-back cleanup.
Potential fix for https://issues.jenkins.io/browse/JENKINS-75067
I'm not sure if this solution is "correct", but it logically makes sense and all existing tests continue to pass. Even if the proper fix is different (and/or best handled by you folks), I figured this PR could still prove valuable by providing a test case.
Testing done
I created a new test representing the scenario that led me to creating the original issue. There are two relevant assertions; both fail without my change, and pass after the change. I don't think both are needed, but didn't know which would be preferred. The assertion for the (lack of) warning log is obviously simpler and asserts what I truly cared about. The other asserts based on state, which seemed to be the better approach, but feels a bit heavy/hacky (but it's the best I could come up with without touching unrelated code).
Submitter checklist