[JENKINS-75067] Unexport step bodies when completed in synchronous mode #966

rrjjvv · 2024-12-30T04:10:02Z

Previously, the unexport of a step's body would only happen when scheduling the next execution (async mode). If the step was not run async (due to having an outcome set prior to the step starting), the body was not removed until the CpsThreadGroup was finishing. At this point the body's presence was treated as a resource leak and logged a warning. Doing an explicit unexport for the sync case at the proper time eliminates the warnings from the fall-back cleanup.

Potential fix for https://issues.jenkins.io/browse/JENKINS-75067

I'm not sure if this solution is "correct", but it logically makes sense and all existing tests continue to pass. Even if the proper fix is different (and/or best handled by you folks), I figured this PR could still prove valuable by providing a test case.

Testing done

I created a new test representing the scenario that led me to creating the original issue. There are two relevant assertions; both fail without my change, and pass after the change. I don't think both are needed, but didn't know which would be preferred. The assertion for the (lack of) warning log is obviously simpler and asserts what I truly cared about. The other asserts based on state, which seemed to be the better approach, but feels a bit heavy/hacky (but it's the best I could come up with without touching unrelated code).

Submitter checklist

Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
Ensure that the pull request title represents the desired changelog entry
Please describe what you did
Link to relevant issues in GitHub or Jira
Link to relevant pull requests, esp. upstream and downstream changes
Ensure you have provided tests - that demonstrates feature works or fixes the issue

Previously, the unexport of a step's body would only happen when scheduling the next execution (async mode). If the step was not run async (due to having an outcome set prior to the step starting), the body was not removed until the CpsThreadGroup was finishing. At this point the body's presence was treated as a resource leak and logged a warning. Doing an explicit unexport for the sync case at the proper time eliminates the warnings from the fall-back cleanup.

plugin/src/test/java/org/jenkinsci/plugins/workflow/StepListenerTest.java

jglick · 2025-01-02T16:13:06Z

plugin/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java

@@ -359,6 +359,12 @@ private static boolean refersTo(Throwable t1, Throwable t2) {
     */
    private void scheduleNextRun() {
        if (syncMode) {
+            // probably rare for a legit sync step to have a body, but it's possible for a (typically) async step


Might be right, though TBH I am not entirely sure I know how this code works.

Besides the case you identified in the test (normally asynch block-scoped step like node terminated in StepListener), there are a couple of cases that may or may not currently occur in production code but which seem worth testing:

A block-scoped step which throws an exception from its start method (rather than scheduling its body). For example, bad arguments. This is probably commonplace enough and might already have some test coverage here.

A block-scoped step which calls onSuccess from start and then return true to complete in synch mode. (E.g., retry(0) {…} if that did not just throw an error saying that the count needed to be positive.)

It is not just rare, but illegal, for a step to return in synchronous mode from start if it actually ran a body.

On the first, I think I've seen such a test; I'll see if I can find it, though it should be trivial enough to have an (effective) duplicate for the sake of clarity. A test for the second shouldn't be a problem. I'll also update that comment with stronger language (around illegality of running a block but claiming sync).

I have the two additional tests for those scenarios, and all is well. However, they need a home. My initial test actually used StepListener and was validating the function of its documented use-case, so StepListenerTest was a logical home. The two new tests have nothing to do with StepListener. The most logical place would be in a test class for either CpsThreadGroup or CpsStepExecution, but neither has a dedicated test class.

Thoughts? I'll check back later tonight, and if you haven't given a recommendation, I'll plan on reverting StepListenerTest and move all three to a new CpsStepExecutionTest class.

I do not think it matters too much where the tests go. I could see arguments for using CpsFlowExecutionTest, CpsBodyExecutionTest, DSLTest, StepListenerTest, and probably others depending on what exactly the tests are checking and where the relevant non-test logic lives. There is no real harm in creating a new test class either if you prefer.

Also, looking at code relevant to what is being discussed I am a bit surprised that we use assert here instead of throwing an exception or otherwise doing something that will still happen when assertions are disabled:

workflow-cps-plugin/plugin/src/main/java/org/jenkinsci/plugins/workflow/cps/DSL.java

Line 337 in 80cad0f

assert context.withBodyInvokers(List::isEmpty) : "If a step claims synchronous completion, it shouldn't invoke body";

And FWIW, as far as the overall approach, the only thing that seems a bit unusual to me is the direct usage of the threadGroup field, which until now was only checked in methods which would trigger a load if needed. Whether this matters in any practical scenario, I am not sure. One alternative could be to handle this as a special case in DSL itself after we apply StepListeners and then check to see if the step was aborted before starting:

workflow-cps-plugin/plugin/src/main/java/org/jenkinsci/plugins/workflow/cps/DSL.java

Lines 313 to 327 in 89b2432

for (StepListener sl : ExtensionList.lookup(StepListener.class)) {

try {

sl.notifyOfNewStep(s, context);

} catch (Throwable e) {

LOGGER.log(Level.WARNING, "failed to notify step listener before starting " + s.getDescriptor().getFunctionName(), e);

}

}

if (!context.isCompleted()) {

StepExecution e = s.start(context);

thread.setStep(e);

sync = e.start();

} else {

s = null;

sync = true;

}

Not sure what is clearest. The way you currently have things keeps all of the unexport logic in one place, but if the bug can only be observed with StepListener maybe it makes sense to unexport the body, if it exists, in the else block there. I don't feel strongly about it either way at this point.

I am a bit surprised that we use assert here

Want that changed while I'm at it, or is that more of note-to-self for you guys?

Not sure what is clearest.

With those two new tests that were requested, I asserted the things I cared about, but didn't actually look at the behavior without my fix (i.e. if those scenarios result in the undesired warnings). When I refactor them, I'll do so. Eyeballing it, I think the first one ("exception thrown in start") will trigger the bug as well. But I don't think your alternative approach will fix that scenario unless we put the unexport in two places... in the else block, but also in the catch block a few lines below.

I'll verify those assumptions, but I'll assume doing one (additional) export, (where I currently have it), is a small tiebreaker over having two unexports in DSL?

Want that changed while I'm at it, or is that more of note-to-self for you guys?

More of a note-to-self I think. Seems to date at least to jenkinsci/pipeline-plugin@efd3a64. Since this is checking something about external (plugin) code it is incorrect to use an assertion.

Both new tests proposed by Jesse did in fact trigger the bug (and are both covered by my original fix)

It is possible to cover all three of those cases in DSL itself by moving the unexport a little lower, but would require either increasing the visibility of CpsStepContext#body or by making the assumption that raw closure will be stored verbatim in CpsThreadGroup#closures, and find/remove the NamedArgsAndClosure#body (relying on identity). That seemed like a bad/limiting assumption. But increasing the visibility, if you're willing, does work if put after

workflow-cps-plugin/plugin/src/main/java/org/jenkinsci/plugins/workflow/cps/DSL.java

Lines 336 to 341 in 80cad0f

if (sync) {

assert context.withBodyInvokers(List::isEmpty) : "If a step claims synchronous completion, it shouldn't invoke body";

if (context.getOutcome()==null) {

context.onFailure(new AssertionError("Step "+s+" claimed to have ended synchronously, but didn't set the result via StepContext.onSuccess/onFailure"));

}

Given that the new tests all trigger the bug and the current fix handles all of them in one place, I would stick with your current approach.

There are two additional confirmed scenarios where closures were leaked, both of which are covered by this fix. As these new tests have nothing to do with `StepListener` (and to keep the tests together), they have all moved and `StepListenerTest` has been effectively reverted to its previous state.

plugin/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsStepContextTest.java

jglick

Looks right.

…pContextTest.java Co-authored-by: Devin Nusbaum <dwnusbaum@users.noreply.github.com>

…pContextTest.java Co-authored-by: Jesse Glick <jglick@cloudbees.com>

jglick

(BTW GH lets you apply a batch of suggestions in a single commit.)

rrjjvv · 2025-01-03T16:53:54Z

Ugh, sorry, I didn't see that 😄. It's my first PR with these magic GH-specific suggestions (as opposed to textual feedback) and didn't really know what to expect. Lesson learned. Also, I assume it's standard practice for you guys to squash these on merge? ~~I didn't see any docs about your policy on that.~~ Just saw the banner at the bottom saying it will indeed squash automatically.

jglick · 2025-01-03T18:41:23Z

it's standard practice for you guys to squash these on merge

Depends. In this case, some commits undid effects of previous commits, so definitely.

rrjjvv requested a review from a team as a code owner December 30, 2024 04:10

jglick added the bug label Jan 2, 2025

jglick reviewed Jan 2, 2025

View reviewed changes

plugin/src/test/java/org/jenkinsci/plugins/workflow/StepListenerTest.java Outdated Show resolved Hide resolved

jglick reviewed Jan 2, 2025

View reviewed changes

rrjjvv added 2 commits January 2, 2025 17:45

[JENKINS-58084] Make comment more informative

964eceb

dwnusbaum approved these changes Jan 3, 2025

View reviewed changes

jglick reviewed Jan 3, 2025

View reviewed changes

plugin/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsStepContextTest.java Outdated Show resolved Hide resolved

jglick reviewed Jan 3, 2025

View reviewed changes

plugin/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsStepContextTest.java Outdated Show resolved Hide resolved

jglick reviewed Jan 3, 2025

View reviewed changes

plugin/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsStepContextTest.java Outdated Show resolved Hide resolved

jglick approved these changes Jan 3, 2025

View reviewed changes

rrjjvv and others added 8 commits January 3, 2025 09:34

Update plugin/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsSte…

184f18b

…pContextTest.java Co-authored-by: Devin Nusbaum <dwnusbaum@users.noreply.github.com>

Update plugin/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsSte…

2f0cf8b

…pContextTest.java Co-authored-by: Devin Nusbaum <dwnusbaum@users.noreply.github.com>

Update plugin/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsSte…

7ff8c05

…pContextTest.java Co-authored-by: Devin Nusbaum <dwnusbaum@users.noreply.github.com>

Update plugin/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsSte…

46103d3

…pContextTest.java Co-authored-by: Devin Nusbaum <dwnusbaum@users.noreply.github.com>

Update plugin/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsSte…

ea19882

…pContextTest.java Co-authored-by: Devin Nusbaum <dwnusbaum@users.noreply.github.com>

Update plugin/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsSte…

51d0b81

…pContextTest.java Co-authored-by: Jesse Glick <jglick@cloudbees.com>

Update plugin/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsSte…

d5d23c7

…pContextTest.java Co-authored-by: Jesse Glick <jglick@cloudbees.com>

Update plugin/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsSte…

f7c005b

…pContextTest.java Co-authored-by: Jesse Glick <jglick@cloudbees.com>

jglick approved these changes Jan 3, 2025

View reviewed changes

dwnusbaum approved these changes Jan 3, 2025

View reviewed changes

jglick enabled auto-merge (squash) January 3, 2025 16:53

jglick merged commit f9c5614 into jenkinsci:master Jan 3, 2025
17 checks passed

rrjjvv deleted the stale-closure-fix branch January 3, 2025 17:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[JENKINS-75067] Unexport step bodies when completed in synchronous mode #966

[JENKINS-75067] Unexport step bodies when completed in synchronous mode #966

rrjjvv commented Dec 30, 2024

jglick Jan 2, 2025

rrjjvv Jan 2, 2025

rrjjvv Jan 2, 2025

dwnusbaum Jan 2, 2025

dwnusbaum Jan 2, 2025 •

edited

Loading

rrjjvv Jan 2, 2025

jglick Jan 3, 2025

rrjjvv Jan 3, 2025 •

edited

Loading

dwnusbaum Jan 3, 2025

jglick left a comment

jglick left a comment

rrjjvv commented Jan 3, 2025 •

edited

Loading

jglick commented Jan 3, 2025

	for (StepListener sl : ExtensionList.lookup(StepListener.class)) {
	try {
	sl.notifyOfNewStep(s, context);
	} catch (Throwable e) {
	LOGGER.log(Level.WARNING, "failed to notify step listener before starting " + s.getDescriptor().getFunctionName(), e);
	}
	}
	if (!context.isCompleted()) {
	StepExecution e = s.start(context);
	thread.setStep(e);
	sync = e.start();
	} else {
	s = null;
	sync = true;
	}

	if (sync) {
	assert context.withBodyInvokers(List::isEmpty) : "If a step claims synchronous completion, it shouldn't invoke body";

	if (context.getOutcome()==null) {
	context.onFailure(new AssertionError("Step "+s+" claimed to have ended synchronously, but didn't set the result via StepContext.onSuccess/onFailure"));
	}

[JENKINS-75067] Unexport step bodies when completed in synchronous mode #966

[JENKINS-75067] Unexport step bodies when completed in synchronous mode #966

Conversation

rrjjvv commented Dec 30, 2024

Testing done

Submitter checklist

jglick Jan 2, 2025

Choose a reason for hiding this comment

rrjjvv Jan 2, 2025

Choose a reason for hiding this comment

rrjjvv Jan 2, 2025

Choose a reason for hiding this comment

dwnusbaum Jan 2, 2025

Choose a reason for hiding this comment

dwnusbaum Jan 2, 2025 • edited Loading

Choose a reason for hiding this comment

rrjjvv Jan 2, 2025

Choose a reason for hiding this comment

jglick Jan 3, 2025

Choose a reason for hiding this comment

rrjjvv Jan 3, 2025 • edited Loading

Choose a reason for hiding this comment

dwnusbaum Jan 3, 2025

Choose a reason for hiding this comment

jglick left a comment

Choose a reason for hiding this comment

jglick left a comment

Choose a reason for hiding this comment

rrjjvv commented Jan 3, 2025 • edited Loading

jglick commented Jan 3, 2025

dwnusbaum Jan 2, 2025 •

edited

Loading

rrjjvv Jan 3, 2025 •

edited

Loading

rrjjvv commented Jan 3, 2025 •

edited

Loading