Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-75067] Unexport step bodies when completed in synchronous mode #966

Merged
merged 11 commits into from
Jan 3, 2025

Conversation

rrjjvv
Copy link
Contributor

@rrjjvv rrjjvv commented Dec 30, 2024

Previously, the unexport of a step's body would only happen when scheduling the next execution (async mode). If the step was not run async (due to having an outcome set prior to the step starting), the body was not removed until the CpsThreadGroup was finishing. At this point the body's presence was treated as a resource leak and logged a warning. Doing an explicit unexport for the sync case at the proper time eliminates the warnings from the fall-back cleanup.

Potential fix for https://issues.jenkins.io/browse/JENKINS-75067

I'm not sure if this solution is "correct", but it logically makes sense and all existing tests continue to pass. Even if the proper fix is different (and/or best handled by you folks), I figured this PR could still prove valuable by providing a test case.

Testing done

I created a new test representing the scenario that led me to creating the original issue. There are two relevant assertions; both fail without my change, and pass after the change. I don't think both are needed, but didn't know which would be preferred. The assertion for the (lack of) warning log is obviously simpler and asserts what I truly cared about. The other asserts based on state, which seemed to be the better approach, but feels a bit heavy/hacky (but it's the best I could come up with without touching unrelated code).

Submitter checklist

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests - that demonstrates feature works or fixes the issue

Previously, the unexport of a step's body would only happen when
scheduling the next execution (async mode).  If the step was not run
async (due to having an outcome set prior to the step starting), the
body was not removed until the CpsThreadGroup was finishing.  At this
point the body's presence was treated as a resource leak and logged a
warning.  Doing an explicit unexport for the sync case at the proper
time eliminates the warnings from the fall-back cleanup.
@rrjjvv rrjjvv requested a review from a team as a code owner December 30, 2024 04:10
@jglick jglick added the bug label Jan 2, 2025
@@ -359,6 +359,12 @@ private static boolean refersTo(Throwable t1, Throwable t2) {
*/
private void scheduleNextRun() {
if (syncMode) {
// probably rare for a legit sync step to have a body, but it's possible for a (typically) async step
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be right, though TBH I am not entirely sure I know how this code works.

Besides the case you identified in the test (normally asynch block-scoped step like node terminated in StepListener), there are a couple of cases that may or may not currently occur in production code but which seem worth testing:

  • A block-scoped step which throws an exception from its start method (rather than scheduling its body). For example, bad arguments. This is probably commonplace enough and might already have some test coverage here.
  • A block-scoped step which calls onSuccess from start and then return true to complete in synch mode. (E.g., retry(0) {…} if that did not just throw an error saying that the count needed to be positive.)

It is not just rare, but illegal, for a step to return in synchronous mode from start if it actually ran a body.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the first, I think I've seen such a test; I'll see if I can find it, though it should be trivial enough to have an (effective) duplicate for the sake of clarity. A test for the second shouldn't be a problem. I'll also update that comment with stronger language (around illegality of running a block but claiming sync).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the two additional tests for those scenarios, and all is well. However, they need a home. My initial test actually used StepListener and was validating the function of its documented use-case, so StepListenerTest was a logical home. The two new tests have nothing to do with StepListener. The most logical place would be in a test class for either CpsThreadGroup or CpsStepExecution, but neither has a dedicated test class.

Thoughts? I'll check back later tonight, and if you haven't given a recommendation, I'll plan on reverting StepListenerTest and move all three to a new CpsStepExecutionTest class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think it matters too much where the tests go. I could see arguments for using CpsFlowExecutionTest, CpsBodyExecutionTest, DSLTest, StepListenerTest, and probably others depending on what exactly the tests are checking and where the relevant non-test logic lives. There is no real harm in creating a new test class either if you prefer.

Also, looking at code relevant to what is being discussed I am a bit surprised that we use assert here instead of throwing an exception or otherwise doing something that will still happen when assertions are disabled:

assert context.withBodyInvokers(List::isEmpty) : "If a step claims synchronous completion, it shouldn't invoke body";

Copy link
Member

@dwnusbaum dwnusbaum Jan 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And FWIW, as far as the overall approach, the only thing that seems a bit unusual to me is the direct usage of the threadGroup field, which until now was only checked in methods which would trigger a load if needed. Whether this matters in any practical scenario, I am not sure. One alternative could be to handle this as a special case in DSL itself after we apply StepListeners and then check to see if the step was aborted before starting:

for (StepListener sl : ExtensionList.lookup(StepListener.class)) {
try {
sl.notifyOfNewStep(s, context);
} catch (Throwable e) {
LOGGER.log(Level.WARNING, "failed to notify step listener before starting " + s.getDescriptor().getFunctionName(), e);
}
}
if (!context.isCompleted()) {
StepExecution e = s.start(context);
thread.setStep(e);
sync = e.start();
} else {
s = null;
sync = true;
}

Not sure what is clearest. The way you currently have things keeps all of the unexport logic in one place, but if the bug can only be observed with StepListener maybe it makes sense to unexport the body, if it exists, in the else block there. I don't feel strongly about it either way at this point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit surprised that we use assert here

Want that changed while I'm at it, or is that more of note-to-self for you guys?

Not sure what is clearest.

With those two new tests that were requested, I asserted the things I cared about, but didn't actually look at the behavior without my fix (i.e. if those scenarios result in the undesired warnings). When I refactor them, I'll do so. Eyeballing it, I think the first one ("exception thrown in start") will trigger the bug as well. But I don't think your alternative approach will fix that scenario unless we put the unexport in two places... in the else block, but also in the catch block a few lines below.

I'll verify those assumptions, but I'll assume doing one (additional) export, (where I currently have it), is a small tiebreaker over having two unexports in DSL?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Want that changed while I'm at it, or is that more of note-to-self for you guys?

More of a note-to-self I think. Seems to date at least to jenkinsci/pipeline-plugin@efd3a64. Since this is checking something about external (plugin) code it is incorrect to use an assertion.

Copy link
Contributor Author

@rrjjvv rrjjvv Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Both new tests proposed by Jesse did in fact trigger the bug (and are both covered by my original fix)
  2. It is possible to cover all three of those cases in DSL itself by moving the unexport a little lower, but would require either increasing the visibility of CpsStepContext#body or by making the assumption that raw closure will be stored verbatim in CpsThreadGroup#closures, and find/remove the NamedArgsAndClosure#body (relying on identity). That seemed like a bad/limiting assumption. But increasing the visibility, if you're willing, does work if put after
    if (sync) {
    assert context.withBodyInvokers(List::isEmpty) : "If a step claims synchronous completion, it shouldn't invoke body";
    if (context.getOutcome()==null) {
    context.onFailure(new AssertionError("Step "+s+" claimed to have ended synchronously, but didn't set the result via StepContext.onSuccess/onFailure"));
    }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the new tests all trigger the bug and the current fix handles all of them in one place, I would stick with your current approach.

rrjjvv added 2 commits January 2, 2025 17:45
There are two additional confirmed scenarios where closures were leaked,
 both of which are covered by this fix.

As these new tests have nothing to do with `StepListener` (and to keep
the tests together), they have all moved and `StepListenerTest` has been
 effectively reverted to its previous state.
Copy link
Member

@jglick jglick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks right.

rrjjvv and others added 8 commits January 3, 2025 09:34
…pContextTest.java

Co-authored-by: Devin Nusbaum <dwnusbaum@users.noreply.github.com>
…pContextTest.java

Co-authored-by: Devin Nusbaum <dwnusbaum@users.noreply.github.com>
…pContextTest.java

Co-authored-by: Devin Nusbaum <dwnusbaum@users.noreply.github.com>
…pContextTest.java

Co-authored-by: Devin Nusbaum <dwnusbaum@users.noreply.github.com>
…pContextTest.java

Co-authored-by: Devin Nusbaum <dwnusbaum@users.noreply.github.com>
…pContextTest.java

Co-authored-by: Jesse Glick <jglick@cloudbees.com>
…pContextTest.java

Co-authored-by: Jesse Glick <jglick@cloudbees.com>
…pContextTest.java

Co-authored-by: Jesse Glick <jglick@cloudbees.com>
Copy link
Member

@jglick jglick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(BTW GH lets you apply a batch of suggestions in a single commit.)

@jglick jglick enabled auto-merge (squash) January 3, 2025 16:53
@rrjjvv
Copy link
Contributor Author

rrjjvv commented Jan 3, 2025

Ugh, sorry, I didn't see that 😄. It's my first PR with these magic GH-specific suggestions (as opposed to textual feedback) and didn't really know what to expect. Lesson learned. Also, I assume it's standard practice for you guys to squash these on merge? I didn't see any docs about your policy on that. Just saw the banner at the bottom saying it will indeed squash automatically.

@jglick jglick merged commit f9c5614 into jenkinsci:master Jan 3, 2025
17 checks passed
@rrjjvv rrjjvv deleted the stale-closure-fix branch January 3, 2025 17:10
@jglick
Copy link
Member

jglick commented Jan 3, 2025

it's standard practice for you guys to squash these on merge

Depends. In this case, some commits undid effects of previous commits, so definitely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants