Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-54078] JEP-210 compatibility #74

Merged
merged 10 commits into from
Oct 26, 2018

Conversation

jglick
Copy link
Member

@jglick jglick commented Oct 8, 2018

Demonstrates that abayer#1 was a necessary amendment to #62 in order to make timeout work properly with remote durable tasks as of JEP-210. If you disable that fix using

diff --git a/src/main/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepExecution.java b/src/main/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepExecution.java
index 20f2f21..0b4933c 100644
--- a/src/main/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepExecution.java
+++ b/src/main/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepExecution.java
@@ -250,11 +250,6 @@ public class TimeoutStepExecution extends AbstractStepExecutionImpl {
             this.callback = callback;
         }
 
-        private Object writeReplace() {
-            Channel ch = Channel.current();
-            return ch == null ? this : new ConsoleLogFilterImpl(ch.export(ResetCallback.class, callback));
-        }
-
         @Override
         public OutputStream decorateLogger(@SuppressWarnings("rawtypes") Run build, final OutputStream logger)
                 throws IOException, InterruptedException {

then the build fails as expected:

[Pipeline] sh
[p] Running shell script
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
java.io.NotSerializableException: hudson.model.Executor
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
	at …
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
	at hudson.remoting.UserRequest._serialize(UserRequest.java:264)
	at hudson.remoting.UserRequest.serialize(UserRequest.java:273)
	at hudson.remoting.UserRequest.<init>(UserRequest.java:101)
	at hudson.remoting.Channel.call(Channel.java:953)
	at hudson.Launcher$RemoteLauncher.launch(Launcher.java:1052)
	at hudson.Launcher$ProcStarter.start(Launcher.java:449)
	at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:185)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:98)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:266)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:270)
	at …
Caused: java.io.IOException: Unable to serialize hudson.Launcher$RemoteLaunchCallable@2b4d80f
	at hudson.remoting.UserRequest.serialize(UserRequest.java:275)
	at hudson.remoting.UserRequest.<init>(UserRequest.java:101)
	at hudson.remoting.Channel.call(Channel.java:953)
	at hudson.Launcher$RemoteLauncher.launch(Launcher.java:1052)
	at hudson.Launcher$ProcStarter.start(Launcher.java:449)
	at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:185)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:98)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:266)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:270)
	at …
Finished: FAILURE

Some more discussion: jenkinsci/workflow-support-plugin#15 (comment)


And then amends that change for JENKINS-54078 because it turns out (as I just happened to learn yesterday in jenkinsci/pipeline-cloudwatch-logs-plugin@f66f638!) that Channel.export is pretty much useless; for some reason I have yet to track down, FilePath calls from the Git plugin (but not those from, say, the sh step) trigger collection of references.

@@ -28,7 +28,7 @@
<parent>
<groupId>org.jenkins-ci.plugins</groupId>
<artifactId>plugin</artifactId>
<version>3.19</version>
<version>3.25</version>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Various changes to pick up JEP-210 binaries.

pom.xml Outdated
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.jenkins-ci.plugins.workflow</groupId>
<artifactId>workflow-durable-task-step</artifactId>
<version>2.3</version>
<version>2.22-rc713.c51e25cf0eef</version> <!-- TODO https://github.com/jenkinsci/workflow-durable-task-step-plugin/pull/75 -->
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jenkinsci/workflow-durable-task-step-plugin#75, to get ABORTED rather than FAILURE in the test.

Note that this build is actually older than the 2.22 on the update center, but it does contain jenkinsci/workflow-durable-task-step-plugin#63 and the subsequent commits are uninteresting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Released as workflow-durable-task-step 2.24.

p.setDefinition(new CpsFlowDefinition("" +
"node('!master') {\n" +
" timeout(time:5, unit:'SECONDS', activity: true) {\n" +
" sh 'set +x; echo NotHere; sleep 3; echo NotHereYet; sleep 3; echo JustHere; sleep 10; echo ShouldNot'\n" +
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot use echo steps as in the other tests, because we need to demonstrate serialization of the ConsoleLogFilter impl, which only happens if a nested step actually remotes its TaskListener.

@dwnusbaum
Copy link
Member

Looks like the test failure is because EnvStepTest needs to be updated to handle the new log format now that the parallel branch name isn't shown in the logs:

pom.xml Outdated
<workflow-cps-plugin.version>2.32</workflow-cps-plugin.version>
<workflow-support-plugin.version>2.14</workflow-support-plugin.version>
<workflow-api-plugin.version>2.28</workflow-api-plugin.version>
<workflow-cps-plugin.version>2.58-beta-1</workflow-cps-plugin.version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can be updated to non-beta versions now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remind me to do so once jenkinsci/workflow-durable-task-step-plugin#75 is in a release.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, if that PR isn't released by next Friday I will release it and ping you then.

@jglick
Copy link
Member Author

jglick commented Oct 16, 2018

I will use this PR to investigate JENKINS-54078.

@jglick jglick changed the title [JENKINS-26521] Prove that Channel.export is needed for JEP-210 compatibility [JENKINS-54078] JEP-210 compatibility Oct 16, 2018
@jglick jglick requested a review from dwnusbaum October 16, 2018 17:21
@jglick jglick requested a review from joseblas October 23, 2018 18:12
dwnusbaum
dwnusbaum previously approved these changes Oct 23, 2018
Copy link
Member

@dwnusbaum dwnusbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me, although it would be nice to understand the root cause.

pom.xml Outdated
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.jenkins-ci.plugins.workflow</groupId>
<artifactId>workflow-durable-task-step</artifactId>
<version>2.3</version>
<version>2.22-rc713.c51e25cf0eef</version> <!-- TODO https://github.com/jenkinsci/workflow-durable-task-step-plugin/pull/75 -->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Released as workflow-durable-task-step 2.24.

}

private Object writeReplace() {
Channel ch = Channel.current();
return ch == null ? this : new ConsoleLogFilterImpl(ch.export(ResetCallback.class, callback));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the issue is that only the marshaller holds the reference to the exported object, so after the object is completely serialized it is free to be garbage collected?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT the issue was that this code released all exported objects when a Callable finished, insofar as those objects had originally been exported within the dynamic scope of that Callable’s serialization. Or something like this. I could not follow exactly what the logic was, and the documentation was scant. At any rate, the revised strategy with the SlaveToMasterCallable is simpler to reason about explicitly.

@jglick
Copy link
Member Author

jglick commented Oct 23, 2018

I intend to amend 0918918 to be more secure. Also the CI seems to be broken and I would like to get a clean build.

<scope>test</scope>
</dependency>
<dependency>
<groupId>org.jenkins-ci.plugins.workflow</groupId>
<artifactId>workflow-durable-task-step</artifactId>
<version>2.3</version>
<version>2.24</version>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR this was to pick up the fix of JENKINS-28822.

@jglick
Copy link
Member Author

jglick commented Oct 25, 2018

Having difficulty coming up with a convincing test case for the upgrade scenario. Was playing with something like

@Issue("JENKINS-54078")
@LocalData
@Test public void activityUpgrade() {
    story.then(r -> {
        r.createSlave();
        /* Setup, with workflow-job reverted to 2.25 and src/main/ from workflow-basic-steps 2.11:
        WorkflowJob p = r.jenkins.createProject(WorkflowJob.class, "p");
        p.setDefinition(new CpsFlowDefinition(
            "node('!master') {\n" +
            "  timeout(time: 5, unit: 'SECONDS', activity: true) {\n" +
            "    remotePrint 'NotHere'\n" +
            "    sleep 3\n" +
            "    remotePrint 'NotHereYet'\n" +
            "    sleep 3\n" +
            "    remotePrint 'JustHere'\n" +
            "    while (true) {\n" +
            "      sleep 10\n" +
            "      remotePrint 'ShouldNot'\n" +
            "    }\n" +
            "  }\n" +
            "}\n", true));
        WorkflowRun b = p.scheduleBuild2(0).waitForStart();
        r.waitForMessage("“JustHere” from agent", b);
        r.waitForMessage("Sleeping for 10 sec", b);
        Runtime.getRuntime().halt(1);
        */
        WorkflowRun b = r.jenkins.getItemByFullName("p", WorkflowJob.class).getBuildByNumber(1);
        r.assertBuildStatus(Result.ABORTED, r.waitForCompletion(b));
        r.assertLogContains("Cancelling nested steps due to timeout", b);
        /* Actually we get this, once, because the sleep expired:
        r.assertLogNotContains("ShouldNot", b);
        */
        b.getLogText().writeRawLogTo(0, System.out);
    });
}
// TODO adapted from workflow-job; consider pushing into tests JAR of workflow-support for reuse
public static final class RemotePrintStep extends Step {
    public final String message;
    @DataBoundConstructor public RemotePrintStep(String message) {
        this.message = message;
    }
    @Override public StepExecution start(StepContext context) throws Exception {
        return new Execution(context, message);
    }
    private static final class Execution extends SynchronousNonBlockingStepExecution<Void> {
        private final String message;
        Execution(StepContext context, String message) {
            super(context);
            this.message = message;
        }
        @Override protected Void run() throws Exception {
            return getContext().get(Node.class).getChannel().call(new PrintCallable(getContext().get(TaskListener.class), message));
        }
    }
    private static final class PrintCallable extends MasterToSlaveCallable<Void, RuntimeException> {
        private final TaskListener listener;
        private final String message;
        PrintCallable(TaskListener listener, String message) {
            this.listener = listener;
            this.message = message;
        }
        @Override public Void call() throws RuntimeException {
            listener.getLogger().println("“" + message + "” from " + (JenkinsJVM.isJenkinsJVM() ? "master" : "agent"));
            return null;
        }
    }
    @TestExtension public static final class DescriptorImpl extends StepDescriptor {
        @Override public Set<? extends Class<?>> getRequiredContext() {
            return ImmutableSet.of(Node.class, TaskListener.class);
        }
        @Override public String getFunctionName() {
            return "remotePrint";
        }
    }
}

However even before the upgrade it does not work unless I also

diff --git a/src/main/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepExecution.java b/src/main/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepExecution.java
index 20f2f21..76c0154 100644
--- a/src/main/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepExecution.java
+++ b/src/main/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepExecution.java
@@ -75,7 +75,11 @@ public class TimeoutStepExecution extends AbstractStepExecutionImpl {
 
     @Override
     public void onResume() {
-        setupTimer(System.currentTimeMillis(), false);
+        if (activity) {
+            resetTimer();
+        } else {
+            setupTimer(System.currentTimeMillis(), false);
+        }
     }
 
     private TaskListener listener() {

because end is set to be however many seconds since the last activity before the restart, and so as soon as the build resumes the timeout block is cancelled. I am not sure how to reproduce this problem, if it is a problem, in activityRestart. Probably it is necessary to have a single (atom) step running across the upgrade and periodically emitting output, so that we do not see the effect of the Pipeline step notes resetting the timer; sh cannot be used for this because it is not easily resumed in @LocalData.

Related to that, there seems to be no existing coverage for the case that activity is seen after a restart and the build succeeds.

Anyway, I think for now I am giving up on a test case for the upgrade, and will just write the probable fix based on reasoning.

Copy link
Member

@dwnusbaum dwnusbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks good to me, and I did some manual testing across upgrades which worked as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants