[Bug]: JTE Pipelines resuming execution after successful run #328

Open
brosmar opened this issue Sep 27, 2023 · 10 comments
Labels: bug (Something isn't working), Could not reproduce issue (The JTE team was not able to reproduce the issue.)

Comments

brosmar commented Sep 27, 2023

Jenkins Version

CloudBees CI Client Controller Latest 2.414.2.2-rolling

JTE Version

2.5.3

Bug Description

Same issue as:
#309
#187

If more than three people report the same problematic behavior, the issue should not be closed.

[Screenshot of the build results]

All the jobs were formerly green, like the job in the first row.

Relevant log output

Even though the job result was successful, the JTE templating job is arbitrarily restarted:

10:52:37  stepFailed: false
10:52:37  result: null
10:52:37  current: SUCCESS
10:52:37  -------------------------------------------------------------------------------------------------
10:52:37  end Notify step null/null (Lifecycle Hook)
10:52:37  -------------------------------------------------------------------------------------------------
10:52:37  -------------------------------------------------------------------------------------------------
10:52:37  [Pipeline] End of Pipeline
10:52:37  Finished: SUCCESS
09:06:15  Resuming build at Sat Sep 23 09:06:15 CEST 2023 after Jenkins restart
09:06:15  [Pipeline] End of Pipeline
09:06:15  java.io.FileNotFoundException: /var/jenkins_home/jobs/MarketData/jobs/XENTRIC/jobs/visitorscenter/jobs/external-ui/jobs/build-ui/branches/develop/builds/41/program.dat (No such file or directory)
09:06:15  	at java.base/java.io.FileInputStream.open0(Native Method)
09:06:15  	at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
09:06:15  	at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
09:06:15  	at org.jenkinsci.plugins.workflow.support.pickles.serialization.RiverReader.openStreamAt(RiverReader.java:196)
09:06:15  	at org.jenkinsci.plugins.workflow.support.pickles.serialization.RiverReader.restorePickles(RiverReader.java:140)
09:06:15  	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.loadProgramAsync(CpsFlowExecution.java:804)
09:06:15  	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.onLoad(CpsFlowExecution.java:770)
09:06:15  	at org.jenkinsci.plugins.workflow.job.WorkflowRun.getExecution(WorkflowRun.java:728)
09:06:15  	at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:582)
09:06:15  	at hudson.model.RunMap.retrieve(RunMap.java:233)
09:06:15  	at hudson.model.RunMap.retrieve(RunMap.java:61)
09:06:15  	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:660)
09:06:15  	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:642)
09:06:15  	at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:540)
09:06:15  	at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:240)
09:06:15  	at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:234)
09:06:15  	at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:105)
09:06:15  	at jenkins.model.PeepholePermalink.resolve(PeepholePermalink.java:105)
09:06:15  	at hudson.model.Job.getLastCompletedBuild(Job.java:990)
09:06:15  	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$PipelineInternalCalls$1.writeTo(CpsFlowExecution.java:2052)
09:06:15  	at com.cloudbees.jenkins.support.SupportPlugin.writeBundle(SupportPlugin.java:418)
09:06:15  	at com.cloudbees.jenkins.support.SupportPlugin.writeBundle(SupportPlugin.java:353)
09:06:15  	at com.cloudbees.jenkins.support.SupportPlugin$PeriodicWorkImpl.lambda$doRun$0(SupportPlugin.java:946)
09:06:15  Also:   org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId: 121af19f-483b-46d0-8c50-87e831d00429
09:06:15  Caused: java.io.IOException: Failed to load build state
09:06:15  	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:878)
09:06:15  	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:874)
09:06:15  	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5$1.run(CpsFlowExecution.java:951)
09:06:15  	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:38)
09:06:15  	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
09:06:15  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
09:06:15  	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
09:06:15  	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
09:06:15  	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
09:06:15  	at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
09:06:15  	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
09:06:15  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
09:06:15  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
09:06:15  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
09:06:15  	at java.base/java.lang.Thread.run(Thread.java:829)
09:06:18  Finished: FAILURE
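
For anyone who wants to check saved console logs for the same pattern, here is a minimal Python sketch. It assumes the log has been exported with one console entry per line and that entries may start with an HH:MM:SS timestamp, as in the output above. The function name `resumed_after_finish` is hypothetical and not part of Jenkins or JTE; the two marker strings are taken from the log in this report.

```python
import re

# Optional "HH:MM:SS" timestamp prefix, as seen in the console output above.
_TIMESTAMP = re.compile(r"^(?:\d{2}:\d{2}:\d{2}\s+)?(.*)$")


def resumed_after_finish(console_lines):
    """Return the first reported result (e.g. "SUCCESS") if a
    "Resuming build at ..." entry appears after a "Finished: ..." entry,
    otherwise return None."""
    first_result = None
    for raw in console_lines:
        text = _TIMESTAMP.match(raw.rstrip("\n")).group(1)
        if text.startswith("Finished: "):
            if first_result is None:
                first_result = text.split(": ", 1)[1]
        elif text.startswith("Resuming build at ") and first_result is not None:
            return first_result
    return None


sample = [
    "10:52:37  [Pipeline] End of Pipeline",
    "10:52:37  Finished: SUCCESS",
    "09:06:15  Resuming build at Sat Sep 23 09:06:15 CEST 2023 after Jenkins restart",
    "09:06:18  Finished: FAILURE",
]
print(resumed_after_finish(sample))
```

A non-None return value means the build had already reported a result before Jenkins tried to resume it, which is the symptom described in this issue.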

Steps to Reproduce

Install JTE and use it in day-to-day work in the configuration above; the behavior will appear.

brosmar added the bug label Sep 27, 2023
steven-terrana added the Could not reproduce issue label Oct 9, 2023
@steven-terrana

I am happy to leave this issue open.


> Install and use the JTE in day to day business in the above configuration and you will get the behavior

Is there any more information you're able to provide?

The primary blocker for resolving this bug has been the absence of a consistently reproducible test case that could be translated into a failing unit test from which to begin debugging.

brosmar commented Oct 9, 2023

What information, besides the two duplicate issues, do you think I should provide?

#309
#187

The problem cannot be tied to a specific job run, but it affects all jobs that use JTE: at the moment, far more than 100 jobs and more than 1000 build results. No other jobs are affected.
All JTE build results that were formerly green and OK are tagged red, as in the screenshot above, all at once. The event that triggers this behavior is still unknown. You can imagine that the users of our templates are very annoyed. CloudBees support cannot help in this case because JTE is not a supported CloudBees plugin.

Sorry that I am not able to give you more details. If you request specific information, I will try to get it from our Jenkins operations team.

Kind regards
Martin

brosmar changed the title from "[Bug]: Pipeline resuming execution after successful run" to "[Bug]: JTE Pipelines resuming execution after successful run" on Oct 9, 2023
brosmar commented Nov 16, 2023

Hello JTE team, I can add the following information.

  • The problem occurs when the Templating Engine is used in a multibranch pipeline script.
  • The build step is always tagged as failed.
  • The problem first occurred after migrating from JTE 1.7.x to 2.5.x.
  • The Groovy script steps/build.groovy had to be renamed to steps/BUILD.groovy because of naming collisions with predefined functions.

Maybe this helps with investigating the cause.

brosmar commented Dec 11, 2023

Hello JTE team, I have feedback from the CloudBees team. They analyzed the issue and suggested that I share this information with you. Maybe it will help to find the root cause.

Here is the answer from CloudBees Support:

I've discussed the issue with some colleagues in the Engineering team. The done attribute in the execution build.xml drives the resume on startup: https://github.com/jenkinsci/workflow-cps-plugin/blob/3817.vd20b_7e2b_692b_/doc/persistence.md. As I anticipated, this value is set to false in your builds even if the execution completed successfully:

...
   <done>false</done>
   <resumeBlocked>false</resumeBlocked>
</execution>
<completed>false</completed>
...

JTE plugin seems to inject some logic around the pipeline run execution. There are lifecycle hooks that you can define, in particular:

https://github.com/jenkinsci/templating-engine-plugin/blob/0af836f6465f80a078a02c6[…]3/docs/how-to/library-development/lifecycle-hooks-on-failure.md

We tend to believe that this implementation might be breaking the done and completed attributes in the build.xml file. If this is only happening in JTE jobs, you might want to share this finding with the plugin maintainers.

On the other hand, the property does not seem to be editable because it is configured from a template. However, when you use disableResume() in the Jenkinsfile, the property is not passed to the job, which seems to be a bug that you could report to the plugin maintainer.
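
To check a single build against the support team's description, the flags can be read out of build.xml with a few lines of Python. This is only a sketch under stated assumptions: the element names done, resumeBlocked, and completed come from the excerpt above, while the `flow-build` root element and the XML 1.1 declaration in the sample are assumptions about how Jenkins writes the file. Python's standard parser does not accept an XML 1.1 declaration, so the sketch strips the declaration before parsing. `read_resume_flags` is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET

# Matches a leading XML declaration so it can be removed before parsing.
_XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>")


def read_resume_flags(build_xml_text):
    """Return the done / resumeBlocked / completed flags of one build.xml
    as booleans, or None for any flag that is missing."""
    root = ET.fromstring(_XML_DECL.sub("", build_xml_text, count=1))

    def flag(path):
        node = root.find(path)
        if node is None or node.text is None:
            return None
        return node.text.strip() == "true"

    return {
        "done": flag("execution/done"),
        "resumeBlocked": flag("execution/resumeBlocked"),
        "completed": flag("completed"),
    }


# Assumed file layout; only the three flag elements are taken from this thread.
SAMPLE = """<?xml version='1.1' encoding='UTF-8'?>
<flow-build>
  <execution>
    <done>false</done>
    <resumeBlocked>false</resumeBlocked>
  </execution>
  <completed>false</completed>
</flow-build>"""

print(read_resume_flags(SAMPLE))
```

For a build that finished successfully, a result of done=False and completed=False would match what CloudBees support observed.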

@lvalverderodriguez

Hi team,

It seems that JTE pipelines ignore the value of resumeBlocked in the build.xml and/or the pipeline property disableResume in the configuration.

Symptom

What is the end user experiencing?
Failed JTE Pipelines are getting resumed after a Jenkins restart even if resume is disabled in the pipeline configuration. It happens regardless of the syntax used, declarative or scripted.

Evidence/Detail

What information has been collected or researched so far that helps with the analysis
The done and resumeBlocked attributes in the execution build.xml drive the resume on startup: https://github.com/jenkinsci/workflow-cps-plugin/blob/3817.vd20b_7e2b_692b_/doc/persistence.md.

Point to relevant files if appropriate
The issue can be observed using any of the code below as JTE pipeline code:

// Scripted syntax
properties([disableResume()])

node {
    echo "Hello World!"
    sleep 60
    echo "Bye World!"
}

// Declarative syntax
pipeline {
    agent none
    options {
        disableResume()
    }
    stages {
        stage('Example') {
            agent any
            steps {
                echo 'Hello World'
                sleep 60
                echo "Bye World!"
            }
        }
    }
}

Reproduction Steps

How to reproduce the issue

1/ Install JTE plugin
2/ Create a JTE pipeline providing pipeline configuration from console:

properties([disableResume()])

node {
    echo "Hello World!"
    sleep 60
    echo "Bye World!"
}

3/ Abruptly restart the controller before the job has finished successfully.
4/ Check that the pipeline execution was attempted to be resumed.
5/ Create a regular pipeline providing pipeline configuration from console:

properties([disableResume()])
node {
    echo "Hello World!"
    sleep 60
    echo "Bye World!"
}

6/ Abruptly restart the controller before the job has finished successfully.
7/ Check that the pipeline execution was not attempted to be resumed (as expected).

Has there been a successful attempt to reproduce the issue?
Yes, following the steps above in CloudBees CI Client Controller 2.426.1.1-rolling.

If issue is intermittent/not reproducible, say that.
It is consistent.

What is expected behavior vs the actual behavior?
It is expected that builds are not resumed if resumeBlocked is set to true in the JTE pipeline build, which is what disableResume() is meant to achieve.

I hope this helps with the investigation.

brosmar commented Jan 30, 2024

@steven-terrana Hello Steven, the above post is from the CloudBees support team. They investigated the problem and found that the disableResume flag seems to be ignored or manipulated by your templating engine.

Is this information helpful?

madhu91s commented Jul 8, 2024

Is there a solution for this problem? We have the same problem in our organization too.

@cokieffebah

@brosmar @madhu91s looking at this ticket instead of #309

@madhu91s

Some information to help reproduce the scenario: we use Cloudogu systems with integrated Git and Jenkins running as Docker containers. Jenkins is scheduled to restart every night. After that restart, the JTE plugin cannot fetch the actual status of the job; instead it fails on a particular stage and marks all the previous builds as failed (just like in the image @brosmar posted). Looking at the Jenkins logs did not really help.
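
To gauge how many builds would be affected by the next restart, a copy of the jobs directory could be scanned for builds that report success but are not marked done (the done flag is the one CloudBees support pointed to earlier in this thread). This is a rough Python sketch, not a supported tool: `find_unfinished_successes` is a hypothetical name, and the `result` element and the location of `done` under `execution` are assumptions about the build.xml layout.

```python
import os
import re
import xml.etree.ElementTree as ET

# Jenkins files may start with an XML 1.1 declaration, which Python's
# parser rejects, so the declaration is stripped before parsing.
_XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>")


def find_unfinished_successes(jobs_root):
    """Yield paths of build.xml files whose result is SUCCESS while the
    execution is still marked done=false."""
    for dirpath, _dirnames, filenames in os.walk(jobs_root):
        if "build.xml" not in filenames:
            continue
        path = os.path.join(dirpath, "build.xml")
        try:
            with open(path, encoding="utf-8") as handle:
                text = handle.read()
            root = ET.fromstring(_XML_DECL.sub("", text, count=1))
        except (OSError, UnicodeDecodeError, ET.ParseError):
            continue  # unreadable or malformed files are skipped
        result = (root.findtext("result") or "").strip()
        done = (root.findtext("execution/done") or "").strip()
        if result == "SUCCESS" and done == "false":
            yield path
```

Usage would be along the lines of `for path in find_unfinished_successes("/path/to/copy/of/jobs"): print(path)`, run against a copy rather than the live Jenkins home.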

@cokieffebah

@madhu91s it would be really helpful if you could give us a minimal public repository (JTE configuration and target build repository) that replicates the problem.
Also, is it only reproducible in the CloudBees controller and not in Jenkins LTS? I will have to get management sign-off to get CloudBees, mostly to check that the license does not unexpectedly bind my company.
Thanks in advance
