Backup/Restore issue with Multibranch plugin #104

Closed
blucas opened this issue Sep 16, 2019 · 27 comments
Labels: bug (Something isn't working), enhancement (New feature or request)


blucas commented Sep 16, 2019

I'm using Jenkins Operator v0.2.0 and backup-pvc v0.0.6

When restoring a Bitbucket Team job (Organization Folder) using the cloudbees-bitbucket-branch-source:2.4.6 plugin, the operator fails to "fully" restore the folder and jobs. After the restore completes, navigating to the Bitbucket Team job in the Jenkins UI shows it as if it had yet to scan the Bitbucket Team and all of that team's repositories/branches.

I checked the backup file. It does contain the builds for each repository/branch, and they were successfully restored on disk. The only way to get the builds to display after a restore is to issue a new "Scan Organization Folder Now" operation on the Organization Folder. But this creates further issues, as that scan assumes it is the first one and so resets nextBuildNumber to 1.

The issue is that the plugin or framework expects the job's config.xml file to be restored as well. If I remove --exclude jobs/*/config.xml from backup.sh and trigger a restore, everything displays as expected in the UI, and Jenkins knows what the nextBuildNumber should be for all jobs under the Organization Folder.

I would like to suggest making the exclusion of the config.xml file a configurable setting: for example, add an environment variable EXCLUDE_CONFIG_XML=true|false, defaulting to true, so that one can disable the exclusion by setting a false value in the backup container's environment variables. A sketch of the idea follows.
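A minimal sketch of how backup.sh could honor such a variable, assuming the script assembles its tar exclude flags in one place; EXCLUDE_CONFIG_XML, JENKINS_HOME and BACKUP_DEST are illustrative names, not necessarily the real script's variables:

#!/bin/bash
# Hypothetical sketch of a configurable config.xml exclusion; the variable
# names are illustrative, not the actual backup-pvc script's.
EXCLUDE_CONFIG_XML="${EXCLUDE_CONFIG_XML:-true}"  # default keeps today's behavior

exclude_flags=()
if [ "$EXCLUDE_CONFIG_XML" = "true" ]; then
  exclude_flags+=(--exclude "jobs/*/config.xml")
fi

tar -C "$JENKINS_HOME" -czf "$BACKUP_DEST/backup.tar.gz" \
  "${exclude_flags[@]}" jobs

With EXCLUDE_CONFIG_XML=false set on the backup container, the job configs would then be included in (and restored from) the archive.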

@tomaszsek

Hi @blucas

How do you create your Bitbucket jobs? Do you run Groovy scripts or Configuration as Code?
You can force Jenkins to reload its configuration from disk by running the Groovy script Jenkins.instance.reload().

Cheers


blucas commented Sep 16, 2019

Hi @tomaszsek

We create a single job using Job DSL via the seed-job agent. The DSL is below, with some sections omitted. I am aware of the reload functionality; I did issue that command, but it still didn't display the folders/jobs under my team/organization folder.

organizationFolder('my-bitbucket-team') {
    description('This contains branch source jobs for Bitbucket Team: My Bitbucket Team')
    displayName('My Bitbucket Team')
    triggers {
        periodic(30) // minutes
    }

    organizations {
        bitbucket {
            // Credentials MUST be of type USERNAME-PASSWORD
            credentialsId('bitbucket-credentials')
            repoOwner('my-bitbucket-team')
            traits {
            }
        }
    }

    // We need to configure this stuff by hand until JobDSL gets some support (https://github.com/jenkinsci/git-plugin/pull/595)
    configure { node ->
        def traits = node / navigators / 'com.cloudbees.jenkins.plugins.bitbucket.BitbucketSCMNavigator' / traits
        traits << 'com.cloudbees.jenkins.plugins.bitbucket.BranchDiscoveryTrait' {
            strategyId(1)
        }
        traits << 'com.cloudbees.jenkins.plugins.bitbucket.OriginPullRequestDiscoveryTrait' {
            strategyId(1)
        }
        // Clone repositories using SSH
        traits << 'com.cloudbees.jenkins.plugins.bitbucket.SSHCheckoutTrait' {
            credentialsId('my-key')
        }
    }
}

@tomaszsek

I suggest:

  • create the my-bitbucket-team job through the seed job
  • run the my-bitbucket-team job from a Groovy script
  • wait for the my-bitbucket-team job to complete
  • run the Jenkins.instance.reload() Groovy script


blucas commented Sep 16, 2019

create the my-bitbucket-team job through the seed job

That's what we do (see the DSL above).

run the my-bitbucket-team job from a Groovy script

I'm not too sure what you mean by this. The seed job automatically runs it the first time it gets created.

wait for the my-bitbucket-team job to complete

This is what causes the nextBuildNumber to reset.

run the Jenkins.instance.reload() Groovy script

This does nothing other than pick up the wrong build number, because the previous step has reset it.

For example, let's say My Bitbucket Team has the following repositories in it:

  • repo-1
  • repo-2
  • repo-3

Each repo has a master branch with a Jenkinsfile in it.

  1. Install the Operator and a Jenkins CR with a seed job which picks up the DSL I mentioned previously.
  2. The seed agent will create the Organization Folder job 'my-bitbucket-team' and trigger a "Scan Organization Folder" operation.
  3. The scan will pick up the repos above and create my-bitbucket-team/repo-N folders in Jenkins.
  4. Each repo-N folder will trigger a "Multibranch Pipeline Scan" to detect all branches with a Jenkinsfile.
  5. The scan will generate a 'master' pipeline job for each repo-N.
  6. The scan will trigger a build of each of the jobs from step 5.
  7. Suppose time passes, and the repo-1/master job has 5 builds, repo-2/master has 6 builds and repo-3/master has 10 builds. (Confirm the builds are in the backup file.)
  8. Trigger a restore (for example, install a plugin via the Jenkins CR yaml file).
  9. The Jenkins Operator forces a new Jenkins master to be created and restores the backup file.
  10. Log in to Jenkins.

You will see that the seed job has been triggered again and has created the 'my-bitbucket-team' Organization Folder. Click on that folder and you'll be presented with a page similar to the one below. If the restore had worked properly, it would instead list repo-1, repo-2 and repo-3, and if you clicked on them, each would list a master job. Each master job would list its builds: 5, 6 and 10 respectively.
[Screenshot, 2019-09-16: the 'my-bitbucket-team' Organization Folder showing the "This folder is empty" page, with the "Scan Organization Folder Now" action in the sidebar]

Check the jenkins-master disk and you will see the jobs and their respective builds. You can go to http://<jenkins-url>:<port>/script and run Jenkins.instance.getItemByFullName("my-bitbucket-team/repo-1/master").getNextBuildNumber(), and it will fail because, in memory, Jenkins doesn't know about that job.

If you trigger a reload-from-disk (Jenkins.instance.reload()), the UI still displays the "This folder is empty" page and the script above still fails. Now, if you trigger a "Scan Organization Folder Now" (see screenshot), Jenkins will re-scan Bitbucket, find all those repos/branches and re-trigger builds on those branch jobs. BUT the problem with this is that Jenkins assumes it's the first build of these jobs and resets nextBuildNumber on disk. Jenkins will then silently (unless you look at the Jenkins log) fail to build these jobs, as there are already builds for them. At this point your configuration on disk is broken, because the "Scan Organization Folder Now" functionality has reset the build number on disk.

I hope this helps clear up any confusion.


tumevoiz commented Sep 26, 2019

Hi @blucas,

I apologize for the late response, but I have a fix for your problem.

Please put this into your Groovy scripts, and everything should work.

def jobName = "my-bitbucket-team"
def job = Jenkins.instance.getItem(jobName)

job.scheduleBuild() // trigger the organization scan
sleep 10000         // wait 10 seconds for the scan to finish
job.doReload()      // reload the job's configuration from disk

If this solves your problem, please close the issue.

Cheers


blucas commented Oct 1, 2019

I haven't had time to test this out, but I don't see how this will solve the problem. The nextBuildNumber, in theory, will still be reset to 1.

@tomaszsek

Hi @blucas

job.scheduleBuild() will trigger the scan, and job.doReload() will read the nextBuildNumber from disk (restored by the operator).

Cheers


blucas commented Oct 1, 2019

I still don't see how this will help, not to mention it feels more like a hack than a solution (the sleep NNNN).

  • scheduleBuild() will trigger a scan, and that scan will reset the nextBuildNumber on disk to 2.
  • sleep N implies we have to wait for the scan to finish; this could take seconds, or hours for larger organizations.
  • doReload() will pick up the wrong build number from disk because of scheduleBuild().


pawelprazak commented Oct 23, 2019

I've hit a different issue with restore and the multibranch jobs: the "Build with parameters" button no longer appears, and scanning/re-indexing doesn't help like it used to.

In general, I've hit multiple problems with incompatibilities between the multibranch plugin and the Configuration as Code and Job DSL plugins.

If at all possible, I would advise reconsidering the use of the multibranch plugin at all.

@pawelprazak changed the title from "Backup/Restore issue" to "Backup/Restore issue with Miltibranch plugin" on Oct 23, 2019
@pawelprazak changed the title from "Backup/Restore issue with Miltibranch plugin" to "Backup/Restore issue with Multibranch plugin" on Oct 23, 2019

pawelprazak commented Oct 23, 2019

After I excluded jobs/*/branches/*/config.xml (sketched below), I could see the "Build with parameters" button (after re-indexing).

And now I can confirm @blucas's issue; I was able to reproduce it.

#104 (comment) won't work because the jobs won't be loaded; in the best case it will cause a race condition.
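For reference, a sketch of what that exclusion looks like as an extra tar flag; the surrounding invocation and JENKINS_HOME are assumptions about backup.sh, not its literal contents:

tar -C "$JENKINS_HOME" -czf backup.tar.gz \
    --exclude "jobs/*/config.xml" \
    --exclude "jobs/*/branches/*/config.xml" \
    jobs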

@pawelprazak

On the other hand, if we start to include jobs/*/config.xml, I'm afraid it would cause even more problems with the "rotten state" that the whole "immutable configuration as code" approach we have going on in the operator tries to solve.

I wonder whether there are any documented issues regarding the Job DSL plugin and job state backups...

@pawelprazak added the bug (Something isn't working) and enhancement (New feature or request) labels on Oct 24, 2019
@pawelprazak

I've been experimenting with a custom backup/restore provider:
https://jenkinsci.github.io/kubernetes-operator/docs/getting-started/latest/custom-backup-and-restore/

Maybe you could just overwrite the backup script in a similar fashion to the example above?

Also, I've talked about this issue with @tomaszsek, and he came up with a nice workaround: detect whether a job contains a branches directory, and if it does, do not exclude that job's jobs/*/config.xml. A sketch of the idea follows this comment.

This would limit any problems with stale state to the multibranch jobs (which are already problematic) and not cause any additional damage to other types of jobs.
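One possible shape of that detection, assuming backup.sh builds an exclude list before invoking tar; this is only an illustration of the idea discussed above, not the actual change that later went into #211, and JENKINS_HOME/BACKUP_DEST are assumed names:

#!/bin/bash
# Illustrative sketch: keep the top-level config.xml for jobs whose tree
# contains a branches/ directory (multibranch/organization jobs), and
# exclude it for everything else.
exclude_flags=()
for job_dir in "$JENKINS_HOME"/jobs/*/; do
  job_name="$(basename "$job_dir")"
  # does this job tree contain a branches/ directory anywhere?
  if [ -z "$(find "$job_dir" -type d -name branches -print -quit)" ]; then
    exclude_flags+=(--exclude "jobs/${job_name}/config.xml")
  fi
done

tar -C "$JENKINS_HOME" -czf "$BACKUP_DEST/backup.tar.gz" \
  "${exclude_flags[@]}" jobs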


pawelprazak commented Oct 29, 2019

I've got it to work.

I also have a job like this, just in case I need to re-index all projects:

import jenkins.branch.*

pipeline {
    agent none // master
    stages {
        stage('Reindex') {
            steps {
                script {
                    for (project in Jenkins.instance.getAllItems(jenkins.branch.MultiBranchProject.class)) {
                        stage(project.getName()) {
                            project.getComputation().run() // force reindexing
                        }
                    }
                    stage("Reload") {
                        Jenkins.instance.reload()
                    }
                }
            }
        }
    }
}


dee-kryvenko commented Nov 7, 2019

To wait for the scan to complete:

    def job = Jenkins.instance.getItem(jobName)
    def scan = job.scheduleBuild2(0) // 0 = quiet period in seconds
    scan.getFuture().get()           // block until the scan completes
    job.doReload()                   // then reload the configuration from disk

https://javadoc.jenkins-ci.org/hudson/model/AbstractProject.html#scheduleBuild2-int-

@pawelprazak

Thank you @llibicpep for the hint.

The re-indexing pipeline after those changes:

import jenkins.branch.*

pipeline {
    agent none // master
    stages {
        stage('Reindex') {
            steps {
                script {
                    for (project in Jenkins.instance.getAllItems(jenkins.branch.MultiBranchProject.class)) {
                        stage(project.getName()) {
                            def scan = project.scheduleBuild2(0 /* quiet period */) // force reindexing
                            scan.getFuture().get()
                            project.doReload()
                        }
                    }
                }
            }
        }
    }
}

You also need script approvals for:

staticMethod jenkins.model.Jenkins getInstance
method hudson.model.ItemGroup getAllItems java.lang.Class
method hudson.model.Item getName
method com.cloudbees.hudson.plugins.folder.computed.ComputedFolder scheduleBuild2 int hudson.model.Action[]
method hudson.model.Queue$Item getFuture
method java.util.concurrent.Future get
method hudson.model.AbstractItem doReload

@pawelprazak

As reported in #210, when restoring a multibranch backup, the build info is missing and the Jenkins job "run" silently fails with an exception in the logs:

Nov 25, 2019 9:00:08 AM SEVERE hudson.model.Executor run
Executor #-1 for master: Unexpected executor death
java.lang.IllegalStateException: JENKINS-23152: /var/jenkins/home/jobs/xxxx/jobs/yyyy/branches/master/builds/3 already existed; will not overwrite with xxxx/yyyy/master #3
	at hudson.model.RunMap.put(RunMap.java:189)
	at jenkins.model.lazy.LazyBuildMixIn.newBuild(LazyBuildMixIn.java:182)
Caused: java.lang.Error
	at jenkins.model.lazy.LazyBuildMixIn.newBuild(LazyBuildMixIn.java:190)
	at jenkins.model.ParameterizedJobMixIn$ParameterizedJob.createExecutable(ParameterizedJobMixIn.java:511)
	at jenkins.model.ParameterizedJobMixIn$ParameterizedJob.createExecutable(ParameterizedJobMixIn.java:321)
	at hudson.model.Executor$1.call(Executor.java:365)
	at hudson.model.Executor$1.call(Executor.java:347)
	at hudson.model.Queue._withLock(Queue.java:1438)
	at hudson.model.Queue.withLock(Queue.java:1299)
	at hudson.model.Executor.run(Executor.java:347)

backup.sh must be patched as in #211 for the workaround to work.

Further investigation into how we can work around the multi-branch issues in the operator itself is necessary.


pawelprazak commented Nov 26, 2019

#211, as it is right now, has implications for every job, not only multi-branch ones, and it breaks immutability in ways we cannot easily predict.

TBH we probably won't have enough bandwidth to make a proper PR for this backup issue with multi-branch before Jenkins World, so if there is some bash haxor out there, a hint on how to approach it would be nice :)

@tumevoiz

Hi @blucas, did you solve your problem?

@pbecotte

I have been trying to give this a shot. Builds not getting scheduled after every deploy (until I push "Build Now" enough times to get the current build number past the last restored build number) is kind of a deal breaker. I can't imagine there is anyone actually NOT using multibranch builds with Jenkins. Is there a workaround that I am just missing?

@tomaszsek

@pbecotte There is a WIP pull request, #211, which should fix the issue. The main goal is to include jobs/*/config.xml files in the backup when the job type is multibranch.

@tomaszsek

Fixed in v0.0.8 PVC backup provider.


agnewp commented Dec 28, 2020

This issue should be re-opened. This "fix" does not work for me for the most basic example of a multibranch pipeline job; specifically, the job doesn't show up after restoring. Furthermore, the fix is based on the assumption that multibranch pipeline jobs are the only thing with a top-level config.xml. This is not the case at all: folders and organization folders fall into this category as well, and this approach will exclude them from backup and restore too.


agnewp commented Dec 28, 2020

I have a workaround:

  1. Disable the operator's backup/restore mechanism.
  2. Map the backup volume directly into your Jenkins pod at /backup.
  3. Install the thinBackup plugin.
  4. Use this Groovy script in your CasC config map:
1-thin-backup-restore-latest.groovy: |
    import jenkins.model.Jenkins
    import org.jvnet.hudson.plugins.thinbackup.ThinBackupPluginImpl
    import org.jvnet.hudson.plugins.thinbackup.utils.Utils
    import org.jvnet.hudson.plugins.thinbackup.ThinBackupPeriodicWork.BackupType
    import org.jvnet.hudson.plugins.thinbackup.backup.BackupSet
    import org.jvnet.hudson.plugins.thinbackup.restore.HudsonRestore
    import java.util.logging.Logger
    import java.text.ParseException
    import java.text.SimpleDateFormat
    import java.io.File

    final Logger LOGGER = Logger.getLogger("hudson.plugins.thinbackup");

    Date getLatestFullBackupDate(String rootDirectory) {
        final List<File> fullBackups = Utils.getBackupTypeDirectories(new File(rootDirectory), BackupType.FULL);
        if ((fullBackups == null) || (fullBackups.isEmpty())) {
          return null;
        }

        Date result = new Date(0);
        for (final File fullBackup : fullBackups) {
          final Date tmp = Utils.getDateFromBackupDirectory(fullBackup);
          if (tmp != null) {
            if (tmp.after(result)) {
              result = tmp;
            }
          }
          // else: the directory name cannot be parsed as a backup date,
          // so it is ignored when determining the latest backup date.
        }

        return result;
    }

    void doRestore(Date restoreFromDate, Logger LOGGER) {
        LOGGER.info("Starting restore operation (${restoreFromDate}).");

        final Jenkins jenkins = Jenkins.getInstance();
        if (jenkins == null) {
          return;
        }

        jenkins.doQuietDown();
        LOGGER.fine("Waiting until executors are idle to perform restore...");
        Utils.waitUntilIdle();

        try {
          final File hudsonHome = jenkins.getRootDir();

          final HudsonRestore hudsonRestore = new HudsonRestore(hudsonHome, ThinBackupPluginImpl.getInstance()
              .getExpandedBackupPath(), restoreFromDate, false, false);
          hudsonRestore.restore();

          LOGGER.info("Restore finished.");
        } catch (ParseException e) {
          LOGGER.severe("Cannot parse restore option. Aborting.");
        } catch (final Exception ise) {
          LOGGER.severe("Could not restore. Aborting.");
        } finally {
          jenkins.doCancelQuietDown();
        }
    }

    thinBackup = ThinBackupPluginImpl.getInstance()
    thinBackup.setBackupPath("/backup")

    latestDate = getLatestFullBackupDate("/backup")
    if (latestDate != null) {
      //run restore job
      doRestore(latestDate, LOGGER) 
    } else {
      LOGGER.info("No full backups found in backup directory");
    }

    thinBackup.setFullBackupSchedule("0 */1 * * *")
    thinBackup.setDiffBackupSchedule("*/5 * * * *")
    thinBackup.setNrMaxStoredFull(1000)
    //thinBackup.setNrMaxStoredFullAsString()
    thinBackup.setExcludedFilesRegex("")
    thinBackup.setWaitForIdle(true)
    thinBackup.setForceQuietModeTimeout(120)
    thinBackup.setBackupBuildResults(true)
    thinBackup.setBackupBuildArchive(false)
    thinBackup.setBackupBuildsToKeepOnly(false)
    thinBackup.setBackupUserContents(false)
    thinBackup.setBackupNextBuildNumber(false)
    thinBackup.setBackupPluginArchives(false)
    thinBackup.setBackupAdditionalFiles(false)
    thinBackup.setBackupAdditionalFilesRegex("")
    thinBackup.setCleanupDiff(true)
    thinBackup.setMoveOldBackupsToZipFile(false)

This approach has problems around credentials, and it needs a (user-initiated) reload after the Jenkins pod is restored. I would much rather use a more reliable mechanism external to Jenkins for backup and restore. My thought here is to look at the source of thinBackup and copy its backup method directly into the backup script you have started.

@pbecotte

I am not sure what the difference is, but this did fix it for us using GitHub organization folders. We have had quite a bit of pain from the persistence options in this system, but our jobs do come back after a restore now.


agnewp commented Jan 6, 2021

I came to a realization last night that the Jenkins Operator on purpose does NOT restore the job configs; rather, it depends on having Jenkins seed jobs in place to fully restore all the job configurations back to their original state. I believe this is what I'm missing from my setup. Is this a correct thought? Is everyone else utilizing seed jobs on startup to fully restore job configs?


pbecotte commented Jan 6, 2021

Yes, this system is designed to provide basically immutable configuration: the whole thing is reverted to the state from the CRD on every restart.

I create my jobs with config like this (tons of detail left out!)

configurations:
  10-job-compute-config.yaml: |
      jobs:
          - script: >-
                organizationFolder('Compute-Config') {
                    description('Jenkins jobs from the Compute-Config Github organization')

                    organizations {
                       github {repoOwner('Compute-Config')}
                    }
...............

Everything is in my CRD: secrets, org-folder jobs, RBAC, LDAP, global libraries... everything.

Whether that is a good thing I am still undecided on: getting the manifests working right is frequently painful, and it means your instance will just not start up after a reboot some percentage of the time, when some plugin you didn't have pinned to a specific version breaks. But having the manifest get out of sync isn't the best either...


agnewp commented Jan 6, 2021

Okay, then your backup/restore pod is really only meant to restore, for example, the job history, which is why you exclude all job config.xml files. The configuration is meant to be re-seeded from scripts called by the operator after a new pod is created. I think I missed this point in the architecture description of how this is supposed to work. I am pretty new to the operational side of Jenkins, but it might help to spell this point out a little better on the documentation website. Thanks for the clarification!
