Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jobs/build: Add a release lock to the job #581

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dustymabe
Copy link
Member

We're hitting a subtle issue here where newer x86_64 pipelines are
running (and early uploading their builds.json) before the release
job runs for the previous build. That release job then fails.

This commit adds a new lock to try to prevent newer x86_64 (main)
pipeline jobs from running before the fleet of jobs from the previous
run are complete.

We're hitting a subtle issue here where newer x86_64 pipelines are
running (and early uploading their builds.json) before the release
job runs for the previous build. That release job then fails.

This commit adds a new lock to try to prevent newer x86_64 (main)
pipeline jobs from running before the fleet of jobs from the previous
run are complete.
@dustymabe
Copy link
Member Author

I think this should fix issues like seen in https://jenkins-fedora-coreos-pipeline.apps.ocp.fedoraproject.org/blue/organizations/jenkins/release/detail/release/465/pipeline

Before this change:

job starts:
 - takes build lock

job 2 starts:
 - wait on build lock

job approaches end:
  - start release job

release job starts:
  - takes release lock (waits on multi-arch builds)

job ends
 - release build lock

job 2 starts:
  - takes build lock

release job still waiting on multi-arch builds:

Now:

job starts:
 - takes build lock
 - takes release lock

job 2 starts:
 - wait on build lock

job approaches end:
  - start release job

release job starts:
  - wait on release lock 

job ends
 - release build lock
 - release release lock

release job continues:
  - takes release lock

job 2 continues and then waits again:
  - takes build lock
  - wait on release lock

release job finishes:
  - release release lock

job 2 continues:
  - take release lock

@jlebon
Copy link
Member

jlebon commented Jul 29, 2022

Hmm, I think the problem here is that cosa push-container doesn't support taking the build ID and always tries to use the latest build. Independently of what we do here, we definitely should fix that.

I think this PR could still make sense, but it needs more thought. I think our release bits should be completely impervious to the build being released not being the latest, so we shouldn't need this at a technical level. But I can imagine wanting it as a semantic anyway for non-production streams to regulate how quickly we build and release changes.

@dustymabe
Copy link
Member Author

Hmm, I think the problem here is that cosa push-container doesn't support taking the build ID and always tries to use the latest build. Independently of what we do here, we definitely should fix that.

Right. I'm planning to fix that because I'm going to re-work that entire file soon.

I think this PR could still make sense, but it needs more thought. I think our release bits should be completely impervious to the build being released not being the latest, so we shouldn't need this at a technical level. But I can imagine wanting it as a semantic anyway for non-production streams to regulate how quickly we build and release changes.

Yes, I was worried less about the push-container specific part, and more about: "what does it mean to run a release out of order like this. I guess it can't happen that a later release runs before a previous release because of existing locking, which was what I was partially worried about.

So.. Do you want to just go with the surgical fix for push-container for now (I'll fix soon) and leave this for if we find other issues later? I'm cool with that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants