Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keeping old binaries to allow resuming fanned jobs #1496

Closed
joaocgreis opened this issue Sep 15, 2018 · 15 comments
Closed

Keeping old binaries to allow resuming fanned jobs #1496

joaocgreis opened this issue Sep 15, 2018 · 15 comments

Comments

@joaocgreis
Copy link
Member

Currently, we delete the temporary branch at the end of the fanned jobs. This means that when a fanned job is resumed, if the compile job succeeded only the test job will run again, and will not be able to find the temporary branch from last run.

I'd like to stop deleting temporary branches at the end of the runs, but this change is not urgent so it should be discussed with @nodejs/build first. Also, it would be good to do this when we have people available in the next few days to put out any fire.

At first, this sounds like the size of the binary repo could become a problem. However, git keeps references around long after the branches are deleted, so this might have a very small impact. Note that we have a job that runs daily and deletes branches for which the Jenkins job is no longer accessible, so branches will still get deleted after about one week.

Emergency cleanup of the binary repo
ssh jenkins-workspace-1
su binary_tmp -s /bin/bash
cd ~/binary_tmp.git
du -sh .
git prune
du -sh .

@Trott
Copy link
Member

Trott commented Sep 15, 2018

this change is not urgent

If we can get this working before Code + Learn, that could be immensely helpful.

@joaocgreis
Copy link
Member Author

I've changed the jobs, will monitor the temporary repo.

For now the changes were minimal (changed the delete parameter to false). If all goes well, in a couple of months will remove the parameter and the build step.

@joyeecheung
Copy link
Member

https://ci.nodejs.org/job/node-test-commit-windows-fanned/20827/

https://ci.nodejs.org/job/node-test-binary-windows/COMPILED_BY=vs2017-x86,RUNNER=win2012r2,RUN_SUBSET=1/20072/console


10:25:29 c:\workspace\node-test-binary-windows>git checkout -f refs/remotes/jenkins_tmp/jenkins-node-test-commit-windows-fanned-20827-bin-win-vs2017-x86 
10:25:29 error: pathspec 'refs/remotes/jenkins_tmp/jenkins-node-test-commit-windows-fanned-20827-bin-win-vs2017-x86' did not match any file(s) known to git.

This job started 6hrs ago. Looks like it did not resume successfully

@refack
Copy link
Contributor

refack commented Sep 18, 2018

It did not work because the test branch was named using:

BUILD_TAG
String of "jenkins-${JOB_NAME}-${BUILD_NUMBER}"

Which changes in the resumed job.

I've changed in to ${JOB_NAME}-${GIT_COMMIT}
https://ci.nodejs.org/job/node-test-commit-windows-fanned/jobConfigHistory/showDiffFiles?timestamp1=2018-09-16_23-11-57&timestamp2=2018-09-18_17-21-11

Let's see if that works

@refack
Copy link
Contributor

refack commented Sep 18, 2018

I've changed in to ${JOB_NAME}-${GIT_COMMIT}

I reverted that because now node-test-commit-windows-fanned is not a git job...

@joaocgreis
Copy link
Member Author

@refack not a git job, but it may receive GIT_COMMIT as a parameter. Here's another try: https://ci.nodejs.org/job/node-test-commit-windows-fanned/jobConfigHistory/showDiffFiles?timestamp1=2018-09-18_22-40-44&timestamp2=2018-09-18_22-44-06

@joaocgreis
Copy link
Member Author

Resuming at the level of node-test-commit-windows-fanned works, but does not at any level above. This is because the REBASE_ONTO parameter gets updated.

(My test run yesterday: https://ci.nodejs.org/job/node-test-commit-windows-fanned/20840/ and https://ci.nodejs.org/job/node-test-commit-windows-fanned/20841/ , @Trott failed run now: https://ci.nodejs.org/job/node-test-pull-request/17320/ and https://ci.nodejs.org/job/node-test-pull-request/17327/ .)

I'm not sure whether we should ignore the REBASE_ONTO parameter for this. The fanned jobs will never rebase when resumed while others do and if there is any issue with the job understanding it will be even more confusing. There will be interference if multiple jobs for the same commit happen to run simultaneously. Without a better solution, I'm leaning towards giving this a try. What do you think?

@joaocgreis
Copy link
Member Author

Let's try it: https://ci.nodejs.org/job/node-test-commit-windows-fanned/jobConfigHistory/showDiffFiles?timestamp1=2018-09-18_22-44-06&timestamp2=2018-09-20_23-08-36 . This might cause subtle issues in CI, if the same commit has more than one job running simultaneously. Since the branch name includes the commit SHA, it should be safe, but we might end up testing outdated rebases and that can be confusing.

@refack
Copy link
Contributor

refack commented Sep 21, 2018

Resuming at the level of node-test-commit-windows-fanned works, but does not at any level above. This is because the REBASE_ONTO parameter gets updated.

I think that is a bug. We should strive to have "resume" use the exact same parameters (in this case a SHA and not a ref for the rebase point). I'll look at that...

@refack
Copy link
Contributor

refack commented Sep 21, 2018

This might cause subtle issues in CI, if the same commit has more than one job running simultaneously.

IMHO we can consider skipping compilation job altogether, if we find the binary branch...

@joaocgreis
Copy link
Member Author

I think that is a bug. We should strive to have "resume" use the exact same parameters (in this case a SHA and not a ref for the rebase point). I'll look at that...

I don't think that's going to be possible (but would be great to find a way!). The only thing you have is "rebase against the base branch", so you have to find out what the "base branch" is somewhere. Resuming at that level will calculate the SHA again... unless we find a way to know the job is a resume.

IMHO we can consider skipping compilation job altogether, if we find the binary branch...

For new builds or rebuilds, it should always rebase against the latest and compile again. It would be nice to have the resume feature working well, but it's not worth interfering with fresh builds because of it.

I've changed the compile job to delete the binary branch if found, before starting to compile. This is to make sure that if the compile job fails the test job doesn't simply pick an old branch and tests that.

If the current approach has issues, we could try to propagate a branch name from the top, from the test-PR job. Then, allow resuming only on the test-commit level and below. But this is one more bit of complexity, so I'd avoid it if possible.

@refack
Copy link
Contributor

refack commented Oct 12, 2018

@refack
Copy link
Contributor

refack commented Nov 17, 2018

@joaocgreis PTAL
I've chanched two values in the config https://ci.nodejs.org/job/node-test-commit-windows-fanned/jobConfigHistory/showDiffFiles?timestamp1=2018-10-30_19-03-30&timestamp2=2018-11-17_11-26-47

  1. Don't try to test the binaries if the compilation step failed
  2. Delete the temp branch if build is stable (i.e. ✔️ )

@joaocgreis
Copy link
Member Author

@refack LGTM, thanks!

I think we're all done here. #1515 is still open but we can discuss there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants