Enable and require use of node-accept-pull-request? #324

Closed
Trott opened this issue Feb 8, 2016 · 31 comments
Comments

@Trott
Member

Trott commented Feb 8, 2016

For the second time in the last month or so, a commit was landed even though it had a failing test.

When we're not doing stuff like that, CI seems stable enough lately (ARM explosions notwithstanding, but we're marking those as flaky and fixing them, so we should be reliably yellow/green on master builds now). I wonder if it's time to try to enable the use of node-accept-pull-request again. It can be voluntary for a few days, I suppose, and if it works well enough, then it can be made mandatory.

I'm not sure what the process is for this. (Does it involve CTC? TSC?) But I thought this would be a good place to at least start the discussion.

/cc @nodejs/testing

@Fishrock123
Contributor

Last I heard, someone was working on a CLI tool for this. That, paired with pushing build statuses to GitHub, should work just as well, without the arbitrary caveats.

(I do not want to use that CI job again.)

@Trott
Member Author

Trott commented Feb 8, 2016

Last I heard, someone was working on a CLI tool for this.

A CLI tool for what exactly?

@orangemocha

Perhaps @Fishrock123 is referring to #177? It was a way to launch the job by doing a git push. Essentially a CLI-friendly front-end for node-accept-pull-request.

I think we need to make sure that CI is really stable before going down the route of node-accept-pull-request.

Things that I would consider as steps towards additional stability:

  1. Rebooting slaves at the end of each run. @joaocgreis has done some experimentation, but the last time we talked about it, it still needed work.
  2. Secondary Jenkins masters, for two purposes: a) to test job/configuration updates before going live in production; b) for additional availability in case the main Jenkins goes down (this is less of a blocker, as currently if Jenkins goes down nobody can merge PRs anyway).

@orangemocha

  • Adding more admins to Jenkins

@Trott
Member Author

Trott commented Feb 9, 2016

FYI, we've now had a third PR land today that breaks CI: nodejs/node#4741. This would not have happened with node-accept-pull-request because it would have rebased against master and caught the issue.

As we add more and more contributors, this is likely to happen more and more (on the assumption that more contributors means more contributions, which means more chances for wires to get crossed between two PRs, as happened here).

@MylesBorins
Contributor

An alternative could be having a hook to run CI when there is a push to master and notify the build group if things are broken.
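
A push hook like that could be registered through the GitHub webhooks API. What follows is only a minimal sketch, assuming a token with admin rights on the repo; the receiver URL is a placeholder, not an existing service:

# Sketch only: subscribe a (hypothetical) CI bridge to push events on nodejs/node.
# $GITHUB_TOKEN and the receiver URL are placeholders/assumptions.
curl -X POST "https://api.github.com/repos/nodejs/node/hooks" \
  -H "Authorization: token $GITHUB_TOKEN" \
  -d '{
        "name": "web",
        "active": true,
        "events": ["push"],
        "config": { "url": "https://ci-bridge.example.org/push", "content_type": "json" }
      }'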

@Trott
Member Author

Trott commented Feb 9, 2016

I agree that CI stability needs to be solid for this to work. Can we come up with a checklist of measurable things that would indicate that we're ready to propose moving forward?

For tests, I'd say that if we can get 20 runs in a row on CI that don't fail spuriously, we're likely there. Would a similar threshold work for the CI infrastructure itself? Can we use https://ci.nodejs.org/job/node-daily-master/ as our benchmark? If it runs for 20 consecutive days with nothing but yellow and green, maybe it's time to propose a move?
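
One way to check that mechanically would be to poll the Jenkins JSON API for the last 20 node-daily-master results; a minimal sketch, assuming anonymous read access to the standard Jenkins JSON API:

# Count red (FAILURE) results among the last 20 node-daily-master builds.
# Yellow (UNSTABLE) and green (SUCCESS) both count as acceptable here.
# -g keeps curl from globbing the [] and {} in the tree query.
curl -sg "https://ci.nodejs.org/job/node-daily-master/api/json?tree=builds[number,result]{0,20}" \
  | grep -o '"result":"FAILURE"' \
  | wc -l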

Here's a tentative checklist:

  • Slaves reboot after each run
  • Secondary Jenkins master for testing new configs
  • Increase Jenkins admins to ??? (what's the right integer there?)
  • 20 days of consecutive node-daily-master runs without any spurious red

Anything else?

@Trott
Member Author

Trott commented Feb 9, 2016

@thealphanerd wrote:

An alternative could be having a hook to run CI when there is a push to master and notify the build group if things are broken.

That would be an improvement over the current situation. To me, it puts the pressure on the wrong people. And it's reactive instead of proactive.

I would prefer:

  • Step 1: Stable CI infrastructure with reliable tests
  • Step 2: No, you can't land your commit if it breaks CI
  • Step 3: Pizza party

@Trott
Member Author

Trott commented Feb 9, 2016

Would the node-accept-pull-request job be the one used by the CLI tool? It seems the answer is "yes".

Moreover, does it make sense to enable the node-accept-pull-request tool but not require its use? I would totally use it. That would help shake bugs out in preparation for a CLI tool that leveraged it or whatever.

@mhdawson
Member

mhdawson commented Feb 9, 2016

I wonder about the reboot after each run, which seems pretty heavy-handed. Would some sort of cleanup, including killing leftover processes etc., work? In particular, if we give people access to machines to debug issues, ongoing reboots would make this difficult (although in that specific case maybe the machine should be disabled, but I don't think people generally have the capability to do that).

@orangemocha

The CLI tool doesn't need to be a blocker IMO, but others might disagree. Maybe @Fishrock123 can confirm whether that's what he was referring to, and if so we can reopen #177. /cc @silverwind

@silverwind

Is it possible to integrate Jenkins with GitHub, like many projects already have in place with Travis?

I think our CI is fast enough now to not generate a significant backlog, and it would give instant feedback to every contributor.
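
For reference, the piece Travis provides there is GitHub's commit status API; a Jenkins job or a small bridge service could report each run with something like the sketch below, where the token, the context name and the use of Jenkins' $BUILD_URL variable are assumptions:

# Sketch only: attach a CI result to a commit.
# $GITHUB_TOKEN and $COMMIT_SHA are placeholders; "state" must be one of
# pending, success, failure or error.
curl -X POST "https://api.github.com/repos/nodejs/node/statuses/$COMMIT_SHA" \
  -H "Authorization: token $GITHUB_TOKEN" \
  -d "{
        \"state\": \"success\",
        \"context\": \"continuous-integration/jenkins\",
        \"description\": \"node-test-pull-request finished\",
        \"target_url\": \"$BUILD_URL\"
      }"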

@jbergstroem
Member

@silverwind both @Starefossen and I have been looking at doing this. It will require us to build a third-party tool to avoid leaking credentials. See #236.

@jbergstroem
Member

@orangemocha I still think rebooting is too dramatic and might lead to other issues (especially given how flaky Jenkins and Windows have been in terms of connectivity); at least with @joaocgreis's redeployment work on Rackspace we should have enough redundancy to do a lot of reboots.

@Starefossen
Member

Here is my take on build status from Jenkins: https://github.com/Starefossen/jenkins-github-status. It is currently posting build results to the #node-test IRC channel.

@jbergstroem
Member

It's unfortunate that this happens and I can see how our previous flakiness has been making it hard for collaborators to distinguish between "real" green runs, yellow runs and red runs. I think it could be good to remind everyone to just have a second look before merging; there's no real gain in being trigger happy. The best case is obviously automation such as node-accept-pull-request but I don't think the trust is quite there yet.

@orangemocha

Re: rebooting, it is only a hunch of mine that it would help; I don't have any quantifiable proof. So a criterion based on consecutive green/yellow runs is a more direct measure of reliability, or of what needs to be fixed to achieve it.

@silverwind

@jbergstroem nice progress, I wasn't aware of these efforts. I see that a lot of caution is necessary there. For example, rate-limiting, and cancelling an in-progress build when a new push arrives on the same PR, are probably necessary to prevent someone from hogging CI resources.

@Trott I feel your pain, but a few collaborators (including me) don't really like being forced through Jenkins. It's just not reliable enough, and some just prefer the CLI workflow. Old habits die hard, unfortunately.

@Trott
Member Author

Trott commented Feb 9, 2016

@silverwind If a CLI wrapper around CI/Jenkins is the secret sauce that gets us buy-in from collaborators, that's good by me!

@jbergstroem
Member

Based on the progress of our collaborators, the testing WG and our build WG, I think we will soon be in a state where we can trust Jenkins and take every deviation seriously. The next steps should be improving tooling for 1: visibility/feedback (WIP, #236), 2: landing commits (node-accept or a derivative) and 3: latency (Jenkins, more CI capacity, parallel tests). Albeit slowly, I feel we're making progress, and the pain points for all collaborators should slowly fade away. CI is very important, and I don't believe that the 15 minutes we might have to wait for a green light should be replaced with a post-commit strategy.

@silverwind

Such a CLI wrapper could actually be very useful for collaborators. I've had the idea of a tool that pre-fills commit metadata on the CLI, but haven't found an easy way to supply git rebase with the metadata. I think I'll open an issue regarding that; maybe some git guru has a better idea. After rebasing is automated, I think adding some sort of webhook to start a Jenkins job and sleep the process until it has a result would be quite trivial.
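
One possible building block for the metadata part (only a sketch; $PR_URL and $REVIEWED_BY are placeholders, echoing the job parameters in the script further down) is git interpret-trailers, which can append the trailers without opening an editor:

# Rewrite the HEAD commit message with metadata trailers appended, non-interactively.
git log -1 --format=%B \
  | git interpret-trailers \
      --trailer "PR-URL: $PR_URL" \
      --trailer "Reviewed-By: $REVIEWED_BY" \
  | git commit --amend -F -

Combined with git rebase --exec, that could presumably be applied to every commit in a PR branch, which is roughly what the Jenkins script below achieves with a scripted GIT_EDITOR instead.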

@jbergstroem
Member

It's definitely possible; I've worked on automating Jenkins slave setup through the API. The Jenkins API is by no means complete, but libraries usually patch the missing parts with plain ol' HTTP POSTs. If someone wants to look at this, here's a good start: https://github.com/jansepar/node-jenkins-api
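
For the gaps, the remote access API is plain HTTP anyway; for example, a parameterized job can be kicked off with a single POST (a sketch only; the job name, parameter names and credentials are placeholders):

# Sketch only: trigger a parameterized Jenkins job via the remote access API.
curl -X POST "https://ci.nodejs.org/job/node-accept-pull-request/buildWithParameters" \
  --user "$JENKINS_USER:$JENKINS_API_TOKEN" \
  --data-urlencode "TARGET_BRANCH=master" \
  --data-urlencode "PR_URL=$PR_URL"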

@joaocgreis
Member

@silverwind for reference, this is the relevant part of the script that was used by CI (node-push-merge-commit):

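# Record the tip of the PR branch and make sure the target branch ref resolves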
GIT_COMMIT=$(git rev-parse HEAD)
git rev-parse refs/remotes/origin/$TARGET_BRANCH

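# Build a GIT_EDITOR script that appends the metadata trailers
# (Fixes, Ref, PR-URL, Reviewed-By) to the commit message file it is handed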
rm -rf env.git_editor
touch env.git_editor
echo '#!/bin/bash' >> env.git_editor
# These two lines will strip existing metadata from the commit message
# echo "grep --ignore-case --invert-match '^Reviewed-By\|^PR-URL' \$1 > env.commit_message" >> env.git_editor
# echo "cp env.commit_message \$1" >> env.git_editor
if [ -n "${FIXES// }" ]; then
  echo "echo 'Fixes: $FIXES' >> \$1" >> env.git_editor
fi
if [ -n "${FIXES2// }" ]; then
  echo "echo 'Fixes: $FIXES2' >> \$1" >> env.git_editor
fi
if [ -n "${REF// }" ]; then
  echo "echo 'Ref: $REF' >> \$1" >> env.git_editor
fi
if [ -n "${PR_URL// }" ]; then
  echo "echo 'PR-URL: $PR_URL' >> \$1" >> env.git_editor
fi
if [ -n "${REVIEWED_BY// }" ] && [ "${REVIEWED_BY}" != "<empty>" ]; then
  echo "echo 'Reviewed-By: $REVIEWED_BY' >> \$1" >> env.git_editor
fi
if [ -n "${REVIEWED_BY2// }" ] && [ "${REVIEWED_BY2}" != "<empty>" ]; then
  echo "echo 'Reviewed-By: $REVIEWED_BY2' >> \$1" >> env.git_editor
fi
if [ -n "${REVIEWED_BY3// }" ] && [ "${REVIEWED_BY3}" != "<empty>" ]; then
  echo "echo 'Reviewed-By: $REVIEWED_BY3' >> \$1" >> env.git_editor
fi
if [ -n "${REVIEWED_BY4// }" ] && [ "${REVIEWED_BY4}" != "<empty>" ]; then
  echo "echo 'Reviewed-By: $REVIEWED_BY4' >> \$1" >> env.git_editor
fi

cat env.git_editor
chmod 755 env.git_editor
export GIT_EDITOR=./env.git_editor

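# Rebase onto the target branch, then generate a todo list that rewords every rebased commit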
rm -rf env.rebase_sequence
git checkout -f $GIT_COMMIT
git rebase refs/remotes/origin/$TARGET_BRANCH
echo $?
git log --reverse --format="reword %h %s" refs/remotes/origin/$TARGET_BRANCH..HEAD > env.rebase_sequence
cat env.rebase_sequence

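# Build a GIT_SEQUENCE_EDITOR script that replaces the interactive-rebase todo with the reword list above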
rm -rf env.git_sequence_editor
touch env.git_sequence_editor
echo '#!/bin/bash' >> env.git_sequence_editor
echo "cp env.rebase_sequence \$1" >> env.git_sequence_editor
cat env.git_sequence_editor
chmod 755 env.git_sequence_editor
export GIT_SEQUENCE_EDITOR=./env.git_sequence_editor

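# Replay the rebase non-interactively: the sequence editor injects the reword list
# and the editor script appends the metadata to each commit message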
git rebase -i refs/remotes/origin/$TARGET_BRANCH

@silverwind

@joaocgreis thanks, that use of GIT_EDITOR and GIT_SEQUENCE_EDITOR is quite clever!

@orangemocha

Scrubbing my notes, other issues that we might want to address for re-enabling node-accept-pull-request:

@orangemocha

Another PR that was landed without running CI on the final set of changes, breaking master: nodejs/node#4892

@Trott
Member Author

Trott commented Feb 18, 2016

Maybe we can at least get node-accept-pull-request to the point where it's an option for people (like me) who would want to use it?

Then we can maybe build a CLI tool based on that work?

@orangemocha

As it was designed, if someone else pushes to the same target branch while the job is in progress, the job will fail. If you are still ok to use it with that caveat, we can work on reviving it.

@Trott
Member Author

Trott commented Feb 18, 2016

@orangemocha wrote:

As it was designed, if someone else pushes to the same target branch while the job is in progress, the job will fail.

That's a feature, not a bug. :-)

@orangemocha

Agreed :) Though if it were fully optional, that would have helped its adoption.
