
ACTION REQUIRED: workflow change for merging changes to nodejs/node #2598

Closed
orangemocha opened this issue Aug 28, 2015 · 76 comments
Labels
build Issues and PRs related to build files or the CI. meta Issues and PRs related to the general management of the project.

Comments

@orangemocha
Contributor

Based on previous discussion, feedback, and the approval by the TSC (#2434), we are adopting a new workflow for merging pull requests in nodejs/node, and for making any changes to the code branches at nodejs/node in general. The new workflow needs to be adopted by all @nodejs/collaborators for all changes merged to nodejs/node, starting this coming Monday at 2pm UTC: http://www.timeanddate.com/worldclock/fixedtime.html?msg=Start+merging+changes+to+nodejs%2Fnode+with+Jenkins&iso=20150831T14&p1=1440

From that time on, please refrain from pushing changes to nodejs/node manually, and instead use the workflow documented at https://github.com/nodejs/node/wiki/Merging-pull-requests-with-Jenkins. I also added a wiki page specifically on how to deal with flaky tests.

The workflow for testing (but not landing) pull requests with Jenkins is unchanged, and is now documented here: https://github.com/nodejs/node/wiki/Testing-pull-requests-with-Jenkins

I'll be monitoring Jenkins to make sure things keep working smoothly. Let me and @nodejs/jenkins-admins know if you encounter any issues with the CI infrastructure or to share your feedback. Thank you!

@Fishrock123
Contributor

👍

Fishrock123 added the build and meta labels Aug 28, 2015
@silverwind
Contributor

Do these merges add the committer line of whoever performed the merge?

@Fishrock123
Contributor

Do these merges add the committer line of whoever performed the merge?

I was told they will, and they need to for us to use it.

@orangemocha
Contributor Author

Do these merges add the committer line of whoever performed the merge?

Yes, the information is taken from your full name and email address as set in Jenkins. I will update the wiki to reflect that.
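
If you want to double-check after a merge, the author and committer fields can be inspected directly in the Git metadata; the author should remain the original contributor, while the committer should match the name and e-mail from the Jenkins account. For example:

    git log -1 --format='author:    %an <%ae>%ncommitter: %cn <%ce>'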

@ChALkeR
Member

ChALkeR commented Aug 28, 2015

What about documentation-only changes? I believe we don't yet have documentation-related tests, do we?

@orangemocha
Contributor Author

To ensure your committer info is set correctly, go to https://jenkins-iojs.nodesource.com/, log in, click on your username at the top-right, then on the left click on 'Configure'. Double-check 'Full Name' and 'E-mail address'.

@orangemocha
Contributor Author

What about documentation-only changes? I believe we don't yet have documentation-related tests, do we?

Use node-accept-pull-request, and you can set NODES_SUBSET to pure_docs_changes to skip the tests. If we ever invent a way to test documentation, those tests will run as part of that configuration 😄

@ChALkeR
Member

ChALkeR commented Aug 28, 2015

On the left hand side, you should see "Build with parameters". If not, it probably means that you're not logged in. You need to be logged in to start this job.

Doesn't work for me. I am logged in and there is nothing like «Build with parameters». Can I have a screenshot?

I remember it being OK in iojs+any-pr+multi.

@orangemocha
Contributor Author

[screenshot]

@ChALkeR
Member

ChALkeR commented Aug 28, 2015

No, it's definitely not there.

@orangemocha
Contributor Author

@ChALkeR
Member

ChALkeR commented Aug 28, 2015

@orangemocha Yes, of course.

@Fishrock123
Contributor

[screenshot, 2015-08-28 10:52 AM]

Weren't we going to have dropdowns or something for names? If not, that's OK.

@Fishrock123
Contributor

@orangemocha also, it still points by default to the joyent org, not to nodejs. :)

@orangemocha
Contributor Author

@Fishrock123 there are dropdowns, in node-accept-pull-request. Maybe you are looking at node-merge-commit?

[screenshot of the node-accept-pull-request parameter form]

@orangemocha
Contributor Author

@Fishrock123 I will update the defaults from joyent/node in the 2-hour window between the move of the joyent/node repo and this workflow change 😄

@orangemocha
Contributor Author

@ChALkeR: do you see "Build with Parameters" for other jobs, like node-test-pull-request?

@Fishrock123
Contributor

@orangemocha oops, you're right. :)

@silverwind
Contributor

The steps required for small corrections like modifying the commit message or squashing commits seem unwieldy. I feel like this is taking quite a bit of flexibility away. How about introducing an alternative, CLI-only workflow like this:

  • Pull commits to local repo
  • Rebase and modify the commits
  • Push the changes to a special CI branch, which kicks off the tests; if the run is green, the changes get committed with all original information intact.

One could have multiple target branches for different subsets of tests to run. Feedback to the user could come from a bot that posts CI results on the issue through the GitHub API.

As a die-hard CLI user, I sure would appreciate something like this.
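
A rough sketch of how that proposed flow might look from the command line (the PR number, the branch names, the ci/ prefix, and the results bot are all hypothetical, not existing infrastructure):

    # grab the PR's commits locally (1234 is a made-up PR number)
    git fetch origin pull/1234/head
    git checkout -b fix-something FETCH_HEAD

    # rebase and modify the commits as usual
    git rebase -i master

    # pushing to a special prefix (here "ci/") would kick off the tests;
    # on green, the commits would land with the original author info intact
    git push origin HEAD:ci/fix-something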

@orangemocha
Contributor Author

@ChALkeR : if you are still having problems with the missing "Build with Parameters", I would suggest trying the following in order:

  1. Try a different browser.
  2. We can create another account in Jenkins for you.

@orangemocha
Contributor Author

@silverwind : I am definitely not against additional tooling to make things even easier. We can work together on this if you want, or we could even let you mess with Jenkins if you are interested.

Having said that, are the steps for modifying the commit message or squashing commits really more complicated under this workflow? To do that, you'd still have to pull the changes locally and make your edits (I simply described a possible procedure in the wiki), and finally push a branch to GitHub (which in the pre-Jenkins way could be the final branch). The only additional steps now are a) filling in the form to kick off node-accept-pull-request, and b) deleting the branch after the fact. The time it takes to fill in the form should be offset by the ease of adding reviewers with the drop-downs. And we could add a checkbox to node-merge-commit to have it delete the temporary source branch after the merge.
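
For comparison, a minimal sketch of the procedure described above, with a made-up PR number and branch name:

    git fetch origin pull/1234/head          # 1234 is a placeholder PR number
    git checkout -b land-1234 FETCH_HEAD
    git rebase -i master                     # squash commits / reword messages
    git push origin land-1234                # temporary source branch for the job

    # ...fill in the node-accept-pull-request form in Jenkins, pointing it at land-1234...

    git push origin --delete land-1234       # clean up once the job has merged it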

@Fishrock123
Contributor

@orangemocha so just to be clear, I can already use this for nodejs/node, right now?

@orangemocha
Contributor Author

@Fishrock123 : technically yes, but if anybody lands another pull request manually in the middle of your run, it will break it.

@Fishrock123
Contributor

technically yes, but if anybody lands another pull request manually in the middle of your run, it will break it.

But nothing fatal that another run can't fix?

@silverwind
Contributor

@orangemocha I'm a bit confused. Do you still have to start off node-accept-pull-request when doing node-merge-commit as described here?

What I'd like to see is an alternative entry point to node-merge-commit through a simple git push. From what I understand, you're already interfacing with the branches on GitHub. Would it be possible to start off node-merge-commit when a new branch, maybe with a special prefix, is created?

(Also, I'm not really motivated to learn Jenkins internals, so I think I'll decline your offer 😉)

@orangemocha
Contributor Author

@Fishrock123 : correct. You can try it if you want.

@Fishrock123
Contributor

@orangemocha https://jenkins-iojs.nodesource.com/job/node-test-commit-arm/380/ I think the Pi1s aren't depending on the other task correctly?

@trevnorris
Contributor

@orangemocha Last Friday I landed two commits. You can find each of those original commits listed in their respective PRs. Later that day @Fishrock123 landed another patch using Jenkins. For some reason the two patches I had landed earlier were left stranded. This led me to assume that Jenkins was responsible.

@orangemocha
Contributor Author

Folks, thank you for your patience with this. The last couple of days definitely didn't go as well as I had hoped. There were a lot of random test failures all over the place, preventing people from landing PRs. There have been so many unexpected ones that it makes me wonder whether those failures could be caused by an underlying bug in node itself.

We have been discussing this today within the build WG, and the sane thing to do seems to be to halt the automated merges temporarily, until we can get things to a stable place. Hence I am re-opening this issue to elicit people's feedback.

Right now we are operating under this plan:

  • Gather all the test failures from the last runs (and more runs), and mark them all as flaky
  • Re-run all the accept jobs that have failed because of flaky tests
  • Only if everything looks stably green/yellow after that, and if we reach that state before 9am PDT, continue with the automated merges. Otherwise, suspend the trial until we can reach a more stable state.

/cc @nodejs/collaborators @nodejs/tsc

orangemocha reopened this Sep 3, 2015
@Fishrock123
Contributor

Gather all the test failures from the last runs (and more runs), and mark them all as flaky

I am worried that we run a high risk of starting to ignore legitimate failures due to current hardware issues.

A while ago (1-2 months?) we were having all green runs. What happened to that?

@targos
Member

targos commented Sep 3, 2015

In order to save some time, would it be possible to automatically cancel a merge job as soon as one of the test jobs fails?
There is no point in waiting for the RPi to compile/test if something went wrong on faster hardware.

@orangemocha
Contributor Author

@targos : yes, no harm in doing that if you don't need complete results from the run.

@orangemocha
Contributor Author

A while ago (1-2 months?) we were having all green runs. What happened to that?

That would be a tough question to answer precisely. The high-level guess is that, due to the fast pace of development, quality is slowly drifting down (no offense to anyone, I hope). It does highlight the importance of having a system like this in place. The other possibility is that some of these failures are due to faulty hardware or bad configuration of the slaves. We do check for that, and while it is true in some cases, the majority of flaky failures still cannot be clearly attributed to issues with the slaves.

@orangemocha
Contributor Author

I am worried that we run a high risk of starting to ignore legitimate failures due to current hardware issues.

We are already making those calls when we say "not related to this change" and try again (or land it manually). Once we make that decision and move forward with the PR, the flakiness is now in master. There is little harm in marking those tests as flaky, and at least we are opening GitHub issues to track the problems and follow up. This might be an argument against nodejs/build#182 though.

@Fishrock123
Contributor

The high-level guess is that, due to the fast pace of development, quality is slowly drifting down (no offense to anyone, I hope).

OK, this is just incorrect.

They were failing BEFORE THAT, and we cleaned that up apart from what was still in our issue tracker. Some of these failures have only manifested in the last few days. I've been keeping an eye on the CI since the beginning of io.js.

the majority of flaky failures still cannot be clearly attributed to issues with the slaves.

Anything with failures relating to reset connections or port binding is a) relatively new, and b) appeared around some CI changes about 2 or so months ago.

Also, the iojs+any+pr+multi job and all of its history are now gone. Wonderful.

We are already making those calls when we say "not related to this change" and try again (or land it manually). Once we make that decision and move forward with the PR, the flakiness is now in master.

We can also parse the output to tell whether a failure actually is flaky.

@Fishrock123
Contributor

They were failing BEFORE THAT, and we cleaned that up apart from what was still in our issue tracker.

Also, for clarification, I am quite sure we have had green runs for over a week, at a couple of intervals, on master. (This was after the point where we were accidentally ignoring Windows failures.)

@orangemocha
Contributor Author

I doubt that the failures are related to the shift from iojs+any+pr+multi to node-test-pull-request. They differ in the arguments they take and the way they fetch things, but internally they are just calling the test runner in the same way.

I have to agree that this started in the last few days. Before opening this issue, we had several test runs with node-accept-pull-request where things were stably yellow. That means there might be a real bug in node. If someone has time to start investigating those flaky tests locally, it would be helpful.

@orangemocha
Contributor Author

OK, after chasing failures for the last few days, I am convinced that we need to suspend this experiment. 😞
Even though we have tried aggressively to mark tests as flaky - this PR is marking 29 new tests as flaky! - we are not even able to land that PR, because new tests are failing every time. Given the current state of things, most attempts to land PRs via CI would fail. So please refrain from using node-accept-pull-request / node-merge-commit, and instead merge changes manually.
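
For reference, marking a test as flaky amounts to an entry in the test status files read by the test runner; a representative sketch (the test name and platform condition are placeholders):

    # e.g. in test/parallel/parallel.status; the test name below is made up
    prefix parallel

    [$system==linux]
    test-some-flaky-candidate : PASS,FLAKY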

This level of flakiness in the tests is unprecedented, and I believe it started very recently (this week). While there are occasional failures due to misconfigured machines, those are easy to spot. The vast majority of failures seem due to reasons outside of the Jenkins/CI realm, and seem to indicate a real problem with the state of the master branch. The recent libuv upgrade (a161594) is high on the list of suspects. What we can do is run some test jobs on recent commits and try to bisect it that way. If you have a chance to investigate some of those failures locally, that would be helpful too.
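
A minimal sketch of bisecting this locally, assuming we can identify a recent commit that was still green in CI (the placeholder below stands for such a commit):

    git bisect start
    git bisect bad master
    git bisect good <last-known-good-commit>   # placeholder: a commit that was still green
    # at each step, build, run the suspect tests, and report the result:
    make run-ci                                # or a targeted subset of the failing tests
    git bisect good                            # or `git bisect bad`, depending on the outcome
    git bisect reset                           # when finished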

We can resume this experiment after we have found the underlying cause of all this instability. I guess that at this point, we'll also take the time to fix a few other high priority issues with the CI itself (like making the build faster), and restart only when it looks really solid.

Sorry for the headaches that this has caused.

/cc @nodejs/collaborators @nodejs/tsc

@silverwind
Contributor

These flaky tests have been plaguing us for a lot longer than a week, and the failures were almost never reproducible locally, so I think we should indeed take a close look at the CI itself. Are other projects using Jenkins also experiencing this? Maybe we're doing something fundamentally wrong?

@orangemocha
Contributor Author

Jenkins is just running a bunch of scripts on a variety of machines. Sure, it has its own quirks, but I don't see how it could be causing some of the issues that we have seen lately.

Maybe to get a better shot at reproducing locally, we should try running make run-ci. The top failing platforms seem to be armv7-wheezy, the RPis, and centos5.
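
A minimal sketch for hammering a suspect test locally, assuming the usual test-runner invocation and a made-up test name:

    ./configure && make -j8                    # build once
    for i in $(seq 1 50); do
      python tools/test.py parallel/test-some-flaky-candidate || echo "failed on run $i"
    done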

@Trott
Member

Trott commented Sep 3, 2015

Wild guess, probably wrong, but hey, that's what the Internet is for, so here you go:

Didn't we move a test or three from sequential to parallel recently? Maybe something in one of those tests slipped by that may be making other tests somehow unstable?

@orangemocha
Contributor Author

That can't be excluded, @Trott. I have already started a few runs to try to bisect commits in CI, so that if there is one offending commit, we can find it.

@orangemocha
Contributor Author

Also, I just checked PRs that landed in v0.12 recently. About 10 PRs landed in the last couple of weeks using node-accept-pull-request from the new Jenkins, using the exact same Jenkins infra, except they don't run on ARM. There are no records of people having to retry runs. This also seems to confirm that the problem is in the current master branch.

@Fishrock123
Contributor

The vast majority of failures seem due to reasons outside of the Jenkins/CI realm, and seem to indicate a real problem with the state of the master branch.

Anything that deals with ECONNRESET or unavailable ports has in the past been attributed to configuration; that stuff was already there before the libuv upgrade. Those were the largest CI issues for a while, IIRC.

Related: I saw two unavailable ports after about 50 test suite runs on my OS X 10.10.5 machine. None on my remote Ubuntu 15 testing box. (It did have a weird fs EACCES failure, but I am quite certain this is config as well.)

@indutny
Member

indutny commented Sep 3, 2015

@orangemocha what do you think about delaying the accept-pull-request project? It looks like it is blocking us from making progress at the moment, and considering the upcoming release, it is very troublesome. We can always return to running this experiment at some later point, when things are more stable.

@orangemocha
Contributor Author

@indutny yes see above #2598 (comment)

@indutny
Member

indutny commented Sep 3, 2015

@orangemocha argh, right! sorry, I didn't get it.

@orangemocha
Contributor Author

Closing for now. I will open a new issue when we have addressed the stability and performance concerns.
