test: windows ci timeouts related to child_process #1383
Comments
And ya beat me to it. Some analysis on when the errors started on the CI will be coming soon. |
Note: this picks out only the builds that were done against …
The last consistently stable …
Since then, there have been 5 stable …
Here are some builds after …
The commits from 209 into 220 are:
|
Hmm, of note: … did not have a CI run against the PR. |
Hmm, nothing suspicious in that commit log, except maybe the vcbuild thing. Maybe a CI issue? |
CI pre-25da074 / vcbuild improvements: https://jenkins-iojs.nodesource.com/view/iojs/job/iojs+any-pr+multi/475/ |
Maybe related to #1005? A lot of the same timeouts there too. |
Everything in #1005 should have been fixed... and, iirc, was fixed. |
Ok, I think I was wrong. The current timeouts appear to have been introduced in … (probably in the commits below; see the next comment).
|
That being said, due to …
|
https://jenkins-iojs.nodesource.com/view/iojs/job/iojs+any-pr+multi/476/ Narrowed it down to …
|
https://jenkins-iojs.nodesource.com/view/iojs/job/iojs+any-pr+multi/477/ 1.6.3 fails. I suspect the culprit is:
(No CI was run on that PR either) |
All tests seem to timeout on … |
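For context, the timing-out tests generally share the same shape: spawn or fork a child and wait for an event such as 'exit' or 'message', so a hang anywhere in process spawning or IPC surfaces as a harness timeout. A minimal sketch of that shape (a hypothetical example, not an actual test from the suite):

```js
'use strict';
const spawn = require('child_process').spawn;

// Spawn a child node process that exits immediately with code 42.
// If 'exit' never fires (e.g. the child hangs during startup or IPC),
// the test runner eventually kills the test and reports a timeout.
const child = spawn(process.execPath, ['-e', 'process.exit(42)']);
child.on('exit', function(code) {
  console.log('child exited with code', code);
});
```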
Hmm, not sure about that; there are three uses of util.inherits with EventEmitter:
util.inherits(SocketListSend, EventEmitter);
util.inherits(SocketListReceive, EventEmitter);
util.inherits(ChildProcess, EventEmitter); |
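For reference, util.inherits is what gives these constructors their event API; a minimal sketch of the pattern (using a hypothetical MyChildProcess constructor, simplified from what the child_process internals do):

```js
'use strict';
const util = require('util');
const EventEmitter = require('events').EventEmitter;

function MyChildProcess() {
  // Set up EventEmitter's per-instance state.
  EventEmitter.call(this);
}
// Link MyChildProcess.prototype to EventEmitter.prototype, so instances
// get .on()/.emit() and can fire 'exit', 'message', and friends.
util.inherits(MyChildProcess, EventEmitter);

const child = new MyChildProcess();
child.on('exit', function(code) {
  console.log('exited with', code);
});
child.emit('exit', 0);
```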
It's the windows machine itself. Both CI runs are at the same ref. The older one passes, the current one fails. Same commits. https://jenkins-iojs.nodesource.com/view/iojs/job/iojs+any-pr+multi/478/ https://jenkins-iojs.nodesource.com/view/iojs/job/iojs+any-pr+multi/368/ |
Ah, that explains why the commits were misleading. Great detective work 👍 |
Maybe it's just an intermittent failure that happens to occur very often. We need tests from a second machine to verify. |
It has happened every single run since 368 though. |
The problem here is that these are only reproducible within Jenkins. I can't even get them to happen if I run the same code on the same machines from a … |
Jenkins … So it looks like win2008r2 has decided to also begin timing out on similar tests. |
Just wanted to reference #19 here. Seems like we've had similar child process issues prior to the commits triaged above. Since both windows bots have been seeing similar issues lately, we should probably update the issue topic to just "windows" 😢 |
So, I just tried running jenkins straight out of powershell instead of wrapping it through windows services (which in turn use nssm.exe). Results here: https://jenkins-iojs.nodesource.com/job/iojs+any-pr+multi/529/nodes=win2008r2/console. No more timeouts. The test that failed is unrelated to the timeouts (we should fix it too, though :). I'll debug further.
Edit: Here's a run using nssm directly (not proxied through windows services): https://jenkins-iojs.nodesource.com/job/iojs+any-pr+multi/nodes=win2008r2/530/console (hint: full pass)
Edit2: If you look at https://jenkins-iojs.nodesource.com/job/iojs+any-pr+multi/531/ windows2012 and 2008 don't have any timeouts. I restarted them all, avoiding starting/stopping through windows services and only using nssm. I'm far from a windows expert, but there is something going on with how, and by what, these services get launched. |
Give it time. We've had passes under the current config before, but they end up failing over time. I.e., don't celebrate yet. |
Not celebrating at all. It pretty much feels like I have no clue what I'm doing :) |
Seems the recent switch to running through … |
Let's give it a week so we can confirm that we still don't see them. As mentioned above, I managed to get a pretty stable environment by launching through nssm. |
I think this is new on Windows CI: …
|
Sounds like a build machine needs to be cleaned. cc @rvagg |
Currently on the run, but last I checked it had lots of disk space. I think it might be related to something else. Will investigate when I'm back tomorrow. |
I've had a look; there's enough space on all of the windows machines, but I've done a clean-out and restart on them all anyway. Want to try again, @silverwind? |
I recall starting to see this when libuv 1.6.0 landed (when we had all the other strange fs-max errors). |
Haven't seen this recently. |
With ARM timeouts fixed, this looks to be the last persistent test failure on the CI. If we can solve this, the builds should start to get blue, and all that's left are some intermittent failures on OS X and ARM.
https://jenkins-iojs.nodesource.com/job/iojs+any-pr+multi/473/nodes=iojs-win2012r2/
cc: @rvagg @piscisaureus