test-osuosl-aix61-ppc64_be-2 test failures #984

Closed
Trott opened this issue Nov 7, 2017 · 13 comments

Comments

@Trott
Member

Trott commented Nov 7, 2017

I took test-osuosl-aix61-ppc64_be-2 offline because its node-test-commit-aix runs were a sea of red, with failures on parallel/test-repl-envvars. I think this is a build issue and not a test issue. Does anyone have better information on that, though?

/cc @gireeshpunathil @refack @mhdawson

Refs: nodejs/node#16859

@mhdawson
Member

mhdawson commented Nov 7, 2017

@gireeshpunathil once you figure out what is going on, please get either @refack or @gibfahn to help make the change on the disabled machine and re-enable it.

@refack
Contributor

refack commented Nov 7, 2017

That server is bonkers.

  1. It's failing a perfectly good test (parallel/test-repl-envvars passes as-is on test-osuosl-aix61-ppc64_be-1).
  2. It gives compilation errors (https://ci.nodejs.org/job/node-test-commit-aix/10098/nodes=aix61-ppc64/console).
    Please don't bring it online until it's fixed.

@refack
Contributor

refack commented Nov 7, 2017

P.S. I even rebooted, and it didn't help.

@maclover7 maclover7 added the infra label Nov 7, 2017
@mhdawson
Member

mhdawson commented Nov 8, 2017

@refack I've been told that AIX machines aren't expected to be rebooted as often as Linux machines. Our AIX contact recommends that we only do it when absolutely necessary. It's likely not an issue, but I just want to ask that we don't do it too often.

@mhdawson
Member

mhdawson commented Nov 8, 2017

One other thing to consider: if the rest of the tests pass, and it's going to take a while to figure out what the issue is with this specific test, one option is to exclude the test. Even though we have the other machine, we should keep an eye on whether we end up with a big backlog because we are down to a single machine.

@mhdawson
Member

mhdawson commented Nov 8, 2017

@gibfahn I think @gireeshpunathil may already have access to the AIX machines, but if not, can you facilitate that?

@mhdawson
Member

mhdawson commented Nov 8, 2017

Ran the test manually on the machine and it passed:

bash-4.3$  CONFIG_FLAGS="--dest-cpu=ppc64" CC=gcc tools/test.py parallel/test-repl-envvars
[00:00|% 100|+   1|-   0]: Done
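
A single manual pass does not rule out an intermittent failure. As a minimal sketch (the helper name `repeat_until_fail` is hypothetical and not part of the Node.js tooling), a command can be repeated until it first fails:

```shell
# Hypothetical helper: run a command up to N times and stop at the first failure.
# Usage: repeat_until_fail N command [args...]
repeat_until_fail() {
  max_runs=$1
  shift
  run=1
  while [ "$run" -le "$max_runs" ]; do
    # "$@" is the command under test; a non-zero exit status counts as a failure.
    if ! "$@"; then
      echo "failed on run $run of $max_runs"
      return 1
    fi
    run=$((run + 1))
  done
  echo "passed $max_runs runs"
  return 0
}

# Example invocation, reusing the command from the comment above
# (the run count of 20 is an arbitrary assumption):
# repeat_until_fail 20 env CONFIG_FLAGS="--dest-cpu=ppc64" CC=gcc tools/test.py parallel/test-repl-envvars
```

Passing the variables through `env` keeps them scoped to the test command rather than relying on how the shell treats assignments placed in front of a function call.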

@mhdawson
Member

mhdawson commented Nov 8, 2017

Trying in the CI to see if it still fails when run that way: https://ci.nodejs.org/job/node-test-commit-aix/10102/nodes=aix61-ppc64/console

@mhdawson
Member

mhdawson commented Nov 8, 2017

It seems to have failed in CI. Going to kill and restart the Jenkins agent in case the failure is related to how it was started.

@mhdawson
Member

mhdawson commented Nov 8, 2017

Passed in this run: https://ci.nodejs.org/job/node-test-commit-aix/nodes=aix61-ppc64/10103/console.

My only guess is that it is somehow related to how the Jenkins agent is started.

What I used (and have used in the past) was:

sh /etc/rc.d/rc2.d/S20Jenkins started as the root user from a bash shell

And to be more precise, I usually cd into /etc/rc.d/rc2.d and then run sh S20jenkins start

I wonder if there is something subtly different when it gets started automatically on reboot.
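
One way to test that suspicion would be to compare the environment the agent gets at boot with the one it gets from a manual start, since the failing test (going by its name) is about REPL environment variables. The following is only a sketch: the helper name `env_diff` and the dump file names are hypothetical, and how the two dumps are captured is left open.

```shell
# Hypothetical helper: show KEY=VALUE lines that differ between two env dumps.
# Each dump is a plain file of KEY=VALUE lines, e.g. produced by `env > file`.
# Usage: env_diff boot_dump manual_dump
env_diff() {
  boot_dump=$1
  manual_dump=$2
  # comm requires sorted input, so write sorted copies next to the originals.
  sort "$boot_dump" > "$boot_dump.sorted"
  sort "$manual_dump" > "$manual_dump.sorted"
  # -3 hides lines common to both files: lines unique to the first dump print
  # unindented, lines unique to the second dump print with a leading tab.
  comm -3 "$boot_dump.sorted" "$manual_dump.sorted"
  rm -f "$boot_dump.sorted" "$manual_dump.sorted"
}

# Example invocation (file names are assumptions):
# env_diff agent-boot.env agent-manual.env
```

A variable that appears in only one column, or in both with different values, would be a candidate explanation for the test passing under one start method and failing under the other.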

@mhdawson
Member

mhdawson commented Nov 8, 2017

Going to leave it enabled. The only difference between the passing run and the previous failing run was me killing and restarting the agent, so I'm hoping that is what fixed it and that jobs will continue to pass on the machine.

@refack
Contributor

refack commented Nov 8, 2017

https://ci.nodejs.org/computer/test-osuosl-aix61-ppc64_be-2/builds
Looking good 👍

@refack refack added the incident label Nov 8, 2017
@refack
Contributor

refack commented Nov 8, 2017

Closing as this seems resolved.

Summary:

  • Restarting the Jenkins agent can solve ghosts
  • We should avoid rebooting AIX hosts unless absolutely necessary

@refack refack closed this as completed Nov 8, 2017