Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aix: add AIX7.1 build machines to CI #2110

Closed
sam-github opened this issue Dec 19, 2019 · 34 comments
Closed

aix: add AIX7.1 build machines to CI #2110

sam-github opened this issue Dec 19, 2019 · 34 comments

Comments

@sam-github
Copy link
Contributor

Plan is a gradual addion of AIX7.1 to test configs, and then an even more gradual switch over of the release builds from AIX6.1 to AIX7.1.

See this for details:

This is a tracking issue, to reference from pull requests, and to comment on as changeover occurs, if problems occur, or to notify interested people what is happening.

@sam-github
Copy link
Contributor Author

I added aix-7.1 to the build labels for https://ci.nodejs.org/job/node-test-commit-aix/

Building:

@richardlau
Copy link
Member

I added aix-7.1 to the build labels for https://ci.nodejs.org/job/node-test-commit-aix/

Building:

* 14.x: https://ci.nodejs.org/job/node-test-commit-aix/27417/ (should build on 7.1)

* 12.x: https://ci.nodejs.org/job/node-test-commit-aix/27418/ (should not build on 7.1 for the moment)

aix71-ppc64 would follow the existing labelling convention.

@sam-github
Copy link
Contributor Author

@richardlau Am I greatly misunderstanding the groovy setup?

[ /aix71/, anyType, lt(14) ],
looks to me like it excludes aix7.1 for lt 14, but its running:

13:07:51 Commit message: "Working on v12.14.1"
...
13:07:56 Triggering node-test-commit-aix » aix61-ppc64
13:07:56 Triggering node-test-commit-aix » aix-7.1

... And I think I see the problem, I selected build label aix-7.1, but the groovy script needs to apply to aix71-ppc64...

I changed the build label selected for the job.

@richardlau
Copy link
Member

@sam-github I think the job isn't using the VersionSelectorScript -- most likely because previously we haven't had to exclude anything on AIX before.

@sam-github
Copy link
Contributor Author

sam-github commented Dec 19, 2019

I'll go check... but I think it worked this last time:

Also, there used to be exclusions, < 6 didn't support AIX:

[ /aix61/, anyType, lt(6) ],

I'll go look at the jobs, try and figure out where the goovy comes in.

@sam-github
Copy link
Contributor Author

Unsurprisingly, you are absolutely correct wrt. ci.nodejs.org.

The script is used for https://ci-release.nodejs.org/job/iojs+release/configure, probably because it is only over there that release exclusions take place.

@sam-github
Copy link
Contributor Author

image

I used https://ci.nodejs.org/job/node-test-commit-smartos/configure as the example, see above, and added that stanza to the aix build. I'll kick off some more builds when the current ones are done to see if it worked.

@richardlau
Copy link
Member

richardlau commented Dec 19, 2019

Looks like tap2junit can't be found.

e,g, https://ci.nodejs.org/job/node-test-commit-aix/27420/nodes=aix71-ppc64/console

17:30:27 [aix71-ppc64] $ /bin/sh -xe /tmp/jenkins5116011305803213197.sh
17:30:27 + mkdir out/junit
17:30:27 mkdir: 0653-358 Cannot create out/junit.
17:30:27 out/junit: Do not specify an existing file.
17:30:27 + true
17:30:27 + tap2junit -i test.tap -o out/junit/test.xml
17:30:27 /tmp/jenkins5116011305803213197.sh[5]: tap2junit:  not found.
17:30:28 POST BUILD TASK : FAILURE

@sam-github
Copy link
Contributor Author

Path problem of some sort:

% ssh  test-ibm-aix71-ppc64_be-1 /opt/freeware/bin/tap2junit -V
usage: tap2junit [-h] --input INPUT --output OUTPUT
tap2junit: error: argument --input/-i is required

Interesting the build still passed.

10.x: https://ci.nodejs.org/job/node-test-commit-aix/27421/nodes=aix71-ppc64/console failed, looks like a timeout.

I'll find where the path is supposed to be set.

@richardlau
Copy link
Member

Interesting the build still passed.

Probably because the tests passed and we don't have the following checked in the post build steps:
image

One could argue that the build passing is accurate (the tests did pass but our post build tap->junit conversion failed).

@sam-github
Copy link
Contributor Author

I modified the post-build step pictured above to include /opt/freeware/bin, that shouldn't affect any of the compilation stages, where PATH is sometimes used to search for compilers.

#/bin/bash -x +e
export PATH=/opt/freeware/bin:$PATH
mkdir out/junit || true
tap2junit -i test.tap -o out/junit/test.xml
tap2junit -i cctest.tap -o out/junit/cctest.xml || true
rm -Rf out/Release/obj.target

@sam-github
Copy link
Contributor Author

sam-github commented Dec 19, 2019

Also, the VersionSelector.groovy depends on a plugin to export node versions as env vars, I enabled it, the "Extract Node.js versions as parameters" section is new:

image

@sam-github
Copy link
Contributor Author

^-- FTR, @richardlau found that.

Now, groovy is respected: https://ci.nodejs.org/job/node-test-commit-aix/27423/console is a 10.x branch build, and its only on AIX6.1.

https://ci.nodejs.org/job/node-test-commit-aix/27424/ is a master/14.x build, it was run on AIX7.1 and 6.1, as it should.

https://ci.nodejs.org/job/node-test-commit-aix/27425/console is a 13.x build, it ran, correctly, only on 6.1 (and then I cancelled it)

So that's good, we can now control versions via the groove script.

@richardlau
Copy link
Member

richardlau commented Dec 19, 2019

^-- FTR, @richardlau found that.

Also FTR the plugin source is https://github.com/nodejs/node-version-jenkins-plugin.

Technically VersionSelectorScript.groovy doesn't explicitly depend on that plugin... it expects NODEJS_MAJOR_VERSION to be set and the plugin does that based on the Git checkout (there are other jobs that use the groovy script without that plugin by setting NODEJS_MAJOR_VERSION as an explicit build parameter).

if (parameters['NODEJS_MAJOR_VERSION'])
nodeMajorVersion = parameters['NODEJS_MAJOR_VERSION'].toString().toInteger()

@sam-github
Copy link
Contributor Author

https://ci.nodejs.org/job/node-test-commit-aix/27424/ passed, and found tap2junit. Still one problem:

15:26:50 + tap2junit -i test.tap -o out/junit/test.xml
15:26:55 + tap2junit -i cctest.tap -o out/junit/cctest.xml
15:26:55 usage: tap2junit [-h] --input INPUT --output OUTPUT
15:26:55 tap2junit: error: argument --input/-i: can't open 'cctest.tap': [Errno 2] No such file or directory: 'cctest.tap'
15:26:55 + true

I haven't had time to look into whether cctest should actually exist.

Kicked off a top-level node-test-commit, to check it builds all the way down to node-test-commit-aix on both 6.1 and 7.1:

@richardlau
Copy link
Member

Ah the cctest thing is kind of expected. In the older release lines gtest (which cctest uses) wrote output in tap format which we convert using tap2junit but later versions of gtest dropped tap support so in later release lines cctest writes directly to xml (with no tap file). The job supports both, hence the || true.

(I did notice in the macOS job the equivalent is

if [ -f "cctest.tap" ]; then
  tap2junit -i cctest.tap -o out/junit/cctest.xml
fi

which is a bit cleaner than || true as it avoids the error being output when the tap file is not present.)

@richardlau
Copy link
Member

(And also nodejs/node#27231 (comment) -- We could backport the updated gtest to 10.x and then when 8.x goes EOL we can simplify the jobs to not expect cctest.tap.)

@richardlau
Copy link
Member

Kicked off a top-level node-test-commit, to check it builds all the way down to node-test-commit-aix on both 6.1 and 7.1:

The failing addon test was also seen on one of the old machines and has an issue raised: nodejs/node#31061

@sam-github
Copy link
Contributor Author

FTR, looking OK in ci.nodejs.org, now working on ci-release.nodejs.org, have a build running there ATM.

@sam-github
Copy link
Contributor Author

sam-github commented Dec 31, 2019

@sam-github
Copy link
Contributor Author

@nodejs/releasers We will convert citgm to use AIX7.1 on all platforms (instead of AIX6.1) after the January 7th releases are complete.

Note that Node.js (by its own support statements) doesn't support AIX6.1, only AIX7.1 and later, so running citgm on supported platforms is preferrable, even though the release was built on an older platform in the expectation of forward binary compatibility.

Using AIX7.1 will allow the removal of most of the AIX specific shell code in the citgm-smoker build steps (though it will still need to call gmake instead of make).

If there are any concerns with the timing of this, please tell us, thanks!

@richardlau
Copy link
Member

@nodejs/releasers We will convert citgm to use AIX7.1 on all platforms (instead of AIX6.1) after the January 7th releases are complete.

Note that Node.js (by its own support statements) doesn't support AIX6.1, only AIX7.1 and later, so running citgm on supported platforms is preferrable, even though the release was built on an older platform in the expectation of forward binary compatibility.

We can use citgm-smoker-nobuild, which runs CITGM against an already built release/nightly, to test forward compatibility.

@AshCripps
Copy link
Member

12.x now uses aix 7.1 for release, final one to add is 10.x

citgm-smoker and citgm-smoker-nobuild now have AIX71 added to them and will run AIX71 on 12.x and above. Running a test run: https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/2215/

@richardlau
Copy link
Member

The https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/2215/nodes=aix71-ppc64/testReport/junit/(root)/citgm/node_report_v2_2_9/ looks new. I'll run a comparison with Node.js 10 (which should still run on both AIX 6.1 and 7.1 still) to check.

@richardlau
Copy link
Member

eek @AshCripps citgm-smoker-nobuild on AIX looks broken in a very scary way:
e.g. https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker-nobuild/718/nodes=aix71-ppc64/consoleFull

11:58:05 ++ uname -s
11:58:05 + '[' AIX = AIX ']'
11:58:05 ++ uname -v
11:58:05 + '[' 7 = '6]'
11:58:05 /tmp/jenkins6520255993216933805.sh: line 13: [: missing `]'
11:58:05 + export 'CITGM_COMMAND=citgm node-report'
11:58:05 + CITGM_COMMAND='citgm node-report'
11:58:05 + pgrep node
11:58:05 /tmp/jenkins6520255993216933805.sh: line 21: pgrep: command not found
11:58:05 + xargs kill -9
11:58:05 + mkdir
11:58:05 Usage: mkdir [-p] [-e] [-m mode] Directory ...
11:58:05 + true
11:58:05 + rm -Rf /admin /aha /audit /bin /dev /esa /etc /export /home /lib /lost+found /lpp /mnt /opt /pconsole /proc /run /sbin /smit.log /smit.script /smit.transaction /tftpboot /tmp /u /unix /usr /var
11:58:05 rm: 0653-609 Cannot remove /admin/lost+found.
11:58:05 The file access permissions do not allow the specified action.
11:58:05 rm: 0653-602 Cannot read /admin/tmp/.workdir.11534490.11993228_1.
11:58:05 rm: 0653-611 Directory /admin/tmp is not empty.
11:58:05 rm: 0653-611 Directory /admin is not empty.
11:58:05 rm: 0653-609 Cannot remove /aha/evProds.list.
11:58:05 Operation not permitted.
...

@richardlau
Copy link
Member

image

There's two issues here:

  • Missing space character after "6"
  • temp is not set on AIX 7.1 due to the way the if statements are nested

I'll fix both.

@richardlau
Copy link
Member

Here's a citgm-smoker-nobuild run for node-report that shows it passing on AIX 6.1 but not AIX 7.1: https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker-nobuild/719/

The relevant error is

not ok 23 - Checking machine name in report header section contains os.hostname()
           ---
           found: >-
             Event: JavaScript API, location: "TriggerReport"
           
             Filename: node-report.20200110.121808.14811156.001.txt
           
             Dump event time:  2020/01/10 12:18:08
           
             Module load time: 2020/01/10 12:18:08
           
             Process ID: 14811156
           
             Command line: /home/iojs/build/workspace/citgm-smoker-nobuild/node/bin/node
             /home/iojs/tmp/citgm_tmp/29d18be0-64d0-45c6-9189-1cffb65eec28/node-report/test/test-api-bad-processobj.js
             child
           
           
             Node.js version: v10.18.1
           
           
             node-report version: 2.2.9 (built against Node.js v10.18.1, 64 bit)
           
           
             OS version: AIX 7.1
           
           
             Machine: p158a06 00CCB1C24C00
           pattern: '/Machine: test-ibm-aix71-ppc64_be-2/'
           at:
             line: 136
             column: 7
             file: test/common.js
             function: validateContent
           stack: |
             Object.validateContent (test/common.js:136:7)
             fs.readFile (test/common.js:51:47)
           source: |
             t.match(nodeReportSection,
           ...

Namely a mismatch between the expected test-ibm-aix71-ppc64_be-2 and actual p158a06. I'll need to remind myself which API calls are used in the test and node-report itself to retrieve the host info.

@sam-github
Copy link
Contributor Author

@richardlau @AshCripps I thought perhaps the machine did not have its hostname set correctly, but it did:

% ssh test-ibm-aix71-ppc64_be-2 hostname
test-ibm-aix71-ppc64_be-2.nodejs.org

@richardlau
Copy link
Member

@sam-github It's a mismatch between the following in node-report:
The report uses the values returned by uname() in the C++ layer while the test uses os.hostname() at Node.js level which ends up being gethostname() at the C++ layer.

I think uname() results should match the uname command in the shell but no idea what config on AIX affects the C gethostname() call and why there is therefore a discrepency between the API's on our AIX 7.1 vs AIX 6.1 hosts.

Looking at the node-report code again it looks like on z/OS we prepend the result of gethostname() to the Machine: line in the report and that would probably be a simple fix if we do the same for AIX. I'll open a node-report PR.

FWIW this isn't an issue with the diagnositics report in core as it consistently uses libuv functions which we can't use in the standalone report module as they only appeared during the 10.x releases (so older 10.x releases don't have the libuv functions and neither do 8.x or 6.x (which are End-of-Life so technically we don't have to worry about but I'd rather not intentionally break them without good reason)).

@richardlau
Copy link
Member

FWIW this is the discrepancy:
test-osuosl-aix61-ppc64_be-3:

# uname -n
power8-nodejs4
# hostname
power8-nodejs4.osuosl.org
#

test-ibm-aix71-ppc64_be-2

$ uname -n
p158a06
$ hostname
test-ibm-aix71-ppc64_be-2.nodejs.org
$

but as I said above I'll PR an update to node-report.

@richardlau
Copy link
Member

nodejs/node-report#141 should fix node-report so the tests pass on the AIX 7.1 nodes.

@sam-github
Copy link
Contributor Author

@richardlau If node-report is passing, can this be closed?

@richardlau
Copy link
Member

@richardlau If node-report is passing, can this be closed?

@sam-github Were we also going to switch 10.x to build releases on AIX 7.1?

[ /aix71/, releaseType, lt(12) ],

@richardlau
Copy link
Member

Closing as done.

The last step of building 10.x on AIX 7.1 was done a while ago via #2184:

// AIX PPC64 ---------------------------------------------
[ /aix71/, anyType, lt(10) ],

We removed the AIX 6.1 machines in #2273.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants