AIX: pseudo-tty/no_dropped_stdio failures on new machine #7973
Comments
@mhdawson Anything different about this new machine? Didn't it pass on the old ones? |
Related to JOBS= ? |
It did pass on the old one. The previous machine was a vm from siteox with different levels of the OS as opposed to the new ones that are from osusol. I have asked @gireeshpunathil to investigate. |
The test case wants to make sure that every byte written to the standard stream is drained before the process exits: basically 1025 characters with 21 newlines. The test case seems to have been devised to validate an OS X behavior on the 1025th character. On AIX, however, the python parent at times receives no output at all while it expects 21 lines. The stdout of the child is piped into a tempfile, which is later read by the python parent. I tried to isolate the issue in a number of ways without success (though many of these are variants of one or two discrete scenarios):
So it looks like either the python parent in the test framework does something peculiar, or the machine I am running on is different from the new CI one. @mhdawson how easy is it to get hold of the failing CI system for debugging? |
Created this request to get Gireesh access nodejs/build#463 |
@gireeshpunathil None of those will work correctly. Your examples are all pipes. You need to use a faked TTY. Either via pty.js or via the python test runner in To be clear: if AIX actually fails this then there is a big problem on AIX even if the test was designed for OS X. |
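For readers following along, the difference between a pipe and a faked TTY can be sketched in Python as below. This is a minimal illustration using the standard `pty` module, not the code of the node test runner; the helper name `run_under_pty` is hypothetical, and the `EIO` handling reflects Linux behavior, which may well differ on AIX.

```python
import errno
import os
import pty
import subprocess
import sys


def run_under_pty(argv):
    """Run argv with stdout/stderr attached to a pseudo-terminal.

    Returns (output_bytes, exit_code). Hypothetical helper for illustration.
    """
    master_fd, slave_fd = pty.openpty()
    try:
        child = subprocess.Popen(argv, stdout=slave_fd, stderr=subprocess.STDOUT)
    finally:
        # The parent only needs the master side; the child keeps its own copy
        # of the slave side.
        os.close(slave_fd)

    chunks = []
    while True:
        try:
            data = os.read(master_fd, 4096)
        except OSError as err:
            # On Linux, reading the master after the slave side is fully
            # closed fails with EIO; treat that as end of output.
            if err.errno == errno.EIO:
                break
            raise
        if not data:
            break
        chunks.append(data)

    os.close(master_fd)
    child.wait()
    return b"".join(chunks), child.returncode
```

Because the child sees a terminal rather than a pipe, line endings may come back as `\r\n`, so a caller would typically count lines rather than compare bytes exactly.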
Thanks @Fishrock123 - yes, I agree that all the above cases are variants of pipes. I actually debugged the python code and validated that the spawning code does not use faketty in this specific scenario; rather, it creates a temp file, passes its fd to the child node's streams, and then, upon child termination, closes the fd and reads from the file - hence my attempt to mimic the behavior through case 6 above. However, this is really useful information. I will study the faketty and debug further, taking the lead from that. Thanks for the suggestion. |
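The temp-file flow described in this comment can be sketched roughly as follows. This is an assumption-laden illustration of the described behavior (temp file as the child's stdout, read back after the child terminates), not the runner's actual code; `run_with_tempfile` is a hypothetical name.

```python
import subprocess
import sys
import tempfile


def run_with_tempfile(argv):
    """Run argv with stdout redirected to a temporary file.

    Returns (output_bytes, exit_code). Hypothetical helper for illustration.
    """
    with tempfile.TemporaryFile() as out:
        # The child writes straight into the file descriptor of the temp file.
        exit_code = subprocess.call(argv, stdout=out)
        # The child has exited; rewind and read back whatever it wrote.
        out.seek(0)
        return out.read(), exit_code
```

With a regular file as the target there is no reader that has to keep up with the child, which may be why this variant did not reproduce the failure.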
@gireeshpunathil it is possible that the python fake tty does not work on AIX -- in which case these tests should be disabled there. |
@gireeshpunathil I just want to make sure you are investigating both failures parallel/test-stdio-closed parallel/test-stdio-closed - fails in every run |
Have provided @gireeshpunathil with access to community machine |
We now have adequate AIX hardware to add AIX to the regular regression runs. However, a couple of tests are failing even though AIX was green at one point. This PR marks those tests as flaky so that we can add AIX and spot any new regressions without making the builds red. The tests are being worked under the following PRs:
- being worked under nodejs#7564: test-async-wrap-post-did-throw, test-async-wrap-throw-from-callback, test-crypto-random
- being worked under nodejs#7973: test-stdio-closed
- covered by nodejs#3796: test-debug-signal-cluster
We now have adequate AIX hardware to add AIX to the regular regression runs. However, a couple of tests are failing even though AIX was green at one point. This PR marks those tests as flaky so that we can add AIX and spot any new regressions without making the builds red. The tests are being worked under the following PRs:
- being worked under #7564: test-async-wrap-post-did-throw, test-async-wrap-throw-from-callback, test-crypto-random
- being worked under #7973: test-stdio-closed
- covered by #3796: test-debug-signal-cluster
PR-URL: #8065 Reviewed-By: joaocgreis - João Reis <reis@janeasystems.com> Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net>
@gireeshpunathil have you had a chance to look at this? |
@mhdawson - yes, but progress is slow. I will post any major findings here. |
Just a status update: no major insights so far. The issue reproduces consistently; I am trying to localize it into a python or python + C test case, to eliminate node's role in it. |
Does this investigation apply to one or both of:
? |
This is only for 2) pseudo-tty/no_dropped_stdio |
OK, I came up with a small piece of python code which works well on Linux but hangs on AIX. It is detached from node, which means that the problem is outside the scope of node. While the code returns immediately on Linux, on AIX the use of fake_out for the stdout causes the python thread to hang in the close() call.
I think the pseudo-tty tests should be omitted for AIX until the python behavioral difference is understood and/or addressed, as they don't reveal any issues in node and instead block the test logic due to the python bug. |
parallel/test-stdio-closed failure - test case performs the following:
In Linux, the child executes the stream writes and then exits with the specified error code (42). And hence the failure: Investigating further. |
Should we be marking |
Pulled out stdio failure from description in initial text for report since it is not being worked in this issue and does not seem related. #8375 |
Just talking with @gireeshpunathil , I'll submit a PR to mark as flaky |
PR to mark as flaky https://github.com/nodejs/node/pull/8385/files |
Make sure to un-exclude this one as well once resolved: #9765 |
pseudo-tty/no_interleaved_stdio has hung a few times in the last couple of days on AIX. We believe it is not a Node.js issue but an issue with python on AIX. It's being investigated under: nodejs#7973. Excluding this additional test until we can resolve the python issue.
pseudo-tty/no_interleaved_stdio has hung a few times in the last couple of days on AIX. We believe it is not a Node.js issue but an issue with python on AIX. It's being investigated under: #7973. Excluding this additional test until we can resolve the python issue. Fixes #9765 PR-URL: #9772 Reviewed-By: Sam Roberts <sam@strongloop.com> Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
The tests in pseudo-tty take the form of a child node process writing some data and exiting, while the parent python process consumes the data through pseudo-tty implementations and validates the result. There is no synchronization between child and parent; this works on most platforms, but not on AIX, where under race conditions the child exits even before the parent can set up the read loop. The race condition would ideally be fixed by sending ACK messages back and forth, but that involves massive changes and has side effects. The workaround is to address it on AIX alone, by adding a reasonable delay. PR-URL: nodejs#11715 Fixes: nodejs#7973 Fixes: nodejs#9765 Fixes: nodejs#11541 Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
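The shape of such a platform-specific delay can be sketched as below. This is only an illustration of the idea in Python, not the actual patch from the PR: the one-second value and the names `AIX_EXIT_DELAY_SECONDS`, `exit_delay_for` and `write_then_exit` are assumptions made for this sketch.

```python
import sys
import time

# Hypothetical value; the commit message only says "a reasonable delay".
AIX_EXIT_DELAY_SECONDS = 1.0


def exit_delay_for(platform):
    """Return how long a child should linger before exiting on this platform."""
    # sys.platform is "aix" on recent Pythons and "aix6"/"aix7" on older ones.
    return AIX_EXIT_DELAY_SECONDS if platform.startswith("aix") else 0.0


def write_then_exit(payload, platform=sys.platform, stream=sys.stdout):
    """Write the payload, flush it, then wait so the parent can start reading."""
    stream.write(payload)
    stream.flush()
    # Without an ACK protocol the child cannot know whether the parent is
    # reading yet, so it simply waits a fixed time on the affected platform.
    time.sleep(exit_delay_for(platform))
```

A fixed delay narrows the race window rather than closing it, which is the trade-off the commit message accepts in exchange for avoiding a larger change.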
Failures on new AIX machine: https://ci.nodejs.org/job/node-test-commit-aix/nodes=aix61-ppc64/283/console