Assertion fails in uv__io_poll() in aix.c after IBM XL C++ Runtime upgrade #3465
Comments
cc @libuv/aix |
Looks like a nodejs bug. libuv does not use (or support) threads in that manner |
Can you run with any sanitizer tools, such as ASAN or TSAN? |
@laurencehook is there a recreate using the AIX binaries that are available on https://nodejs.org/en/download/ ? |
Or possibly a simplified C program using libuv that also recreates? |
The assert at line 297 has been that way for 8 years, so it's nothing new or recent in the code. Documentation for the structure used in the assert, on AIX: https://www.ibm.com/docs/en/aix/7.2?topic=files-pollseth-file |
It was suggested that we try rebuilding Node.js with a more recent GCC compiler version. This recommendation happened to coincide with discovering that a local AIX build environment for Node.js had been "broken" since upgrading the AIX level to 7.2 TL5 SP3, due to the following issue: Upgrading to GCC v8.3.0.6 resolved the build issue due to the struct sigset_t conflict AND appears to have resolved the runtime assertion failure in uv__io_poll(). More testing will be needed once we have upgraded our product build environment for AIX with this later GCC compiler, but it seems promising. |
@laurencehook thanks for the update and good to hear. |
Unfortunately, after upgrading our product AIX build machines to GCC v8.3.0.6, the assert failure in uv__io_poll() still occurs. The local sandbox build from last month seemed ok, but the problem still exists with the output from our product build machine after the GCC upgrade. |
Had a quick look. This pattern is typically an invalid-block eye-catcher. That means, can you run with |
Thanks for the suggestion, Gireesh. I did try with 'MALLOCDEBUG=catch_overflow,validate_ptrs' set, and it appeared to blow up consistently when trying to use the 'events' ptr. Further debugging, printing hex values for some of the variables, appeared to show that the pollfd 'events' array returned by pollset_poll() looks fine: it returns one valid entry, so nfds = 1. The problem is that the 'for' loop counter variable 'i' becomes corrupted, and 'i' is what advances the 'events' ptr to the next array entry. The first time into the loop everything appears fine, but when 'i' is next tested it holds a value that evaluates to a large negative number. That is still less than nfds (= 1), so the loop continues and uses that 'i' as the next offset into 'events'. That is when the assert typically fails.
It was suggested that we disable gcc compiler optimization. With optimization disabled, or at optimization level 1, our application starts ok. |
This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions. |
It sounds like we're dealing with a compiler bug here and not something libuv can fix so I'll go ahead and close this out. Let me know if there is reason to reopen. |
The IBM App Connect Enterprise v11/12 integration product on AIX is built using the IBM XL C++ v16.1.0.3 compiler.
It embeds Node.js v14.x.x (most recently 14.18.1 and 14.18.3) which has been built using the GCC v6 compiler.
Therefore, execution of IBM App Connect Enterprise on AIX requires;
We have found that if the IBM XL C++ v17.1 runtime libraries are present on the system, IBM App Connect Enterprise consistently fails to start and the following assertion error is reported to stderr:
Assertion failed: __EX, file ../deps/uv/src/unix/aix.c, line 297
which maps to the following line in the function uv__io_poll():
assert((unsigned) pc.fd < loop->nwatchers);
If I simply add a printf statement to report these values, we see something like:
pc.fd, nwatchers: 1532713819, 30
If I add many more printf statements to try to debug it, the problem does not occur.
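Since adding many printf calls makes the problem disappear, a single compact diagnostic may be easier to keep in place. The sketch below is one possible shape for it, not code from libuv: `fd_in_range`, `is_repeated_byte` and `describe_fd` are hypothetical helpers. It mirrors the range check from the assert and also prints the value in hex, because the reported value 1532713819 is 0x5B5B5B5B, the same byte repeated four times, which suggests the memory was filled with a pattern rather than holding a real descriptor.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Same range check as the assert: the cast to unsigned makes negative
 * descriptors fail as well as ones that are too large. */
static int fd_in_range(int fd, unsigned int nwatchers) {
  return (unsigned int) fd < nwatchers;
}

/* Hypothetical helper: returns 1 if all four bytes of v are identical,
 * as they would be for a fill pattern such as 0x5B5B5B5B. */
static int is_repeated_byte(uint32_t v) {
  uint32_t b = v & 0xffu;
  return v == b * 0x01010101u;
}

/* Hypothetical helper: formats one line suitable for a debug log. */
static void describe_fd(int fd, unsigned int nwatchers,
                        char *buf, size_t len) {
  assert(buf != NULL && len > 0);
  snprintf(buf, len, "fd=%d (0x%08x) in_range=%d repeated_byte=%d",
           fd, (unsigned int) fd, fd_in_range(fd, nwatchers),
           is_repeated_byte((uint32_t) fd));
}
```

Calling `describe_fd(pc.fd, loop->nwatchers, buf, sizeof buf)` just before the assert and writing `buf` to stderr would be one way to use it, assuming a single extra call does not perturb the timing the way many printf calls do.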
At the time of the assertion, two threads are in the function uv__io_poll():
__assert_c99 [/usr/lib/libpthreads.a(shr_xpg5_64.o)]
uv__io_poll [/var/opt/ace-11.0.0.15/server/lib/libnode.83.a]
uv_run [/var/opt/ace-11.0.0.15/server/lib/libnode.83.a]
_ZN4node16NodeMainInstance3RunEv [/var/opt/ace-11.0.0.15/server/lib/libnode.83.a]
_ZN4node5StartEiPPc [/var/opt/ace-11.0.0.15/server/lib/libnode.83.a]
_ZN13NodejsManager19startAndMonitorNodeEv [bipbroker]
and
uv__io_poll [/var/opt/ace-11.0.0.15/server/lib/libnode.83.a]
uv_run [/var/opt/ace-11.0.0.15/server/lib/libnode.83.a]
_ZZN4node23WorkerThreadsTaskRunner20DelayedTaskScheduler5StartEvENUlPvE_4_FUNES2_ [/var/opt/ace-11.0.0.15/server/lib/libnode.83.a]
_pthread_body [/usr/lib/libpthreads.a(shr_xpg5_64.o)]
The current workaround is to uninstall the IBM XL C++ v17.1 runtime libraries and use the v16.1.0.8 runtime libraries. The issue was raised with the IBM XL C++ compiler team, but they first wanted the owners of aix.c to investigate the assertion.