Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NodeJS + Linux - IPC via webpack thread-loader is incorrectly received #243374

Closed
bradzacher opened this issue Jul 14, 2023 · 11 comments
Closed

NodeJS + Linux - IPC via webpack thread-loader is incorrectly received #243374

bradzacher opened this issue Jul 14, 2023 · 11 comments
Labels
0.kind: bug Something is broken

Comments

@bradzacher
Copy link

Describe the bug

This bug is a bit complicated. A few weeks ago I upgraded my company's codebase to NodeJS 18 (bumping to this commit 9c70578). Everything went fine during the upgrade - except we've recently found a very weird crash in a very specific case.

We bundle our large monorepo with webpack and thread-loader. After upgrading to the above commit - the build started to crash out with the following error:

<path>/node_modules/thread-loader/dist/readBuffer.js:40
        callback(null, Buffer.concat(buffers, length));
        ^
SyntaxError: Unexpected token  in JSON at position 251504

We tried reproducing this bug locally on our laptops and we couldn't.
When we SSH'd into one of the linux CI boxes and ran the same commands - we could reproduce it.
This told us it was something OS-specific.

We symptom the problem deeper and in a nutshell it is as follows:
During a build webpack (via thread-loader) spawns two child workers via node's child_process. It uses node's builtin message passing streams to communicate via JSON blobs between the processes. Because communication is buffered, thread-loader precedes the blob with a 32-bit int dictating the length of the following message - which allows the receiver to know when to keep reading and when to stop.
When serialised as bytes of a utf-8 string - a small enough BE 32-bit integer is lead with 0 bytes (eg 53148 can be hex encoded as 00 00 cf 9c).
So we figured out that what's happening is that the first message is being interrupted by the next message - so the payload looks like: <int>{..partial json..<int>{..json..} - or put another way: <int>{..partial json..\0\0\xcf\x9c{..json..}.
Null bytes are disallowed by JSON parsers - leading the JSON parser to crash when it reaches the start of the 2nd integer.

Originally we thought it was an out-of-date package or something weird with our webpack customisations - but after updating packages and cutting back as much as we can - the issue still persisted. We investigated thread-loader and found that whilst it is complicated, it doesn't use any native npm packages - just node's APIs.

Next we thought that it might be a breakage specifically in Node18 - so we updated the nix config to Node16 and it still crashed in exactly the same way. I went to try Node14 on the commit but Node14 isn't a cached build. Because I didn't want to sit through the ~3h local build I installed v14 via nvm and ran the build outside nix and it worked. For sanity check I then went to bisect to narrow down which major version broke things and to my surprise the build passed successfully on all node versions outside of nix - I tried 14, 15, 16, 17 and 18 (installed from nvm) and they all worked.

This narrowed the problem space down to it specifically being the nix build of NodeJS on Linux. Next we thought that the nix commit might be a broken build - so we tried updating to a newer commit (fc4810b) and it crashed in the same way.

Which further narrowed it down - it must be a dependency of NodeJS within nix that's broken. To validate that it wasn't our overlay that broke something I isolated NodeJS and it was still broken.

I'm currently at a bit of a loss as to what the problem might be. I'm going to try a few old commits to see if I can get a bisect of sorts - but it's going to take a while given how long NodeJS takes to build.

I was hoping that with my description one of the maintainers or experts might have an idea that points towards a fix.

Steps To Reproduce

Sadly I don't have an exact reproduction for you - I haven't been able to properly isolate it. AFAICT it needs quite a large codebase (>50k files) running through a webpack build.

I'll see if I can create one whilst I wait for builds to complete.

Expected behavior

NodeJS + webpack + thread loader works on linux

Screenshots

N/A

Additional context

N/A

Notify maintainers

@marsam @cko @gilligan @cillianderoiste

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 5.19.0-1028-aws, Ubuntu, 22.04.2 LTS (Jammy Jellyfish), nobuild`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.16.1`
 - channels(root): `"nixpkgs-23.11pre494885.683f2f5ba2e"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixpkgs`
@bradzacher bradzacher added the 0.kind: bug Something is broken label Jul 14, 2023
@bradzacher
Copy link
Author

bradzacher commented Jul 14, 2023

Note: just tried commit a05f2b1 (libuv v1.44.2) and it works fine.

@bradzacher
Copy link
Author

bradzacher commented Jul 14, 2023

Note: just tried commit 9d2e951 (libuv v1.45.0) and it crashed.

@bradzacher
Copy link
Author

bradzacher commented Jul 15, 2023

Note: just tried commit f53e153 (the parent of 9d2e951) and it worked successfully - which suggests that libuv 1.45.0 is broken in nix.

cc libuv maintainer @cstrahan

libuv 1.45.0 changelog

https://github.com/libuv/libuv/blob/v1.45.0/ChangeLog#L1-L291

2023.05.19, Version 1.45.0 (Stable)

Changes since version 1.44.2:

  • win: remove stdint-msvc2008.h (Ben Noordhuis)
  • android: remove pthread-fixes.c (Ben Noordhuis)
  • build: enable MSVC_RUNTIME_LIBRARY setting (自发对称破缺)
  • unix: switch to c11 atomics (Ben Noordhuis)
  • unix: don't accept() connections in a loop (Ben Noordhuis)
  • win: fix off-by-1 buffer overrun in uv_exepath() (Ben Noordhuis)
  • build: switch ci from macos-10.15 to macos-11 (Ben Noordhuis)
  • win: fix thread race in uv_cwd() and uv_chdir() (Ben Noordhuis)
  • unix,win: remove UV_HANDLE_SHUTTING flag (Santiago Gimeno)
  • win: support Windows 11 in uv_os_uname() (Luan Devecchi)
  • unix: fix uv_getrusage() ru_maxrss reporting (Ben Noordhuis)
  • doc: add note about offset -1 in uv_fs_read/write (Steven Schveighoffer)
  • test: fix musl libc.a dlerror() test expectation (Ben Noordhuis)
  • kqueue: DRY file descriptor deletion logic (Ben Noordhuis)
  • linux: teach uv_get_constrained_memory() cgroupsv2 (Ben Noordhuis)
  • build: upgrade qemu-user-static package (Ben Noordhuis)
  • linux: move epoll.c back into linux-core.c (Ben Noordhuis)
  • unix: remove pre-macos 10.8 compatibility hack (Ben Noordhuis)
  • unix,win: fix memory leak in uv_fs_scandir() (Ben Noordhuis)
  • build: restore qemu download logic (Ben Noordhuis)
  • win: fix uv__pipe_accept memory leak (number201724)
  • doc: update LINKS.md (Daniel)
  • unix: simplify atomic op in uv_tty_reset_mode() (Ben Noordhuis)
  • build: add LIBUV_BUILD_SHARED cmake option (Christian Clason)
  • linux: remove unused or obsolete syscall wrappers (Ben Noordhuis)
  • linux: merge files back into single file (Ben Noordhuis)
  • stream: process more than one write req per loop tick (ywave620)
  • unix,win: give thread pool threads an 8 MB stack (Ben Noordhuis)
  • build: add MemorySanitizer (MSAN) support (Ben Noordhuis)
  • doc: add uv_poll_cb status==UV_EBADF note (jensbjorgensen)
  • build: support AddressSanitizer on MSVC (Jameson Nash)
  • win,pipe: improve method of obtaining pid for ipc (number201724)
  • thread: add support for affinity (daomingq)
  • include: map ENODATA error code (Ben Noordhuis)
  • build: remove bashism from autogen.sh (Santiago Gimeno)
  • win,tcp,udp: remove "active streams" optimization (Saúl Ibarra Corretgé)
  • win: drop code checking for Windows XP / Server 2k3 (Saúl Ibarra Corretgé)
  • unix,win: fix 'sprintf' is deprecated warning (twosee)
  • doc: mention close_cb can be NULL (Qix)
  • win: optimize udp receive performance (ywave620)
  • win: fix an incompatible types warning (twosee)
  • doc: document 0 return value for free/total memory (Ben Noordhuis)
  • darwin: use hw.cpufrequency again for frequency info (Jameson Nash)
  • win,test: change format of TEST_PIPENAME's (Santiago Gimeno)
  • win,pipe: fixes in uv_pipe_connect() (Santiago Gimeno)
  • misc: fix return value of memory functions (theanarkh)
  • src: add new metrics APIs (Trevor Norris)
  • thread: add uv_thread_getcpu() (daomingq)
  • build: don't use ifaddrs.h on solaris 10 (Edward Humes)
  • unix,win: add uv_get_available_memory() (Tim Besard)
  • test: fix -Wunused-but-set-variable warnings (Ben Noordhuis)
  • doc: bump min supported linux and freebsd versions (Ben Noordhuis)
  • Add Socket Runtime to the LINKS.md (Sergey Rubanov)
  • unix: drop kfreebsd support (Ben Noordhuis)
  • win: fix fstat for pipes and character files (Stefan Stojanovic)
  • win: fix -Wunused-variable warning (Ben Noordhuis)
  • win: fix -Wunused-function warning (Ben Noordhuis)
  • build: drop qemu-alpha from ci matrix (Ben Noordhuis)
  • win: move child_stdio_buffer out of uv_process_t (Santiago Gimeno)
  • test: fix some unreachable code warnings (Santiago Gimeno)
  • linux: simplify uv_uptime() (Ben Noordhuis)
  • test: unflake fs_event_watch_dir test (Ben Noordhuis)
  • darwin: remove unused fsevents symbol lookups (Ben Noordhuis)
  • build: add define guard around UV_EXTERN (Zvicii)
  • build: add UndefinedBehaviorSanitizer support (Ben Noordhuis)
  • build: enable platform_output test on qemu (Ben Noordhuis)
  • linux: handle cpu hotplugging in uv_cpu_info() (Ben Noordhuis)
  • build: remove unnecessary policy setting (dundargoc)
  • docs: add vcpkg instruction step (Jack·Boos·Yu)
  • win,fs: fix readlink errno for a non-symlink file (Darshan Sen)
  • misc: extend getpw to take uid as an argument (Jameson Nash)
  • unix,win: use static_assert when available (Ben Noordhuis)
  • docs: delete code Makefile (Jameson Nash)
  • docs: add CI for docs PRs (Jameson Nash)
  • docs: update Sphinx version on RTD (Jameson Nash)
  • doc: clean up license file (Ben Noordhuis)
  • test: fix some warnings when compiling tests (panran)
  • build,win: add mingw-w64 CI configuration (Jameson Nash)
  • build: add CI for distcheck (Jameson Nash)
  • unix: remove busy loop from uv_async_send (Jameson Nash)
  • doc: document uv_fs_cb type (Tamás Bálint Misius)
  • build: Improve build by cmake for Cygwin (erw7)
  • build: add libuv:: namespace to libuvConfig.cmake (AJ Heller)
  • test: fix ThreadSanitizer thread leak warning (Ben Noordhuis)
  • test: fix ThreadSanitizer data race warning (Ben Noordhuis)
  • test: fix ThreadSanitizer data race warning (Ben Noordhuis)
  • test: fix ThreadSanitizer data race warning (Ben Noordhuis)
  • test: cond-skip fork_threadpool_queue_work_simple (Ben Noordhuis)
  • test: cond-skip signal_multiple_loops (Ben Noordhuis)
  • test: cond-skip tcp_writealot (Ben Noordhuis)
  • build: promote tsan ci to must-pass (Ben Noordhuis)
  • build: add CI for OpenBSD and FreeBSD (James McCoy)
  • build,test: fix distcheck errors (Jameson Nash)
  • test: remove bad tty window size assumption (Ben Noordhuis)
  • darwin,process: feed kevent the signal to reap children (Jameson Nash)
  • unix: abort on clock_gettime() error (Ben Noordhuis)
  • test: remove timing-sensitive check (Ben Noordhuis)
  • unix: DRY and fix tcp bind error path (Jameson Nash)
  • macos: fix fsevents thread race conditions (Ben Noordhuis)
  • win: fix leak in uv_chdir (Trevor Norris)
  • test: make valgrind happy (Trevor Norris)
  • barrier: wait for prior out before next in (Jameson Nash)
  • test: fix visual studio 2015 build error (Ben Noordhuis)
  • linux: fix ceph copy error truncating readonly files (Bruno Passeri)
  • test: silence more valgrind warnings (Trevor Norris)
  • doc: add entries to LINKS.md (Trevor Norris)
  • win,unix: change execution order of timers (Trevor Norris)
  • doc: add trevnorris to maintainers (Trevor Norris)
  • linux: remove epoll_pwait() emulation code path (Ben Noordhuis)
  • linux: replace unsafe macro with inline function (Ben Noordhuis)
  • linux: remove arm oabi support (Ben Noordhuis)
  • unix,sunos: SO_REUSEPORT not valid on all sockets (Stacey Marshall)
  • doc: consistent single backquote in misc.rst (Jason Zhang)
  • src: switch to use C11 atomics where available (Trevor Norris)
  • test: don't use static buffer for formatting (Ben Noordhuis)
  • linux: introduce io_uring support (Ben Noordhuis)
  • linux: fix academic valgrind warning (Ben Noordhuis)
  • test: disable signal test under ASan and MSan (Ben Noordhuis)
  • linux: add IORING_OP_OPENAT support (Ben Noordhuis)
  • linux: add IORING_OP_CLOSE support (Ben Noordhuis)
  • linux: remove bug workaround for obsolete kernels (Ben Noordhuis)
  • doc: update active maintainers list (Ben Noordhuis)
  • test: add ASSERT_OK (Trevor Norris)
  • src: fix events/events_waiting metrics counter (Trevor Norris)
  • unix,win: add uv_clock_gettime() (Ben Noordhuis)
  • build: remove freebsd and openbsd buildbots (Ben Noordhuis)
  • win: fix race condition in uv__init_console() (sivadeilra)
  • linux: fix logic bug in sqe ring space check (Ben Noordhuis)
  • linux: use io_uring to batch epoll_ctl calls (Ben Noordhuis)
  • macos: update minimum supported version (Santiago Gimeno)
  • docs: fix some typos (cui fliter)
  • unix: use memcpy() instead of type punning (Ben Noordhuis)
  • test: add additional assert (Mohammed Keyvanzadeh)
  • build: export compile_commands.json (Lewis Russell)
  • win,process: write minidumps when sending SIGQUIT (Elliot Saba)
  • unix: constrained_memory should return UINT64_MAX (Tim Besard)
  • unix: handle CQ overflow in iou ring (Santiago Gimeno)
  • unix: remove clang compiler warning pragmas (Ben Noordhuis)
  • win: fix mingw build (gengjiawen)
  • test: fix -Wbool-compare compiler warning (Ben Noordhuis)
  • win: define MiniDumpWithAvxXStateContext always (Santiago Gimeno)
  • freebsd: hard-code UV_ENODATA definition (Santiago Gimeno)
  • linux: work around EOWNERDEAD io_uring kernel bug (Ben Noordhuis)
  • linux: fix WRITEV with lots of bufs using io_uring (Santiago Gimeno)

@bradzacher
Copy link
Author

bradzacher commented Jul 15, 2023

Investigating my local install of node 18:

$ node -e 'console.log(process.version, process.versions.uv)'
v18.16.1 1.44.2

So that is probably why the non-nix version works - because it's using 1.44.2; whereas the nix version for 18.16.1 uses libuv 1.45.0.

It was only a minor version bump - so nothing should have broken?

@bradzacher
Copy link
Author

bradzacher commented Jul 17, 2023

Investigating further - it appears that nodejs builds each nodejs versions with specific versions of libuv. eg https://github.com/nodejs/node/blob/v18.16.1/deps/uv/include/uv/version.h#L33-L35
Which would be my local version uses 1.42.2.

It looks like Node v20.3.0 it is the first version to be built against libuv v1.45.0
https://github.com/nodejs/node/blob/v20.3.0/deps/uv/include/uv/version.h#L33-L35

@bradzacher
Copy link
Author

bradzacher commented Jul 17, 2023

I tried Node v20.3.1 (libuv 1.45.0) from nvm and it worked fine.
I tried Node v20.3.1 (libuv 1.45.0) from nix and it crashed.

I also tried 87478fd (node v20.4.0 w/ libuv 1.46.0) and it crashed.
I also tried Node v20.4.0 from nvm (which comes with libuv 1.46.0) and it worked fine.

So I'm back to it being a nix build issue rather than a libuv problem.

@marsam
Copy link
Contributor

marsam commented Jul 17, 2023

Thanks a lot for reporting and investigating this.
Would you mind trying the following?

--- a/pkgs/development/web/nodejs/v18.nix
+++ b/pkgs/development/web/nodejs/v18.nix
@@ -1,8 +1,8 @@
-{ callPackage, openssl, python3, enableNpm ? true }:
+{ callPackage, quictls, python3, enableNpm ? true }:
 
 let
   buildNodejs = callPackage ./nodejs.nix {
-    inherit openssl;
+    openssl = quictls;
     python = python3;
   }; 

@bradzacher
Copy link
Author

bradzacher commented Jul 17, 2023

Thanks for the suggestion @marsam!
I included your patch and rebuilt (this was against 9c70578)

$ node
Welcome to Node.js v18.16.1.
Type ".help" for more information.
> process.versions.openssl
'3.0.9+quic'
> process.versions.uv
'1.45.0'

However it crashes in exactly the same way as before.

@marsam
Copy link
Contributor

marsam commented Oct 11, 2023

Apparently it's due io_uring support in libuv nodejs/node#49911.
Just to confirm, can you try setting UV_USE_IO_URING=0?

@bradzacher
Copy link
Author

ah that would make sense - and 18.16.1 breaks with nix because nix was building it with libuv v1.45.0 instead of 1.44.2 like the original binary.

so I guess than node v18.18.1 which includes the fix should work fine as well.

I can't test this any more because we actually went the route of rewriting thread-loader to use a different threading model to fix this 🤣

@marsam
Copy link
Contributor

marsam commented Jan 23, 2024

This should be fixed by #265974

@marsam marsam closed this as completed Jan 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.kind: bug Something is broken
Projects
None yet
Development

No branches or pull requests

2 participants