Currently no way to use zlib optimally in Node.js #47709
Comments
You can always change the size of the libuv thread pool (by setting the `UV_THREADPOOL_SIZE` environment variable).
I did consider this, and that may be an acceptable solution for some simple situations where you have total control over where your code is being run, but in my case, the code is going to be run within many different Node programs across many teams in a large organization. It's unreasonable for me to expect that I can force every person in the organization to set this environment variable, especially when the programs in question are hosted across a wide variety of infrastructure. In other words, it should be possible to create an npm library that uses zlib optimally without requiring every consumer to configure their environment.
It's already possible to do what you're proposing by compiling zlib to wasm or using an existing JS compression library. You're welcome to open a pull request but it's a niche enough feature that I won't guarantee it's going to get merged.
Sure, but wasm and JS are slower than using native zlib. They're not optimal. And it seems silly to reach for a third-party library for zlib when Node.js is supposed to come with it out-of-the-box. I think if Node.js had started out with worker threads, there's a good chance the zlib module would never have been written to use the libuv thread pool. Even ignoring the issue with thread pool size, the libuv thread pool isn't supposed to be used for CPU-bound tasks, so the current zlib implementation seems like an anti-pattern to me. With the current API, if I want to perform CPU-bound tasks like compression, they will actually block I/O-bound tasks such as reading files, because all the pool threads will be busy doing actual work (instead of just sitting around waiting for I/O, which is what they're designed for). The idea that the current API is an anti-pattern is reinforced by the Node.js docs themselves, which warn about memory fragmentation due to threadpool usage: https://nodejs.org/api/zlib.html#threadpool-usage-and-performance-considerations. Compression is inherently a synchronous job, so it makes sense (to me) to optimize the API around using it synchronously. I'm happy to create a PR.
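A minimal sketch of that starvation scenario (the payload size and job count are arbitrary choices for illustration): eight async gzip jobs occupy the default four pool threads, and an unrelated `fs.readFile`, which needs a thread from the same pool, queues behind them.

```js
const crypto = require('crypto');
const fs = require('fs');
const zlib = require('zlib');

// CPU-bound: each gzip job occupies one libuv pool thread until it finishes.
const payload = crypto.randomBytes(32 * 1024 * 1024); // incompressible 32 MiB

for (let i = 0; i < 8; i++) {
  zlib.gzip(payload, () => {});
}

// I/O-bound: fs.readFile also needs a pool thread, so it sits in the
// queue behind the compression jobs instead of completing quickly.
const start = Date.now();
fs.readFile(__filename, () => {
  console.log(`readFile finished after ${Date.now() - start} ms`);
});
```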
You don't know that until you measure it. Reviewers will ask you to quantify performance gains compared to the status quo.
That seems like an argument for smarter thread pool scheduling to me, not a reason to work around it by adding sync compression.
It's a lot more subtle than that, see #8871 (comment). The docs are not intended to be an exposé of the intricacies of glibc, so the warning was reduced to something that lacks nuance, but saying it's an anti-pattern is factually wrong.
Hello, I am the author of one of the mentioned libraries and I just want to drop my two cents on this topic. It appears to me that the current zlib implementation is geared towards compressing files and other sources of large quantities of data. However, one very common use case that is currently suboptimal to implement is transport compression for network streams, tcp and websocket. This type of compression requires keeping a shared zlib context alive for as long as the connection lives, and most of its time will be spent processing small chunks of data, which can potentially be done more efficiently in a synchronous fashion, which also avoids the issues discussed in #8871 as well as in ws#1369. As an example, the Discord API makes use of this type of compression in its websocket connections, and most of its Node.js client libraries ended up relying on third-party solutions such as https://github.com/abalabahaha/zlib-sync or one of the mentioned libraries to provide this functionality instead of using the built-in zlib. Even though my use case is a bit different from Joshua's, I would personally love to see a synchronous shared-context zlib class implemented some day, even if just for the sake of having more options available to developers, just like many other Node APIs have both sync and async variants.
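For illustration, the shared-context pattern with today's stream API might look like the sketch below (the socket wiring is omitted and the frame handler is a placeholder): one long-lived `Inflate` per connection, with every small frame still making an asynchronous round trip through the thread pool.

```js
const zlib = require('zlib');

// One zlib context kept alive for the whole connection, so dictionary
// state carries across frames, as transport compression requires.
const inflate = zlib.createInflate();
inflate.on('data', (plain) => {
  console.log('decompressed frame:', plain.length, 'bytes'); // placeholder handler
});

// Called for each small compressed frame arriving on the socket.
// write() is asynchronous: the result is never available on the same tick.
function onCompressedFrame(chunk) {
  inflate.write(chunk);
}
```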
@JoshuaWise @timotejroiko you may want to coordinate amongst yourselves. This is basically at the "pull request welcome and we'll take it from there" stage. No promises but no outright "no" either. |
Can this be set from within a running Node application? I assume the answer is "no"? Edit: If this is to be believed, it should be possible to set it before any async work is done. A real API for it would still be nice, I guess.
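A sketch of that approach, for concreteness (sizing the pool to the CPU count is an arbitrary choice here):

```js
// Must run at the very top of the entry point, before anything submits
// work to the pool (fs, dns, crypto, zlib, ...), because libuv sizes
// the pool once, lazily, on first use. As the reply below notes, this
// is not reliable.
process.env.UV_THREADPOOL_SIZE = String(require('os').cpus().length);
```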
Not reliably, no. I don't think there's anything left to discuss, we're basically waiting for a pull request to materialize, so I'm going to go ahead and close the issue. |
What is the problem this feature will solve?
Zlib compression is inherently a CPU-bound task. Therefore, it's only logical that when you need to compress lots of things, you'd want to utilize all your CPUs. However, Node.js's built-in `zlib` module does compression within `libuv`'s thread pool, which is designed for I/O-bound tasks, not CPU-bound tasks. In fact, `libuv` spawns exactly 4 threads by default; it has no correlation to the number of CPUs available.

Node.js does provide synchronous convenience methods which do the work in the current thread. This, in combination with `worker_threads`, allows you to actually utilize all your CPUs for bulk compression tasks. Unfortunately, these synchronous methods are unable to process chunks piecewise; if you need to compress a large file, you have to put the entire thing into memory.

Therefore, if you need to compress many large files, Node.js provides no public API for doing it in an optimal way.
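For concreteness, a sketch of the best you can do today under that constraint (the sharding scheme and `.gz` naming are arbitrary choices for illustration): whole-file synchronous compression inside `worker_threads` Workers, one per CPU.

```js
const fs = require('fs');
const os = require('os');
const zlib = require('zlib');
const { Worker, isMainThread, workerData } = require('worker_threads');

if (isMainThread) {
  const files = process.argv.slice(2);
  const n = os.cpus().length;
  // Shard the file list across one worker per CPU.
  for (let i = 0; i < n; i++) {
    new Worker(__filename, {
      workerData: files.filter((_, j) => j % n === i),
    });
  }
} else {
  for (const file of workerData) {
    // gzipSync needs the whole file in memory; there is no way to
    // feed it chunk-by-chunk.
    fs.writeFileSync(`${file}.gz`, zlib.gzipSync(fs.readFileSync(file)));
  }
}
```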
What is the feature you are proposing to solve the problem?
I propose a new API within the built-in `zlib` module which allows you to synchronously compress data, chunk-by-chunk. The API can mirror the existing `Hash` class that's provided by the `crypto` module.

An example of real-world usage might look like this:
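A minimal sketch, assuming a hypothetical `zlib.SyncDeflate` class whose `update()`/`final()` methods mirror `crypto.Hash` (the class and method names are illustrative, not an existing API):

```js
const fs = require('fs');
const zlib = require('zlib');

// Hypothetical API: new zlib.SyncDeflate() creates a synchronous,
// chunkwise compressor, analogous to crypto.createHash().
function compressFileSync(srcPath, destPath) {
  const deflate = new zlib.SyncDeflate();
  const src = fs.openSync(srcPath, 'r');
  const dest = fs.openSync(destPath, 'w');
  const chunk = Buffer.alloc(64 * 1024); // bounded memory, unlike gzipSync()
  try {
    let n;
    while ((n = fs.readSync(src, chunk, 0, chunk.length)) > 0) {
      // update() compresses one chunk and returns whatever compressed
      // output is ready so far, entirely on the current thread.
      fs.writeSync(dest, deflate.update(chunk.subarray(0, n)));
    }
    fs.writeSync(dest, deflate.final()); // flush, like hash.digest()
  } finally {
    fs.closeSync(src);
    fs.closeSync(dest);
  }
}
```

Run inside a `worker_threads` Worker, one such loop per CPU would saturate all cores while holding only a single chunk in memory at a time.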
What alternatives have you considered?
I have searched npm and found only two packages that claim to achieve this while using native zlib bindings:
Both of these packages utilize very scary, dirty hacks around the undocumented `_processChunk()` method and manipulate the undocumented `_handle` property (which holds the native zlib instance), both used internally by Node.js. I don't find these solutions to be remotely safe or future-proof.