Piping a stream into slow and fast consumer streams in parallel causes excessive buffering #16706
Comments
Hi, thanks for the bug report! This seems pretty weird to me, since it sounds like the exact same bug as #5820 (and #5257), so we actually have tests that should make sure this doesn't happen. If I modify your example like this to make it runnable, I see a steady memory footprint:

```js
'use strict';
const fs = require('fs');
const crypto = require('crypto');
const stream = require('stream');

const fileStream = fs.createReadStream(process.execPath, { encoding: null });
const checksum = crypto.createHash("sha1");
const network = new class extends stream.Writable {
  _write(data, encoding, cb) {
    // 'Slow' stream
    setTimeout(cb, 10);
  }
}();

fileStream.pipe(checksum);
fileStream.pipe(network);

network.on('finish', () => {
  checksum.end();
  console.log(checksum.read().toString("hex"));
});
```

My conclusion would be that something might be calling `resume()` on the stream.
I'm not sure that would be expected behaviour in all cases (I can think of cases where it's totally fine to use all the memory available). I think it would be better to emit a warning if a stream buffers too much (e.g. more than …).
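For illustration, a userland sketch of that idea might look like the following; the `warnOnExcessiveBuffering` helper, its polling interval, and the `factor` threshold are all made up here, not an existing Node API:

```js
// Hypothetical helper: warn when a writable has buffered far more
// data than its highWaterMark suggests it should.
function warnOnExcessiveBuffering(writable, factor = 16) {
  const timer = setInterval(() => {
    if (writable.writableLength > factor * writable.writableHighWaterMark) {
      process.emitWarning(
        `stream buffered ${writable.writableLength} bytes ` +
        `(highWaterMark: ${writable.writableHighWaterMark})`);
    }
  }, 1000);
  timer.unref(); // don't keep the process alive just for this check
}
```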
I just tried to follow your hint regarding `resume()` and patched the stream like this:

```js
let oresume = stream.resume.bind(stream);
stream.resume = (...args) => {
  console.trace("resume");
  return oresume(...args);
};
let opause = stream.pause.bind(stream);
stream.pause = (...args) => {
  console.trace("pause");
  return opause(...args);
};
```

The calls themselves weren't surprising, but the call stacks did reveal interesting stuff. So, closing this issue for that reason. Sorry for wasting your time.

If you're curious and have too much time, here is a more detailed explanation of what happens. I was actually using the […], so […]. So without the checksum pipe, the file stream isn't flowing yet and […]. But with the checksum, the stream becomes flowing immediately thanks to `resume()` being called from inside `pipe()`.
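To make that mechanism concrete, here is a small standalone demo (assuming a Node version that exposes `readableFlowing`) showing that `pipe()` switches a not-yet-flowing stream into flowing mode:

```js
'use strict';
const fs = require('fs');
const crypto = require('crypto');

const fileStream = fs.createReadStream(process.execPath);
console.log(fileStream.readableFlowing); // null: nothing is consuming yet

fileStream.pipe(crypto.createHash('sha1'));
console.log(fileStream.readableFlowing); // true: pipe() resumed the stream
```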
Problem
My use case is the following:
I want to send a file over the network but at the same time calculate a local checksum (like a sha-1 hash) for later use.
I tried the following (abbreviated and pseudo):
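A sketch of the idea, assuming a `checksum` hash stream and a slow `network` writable like in the runnable version in the first comment:

```js
const fileStream = fs.createReadStream(path);
fileStream.pipe(checksum); // fast consumer: sha-1 hash
fileStream.pipe(network);  // slow consumer: network socket
```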
And it works... except that piping to `checksum` (which can consume data a lot faster than a networked stream can) causes the file stream to emit data at that high rate; since the networked stream is a lot slower, the file stream ends up buffering all data for it. In the end, (almost) the entire file is buffered in memory this way, which is pretty bad, especially when the file is multiple gigabytes like in my tests.

Inspecting the node process showed thousands of 16KB buffers (the default `highWaterMark`) in memory, totaling 5GB (essentially the size of the streamed file), waiting to be consumed by the network stream.

Remedies
When a read stream has multiple pipes, it should emit data at the rate of the slowest attached stream, not at the rate of the fastest stream.
Workaround
Well, I cheated and implemented a custom `Transform` that calls `checksum.write()` inside `_transform()`. That way, only the backpressure of the network stream is taken into account by the file stream when emitting data.
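A minimal sketch of that workaround, reusing the slow `network` stand-in from the example above (the `hashTee` name is made up here):

```js
'use strict';
const fs = require('fs');
const crypto = require('crypto');
const stream = require('stream');

const checksum = crypto.createHash('sha1');

// Slow consumer standing in for the real network connection.
const network = new stream.Writable({
  write(chunk, encoding, cb) {
    setTimeout(cb, 10);
  }
});

// Tee transform: feed each chunk to the hash as a side effect and pass it
// through unchanged. Only the network's backpressure reaches the file stream.
const hashTee = new stream.Transform({
  transform(chunk, encoding, cb) {
    checksum.write(chunk); // fast, effectively synchronous
    cb(null, chunk);       // forward the chunk downstream
  }
});

fs.createReadStream(process.execPath)
  .pipe(hashTee)
  .pipe(network)
  .on('finish', () => {
    checksum.end();
    console.log(checksum.read().toString('hex'));
  });
```

Because hashing a chunk is effectively synchronous, the transform itself never applies noticeable backpressure, so the file stream is throttled purely by the network writable.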