stdout always cuts off the stream at character 8192 #12921
Can you provide some example code that demonstrates the problem? |
I cannot reproduce on Windows 7 x64:

const fs = require('fs');
const cp = require('child_process');
console.log(process.version);
fs.writeFileSync('data.txt', 'x'.repeat(8192 + 100));
console.log(fs.statSync('data.txt').size);
cp.exec('cat data.txt', (error, stdout, stderr) => {
console.log(stdout.length);
});
|
Example:

const child_process = require('child_process')
const stdout = child_process.execFileSync(
'yarn',
['info', 'joi', '--json'],
{
encoding: 'utf8',
maxBuffer: 500 * 1024
}
)
console.log('Stdout length:', stdout.length)
try {
const stdoutObject = JSON.parse(stdout)
console.log('stdout is a valid JSON object.')
} catch (err) {
console.log('stdout is an invalid JSON object.')
console.error(err.toString())
}

Output:
# bytes
$ yarn info joi --json | wc -c
11222
# lines
$ yarn info joi --json | wc -l
1

edit: but node.js
edit2:
edit3:

const fs = require('fs');
const cp = require('child_process');
console.log(process.version);
fs.writeFileSync('data.txt', 'x'.repeat(8192 + 100));
console.log(fs.statSync('data.txt').size);
cp.exec('cat data.txt', (error, stdout, stderr) => {
console.log(stdout.length);
});

This also works correctly on my macOS.

EDIT 4!: |
Here is the test case isolated from yarn, npm, etc.:

const cp = require('child_process');
if (process.argv[2] === 'child') {
console.log('g'.repeat(8193).toString());
}
else {
const child = cp.spawn(process.execPath, [__filename, 'child']);
child.stdout.on('data', function(d) {
console.log(d.toString());
});
}

On macOS with v7.10, a newline is inserted between the 8192nd and 8193rd characters; buffering is possibly happening at 8K. This does not happen on Linux. Because of this, the incoming JSON gets corrupted. |
@gireeshpunathil Can you write that script as a test case that makes clear what the expected outcome is (i.e. that crashes otherwise)? I’ve tried a variant of your script in a Travis CI run and it seems to work fine on OS X. |
Isn't that expected, as the data event can fire more than once and console.log appends a newline to each chunk? |
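For illustration, a minimal sketch of the chunking effect being described - assuming an 8 kB pipe read size, which is not stated in the thread:

const cp = require('child_process');

if (process.argv[2] === 'child') {
  console.log('g'.repeat(8193));
} else {
  const child = cp.spawn(process.execPath, [__filename, 'child']);
  // Each 'data' event delivers one pipe read; with an 8 kB read size
  // this typically logs 8192 and then 2 (the final 'g' plus the
  // newline console.log appended in the child).
  child.stdout.on('data', (d) => console.log(d.length));
}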
Shouldn't the sync functions return all the received data? |
I can reproduce the issue with the following code:

const child_process = require('child_process')
const child = child_process.spawn(
'yarn',
['info', 'joi', '--json']
);
let stdout = '';
child.stdout.on('data', (d) => {
stdout += d;
});
child.stdout.on('close', () => {
try {
const stdoutObject = JSON.parse(stdout)
console.log('stdout is a valid JSON object.')
} catch (err) {
console.log('stdout is an invalid JSON object.')
console.error(err.toString())
}
});

Running the code under dtruss shows that the child process only writes 8192 bytes. Later on it exits, so could it be a specific issue with yarn interoperating with node? dtruss output (parent process PID: 31586, child PID: 31587):
|
Okay, that’s curious. stdio should be blocking, so why doesn’t libuv write the rest of the 0x2bd6 bytes immediately after the first write? Alternatively, does Node somehow mess up making FD 1 blocking, and consequently uv_try_write doesn’t write the whole block (16 kB is Node’s current limit for using try_write, and if OS X is the only platform where the buffer is smaller than that by default, it might explain the platform-specific weirdness) and “forgets” to write the rest? |
(It seems links to comments are broken on GitHub lately: hashes are stripped. I've reported the issue to GitHub. Be careful.) |
The problem seems to be that process.exit() gets called before stdout has been completely flushed.
This is a test case that reproduces the error: const cp = require('child_process');
if (process.argv[2] === 'child') {
console.log('g'.repeat(81933).toString());
process.exit();
} else {
const child = cp.spawn(process.execPath, [__filename, 'child']);
child.stdout.pipe(process.stdout);
}

I guess yarn is calling process.exit() before waiting for the data to be written out. |
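If that hypothesis holds, a minimal sketch of the safe pattern is to exit only once process.stdout.write signals completion via its callback:

const cp = require('child_process');

if (process.argv[2] === 'child') {
  // The write callback fires once the data has been handed off,
  // so exiting inside it should not truncate the output.
  process.stdout.write('g'.repeat(81933), () => process.exit());
} else {
  const child = cp.spawn(process.execPath, [__filename, 'child']);
  child.stdout.pipe(process.stdout);
}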
@santigimeno What kind of stream is process.stdout in this case? |
Nvm, found it in the docs. I don’t actually see the point in being so inconsistent about stdio – it doesn’t seem to me like there’s any sufficiently good reason to have async stdio. I’ll try to put a PR together and see if I’m proven wrong by our tests 😄 |
@addaleax, @santigimeno - sorry, my test case was misleading. Agreed, that is a manifestation of two callbacks and the newline inserted by console.log. Having said that (again, sorry to divert attention), I guess there is something else happening other than the process.exit(). Here is why:
|
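A minimal sketch that makes the two-callbacks-plus-newline effect visible by accumulating the chunks instead of logging each one (an editorial illustration, not code from the thread):

const cp = require('child_process');

if (process.argv[2] === 'child') {
  console.log('g'.repeat(8193));
} else {
  const child = cp.spawn(process.execPath, [__filename, 'child']);
  let out = '';
  // Accumulate chunks and emit once on 'end'; this sidesteps the
  // spurious newline that per-chunk console.log calls insert.
  child.stdout.on('data', (d) => { out += d; });
  child.stdout.on('end', () => process.stdout.write(out));
}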
Always make stdio blocking. There have only been a few exceptions to this rule and I can’t seem to find any reason for those that outweighs the benefits. Fixes: nodejs#6456 Fixes: nodejs#12921
Continuing my previous comment, here is a test case which fails even without process.exit(). Note, however, that this fails only on macOS. dtruss showed nothing unnatural, other than failing psynch_cvwait calls.

48131/0xb70b9: write(0x1, "ggg...", 0x10000) = 8192 0
48131/0xb70b9: madvise(0x10229A000, 0x41000, 0x9) = 0 0
48131/0xb70b9: kevent(0x5, 0x7FFF5FBF6C30, 0x2) = 0 0
48131/0xb70b9: kevent(0x5, 0x7FFF5FBF6C30, 0x0) = 0 0
48131/0xb70b9: psynch_cvwait(0x103800830, 0x100000100, 0x0) = -1 Err#316
48131/0xb70b9: psynch_cvwait(0x10230AC40, 0x100000100, 0x0) = -1 Err#316
...
48129/0xb709e: wait4(0xBC03, 0x7FFF5FBF68FC, 0x1) = 48131 0
On its way to its natural exit, the child does not look like it prepared or attempted to write again - at least from the system's perspective.

const cp = require('child_process');
const fs = require('fs');
if (process.argv[2] === 'child') {
fs.writeSync(process.stdout.fd, 'g'.repeat(0x10000).toString());
} else {
const child = cp.spawn(process.execPath, [__filename, 'child']);
child.stdout.on('data', function(d) {
console.log(d.length);
});
}

expected: 65536

A more faithful replica of the yarn scenario is as follows:

const cp = require('child_process');
const fs = require('fs');
if (process.argv[2] === 'child') {
var str = new fs.SyncWriteStream(process.stdout.fd);
new Promise(function(res, rej) {
str.write(('g'.repeat(0x10000).toString()));
res();
}).then(function(gval) {
console.log('success!');
process.exit();
}, function(eval) {
console.log('error!');
process.exit();
});
} else {
const child = cp.spawn(process.execPath, [__filename, 'child']);
child.stdout.on('data', function(d) {
console.log(d.toString());
});
}

expected: 65536

While 'blocking pipes' can be used to justify why stdout was made non-blocking, this raises more concerns about multiple API specs:
|
@gireeshpunathil Replace:

fs.writeSync(process.stdout.fd, 'g'.repeat(0x10000).toString());

with:

fs.writeSync(fs.openSync('/dev/stdout', 'w'), 'g'.repeat(0x10000).toString());

and the parent will receive the full output. I don't see a bug here, just consequences of the trade-offs node makes. |
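A self-contained sketch of that workaround, assuming a POSIX /dev/stdout:

const cp = require('child_process');
const fs = require('fs');

if (process.argv[2] === 'child') {
  // Opening /dev/stdout yields a fresh, blocking descriptor for the
  // same pipe, so the single write blocks until all bytes are taken.
  const fd = fs.openSync('/dev/stdout', 'w');
  fs.writeSync(fd, 'g'.repeat(0x10000));
} else {
  const child = cp.spawn(process.execPath, [__filename, 'child']);
  let n = 0;
  child.stdout.on('data', (d) => { n += d.length; });
  child.stdout.on('end', () => console.log(n)); // expected: 65536
}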
@bnoordhuis - thanks, the proposed code performs a full write. But what about point 1? Why would the child - in the absence of an explicit exit - terminate without writing fully? Isn't that a bug? |
@gireeshpunathil Point 1 is this?
IIUYC, node normally waits until it's been able to write all backlogged data to the pipe. (The data having been written does not mean the reader consumed it, though.) Example:
The caveat here probably is that the pipe buffer on macOS is 64 kB (try the example with 65536 and 65537), which means quite a bit of data can get lost in transit if the reader exits before consuming everything. |
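A minimal sketch probing that boundary (the 64 kB figure and the loss-on-exit behavior are macOS-specific assumptions taken from the comment above):

const cp = require('child_process');
const SIZE = 65537; // compare with 65536 against the 64 kB pipe buffer

if (process.argv[2] === 'child') {
  process.stdout.write('x'.repeat(SIZE));
  process.exit(); // exit without waiting for the write to flush
} else {
  const child = cp.spawn(process.execPath, [__filename, 'child']);
  let n = 0;
  child.stdout.on('data', (d) => { n += d.length; });
  // Anything beyond what fit into the pipe buffer at exit time
  // may be missing here.
  child.stdout.on('end', () => console.log('received', n, 'of', SIZE));
}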
@bnoordhuis, yes - point 1 was about exactly that. Are you hinting that the reader (in my case, the parent node) is exiting prematurely? With a listener on the child's data, I thought it would wait until the child's end of the pipe is closed - isn't that the case? With your example, I (the cat command) am able to get all the data, with or without sleeping, in any amount.
On the other hand, in my example, even if I add a delay to the reader (parent), I don't get more data.

const cp = require('child_process');
const fs = require('fs');
if (process.argv[2] === 'child') {
fs.writeSync(process.stdout.fd, 'g'.repeat(0x10000).toString());
} else {
const child = cp.spawn(process.execPath, [__filename, 'child']);
child.stdout.on('data', function(d) {
console.log(d.length);
});
setTimeout(function() {
console.error('slept!');
}, 10000);
}
Evidently, the child is exiting prematurely. (Hint: process.stdout is a tty in your case, and a socket pair in my case.) Am I missing anything? |
@gireeshpunathil Use |
@bnoordhuis, yes - that works fine. But to be honest, that looks to me like a circumvention of the original issue which yarn brought forth: the fact that stdio is now non-blocking is not handled well in sync methods or fs methods which take native fds as input. This is the route through which the data was written:

#0 uv__fs_write (req=0x7fffe7c1b7c0) at ../deps/uv/src/unix/fs.c:710
#1 0x0000000001576d08 in uv__fs_buf_iter (w=0x7fffe7c1b910) at ../deps/uv/src/unix/fs.c:902
#2 uv__fs_work (w=0x7fffe7c1b910) at ../deps/uv/src/unix/fs.c:974
#3 0x000000000157966c in uv_fs_write (loop=0x21f5ae0, req=0x7fffe7c1b7c0, file=Unhandled dwarf expression opcode 0xf3) at ../deps/uv/src/unix/fs.c:1349
#4 0x00000000012eaba3 in node::WriteString(v8::FunctionCallbackInfo<v8::Value> const&) ()
#5 0x0000000000ad1cc4 in v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) ()
#6 0x0000000000b5423c in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) ()
#7 0x0000000000b55075 in v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) ()
#8 0x000017aabf7843a7 in ?? ()
Though I tested only on Linux and macOS, this looks applicable to all platforms, as there is no platform specificity in the write logic. Evidently, no second attempt is made as in a typical non-blocking fd case, because the fd in question is assumed to be a file device and the write is performed in one shot. I would expect this write to be performed the way uv__write does it - catering to both blocking and non-blocking descriptors. Thinking about solutions - one of:
diff --git a/lib/fs.js b/lib/fs.js
index c05e9e5..5334070 100644
--- a/lib/fs.js
+++ b/lib/fs.js
@@ -651,6 +651,13 @@ fs.write = function(fd, buffer, offset, length, position, callback) {
// OR
// fs.writeSync(fd, string[, position[, encoding]]);
fs.writeSync = function(fd, buffer, offset, length, position) {
+ if(fd === process.stdout.fd ||
+ fd === process.stderr.fd ||
+ fd === process.stdin.fd)
+ throw new Error('fd ' + fd + ' belongs to non-blocking standard stream,' +
+ ' and cannot be used in a sync call');
+
+
if (isUint8Array(buffer)) {
if (position === undefined)
position = null;
Please suggest. |
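For comparison, an alternative to rejecting such fds outright would be a caller-side loop (hypothetical helper name writeAllSync, not from the thread) that retries until everything is written, handling both short writes and EAGAIN from a non-blocking descriptor:

const fs = require('fs');

// Keep writing until every byte is out, coping with short writes
// and with EAGAIN thrown for a non-blocking descriptor.
function writeAllSync(fd, str) {
  const buf = Buffer.from(str);
  let off = 0;
  while (off < buf.length) {
    try {
      off += fs.writeSync(fd, buf, off, buf.length - off);
    } catch (err) {
      if (err.code !== 'EAGAIN') throw err; // busy-wait only on EAGAIN
    }
  }
}

writeAllSync(process.stdout.fd, 'g'.repeat(0x10000));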
There is no documentation to update, because the documentation already describes what fs.writeSync actually does. |
@bnoordhuis - thanks. I will work on a PR |
@gireeshpunathil What exactly are you proposing? I don't think anything needs to change. |
The check mentioned in this comment would be applied to readSync and writeSync - to make sure that non-blocking fds do not propagate into sync calls, as those cannot handle them reliably and completely. |
If we don't do that, the API description for fs.writeSync will not match the actual behavior, nor will the docs. |
It does exactly what it says. If your assumption is that it always writes everything in a single call, the documentation makes no such promise. |
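In sketch form, the documented contract - writeSync returns a byte count that the caller must check:

const fs = require('fs');

const data = Buffer.from('g'.repeat(0x10000));
// The return value is the only completion guarantee writeSync gives;
// fewer bytes than requested is a legal outcome, not an error.
const written = fs.writeSync(process.stdout.fd, data, 0, data.length);
if (written < data.length) {
  console.error(`short write: ${written} of ${data.length} bytes`);
}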
@bnoordhuis - OK, thanks for the clarification - I had not thought of it that way. However, I have a doubt: the reason for returning fewer bytes than requested on a blocking write is, I guess, due to extraneous reasons outside of the I/O buffer limitations? Otherwise, if both blocking and non-blocking descriptors return in the same way on buffer exhaustion, what is the difference between them? I cannot find any documented evidence in OS manuals about blocking writes returning fewer bytes, other than the fact that they return the number of bytes written. The fewer bytes returned in the present case are clearly due to the fact that it is a non-blocking socket; replacing it with a tty or a blocking socket writes huge data in one shot and never returns with fewer bytes. I guess this is exactly the reason why the stdios were made non-blocking? The API doc looks right in its grammatical form, but one would expect (as in the yarn case) that the data is written out fully through the passed fd, albeit blocking the thread. |
Yes. For example, a write to an almost-full disk. (Disclaimer: some file systems will reject the request with an ENOSPC error instead of doing a partial write. A robust program should handle both scenarios.) |
@bnoordhuis - thanks, understood. I don't have more comments, and am fine with your assertions. |
Okay, good. I'll remove the confirmed-bug label and close the issue. |
Works around the following bug: nodejs/node#12921
Using fork, spawn, or exec in async and sync mode, the stdout stream only returns up to 8192 characters. (Note that the output is without any newlines.) Note that maxBuffer does not allow increasing this maximum, but it is possible to lower it by setting it to a value below 8192.