Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added notes on special .write callback behaviour on process.stdout/.stderr #3772

Closed
wants to merge 2 commits into from
Closed

Added notes on special .write callback behaviour on process.stdout/.stderr #3772

wants to merge 2 commits into from

Conversation

minesworld
Copy link

This "solves" issue #3670 and the issues of the underlying bug...

@mscdex mscdex added doc Issues and PRs related to the documentations. stream Issues and PRs related to the stream subsystem. process Issues and PRs related to the process subsystem. labels Nov 11, 2015
data is flushed completly. The only way to ensure that all data to
`process.stderr` and `process.stdout` is written and flushed is to let
Node.js end itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... perhaps, "Note that on process.stdout and process.stderr, the callback passed to stream.write might be called before all written data is flushed completely. This can result in data loss if Node.js is ended prematurely using process.exit."

@minesworld
Copy link
Author

As a german I'm not a native english speaker and so I'm fine with any corrections on my worse than school english ;-)

@jasnell
Copy link
Member

jasnell commented Nov 12, 2015

Hey, I am a native English speaker and I struggle at it constantly! :)

One other comment tho: your commit log message needs to be updated to match the project style guidelines (do a git log to see examples and check the contributing.MD in the project root). We can fix it up when it lands but it would be helpful if you could fix it before then. Thank you!

@Fishrock123
Copy link
Contributor

@minesworld could you also add a note to process.exit() that it tries to shut the process down as fast as possible / immediately?

The documentation of stream.write, process.stdout and process.stderr
now makes clear that ending Node.js via process.exit might result in
data loss despite the callback being called.
The documentation of stream.write, process.stdout and process.stderr
now makes clear that ending Node.js via process.exit might result in
data loss despite the callback being called.

The documentation of process.exit now makes clear that Node.js will
end as fast as possible ignoring outstanding writes.
@minesworld
Copy link
Author

I changed the commit messages and hope they fit into the desired scheme.

Also added a note to the process.exit documentation. Now using jasnells sentence.

Please let me know if I should change something...

@jasnell
Copy link
Member

jasnell commented Nov 17, 2015

LGTM
The PR needs to be rebased and updated but otherwise looks good.

@@ -286,6 +286,10 @@ event and that writes can block when output is redirected to a file (although
disks are fast and operating systems normally employ write-back caching so it
should be a very rare occurrence indeed.)

Note that on `process.stdout` and `process.stderr`, the callback passed to
`stream.write` might be called before all written data is flushed completely.
This can result in data loss if Node.js is ended prematurely using `process.exit`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect. What you should convey here is that the callback can fire before the receiving end has received all data because it's still in transit in a kernel buffer.

The other caveat is that calling process.exit before callbacks have fired means the data may not have been handed off to the operating system yet. The imperative word here is 'may' - it may or may not have been handed off. It's not until the callback fires that you know for sure.

In case it isn't clear, the flow is node -> kernel -> other, where other is either a tty, a file or another process reading from a pipe. Calling process.exit when pending data is still in node land means the data is irretrievably lost. When it's been handed off to the kernel, it's unspecified what happens - it depends on the operating system and the phase of the moon.

@minesworld
Copy link
Author

@bnoordhuis Sorry - but you're wrong on this. See the example code in #3670 - it would also fail if the buffers are not empty.

What you're saying is that Unix pipes are a miracle as there is no way to tell before calling exit if the receiver got it all...

Reality is that node.js has a bug. And the terminology of the doc changes is correct. The data isn't flushed completely on callback as flushed means that the OS got it.

It doesn't depend on the phase of the moon as exiting a process closes a pipe instead of throwing it away... .

@minesworld
Copy link
Author

@bnoordhuis But (!!) if you want to say that the EXISTING docs are wrong in stating the on callback time the data is flushed - we have to open an issue for that too.

If you're correct in your analysis of the bug, then a workaround would be to call flush on all open file descriptors for writing on process.exit() just to be really sure the data isn'n in 'kernel ring land" anymore...

If this helps it would be the best solution :-)

@bnoordhuis
Copy link
Member

What you're saying is that Unix pipes are a miracle as there is no way to tell before calling exit if the receiver got it all...

That is correct.

It doesn't depend on the phase of the moon as exiting a process closes a pipe instead of throwing it away... .

That's partially correct. With a pipe between two processes, exiting closes one end of the pipe.

If you're correct in your analysis of the bug, then a workaround would be to call flush on all open file descriptors for writing on process.exit() just to be really sure the data isn'n in 'kernel ring land" anymore...

There may be room to improve the current situation but there is no 100% fool-proof way to ensure that the other end consumed all the data. Consider the case where the reader simply stopped reading - the node process would hang indefinitely.

@jasnell
Copy link
Member

jasnell commented Nov 17, 2015

Looking it over again and running a few tests, @bnoordhuis is absolutely correct (thanks for catching that!)

@minesworld
Copy link
Author

@jasnell , @bnoordhuis OK
The documenation says on writable.write:
callback Function Callback for when this chunk of data is flushed
This method writes some data to the underlying system, and calls the supplied callback once the data has been fully handled.

What do you think does "flushed" and "fully handled" mean for "normal" programmers?

On every other serious language I would be able to somehow call the os flush() function also on process.stdout and process.stderr as this is the only way to ensure that all data could be handled by a consumer no matter what.

Yes - you can redefine the meaning of "flushed".

I don't want to write what I think about doing so without telling the users of your project.

The issue will come up again as long as the underlying bug is not fixed or the documentation doesn't make it clear.

As promised I changed the docs. Do what you want with it.

As I can reproduce it in production writing TB to stdout, I'm off this "kindergarten" here...

PS: at the moment this bug doesn't really hurt the reputation of node.js as node is used for different things. But this might change in the near future...

@jasnell
Copy link
Member

jasnell commented Nov 18, 2015

I agree that the current docs are lacking here and definitely appreciate
your work helping to improve them. Issues around this have come up before.
I think, however, that there is some nuance here that is being missed. Node
does take a slightly different approach to this than other languages so
there will be differences. I'll sit down a bit later to go through this PR
and the existing docs again to see what more can be done.
On Nov 18, 2015 4:23 AM, "minesworld" notifications@github.com wrote:

@jasnell https://github.com/jasnell , @bnoordhuis
https://github.com/bnoordhuis OK
The documenation says on writable.write:
callback Function Callback for when this chunk of data is flushed
This method writes some data to the underlying system, and calls the
supplied callback once the data has been fully handled.

What do you think does "flushed" and "fully handled" mean for "normal"
programmers?

On every other serious language I would be able to somehow call the os
flush() function also on process.stdout and process.stderr as this is the
only way to ensure that all data could be handled by a consumer no matter
what.

Yes - you can redefine the meaning of "flushed".

I don't want to write what I think about doing so without telling the
users of your project.

The issue will come up again as long as the underlying bug is not fixed or
the documentation doesn't make it clear.

As promised I changed the docs. Do what you want with it.

As I can reproduce it in production writing TB to stdout, I'm off this
"kindergarten" here...


Reply to this email directly or view it on GitHub
#3772 (comment).

@bnoordhuis
Copy link
Member

What do you think does "flushed" and "fully handled" mean for "normal" programmers?

The same what it would mean for exceptional programmers and - hopefully - for sub-par programmers: that the data has left the process; no more, no less.

If you don't want to update the pull request, that's fine. I'll close it for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Issues and PRs related to the documentations. process Issues and PRs related to the process subsystem. stream Issues and PRs related to the stream subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants