
Standardize a way for writable streams to do work after all data has been consumed #7348

Closed
evansolomon opened this issue Mar 24, 2014 · 15 comments

@evansolomon

Transform streams can optionally have a _flush method that lets the stream do something after all incoming data has been consumed, but before the stream's readable side fires its 'end' event. It would be great if a similar convention existed for writable streams.
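For reference, this is roughly what the existing Transform hook looks like (a minimal sketch; the Upcaser class is just for illustration):

var Transform = require('stream').Transform,
    util = require('util');

function Upcaser() {
    Transform.call(this);
}
util.inherits(Upcaser, Transform);

Upcaser.prototype._transform = function(chunk, encoding, callback) {
    this.push(chunk.toString().toUpperCase());
    callback();
};

// Runs after the last chunk has been consumed, but before the readable
// side emits 'end' -- exactly the hook this issue asks for on Writables.
Upcaser.prototype._flush = function(callback) {
    this.push('(no more data)\n'); // a last chance to emit trailing output
    callback();
};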

The particular use case I have in mind here is a writable stream that uploads data to S3 (source) that has to tell S3 that the upload is finished, then wait for a response. In our case, we chose to fire a 'close' event once that work is done. This is annoying for users though, because they have to know that special event exists. If they bind to the standard 'finish' event they may try to use the data produced by the writable stream before it's actually ready. Chaos ensues and tears are shed.

The two solutions I've thought about are (1) allowing writable streams to implement a _flush method, or (2) indicating when the last data is being written so that the stream can delay calling the last callback until it's finished with its post-processing work.
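A sketch of what option (1) might look like (hypothetical API, mirroring Transform's hook; the setImmediate calls stand in for real S3 requests):

var Writable = require('stream').Writable,
    util = require('util');

function S3WriteStream() {
    Writable.call(this);
}
util.inherits(S3WriteStream, Writable);

S3WriteStream.prototype._write = function(chunk, encoding, callback) {
    setImmediate(callback); // stand-in for uploading one chunk
};

// Hypothetical hook: core would call this after the final _write
// completes and delay 'finish' until callback() is invoked.
S3WriteStream.prototype._flush = function(callback) {
    setImmediate(callback); // stand-in for committing the upload
};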

What do you think?

@TooTallNate

I wanted this as well when we were speccing out streams2, but the idea got shot down for some reason. I can't remember the details.

@evansolomon
Author

I chatted briefly about this with @tjfontaine yesterday afternoon and I think he's open to the idea.

@markstos

This is annoying for users though, because they have to know that special event exists

This is what documentation is for. It seems a reasonable expectation that people using streams know what the methods are. Further, if a module uses the 'close' event, it seems reasonable to document that in the particular module. I see that's now been done for the module in question.

So while I don't find that particular use case compelling, I can see potential value if the WritableStream has an internal buffer. Like with Transform streams, a _flush method for a writable stream could be used to clear out an internal buffer once it knows there's no more incoming data.

I ran into this case myself recently, but in my case I found my WritableStream was better modeled as a Transform stream, which made the problem go away, as I then had access to the _flush method I wanted.
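That re-modeling looks something like this (a sketch; BufferedSink is a made-up class that buffers everything and writes a single file at the end):

var Transform = require('stream').Transform,
    util = require('util'),
    fs = require('fs');

function BufferedSink(path) {
    Transform.call(this);
    this.path = path;
    this.chunks = [];
}
util.inherits(BufferedSink, Transform);

BufferedSink.prototype._transform = function(chunk, encoding, callback) {
    this.chunks.push(chunk); // buffer internally instead of pushing downstream
    callback();
};

BufferedSink.prototype._flush = function(callback) {
    // No more data is coming; write the whole buffer out in one shot.
    fs.writeFile(this.path, Buffer.concat(this.chunks), callback);
};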

@evansolomon
Author

I found my WritableStream was better modeled as a Transform stream

If that works then that's a nice solution. In my case it doesn't make any sense to read data out of the stream, so to get it to actually consume (upload) the data I'd have to bind my own "data" listener or pipe it to some unused place. Transform streams are a special case of readable + writable streams, so it seems weird that they have a feature that exists on neither readable nor writable streams.
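Concretely, using a Transform as a pure sink means dealing with its unused readable side, for example (BufferedSink is the Transform-as-sink sketched above; the file paths are illustrative):

var fs = require('fs');

var sink = new BufferedSink('/tmp/out');

// The readable side is unused, but it still has to be put into flowing
// mode for 'end' to fire; with no 'data' listeners, output is discarded.
sink.resume();

sink.on('end', function() {
    // _flush has run: everything has been written out
});

fs.createReadStream('/tmp/in').pipe(sink);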

@markstos

Transform streams are a special case of readable + writable streams, so it seems weird that they have a feature that exists on neither readable nor writable streams.

Great point. I would expect a Transform stream to be a pure union of Readable and Writable streams.

@TomFrost

TomFrost commented Nov 11, 2014

To the folks finding this via Google and looking for the best workaround until there's a native _flush implementation, here's my monkey patch. It's a strange approach, but since it doesn't actually patch the stream itself, it's immune to any changes that may be made in reference to this. I'm using this in production to stream to S3, in a framework where 'finish' absolutely must mean that no part of the stream is still active.

var EventEmitter = require('events').EventEmitter,
    Writable = require('stream').Writable,
    util = require('util');

function MyWriteStream() {
    Writable.call(this, {highWaterMark: 500});
}
util.inherits(MyWriteStream, Writable);

MyWriteStream.prototype.emit = function(evt) {
    // Intercept 'finish' only while core's Writable still lacks a native
    // _flush, so this patch disables itself if the feature ever lands.
    if (!Writable.prototype._flush && evt === 'finish') {
        this._flush(function(err) {
            if (err)
                EventEmitter.prototype.emit.call(this, 'error', err);
            else
                EventEmitter.prototype.emit.call(this, 'finish');
        }.bind(this));
    }
    else {
        // All other events pass through untouched.
        var args = Array.prototype.slice.call(arguments);
        EventEmitter.prototype.emit.apply(this, args);
    }
};

MyWriteStream.prototype._flush = function(cb) {
    // This now executes identically to Transform._flush.
    // 'finish' will be emitted when cb() is called.
};

module.exports = MyWriteStream;
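Usage is then the same as any other Writable; a sketch (the require path and the fake async work are illustrative):

var MyWriteStream = require('./my-write-stream');

MyWriteStream.prototype._write = function(chunk, encoding, callback) {
    setImmediate(callback); // pretend to upload the chunk
};

MyWriteStream.prototype._flush = function(callback) {
    setTimeout(callback, 100); // pretend to commit the upload
};

var upload = new MyWriteStream();
upload.on('finish', function() {
    // Safe: by now _flush has completed and the upload is committed.
});
upload.end('some data');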

@markstos

@TomFrost This looks like a reasonable solution to me, and worthy of publishing on npmjs.org until there's an official solution. As I understand it, you've created a new Writable stream module that will call a custom _flush method after the 'finish' event is emitted. Otherwise, it works like a normal Writable stream. Nicely done.

@TomFrost

@markstos Actually, the key is that it calls _flush before emitting 'finish' ;-). But that's an excellent suggestion -- I'll turn that into something that can be used as a drop-in extendable and put it on NPM.

@markstos

Thanks!

@TomFrost

@markstos Done!
https://www.npmjs.org/package/flushwritable
https://github.com/TomFrost/FlushWritable
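Usage presumably mirrors the monkey patch above; something like this (a sketch based on the pattern in this thread, with S3Uploader and the fake async work as stand-ins; the package README is the authoritative reference):

var FlushWritable = require('flushwritable'),
    util = require('util');

function S3Uploader() {
    FlushWritable.call(this);
}
util.inherits(S3Uploader, FlushWritable);

S3Uploader.prototype._write = function(chunk, encoding, callback) {
    setImmediate(callback); // stand-in for uploading one chunk
};

// Runs before 'finish' is emitted, like Transform's _flush.
S3Uploader.prototype._flush = function(callback) {
    setImmediate(callback); // stand-in for committing the upload
};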

@thomas-riccardi

I ran into the exact same need: an AWS S3 multipart upload requires a final HTTP request to commit the upload after all data has been sent.
Another use case is writing data to GridFS (MongoDB's file system): as with AWS S3, we first have to write the chunks, then commit the file by writing the file metadata linking to the chunks.

The earlier suggestion to listen for the 'finish' event and emit 'close' has several flaws for this use case:

  • the 'finish' semantics are broken: "all data has been written" is no longer true (in the AWS S3 case it should mean the multipart upload has been committed without error)
  • in case of an error during the final flush, the stream emits both 'finish' and 'error', which may cause a lot of trouble in the calling code.

The _flush() method for Writable streams seems to be the best solution: it covers all the issues raised here and doesn't break anything.

@TomFrost Thank you for your monkey patch solution! This is perfect in the meantime!

Will this be fixed in time for node.js v0.12?

@thomas-riccardi

node v0.12.0 was released, without Writable._flush.

Will this be fixed in 0.12.x, or should we wait for v0.14?

@markstos

@triccardi-systran resolutions are being discussed in the io.js readable-stream repo if you are interested:
nodejs/readable-stream#112

@jasnell
Member

jasnell commented May 29, 2015

Closing this in favor of nodejs/readable-stream#112. This should eventually land in either io.js or the converged stream and we'll pick it up from there.

@jasnell jasnell closed this as completed May 29, 2015
@shaoshuai0102

shaoshuai0102 commented Feb 14, 2017

Still a work in progress: nodejs/node#2314
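For anyone finding this thread today: the hook eventually shipped in Node.js core as writable._final(), available since Node 8, which runs after the last _write and before 'finish'. A minimal sketch (UploadStream and the fake async work are illustrative):

const { Writable } = require('stream');

class UploadStream extends Writable {
    _write(chunk, encoding, callback) {
        setImmediate(callback); // stand-in for uploading one chunk
    }

    // Node 8+: called after the last _write, before 'finish' is emitted.
    _final(callback) {
        setImmediate(callback); // stand-in for committing the upload
    }
}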
