Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch stream2 writable v2 #67

Merged
merged 28 commits into from
Feb 25, 2015
Merged

Patch stream2 writable v2 #67

merged 28 commits into from
Feb 25, 2015

Conversation

thomas-riccardi
Copy link
Contributor

Here is a review with fixes and cleanups on top of riaan53/gridfs-stream master, for Writable stream only.

  • no more internal buffering, see High memory usage because of unbound internal buffering #58
  • finish event is emitted when the file is fully on GridFS (using flushwritable).
  • destroy() emits an error (and the error can be specified as destroy(err), because the write to GridFS may be aborted, so the file is corrupted.
  • Added documentation and tests for special events open and close (and finish)
  • Added documentation and tests for destroy

Pending questions:

I don't know what we should do in case of error:

  • Internally: Currently we just call _error() which emits error and then _close().
    It works, but we could do other things: _write and _flush give callbacks that accept an error as first argument. If there is either a _write or a _flush (we cannot have both at the same time) running during the error (for example _store.open() failed, or _store.write() failed, or _store.close() failed), then we could (also?) call that callback with the error.
    From what I read in the code of nodejs v0.10.31 and v0.12.0, both with indirectly emit error, and the _write() callback will call the optional write() callback so the caller will be notified too. It may be useful to notify the caller this way, but it's harder to do it properly: we need to avoid multiple error emitted, and we still need to call _close() (and maybe in a precise order: it may be useful to guarantee that 'error' happens before (or after?) 'close'; for that we could delay _close() to nextTick). I don't know what is the best practice here... Maybe we should ask on nodejs mailing list?
  • externally: We always call _store.close(), so we always commit the file to GridFS by writing the document in the files collection. This may not be the best solution in case of error: If an error happened, the file may be corrupted, but we store it like no error happened. Two solutions:
    • keep commiting the file to GridFS, but add an extra metadata field error to notify that this file may be corrupted. It would be up to the caller to remove the file afterward if it wants that, or not.
    • remove the file and the chunks on error, because the file is likely corrupted, but it's an issue if the file was overwriting another file, because we would want to rollback to a previous state (which is what we would expect. It's how AWS S3 works for example), and the current GridStore from nodejs mongodb driver 2.0.15 deletes the old chunks before starting to write the new ones, so we cannot... Maybe it's a bug of the mongodb driver, but I don't see how it could implement what we want without performance issues.

I haven't reviewed the Readable stream yet, I'm less familiar with Readable streams than with Writable ones...

PS: I patched the commit riaan53@c6ac486 in this branch to remove the wrongly committed temporary file test/tmp/1mbBlob.

riaan53 and others added 28 commits February 12, 2015 16:11
Pull new master from gridfs-stream
…store is closed. Do not call _write internally. Inherit from FlushWriteable as a workaround to implement a _flush method. Removed destrySoon.
…lush indication is received before the store opens.
It's not strictly required but it makes the code more robust.
Check and set them whenever possible for more robust code.
Like _delayedWrite and _writeInternal; we need to delay the flush during
opening, let's do it the same way as _write.
Because _opened can be false even if opening finished (in case of
error), and we don't want to delay the write in case of error.

It's also done this way for delayed _flush.

Normalize comments with delayed _flush too.
…ile is really written (ie after _store.close()
Test open, close, finish events.
Now emits an 'error', because it stops the write before it's finished.
Also takes an optional error argument which will be emitted with the
'error' event.

Add a test for both features.
…time

to the caller to register the callbacks.
It could happen if destroy() is called while _store.open fails.
If destroy() is called during _store open, then _close could be ignored;
leaving the _store open.
…ether.

Current implementation always calls 'open' and 'close'; it could also
just call 'error', in which case we should rewrite this test.
@riaan53
Copy link
Contributor

riaan53 commented Feb 24, 2015

Awesome! thank you.

@Reggino Reggino merged commit 546f160 into aheckmann:master Feb 25, 2015
@Reggino
Copy link
Collaborator

Reggino commented Feb 25, 2015

@triccardi-systran - Sorry for the late reply: thank you for this! Very cool.

Your version has been published as version 1.0.2.

@thomas-riccardi
Copy link
Contributor Author

@Reggino
What about semver? a581ca9 removed public method destroySoon, so a major version bump should have been done. This PR add new features (destroy() now optionally takes an error parameter), so a minor version bump should have been done.
If we don't want to break the API we could add destroySoon again, as a deprecated alias to destroy(), this is now almost the same behavior as we don't queue internally (the only difference is that the current _write() was written, and now is ignored (hm, this case may lead to a Chunk leak... maybe we should always wait for the _write to finish...).

@thomas-riccardi
Copy link
Contributor Author

(Nevermind for the Chunk leak in case of destroy(): it's not part of the File data, but it's still associated to the File via its files_id. Luckily on File unlink the MongoDB driver removes all Chunks associated to the File via files_id, so the Chunk will not definitely leak.)

Reggino added a commit that referenced this pull request Feb 26, 2015
@Reggino
Copy link
Collaborator

Reggino commented Feb 26, 2015

@triccardi-systran Thanks, you're right. I've added the deprecated destroySoon and publish it as 1.1.0

@thomas-riccardi
Copy link
Contributor Author

For the error handling questions, I opened a ticket at mongodb:
https://jira.mongodb.org/browse/NODE-372

yeonwoozxi added a commit to yeonwoozxi/aheckmann2 that referenced this pull request Nov 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants