Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streams are hot, FileReader.readAs____ is not #40

Closed
jimmywarting opened this issue Jun 11, 2016 · 12 comments · Fixed by #117
Closed

Streams are hot, FileReader.readAs____ is not #40

jimmywarting opened this issue Jun 11, 2016 · 12 comments · Fixed by #117

Comments

@jimmywarting
Copy link

jimmywarting commented Jun 11, 2016

Large Blob/Files becomes an issue if you don't slice it into chunks...

Streams brings other cool feature like pipe-ing, back pressure & transforming
Currently I'm using Response (part of Fetch) to transform a blob to a stream that i can "read into" step by step

var stream = new Response(blob).body
var reader = stream.getReader()

This works okey. but it feels hacky.
Could we get new method to get a stream from File & Blob's please?

@domenic
Copy link
Contributor

domenic commented Jul 14, 2016

This would be a great addition. When I asked people for how they would use streaming uploads in the browser, many of them had to do with getting a file from <input type="file">, modifying it in a streaming fashion (e.g. validating or prepending a header, or interleaving two uploaded files) and then sending them on to a server. The ability to turn a Blob/File into a ReadableStream is crucial for this.

I'd suggest blob.stream(), as a counterpart to the promise-returning blob.arrayBuffer() and blob.text() that were mentioned in #39 (comment).

@jimmywarting
Copy link
Author

jimmywarting commented Sep 8, 2016

Just thought about something. If we have .arrayBuffer() and .text() why not just throw in .json() as well? ✌️
The fetch api has it, so why shouldn't blob's?

Otherwise you end up doing the same thing as (new Response(Blob)).json() which is the same kind of hack. As a bonus it rejects invalid json. Just an idea

@inexorabletash
Copy link
Member

Blob.toStream() (the concept, not the name) was brought up at TPAC F2F and everyone in the room who had an opinion seemed to be a fan.

Also implies we should do FormData.toStream() (again, the idea not the name) for symmetry with other BodyInit types.

Open question raised in the discussion: do we need to compute the total size synchronously to be able to report progress and populate Content-Length headers? Is that doable, i.e. is the encoded size + separators knowable given data length? Is this compatible with current architectures (i.e. Chrome's browser/renderer process split)?

@annevk - did I miss anything?

@annevk
Copy link
Member

annevk commented Sep 27, 2016

That sounds about right. I think for FormData we concluded that exposing the size would be too hard (due to various encodings being in play). (We could also consider URLSearchParams I suppose, although that stringifies so maybe not.)

I guess the other question is whether there is some ToStream operation that takes an arbitrary object and invokes toStream() on it. Whether there's a protocol in play or just some consistency we try to enforce. (And if there's a protocol in play we might not want to use a method invocation to let browsers more easily make certain optimizations.)

As for Content-Length, we're unsure how that can work in general for stream uploads (and not needed in H2 aiui) since we need to take into account that the developer might be malicious. It might be best to simply not support Content-Length use cases since it's going away anyway.

@domenic
Copy link
Contributor

domenic commented Sep 27, 2016

Regarding size, I think that is a somewhat orthogonal issue that can be solved separately. We could add .stream() ASAP, and later add .size if possible, or figure out some out-of-band way of passing through the size specific to whatever progression framework we come up with, or... For now, no Content-Length would get set.

I don't quite remember what purpose the to-stream-able protocol would serve. I guess maybe some internal protocol that produces a { readableStream, optionalLength } structure could be used to remove the switch statements in Fetch, but that's just spec-factoring.

@inexorabletash, I can prioritize working on this next week if you think the spec is ready to accept a PR. We're still figuring out how to interface streams with other specs nicely, so I think having me do the initial work for this spec would be nice. So, a few concrete questions:

  • Is this spec ready to accept such a PR? Maybe we should get more implementer commitments first?
  • What's the name? I like .stream(), but if we do .arrayBuffer() returning a promise for an array buffer, maybe it's confusing for .stream() to return a non-promise? Maybe that argues for .toStream() or .asStream().

@annevk
Copy link
Member

annevk commented Sep 27, 2016

I think the main benefit of a protocol (similar to Symbol.iterator) is that a) userland objects can participate, b) we can potentially make it easier to add such functionality to future IDL classes, and c) we settle any future naming disputes.

@inexorabletash
Copy link
Member

I can't make spec editor or implementer commitments at the moment so no
urgency on my part.

On Tuesday, September 27, 2016, Jimmy Karl Roland Wärting <
notifications@github.com> wrote:

I like .stream() without promise
It short!

You haven't thought of doing .toArrayBuffer() or .asArrayBuffer() so why
should .stream() be prefixed with to/as while arrayBuffer don't?

Its more like fetch. you get the stream from res.body and res.arrayBuffe()
as promise

So I don't think there's any confusion. I just think it's better to mimic
the Response method .arrayBuffer() .text()

If you want to get a stream synchronized you could still use the Response
hack I shown you initially (which would break the purpes of having
something like toStream() be prototyped to blob in the first place)


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#40 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAvF2yAFadYnM-GM3VMeg0mRzxWQK8doks5quS8ZgaJpZM4Izqf9
.

@jimmywarting
Copy link
Author

jimmywarting commented Oct 23, 2016

Now when i have tried converting mostly anything to a readable byte stream on a couple of things such as fetch, blob, formData, JSZip, TextEncoder/Decoder, Buffers, iterators, blinks sandboxed filesystem API, and even strings

I'm starting to thing that @annevk got a good point about symbol

@mkruisselbrink
Copy link
Collaborator

So I've been sort or arguing against things like this in the past, primarily because it doesn't seem like adding more methods etc really helps much for the API when you can already achieve pretty much anything by calling new Response(my_blob).body, .arrayBuffer(), .text() etc to read a blob with a nicely promisified and or streamified API. But that doesn't feel very satisfying. The Response wrapping trick kind of feels like a hack, and especially as long as FileReader still exists as well, it isn't very discoverable.

So what to do about this. If I squint hard enough it sort of seems to make sense if a Blob would just implement the Body mixin. That way we get all its methods "for free". The main problem with that is that Body by design can only be read from once. This is particularly the case for the body and bodyUsed attributes. We could of course still have the same API, but with a slightly different behavior where bodyUsed would always return false, and body would return a new ReadableStream every time it is accessed. Not sure how weird that would be (and it might require some patching of the algorithms in fetch to make that happen).

The alternative would of course be to not inherit from the Body mixin, and instead replicate most of the interface ourselves, with something like the proposed toStream() method rather than the body attribute that the Body mixin has. That still seems better than just relying on the new Response(...).bla() hack, but also feels somewhat less satisfying, since we're essentially mostly duplicating an existing interface...

So what do people think? Shoe-horn in Body with potentially slightly different semantics (i.e. being able to read the blob multiple times), or just add all the methods of Body without actually implementing Body?

@annevk
Copy link
Member

annevk commented Feb 7, 2019

Returning a new object each time from a getter is an anti-pattern. We want obj.x === obj.x to be true. (It's okay to update obj.x at well-defined times, but that would still mean the comparison yields true.)

Given that, duplication coupled with partial modification (perhaps stream() rather than body; and no need for blob(), bodyUsed, or formData() I suspect) seems like the way to go.

There might be ways to share a bunch of underlying infrastructure though and even some of the IDL if we split the mixin. I'd be open to such a refactoring of Fetch to enable this functionality on Blob objects.

(I guess the one question I have is whether it makes sense to have blobs that once read can be collected. But I guess that would bring back close() and all its issues.)

@domenic
Copy link
Contributor

domenic commented Feb 7, 2019

I agree with @annevk that we should not try to reuse IDL and interface definitions where they don't fit perfectly. Embrace the spirit of JavaScript, and just make the names line up; that's what matters in the end.

It's possible you could reuse some of the processing model, but probably not easily, so I'd suggest starting with new from-scratch definitions for all the methods and considering a refactoring afterward.

I think .arrayBuffer(), .text(), and .stream() is the right place to start. You could do json(), but I'm skeptical it would be used as often.

@mkruisselbrink
Copy link
Collaborator

Sounds good to me. Totally agree that having getters return new objects all the time is really ugly, and reusing IDL just doesn't make much sense in this case. I guess one benefit for having as much of the same methods as Body, even if they seem silly/not used as much for blobs (i.e. blob() and formdata()) would be being able to use Blobs as if they are bodies in more cases. But also agree that probably starting with just the ones that actually make sense is probably the way to go. So yeah, arrayBuffer, text and stream sounds good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants