
Add support for binary data (#12) #15

Merged
jonnyreeves merged 8 commits into master from feature/byte-arrays on Aug 26, 2016
Conversation

jonnyreeves
Owner


* Request implementations now supply a `Uint8Array` of binary data through to the `chunkParser`.
* `defaultChunkParser` converts the `chunkBytes` into a string for processing
* Overhauled how parserState is handled; no longer assume that the `chunkParser` will emit a string; instead we use a `state` object whose contract is wholly owned by the parser (we just shuffle it around internally).
* Added test-coverage for UTF-8 characters
* Added dependency on `utf-8` lib for handling (un)marshalling of binary data to text.
* Updated documentation
* Updated npm dependencies.

Inspiration provided by @mwitkow and @ariutta -- many thanks! :)
@ariutta
Contributor

ariutta commented Aug 19, 2016

Looks good! Only question is about xhr.js. It doesn't convert the string to an ArrayBuffer. Is it best to make the parsers responsible for handling two input types -- string or ArrayBuffer?

* Update README to convey which polyfills are required for crappy browsers.
Would appear that IE10 has typedarray support :)
@jonnyreeves
Owner Author

@ariutta thanks so much for your input on #17; I've adopted your approach of using TextEncoder / TextDecoder to do the marshalling and dropped the dependency on utf-8 (can't remember why I went down that route - bits and bytes aren't my forte...)

I've left the polyfills for TextEncoder, TextDecoder and TypedArray out but added a note in the README that users must ensure certain browser features have been polyfilled correctly (this keeps the library nice and slim for newer browsers).

Let me know your thoughts; if it looks good to you I'll land this (and add you to the CONTRIBUTORS).

Thanks again.

ariutta added a commit to ariutta/chunked-request that referenced this pull request Aug 20, 2016
@ariutta
Contributor

ariutta commented Aug 20, 2016

@jonnyreeves: sure, keeping the polyfills out of the default bundle is usually a best practice, so that's fine. It's easy but maybe less clean to exclude them with this syntax in the browser field:

"browser": {
    "typedarray": "./empty-object.js"
}

About the TextDecoder, I notice uint8ArrayToString instantiates a new instance of TextDecoder for every chunk.

TextDecoder has a stream option:

A Boolean flag indicating that additional data will follow in subsequent calls to decode(). Set to true if processing the data in chunks, and false for the final chunk or if the data is not chunked. It defaults to false.

By not using the stream option and by instantiating a new instance for each chunk, I'm concerned we could run into trouble. One example: the WHATWG Encoding standard says decoders (but not encoders) should be able to handle UTF-16. In UTF-16, code points can be encoded with two 16-bit code units. Our chunk could split a code point, and the parser would fail. A more common concern is just a possible performance hit.

Here's how I handled this case:
https://github.com/ariutta/chunked-request/blob/4b07bf6b576e0c17dd0f6e5895d87903968139fc/src/defaultChunkParser.js#L16

Maybe we should also use the stream option for TextEncoder.

One more thing: the defaultParser seemed to have trouble with this sample .ndjson file (maybe a good candidate for another test?):
https://cdn.rawgit.com/maxogden/sqlite-search/master/test.ndjson

I made several changes to defaultParser.js so that it could handle that file, as you can see here.

@jonnyreeves
Owner Author

...instantiates a new instance of TextDecoder for every chunk... I'm concerned we could run into trouble.

Makes me wonder if we want more control over the lifecycle of the chunkParser; at the moment you just supply a function which has no idea if it's the first call or last call for a given server response.

Perhaps instead of supplying a Uint8Array to the chunkParser we should instead supply a stream which the chunk parser can consume until there are no more bytes left, i.e.:

// use `onChunk` to emit parsed data from the stream.
function chunkParser(stream, onChunk) {
    let state = {};
    stream.on('data', bytes => { 
        // ...
        onChunk(jsonObjects); 
    });
    stream.on('end', () => { 
        onChunk(flushedState);
    });
}

Now the (default) chunkParser can instantiate the TextDecoder once per response (and we don't have to mess around managing the parser's state inside index.js anymore, as an added bonus).
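Under that contract the default parser could own its decoder for the whole response. A hypothetical sketch for newline-delimited JSON (names and behaviour are illustrative, not the shipped implementation):

```javascript
// Hypothetical defaultChunkParser under the proposed stream contract:
// one TextDecoder per response, flushed when the stream ends.
function defaultChunkParser(stream, onChunk) {
  const decoder = new TextDecoder('utf-8');
  let buffer = '';
  stream.on('data', bytes => {
    buffer += decoder.decode(bytes, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop();                         // keep any partial trailing line
    const complete = lines.filter(l => l.length > 0);
    if (complete.length) onChunk(complete.map(l => JSON.parse(l)));
  });
  stream.on('end', () => {
    buffer += decoder.decode();                   // flush any buffered bytes
    if (buffer.length > 0) onChunk([JSON.parse(buffer)]);
  });
}
```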

Let me know your thoughts - and thanks again.

@ariutta
Contributor

ariutta commented Aug 21, 2016

I like the idea of passing a minimal stream to the parser. It separates concerns nicely. Only question is whether to base it on Node streams (eventemitter with data and end events) or WHATWG streams (reader with an async read method). I thought we'd previously agreed to use WHATWG streams because that's what fetch will return and is what the Node community is actually telling WHATWG to use instead of Node streams, but I don't have a strong preference for one versus the other. Here's the relevant comparison between the two:

  • Node stream data event maps to WHATWG stream reader.read(function({ value: theChunk, done: false }) {...})
  • Node stream end event maps to WHATWG stream reader.read(function({ value: undefined, done: true }) {...})
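The mapping above can be sketched as a small read loop against the WHATWG reader API (`onData`/`onEnd` are hypothetical callbacks standing in for the parser):

```javascript
// Pump a WHATWG stream reader, mirroring Node's 'data' and 'end' events.
function pump(reader, onData, onEnd) {
  return reader.read().then(({ value, done }) => {
    if (done) {
      onEnd();          // Node 'end'
      return;
    }
    onData(value);      // Node 'data'
    return pump(reader, onData, onEnd);
  });
}
```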

For the parser, some of the ideas you're raising about handling state and transforms remind me of transducers. I'm interested in using transducers for the parsers because it appears they may be the fastest way to transform data when it is sometimes returned synchronously and other times asynchronously. I also like that transducers "are composable algorithmic transformations. They are independent from the context of their input and output sources and specify only the essence of the transformation in terms of an individual element. Because transducers are decoupled from input or output sources, they can be used in many different processes," such as native or Underscore reduce implementations, immutable-js, Node streams, RxJS, etc (source). I've already created transducers for quite a few different parsers here.
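As a rough illustration of why transducers compose independently of their source (a minimal hand-rolled sketch, not the API of any transducer library mentioned here):

```javascript
// A transducer is a reducer transformer: (nextReducer) => reducer.
// It knows nothing about where values come from or where results go.
const map = f => next => (acc, x) => next(acc, f(x));
const filter = pred => next => (acc, x) => (pred(x) ? next(acc, x) : acc);

// Compose: parse each line as JSON, then drop nulls (hypothetical pipeline).
const xform = next => map(line => JSON.parse(line))(filter(x => x !== null)(next));

// The same xform could drive a Node stream, RxJS, etc.; here, plain reduce:
const push = (acc, x) => { acc.push(x); return acc; };
const result = ['1', 'null', '2'].reduce(xform(push), []); // [1, 2]
```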

What do you think about using WHATWG streams and transducers?

@ariutta
Contributor

ariutta commented Aug 22, 2016

In my fork, I created a new branch feature/reader to show the use of WHATWG streams, passing a reader to the parser. I didn't use transducers, but that might be an interesting next step.

This required creating a minimal fetch polyfill for the XHR implementations, as you can see here and here.

ariutta and others added 3 commits August 22, 2016 08:33
Browsers using the fallback `xhr` transport should expect an additional call to their `chunkParser` as we flush out the state of the `TextEncoder` when the XHR connection is closed.
@jonnyreeves
Owner Author

Thanks @ariutta; I've started incorporating the changes from your feature/reader branch into this Pull Request; I'll start pulling across the ReadableStream shims later today / tomorrow - time permitting.

It's obvious now that we should look to de-dupe xhr.js and mozXhr.js; however I feel that's best achieved in a separate PR after this one lands. Additionally, your change to the defaultParser's delimiter is valid; but I think it highlights the need to make it user-configurable (so the user doesn't have to reimplement the default parser just because they have a different delimiter) - again probably best in a subsequent PR.

Thank you so much for your contributions - not only are you helping shape this library, but you are helping me think about the problem in a new way - I think I owe you a 🍻 :)

@ariutta
Contributor

ariutta commented Aug 22, 2016

Beoir! 👍 Glad to collaborate -- it's tricky trying to cover the appropriate use cases without over-engineering.

Agree on segmenting the changes into multiple PRs.

@jonnyreeves jonnyreeves added this to the 0.5 milestone Aug 26, 2016
@jonnyreeves jonnyreeves merged commit 251c178 into master Aug 26, 2016
@jonnyreeves
Owner Author

@ariutta; will follow-up with integrating your streams work in a subsequent PR as this one was getting quite chunky and it would be good to get some of the recent improvements out the door first :)

@jonnyreeves jonnyreeves mentioned this pull request Aug 26, 2016
@ariutta
Contributor

ariutta commented Aug 27, 2016

Good plan.


@jonnyreeves jonnyreeves deleted the feature/byte-arrays branch August 29, 2016 07:32