Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perf and binary-parse-stream/binary-parser #86

Closed
BurtHarris opened this issue Aug 25, 2018 · 5 comments
Closed

Perf and binary-parse-stream/binary-parser #86

BurtHarris opened this issue Aug 25, 2018 · 5 comments

Comments

@BurtHarris
Copy link

BurtHarris commented Aug 25, 2018

@hildjj, @nathan7

node-cbor depends on some very clever work combining generators with readable streams that has cloned into the node-cbor/vendor/binary-parse-stream directory. The README.md is rather sparse, and says it's based on https://github.com/nathan7/binary-parser, but that's a broken link now. However, https://github.com/nathan7/binary-parse-stream still seems present, if a bit stale... (4 years since last commit.

None-the-less, the approach used permits a stream-based parser to be written with sync-looking code, which is very impressive. My gut feeling however is that the performance of node-cbor may be limited by this, in particular yielding once or more for each parsed data item (perhaps as short as one byte of incoming data.) All of these yields seem like a substantial source of overhead for large data sets. A chunk based approach where the parser can remain synchronous during processing of entire Buffer's seem like it could offer substantial performance gains.

ECMAScript async iterators seems like it might be a good forward-looking and chunk-oriented approach. TC 39 declared async iteration "finished" in January 2018, and V8 and Node v10 both seem to have adopted it, at lest experimentally.

I'm considering writing a binary parsing framework similar to binary-parse-stream but based instead on async iterators. Before I dig into alone, I thought it might be valuable to run the idea by you both, to see if what you think, and to see if you know of other implementations doing something like this.

@hildjj
Copy link
Owner

hildjj commented Aug 26, 2018

I would very interested in an async iterators parsing framework, and would be willing to consider switching to it eventually once it can be used in Node without the --harmony_async_iteration flag.

I'm also interested in the interplay between streams and async iterators... I wonder how easy one could make it to build transform streams from bytes to ASTs that would work both in node and in browsers that implement the WHATWG streams work.

@BurtHarris
Copy link
Author

--harmony_async_iteration

With Node 10 --harmony_async_iteration is no longer needed to use async iterator syntax, and the implementation seems stable However stream extension adding a Readable[Symbol.asyncIterator]() method still does generate a warning at runtime:

ExperimentalWarning: Readable[Symbol.asyncIterator] is an experimental feature. This feature could change at any time

WHATWG streams

Yes! The idea of using WHATWG streams has been on my mind, but I've no experience with them. I've been a bit put off because there seems to be no official NPM distribution of the reference-implementation.

However, digging a little it seems this might be a very timely topic: issue whatwg/streams#778 is tracking adding support for async iterators, with a PR whatwg/streams#950 in very active discussion to add it to the spec and reference-implementation.

I also see a recent PR at nodejs/node#22352 proposing adding whatwg streams to Node. Also very active discussion

Transform stream from bytes to ASTs/parse trees

Async iterators for parsing came to mind when I was worked on a TypeScript port of ANTLR parser generator framework See the issue I entered there tunnelvisionlabs/antlr4ts#360. ANTLR is much more challenging environment than CBOR, and it generates parse trees (a close relative of ASTs).

A CBOR stream has natural breaking points, as it is by definition "A sequence of zero or more data items, not further assembled into a larger containing data item." This seems to make a mapping to stream style much more reasonable.

@hildjj
Copy link
Owner

hildjj commented Aug 26, 2018

I wonder if @tschneidereit might be willing to give us a little direction here.

@hildjj
Copy link
Owner

hildjj commented Sep 13, 2022

It looks like https://github.com/Borewit/strtok3 does more-or-less what we were talking about here. Using that would probably mean switching to ES modules, which I've been contemplating anyway now that we only support node 14+. I'm not sure how fast it will be to do an await for every stream read, either, since those force a hop through the event loop.

@hildjj
Copy link
Owner

hildjj commented Oct 25, 2024

I'm not going to make large changes to this library any more. Please move to cbor2.

@hildjj hildjj closed this as not planned Won't fix, can't repro, duplicate, stale Oct 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants