-
-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perf and binary-parse-stream/binary-parser #86
Comments
I would very interested in an async iterators parsing framework, and would be willing to consider switching to it eventually once it can be used in Node without the I'm also interested in the interplay between streams and async iterators... I wonder how easy one could make it to build transform streams from bytes to ASTs that would work both in node and in browsers that implement the WHATWG streams work. |
--harmony_async_iterationWith Node 10 --harmony_async_iteration is no longer needed to use async iterator syntax, and the implementation seems stable However stream extension adding a
WHATWG streamsYes! The idea of using WHATWG streams has been on my mind, but I've no experience with them. I've been a bit put off because there seems to be no official NPM distribution of the reference-implementation. However, digging a little it seems this might be a very timely topic: issue whatwg/streams#778 is tracking adding support for async iterators, with a PR whatwg/streams#950 in very active discussion to add it to the spec and reference-implementation. I also see a recent PR at nodejs/node#22352 proposing adding whatwg streams to Node. Also very active discussion Transform stream from bytes to ASTs/parse treesAsync iterators for parsing came to mind when I was worked on a TypeScript port of ANTLR parser generator framework See the issue I entered there tunnelvisionlabs/antlr4ts#360. ANTLR is much more challenging environment than CBOR, and it generates parse trees (a close relative of ASTs). A CBOR stream has natural breaking points, as it is by definition "A sequence of zero or more data items, not further assembled into a larger containing data item." This seems to make a mapping to stream style much more reasonable. |
I wonder if @tschneidereit might be willing to give us a little direction here. |
It looks like https://github.com/Borewit/strtok3 does more-or-less what we were talking about here. Using that would probably mean switching to ES modules, which I've been contemplating anyway now that we only support node 14+. I'm not sure how fast it will be to do an |
I'm not going to make large changes to this library any more. Please move to cbor2. |
@hildjj, @nathan7
node-cbor depends on some very clever work combining generators with readable streams that has cloned into the node-cbor/vendor/binary-parse-stream directory. The README.md is rather sparse, and says it's based on https://github.com/nathan7/binary-parser, but that's a broken link now. However, https://github.com/nathan7/binary-parse-stream still seems present, if a bit stale... (4 years since last commit.
None-the-less, the approach used permits a stream-based parser to be written with sync-looking code, which is very impressive. My gut feeling however is that the performance of node-cbor may be limited by this, in particular
yield
ing once or more for each parsed data item (perhaps as short as one byte of incoming data.) All of these yields seem like a substantial source of overhead for large data sets. A chunk based approach where the parser can remain synchronous during processing of entireBuffer
's seem like it could offer substantial performance gains.ECMAScript async iterators seems like it might be a good forward-looking and chunk-oriented approach. TC 39 declared async iteration "finished" in January 2018, and V8 and Node v10 both seem to have adopted it, at lest experimentally.
I'm considering writing a binary parsing framework similar to binary-parse-stream but based instead on async iterators. Before I dig into alone, I thought it might be valuable to run the idea by you both, to see if what you think, and to see if you know of other implementations doing something like this.
The text was updated successfully, but these errors were encountered: