`.stream(cb)` method #99

fb55 · 2012-09-09T14:11:22Z

Just as an idea: the parser could do much more if it actually received a stream of data. This would allow the DOM to be built while IO is happening, which would speed up initial loading (and more work could be done inside of DomHandler).

There is already a WritableStream.js file shipped with htmlparser2 (it's accessible via require("htmlparser2").WritableStream) that pretty much solves all problems. The implementation of the cheerio method could look like this:

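var htmlparser2 = require("htmlparser2");

cheerio.createWritableStream = function(cb, options){
  // DomHandler calls back with (error, dom) once the whole document has been parsed
  var handler = new htmlparser2.DomHandler(function(err, dom){ cb(err, err ? null : cheerio(dom)); }, options);
  return new htmlparser2.WritableStream(handler, options);
};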

matthewmueller · 2012-09-12T21:33:44Z

Cool, I like this idea - I'm just not sure how useful this would be. It would produce unexpected results if I tried to run a $('li') on a partially streamed file.

fb55 · 2012-09-13T18:15:48Z

Well, as far as I know, most people currently use e.g. request to
download a file, only to pass it straight to cheerio. Having a streaming
method would allow a much nicer and speedier creation of the DOM, and a
much more node-y interface.

matthewmueller · 2012-09-23T07:52:29Z

Right, but how would you actually run queries on a half-parsed DOM?

The only use case I can see is when you're looking for something specific, e.g. $("title").text(): as soon as it gets parsed, you could return it and stop. Supporting that would require some major rework of the library, though, and for something like that it might be better to just use node-htmlparser2 directly.
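
To make the "find it and stop" case concrete, here is a rough sketch using only Node's built-in stream module. It is not cheerio or htmlparser2 code: `createTitleFinder` and `onTitle` are hypothetical names, and a naive regex stands in for a real HTML parser.

```javascript
const { Writable } = require("node:stream");
const { StringDecoder } = require("node:string_decoder");

// Hypothetical helper (not part of cheerio): reports the first <title> it sees
// and ignores everything that arrives afterwards.
function createTitleFinder(onTitle) {
  const decoder = new StringDecoder("utf8"); // handles multi-byte characters split across chunks
  let buffered = "";
  let done = false;

  return new Writable({
    write(chunk, encoding, callback) {
      if (!done) {
        buffered += decoder.write(chunk);
        // Assumption: a regex is good enough for this illustration; a real
        // implementation would rely on parser events instead.
        const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(buffered);
        if (match) {
          done = true;
          buffered = ""; // nothing else needs to be kept
          onTitle(match[1].trim());
        }
      }
      callback();
    },
  });
}
```

Inside `onTitle` the caller could also call `destroy()` on the source stream, so the rest of the page is never downloaded.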

fb55 · 2012-09-23T09:13:18Z

You misunderstood me: The idea was to parse data while the user is still waiting for the next chunk to arrive. This way, the DOM will be available immediately after the download of the page is complete.

Running queries isn't hard, though: I solved it yesterday with fb55/node-cornet :)
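
As a minimal sketch of "parse while waiting for the next chunk", the stream below hands every chunk to a parser as soon as it arrives and only reports a result once the input has ended. It uses only Node built-ins; `TagCounter` is a toy, hypothetical stand-in for an incremental parser (anything with `write(text)` and `end()` would do), and the `createWritableStream` here is an illustration, not the proposed cheerio method.

```javascript
const { Writable } = require("node:stream");
const { StringDecoder } = require("node:string_decoder");

// Toy stand-in for an incremental parser: counts opening tags.
class TagCounter {
  constructor(onDone) {
    this.onDone = onDone;
    this.count = 0;
    this.tail = ""; // holds a tag that was cut off at a chunk boundary
  }
  write(text) {
    const data = this.tail + text;
    const lastOpen = data.lastIndexOf("<");
    const lastClose = data.lastIndexOf(">");
    // If the chunk ends inside a tag, keep that fragment for the next write.
    const cut = lastOpen > lastClose ? lastOpen : data.length;
    this.tail = data.slice(cut);
    this.count += (data.slice(0, cut).match(/<[a-zA-Z][^>]*>/g) || []).length;
  }
  end() {
    this.onDone(this.count);
  }
}

// Wraps any { write, end } parser in a Writable, so parsing happens between chunks.
function createWritableStream(parser) {
  const decoder = new StringDecoder("utf8");
  return new Writable({
    write(chunk, encoding, callback) {
      parser.write(decoder.write(chunk));
      callback();
    },
    final(callback) {
      parser.write(decoder.end());
      parser.end(); // the result is ready right after the last chunk
      callback();
    },
  });
}
```

By the time `end()` is called, all earlier chunks have already been processed, which is where the speed-up would come from.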

matthewmueller · 2013-06-09T03:53:48Z

I've been thinking about this more and more lately. It would be awesome to run selectors against elements as they come through. Right now I'm thinking the API could be:

var $ = cheerio.stream('http://google.com');
$.on('.logo', function($) {
  console.log($.html());
});

@fb55 do you think this is feasible?

davidchambers · 2013-06-09T04:06:13Z

> Well, as far as I know, most people are currently using e.g. request to download a file, only to open it directly with cheerio.

Irrespective of the streaming functionality, it would be great if cheerio provided a way to create a "DOM" from a URL. As @fb55 stated, this is no doubt a very common use case.

matthewmueller · 2013-06-09T04:11:06Z

Looking back at my example, I kind of think adding URL fetching functionality is a bit leaky (do we then support headers, which request methods, etc.?).

It would be nice to add a streaming interface though, as @fb55 did with cornet. Perhaps more along the lines of:

var $ = cheerio.stream();
minreq.get("http://github.com/fb55").pipe($)
$.on(...)

fb55 · 2013-06-09T13:36:36Z

@matthewmueller First of all, on is probably the last name that method should have :) (edit: how about find?)

Secondly, cheerio would have to wait until the entire DOM is present, as it calls the method with an array of results (cornet only passes a single element at a time). That would stop people from getting confused, with the benefit of the pauses between IO being used for actual work.

Finally, the implementation of this should be pretty straightforward, probably about as complex as cornet (which has 30 LOC).
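
To illustrate the "wait for the whole document, then call back with an array" behaviour, here is a small sketch built only on Node's stream module. `QueryStream` and `queryByTag` are hypothetical names. The regex lookup handles plain, non-nested tag names only and stands in for a real selector engine. Unlike a real implementation, this version simply buffers the text instead of building a DOM incrementally.

```javascript
const { Writable } = require("node:stream");
const { StringDecoder } = require("node:string_decoder");

// Hypothetical stand-in for a selector engine: returns the inner text of
// every <tagName>...</tagName> element (plain tag names, no nesting).
function queryByTag(html, tagName) {
  const pattern = new RegExp(`<${tagName}\\b[^>]*>([\\s\\S]*?)</${tagName}>`, "gi");
  return Array.from(html.matchAll(pattern), (m) => m[1]);
}

class QueryStream extends Writable {
  constructor() {
    super();
    this.decoder = new StringDecoder("utf8");
    this.html = "";
    this.queries = [];
  }

  // Register a query; the callback runs once, after the input has ended.
  find(tagName, cb) {
    this.queries.push({ tagName, cb });
    return this;
  }

  _write(chunk, encoding, callback) {
    this.html += this.decoder.write(chunk);
    callback();
  }

  _final(callback) {
    this.html += this.decoder.end();
    // Each callback receives the complete array of results, never a partial one.
    for (const { tagName, cb } of this.queries) {
      cb(queryByTag(this.html, tagName));
    }
    callback();
  }
}
```

A response stream could then be piped into a `QueryStream` instance, with `find` registered before or during the download.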

fb55 · 2022-05-11T12:25:21Z

Closing in favour of #2051.

matthewmueller closed this as completed Sep 23, 2012

matthewmueller reopened this Sep 23, 2012

fb55 added the Feature label Apr 8, 2014

fb55 mentioned this issue Dec 31, 2014

Parse from stream #618

Closed

fb55 mentioned this issue Dec 22, 2020

Guarantee a Cheerio.load(dom) overload #1126

Closed

fb55 changed the title ~~.createWritableStream(cb) method~~ .stream(cb) method May 3, 2022

fb55 mentioned this issue May 11, 2022

Add functions to load buffers, streams & URLs #2051

Closed

fb55 closed this as completed May 11, 2022
