Guarantee a Cheerio.load(dom) overload #1126
Comments
This would need to use http://inikulin.github.io/parse5/classes/parserstream.html for HTML, otherwise happy to add this as an additional method ( |
Great to hear! I think PS: I've just realized that I constantly referred to the "old" master branch in my previous comment. Maybe it would be a good idea to directly link from NPM to the v1.0.0 branch or to mention it in master's README. |
Glad to see there's development here! I just hit this snag while converting my synchronous Node script to streams. @ComFreek I have looked at your nested links, but they are beyond my knowledge. Is fragment support a requirement for streaming into Cheerio selectors? If I wanted to start going in your direction and try to get Cheerio to work with streams (I was thinking of a through stream), where should I start? It sounds like without fragment support, I can't just do something like:
|
@coryarmbrecht The streams I mentioned above and (afaik) parse5's ParserStream only deal with the problem that, without such a streaming approach, you would need to store all the HTML in memory. Why would you need to store all the HTML in memory if you were to feed it into the parser chunk-by-chunk anyway? What you are describing is called SAX parsing in the case of XML, for example. With a quick search I found sax-js, but I have no idea how up-to-date it is. |
@ComFreek, ok I think I figured out my disconnect. I was thinking that if a single chunk has an opening element tag But, I guess all you really need is the opening tag, and the closing tag is just a sign of where to stop. I was thinking about the chunks as needing to be complete objects in order to parse correctly, and not how I just need the beginning tag. |
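To make the point about chunk boundaries concrete, here is a minimal sketch of an incremental scanner that carries its unfinished input from one chunk to the next. It is a toy, not a real HTML parser: `TagScanner` and its event shape are hypothetical names invented for this illustration, and it ignores comments, scripts, and `>` characters inside attribute values. A real project would use a streaming parser such as the ones mentioned in this thread.

```javascript
// Toy incremental tag scanner (hypothetical, for illustration only).
// It shows how state carried between chunks lets a tag span a chunk boundary.
class TagScanner {
  constructor(onEvent) {
    this.onEvent = onEvent; // called with { type, value } for each event
    this.buffer = '';       // unconsumed input carried over to the next chunk
  }

  write(chunk) {
    this.buffer += chunk;
    let pos = 0;
    while (pos < this.buffer.length) {
      const lt = this.buffer.indexOf('<', pos);
      if (lt === -1) {
        // No tag start left: the rest of the buffer is plain text.
        this.emitText(this.buffer.slice(pos));
        pos = this.buffer.length;
        break;
      }
      if (lt > pos) this.emitText(this.buffer.slice(pos, lt));
      const gt = this.buffer.indexOf('>', lt);
      if (gt === -1) {
        // Incomplete tag: keep everything from '<' until more input arrives.
        pos = lt;
        break;
      }
      const inner = this.buffer.slice(lt + 1, gt).trim();
      if (inner.startsWith('/')) {
        this.onEvent({ type: 'close', value: inner.slice(1).trim().toLowerCase() });
      } else {
        // The tag name ends at the first whitespace or '/'.
        this.onEvent({ type: 'open', value: inner.split(/[\s/]/)[0].toLowerCase() });
      }
      pos = gt + 1;
    }
    this.buffer = this.buffer.slice(pos);
  }

  end() {
    // Flush whatever is left as text once the input is finished.
    this.emitText(this.buffer);
    this.buffer = '';
  }

  emitText(text) {
    if (text) this.onEvent({ type: 'text', value: text });
  }
}

// The <li> element is split across three chunks, twice in the middle of a tag.
const events = [];
const scanner = new TagScanner((event) => events.push(event));
for (const chunk of ['<ul><l', 'i class="a">one</', 'li></ul>']) {
  scanner.write(chunk);
}
scanner.end();
console.log(events.map((e) => `${e.type}:${e.value}`).join(' '));
```

The opening tag is reported as soon as it is complete, regardless of which chunk holds the matching closing tag, which is the same behaviour a SAX-style parser provides.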
There is also the parse5.SAXParser option. Should we try to create a streaming solution based on this? Anyone up for it? |
Is there any news on that @fb55? I think the community could use an official API for streams. |
This overload is now properly documented, with an example in the README using it. Let's keep the streams discussion to #99. |
Since there is no built-in stream-reading method in Cheerio (see the discussion), I have built my own:
Even though the call `cheerio.load(dom)` works*, it does not actually conform to Cheerio's public API, which states that `load` only accepts a string (cf. README, code).

Could the public API be extended to include a `Cheerio.load(dom)` overload, where `dom` is a DOM tree compatible with the output produced by `htmlparser.DomHandler`?

*) see IonicaBizau/scrape-it#83 (comment).