Add scrapeIt.fromStream function #83
Nice work! However:
When passing the … That's what the … I'm not sure how this helps, since the HTML will be passed as a variable anyway.
My code only stores the DOM tree in memory, not the HTML string. I think the DOM is necessary for CSS-like selectors in …*

*) Actually, this exercise seems quite interesting: "Given a set of CSS selectors, does a program exist which extracts the results from sequential HTML parsing?" or "Can we pre-compile a set of selectors so that SAX-like parsing suffices to extract all the matches?"
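The footnote's question can be explored with a small sketch. It assumes selectors are restricted to descendant chains of tag names (such as `div li`) and that a streaming parser delivers SAX-style open/text/close events; here the events are fed by hand, and `createStreamMatcher` is a hypothetical name, not part of scrape-it or Cheerio. For selectors of this kind, the stack of currently open tags seems to be enough, so no DOM is built:

```javascript
"use strict";

// Hypothetical sketch: match descendant-only selectors such as "div li"
// against SAX-style events, keeping just the stack of open tag names.
// Assumption: every open() call is balanced by a close() call.
function createStreamMatcher(selector) {
  const parts = selector.trim().toLowerCase().split(/\s+/); // e.g. ["div", "li"]
  const stack = [];   // tag names of currently open elements
  const active = [];  // matches still open: { depth, text }
  const results = []; // text content of finished matches, in closing order

  // True if the element on top of the stack matches the whole chain.
  // Greedy right-to-left matching over the ancestors is sufficient for the
  // descendant combinator: taking the nearest matching ancestor for each
  // part leaves the most ancestors available for the parts further left.
  function matchesTop() {
    let i = parts.length - 1;
    if (stack[stack.length - 1] !== parts[i]) return false;
    i--;
    for (let d = stack.length - 2; d >= 0 && i >= 0; d--) {
      if (stack[d] === parts[i]) i--;
    }
    return i < 0;
  }

  return {
    open(tag) {
      stack.push(tag.toLowerCase());
      if (matchesTop()) active.push({ depth: stack.length, text: "" });
    },
    text(data) {
      // Text belongs to every match that is currently open (matches can nest).
      for (const capture of active) capture.text += data;
    },
    close() {
      const last = active[active.length - 1];
      if (last && last.depth === stack.length) results.push(active.pop().text);
      stack.pop();
    },
    results,
  };
}

// Events as a streaming parser might emit them for
// <div><ul><li>a</li><li>b</li></ul><p>c</p></div>
const m = createStreamMatcher("div li");
m.open("div"); m.open("ul");
m.open("li"); m.text("a"); m.close();
m.open("li"); m.text("b"); m.close();
m.close(); // </ul>
m.open("p"); m.text("c"); m.close();
m.close(); // </div>
console.log(m.results); // [ 'a', 'b' ]
```

Selectors such as `:last-child` or `:has()` depend on parts of the document that have not been parsed yet when an element opens, so a streaming matcher would have to buffer or defer those matches; that is roughly where keeping a DOM becomes hard to avoid.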
That sounds good. But is that tree compatible with Cheerio? It's still unclear to me if …
In the course of creating that table myself, I realized that Cheerio indeed adds its … The only thing to still be wary of is that …
@ComFreek Sounds good! 👍 Should we wait for the stream function to be implemented in Cheerio?
According to this comment, my code should actually use … I think we could adopt a … Maybe at this point it should also be considered how …
I am going to close this, but contributions are welcome via PRs! 🚀
What do you think of a `fromStream` function which feeds the HTML parser chunk by chunk instead of allocating one big string in memory? I have already implemented it and am using it in one of my projects: master...ComFreek:master.
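A minimal sketch of the chunk-by-chunk idea, assuming a parser object with `write(chunk)` and `end()` methods (the shape htmlparser2's `Parser` exposes). `fromStream(stream, parser)` and the `TagCounter` stand-in are hypothetical names for illustration, not the code in the linked fork and not scrape-it's API; `TagCounter` only counts opening tags so the example has no dependencies:

```javascript
"use strict";
const { Readable } = require("stream");
const { StringDecoder } = require("string_decoder");

// Hypothetical stand-in for an incremental HTML parser. It only counts
// opening tags, but like any chunked parser it must cope with a tag that a
// chunk boundary cuts in half.
class TagCounter {
  constructor() {
    this.pending = ""; // unfinished tail carried over from the previous chunk
    this.counts = {};
  }

  write(chunk) {
    const text = this.pending + chunk;
    // Scan only up to the last '>'; anything after it may be a partial tag.
    const cut = text.lastIndexOf(">") + 1;
    this.pending = text.slice(cut);
    for (const match of text.slice(0, cut).matchAll(/<([a-zA-Z][\w-]*)/g)) {
      const tag = match[1].toLowerCase();
      this.counts[tag] = (this.counts[tag] || 0) + 1;
    }
  }

  end() {
    this.pending = ""; // a real parser would flush trailing text here
  }
}

// Hypothetical fromStream: hands each chunk to the parser as soon as it
// arrives, so the whole document never exists as a single string.
async function fromStream(stream, parser) {
  const decoder = new StringDecoder("utf8"); // joins multi-byte chars split across chunks
  for await (const chunk of stream) {
    parser.write(typeof chunk === "string" ? chunk : decoder.write(chunk));
  }
  const rest = decoder.end();
  if (rest) parser.write(rest);
  parser.end();
  return parser;
}

// Usage: "<l" and "i>" arrive in different chunks but are counted as one tag.
fromStream(Readable.from(["<ul><l", "i>one</li><li>two</li></ul>"]), new TagCounter())
  .then((parser) => console.log(parser.counts)); // { ul: 1, li: 2 }
```

The `StringDecoder` matters once real byte streams (files, HTTP responses) are involved: a multi-byte UTF-8 character can be split between two chunks, and decoding each chunk on its own would corrupt it.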
I also have a unit test at hand, which requires an upgrade of the `tester` package, as already mentioned at IonicaBizau/tester#15.

Open points:

- The `scrapeIt` method uses `cheerio-req`, which could actually benefit from chunk-by-chunk loading as well.
- `Cheerio.load(dom)` public API: Guarantee a `Cheerio.load(dom)` overload (cheeriojs/cheerio#1126)