Replies: 1 comment
-
Instead of removing the API, the changes reuse crawl and simply store the pages as they come through the channel.
-
The `scraping` API differs from `crawl` in one main way: it stores the page object. This is fine for small websites where we need to collect data or perform some manual parsing after the content is stored. We can do the same with `crawl`; we just need to use a `subscription` and control what we do with each page afterwards.

Making one API that works for both scenarios is not straightforward. It is very difficult because of the feature parity and the setup required for a crawl.
The scrape API is really old, and it has usually lagged behind the crawl endpoint, so its functionality did not always work 1:1 as intended. This happened because of the complexity of the feature flags and the setup the API needs.
The scrape API looks like a good candidate for removal so that we can develop more safely across the core features of the crate. We can provide an example of how to scrape with the crawler, which should suffice.
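As a rough illustration of the subscription idea, here is a minimal sketch that uses only the standard library. It is not the crate's API: `Page`, `crawl`, and `scrape` below are hypothetical stand-ins, the fetch is faked, and a plain `std::sync::mpsc` channel takes the place of the subscription. The point is only that the crawler hands pages to a subscriber and the caller decides whether to store them.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical stand-in for the crate's page object.
#[derive(Debug, Clone, PartialEq)]
struct Page {
    url: String,
    html: String,
}

/// Hypothetical crawler: hands every visited page to the subscriber
/// and keeps nothing itself.
fn crawl(urls: &[&str], tx: Sender<Page>) {
    for url in urls {
        // A real crawler would fetch the page here; the body is faked.
        let page = Page {
            url: url.to_string(),
            html: format!("<html>{url}</html>"),
        };
        // A send error only means the subscriber has gone away.
        let _ = tx.send(page);
    }
    // `tx` is dropped here, which closes the channel for the receiver.
}

/// "Scrape" expressed on top of `crawl`: subscribe, crawl, and store
/// each page as it arrives.
fn scrape(urls: &[&str]) -> HashMap<String, Page> {
    let (tx, rx) = channel();
    let owned: Vec<String> = urls.iter().map(|u| u.to_string()).collect();

    // Run the crawl on its own thread so pages can be consumed as they come in.
    let handle = thread::spawn(move || {
        let refs: Vec<&str> = owned.iter().map(String::as_str).collect();
        crawl(&refs, tx);
    });

    // The caller controls what happens to each page; here we store it by URL.
    let mut store = HashMap::new();
    for page in rx {
        store.insert(page.url.clone(), page);
    }

    handle.join().expect("crawler thread panicked");
    store
}

fn main() {
    let pages = scrape(&["https://example.com/", "https://example.com/about"]);
    for (url, page) in &pages {
        println!("{url}: {} bytes", page.html.len());
    }
}
```

With this shape the storing step lives entirely in the caller, so the crawler itself does not need a separate code path for scraping.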