Removal of scraping API v2 #193

j-mendez (Maintainer) · 2024-07-22T13:04:24Z

The scraping API differs from crawl in one main way: it stores the page object. That is fine for small websites where you need to collect data or do some manual parsing after the content is stored. We can do the same with crawl; we just need to use a subscription and control what we do with each page afterwards.

Making one API that works for both scenarios is not straightforward. It is difficult both to keep feature parity and to handle the setup a crawl requires.

The scrape API is really old, and it has usually lagged behind the crawl endpoint, so functionality did not always work 1:1 as intended. This happened because of the complexity of the feature flags and the setup that is needed.

The scrape API looks like a good candidate for removal, so that we can develop more safely across the core features of the crate. We can provide an example of how to scrape with the crawler, which should suffice.

j-mendez (Maintainer, Author) · 2024-08-14T11:50:15Z

Instead of removing the API, the changes were made to re-use crawl and simply store the pages by sending them through a channel.
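
As a rough illustration of that idea, here is a minimal std-only sketch of scrape built on top of crawl. It is not the crate's actual implementation: `Page`, `crawl`, and `scrape` below are hypothetical stand-ins, the "fetch" is faked, and a standard-library channel is assumed in place of whatever channel type the crate really uses.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical stand-in for a crawled page (not the crate's real type).
#[derive(Debug, Clone, PartialEq)]
struct Page {
    url: String,
    html: String,
}

/// Hypothetical crawl: visits each URL and, if a subscriber is present,
/// sends the page through the channel instead of keeping it.
/// Returns the number of URLs visited.
fn crawl(urls: &[&str], subscriber: Option<Sender<Page>>) -> usize {
    let mut visited = 0;
    for url in urls {
        // A real crawler would fetch over the network; the body is faked here.
        let page = Page {
            url: url.to_string(),
            html: format!("<html><body>{}</body></html>", url),
        };
        if let Some(tx) = &subscriber {
            // Ignore send errors: the receiver may already have been dropped.
            let _ = tx.send(page);
        }
        visited += 1;
    }
    visited
}

/// Scrape expressed on top of crawl: subscribe, run the crawl, store the pages.
fn scrape(urls: &[&str]) -> Vec<Page> {
    let (tx, rx) = channel();
    // Collect on a separate thread so storing pages never blocks the crawl loop.
    let collector = thread::spawn(move || rx.into_iter().collect::<Vec<Page>>());
    // The sender is moved into crawl and dropped when it returns,
    // which closes the channel and lets the collector finish.
    crawl(urls, Some(tx));
    collector.join().expect("collector thread panicked")
}

fn main() {
    let pages = scrape(&["https://example.com/", "https://example.com/about"]);
    for page in &pages {
        println!("{} ({} bytes)", page.url, page.html.len());
    }
}
```

The point of the design is that crawl stays the only code path that has to track feature flags and setup. Storing pages becomes a thin subscriber on the receiving end of the channel, and a caller who wants different handling can replace the collector with their own logic.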
