v1.99.10
What's Changed
- Major performance increase for CLI scraping and downloading. The CLI now uses subscriptions and concurrently stream-processes pages, writing to file storage and stdout as results arrive.
- You can now use the `spider_utils` crate to extract content manually as needed using generic CSS element scraping.
```rust
use spider::{
    hashbrown::{HashMap, HashSet},
    packages::scraper::Selector,
};
use spider_utils::{build_selectors, css_query_select_map_streamed};

async fn css_query_selector_extract() {
    // Map a name to the group of CSS selectors to extract with.
    let map = HashMap::from([("list", HashSet::from([".list", ".sub-list"]))]);
    let data = css_query_select_map_streamed(
        r#"<html>
            <body>
                <ul class="list"><li>First</li></ul>
                <ul class="sub-list"><li>Second</li></ul>
            </body>
        </html>"#,
        &build_selectors(map),
    )
    .await;
    println!("{:?}", data);
    // {"list": ["First", "Second"]}
}
```
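To run the example above, both crates need to be declared as dependencies, along with an async runtime for the `await` call. A minimal `Cargo.toml` sketch (the exact version pins and the use of tokio are assumptions, not part of this release):

```toml
[dependencies]
# versions are illustrative; pin to the release you need
spider = "1.99"
spider_utils = "1.99"
tokio = { version = "1", features = ["full"] }
```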
Thanks for the help @gjtorikian!
Full Changelog: v1.99.5...v1.99.10