What's Changed

Major performance increase for CLI scraping and downloading. We now use subscriptions to process pages as a stream, storing files and writing to stdout concurrently.
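To illustrate the idea only, here is a minimal std-only sketch of the subscribe-and-stream pattern: a producer emits pages over a channel while a consumer handles each one as it arrives. This is not the CLI's actual code. The `Page` struct and the `crawl`, `process_stream`, and `run` functions are hypothetical names made up for this example, and a plain `std::sync::mpsc` channel stands in for the crate's subscription mechanism.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for a crawled page; not the spider crate's page type.
#[derive(Debug)]
struct Page {
    url: String,
    body: String,
}

// Producer: sends each page to the subscriber as soon as it is ready.
fn crawl(urls: Vec<String>, tx: mpsc::Sender<Page>) {
    for url in urls {
        // Fake "download" so the example stays self-contained.
        let body = format!("<html>{url}</html>");
        // Stop early if the subscriber has gone away.
        if tx.send(Page { url, body }).is_err() {
            break;
        }
    }
    // `tx` is dropped here, which closes the channel and ends the stream.
}

// Consumer: handles pages while the crawl is still running and
// returns the lines it would write to stdout or to disk.
fn process_stream(rx: mpsc::Receiver<Page>) -> Vec<String> {
    rx.iter()
        .map(|page| format!("{} ({} bytes)", page.url, page.body.len()))
        .collect()
}

// Wires the producer and consumer together so they run concurrently.
fn run(urls: Vec<String>) -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let producer = thread::spawn(move || crawl(urls, tx));
    let lines = process_stream(rx);
    producer.join().expect("crawl thread panicked");
    lines
}

fn main() {
    let urls = vec![
        "https://example.com/a".to_string(),
        "https://example.com/b".to_string(),
    ];
    for line in run(urls) {
        println!("{line}");
    }
}
```

The consumer starts working on the first page before the producer has finished the rest, which is where the speedup in a subscription-based design comes from.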
You can now use the spider_utils crate to extract content manually as needed using generic CSS element scraping.

use spider::{
    hashbrown::{HashMap, HashSet},
    packages::scraper::Selector,
};
use spider_utils::{build_selectors, css_query_select_map_streamed};

async fn css_query_selector_extract() {
    let map = HashMap::from([(
        "list",
        HashSet::from([".list", ".sub-list"]),
    )]);
    let data = css_query_select_map_streamed(
        r#"<html>
            <body>
                <ul class="list"><li>First</li></ul>
                <ul class="sub-list"><li>Second</li></ul>
            </body>
        </html>"#,
        &build_selectors(map),
    )
    .await;
    println!("{:?}", data);
    // {"list": ["First", "Second"]}
}

Thanks for the help, @gjtorikian.

Full Changelog: v1.99.5...v1.99.10
