v1.99.10

@j-mendez j-mendez released this 19 Jul 19:36
· 449 commits to main since this release

What's Changed

  1. Major performance increase for CLI scraping and downloading. The CLI now uses subscriptions to process pages as a stream, storing files and writing to stdout concurrently.
  2. You can now use the spider_utils crate to extract content manually, as needed, using generic CSS element scraping:
use spider::{
    hashbrown::{HashMap, HashSet},
    packages::scraper::Selector,
};
use spider_utils::{build_selectors, css_query_select_map_streamed};

async fn css_query_selector_extract() {
    let map = HashMap::from([(
        "list",
        HashSet::from([".list", ".sub-list"]),
    )]);
    let data = css_query_select_map_streamed(
        r#"<html>
            <body>
                <ul class="list"><li>First</li></ul>
                <ul class="sub-list"><li>Second</li></ul>
            </body>
        </html>"#,
        &build_selectors(map),
    )
    .await;
    println!("{:?}", data);
    // {"list": ["First", "Second"]}
}
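
The subscription-based streaming described in item 1 can be illustrated with a small standard-library sketch. This is not spider's implementation or API: the `Page` struct, the `process_stream` function, and the in-memory map standing in for file storage are all hypothetical names chosen for illustration. It only shows the general pattern of fanning each page out to a storage consumer and a stdout consumer that run concurrently.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a crawled page; this is not spider's `Page` type.
#[derive(Clone, Debug)]
struct Page {
    url: String,
    html: String,
}

/// Sends every page to two subscribers that run on their own threads:
/// one "stores" the page (an in-memory map stands in for files on disk),
/// the other prints a summary line to stdout.
fn process_stream(pages: Vec<Page>) -> (HashMap<String, String>, Vec<String>) {
    let (store_tx, store_rx) = mpsc::channel::<Page>();
    let (print_tx, print_rx) = mpsc::channel::<Page>();

    // Storage subscriber: consumes pages as they arrive.
    let storer = thread::spawn(move || {
        let mut stored = HashMap::new();
        for page in store_rx {
            stored.insert(page.url, page.html);
        }
        stored
    });

    // Stdout subscriber: prints each page summary as it arrives.
    let printer = thread::spawn(move || {
        let mut lines = Vec::new();
        for page in print_rx {
            let line = format!("{} ({} bytes)", page.url, page.html.len());
            println!("{line}");
            lines.push(line);
        }
        lines
    });

    // Producer: in a real crawler these pages would arrive over time.
    for page in pages {
        store_tx.send(page.clone()).expect("storage subscriber hung up");
        print_tx.send(page).expect("stdout subscriber hung up");
    }
    // Dropping the senders closes both channels so the subscribers finish.
    drop(store_tx);
    drop(print_tx);

    (
        storer.join().expect("storage thread panicked"),
        printer.join().expect("stdout thread panicked"),
    )
}
```

Because each subscriber owns its own channel, slow storage does not block printing, and neither blocks the producer from sending the next page.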

Thanks for the help, @gjtorikian.

Full Changelog: v1.99.5...v1.99.10