Releases: spider-rs/spider
v2.2.18
What's Changed
We can now auto-detect locales without losing performance. The encoding flag is now enabled by default as part of this change!
- get_html now properly encodes the HTML instead of defaulting to UTF-8 encoding
- bump chromiumoxide@0.7.0
- fix chrome hang on ws connections handler
- fix fetch stream infinite loop on error
- fix chrome frame setting URL (this temporarily prevents hybrid caching from having the req/res for the page)
```rust
let mut website: Website = Website::new("https://tenki.jp");
// all of the content output has the proper encoding automatically
```
Full Changelog: v2.1.9...v2.2.18
v2.1.9
What's New
This release brings bug fixes for Chrome opening pages that caused hangs. The builder method website.with_return_page_links can be used to attach the links found on a web page to the page object.
- chore(chrome): fix instances being left open from ignorable handler errors
- chore(scrape): add sitemap and smart [#206]
- feat(page): add return page links configuration
- chore(config): fix budget reset on crawl end
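As a sketch of the new page-links option, following the builder pattern used in the other snippets in these notes (the boolean argument to with_return_page_links is an assumption, not confirmed by this release note):

```rust
// Sketch: attach the links discovered on each page to the Page object.
// The exact signature of with_return_page_links is assumed here.
let mut website: Website = Website::new("https://example.com")
    .with_return_page_links(true)
    .build()
    .unwrap();
```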
Thanks @DimitriTimoz
Full Changelog: v2.0.6...v2.1.9
v2.0.6
What's Changed
- add http response cookies map
- fix chrome fs feature flag build
- Update README.md by @James4Ever0 in #203
New Contributors
- @James4Ever0 made their first contribution in #203
Full Changelog: v2.0.3...v2.0.6
v2.0.3
What's Changed
- Scrape and crawl now behave identically, as scrape re-uses crawl underneath.
- Scrape API cleanup
- Add get_chrome_page to get a reference to the Chrome page
Full Changelog: v1.99.30...v2.0.3
v1.99.30
What's Changed
- feat: web automation steps by target URL or path
- add internal ViewPort for Chrome handling
- add PartialEq to configuration
```rust
let mut automation_scripts = HashMap::new();

automation_scripts.insert(
    "/en/blog".into(),
    Vec::from([
        WebAutomation::Evaluate(r#"document.body.style.background = "blue";"#.into()),
        WebAutomation::ScrollY(2000),
        WebAutomation::Click("article a".into()),
        WebAutomation::Wait(5000),
        WebAutomation::Screenshot {
            output: "example.png".into(),
            full_page: true,
            omit_background: true,
        },
    ]),
);

let mut website: Website = Website::new("https://rsseau.fr/en/blog")
    .with_chrome_intercept(true, true)
    .with_wait_for_idle_network(Some(WaitForIdleNetwork::new(Some(Duration::from_secs(30)))))
    .with_caching(cfg!(feature = "cache"))
    .with_limit(1)
    .with_automation_scripts(Some(automation_scripts))
    .build()
    .unwrap();
```
Full Changelog: v1.99.21...v1.99.30
v1.99.21
What's Changed
You can now block ads over the network when using chrome and chrome_intercept using the adblock
feature flag.
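A minimal sketch of enabling this, assuming a project that depends on spider with the relevant Cargo features (the feature names mirror this release note; the builder call follows the v1.99.30 example, and the URL is a placeholder):

```rust
// Build with: cargo build --features "spider/chrome spider/chrome_intercept spider/adblock"
// With interception on, ad requests are blocked over the network.
let mut website: Website = Website::new("https://example.com")
    .with_chrome_intercept(true, true)
    .build()
    .unwrap();
```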
Full Changelog: v1.99.18...v1.99.21
v1.99.18
v1.99.16
What's Changed
- Fixing clap issues #195 by @jmikedupont2 in #196
- Fix chrome fingerprint and initial document scripts setup
- Perf improvements for smart mode handling assets with compile time constant map
New Contributors
- @jmikedupont2 made their first contribution in #196
Full Changelog: v1.99.10...v1.99.16
v1.99.10
What's Changed
- Major performance increase for CLI scraping and downloading. We now use subscriptions and concurrently stream-process files to storage and stdout.
- You can now use the spider_utils crate to extract content manually as needed using generic CSS element scraping.
```rust
use spider::{
    hashbrown::{HashMap, HashSet},
    packages::scraper::Selector,
};
use spider_utils::{build_selectors, css_query_select_map_streamed};

async fn css_query_selector_extract() {
    let map = HashMap::from([(
        "list",
        HashSet::from([".list", ".sub-list"]),
    )]);
    let data = css_query_select_map_streamed(
        r#"<html>
            <body>
                <ul class="list"><li>First</li></ul>
                <ul class="sub-list"><li>Second</li></ul>
            </body>
        </html>"#,
        &build_selectors(map),
    )
    .await;
    println!("{:?}", data);
    // {"list": ["First", "Second"]}
}
```
Thanks for the help @gjtorikian
Full Changelog: v1.99.5...v1.99.10
v1.99.5
What's Changed
Use website.with_execution_scripts with a hashmap of exact paths and JS to run on the page before returning the result. This can help when writing one-off code to unblock crawling a path such as a login page.
- Fix chrome wait_for selector and event handling
- Major perf improvement when using the regex feature flag and robots.txt handling.
Full Changelog: v1.98.8...v1.99.5