Seeding and Sorting Overview

What do Seeders/Sorters do?

Seeders and Sorters canvass the resources of a given government agency, identifying important URLs and determining whether those URLs can be crawled by the Internet Archive's webcrawler. If a URL is crawlable, the Seeders/Sorters nominate it to the End-of-Term (EOT) project; otherwise, they add it to the Uncrawlable spreadsheet using the project's Chrome Extension.

Choosing the website

The Seeders/Sorters team will use the EDGI subprimer systems (found here), or a similar set of resources, to identify important or at-risk data. Talk to the DataRescue organizers to learn more.

Canvassing the website and evaluating content

Start exploring the assigned website and identify important URLs.
Decide whether the data on a page or website subsection can be automatically captured by the Internet Archive webcrawler.
The best source of information about the seeding and sorting process is https://envirodatagov.org/; see:
Understanding What the Internet Archive Webcrawler Does
Seeding the Internet Archive’s Webcrawler

Crawlable URLs

URLs judged to be possibly crawlable are "nominated" (equivalently, "seeded") to the End-of-Term (EOT) project using the EDGI Nomination Chrome extension or bookmarklet.

Wherever possible, add in the Agency Office Code. Talk to the DataRescue organizers to learn more.

Uncrawlable URLs

If a URL is judged not crawlable, add it to the "Uncrawlable" spreadsheet through the Chrome Extension.
In the spreadsheet, the URL is automatically associated with a universally unique identifier (UUID) that was generated in advance.
You can check whether the page or some of its files are rendered using the Internet Archive's Wayback Machine Chrome Extension.
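
As a rough complement to the Wayback Machine Chrome Extension, the check can also be scripted. The sketch below is not part of the project's tooling: it uses the Internet Archive's public Wayback availability endpoint (`https://archive.org/wayback/available`) rather than the extension, and the function names are hypothetical. It only reports whether a snapshot exists; it cannot tell you whether a page's files actually render, so a manual look is still needed.

```python
import json
import urllib.parse
import urllib.request

# Public Wayback Machine availability endpoint (not the Chrome Extension).
WAYBACK_API = "https://archive.org/wayback/available"


def build_query(url: str) -> str:
    """Build the availability query for one page URL."""
    return WAYBACK_API + "?" + urllib.parse.urlencode({"url": url})


def closest_snapshot(payload: dict):
    """Return the closest snapshot URL from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check(url: str, timeout: float = 10.0):
    """Query the endpoint over the network and return a snapshot URL or None."""
    with urllib.request.urlopen(build_query(url), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `check("https://example.gov/data")` (a placeholder address) returns a snapshot URL if the Wayback Machine has one and `None` otherwise. A `None` result is a hint that the page may need a closer look, not proof that it is uncrawlable.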

Not sure?

This sorting is only provisional: when in doubt, seeders nominate the URL and mark it as possibly not crawlable.
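
The sorting rule described in the sections above can be summarized as a small decision function. This is only an illustrative sketch: the `Action` names and the `triage` function are hypothetical and do not correspond to anything in the Chrome Extension; the judgment about crawlability itself still has to be made by a person.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """Hypothetical labels for the three outcomes described in this guide."""
    NOMINATE = "nominate to EOT"
    NOMINATE_FLAGGED = "nominate to EOT, marked as possibly not crawlable"
    UNCRAWLABLE = "add to the Uncrawlable spreadsheet"


def triage(crawlable: Optional[bool]) -> Action:
    """Map a seeder's judgment to an action.

    True  -> the URL looks crawlable
    False -> the URL looks uncrawlable
    None  -> the seeder is not sure
    """
    if crawlable is None:
        # When in doubt, nominate anyway and flag the URL.
        return Action.NOMINATE_FLAGGED
    return Action.NOMINATE if crawlable else Action.UNCRAWLABLE
```

Modeling "not sure" as `None` keeps the doubtful case explicit instead of forcing it into a yes/no answer, which matches the provisional nature of the sorting.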
