Add a few new Common Crawl resources #152

Merged
merged 3 commits into from
Nov 5, 2024
Add a few new Common Crawl resources
wumpus authored Nov 5, 2024

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
commit d92410b9b5c988f119fba553b47b8722fe72b386
3 changes: 3 additions & 0 deletions README.md
@@ -39,6 +39,7 @@ Web archiving is the process of collecting portions of the World Wide Web to ens
* [IIPC and DPC Training materials: module for beginners (8 sessions)](https://netpreserve.org/web-archiving/training-materials/)
* [UNT Web Archiving Course](https://github.com/vphill/web-archiving-course)
* [Continuing Education to Advance Web Archiving (CEDWARC)](https://cedwarc.github.io/)
* [A Whirlwind Tour of Common Crawl's Datasets using Python](https://github.com/commoncrawl/whirlwind-python/)
* The WARC Standard:
  * The [warc-specifications](https://iipc.github.io/warc-specifications/) community HTML version of the official specification and hub for new proposals.
  * The [official ISO 28500 WARC specification homepage](http://bibnum.bnf.fr/WARC/).
@@ -222,6 +223,7 @@ This list of tools and software is intended to briefly describe some of the most
* [WS-DL Blog](https://ws-dl.blogspot.com/) - Web Science and Digital Libraries Research Group blogs about various Web archiving related topics, scholarly work, and academic trip reports.
* [DSHR's Blog](https://blog.dshr.org/) - David Rosenthal regularly reviews and summarizes work done in the Digital Preservation field.
* [UK Web Archive Blog](https://blogs.bl.uk/webarchive/)
* [Common Crawl Foundation Blog](https://commoncrawl.org/blog) -- [rss](http://www.commoncrawl.org/blog/rss.xml)

### Mailing Lists

@@ -235,6 +237,7 @@ This list of tools and software is intended to briefly describe some of the most
* [IIPC Slack](https://iipc.slack.com/) - Ask [@netpreserve](https://twitter.com/NetPreserve?s=20) for access.
* [Archives Unleashed Slack](https://archivesunleashed.slack.com/) - [Fill out this request form](http://slack.archivesunleashed.org/) for access to a researcher group of people working with web archives.
* [Archivers Slack](https://archivers.slack.com) - [Invite yourself](https://archivers-slack.herokuapp.com/) to a multi-disciplinary effort for archiving projects run in affiliation with [EDGI](https://envirodatagov.org/archiving/) and [Data Together](http://datatogether.org/).
* [Common Crawl Foundation Partners](https://ccfpartners.slack.com) (ask greg zat commoncrawl zot org for an invite)

### Twitter