Archive the Web is an open-source website archiving tool that allows you to set up automated archiving stored on Arweave. Our mission at Archive the Web is to create a decentralized backup of the world wide web together.
Website can be found here.
In its basic form, this application crawls a website up to a specified depth, saves all interactions with the website's servers and all loaded resources in WARC format, and uploads the result to the Arweave network.
WARC 1.1 is the format chosen for this application. It is an international standard used by many archives, which allows for composable applications.
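The depth-limited crawl described above can be sketched as a breadth-first traversal. This is only an illustration and not the project's actual crawler: the `crawl` function and the in-memory `links` map are hypothetical stand-ins for a real browser that loads pages and extracts their links.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first crawl that visits pages at most `max_depth` links
/// away from `start` and returns the URLs in visit order.
/// `links` is a hypothetical stand-in for "load the page and collect
/// its outgoing links"; a real crawler would drive a browser here.
fn crawl(links: &HashMap<&str, Vec<&str>>, start: &str, max_depth: usize) -> Vec<String> {
    let mut visited: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    let mut order = Vec::new();

    visited.insert(start.to_string());
    queue.push_back((start.to_string(), 0));

    while let Some((url, depth)) = queue.pop_front() {
        order.push(url.clone());
        // Pages at the depth limit are recorded but not expanded further.
        if depth == max_depth {
            continue;
        }
        if let Some(children) = links.get(url.as_str()) {
            for child in children {
                // `insert` returns false for URLs that were already queued.
                if visited.insert(child.to_string()) {
                    queue.push_back((child.to_string(), depth + 1));
                }
            }
        }
    }
    order
}
```

With a depth of 0 only the start page is visited; each extra level of depth adds the pages linked from the previous level.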
We rely heavily on Webrecorder's pywb toolkit to capture all requests between our browser and the website's servers to output a WARC file.
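To give a feel for what a WARC file contains, here is a minimal sketch that serializes a single WARC 1.1 `resource` record by hand. In this project pywb produces the actual WARC files; the function name `warc_resource_record` and its parameters are assumptions made for the illustration.

```rust
/// Serialize one WARC 1.1 "resource" record: a version line, named
/// header fields, a blank line, the payload, and two trailing CRLFs.
/// Hypothetical helper for illustration; pywb writes the real records.
fn warc_resource_record(
    record_id: &str,    // e.g. "urn:uuid:..." (caller supplies a unique ID)
    date: &str,         // UTC timestamp, e.g. "2023-01-01T00:00:00Z"
    target_uri: &str,   // URI the payload was captured from
    content_type: &str, // media type of the payload
    payload: &[u8],
) -> Vec<u8> {
    let header = format!(
        "WARC/1.1\r\n\
         WARC-Type: resource\r\n\
         WARC-Record-ID: <{}>\r\n\
         WARC-Date: {}\r\n\
         WARC-Target-URI: {}\r\n\
         Content-Type: {}\r\n\
         Content-Length: {}\r\n\
         \r\n",
        record_id,
        date,
        target_uri,
        content_type,
        payload.len()
    );
    let mut record = header.into_bytes();
    record.extend_from_slice(payload);
    // Every record ends with two CRLF sequences.
    record.extend_from_slice(b"\r\n\r\n");
    record
}
```

A WARC file is a concatenation of such records, which is part of why tools from different archives can read each other's output.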
Data added to Arweave is replicated among hundreds or thousands of computers, or "miners", making it resilient and easily retrievable. To permanently save data, the Arweave network charges an upfront fee, or "endowment fee". The fee is estimated to incentivize these miners to continue storing the data for at least 200 years, and it is calculated from conservative estimates of how storage prices fall over time. For more information, please check their yellow paper.
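The idea behind an upfront fee that covers many years of storage can be illustrated with a simple geometric sum. This is a simplified sketch and not Arweave's actual pricing formula: the function `endowment_estimate`, the fixed yearly decline rate, and any numbers passed to it are assumptions made for the example.

```rust
/// Total cost of storing data for `years` years when the yearly
/// storage cost starts at `first_year_cost` and falls by a fixed
/// fraction `annual_decline` each year (e.g. 0.05 for 5%).
/// Hypothetical model for illustration only.
fn endowment_estimate(first_year_cost: f64, annual_decline: f64, years: u32) -> f64 {
    (0..years)
        .map(|t| first_year_cost * (1.0 - annual_decline).powi(t as i32))
        .sum()
}
```

In this model, the lower the assumed decline rate, the larger the upfront fee, which is why a conservative estimate of price reductions leads to a safer endowment.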
A Warp contract (smart contract on Arweave) is used to update the current state of the archive. Currently it is where an archiver can register, and anyone can create an "Archiving Request" that will be fulfilled by an archiver.
Warp contract address: dD1DuvgM_Vigtnv4vl2H1IYn9CgLvYuhbEWPOL-_4Mw
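The contract's role can be pictured with a small in-memory model. This is a hypothetical sketch and not the deployed contract's code or schema: the type names, field names, and the rules that only registered archivers can fulfill a request and that a request is fulfilled once are all assumptions.

```rust
use std::collections::HashSet;

/// A request to archive a URL (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct ArchiveRequest {
    url: String,
    crawl_depth: u32,
    requested_by: String,
    fulfilled_by: Option<String>,
}

/// Hypothetical model of the state the contract keeps track of.
#[derive(Debug, Default)]
struct ArchiveState {
    archivers: HashSet<String>,
    requests: Vec<ArchiveRequest>,
}

impl ArchiveState {
    /// Register a wallet address as an archiver.
    fn register_archiver(&mut self, address: &str) {
        self.archivers.insert(address.to_string());
    }

    /// Anyone can create a request; returns its index as an ID.
    fn create_request(&mut self, requested_by: &str, url: &str, crawl_depth: u32) -> usize {
        self.requests.push(ArchiveRequest {
            url: url.to_string(),
            crawl_depth,
            requested_by: requested_by.to_string(),
            fulfilled_by: None,
        });
        self.requests.len() - 1
    }

    /// Mark a request as fulfilled (assumed rule: registered archivers only).
    fn fulfill(&mut self, request_id: usize, archiver: &str) -> Result<(), String> {
        if !self.archivers.contains(archiver) {
            return Err("not a registered archiver".to_string());
        }
        let request = self
            .requests
            .get_mut(request_id)
            .ok_or_else(|| "unknown request".to_string())?;
        if request.fulfilled_by.is_some() {
            return Err("request already fulfilled".to_string());
        }
        request.fulfilled_by = Some(archiver.to_string());
        Ok(())
    }
}
```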
First, ensure you have an Arweave wallet with AR in it. Also, make sure you fund your Bundlr account with sufficient AR on the Bundlr node of your choice (default is node1).
Second, make sure that the wallet file is stored at the path `./archiver/.secret/wallet.json`.
Third, make sure to register as an archiver. More info to come.
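A quick way to catch a misplaced wallet file before starting the archiver is a small preflight check. The sketch below is not part of the project: `wallet_file_present` is a hypothetical helper that only checks that a non-empty regular file exists at the given path, not that it holds a valid Arweave key.

```rust
use std::fs;
use std::path::Path;

/// Returns true if `path` points to a non-empty regular file.
/// Hypothetical helper; it does not validate the key itself.
fn wallet_file_present(path: &Path) -> bool {
    match fs::metadata(path) {
        Ok(meta) => meta.is_file() && meta.len() > 0,
        Err(_) => false,
    }
}
```

Calling `wallet_file_present(Path::new("./archiver/.secret/wallet.json"))` from the repository root would check the path mentioned above.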
- Run `git submodule update`
- Ensure you have Redis running on port 6379
- Install Google Chrome (latest stable release)
- Install pywb by running `pip3 install pywb`
- Run `cd archiver && cargo run`. If you want to get the debug output, make sure to add `RUST_LOG=debug` to your environment variables
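Since the steps above expect Redis on port 6379, it can help to confirm that something is listening there before running the archiver. The following is a minimal sketch, assuming Redis runs on the local machine; `port_is_open` and `redis_reachable` are hypothetical helpers, and a successful TCP connection only shows that the port is open, not that the service behind it is Redis.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Returns true if a TCP connection to `addr` succeeds within `timeout`.
fn port_is_open(addr: SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}

/// Checks port 6379 on localhost (assumes a local Redis instance).
fn redis_reachable() -> bool {
    let addr = SocketAddr::from(([127, 0, 0, 1], 6379));
    port_is_open(addr, Duration::from_millis(500))
}
```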
To run with Docker Compose:

- Run `git submodule update`
- Run `docker-compose up`