feed_archive
Saves public feed data from Google Reader's feed archive.
Google Reader has (for the most part) a copy of all blog posts and other feed items published since its launch in late 2005 (assuming that at least one Reader user subscribed to the feed). This makes it an invaluable resource for recovering content from sites that disappear; it can also serve as a backup mechanism and as a data source for other tools.
Presumably access to this data is also going away come July 2013, and thus this tool can be used to get one last shot at archiving feeds you might want to refer to later.
The easiest way to use it is to get the OPML file with all your Reader subscriptions and run the tool like so:
bin/feed_archive \
--opml_file=~/Downloads/feeds.opml \
--output_directory=~/Downloads/feed_archive
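For reference, an OPML subscription export is an XML file in which each subscribed feed is an `outline` element carrying an `xmlUrl` attribute, possibly nested inside folder outlines. The sketch below shows one way to list the feed URLs in such a file. It is only an illustration and not necessarily how feed_archive reads the file; the function name `feed_urls_from_opml` and the sample document are made up for the example.

```python
import xml.etree.ElementTree as ET


def feed_urls_from_opml(opml_text):
    """Return the xmlUrl of every <outline> element, in document order.

    Hypothetical helper for illustration; not part of feed_archive.
    """
    root = ET.fromstring(opml_text)
    urls = []
    # Subscriptions may be nested inside folder outlines, so walk the whole tree.
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:  # folder outlines carry no xmlUrl attribute
            urls.append(url)
    return urls


# Made-up sample with one folder and one top-level subscription.
SAMPLE_OPML = """<opml version="1.0">
  <head><title>Sample subscriptions</title></head>
  <body>
    <outline title="Blogs" text="Blogs">
      <outline type="rss" text="Google Reader Blog"
               xmlUrl="http://googlereader.blogspot.com/atom.xml"/>
    </outline>
    <outline type="rss" text="persistent.info"
             xmlUrl="http://persistent.info/atom.xml"/>
  </body>
</opml>"""

if __name__ == "__main__":
    for url in feed_urls_from_opml(SAMPLE_OPML):
        print(url)
```

Checking the list this way before a long run can confirm that the export contains the feeds you expect.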
The destination specified by --output_directory will be populated with one file per feed, named after its URL. Each file contains all items that Reader ever saw in that feed, in the Atom format. Google Reader normally omits unknown (namespaced) elements in its API output, but the script attempts to use high-fidelity mode to reconstruct the original data as much as possible.
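Since the archived files are Atom, they can be inspected with any XML parser. The following is a minimal sketch that lists entry titles, assuming a file holds a single Atom document whose root is a `feed` element with `entry` children. That layout is an assumption about the output, and the function name `entry_titles` and the sample data are invented for the example.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def entry_titles(atom_text):
    """Return the title of each <entry> directly under the root <feed>.

    Hypothetical helper for illustration; entries without a title yield "".
    """
    root = ET.fromstring(atom_text)
    titles = []
    for entry in root.findall("atom:entry", ATOM_NS):
        titles.append(
            entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        )
    return titles


# Made-up stand-in for the contents of one archived feed file.
SAMPLE_ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First post</title><id>tag:example.com,2013:1</id></entry>
  <entry><title>Second post</title><id>tag:example.com,2013:2</id></entry>
</feed>"""

if __name__ == "__main__":
    titles = entry_titles(SAMPLE_ATOM)
    print(len(titles), "entries")
    for title in titles:
        print(title)
```

To run this against a real archive, read one of the files from the output directory and pass its contents to `entry_titles`. Very large feeds may be better served by a streaming parser such as `ET.iterparse`.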
If you only want to archive specific feeds, you can pass their URLs as command line arguments instead of using --opml_file:
bin/feed_archive \
--output_directory=~/Downloads/feed_archive \
http://googlereader.blogspot.com/atom.xml \
http://persistent.info/atom.xml \
...
The tool supports additional arguments for controlling how many items are fetched; see bin/feed_archive --help for more information.