Paperbot is an IRC bot that fetches academic papers. It monitors all conversation for links to scholarly content, then fetches the content and posts a public link. This seems to help enhance the quality of discussion and make us less ignorant.
All content is scraped using zotero/translators. These are javascript scrapers that work on a large number of academic publisher sites and are actively maintained. Paperbot offloads links to zotero/translation-server, which runs the zotero scrapers headlessly in a gecko and xulrunner environment. The scrapers return metadata and a link to the pdf. Then paperbot fetches that particular pdf. Sometimes in IRC someone drops a link straight to a pdf, which paperbot is also happy to compulsively archive.
It would be nice to use multiple proxies to resolve a pdf request.
say hi to paperbot on irc.freenode.net ##hplusroadmap
BSD.