so_docs

A collection of scripts to use the data dump from Stack Overflow's dearly departed Documentation feature.

Pour one out for Stack Overflow Documentation and then grab the data dump. With a JSON parser in hand, you can use that content wherever your dreams take you. Just be sure to provide proper attribution.

Getting started

Install Ruby.
Execute gem install bundler to install Bundler.
Clone or download this very repository on your machine.
In the repository's directory, run bundle install to install all the required gems for the scripts.
In order to test the scripts and download the Documentation archive, run bundle exec rake.
If the tests all succeed, you are all set to run the scripts from the examples directory.

Libraries

so_docs.rb—Library for loading and manipulating the JSON Documentation archive.
wayback-api.rb—Library to save and verify URLs on the Wayback Machine. (Probably should be a separate project as it has no particular connection to the Documentation project other than I want to save pages there.)
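
To give a feel for what saving and verifying a URL involves, here is a minimal sketch that only builds the two request URLs and makes no network calls. The endpoint roots are the ones the Internet Archive documents publicly; the helper names are hypothetical and are not taken from wayback-api.rb.

```ruby
require 'uri'

# Endpoint roots as publicly documented by the Internet Archive. The helper
# names below are hypothetical and are not taken from wayback-api.rb.
WAYBACK_SAVE_ROOT = 'https://web.archive.org/save/'
WAYBACK_AVAILABLE_ROOT = 'https://archive.org/wayback/available'

# Build the URL that asks the Wayback Machine to capture page_url.
def wayback_save_url(page_url)
  WAYBACK_SAVE_ROOT + page_url
end

# Build the URL that asks whether a capture of page_url already exists.
def wayback_available_url(page_url)
  query = URI.encode_www_form(url: page_url)
  "#{WAYBACK_AVAILABLE_ROOT}?#{query}"
end

page = 'https://example.com/docs/page'
puts wayback_save_url(page)
puts wayback_available_url(page)
```

Fetching either URL (with Net::HTTP, say) is left out on purpose so the sketch runs offline.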

Examples

get-archive.rb—Downloads the archive and extracts its contents. You only need to do this once.
example2html.rb—Extracts the HTML representation of an example. To see what this looks like, I made a copy of Creating and Initializing Arrays in Java.
revision2jekyll.rb—Prints a revision history item as Markdown text.
attribution2wbm.rb—Submits example or topic attribution to the Wayback Machine.
submit2wbm.rb—Submits all topics to the Internet Archive Wayback Machine. Demonstrates how to use doctags.json and topics.json. (I ran it on August 16, 2017 after Documentation was put in read-only mode. There's probably no reason to run it again. Also, it doesn't work for C# as c%23 isn't allowed in their URLs.)
Stay tuned for other exciting scripts!*
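
As a rough illustration of how doctags.json and topics.json fit together, the sketch below groups topic titles under their tag. It uses tiny inline stand-ins rather than the real files, and the field names (Id, Tag, DocTagId, Title) are assumptions for illustration, so check them against the archive before relying on them. The helper topics_by_tag is hypothetical and not part of so_docs.rb.

```ruby
require 'json'

# Tiny inline stand-ins for doctags.json and topics.json. The field names
# (Id, Tag, DocTagId, Title) are assumptions for illustration only.
doctags_json = '[{"Id": 1, "Tag": "java"}, {"Id": 2, "Tag": "ruby"}]'
topics_json = '[{"Id": 10, "DocTagId": 1, "Title": "Arrays"},' \
              ' {"Id": 11, "DocTagId": 2, "Title": "Blocks"},' \
              ' {"Id": 12, "DocTagId": 1, "Title": "Strings"}]'

# Group topic titles under the name of the tag they belong to.
def topics_by_tag(doctags, topics)
  # Lookup table from tag id to tag name.
  tag_names = doctags.to_h { |tag| [tag['Id'], tag['Tag']] }
  grouped = Hash.new { |hash, key| hash[key] = [] }
  topics.each do |topic|
    grouped[tag_names[topic['DocTagId']]] << topic['Title']
  end
  grouped
end

grouped = topics_by_tag(JSON.parse(doctags_json), JSON.parse(topics_json))
grouped.each { |tag, titles| puts "#{tag}: #{titles.join(', ')}" }
```

With the real files, JSON.parse(File.read(path)) would take the place of the inline strings.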

Contributions welcome

I'm working with Ruby, but I'm happy to accept scripts written in other languages as long as I can test them out. I'm also happy to include links to other projects using the Documentation archive in this README. Feel free to submit pull requests and I'll incorporate them as quickly as I can.

If there's something you'd like to see from the archive and can't figure out how to extract the content, feel free to add an issue or ask on Meta Stack Overflow.

Bugs

Tests are fragile. Changing the way these scripts work in even minor ways will break the tests. (Fortunately, the tests are also simple, so changing the expected md5 hash result usually suffices.)
The test framework also pulls in the entire archive from Archive.org and doesn't clean it up. This might be considered a feature by some.
Getting users' display names requires a call to the Stack Exchange API, which is subject to rate limiting. The method does not check whether it has used up the daily quota. Nor does it cache results. So it's easy to be throttled if you aren't careful. I've added an application key since exceeding the quota was a leading cause of failure for Travis CI Continuous Integration tests.
My code more or less reproduces an RDBMS—poorly. It would probably be smarter to load the JSON files into SQLite or something.
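
One possible way to soften the display name problem is to cache results and stop calling the API once the quota runs out. The sketch below is not what the library does today. DisplayNameCache and its fetcher argument are hypothetical names, and the fetcher is a stub standing in for the real HTTP call. The only thing borrowed from the Stack Exchange API is the shape of the response wrapper (items, display_name, quota_remaining).

```ruby
# Caches display names and refuses to call the API once the quota runs low.
# The fetcher is any callable that takes a user id and returns a hash shaped
# like a Stack Exchange API response ("items", "quota_remaining").
# The class and its interface are hypothetical, not part of so_docs.rb.
class DisplayNameCache
  class QuotaExhausted < StandardError; end

  def initialize(fetcher, min_quota: 1)
    @fetcher = fetcher
    @min_quota = min_quota
    @names = {}
    @quota_remaining = nil # unknown until the first response arrives
  end

  def display_name(user_id)
    # Serve repeat lookups from memory without touching the API.
    return @names[user_id] if @names.key?(user_id)

    if @quota_remaining && @quota_remaining < @min_quota
      raise QuotaExhausted, "quota_remaining is #{@quota_remaining}"
    end

    response = @fetcher.call(user_id)
    @quota_remaining = response['quota_remaining']
    item = response.fetch('items', []).first
    @names[user_id] = item && item['display_name']
  end
end

# A stub fetcher standing in for the real HTTP call.
calls = 0
stub = lambda do |user_id|
  calls += 1
  { 'items' => [{ 'user_id' => user_id, 'display_name' => "user#{user_id}" }],
    'quota_remaining' => 0 }
end

cache = DisplayNameCache.new(stub)
puts cache.display_name(42) # fetched through the stub
puts cache.display_name(42) # served from the cache, no second call
puts "fetcher calls: #{calls}"
```

The cache lives only as long as the process. Writing it to disk between runs would help the tests more, at the cost of having to decide when a cached name is stale.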

Footnote:

* Offer contingent on author's creativity and reader's ability to be excited.
