DominicBurkart/wikipedia-revisions-server

  _      ___ __    _   ___           _     _                 ____                    
 | | /| / (_) /__ (_) / _ \___ _  __(_)__ (_)__  ___  ___   / __/__ _____  _____ ____
 | |/ |/ / /  '_// / / , _/ -_) |/ / (_-</ / _ \/ _ \(_-<  _\ \/ -_) __/ |/ / -_) __/
 |__/|__/_/_/\_\/_/ /_/|_|\__/|___/_/___/_/\___/_//_/___/ /___/\__/_/  |___/\__/_/   

This project serves Wikipedia revision differences for a given time period: it takes an HTTP request with a start datetime and an end datetime, and sends the matching revisions as a Brotli-compressed stream. In the response stream, each line is a JSON-encoded revision.
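
As a rough client-side sketch of consuming such a response, the Python below parses newline-delimited JSON from a sequence of byte chunks. It assumes the Brotli stream has already been decompressed (for example by an HTTP client or a third-party Brotli library). The function name `iter_revisions` and the field names in the sample data are hypothetical, since the request endpoint and revision schema are not documented here yet.

```python
import json
from typing import Iterable, Iterator


def iter_revisions(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield one decoded revision per newline-terminated JSON line.

    `chunks` is assumed to be already-decompressed bytes; chunk boundaries
    may fall in the middle of a line.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Split off every complete line; keep the trailing partial line buffered.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line.strip():
                yield json.loads(line)
    # Handle a final line that lacks a trailing newline.
    if buffer.strip():
        yield json.loads(buffer)


# Hypothetical sample data: these field names are made up for illustration.
sample_chunks = [b'{"id": 1, "text": "a"}\n{"id"', b': 2, "text": "b"}\n']
revisions = list(iter_revisions(sample_chunks))
```

Buffering on line boundaries lets the client process revisions one at a time, without holding the whole response in memory.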

documentation coming soon 🥧⏲️

Build the project:

docker build -t wikipedia-revisions-server .

Run (specifying working & storage directories, plus dump date):

docker run -it -v /local/path:/fast_dir -v /other/local/path:/big_dir wikipedia-revisions-server -d 20200601

If the data and index files have already been built, you can start the server without having to rebuild:

docker run -it -v /local/path:/fast_dir -v /other/local/path:/big_dir wikipedia-revisions-server

To find a valid date (the -d parameter), go to the wiki archives and look for a dump date that has .xml.bz2 files available to download under "All pages with complete page edit history".
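
A small sketch for sanity-checking a candidate date before passing it to -d: it checks that the value is a real calendar date in the YYYYMMDD form used in the run command above, and builds the dump index URL to inspect by hand. The helper name `dump_index_url` is hypothetical, and the choice of the English Wikipedia (`enwiki`) path on dumps.wikimedia.org is an assumption, as this README does not say which wiki the server targets.

```python
from datetime import datetime


def dump_index_url(dump_date: str, wiki: str = "enwiki") -> str:
    """Validate a YYYYMMDD dump date and return the dump index URL to check.

    The `wiki` default is an assumption; the server may target another wiki.
    """
    # Require exactly eight digits, since strptime alone tolerates short forms.
    if len(dump_date) != 8 or not dump_date.isdigit():
        raise ValueError(f"expected YYYYMMDD, got {dump_date!r}")
    # Raises ValueError for impossible dates such as month 13.
    datetime.strptime(dump_date, "%Y%m%d")
    return f"https://dumps.wikimedia.org/{wiki}/{dump_date}/"


url = dump_index_url("20200601")
```

This only checks the format. You still have to open the resulting page to see whether the complete edit history files exist for that date.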

See the Python wikipedia-revisions repo for download targets and schemes other than those available here.

Thanks to JetBrains for providing an open source license to their IDEs for developing this project!
