ratsinfo-scraper

Scrape documents with associated metadata from http://ratsinfo.dresden.de

INSTALLATION

Get the code

git clone https://github.com/Mic92/ratsinfo-scraper.git

Get ruby (>= 2.0.0)

gpg --keyserver hkp://keys.gnupg.net --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3
\curl -sSL https://get.rvm.io | bash -s stable
rvm install 2.2.3
rvm --default use 2.2.3

Install bundler

gem install bundler

Install Dependencies

cd ratsinfo-scraper
bundle install

USAGE

To start scraping, run:

rake

This will extract all documents to the path given by the environment variable DOWNLOAD_PATH (defaults to "data") and convert them to XML files containing the metadata and full text of the PDFs.
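
As a rough illustration of the DOWNLOAD_PATH behaviour described above, the following sketch reads the variable and falls back to "data". The helper name `resolve_download_path` is hypothetical and not taken from the scraper's code; the actual Rakefile may resolve the path differently.

```ruby
# Sketch only: resolve the download directory as described above.
# `resolve_download_path` is a hypothetical helper, not part of the scraper.
def resolve_download_path(env = ENV)
  # Use DOWNLOAD_PATH when it is set, otherwise the documented default "data".
  env.fetch("DOWNLOAD_PATH", "data")
end

puts resolve_download_path
```

With a POSIX shell, the variable can be set for a single run by prefixing the command, e.g. `DOWNLOAD_PATH=/tmp/ratsinfo rake`.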

To scrape an individual session, for example http://ratsinfo.dresden.de/to0040.php?__ksinr=100:

rake testmonth

This downloads only a tiny set of data (session data only) and is meant just for testing.
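
The example session URL above appears to carry the session id in its `__ksinr` query parameter. A minimal sketch, under that assumption, for converting between a session URL and its id; the helper names `session_id_from_url` and `session_url` are hypothetical and not part of the scraper:

```ruby
require "uri"

# Sketch only. Assumption: the `__ksinr` query parameter is the session id.
SESSION_URL = "http://ratsinfo.dresden.de/to0040.php?__ksinr=%d"

# Extract the numerical session id from a session URL.
def session_id_from_url(url)
  query = URI.parse(url).query.to_s
  params = URI.decode_www_form(query).to_h
  Integer(params.fetch("__ksinr"))
end

# Build the session URL for a numerical session id.
def session_url(id)
  format(SESSION_URL, Integer(id))
end
```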

To display all tasks use:

rake -T

The download directory will have the following scheme:

Each session has its own directory, named after the session id.
Every document belonging to the session is extracted to this directory.
Additionally, a JSON file with the session id in its name is created. It is a machine-readable version of the index.htm file contained in the document archives.
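
The layout above could be read back roughly as follows. This is a sketch under stated assumptions: session directory names are purely numerical, and the JSON file is found by matching the session id anywhere in its file name, since the exact name is not specified here. `load_sessions` is a hypothetical helper.

```ruby
require "json"

# Sketch only: map each session id to the parsed content of its JSON file.
# Assumptions: directory names are numerical session ids, and the JSON file
# name contains that id (the exact file name is not documented above).
def load_sessions(download_path)
  sessions = {}
  Dir.entries(download_path).sort.each do |entry|
    dir = File.join(download_path, entry)
    # Skip ".", ".." and anything that is not a numerical session directory.
    next unless File.directory?(dir) && entry =~ /\A\d+\z/

    json_file = Dir.glob(File.join(dir, "*#{entry}*.json")).first
    next if json_file.nil?

    sessions[entry] = JSON.parse(File.read(json_file))
  end
  sessions
end
```

Calling `load_sessions("data")` after a scrape would then return a hash keyed by session id, provided the assumptions about the file names hold.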

We now follow the OParl specification!

Deviations from the OParl spec:

Numerical ids are used everywhere, because we don't yet serve the data on HTTP URIs.
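
To illustrate the deviation: OParl expects an object's id to be a URL, while the scraped data carries a plain number. The sketch below shows how a numerical id could be turned into a URL id once the data is served over HTTP. `BASE_URL` is a placeholder and `with_url_id` is a hypothetical helper; neither exists in the scraper.

```ruby
# Sketch only: rewrite a numerical id into a URL id.
# BASE_URL is a placeholder, not a real endpoint.
BASE_URL = "https://example.org/oparl/session"

def with_url_id(record, base_url = BASE_URL)
  # Return a copy of the record whose "id" is a URL built from the numerical id.
  record.merge("id" => "#{base_url}/#{record.fetch('id')}")
end

session = { "id" => 100 }
puts with_url_id(session)["id"]
```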
