naics-scraper

About

A simple Ruby scraper for getting the content of NAICS code descriptions from the US Census web site and storing them in a Mongo data store.

Current Status

Current status is: experimental

It seems to work for the core content for 2012 codes -- it has not yet been tested on other years.

A good way to help is to check out the JSON of 2012 code scraping results, and open an Issue for any problems you discover.
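One way to start looking for problems is to scan the results for records with blank fields. The sketch below is illustrative only: `records_with_blank_fields` is a hypothetical helper (not part of naics_scraper.rb), and both the file name and the layout (a JSON array of flat objects) are assumptions to check against the real file.

```ruby
require "json"

# Hypothetical helper: returns the records that contain at least one
# nil or empty value. Assumes each record is a Hash.
def records_with_blank_fields(records)
  records.select do |record|
    record.values.any? do |value|
      value.nil? || (value.respond_to?(:empty?) && value.empty?)
    end
  end
end

# Assumed file name and layout; adjust to match the actual results file.
path = "complete-data-2012-052513-145pmPT.json"
if File.exist?(path)
  records = JSON.parse(File.read(path))
  if records.is_a?(Array) && records.all? { |r| r.is_a?(Hash) }
    suspects = records_with_blank_fields(records)
    puts "#{suspects.size} of #{records.size} records have a blank field"
  end
end
```

Anything this turns up is only a candidate for an Issue; a blank field may be correct for some codes.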

Installing

Requirements

Ruby (built on 1.9.3)
MongoDB
Gems included in Gemfile

Getting Started

To run the scraper, do the following:

First, in a separate terminal window, start Mongo:

mongod

Next, from the project directory, install gems:

bundle install

Then, run the script:

ruby naics_scraper.rb

You're now at an interactive terminal, from which you can run any of the scraping commands (read the code to get a sense for what you can do).

The main way to get all the data for a year is to do:

NaicsScraper.put_year_content_in_mongo(2012)

The scraper uses VCR to cache responses locally, both for web-citizenry purposes and to speed up testing new content-scraping approaches.
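To show what caching responses locally buys you, here is a minimal record-and-replay sketch using only the Ruby standard library. It is not VCR and not how naics_scraper.rb is written; `cached_fetch` and the `http_cache` directory are hypothetical names chosen for illustration.

```ruby
require "digest"
require "fileutils"
require "net/http"
require "uri"

# Hypothetical helper: returns the body for a URL, reading it from a local
# file if this URL has been fetched before, and saving it otherwise.
# An optional block can replace the real HTTP request (handy for testing).
def cached_fetch(url, cache_dir: "http_cache")
  FileUtils.mkdir_p(cache_dir)
  # One cache file per URL, named by a hash of the URL.
  path = File.join(cache_dir, Digest::SHA256.hexdigest(url))
  return File.read(path) if File.exist?(path)

  body = block_given? ? yield(url) : Net::HTTP.get(URI(url))
  File.write(path, body)
  body
end
```

The first call for a URL goes to the network and every later call reads from disk, so repeated runs put no extra load on the Census site while you experiment with parsing.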

Contributing

Shoot over a GitHub Issue. This is very much a script right now, so there is no formal process for contributing.

Contact

You can totally tweet at me! https://twitter.com/allafarce

License

Open source under the BSD* license (see LICENSE.md for full details)

* Go bears!
