Extracts and loads 311 data from Salesforce into the enterprise GIS

CityOfPhiladelphia/311-data-pipeline

311-data-pipeline

This is an ETL process for extracting and publishing data from the City of Philadelphia 311 system.

Deployment

  1. git clone this repo
  2. Create a virtualenv, activate it, and run pip install -r requirements.txt
  3. Rename sample_config.py to config.py and enter actual values (or download them from LastPass).
  4. Create a batch file that activates the virtualenv and runs python sync.py. Schedule this to run regularly.

Seeding

seed.py is used to truncate the cases table and reload from a CSV dump. The basic usage is:

python seed.py <file>
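
The truncate-and-reload pattern can be sketched as follows. This is a minimal illustration, not the code in seed.py: it uses an in-memory SQLite database as a stand-in for the real one, and the function name seed_cases and the columns case_id and status are hypothetical.

```python
import csv
import io
import sqlite3


def seed_cases(conn, csv_file, table="cases"):
    """Empty `table` and reload it from an open CSV file that has a header row.

    Table and column names are interpolated into the SQL text, so they must
    come from a trusted source (here, the dump's own header).
    """
    reader = csv.reader(csv_file)
    columns = next(reader)  # the header row supplies the column names
    placeholders = ", ".join("?" for _ in columns)
    insert_sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    with conn:  # single transaction: the reload either fully succeeds or is rolled back
        conn.execute(f"DELETE FROM {table}")  # SQLite has no TRUNCATE statement
        conn.executemany(insert_sql, reader)
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Demonstration with hypothetical columns and one stale row that gets replaced.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (case_id TEXT, status TEXT)")
conn.execute("INSERT INTO cases VALUES ('old', 'Closed')")
conn.commit()

dump = io.StringIO("case_id,status\n1001,Open\n1002,Closed\n")
row_count = seed_cases(conn, dump)
case_ids = [row[0] for row in conn.execute("SELECT case_id FROM cases ORDER BY case_id")]
```

Wrapping the delete and the inserts in one transaction means a malformed dump leaves the existing rows in place rather than an empty table.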

Syncing

sync.py will check the database table for the most recent updated_datetime and get all records from Salesforce that have been updated since then.

The basic usage is:

python sync.py
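
As a rough sketch of the incremental approach described above, the most recent updated_datetime can be turned into a Salesforce SOQL filter. The function name, the field list, and the use of the Case object's LastModifiedDate field are assumptions for illustration; the query that sync.py actually issues may differ.

```python
from datetime import datetime, timezone


def build_incremental_query(last_synced, fields=("Id", "CaseNumber", "Status")):
    """Build a SOQL query for cases modified after `last_synced` (an aware datetime)."""
    # SOQL datetime literals are written unquoted, in ISO 8601 form.
    stamp = last_synced.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"SELECT {', '.join(fields)} FROM Case WHERE LastModifiedDate > {stamp}"


# Hypothetical high-water mark read from the local cases table.
most_recent = datetime(2016, 5, 18, 12, 30, tzinfo=timezone.utc)
query = build_incremental_query(most_recent)
print(query)
```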

If the Salesforce query times out, you may have to chunk the updates into individual days. To sync just a single day, use the -d option:

python sync.py -d 2016-05-18
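
When a longer backlog has to be chunked this way, the per-day commands can be generated rather than typed by hand. The helper below is a hypothetical convenience and not part of this repo; it relies only on the -d option shown above.

```python
from datetime import date, timedelta


def daily_sync_commands(start, end, script="sync.py"):
    """Return one `python <script> -d YYYY-MM-DD` command per day from start to end, inclusive."""
    day_count = (end - start).days + 1  # zero or negative when end precedes start
    return [
        ["python", script, "-d", (start + timedelta(days=offset)).isoformat()]
        for offset in range(day_count)
    ]


commands = daily_sync_commands(date(2016, 5, 17), date(2016, 5, 19))
for command in commands:
    print(" ".join(command))
```

Each command list can be passed to subprocess.run(command, check=True) so that a failed day stops the loop instead of being silently skipped.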

sync-ago.py checks the salesforce_cases dataset in AGO for the most recent updated_datetime, then fetches all records in databridge that have been updated since then. It formats those rows as AGO expects and upserts them into AGO in small batches.

You can also run sync-ago.py with the -d option to refresh a whole day:

python sync-ago.py -d 2016-05-18
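
The idea of upserting in small batches can be illustrated with a generic chunking helper. This is a sketch only: iter_batches and the batch size are hypothetical, and the upsert call itself is left out because its details are specific to sync-ago.py.

```python
from itertools import islice


def iter_batches(rows, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return  # the source is exhausted
        yield batch


# Five hypothetical updated rows, sent two at a time.
updated_rows = [{"case_id": n} for n in range(5)]
batch_sizes = [len(batch) for batch in iter_batches(updated_rows, 2)]
```

Small batches keep each request short, so a failure partway through only requires retrying the batch that failed.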
