This repository contains the full pipeline for updating and maintaining the NYC Open Data Dashboard (requires permissions).
Files in this repository:
dashboard_prod.py
- main script executed hourly through GitHub Actions.
dashboard_dev.py
- a copy of the main script for testing, executed hourly through GitHub Actions.
credentials.py
- supplemental module, called within the main script. Contains helper functions to call the Socrata and Google Sheets APIs. Does not contain any actual credentials; all credentials are stored as GitHub secrets.
requirements.txt
- python3 requirements for the dashboard script to run.
.github/workflows/dashboard.yml
- GitHub Actions workflow file.
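The helper functions in credentials.py are not reproduced here. As a rough illustration only, a Socrata helper of this kind could look like the sketch below. It uses only the standard library and assumes the NYC Open Data Socrata domain; the function names, the `SOCRATA_APP_TOKEN` secret name, and the dataset ID in the usage note are hypothetical, not taken from this repository.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumption: NYC Open Data is served from this Socrata domain.
SOCRATA_DOMAIN = "data.cityofnewyork.us"


def build_socrata_url(dataset_id, limit=1000, offset=0):
    """Build a SODA query URL for one page of a dataset."""
    query = urllib.parse.urlencode({"$limit": limit, "$offset": offset})
    return f"https://{SOCRATA_DOMAIN}/resource/{dataset_id}.json?{query}"


def fetch_rows(dataset_id, limit=1000, offset=0):
    """Fetch one page of rows as a list of dicts (performs a network call)."""
    request = urllib.request.Request(build_socrata_url(dataset_id, limit, offset))
    # SOCRATA_APP_TOKEN is a hypothetical secret name; the token is optional.
    token = os.environ.get("SOCRATA_APP_TOKEN")
    if token:
        request.add_header("X-App-Token", token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `fetch_rows("abcd-1234", limit=5)` would request the first five rows of a dataset with that placeholder ID. Reading the token from the environment keeps it out of the source file, in line with the note above that credentials live in GitHub secrets.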
The infrastructure of the dashboard relies entirely on tools available to the public at no cost (if used within allocated limits):
- GitHub Actions
- Google Spreadsheets and Google Data Studio
GitHub Actions allow users to automate, customize, and execute workflows within a repository. There is no need to have access to special hardware or cloud infrastructure outside GitHub.
Implementation tip: To set up a simple workflow similar to the Open Data Dashboard, a user needs to create a workflow file within their repository to initiate and run the process. .github/workflows/dashboard.yml is the workflow file for the Open Data Dashboard.
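For orientation, an hourly workflow of this kind might look roughly like the fragment below. This is a sketch under assumptions, not the contents of this repository's dashboard.yml: the workflow name, job name, action versions, and the secret name are all illustrative.

```yaml
# Sketch of an hourly workflow; not the repository's actual dashboard.yml.
name: dashboard
on:
  schedule:
    - cron: "0 * * * *"    # top of every hour (UTC)
  workflow_dispatch:         # also allow manual runs
jobs:
  update:
    runs-on: ubuntu-latest   # GitHub-hosted runner
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - run: pip install -r requirements.txt
      - run: python dashboard_prod.py
        env:
          # Hypothetical secret name, exposed to the script as an environment variable.
          GOOGLE_SERVICE_ACCOUNT_JSON: ${{ secrets.GOOGLE_SERVICE_ACCOUNT_JSON }}
```

Scheduled workflows run on the default branch, and GitHub may delay scheduled runs during periods of high load, so "hourly" should be read as approximate.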
Google Spreadsheets and Google Data Studio are free tools available to all users with a Google account.
Implementation tip: To establish communication between the processes within a GitHub Actions workflow and Google applications, a user needs to retrieve their own Service Account credentials from the Google API. You can use these instructions to retrieve credentials associated with your Google account (we use the gspread Python package to establish the connection in our pipeline). We store our credentials as GitHub Secrets, which is a good practice for any sensitive information kept on GitHub.
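A minimal sketch of that connection, assuming the Service Account key JSON is exposed to the workflow step as an environment variable. The variable name `GOOGLE_SERVICE_ACCOUNT_JSON` and both function names are hypothetical; `gspread.service_account_from_dict`, `open_by_key`, and `worksheet` are gspread calls, but this is not necessarily how credentials.py does it.

```python
import json
import os


def load_service_account_info(env_var="GOOGLE_SERVICE_ACCOUNT_JSON"):
    """Parse the Service Account key JSON held in an environment variable."""
    raw = os.environ.get(env_var)
    if not raw:
        raise RuntimeError(f"{env_var} is not set; expose the secret to the workflow step")
    return json.loads(raw)


def open_worksheet(spreadsheet_key, worksheet_title):
    """Authorize with gspread and return one worksheet (needs network access)."""
    import gspread  # imported lazily so the parsing helper works without it

    client = gspread.service_account_from_dict(load_service_account_info())
    return client.open_by_key(spreadsheet_key).worksheet(worksheet_title)
```

The target spreadsheet has to be shared with the Service Account's email address, otherwise `open_by_key` will fail with a permissions error.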
All metrics in the dashboard are derived from two public datasets available on Open Data NYC:
Some Python knowledge would be helpful if a user wants to understand how the metrics are derived in the main script. However, there is no need to install Python on the user's local machine, as all the processing is performed on a GitHub-hosted runner.
General familiarity with GitHub is necessary if a user wants to replicate this process within their own GitHub and Google accounts by forking the repo.
The diagram below provides an overall view of the Open Data Dashboard pipeline.