Pipeline for ingesting data about events on campus.
- Configure your environment (instructions on the wiki).
- Choose an unassigned issue, and comment that you're working on it.
- Open a PR containing a new `fetch`, `parse`, or `normalize` script! (details on these stages)
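As a concrete illustration, here is a minimal sketch of what a `fetch` stage script might look like. It assumes a convention where each stage script receives its output directory as the first argument, and the URL below is a placeholder, not a real endpoint; check the wiki for the actual stage contract.

```bash
#!/usr/bin/env bash
# Hypothetical fetch script: download the raw event feed into the
# output directory supplied by the pipeline runner.
set -euo pipefail

output_dir="${1?usage: fetch.sh <output_dir>}"

# example.edu/events.json is a placeholder URL for illustration only.
curl --fail --silent --show-error \
    "https://example.edu/events.json" \
    -o "${output_dir}/events.raw.json"
```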
See the wiki for instructions on how to run `event-data-ingest`.
For more information on pipeline stages and how to contribute, see the wiki!
The details below on interacting with our production environment are intended for staff developers.
In production, all stages for all runners are run, and outputs are stored to the `vaccine-feeds` bucket on GCS.
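If you need to inspect what a production run produced, the `gsutil` CLI from the Cloud SDK can list and copy objects. The object layout shown here is an assumption for illustration; the real paths under the bucket may differ.

```bash
# List objects in the production bucket (read access required).
gsutil ls -l gs://vaccine-feeds/

# Copy output locally for inspection; the "normalized/" prefix is a
# hypothetical layout, not the documented one.
gsutil cp -r gs://vaccine-feeds/normalized/ ./local-inspect/
```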
If you are developing a feature that interacts with remote storage and need to test against GCS, install the `gcloud` SDK (see the setup instructions) and use the `vaccine-feeds-dev` bucket (you will need to be granted access).
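A typical way to set this up locally, assuming the standard Cloud SDK authentication workflow:

```bash
# Authenticate your user account and application-default credentials.
gcloud auth login
gcloud auth application-default login

# Verify you can reach the dev bucket once access has been granted.
gsutil ls gs://vaccine-feeds-dev/
```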
Results are also periodically committed to `vaccine-feed-ingest-results`.
To load the generated output into a frontend API, the following bash one-liner grabs the most recent normalized output from all runners and concatenates it into one file.
```bash
find out -type f -mtime -1 -exec ls -lt {} + | grep "normalized" | awk '{print $NF}' 2> /dev/null | xargs cat > "$(date +'%Y-%m-%d')_concatenated_events.parsed.normalized.ndjson"
```
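Note that extracting filenames from `ls` output with `awk '{print $NF}'` breaks on paths containing spaces. A sketch of a more robust variant, assuming the same `out/` layout and that normalized files carry "normalized" somewhere in their path:

```bash
# Find normalized files modified in the last day and concatenate them,
# using NUL delimiters so unusual filenames are handled safely.
find out -type f -mtime -1 -path '*normalized*' -print0 \
    | xargs -0 cat > "$(date +'%Y-%m-%d')_concatenated_events.parsed.normalized.ndjson"
```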