create statewide analysis tables in Google BigQuery #31
Comments
Some thoughts, for once we merge #51.

Background

AFAIK the warehouse will have 2 jobs here:
Proposal (in order of priority)

Note that everything below assumes we'd be loading only the most current day (e.g. for that day). That might be the easiest place to start / investigate kicking things off with BigQuery. For keeping the full history, and updating incrementally, we'd need to add something like execution_date to the primary keys listed below.
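To make the execution_date idea concrete, here is a minimal sketch of keeping full history by adding the load date to a table's key. It is an illustration only: SQLite stands in for the warehouse, and the `stops` table, the `feed_id` column, the `load_day` helper, and the sample rows are all hypothetical rather than taken from this project.

```python
import sqlite3

# SQLite is a stand-in for the warehouse here (assumption, not the real setup).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stops (
        feed_id TEXT,          -- hypothetical identifier for the source feed
        stop_id TEXT,
        execution_date TEXT,   -- the day this snapshot was loaded
        stop_name TEXT,
        PRIMARY KEY (feed_id, stop_id, execution_date)
    )
""")


def load_day(conn, execution_date, rows):
    """Append one day's snapshot; earlier days are left untouched."""
    conn.executemany(
        "INSERT INTO stops VALUES (?, ?, ?, ?)",
        [(feed_id, stop_id, execution_date, name) for feed_id, stop_id, name in rows],
    )


# The same stop loaded on two days yields two rows, one per execution_date.
load_day(conn, "2021-01-01", [("agency_a", "1", "Main St")])
load_day(conn, "2021-01-02", [("agency_a", "1", "Main Street")])

history = conn.execute(
    "SELECT execution_date, stop_name FROM stops ORDER BY execution_date"
).fetchall()
```

SQLite enforces the composite key, so reloading the same day twice fails loudly. A warehouse may treat the key as a convention only, in which case a reload would need an explicit delete or overwrite of that day's rows first.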
[not sure if this is helpful/relevant]
@e-lo these are super helpful--thanks! I'll definitely use these repos (and scan their issues) to try and figure out how to load the data / common snags people hit :o.
Going to try loading schedules over the next day. Now that I've dug a bit more into some of the schema formats out there--it seems like taking a two-step strategy is useful. That is.. This GTFS frictionless data schema defines two levels of validation:
A big advantage of separating these out is that we can run validations for step 2 directly inside the warehouse, avoid looping over files, expose the task to analysts, etc.
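As a rough illustration of running a validation inside the warehouse instead of looping over files, the sketch below checks that every `stop_id` in `stop_times` exists in `stops` for the same feed. The thread does not spell out what step 2 covers, so treating it as a cross-table check is an assumption. SQLite stands in for the warehouse, and the `feed_id` column and sample rows are hypothetical.

```python
import sqlite3

# SQLite is a stand-in for the warehouse (assumption); the query is plain SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stops (feed_id TEXT, stop_id TEXT);
    CREATE TABLE stop_times (feed_id TEXT, trip_id TEXT, stop_id TEXT);
    INSERT INTO stops VALUES ('agency_a', '1'), ('agency_a', '2');
    INSERT INTO stop_times VALUES
        ('agency_a', 't1', '1'),
        ('agency_a', 't1', '3');  -- stop 3 is not defined in stops
""")

# One query validates every feed at once: find stop_times rows whose
# stop_id has no matching row in stops for the same feed.
orphans = conn.execute("""
    SELECT st.feed_id, st.trip_id, st.stop_id
    FROM stop_times AS st
    LEFT JOIN stops AS s
      ON s.feed_id = st.feed_id AND s.stop_id = st.stop_id
    WHERE s.stop_id IS NULL
""").fetchall()
```

Because the check is a single query over the combined tables, an analyst could run or adapt it without touching the download code.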
also @machow should we close?
ah, yeah!
After we download the data, we should aggregate each of the GTFS tables into "statewide" tables suitable for analysis:
stops
schedule / stop times
routes
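A minimal sketch of what building one such statewide table could look like: the same GTFS file from each feed is stacked into a single table, with a column recording which feed each row came from. The `combine_feeds` helper, the `feed_id` column, and the sample data are hypothetical, and the feeds are held in memory here rather than read from downloaded files.

```python
import csv
import io


def combine_feeds(feeds):
    """Stack one GTFS table from many feeds, tagging each row with its feed.

    `feeds` maps a hypothetical feed identifier to the text of that feed's
    file (for example stops.txt). Returns a list of dict rows.
    """
    rows = []
    for feed_id, text in feeds.items():
        for row in csv.DictReader(io.StringIO(text)):
            # feed_id keeps rows distinguishable, since ids such as stop_id
            # are only unique within a single feed.
            rows.append({"feed_id": feed_id, **row})
    return rows


feeds = {
    "agency_a": "stop_id,stop_name\n1,Main St\n2,Oak Ave\n",
    "agency_b": "stop_id,stop_name\n1,Harbor Blvd\n",
}
statewide_stops = combine_feeds(feeds)
```

Feeds that include different optional columns would produce rows with different keys, so a real load would likely need to align them to a common schema before writing to the warehouse.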