Research Request - Cache each month's single day speeds data into public GCS bucket #923
Labels
gtfs-rt
Work related to GTFS-Realtime
open-data
Work related to publishing, ingesting open data
research request
Issues that serve as a request for research (summary and handoff)
Complete the sections below when receiving a research request, and continue to add to this issue as you receive additional details and produce deliverables. Be sure to also add the appropriate project-level label to this issue (e.g., gtfs-rt, DLA).
Research Question
Single sentence description: As we start publishing monthly speed-related datasets, we want to prepare for future time-series dataset(s). Should we start by combining the parquets we already have into one GeoDataFrame and gzipping it?
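A minimal sketch of combining the monthly parquets into one table, assuming pandas with a parquet engine such as pyarrow installed. The month labels, file paths, and function names are hypothetical. For files with geometry, geopandas.read_parquet can stand in for pd.read_parquet. Note that compression="gzip" here compresses the parquet column chunks internally rather than wrapping the file in a .gz archive.

```python
import pandas as pd


def combine_monthly(frames: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Stack per-month frames into one, tagging each row with its month label."""
    tagged = [df.assign(month=month) for month, df in frames.items()]
    return pd.concat(tagged, ignore_index=True)


def combine_parquets(paths: dict[str, str], out_path: str) -> pd.DataFrame:
    """Read each month's parquet (hypothetical paths), stack them, and write one gzip-compressed parquet.

    out_path may be a gs:// URL if gcsfs is installed.
    """
    frames = {month: pd.read_parquet(path) for month, path in paths.items()}
    combined = combine_monthly(frames)
    combined.to_parquet(out_path, compression="gzip")
    return combined
```

Keeping a month column on every row means later time-series work can filter or group by month without going back to the per-month files.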
Detailed description:
Save out aggregated weekly average speeds by shape-stop and by route-direction-stop_pair.
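One way to compute the weekly averages, sketched with pandas. The column names (service_date, speed_mph, shape_id, stop_id, route_id, direction_id, stop_pair) are assumptions about the speeds data, and the helper name is hypothetical. Weeks are bucketed with pd.Grouper(freq="W"), which labels each week by its ending Sunday.

```python
import pandas as pd


def weekly_avg(
    df: pd.DataFrame,
    group_cols: list[str],
    date_col: str = "service_date",
    speed_col: str = "speed_mph",
) -> pd.DataFrame:
    """Average speeds per group per calendar week (weeks end on Sunday).

    date_col must be a datetime column for pd.Grouper to bucket by week.
    """
    return (
        df.groupby(group_cols + [pd.Grouper(key=date_col, freq="W")])[speed_col]
        .mean()
        .reset_index()
    )
```

The same helper covers both aggregations, e.g. weekly_avg(speeds, ["shape_id", "stop_id"]) and weekly_avg(speeds, ["route_id", "direction_id", "stop_pair"]).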
How will this research be used?
For time-series datasets, we may want to average speeds across dates for a given time of day (or hour) and a given route-direction.
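For the time-of-day averaging, a sketch that collapses dates into an hourly profile per route-direction. The arrival_time, speed_mph, route_id, and direction_id column names and the helper name are assumptions; swapping the hour for a coarser time-of-day bin (e.g. AM peak) would follow the same pattern.

```python
import pandas as pd


def hourly_profile(
    df: pd.DataFrame,
    time_col: str = "arrival_time",
    speed_col: str = "speed_mph",
) -> pd.DataFrame:
    """Average speed per route-direction and hour of day, pooling all dates."""
    # Extract the hour so observations from different dates fall in the same bucket
    hour = df[time_col].dt.hour.rename("hour")
    return (
        df.groupby(["route_id", "direction_id", hour])[speed_col]
        .mean()
        .reset_index()
    )
```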
shape_id may not be stable from month to month, and even route_id sometimes fails to identify the same route consistently.
Stakeholders & End-Users
Public
Metrics
As with the route_id aggregations in GTFS schedule (scheduled service hours with speeds for each month #806), we may want to do some work on identifying the same route when its route_id changes between months but its route_long_name/short_name/desc columns stay the same (ex: route_id = 720-[some_suffix]). Assume the converse can also happen: route_id stays the same, yet there are small differences in route_long_name/short_name/desc. We need to implement logic so we can uniquely identify a route/direction over time.
Data sources
Use the fct_monthly_routes table to start parsing through the route logic.