Research Request - Cache each month's single day speeds data into public GCS bucket #923

Closed
tiffanychu90 opened this issue Oct 5, 2023 · 1 comment
Labels
gtfs-rt Work related to GTFS-Realtime open-data Work related to publishing, ingesting open data research request Issues that serve as a request for research (summary and handoff)

Comments


tiffanychu90 commented Oct 5, 2023

Complete the sections below when receiving a research request, and continue to add to this issue as you receive additional details and produce deliverables. Be sure to also add the appropriate project-level label to this issue (e.g., gtfs-rt, DLA).

Research Question

Single sentence description: As we start publishing monthly speed-related datasets, we want to prepare for future time-series dataset(s). Start by combining the parquets we have into a single gdf, possibly gzipped?

Detailed description:
Save out weekly average speeds, aggregated by shape-stop and by route-direction-stop_pair.
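
The weekly aggregation described above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's implementation: the row layout and the column names (`service_date`, `shape_id`, `stop_sequence`, `speed_mph`) are assumptions made for the example, and "weekly" is taken to mean ISO calendar weeks. In a pandas workflow the same grouping would be a `groupby` followed by a mean.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: one speed observation per service date and segment.
# Column names are assumptions for illustration only.
rows = [
    {"service_date": date(2023, 10, 2), "shape_id": "s1", "stop_sequence": 1, "speed_mph": 10.0},
    {"service_date": date(2023, 10, 4), "shape_id": "s1", "stop_sequence": 1, "speed_mph": 14.0},
    {"service_date": date(2023, 10, 9), "shape_id": "s1", "stop_sequence": 1, "speed_mph": 20.0},
]


def weekly_average(rows, group_cols, value_col="speed_mph"):
    """Average value_col per (ISO year, ISO week, *group_cols)."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for row in rows:
        iso = row["service_date"].isocalendar()
        key = (iso[0], iso[1]) + tuple(row[c] for c in group_cols)
        totals[key][0] += row[value_col]
        totals[key][1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


# Shape-stop grain: group on shape_id and stop_sequence.
weekly = weekly_average(rows, ["shape_id", "stop_sequence"])
```

The route-direction-stop_pair grain would reuse the same function with a different `group_cols` list, assuming the rows carry those columns.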

How will this research be used?

For time-series datasets, we may want to average speeds across dates for a given time of day or hour and a given route-direction. However, shape_id may not be stable from month to month, and even route_id sometimes isn't stable in identifying the same route.

Stakeholders & End-Users

Public

Metrics

  • From early work looking at monthly route_id aggregations in the GTFS schedule (scheduled service hours with speeds for each month #806), we may want to identify the same route when its route_id changes between months but its route_long_name/short_name/desc columns stay the same (e.g., route_id = 720-[some_suffix]). Assume the converse can also happen: route_id stays the same, yet there are small differences in route_long_name/short_name/desc. We need to implement logic that uniquely identifies a route / direction over time.
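
One possible way to handle both cases (route_id changes while names stay the same, and route_id stays the same while names drift) is to link monthly records that share either a route_id or a normalized name, then give each linked group one label. The sketch below uses a small union-find for this. The record layout, the sample values, and the helper names `normalize` and `assign_route_keys` are all hypothetical, and the matching rule is an assumption rather than settled logic.

```python
def normalize(text):
    """Lowercase and collapse whitespace so small formatting differences match."""
    return " ".join((text or "").lower().split())


def assign_route_keys(records):
    """Link records sharing a route_id OR a normalized name; return one label per record."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)  # keep the earliest record as the label

    first_seen = {}
    for i, rec in enumerate(records):
        name = (normalize(rec["route_short_name"]), normalize(rec["route_long_name"]))
        for token in (("id", rec["route_id"]), ("name", name)):
            if token in first_seen:
                union(i, first_seen[token])
            else:
                first_seen[token] = i
    return [find(i) for i in range(len(records))]


# Hypothetical monthly records for illustration only.
records = [
    {"month": "2023-09", "route_id": "720-a", "route_short_name": "720", "route_long_name": "Example Rapid"},
    {"month": "2023-10", "route_id": "720-b", "route_short_name": "720", "route_long_name": "Example  Rapid"},
    {"month": "2023-11", "route_id": "720-b", "route_short_name": "720", "route_long_name": "Example Rapid Bus"},
    {"month": "2023-11", "route_id": "10", "route_short_name": "10", "route_long_name": "Main St"},
]
keys = assign_route_keys(records)
```

Here the first three records end up with the same label and the fourth stays separate. In practice the linking would likely need to run per operator, and records with empty names would need separate handling so they do not all collapse into one group.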

Data sources

  • Cal-ITP data sources:
      • speeds datasets
      • schedule trips data or the fct_monthly_routes table, to start parsing through route logic

tiffanychu90 added the research request and gtfs-rt labels on Oct 5, 2023
tiffanychu90 added the open-data label on Nov 15, 2023
@tiffanychu90
Member Author

Closing for now: we have the ability to save out a single parquet for multiple months.

Projects
Status: Done