Research Request - Segment-trip speed diagnostics #594

Closed
tiffanychu90 opened this issue Dec 29, 2022 · 1 comment
Assignees
Labels: gtfs-rt (work related to GTFS-Realtime); research request (issues that serve as a request for research: summary and handoff)

Comments

@tiffanychu90
Member

tiffanychu90 commented Dec 29, 2022

Complete the sections below when receiving a research request, and continue to add to this issue as you receive additional details and produce deliverables. Be sure to also add the appropriate project-level label to this issue (e.g., gtfs-rt, DLA).

Research Question

Single sentence description: Exploratory work to better understand issues that arise from using segments. Start from the most granular level, with each row being a segment-trip.

  • Do we need to get rid of unusually low / high speed observations before aggregating to segment-time_of_day?
  • How do we aggregate when there are multiple stops in the segment? This issue will only grow once we move to road segments.
  • Epic - Daskify RT segment speeds #592
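
One way to explore the first question is to compare the segment-time_of_day aggregate with and without a speed cutoff. The sketch below is only illustrative: the column names `segment_id`, `time_of_day`, and `speed_mph`, the 70 mph cutoff, and the sample rows are assumptions, not taken from the project's tables.

```python
import pandas as pd


def aggregate_speeds(df: pd.DataFrame, max_speed_mph: float = 70) -> pd.DataFrame:
    """Aggregate segment-trip rows to segment-time_of_day.

    Rows above max_speed_mph (a hypothetical cutoff) and rows with a
    missing speed are dropped before aggregating.
    """
    kept = df[df["speed_mph"] <= max_speed_mph]
    return (
        kept.groupby(["segment_id", "time_of_day"])["speed_mph"]
        .agg(n_trips="count", p50_mph="median")
        .reset_index()
    )


# Made-up segment-trip rows, one row per segment-trip.
segment_trips = pd.DataFrame(
    {
        "segment_id": ["a", "a", "a", "a"],
        "trip_id": ["t1", "t2", "t3", "t4"],
        "time_of_day": ["AM Peak", "AM Peak", "AM Peak", "Midday"],
        "speed_mph": [10.0, 12.0, 90.0, 20.0],
    }
)

by_segment_time = aggregate_speeds(segment_trips)
```

Using the median rather than the mean makes the aggregate less sensitive to the outliers that survive the cutoff, which may matter if unusually low speeds are kept.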

Detailed description:

(1) How many segments have only 1 point per trip, i.e., are missing either min_time/min_dist or max_time/max_dist?
(2) How many segments have unusually high or low calculated speeds? What's happening here? We will continue to drop unusually high speeds, but do we want to drop unusually low speeds as well?
(3) How are routes that inline / loop handled? Hypothesis: we should see unusually low speeds.
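
Questions (1) and (2) could be approached by adding boolean flags to the segment-trip table and counting them. This is a minimal sketch under stated assumptions: `min_time`, `max_time`, `min_dist`, and `max_dist` come from the description above, while `speed_mph`, the units (seconds and meters), the thresholds, and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical thresholds in mph; not taken from the issue.
LOW_SPEED_MPH = 3
HIGH_SPEED_MPH = 70


def flag_segment_trips(df: pd.DataFrame) -> pd.DataFrame:
    """Add diagnostic flags to a table with one row per segment-trip."""
    out = df.copy()
    endpoint_cols = ["min_time", "min_dist", "max_time", "max_dist"]

    # (1) Only one point: an endpoint is missing, or both endpoints coincide.
    missing_endpoint = out[endpoint_cols].isna().any(axis=1)
    same_point = out["min_time"] == out["max_time"]
    out["one_point"] = missing_endpoint | same_point

    # (2) Unusually low / high speeds; rows with a missing speed are not flagged.
    out["low_speed"] = out["speed_mph"] < LOW_SPEED_MPH
    out["high_speed"] = out["speed_mph"] > HIGH_SPEED_MPH
    return out


# Made-up rows: times in seconds, distances in meters (assumed units).
segment_trips = pd.DataFrame(
    {
        "segment_id": ["a", "a", "b", "b", "c"],
        "trip_id": ["t1", "t2", "t1", "t2", "t1"],
        "min_time": [0.0, 10.0, 5.0, 0.0, 0.0],
        "max_time": [60.0, None, 5.0, 30.0, 300.0],
        "min_dist": [0.0, 0.0, 100.0, 0.0, 0.0],
        "max_dist": [500.0, None, 100.0, 1500.0, 160.0],
        "speed_mph": [18.6, None, None, 111.8, 1.2],
    }
)

flagged = flag_segment_trips(segment_trips)
summary = flagged[["one_point", "low_speed", "high_speed"]].sum()
```

Keeping the flags as columns, rather than filtering right away, leaves room to check hypothesis (3) by cross-tabulating `low_speed` against routes known to inline or loop.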

How will this research be used?

Stakeholders & End-Users

Metrics

Data sources

  • Cal-ITP data sources: GCS
    analysis_date = '2023-01-18' --> shared_utils.rt_dates["jan2023"]
  • partitioned parquets in rt_segment_speeds/speeds_route_segments/.

To read in the partitioned parquets, you'll need to use dd.read_parquet("gs://bucket/folder/speeds_{analysis_date}/"), not pd.read_parquet()
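
A minimal sketch of that read, assuming `dask` is installed along with a GCS filesystem backend such as `gcsfs`. The `gs://bucket/folder` prefix is the placeholder from the note above, and the helper names `speeds_path` and `read_speeds` are hypothetical.

```python
def speeds_path(analysis_date: str, prefix: str = "gs://bucket/folder") -> str:
    """Build the folder path for one analysis date (placeholder prefix)."""
    return f"{prefix}/speeds_{analysis_date}/"


def read_speeds(analysis_date: str, columns=None):
    """Lazily read the partitioned parquets as a dask DataFrame."""
    # Imported here so the path helper works without dask installed.
    import dask.dataframe as dd

    return dd.read_parquet(speeds_path(analysis_date), columns=columns)


path = speeds_path("2023-01-18")
```

The result of `read_speeds` is lazy, so nothing is loaded until `.compute()` is called; passing `columns` limits the read to the fields needed for a given diagnostic.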

  • External data sources:

  • Remaining data source questions:

Deliverables:

Notebooks, tables saved as parquets

Timeline of deliverables:

@tiffanychu90
Member Author

Closing because we're using the "super" project to deal with where this happens. We'll adapt it not only for stop segments, but also use the same approach when we are dealing with road segments.
