Research Request - Use gtfs_segments base nearest neighbors function in pipeline #996

Closed
tiffanychu90 opened this issue Jan 17, 2024 · 0 comments · Fixed by #1005
Labels: gtfs-rt (Work related to GTFS-Realtime), research request (Issues that serve as a request for research (summary and handoff))

Comments

@tiffanychu90
Member

Complete the below when receiving a research request, and continue to add to this issue as you receive additional details and produce deliverables. Be sure to also add the appropriate project-level label to this issue (eg gtfs-rt, DLA).

Research Question

Single sentence description:

Detailed description:

  • After digging into the gtfs_segments package, we will use its create_segments function to get our stop segments. Applied directly in geopandas, create_segments takes quite a bit longer; wrapped with dask's map_partitions, it should run much faster.
  • create_segments also relies on a nearest_points base function, built on scipy.spatial's cKDTree, that we can use more broadly.
  • We can extend the use case to at least 4 areas (any time we want to ask: where are the nearest x points to this other point location?):
  1. Eric's segments are 1,000 m (sometimes between stops), and we can still find the cutoffs within a shape and ask where the nearest vp is. --> Where are the nearest vp to these arbitrary 1,000 m cutoffs?
  2. Stop segments: For all trips along a given shape, if there are variations, one shape variation is selected to stand in for all the trips that travel along that shape. This way, our stop segments are stable. But if there are 2 variations (shape-stop_sequence-stop_id combinations) for a given shape_array_key, only 1 variation is used, so not every stop is reflected for every trip. --> Where are the nearest vp to these more-easily-aggregated segment cutoffs?
  3. Trip stops / rt_stop_times table: We would want to reflect each trip-stop's location and the speed at which the vehicle got there. This might be a contributing factor / conceptual bug in why not all stops are monotonically increasing. --> Where are the nearest vp to every stop for that trip? Get stop arrival, and derive speed.
  4. Road segments: As is, we have trouble parsing out enough vp to traverse a given segment, especially when the bus makes a turn onto a different road. --> Where are the nearest vp to these arbitrary road cutoffs?
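
The common question across the four cases above ("where are the nearest vp to these cutoff points?") can be sketched directly with scipy.spatial.cKDTree. This is a minimal illustration, not the actual gtfs_segments nearest_points implementation: the function name nearest_vp_to_cutoffs, the sample coordinates, and the assumption that all points are already in a projected CRS measured in meters are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def nearest_vp_to_cutoffs(vp_xy, cutoff_xy, k=1):
    """Return (distances, indices) of the k nearest vp for each cutoff point.

    Both inputs are (n, 2) arrays of projected coordinates in meters
    (an assumption; raw lat/lon degrees would distort distances).
    """
    # Build the tree once over the vehicle positions, then query every cutoff.
    tree = cKDTree(np.asarray(vp_xy, dtype=float))
    distances, indices = tree.query(np.asarray(cutoff_xy, dtype=float), k=k)
    return distances, indices


# Hypothetical vp along a roughly straight east-west shape, in meters.
vp_xy = np.array(
    [[0.0, 0.0], [480.0, 5.0], [990.0, -3.0], [1520.0, 2.0], [2010.0, 0.0]]
)
# Arbitrary cutoffs every 1,000 m along the same shape.
cutoff_xy = np.array([[0.0, 0.0], [1000.0, 0.0], [2000.0, 0.0]])

distances, indices = nearest_vp_to_cutoffs(vp_xy, cutoff_xy)
```

The same call would apply whether the cutoffs are 1,000 m segment ends, stop locations, or road segment endpoints; only the cutoff_xy array changes. Note that a plain nearest-neighbor query ignores direction of travel and time order, so results for looping shapes would likely need an extra filter.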

Deliverables

  • Add functions and/or apply functions in scripts to use gtfs_segments.nearest_points in all portions of the analytics pipeline
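
For the trip-stop case in the detailed description (get stop arrival, derive speed, and check that stops are monotonically increasing), one such helper might look like the sketch below. The function names, the inputs (distance of each stop along the shape in meters, arrival time in seconds), and the sample values are all hypothetical and not taken from the pipeline.

```python
def is_monotonic_increasing(values):
    """True if each value is at least as large as the one before it."""
    return all(later >= earlier for earlier, later in zip(values, values[1:]))


def segment_speeds(stop_meters, arrival_sec):
    """Speed in m/s between consecutive stops.

    Returns None for a pair of stops where arrival time does not advance,
    since a speed cannot be derived there.
    """
    speeds = []
    for i in range(1, len(stop_meters)):
        delta_m = stop_meters[i] - stop_meters[i - 1]
        delta_s = arrival_sec[i] - arrival_sec[i - 1]
        speeds.append(delta_m / delta_s if delta_s > 0 else None)
    return speeds


# Hypothetical trip: the last arrival is earlier than the one before it,
# which is the kind of non-monotonic result described above.
stop_meters = [0, 500, 1200, 2000]
arrival_sec = [0, 60, 150, 140]

arrivals_ok = is_monotonic_increasing(arrival_sec)
speeds = segment_speeds(stop_meters, arrival_sec)
```

Flagging the non-monotonic pair rather than computing a negative speed keeps the problem visible for later inspection.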
tiffanychu90 added the gtfs-rt and research request labels Jan 17, 2024
tiffanychu90 self-assigned this Jan 17, 2024
tiffanychu90 linked a pull request Jan 24, 2024 that will close this issue
Projects
Status: Done