Epic - GTFS analytics pipeline performance improvements #1315

Closed
4 of 5 tasks
tiffanychu90 opened this issue Dec 3, 2024 · 0 comments
Labels: epic (research requests: large segments of work and their dependencies), gtfs-rt (work related to GTFS-Realtime)


tiffanychu90 commented Dec 3, 2024

After receiving a research request, use this template to plan and track your work. Be sure to also add the appropriate project-level label to this issue (e.g., gtfs-rt, DLA).

Epic Information - GTFS analytics pipeline performance improvements

Summary

  • Now that the pipeline has mostly settled down, with single-day calculations for 3 types of segments, pause to work on performance improvements.
  • Start by swapping dask.delayed for dask.from_map (dask.dataframe.from_map) in the underlying time_series_utils. This is a good start, but it is unlikely to be the only portion that could benefit.

Issues

  • 1. Research Request - switch dask.delayed to dask.from_map #1299
    • Quarterly and yearly averages built on dask.delayed are too slow, particularly when segment geometries are involved.
  • 2. Remove vp nearest neighbor intermediate output #1319
    • vp gets pre-processed into 2 intermediate outputs before being used for nearest neighbors. Drop the 2nd one and rewrite the nearest neighbor selection to use the 1st intermediate output.
  • 3. Refactor nearest neighbor and interpolation steps #1325
    • Refactor the nearest neighbor steps so that instead of 2 intermediate outputs, we have just 1.
    • Refactor the interpolation step, which is better set up now, to take the new intermediate output and remove extraneous functions.
  • 4. Gtfs funnel performance improvements #1335
    • Refactor gtfs_funnel scripts, particularly ones related to stop_times and vp.
    • Drop the vp_direction script.
  • 5. Post HQTA methodology changes: benchmark the run times to understand performance. Since the monthly pipeline includes so many items, a reasonable goal is that it takes the same amount of time as it did before the methodology swap, or less. Aim for 10 min to start. (This is Eric's task, a somewhat independent portion of the pipeline.)
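
For the benchmarking in item 5, a rough sketch of how per-step times could be recorded and compared against the 10 min target. The step names and the work inside each `with` block are placeholders, not the actual HQTA scripts; only the 10 min figure comes from this issue.

```python
import time
from contextlib import contextmanager

# "Aim for 10 min to start", expressed in seconds.
TARGET_SECONDS = 10 * 60

# Elapsed wall-clock seconds per step, keyed by step name.
timings: dict[str, float] = {}


@contextmanager
def timed(step_name: str):
    """Record how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step_name] = time.perf_counter() - start


def within_target(recorded: dict[str, float], target: float = TARGET_SECONDS) -> bool:
    """True if the summed step times do not exceed the target."""
    return sum(recorded.values()) <= target


# Placeholder steps; swap in the real pipeline calls.
with timed("placeholder_step_1"):
    sum(range(100_000))

with timed("placeholder_step_2"):
    sorted(range(100_000), reverse=True)

for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s")
print(f"total: {sum(timings.values()):.3f} s, within target: {within_target(timings)}")
```

Running this once before and once after the methodology swap would give the comparison the goal asks for.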