After receiving a research request, use this template to plan and track your work. Be sure to also add the appropriate project-level label to this issue (e.g., gtfs-rt, DLA).
Epic Information - GTFS analytics pipeline performance improvements
Summary
Now that the pipeline has mostly settled, with single-day calculations for 3 types of segments, pause to work on performance improvements.
Start by swapping dask.delayed for dask.from_map (in the underlying time_series_utils). This is a good first step, but it is unlikely to be the only portion that could benefit.
Quarterly and yearly averages built on dask.delayed are too slow, particularly when segment geometries are involved.
2. Remove vp nearest neighbor intermediate output #1319: vp is pre-processed into 2 intermediate outputs before being used for nearest neighbors. Drop the 2nd one and rewrite the nearest neighbor selection to use the 1st intermediate output.
3. Refactor nearest neighbor and interpolation steps #1325: refactor the nearest neighbor steps so that there is just 1 intermediate output instead of 2. Refactor the interpolation step, which is better set up now, to take the new intermediate output, and remove extraneous functions.
5. Post HQTA methodology changes: benchmark the times to understand performance. Since the monthly pipeline includes so many items, a reasonable goal is that it takes the same amount of time as it did before the methodology swap, or less. Aim for 10 min to start. (This is Eric's task, a somewhat independent portion of the pipeline.)
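For the benchmarking item, one possible way to record per-step wall-clock times and compare the total against the 10 min starting target is sketched below. The step name and the `run_hypothetical_step` function are placeholders, not actual pipeline scripts; only the 10 minute figure comes from the issue.

```python
import time
from contextlib import contextmanager

# Starting target taken from the issue text: "Aim for 10 min to start."
TARGET_SECONDS = 10 * 60


@contextmanager
def timed(step_name: str, results: dict):
    """Record the wall-clock duration of the enclosed block in `results`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[step_name] = time.perf_counter() - start


def run_hypothetical_step() -> int:
    """Placeholder workload standing in for one pipeline script."""
    return sum(i * i for i in range(100_000))


timings: dict = {}
with timed("hypothetical_step", timings):
    run_hypothetical_step()

total_seconds = sum(timings.values())
within_target = total_seconds <= TARGET_SECONDS

for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s")
print(f"total: {total_seconds:.3f} s, within target: {within_target}")
```

Collecting timings in a plain dictionary keeps the before and after runs easy to compare; the same helper could wrap each script in the monthly run to see which step dominates.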
tiffanychu90 added the epic (Representing research requests - large segments of work and their dependencies) and gtfs-rt (Work related to GTFS-Realtime) labels on Dec 3, 2024
Issues
dask.delayed to dask.from_map #1299
gtfs_funnel scripts, particularly ones related to stop_times and vp. Drop vp_direction script.