Align speed scripts part 2 / Jul open data part 2 #1192

Merged: 12 commits, Jul 31, 2024
15 changes: 12 additions & 3 deletions gtfs_funnel/README.md
@@ -4,8 +4,10 @@ Use `update_vars` and input one or several days to download.

 1. **Schedule data**: download data for [trips](./download_trips.py), [stops](./download_stops.py), [shapes](./download_shapes.py), and [stop times](./download_stop_times.py) and cache parquets in GCS
 1. **Vehicle positions data**: download [RT vehicle positions](./download_vehicle_positions.py)
-1. Use the `Makefile` and download schedule and RT data. In terminal: `make download_gtfs_data`
-1. Other preprocessing steps for schedule and vp: `make preprocess`
+1. Use the [Makefile](./Makefile) and download schedule and RT data. In terminal: `make download_gtfs_data`
+1. Other preprocessing steps for schedule and vp: `make preprocess_schedule_vp_dependency`, `make preprocess_vp`, and `make preprocess_schedule_only`
+    * Preprocessing batches are grouped modularly to separate schedule-only and vp-only steps, so the two large workstreams can move independently.
+    * `make preprocess_schedule_vp_dependency` calls out the specific preprocessing steps where RT data depends on schedule preprocessing (such as shared crosswalks and additional `stop_times` transformations).
 1. Harmonize how route names are displayed in time-series: `make timeseries_preprocessing`
 1. Download monthly aggregated schedule data: `make monthly_scheduled_data`

@@ -16,4 +18,11 @@ Use `update_vars` and input one or several days to download.
[![rt_vs_sched_mermaid](https://mermaid.ink/img/pako:eNqFVG1vmzAQ_iuWpyidlEQhIU1KpUlNm2zqlnZao30YVMiBA7yCjWzTllb57zNQCqjNygewfM9z99wL94w97gO28HA4dJiiKgYL_dqie4luvAj8LAa0ASWoJx1WYnq9Z4chRBlVFiqPCPVVBAn0LdTfEQn9Qfv2NxGU7GKQ_Ve4NqWCJkTk5zzmouB9Ws3Ws_VZTW0QW3hUDWo8Hr-FLLnwQRwCxZTBIZsEjzO_q2O9nq-WLYwCoWgHEgRBvzLvi49-7Xs9hzksiPmDFxGh0HZZAWS2CwVJIyRfilm7RciLiZQXECA_QAGNY-ulBO8gpCdoqmpUKbAIV-PObN2fVN5alqV9DYdfUGMrntWRzVMQRHHh1jp8Vyqi5CjNbz9rXhXhtOEtbRmRFNpOVy3zuS0VTw9ZL0qrq2hy0MHaDlUg3TvIXS5CwugTUZQz1xNcygcS33V4rYRW5dUPu04EFbmjpBrRVxIiYSggJAreqcemIQueKXB9KsArwnf9nNZFBubXx9eGXp2db68rPlJ5qocjpCBbNShUyFRnRWL0l1OGFEe7LAhAgK95xJfvSPtqFxZXQpgAU4db-u3IrqQ3obutLDhdyqVdBX2gKmopbkrWhXekl5yYsxCkQuVgID1QL9m_ifS90taJlDepbE4R-l9l7yGinu5NyiUtmiLbs35l36duJouFcrA41xrz4Vh0KYWTnwXto4GoCS359RUe4AREQqiv12m56RxcbkAHW_roQ0CyWDlYrwsNJZniNznzsKVEBgOcpb6WdUGJLkKCrYDEUt-CT_Vfu6lWdLmpBzglDFvP-BFbw8VkND2ZTk7M6XwxPZkY5gDn2JoejwxztjCPTdOYjRemsR_gJ861V2M0MfQGHBvz6VSjzblZuvtTGsuQ-39p7tDM?type=png)](https://mermaid.live/edit#pako:eNqFVG1vmzAQ_iuWpyidlEQhIU1KpUlNm2zqlnZao30YVMiBA7yCjWzTllb57zNQCqjNygewfM9z99wL94w97gO28HA4dJiiKgYL_dqie4luvAj8LAa0ASWoJx1WYnq9Z4chRBlVFiqPCPVVBAn0LdTfEQn9Qfv2NxGU7GKQ_Ve4NqWCJkTk5zzmouB9Ws3Ws_VZTW0QW3hUDWo8Hr-FLLnwQRwCxZTBIZsEjzO_q2O9nq-WLYwCoWgHEgRBvzLvi49-7Xs9hzksiPmDFxGh0HZZAWS2CwVJIyRfilm7RciLiZQXECA_QAGNY-ulBO8gpCdoqmpUKbAIV-PObN2fVN5alqV9DYdfUGMrntWRzVMQRHHh1jp8Vyqi5CjNbz9rXhXhtOEtbRmRFNpOVy3zuS0VTw9ZL0qrq2hy0MHaDlUg3TvIXS5CwugTUZQz1xNcygcS33V4rYRW5dUPu04EFbmjpBrRVxIiYSggJAreqcemIQueKXB9KsArwnf9nNZFBubXx9eGXp2db68rPlJ5qocjpCBbNShUyFRnRWL0l1OGFEe7LAhAgK95xJfvSPtqFxZXQpgAU4db-u3IrqQ3obutLDhdyqVdBX2gKmopbkrWhXekl5yYsxCkQuVgID1QL9m_ifS90taJlDepbE4R-l9l7yGinu5NyiUtmiLbs35l36duJouFcrA41xrz4Vh0KYWTnwXto4GoCS359RUe4AREQqiv12m56RxcbkAHW_roQ0CyWDlYrwsNJZniNznzsKVEBgOcpb6WdUGJLkKCrYDEUt-CT_Vfu6lWdLmpBzglDFvP-BFbw8VkND2ZTk7M6XwxPZkY5gDn2JoejwxztjCPTdOYjRemsR_gJ861V2M0MfQGHBvz6VSjzblZuvtTGsuQ-39p7tDM)

## Analytic Grains
[![mermaid_grains](https://mermaid.ink/img/pako:eNqNVttu4zYQ_RWCReAUsAJJtmtHDwWSdeQ-tECRDYqgsrGgxZFNVCJVksquN8i_lxTlmPKlWT_YFufMmTPD4VCvOBcUcIKDIFhyzXQJCVo8pZ_RHSflTrNcoT9ZDSXjgBaSMK6WvMVeXb0uOUKMM52g9i9CA72FCgYJGqyJgsHQX_2LSEbWJajBO9yYaskqInefRCmk9fvpYZJO0ru96wHxBN_0ARWG4SnkXkgK8hLIZnDJpiAXnPZ1pOn04d7DaJCa9SBFUQyc-c3-mK-3q6slX_KiFF_zLZEa_f7oAHlJlJpDgSTkGhWsLJMu0yN7zmRewh7RajjDEHeA9DYNH2Ib0kJUs95IUm-RYnxjSCjZOcM-B4TuMpVvgTYlrFAQ_IruokyKRkNAmeVlgq88sIPEmahBEi1k33ZjjaNMaVGv9hLs5z57gS2zWdRCMcupXLD7KFOwqYBrpGoAqg4-RyKCDtfnjToWcp2pLakhsKFRB_XI7PKX_fLq5yRJuqr6XKQl-xSRjLyY7DbwsSZLWxMmV4ax3YUD9ozK9XVGyhJpyepLQh-fWq1Iswp8oT7butO5znw00vYo_ZiQ3JTLplaRgwYPF_1TXQidd6HzoxKhD_ftC6OetjOa6LXpO0LP1CRCF_XQTg-9uGX2jJsxQ4NKF3n-rkbBvw3w_IN6OWmmoZvKTpMPk135znFve9TR_nTUfinm1uFFof55nP_feZw7SOyOnHsYHdH4DeXL69B7BcDpydSwTqZkkoHyhaZnFHnWljiNskpwvY0CM3MiJ-3RtJ0mdqpS9h1c8RAnlzq9I4odUewT-fEeHWx0LKqv_tBMlJioZnyhlvZsUy7OzbeFm2-LKLMH2ApBZLORsLF8Wji6VstidOoVn9Tsxwnala6cXoe5sriK_PauGNVSmKsAThI7t8EthZ_589Ft8Nx1n1eHZ2fo3QFeBITwEFcgK8KoeYlo7_Ulbu_7JU7MXwoFaUq9xOZyNFDSaPF5x3OcaNnAEJtwmy1OClIq89TUdrvmjBi91ftqTfjfQvSecfKKv-FkEt6MJ-PpLJrFt-E0vp0O8Q4ncRzdjEaTX6bhZDqehrPJ2xB_bwnM-ng6uh3P4nEYz8JoEg0xUGay-sO9BLXvQm__Ae0gtOI?type=png)](https://mermaid.live/edit#pako:eNqNVttu4zYQ_RWCReAUsAJJtmtHDwWSdeQ-tECRDYqgsrGgxZFNVCJVksquN8i_lxTlmPKlWT_YFufMmTPD4VCvOBcUcIKDIFhyzXQJCVo8pZ_RHSflTrNcoT9ZDSXjgBaSMK6WvMVeXb0uOUKMM52g9i9CA72FCgYJGqyJgsHQX_2LSEbWJajBO9yYaskqInefRCmk9fvpYZJO0ru96wHxBN_0ARWG4SnkXkgK8hLIZnDJpiAXnPZ1pOn04d7DaJCa9SBFUQyc-c3-mK-3q6slX_KiFF_zLZEa_f7oAHlJlJpDgSTkGhWsLJMu0yN7zmRewh7RajjDEHeA9DYNH2Ib0kJUs95IUm-RYnxjSCjZOcM-B4TuMpVvgTYlrFAQ_IruokyKRkNAmeVlgq88sIPEmahBEi1k33ZjjaNMaVGv9hLs5z57gS2zWdRCMcupXLD7KFOwqYBrpGoAqg4-RyKCDtfnjToWcp2pLakhsKFRB_XI7PKX_fLq5yRJuqr6XKQl-xSRjLyY7DbwsSZLWxMmV4ax3YUD9ozK9XVGyhJpyepLQh-fWq1Iswp8oT7butO5znw00vYo_ZiQ3JTLplaRgwYPF_1TXQidd6HzoxKhD_ftC6OetjOa6LXpO0LP1CRCF_XQTg-9uGX2jJsxQ4NKF3n-rkbBvw3w_IN6OWmmoZvKTpMPk135znFve9TR_nTUfinm1uFFof55nP_feZw7SOyOnHsYHdH4DeXL69B7BcDpydSwTqZkko
HyhaZnFHnWljiNskpwvY0CM3MiJ-3RtJ0mdqpS9h1c8RAnlzq9I4odUewT-fEeHWx0LKqv_tBMlJioZnyhlvZsUy7OzbeFm2-LKLMH2ApBZLORsLF8Wji6VstidOoVn9Tsxwnala6cXoe5sriK_PauGNVSmKsAThI7t8EthZ_589Ft8Nx1n1eHZ2fo3QFeBITwEFcgK8KoeYlo7_Ulbu_7JU7MXwoFaUq9xOZyNFDSaPF5x3OcaNnAEJtwmy1OClIq89TUdrvmjBi91ftqTfjfQvSecfKKv-FkEt6MJ-PpLJrFt-E0vp0O8Q4ncRzdjEaTX6bhZDqehrPJ2xB_bwnM-ng6uh3P4nEYz8JoEg0xUGay-sO9BLXvQm__Ae0gtOI)


| | | pipeline and workstream outputs available |
|---|---|---|
| Sampled Wednesdays Each Month for Time-Series<br>[rt_dates.py](../_shared_utils/shared_utils/rt_dates.py) | Mar 2023 - present | downloaded schedule tables (trips, shapes, stops, stop_times)<br>downloaded vehicle positions (vp)<br><br>`gtfs_funnel`: intermediate outputs for schedule and vp<br>* crosswalk<br>* schedule only metrics related to service availability<br>* operator aggregated metrics from schedule data<br>* route typologies<br><br><br>`rt_segment_speeds`: vp interpreted as speeds against <br>various segment types<br>* segment types: <br>(1) `stop segments` (shape-stop segments,<br>most common shape selected for a route-direction and all <br>trips aggregated to that shape)<br>(2) `rt_stop_times` (trip-stop segments, most granular, <br>cannot be aggregated, but used for rt_stop_times table)<br>(3) `speedmap segments`<br>(4) `road segments` (1 km road segments with all <br>transit across operators aggregated to the same physical <br>road space, currently WIP)<br>* interpolated stop arrivals <br>* speeds by trip<br>* segment and summary speeds for single day<br><br>`rt_vs_schedule`: <br>* RT vs schedule metrics<br>* rt_stop_times table (companion to scheduled stop_times)<br><br>`gtfs_digest`:<br>* downstream data product using all the outputs created in <br>gtfs_funnel, rt_segment_speeds, rt_vs_schedule. |
| Full Week for Weekly Averages<br>April / October each year | Apr 2023<br>Oct 2023<br>Apr 2024 | rt_segment_speeds:<br>* segment and summary speeds for a week<br><br>gtfs_digest<br>* service hours by hour for weekday / Saturday / Sunday |
| | | |
16 changes: 10 additions & 6 deletions gtfs_funnel/update_vars.py
@@ -1,12 +1,16 @@
 from shared_utils import catalog_utils, rt_dates

-all_dates = (rt_dates.y2024_dates + rt_dates.y2023_dates +
-             rt_dates.oct2023_week + rt_dates.apr2023_week +
-             rt_dates.apr2024_week
-            )
+apr2024_week = rt_dates.get_week("apr2024", exclude_wed=True)
+oct2023_week = rt_dates.get_week("oct2023", exclude_wed=True)
+apr2023_week = rt_dates.get_week("apr2023", exclude_wed=True)
+
+all_dates = (
+    rt_dates.y2024_dates + rt_dates.y2023_dates +
+    oct2023_week + apr2023_week +
+    apr2024_week
+)
+

-apr_week = rt_dates.get_week("apr2024", exclude_wed=True)
-
 analysis_date_list = [rt_dates.DATES["jul2024"]]

 GTFS_DATA_DICT = catalog_utils.get_catalog("gtfs_analytics_data")
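
The change above builds `all_dates` by concatenating plain lists of dates, with each full week fetched once and Wednesday excluded, presumably because the sampled Wednesday is already in the yearly lists. The sketch below illustrates that idea only. `week_around`, its Monday-to-Sunday convention, and the example dates are assumptions for illustration; the real `rt_dates.get_week` is not shown in this diff and may behave differently.

```python
from datetime import date, timedelta


def week_around(wednesday: date, exclude_wed: bool = False) -> list[str]:
    """Hypothetical stand-in for a get_week-style helper.

    Returns ISO date strings for the Monday-Sunday week containing
    `wednesday`, optionally dropping the Wednesday itself.
    """
    monday = wednesday - timedelta(days=wednesday.weekday())
    days = [monday + timedelta(days=i) for i in range(7)]
    if exclude_wed:
        # weekday() == 2 is Wednesday
        days = [d for d in days if d.weekday() != 2]
    return [d.isoformat() for d in days]


# Illustrative dates only; the actual sampled dates live in rt_dates.
sampled_wednesdays = ["2024-04-17", "2024-07-17"]
apr2024_week = week_around(date(2024, 4, 17), exclude_wed=True)

# Plain list concatenation, as in the diff. Excluding Wednesday from the
# week avoids listing the sampled Wednesday twice.
all_dates = sampled_wednesdays + apr2024_week
```

With Wednesday excluded, the week contributes six dates and `all_dates` contains no duplicates.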
1 change: 1 addition & 0 deletions rt_scheduled_v_ran/logs/rt_v_scheduled_route_metrics.log
@@ -65,3 +65,4 @@
 2024-06-10 13:45:35.950 | INFO | __main__:route_metrics:74 - route aggregation 2024-05-26: 0:00:02.558000
 2024-06-13 17:43:27.638 | INFO | __main__:route_metrics:74 - route aggregation 2024-05-22: 0:00:03.033399
 2024-06-13 18:22:48.882 | INFO | __main__:route_metrics:74 - route aggregation 2024-06-12: 0:00:02.676141
+2024-07-31 11:52:22.880 | INFO | __main__:route_metrics:74 - route aggregation 2024-07-17: 0:00:02.719825
3 changes: 3 additions & 0 deletions rt_scheduled_v_ran/logs/rt_v_scheduled_trip_metrics.log
@@ -444,3 +444,6 @@
 2024-06-13 17:16:24.597 | INFO | __main__:rt_schedule_trip_metrics:285 - spatial trip metrics 2024-05-22: 0:34:06.386767
 2024-06-13 17:18:34.697 | INFO | __main__:rt_schedule_trip_metrics:333 - Total run time for metrics on 2024-05-22: 0:40:08.995180
 2024-06-13 18:00:39.971 | INFO | __main__:rt_schedule_trip_metrics:280 - tabular trip metrics 2024-06-12: 0:02:50.176180
+2024-07-31 11:27:35.394 | INFO | __main__:rt_schedule_trip_metrics:280 - tabular trip metrics 2024-07-17: 0:02:44.065009
+2024-07-31 11:50:37.358 | INFO | __main__:rt_schedule_trip_metrics:285 - spatial trip metrics 2024-07-17: 0:23:01.963503
+2024-07-31 11:51:59.676 | INFO | __main__:rt_schedule_trip_metrics:333 - Total run time for metrics on 2024-07-17: 0:27:08.346659
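
To compare step timings across runs, the log lines above can be parsed into `timedelta` values. This is a minimal sketch that assumes every line follows the `timestamp | level | source - step analysis_date: elapsed` layout seen here; the regular expression and the `parse_elapsed` helper are illustrative and not part of the repository.

```python
import re
from datetime import timedelta

# Pattern inferred from the log lines shown in this diff (an assumption).
LOG_LINE = re.compile(
    r"^(?P<logged_at>\S+ \S+)\s*\|\s*(?P<level>\w+)\s*\|\s*(?P<source>\S+) - "
    r"(?P<step>.+?) (?P<analysis_date>\d{4}-\d{2}-\d{2}): "
    r"(?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})(?:\.(?P<frac>\d{1,6}))?$"
)


def parse_elapsed(line: str) -> tuple[str, str, timedelta]:
    """Return (step name, analysis date, elapsed time) for one log line."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    # Pad the fractional part to microseconds; empty when absent.
    frac = (match["frac"] or "").ljust(6, "0")
    elapsed = timedelta(
        hours=int(match["h"]),
        minutes=int(match["m"]),
        seconds=int(match["s"]),
        microseconds=int(frac),
    )
    return match["step"], match["analysis_date"], elapsed


lines = [
    "2024-07-31 11:27:35.394 | INFO | __main__:rt_schedule_trip_metrics:280 - tabular trip metrics 2024-07-17: 0:02:44.065009",
    "2024-07-31 11:50:37.358 | INFO | __main__:rt_schedule_trip_metrics:285 - spatial trip metrics 2024-07-17: 0:23:01.963503",
    "2024-07-31 11:51:59.676 | INFO | __main__:rt_schedule_trip_metrics:333 - Total run time for metrics on 2024-07-17: 0:27:08.346659",
]

timings = {step: elapsed for step, _, elapsed in map(parse_elapsed, lines)}

# Time in the total that is not covered by the two timed steps.
untimed = timings["Total run time for metrics on"] - (
    timings["tabular trip metrics"] + timings["spatial trip metrics"]
)
```

For the 2024-07-17 run, the spatial step accounts for most of the total, and about 1 minute 22 seconds falls outside the two timed steps.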
18 changes: 10 additions & 8 deletions rt_scheduled_v_ran/scripts/rt_stop_times.py
@@ -44,7 +44,8 @@ def prep_scheduled_stop_times(


 def prep_rt_stop_times(
-    analysis_date: str
+    analysis_date: str,
+    trip_stop_cols: list
 ) -> pd.DataFrame:
     """
     For RT stop arrivals, drop duplicates based on interpolated
@@ -55,12 +56,11 @@ def prep_rt_stop_times(

     df = pd.read_parquet(
         f"{SEGMENT_GCS}{STOP_ARRIVALS}_{analysis_date}.parquet",
-        columns = ["trip_instance_key", "stop_sequence", "stop_id",
-                   "arrival_time"]
+        columns = trip_stop_cols + ["arrival_time"]
     ).rename(columns = {"arrival_time": "rt_arrival"})

     df2 = df.sort_values(
-        ["trip_instance_key", "stop_sequence"]
+        trip_stop_cols
     ).drop_duplicates(
         subset=["trip_instance_key", "rt_arrival"]
     ).reset_index(drop=True)
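
The sort-then-deduplicate step above keeps, for each trip, only the first stop that shares a given interpolated arrival time. A small sketch on toy data shows the effect. The column order in `trip_stop_cols` and all values are assumptions; the real list comes from the catalog and the real arrivals from the stop arrivals parquet.

```python
import pandas as pd

# Assumed contents and order; the actual list is read from the catalog.
trip_stop_cols = ["trip_instance_key", "stop_sequence", "stop_id"]

# Toy RT arrivals: trip "a" has two stops sharing the same arrival time.
df = pd.DataFrame({
    "trip_instance_key": ["a", "a", "a", "b"],
    "stop_sequence": [3, 1, 2, 1],
    "stop_id": ["s3", "s1", "s2", "s1"],
    "rt_arrival": [120, 100, 100, 100],
})

df2 = (
    df.sort_values(trip_stop_cols)
    # keep="first" is the default, so the earliest stop_sequence survives
    .drop_duplicates(subset=["trip_instance_key", "rt_arrival"])
    .reset_index(drop=True)
)
```

Because the frame is sorted before `drop_duplicates`, the surviving row for a repeated arrival time is the one with the lowest `stop_sequence` within that trip. Trip "b" keeps its row even though its arrival value matches trip "a", since the subset includes `trip_instance_key`.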
@@ -73,19 +73,20 @@ def prep_rt_stop_times(


 def assemble_scheduled_rt_stop_times(
-    analysis_date: str
+    analysis_date: str,
+    trip_stop_cols: list
 ) -> pd.DataFrame:
     """
     Merge scheduled and rt stop times so we can compare
     scheduled arrival (seconds) and RT arrival (seconds).
     """
     sched_stop_times = prep_scheduled_stop_times(analysis_date)
-    rt_stop_times = prep_rt_stop_times(analysis_date)
+    rt_stop_times = prep_rt_stop_times(analysis_date, trip_stop_cols)

     df = pd.merge(
         sched_stop_times,
         rt_stop_times,
-        on = ["trip_instance_key", "stop_sequence", "stop_id"],
+        on = trip_stop_cols,
         how = "inner"
     )
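
The inner merge above keeps only trip-stop rows present in both the scheduled and RT tables. The sketch below shows that behavior on toy frames, assuming `trip_stop_cols` holds the three key columns from the replaced line. The value column names (`scheduled_arrival_sec`, `rt_arrival_sec`, `delay_sec`) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Assumed to match the hard-coded keys that this diff replaces.
trip_stop_cols = ["trip_instance_key", "stop_sequence", "stop_id"]

sched = pd.DataFrame({
    "trip_instance_key": ["a", "a", "b"],
    "stop_sequence": [1, 2, 1],
    "stop_id": ["s1", "s2", "s1"],
    "scheduled_arrival_sec": [100, 200, 150],
})

# The RT table is missing trip "a", stop 2.
rt = pd.DataFrame({
    "trip_instance_key": ["a", "b"],
    "stop_sequence": [1, 1],
    "stop_id": ["s1", "s1"],
    "rt_arrival_sec": [130, 140],
})

merged = pd.merge(sched, rt, on=trip_stop_cols, how="inner")

# With both arrivals in seconds, the difference is directly comparable.
merged["delay_sec"] = merged["rt_arrival_sec"] - merged["scheduled_arrival_sec"]
```

The scheduled stop without an RT arrival drops out of the result, which is the trade-off of `how = "inner"`: only stops with both values can be compared.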

@@ -97,12 +98,13 @@ def assemble_scheduled_rt_stop_times(
     from update_vars import analysis_date_list

     EXPORT_FILE = GTFS_DATA_DICT.rt_vs_schedule_tables.schedule_rt_stop_times
+    trip_stop_cols = [*GTFS_DATA_DICT.rt_stop_times.trip_stop_cols]

     for analysis_date in analysis_date_list:

         start = datetime.datetime.now()

-        df = assemble_scheduled_rt_stop_times(analysis_date)
+        df = assemble_scheduled_rt_stop_times(analysis_date, trip_stop_cols)

         df.to_parquet(f"{RT_SCHED_GCS}{EXPORT_FILE}_{analysis_date}.parquet")
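
The `[*...]` unpacking copies the catalog entry into a built-in list, which is what makes `trip_stop_cols + ["arrival_time"]` work in `prep_rt_stop_times`. A plausible reason, not confirmed by this diff, is that the catalog returns a list-like config object rather than a plain `list`. The sketch below uses a hypothetical `CatalogList` class to stand in for such an object.

```python
from collections.abc import Sequence


class CatalogList(Sequence):
    """Hypothetical list-like config value that is not a built-in list."""

    def __init__(self, items):
        self._items = tuple(items)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)


# Assumed contents; the real values come from the catalog.
catalog_cols = CatalogList(["trip_instance_key", "stop_sequence", "stop_id"])

# Unpacking iterates the object and builds a plain list.
trip_stop_cols = [*catalog_cols]

# A plain list supports concatenation with another list.
read_cols = trip_stop_cols + ["arrival_time"]
```

Concatenating the stand-in object directly with a list would raise a `TypeError`, because `Sequence` does not define `+`; converting once at the top of the script avoids that in every downstream function.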

4 changes: 2 additions & 2 deletions rt_scheduled_v_ran/scripts/update_vars.py
@@ -6,8 +6,8 @@
 apr2024_week = rt_dates.get_week("apr2024", exclude_wed=True)

 analysis_date_list = [
-    rt_dates.DATES["jun2024"]
-]
+    rt_dates.DATES["jul2024"]]
+

 GTFS_DATA_DICT = catalog_utils.get_catalog("gtfs_analytics_data")
