This tool is a Python wrapper around Open Trip Planner (OTP) v2 which produces cost metrics for public transport routing between given positions in Great Britain.
This is an updated version of OTP4GB-py produced by Open Innovations (available at github.com/open-innovations/otp4gb-py), which itself is a port of the OTP4GB tool written by Tom Forth at Open Innovations.
This flowchart details the full OTP4GB-Py routing and cost calculation process. The process is split into two steps prepare and routing which correspond to the Prepare and Process scripts, which are discussed in the respective sections below.
In order to use the OTP4GB-Py tool, please clone, or download, this repository to your local machine. The OTP4GB-Py process requires both Java and Python to be installed in order to run, this section outlines the installation of both pieces of software and some additional dependancies.
Java Standard Edition Development Kit (JDK) is required for running the OTP server. JDK installers are available for Windows, Mac and Linux1 from Oracle's website.
Once installed you can check the installation has completed correctly and you have the correct
version using the command java --version
. If this command fails make sure JDK has been added to
the PATH environment variable.
This section outlines the installation of Python and the installation of the dependancies required for OTP4GB-Py. For this tool the recommended method for installing Python and it's dependancies would be to use Miniconda from the Anaconda Prompt, although Anconda Navigator or Python's built-in pip could be used. Miniconda installers are avaiable from the conda website, the 64-bit version should be used.
Once installed, run "Anaconda Prompt (Miniconda3)" to open a command prompt with access to the conda commands then run the following commands.
- Move to the OTP4GB-Py directory:
cd "path\to\otp4gb-py"
- Create a new conda environment and install the dependancies (this may take a few minutes to
complete):
conda create -n otp4gb-py --file requirements.txt -y -c conda-forge
- Activate the environment:
conda activate otp4gb-py
Note: All future commands discussed should be ran from the "Anaconda Prompt" after moving to the OTP4GB-py directory and activating the "otp4gb-py" environment (see image below for what this should look like).
Setting up some additional requirements for running OTP4GB-Py can be done with the setup.py
Python script, which is provided with the tool. Running setup is done with the command
python setup.py
2, which will perform the following steps (some of which require user input).
- Firstly, the script will open a web browser and ask the user to download the Open Street Map (OSM) converter (osmconvert64.exe) and place it in the "bin" directory. The existance of this file in the correct location will be checked before the script moves on.
- Next the script automatically downloads and installs Maven into the "bin" folder.
- Finally, it will use maven to install GTFS filter and OTP.
After the setup script has finished the bin folder should contain the following files and folders:
- apache-maven-3.8.6: folder containing the maven installation
- lib: java dependancies
- gtfs-filter-0.1.jar: GTFS filter Java executable file
- otp-2.3.0-shaded.jar: OTP server Java executable file
- osmconvert64: OSM converter executable file
The final requirement before running the tool is to fill the "assets" directory with all required input files. This folder should contain the following:
- GTFS zip files for all dates, times, modes and areas.
- Centroids CSV: CSV file containing all the zone centroids for the routing, with the
following columns:
- zone_id
- zone_name
- zone_system
- latitude
- longitude
- OSM data file for area of interest, information on downloading this file is available on the OSM Wiki.
This section outlines the scripts involved with running the process and outputting the cost metrics and generalised cost matrix. Before running the process a new folder should be created for storing the intemediary files and the outputs, this should also contain the OTP4GB-Py configuration file.
The config file should be named "config.yml" and uses the YAML format, an example config is provided and the parameters for the it are outlined in table 1.
Table 1: Details the parameters available in the config file and their default values.
Parameter | Sub-Parameter | Type | Default | Description |
---|---|---|---|---|
date | - | Date (YYYY-MM-DD) | N/A | Travel date on which the routing should be calculated for. |
end_date | - | Date (YYYY-MM-DD) | same value as 'date' parameter | Optional End date for construction of the OTP configuration- allows one OTP config for multiple travel dates |
extents | min_lat | Real | N/A | Latitude and longitude bounds for the area of interest. |
^ | min_lon | Real | N/A | ^ |
^ | max_lat | Real | N/A | ^ |
^ | max_lon | Real | N/A | ^ |
osm_file | - | File name | N/A | Name of the OSM data file, expected to be found in the "assets" folder. |
gtfs_files | - | List of file names | N/A | List of all the GTFS zip files, should be in the "assets" folder. |
time_periods | name | Text | N/A | List of time periods with sub-parameter name used for naming the output folder. |
^ | travel_time | Time (HH:MM) | N/A | Time routes must arrive at the destination by. |
^ | search_window_minutes | Integer | Calculated based on route availability | Maximum travel duration. |
^ | search_window_minutes_min | Integer | Calculated based on route availability | Optional Minimum travel duration. Default = IsochroneConfiguration.step_minutes value |
modes | - | List of lists of modes (TRANSIT, BUS, RAIL, TRAM, WALK, BICYCLE) | N/A | List of the multiple modes to be considered for routing, each item is list of modes for a single matrix output. |
centroids | - | File name | N/A | Name of the zone centroids file, should be found in the "assets" folder. |
destination_centroids | - | File name | N/A | Optional name of the centroids file for the zone destinations (if they differ). |
generalised_cost_factors | wait_time | Real | 1.0 | Factors for applying to the different metrics used when calculating the generalised cost. |
^ | transfer_number | Real | 1.0 | ^ |
^ | walk_time | Real | 1.0 | ^ |
^ | transit_time | Real | 1.0 | ^ |
^ | walk_distance | Real | 1.0 | ^ |
^ | transit_distance | Real | 1.0 | ^ |
iterinary_aggregation_method | - | MEAN or MEDIAN | MEAN | Method for averaging multiple route itineraries into a single cost metric for each OD pair. |
max_walk_distance | - | Integer | 2500 | Maximum walking distance allowed on a single route. |
number_of_threads | - | 0 - 10 | 0 | Number of threads to use when running the Python process, doesn't apply to the OTP Java server. |
no_server | - | Boolean | False | Turns off starting up a new OTP Java server, if this is already running. |
hostname | - | Text | localhost | Hostname of the OTP server to connect to (or start). |
port | - | Integer | 8080 | Port number of the OTP server to connect to (or start). Secure port is set to this value plus one. |
crowfly_max_distance | - | Integer | None | Optional maximum distance which filters what OD pairs will have routing calculations done. |
ruc_lookup | path | File path | None | Optional zone rural urban classification lookup, used for adjusting crowfly_max_distance for specific zones. |
^ | id_column | Text | 'zone_id' | Name of column containing zone IDs in ruc_lookup . |
^ | ruc_column | Text | 'ruc' | Name of column containing rural-urban classification codes in ruc_lookup . |
irrelevant_destinations | path | File path | None | Optional CSV containing list of zones which will exclude any OD pairs if the zone is the destination. |
^ | zone_column | Text | 'zone_id' | Name of column containing zone IDs in irrelevant_destinations . |
This section discusses the prepare Python script, before this make sure the dependencies are installed (see Preparation and Dependencies) and the output folder with a config file has been created (see Running OTP4GB-Py). This script will prepare the output folder by filtering the input files and creating the OTP graph file.
Run the prepare script by calling it with the path to the output folder, with some optional arguments2:
python prepare.py [-F] <directory>
Optional command line arguments:
-F, --force
: Force overwrite of existing prepared files
The prepare script performs the following steps (corresponds to the "OTP4GB-Py Prepare" box in the Overview flowchart):
- Filters the GTFS files based on date and extents given in the config file, outputs the filtered GTFS files to a new filtered folder in the output folder.
- Filters the OSM file to the extents given and outputs to the filterd folder.
- Infills the date and copies the build config to the filtered folder.
- Copies the router config to the filtered folder.
- Runs OTP to build the graph file in the filtered folder.
Once complete the output folder should contain a new sub-folder graphs\filtered
, with all the
prepared files, ready to run the process script.
Once the output folder has been prepared then running the full OTP generalised cost calculation process is simply running the process Python script.
Running the process script uses the same config file and output folder as was used for prepare. Run the process script by providing a path to the output directory, command usage shown below:
usage: process.py [-h] [-p] folder
positional arguments:
folder folder containing config file and inputs
options:
-h, --help show this help message and exit
-p, --save_parameters
save build parameters to JSON lines files and exit (default: False)
The main process performs the following calculations (outlined in the "OTP4GB-Py Routing" box in the Overview flowchart):
- Caculates a list of all the origin-destination pairs of zones which are within the extents.
- Starts the OTP server with the inputs and graph built previously.
- Iterates through all the time periods and then modes performing the following calculations for
each:
- Sends a request for a route between two zones to the OTP server.
- Saves the server response to a file.
- Calculate the generalised cost for each itinerary returned by OTP.
- Average the various itineraries to a single set of cost metrics per zone pair.
- Save the cost metrics to a CSV file and the generalised costs to a square CSV matrix.
The outputs from the process are discussed in the Outputs section below. This process can take a long time to run and use a lot of CPU and RAM, see table 2 for some estimates time scales and PC requirements for various areas.
Table 2: Recommended PC requirements and rough timescales for different size runs. All estimates are approximate and two areas of similar size could have vastly different runtimes depending on the detail of the road / public transit network in the area.
Area Size | Number of Zones | Recommended Processors | Recommended RAM | Time Estimates |
---|---|---|---|---|
LAD | < 100 | 8 | 32GB | < half a day |
LAD | 100 - 500 | 8 | 32GB | < half a day |
A single English Region | < 100 | 8 | 32GB | half a day |
A single English Region | 100 - 500 | 8 | 32GB | 1 day |
North England | < 500 | 16 | 64GB | 1 day |
Great Britain | < 500 | 16 | 64GB | 1 - 2 days |
Note: Generally speaking adding zones increases the required time by the square of the number of zones added, but has minimal impact on RAM usage. Increasing the area size impacts the RAM usage and the timescales.
Another optional script is server.py, which allows running the OTP server for an existing prepared output folder without performing the processing calculations. The server is ran on the local machine by default and can be connected to through a web browser at http://localhost:8080/.
Running the server script is done with the following command2:
python server.py <directory>
This section outlines the outputs produced by the OTP cost calculation process (see Process section for details on running). All three outputs discussed are created for every combination of time period and modes provided in the config file.
This cost matrix contains the mean (or median) generalised cost calculated from each of the itineraries provided by the OTP server. This output is saved as a CSV in the square format at whatever zone system was provided in the input config file, the first row and column contain the zone IDs from the zone centroids input.
A second CSV output is the cost metrics CSV, this contains a variety of cost
The first six columns contain the name, ID and zone system for the origin and destination zones. The seventh column contains the total number of itineraries OTP found between the two zones. The remaining columns are various statistics about the following metrics:
- duration: total duration of the route, in seconds.
- walkTime: total time spent walking on the route, in seconds.
- transitTime: total time spent in transit, in seconds.
- waitingTime: total waiting time, in seconds.
- walkDistance: total walk distance on the route, in metres.
- otp_generalised_cost: the generalised cost calculated by OTP to determine how to rank the itineraries found.
- transfers: total number of transfers in the route.
- generalised_cost: generalised cost calculated for each itinerary using the parameters from the input config file.
- startTime: the departure time of the route.
- endTime: the arrival time of the route.
This file contains all the data provide in OTP's routing response, it is saved as a JSON lines file i.e. each line is a single response for a pair of zones in JSON format. This file contains all the individual itineraries and their respective statistics that OTP calculated.
The file can be opened in any basic text editor, although it is recommended to use Python if any
further analysis on the file is required. The CostResults
class provided in the otp4gb\cost.py
module can be used to parse the individual lines from the file, to perform any bespoke analysis.
Additional a cost_matrix_from_responses
function is provided (in the same cost module) to
use the responses file to recalculate the generslised cost matrix with different parameters and
recreate the CSV outputs.
This sections outlines some stand-alone post-processing scripts to perform additional analysis on the outputs from OTP4GB-Py.
The accessibility script uses the zone to zone average durations to calculate the amount of various land use metrics which can be reached from a zone in a given time limit. The inputs for this script are provided in a config file and outlined in table 3, (see accessibility config for an example).
Table 3: Details of the config file parameters for the accessibility script.
Parameter | Type | Default | Description |
---|---|---|---|
cost_metrics | list of file paths | - | List of cost metrics CSVs (from OTP4GB-Py process). |
output_folder | path | - | Path to folder to save outputs to. |
accessibility_time_minutes | Integer | - | Maximum time (minutes) for OD pairs to include in the cost metrics. |
landuse_data | list of file paths | - | List of land use files which should contain a column named "id" and any number of other columns containing data. |
aggregation_method | mean or median | mean | Type of average duration to use from the cost metrics files. |
Note: Accessibility calculations will be done for the complete combination of all the land use files and all the cost metrics files.
This process outputs a single new CSV file for each combination of cost metrics and land use given as an input. The output file takes the origin zone information from the cost metrics files and contains all the columns provided in the land use input data.
The processIsochrone script utilises the same basic process, but performs an accessibility analysis generating travel time isochrones, and optionally generates a travel time matrix.
For every location in 'destination_centroids' a series of travel time isochrones is generated, at an adjustable level of time granularity.
The generated isochrones are saved as geoJson files, which are zipped up in batches to reduce number of files and storage requirements. The parameters used to generate the isochrone are stored as properties within the geoJson file.
If there are any locations within 'centroids' file, then these locations are tested to determine which travel time band they fall within to create a travel time matrix.
If the 'centroids' file contains a 'poly_wkt' column, the centroid is treated as an area (with the geometry described in well-known-text format) rather than a simple point. The fraction of the centroid area that falls within the isochrone is calculated and output as the 'OriginCoverFraction' in the resulting matrix.
If the 'destination_centroids' file does not exist then the 'centroids' locations are also used as the journey destinations.
This uses a number of additional configuration options (within the same config file), and interprets some config values in a different way.
Parameter | Sub-Parameter | Type | Default | Description |
---|---|---|---|---|
step_minutes | - | integer (minutes) | 15 | Optional A series of isochrones is generated up to the maximum travel time, this specifies the interval between bands. |
buffer_metres | - | integer (metres) | 100 | Optional To fix invalid geometries the isochrone can be buffered. Size of buffer to apply. Expensive function to call. |
zone_column | - | string | "zone_id" | Optional Column which specifies the zone name / number. Must be unique or values may be over-written. |
union_all_times | - | boolean | False | Optional Where multiple arrival times are specified - do we union all the generated isochrones together |
arrive_by | - | boolean | True | Optional If the isochrone requested has the time specified as 'arrive by'(True) or 'depart by'(False) |
compress_output_files | - | boolean | True | Optional specifies if output files are zipped up (in batches) |
fanout_directory | - | boolean | False | Optional fans out files into multiple directories. fanout_directory=true is incompatible with compress_output_files=true |
'union_all_times' is provided to allow softening of the effect of the frequency of public transport services. A series of arrival times can be specified, with the isochrone created being the union of all the accessible areas to illustrate the maximum accessible area. This assumes that passengers can to a degree flex their arrival time and / or passenger perception is not completely reliant on a hard arrival time, but the duration of their journey from door-to-door.
If the 'buffer_metres' value is non-zero then the geometry is first simplified, then buffered. If a negative 'buffer_metres' value is specified, the geometry is just simplified using a 'tolerance' value of the number supplied.
A series of travel time isochrones are generated, from 'search_window_minutes_min' up to 'search_window_minutes', at intervals of 'step_minutes'.
If 'search_window_minutes_min' is not specified then the value of 'step_minutes' is used as the minimum travel time.