The Connected Vehicle Big Data Analytics Tool code is designed to predict traffic congestion in a 100 ft x 100 ft area every minute an hour ahead of time using Basic Safety Messages (BSM) or average speed data from INRIX.

OSADP/Traffic-Congestion-State-Predictor-Tool

Traffic Congestion State Predictor Tool
----------------------------------------

The Connected Vehicle Big Data Analytics Tool code is designed to predict traffic congestion in each 100 ft x 100 ft area
for every minute, one hour ahead of time, using Basic Safety Messages (BSM) or average speed data from INRIX.
Noblis researchers developed the tool and applied it to BSMs generated by the Southeast Michigan Safety Pilot Model Deployment,
which were provided by the USDOT, and to data from INRIX (Traffic and Driver Services Information Provided by INRIX © 2014.
All rights reserved by INRIX, Inc.) for the Safety Pilot Model Deployment network. This work was sponsored by USDOT.

The included code calculates the average speed from BSMs or INRIX data in each 100 ft x 100 ft box
for each minute queried, converts .csv and .graphml files into SQLite files for quick access,
uses graph networks from similar training days to predict traffic congestion for each box every minute, an hour ahead of time,
and automatically calculates the error between the predicted congestion
and the actual congestion observed in the BSM data.

Additional code converts traffic congestion predictions into INRIX-defined bottlenecks.
Due to proprietary restrictions, the main part of the algorithm, which analyzes the average speeds
in each box and builds graph networks out of the correlations, is not included. Therefore the
application as presented is incomplete and will not run out of the box.

License information
-------------------
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this
file except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under
the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the specific language governing
permissions and limitations under the License.

Configuration
-------------------------
The Connected Vehicle Big Data Analysis Tool software can run on most standard Windows or Linux-based computers
with at least two gigabytes of RAM and at least 100 MB of drive space.
Performance of the software depends on the computing power and available RAM of
the system. The software is designed to run with larger datasets, which can require
considerably more space.

The Connected Vehicle Big Data Analysis Tool software application was developed using the open source
programming language Python 2.7 (www.python.org). The application requires Python 2.7
or higher to run. 

Usage Examples
-------------
The Connected Vehicle Big Data Analysis Tool code is made up of seven modules (six are open source and included in the package).
The first module to execute is createGrid.py, which averages speeds into 100 ft x 100 ft boxes for input
to the Noblis proprietary Lark algorithm and also outputs this data as counts .csv files. The Lark code (not included)
then analyzes this average speed data and creates graph networks of the correlated data, output as .graphml files.
createGrid.py and Lark are run for 20 minute periods in a day, then MergeGraphmls.py is used to merge
three 20 minute .graphml files into one .graphml file for the hour. Then graphs_to_sqlite.py and counts_to_sqlite.py are
used to convert the .graphml files and counts .csv files into SQLite databases. Finally, the graph and counts data are
used as input to prediction.py, which outputs the predicted congestion index for each minute an hour ahead.

prediction.py
This is the main module. It uses graph networks on similar training days and boxes to predict 
traffic congestion every minute an hour ahead of time for each box and automatically calculates 
the error between the predicted congestion and the actual congestion observed by the BSM data. 
There are three different prediction models that follow the same algorithm using different data 
as input: BSM, INRIX and combined BSM and INRIX. 
	Command line parameters:
		Output filename
		Tab separated Day Sets file
			Column 1: Date (2012-10-30)
			Column 2: Training, Validation or Testing
	In code:
		Output from graphs_to_sqlite, used as:
			graphs_db for BSM in prediction_BSM
			graphs_dbinrix in prediction_INRIX and prediction_BSMandINRIX
			graphs_dbBSM in prediction_INRIX and prediction_BSMandINRIX
		TMC reference file (see below), in prediction_INRIX and prediction_BSMandINRIX
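
The Day Sets file described above can be read with a few lines of standard-library Python. This is a minimal sketch, not the code used in prediction.py: the function name read_day_sets is hypothetical, the sample dates other than 2012-10-30 are made up for illustration, and the sketch is written for Python 3.

```python
import csv
import io
from collections import defaultdict


def read_day_sets(handle):
    """Group dates by their role (Training, Validation or Testing)."""
    day_sets = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or malformed lines
        date, role = row[0].strip(), row[1].strip()
        day_sets[role].append(date)
    return dict(day_sets)


# Hypothetical file contents; only the first date appears in this README.
sample = io.StringIO(
    "2012-10-30\tTraining\n"
    "2012-10-31\tValidation\n"
    "2012-11-01\tTesting\n"
)
day_sets = read_day_sets(sample)
```

With a real file, the same function can be passed an open file handle instead of the in-memory sample.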

createGrid.py
This is the other main module. It queries the BSM or INRIX data for a given date and time range,
assigns each data row to a 100 ft x 100 ft box and direction based on the latitude-longitude
and heading for BSM, or the TMC for INRIX, and calculates the average speed in each 100 ft x 100 ft box
for each minute queried. There are two versions of this module: one designed for BSM data and one
for INRIX average speed data.
	Parameters to pass to the file:
		Start date of data query (YYYY-DD-MM)
		Start time of data query (HH:MM)
		End date of data query
		End time of data query
		Time interval to group data by in seconds (recommended 60 seconds)
		Boxes file (see below)
		Output filename for the list of count.csvs created (data output is automatically written to 
		"counts_[start date]_[start time]_[end date]_[end time].csv")
		TMC reference file (see below)
	Code to connect to an SQL database containing the BSM or INRIX data is included but is commented
	out starting on line 196; it should be uncommented and edited to connect to the appropriate database.
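
The averaging step can be illustrated as follows. This is a sketch under stated assumptions rather than the logic of createGrid.py itself: it assumes each data row has already been assigned to a box, it represents time as seconds since the start of the query, and the function name average_speeds and the sample rows are hypothetical.

```python
from collections import defaultdict


def average_speeds(records, interval_s=60):
    """Average speed per (box_id, time bin).

    records: iterable of (box_id, seconds_since_query_start, speed) tuples.
    Returns a dict mapping (box_id, bin_index) to the mean speed.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for box_id, t, speed in records:
        key = (box_id, int(t // interval_s))  # bin 0 covers the first interval
        sums[key] += speed
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical rows that have already been assigned to boxes.
rows = [
    (1258421, 5, 30.0),
    (1258421, 42, 40.0),
    (1258421, 61, 20.0),
    (1258422, 10, 55.0),
]
averages = average_speeds(rows)
```

The interval_s argument mirrors the "time interval to group data by in seconds" parameter, with the recommended 60 seconds as the default.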

MergeGraphmls.py
In the Noblis-driven experiment, graphs were created for 20 minute periods and then three 20
minute graphs were merged into one graph for each hour. This module is designed to quickly merge the graphs for
every day-hour analyzed and create a list of the merged graphml file locations.
	Command line parameters:
		Output filename for the list of merged graphmls created
	In code:
		List of days to process on line 8
		Start time and end time on lines 10-11
		Lines 40-53 need to be pointed to the correct graphmls to be merged
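
How the three 20 minute graphs are combined is not specified in this README. The sketch below assumes a simple union of nodes and edges and uses only the standard library; it ignores any GraphML key and data attributes, and the names merge_graphml and tiny_graph are hypothetical. It should be read as one possible illustration, not as the behavior of MergeGraphmls.py.

```python
import xml.etree.ElementTree as ET

NS = "http://graphml.graphdrawing.org/xmlns"
ET.register_namespace("", NS)


def merge_graphml(documents):
    """Union the nodes and edges of several GraphML strings (assumed merge rule)."""
    root = ET.Element("{%s}graphml" % NS)
    graph = None
    nodes, edges = set(), set()
    for text in documents:
        src = ET.fromstring(text).find("{%s}graph" % NS)
        if graph is None:
            # Reuse the edge direction setting of the first graph.
            graph = ET.SubElement(root, "{%s}graph" % NS,
                                  edgedefault=src.get("edgedefault", "undirected"))
        for node in src.findall("{%s}node" % NS):
            if node.get("id") not in nodes:
                nodes.add(node.get("id"))
                ET.SubElement(graph, "{%s}node" % NS, id=node.get("id"))
        for edge in src.findall("{%s}edge" % NS):
            pair = (edge.get("source"), edge.get("target"))
            if pair not in edges:
                edges.add(pair)
                ET.SubElement(graph, "{%s}edge" % NS, source=pair[0], target=pair[1])
    return root


def tiny_graph(edges):
    """Build a minimal GraphML string for the demo below."""
    ids = sorted({n for e in edges for n in e})
    body = "".join('<node id="%s"/>' % n for n in ids)
    body += "".join('<edge source="%s" target="%s"/>' % e for e in edges)
    return ('<graphml xmlns="%s"><graph edgedefault="undirected">%s</graph></graphml>'
            % (NS, body))


# Three hypothetical 20 minute graphs merged into one hourly graph.
merged = merge_graphml([
    tiny_graph([("a", "b")]),
    tiny_graph([("b", "c")]),
    tiny_graph([("a", "b"), ("c", "d")]),
])
merged_graph = merged.find("{%s}graph" % NS)
node_ids = [n.get("id") for n in merged_graph.findall("{%s}node" % NS)]
edge_pairs = [(e.get("source"), e.get("target"))
              for e in merged_graph.findall("{%s}edge" % NS)]
```

If the Lark output carries edge weights or other attributes, a real merge would also need a rule for combining them, which this sketch does not attempt.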

graphs_to_sqlite.py
This module goes through the list of merged graphmls and converts their contents into an SQLite database 
file for easy access.
	Command line parameters:
		Name of sqlite file to output to
		Name of file with list of merged graphmls created by MergeGraphmls.py

counts_to_sqlite.py
This module goes through the list of counts .csvs created by createGrid and converts their contents into 
an SQLite database file for easy access.
	Command line parameters:
		Name of sqlite file to output to
		Name of file with list of counts.csvs created by createGrid.py
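
The conversion can be pictured with the standard csv and sqlite3 modules. This is a sketch only: the column names box_id, minute and avg_speed, the table name counts and the function name load_counts are assumptions, since the layout of the counts .csv files is not documented here. It also takes open file handles rather than a file listing the .csv locations, to keep the example self-contained.

```python
import csv
import io
import sqlite3


def load_counts(conn, csv_handles):
    """Append the rows of each counts CSV to a single SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counts "
        "(box_id INTEGER, minute TEXT, avg_speed REAL)"
    )
    for handle in csv_handles:
        reader = csv.DictReader(handle)
        rows = [(int(r["box_id"]), r["minute"], float(r["avg_speed"]))
                for r in reader]
        conn.executemany("INSERT INTO counts VALUES (?, ?, ?)", rows)
    # An index on the lookup columns keeps per-box queries fast.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_counts ON counts (box_id, minute)")
    conn.commit()


# Hypothetical CSV contents held in memory instead of on disk.
first = io.StringIO(
    "box_id,minute,avg_speed\n"
    "1258421,2012-10-30 08:00,35.0\n"
    "1258421,2012-10-30 08:01,20.0\n"
)
second = io.StringIO(
    "box_id,minute,avg_speed\n"
    "1258422,2012-10-30 08:00,55.0\n"
)
conn = sqlite3.connect(":memory:")
load_counts(conn, [first, second])
row_count = conn.execute("SELECT COUNT(*) FROM counts").fetchone()[0]
```

Replacing ":memory:" with a filename writes the database to disk, which matches the "name of sqlite file to output to" parameter.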

inrix_bottleneck_prediction.py
This is an additional module used to convert the predicted congestion index data created by prediction_BSM.py 
into predicted bottlenecks as defined by INRIX. INRIX defines a bottleneck state as the average speed on a TMC 
defined roadway segment being under 60 percent of its free-flow reference speed for five consecutive minutes. 
Adjacent TMCs with concurrent bottleneck states are merged to form a full bottleneck. A bottleneck state is ended
when all segments are above the 60 percent reference speed threshold for 10 consecutive minutes.
	Command line parameters:
		Output from prediction.py
		Tab separated file of adjacent TMCs for each TMC. The first column is the TMC being analyzed; each following
		column is the TMC of one adjacent road segment. Example:
			108+10860	108+10861
			108+10861	108+10860
			108-10861	108-10860
			108-10860	108-10861	108-10859
		TMC reference file (see below)
		Boxes file (see below)
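
The bottleneck rule for a single TMC segment can be sketched as a small state machine. This is an illustration of the definition given above, not the code in inrix_bottleneck_prediction.py: the function and parameter names are hypothetical, the sketch assumes one speed value per minute, it marks the state as active from the fifth consecutive slow minute onward, and it does not merge adjacent TMCs.

```python
def bottleneck_states(speeds, reference_speed, start_run=5, end_run=10, ratio=0.6):
    """Return one boolean per minute: True while the segment is in a bottleneck state."""
    threshold = ratio * reference_speed
    active = False
    below = above = 0  # lengths of the current runs under / not under the threshold
    states = []
    for speed in speeds:
        if speed < threshold:
            below += 1
            above = 0
        else:
            above += 1
            below = 0
        if not active and below >= start_run:
            active = True   # five consecutive slow minutes start the state
        elif active and above >= end_run:
            active = False  # ten consecutive recovered minutes end it
        states.append(active)
    return states


# Hypothetical speeds: 3 free-flow minutes, 6 slow minutes, 12 recovered minutes.
speeds = [45.0] * 3 + [20.0] * 6 + [45.0] * 12
states = bottleneck_states(speeds, reference_speed=50.0)
```

Here the threshold is 30, so the state switches on at the fifth slow minute and switches off once ten consecutive minutes at or above the threshold have passed.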

Boxes File
This is a file of each box defined by the grid over the geographic region. Boxes are associated with an INRIX TMC
(roadway identifier) and are bounded by two latitude-longitude pairs. Each box has its own Box_ID identifier and
a direction indicating which way traffic moves: NORTH/EAST (1) or SOUTH/WEST (2). An example line is below:

TMC,start_lat,start_lon,end_lat,end_lon,Box ID,direction
108+04160,42.2173404,-83.554277,42.2176173,-83.55393,1258421,2
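
A Boxes file in this layout can be read and searched as sketched below. The sketch assumes the two latitude-longitude pairs are opposite corners of an axis-aligned rectangle and uses a plain linear scan; the function names read_boxes and find_box are hypothetical, and the test point is made up for illustration.

```python
import csv
import io


def read_boxes(handle):
    """Parse the Boxes file into a list of dicts with sorted coordinate ranges."""
    boxes = []
    for row in csv.DictReader(handle):
        boxes.append({
            "tmc": row["TMC"],
            "box_id": int(row["Box ID"]),
            "direction": int(row["direction"]),
            "lat_range": sorted((float(row["start_lat"]), float(row["end_lat"]))),
            "lon_range": sorted((float(row["start_lon"]), float(row["end_lon"]))),
        })
    return boxes


def find_box(boxes, lat, lon, direction):
    """Return the Box ID containing the point for the given direction, or None."""
    for box in boxes:
        (lat_min, lat_max), (lon_min, lon_max) = box["lat_range"], box["lon_range"]
        if (box["direction"] == direction
                and lat_min <= lat <= lat_max
                and lon_min <= lon <= lon_max):
            return box["box_id"]
    return None


# The header and example line shown above.
sample = io.StringIO(
    "TMC,start_lat,start_lon,end_lat,end_lon,Box ID,direction\n"
    "108+04160,42.2173404,-83.554277,42.2176173,-83.55393,1258421,2\n"
)
boxes = read_boxes(sample)
match = find_box(boxes, 42.2175, -83.5541, direction=2)
miss = find_box(boxes, 42.2175, -83.5541, direction=1)
```

For a grid covering a whole network, a spatial index or a dictionary keyed on rounded coordinates would scale better than the linear scan used here.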

TMC Reference File
The TMC Reference file comes directly from INRIX. It provides additional contextual information for each TMC identified
roadway segment including: road name, direction, state, county, zip, start and end latitude and longitude, miles and 
position of the segment along the entire road. An example line is below:

tmc,road,direction,intersection,state,county,zip,start_latitude,start_longitude,end_latitude,end_longitude,miles,road_order
108+10860,5 MILE RD,EASTBOUND,PONTIAC TRL,MI,WASHTENAW,48178,42.387783000000,-83.722892000000,42.387882000000,-83.644897000000,4.004612,1

Code.gov Info
----------------
Agency: DOT

Short Description: This software package contains code to predict travel times an hour ahead of time using connected vehicle messages and machine learning techniques.

Status: Beta

Tags: transportation, connected vehicles, data emulator, BSM

Labor Hours: 0

Contact Name: James O'Hara

Contact Phone: 703-610-1632
