This project evaluates ride-sharing algorithms on spatio-temporal data. The original dataset, obtained from the NYC government website, contains nearly 9 million trips in New York City. Cleaning it under the assumptions listed below reduces it to about 157.3 thousand trips.
The objective of this project is to merge trips so as to maximize the total benefit, which includes social preferences and distance saved.
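As a rough illustration of this objective, the sketch below picks the set of disjoint trip pairs with the highest total benefit by exhaustive search. It is not the code in ride_share.py: the function name best_merge, the trip ids, and the benefit values are all hypothetical, and the search is only practical for a handful of trips. For larger pools, networkx provides max_weight_matching for the same kind of problem.

```python
def best_merge(trips, benefit):
    """Return (total_benefit, pairs) for the best set of disjoint trip pairs.

    trips:   list of trip ids (hypothetical ids in the example below)
    benefit: dict mapping frozenset({a, b}) to the benefit of merging a and b;
             pairs that are missing are treated as not mergeable
    """
    if len(trips) < 2:
        return 0.0, []

    first, rest = trips[0], trips[1:]

    # Option 1: leave the first trip unmerged.
    best_total, best_pairs = best_merge(rest, benefit)

    # Option 2: merge the first trip with each possible partner.
    for i, partner in enumerate(rest):
        gain = benefit.get(frozenset((first, partner)), 0.0)
        if gain <= 0:
            continue  # merging would not help, skip this pair
        remaining = rest[:i] + rest[i + 1:]
        sub_total, sub_pairs = best_merge(remaining, benefit)
        if gain + sub_total > best_total:
            best_total = gain + sub_total
            best_pairs = [(first, partner)] + sub_pairs

    return best_total, best_pairs


# Made-up benefits for four trips in the same pool.
example_benefit = {
    frozenset(("t1", "t2")): 4.0,
    frozenset(("t2", "t3")): 6.0,
    frozenset(("t3", "t4")): 4.0,
    frozenset(("t1", "t4")): 1.0,
}
total, pairs = best_merge(["t1", "t2", "t3", "t4"], example_benefit)
```

In this example the single most valuable pair (t2, t3) is not chosen, because pairing t1 with t2 and t3 with t4 gives a higher total. That is why the trips have to be merged jointly rather than greedily.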
Original dataset: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
- Removed all trips that originated out of JFK.
- Removed trips with missing data values (missing latitude or longitude).
- Calculated trip duration for each trip in dataset.
- Calculated the delay as 20% of the trip duration.
- Randomly assigned social preferences for each trip.
- Calculated the speed factor.
- Mapped the destination of each trip to the nearest intersection.
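A few of the cleaning steps above can be sketched as follows. This is a minimal illustration, not the project's preprocessing code: the function names, the timestamp format, the use of haversine distance, and the sample coordinates are assumptions. The social preference and speed factor steps are left out because their definitions are not given here.

```python
import math
from datetime import datetime

DELAY_FRACTION = 0.20  # delay is 20% of the trip duration


def trip_duration_seconds(pickup, dropoff, fmt="%Y-%m-%d %H:%M:%S"):
    """Duration between two timestamp strings (format is an assumption)."""
    start = datetime.strptime(pickup, fmt)
    end = datetime.strptime(dropoff, fmt)
    return (end - start).total_seconds()


def allowed_delay_seconds(duration_seconds):
    """Delay tolerated by a trip, as a fixed fraction of its duration."""
    return DELAY_FRACTION * duration_seconds


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_intersection(lat, lon, intersections):
    """Id of the closest intersection; intersections holds (id, lat, lon)."""
    closest = min(intersections,
                  key=lambda p: haversine_km(lat, lon, p[1], p[2]))
    return closest[0]


# Hypothetical sample values for illustration only.
duration = trip_duration_seconds("2016-01-01 10:00:00", "2016-01-01 10:25:00")
delay = allowed_delay_seconds(duration)
sample_intersections = [("A", 40.7580, -73.9855), ("B", 40.6413, -73.7781)]
mapped = nearest_intersection(40.6420, -73.7800, sample_intersections)
```

A linear scan over all intersections is the simplest option. For a full road network a spatial index would likely be faster, at the cost of more code.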
- Python version 2
- MySQL
- networkx (pip install networkx)
- MySQLdb (pip install MySQL-python)
- Install MySQL and all the prerequisites.
- Import the trips dataset into the database using the queries.sql file.
- Set the username and password of the database in dbconfig.py.
- Run ride_share.py using the command: python ride_share.py
To run using the precomputed intersection-to-intersection data:
- Import final_int_int.csv into the database.
- Make the following change in ride_share.py. At line #246, change
  check(conn, all_trips[a], all_trips[b], G, benefit_G, delay)
  to
  check(conn, all_trips[a], all_trips[b], G, benefit_G, delay, False)
- Run ride_share.py using the command: python ride_share.py
Note:
Database name: rideshare
Default parameters:
Delay: 20% of the trip duration
Pool size: 5 minutes
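
One way to read the pool size parameter is that trips are grouped into fixed, non-overlapping 5-minute windows by pickup time, and only trips in the same window are considered for merging. The sketch below shows that interpretation. It is an assumption about how pooling works rather than the logic in ride_share.py, and the function name pool_trips and the sample trips are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

POOL_SIZE_SECONDS = 5 * 60  # default pool size: 5 minutes
EPOCH = datetime(1970, 1, 1)


def pool_trips(trips, pool_seconds=POOL_SIZE_SECONDS):
    """Group trips into fixed time windows by pickup time.

    trips: iterable of (trip_id, pickup_datetime) with naive datetimes
    Returns a dict mapping each window start (datetime) to a list of trip ids.
    """
    pools = defaultdict(list)
    for trip_id, pickup in trips:
        seconds = (pickup - EPOCH).total_seconds()
        # Round the pickup time down to the start of its window.
        window_index = int(seconds // pool_seconds)
        window_start = EPOCH + timedelta(seconds=window_index * pool_seconds)
        pools[window_start].append(trip_id)
    return dict(pools)


# Hypothetical sample trips for illustration only.
sample_trips = [
    ("a", datetime(2016, 1, 1, 10, 1, 0)),
    ("b", datetime(2016, 1, 1, 10, 4, 0)),
    ("c", datetime(2016, 1, 1, 10, 5, 0)),
    ("d", datetime(2016, 1, 1, 10, 9, 59)),
]
pools = pool_trips(sample_trips)
```

With fixed windows, two trips a few seconds apart can still land in different pools when they straddle a window boundary. A sliding window would avoid that but makes the pools overlap.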