While working with huge CSV files is certainly possible, for online visualization and exploration, it is helpful to have a JSON compatible data api. The preprocessed files generated by the DataPreprocessor
can be easily imported into a MySQL database and when used with the PHP API backend, online API queries are trivial.
To set up your own Divvy data API, you will need a web server with PHP and MySQL. Most modern servers, including shared hosting, offer this capability. To set up the data API follow these steps:
-
Generate the
Divvy_Stations_2013_Preprocessed.csv
andDivvy_Trips_2013_Preprocessed.csv
(~50MB) files using theDataPreprocessor
. -
On your server, create a MySQL database called
divvy_2013
and a MySQL user calleddivvy
. Thedivvy
user should have read access to thedivvy_2013
database (if this doesn't make sense, that's ok, there is an easier alternative below). -
Next create a table called
Divvy_Stations_2013
in thedivvy_2013
database with using the following structure:CREATE TABLE IF NOT EXISTS `Divvy_Stations_2013` ( `id` int(11) NOT NULL, `name` varchar(255) DEFAULT NULL, `latitude` float DEFAULT NULL, `longitude` float DEFAULT NULL, `capacity` int(11) DEFAULT NULL, `landmark_id` int(11) DEFAULT NULL, `online_date` datetime NOT NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Import the Divvy_Stations_2013_Preprocessed.csv
file into this table.
-
Next create a tabled called
Divvy_Trips_2013
in thedivvy_2013
database with the following structure:CREATE TABLE IF NOT EXISTS `Divvy_Trips_2013` ( `trip_id` int(11) NOT NULL, `start_time` datetime NOT NULL, `stop_time` datetime NOT NULL, `bike_id` int(11) NOT NULL, `from_station_id` int(11) NOT NULL DEFAULT '-1', `to_station_id` int(11) NOT NULL DEFAULT '-1', `user_type` varchar(255) DEFAULT NULL, `gender` varchar(255) DEFAULT NULL, `birth_year` int(11) DEFAULT NULL, PRIMARY KEY (`trip_id`), KEY `from_station_id` (`from_station_id`), KEY `to_station_id` (`to_station_id`), KEY `user_type` (`user_type`), KEY `stop_time` (`stop_time`), KEY `start_time` (`start_time`), KEY `birth_year` (`birth_year`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Import the Divvy_Trips_2013_Preprocessed.csv
file into this table. _Note: The Divvy_Trips_2013_Preprocessed.csv
is quite large and it may not be possible to import the file via a simple interface like phpMyAdmin. Instead, consider uploading the CSV file to your server, logging in via SSH, changing the name of Divvy_Trips_2013_Preprocessed.csv
to Divvy_Trips_2013.csv
and running the following command:
mysqlimport --ignore-lines=1 \
--fields-terminated-by=, \
--columns='trip_id,start_time,stop_time,bike_id,from_station_id,to_station_id,user_type,gender,birth_year' \
--local -u root -p divvy_2013 Divvy_Trips_2013.csv
- Next, upload api.php and credentials_example.php to the location of your choice on your server.
- Rename
credentials_example.php
tocredentials.php
and enter your database location and password. - You are now ready to query the api using the parameters defined below.
The API is currently a single endpoint with a simple set of parameters.
Parameter | Description | Example Values |
---|---|---|
start_min |
Set the start time range minimum | 2013-06-01 or 2013-06-01 12:00:00 |
start_max |
Set the start time range maximum | 2013-06-01 or 2013-06-01 12:00:00 |
stop_min |
Set the stop time range minimum | 2013-06-01 or 2013-06-01 12:00:00 |
stop_min |
Set the stop time range minimum | 2013-06-01 or 2013-06-01 12:00:00 |
from_station_id |
Set the starting station id | See the stations list for valid ids |
to_station_id |
Set the ending station id | See the stations list for valid ids |
bike_id |
Set the bike id | See the trips data for valid ids |
trip_id 1 |
Set the trip id | See the trips data for valid ids |
trip_id_min 1 |
Set the minimum trip id | See the trips data for valid ids |
trip_id_max 1 |
Set the maximum trip id | See the trips data for valid ids |
user_type |
Set the user type | subscriber or customer |
gender |
Set the user gender | male or female |
birth_year 2 |
Set the user birth year | Any year >= 0 |
birth_year_min 2 |
Set the minimum birth year | Any year >= 0 |
birth_year_max 2 |
Set the maximum birth year | Any year >= 0 |
age 2 |
_Set the user age | Any age >= 0 |
age_min 2 |
Set the minimum age | Any age >= 0 |
age_max 2 |
Set the maximum age | Any age >= 0 |
page 3 |
The results page | Any page >= 0 |
rpp 3 |
Set the maximum age | 0 <= rpp <= 100 |
callback |
A JSONP callback | Any valid javascript method name. |
For convenience the openLab has established a public endpoint for testing. The base endpoint URL is:
http://data.olab.io/divvy/api.php
While we will do our best to support public projects, this endpoint may be rate limited without notice if the need arises and users are encouraged to follow the instructions above to establish their own endpoints.
All query parameter strings build from that endpoint. If you install your own API, your endpoint URL will be different.
Select the first page of 25 results for for trips between 2013-06-01 and 2013-07-01 for males over the age of 50:
To get results 26 - 50 from the same query:
To get all trips taken by 33 year old females:
Static stations are represented in the stations.json
data. The stations.json
data can be generated by the stations.php
script. The stations endpoint takes no parameters.
Create a table that includes duration:
CREATE VIEW Divvy_Trips_2013_With_Duration AS
SELECT *, (UNIX_TIMESTAMP(stop_time) - UNIX_TIMESTAMP(start_time)) AS duration
FROM `Divvy_Trips_2013`
Footnotes
-
If a
trip_id
parameter is passed along with atrip_id_min
and / ortrip_id_max
parameter, thetrip_id
parameter is ignored and the range style parameters are preferred. ↩ ↩2 ↩3 -
Both
age
andbirth_year
select on the samebirth_year
column of the database. Since it's easier to think in terms of age, when bothage
andbirth_year
parameters are included, allbirth_year
parameters will be ignored in favor of theage
parameter. Liketrip_id
, the corresponding range-based versions of theage
andbirth_year
parameters will be used. ↩ ↩2 ↩3 ↩4 ↩5 ↩6 -
Since this is a massive data set, it is not advisable to let a user return huge quantities of data with a single query. Instead, the trip results are broken down into pages of results.
rpp
is set to 100 by default and is also the default maximum. Thepage
parameter determines which trip id to begin with. For instance, to return results starting with the 200th trip, one might passrpp=100
andpage=1
. ↩ ↩2