Skip to content

Latest commit

 

History

History
129 lines (95 loc) · 8.12 KB

File metadata and controls

129 lines (95 loc) · 8.12 KB

Divvy Data DIY API

While working with huge CSV files is certainly possible, for online visualization and exploration, it is helpful to have a JSON compatible data api. The preprocessed files generated by the DataPreprocessor can be easily imported into a MySQL database and when used with the PHP API backend, online API queries are trivial.

Installation

To set up your own Divvy data API, you will need a web server with PHP and MySQL. Most modern servers, including shared hosting, offer this capability. To set up the data API follow these steps:

  1. Generate the Divvy_Stations_2013_Preprocessed.csv and Divvy_Trips_2013_Preprocessed.csv (~50MB) files using the DataPreprocessor.

  2. On your server, create a MySQL database called divvy_2013 and a MySQL user called divvy. The divvy user should have read access to the divvy_2013 database (if this doesn't make sense, that's ok, there is an easier alternative below).

  3. Next create a table called Divvy_Stations_2013 in the divvy_2013 database with using the following structure:

     CREATE TABLE IF NOT EXISTS `Divvy_Stations_2013` (
       `id` int(11) NOT NULL,
       `name` varchar(255) DEFAULT NULL,
       `latitude` float DEFAULT NULL,
       `longitude` float DEFAULT NULL,
       `capacity` int(11) DEFAULT NULL,
       `landmark_id` int(11) DEFAULT NULL,
       `online_date` datetime NOT NULL,
       PRIMARY KEY (`id`)
     ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
    

Import the Divvy_Stations_2013_Preprocessed.csv file into this table.

  1. Next create a tabled called Divvy_Trips_2013 in the divvy_2013 database with the following structure:

     CREATE TABLE IF NOT EXISTS `Divvy_Trips_2013` (
       `trip_id` int(11) NOT NULL,
       `start_time` datetime NOT NULL,
       `stop_time` datetime NOT NULL,
       `bike_id` int(11) NOT NULL,
       `from_station_id` int(11) NOT NULL DEFAULT '-1',
       `to_station_id` int(11) NOT NULL DEFAULT '-1',
       `user_type` varchar(255) DEFAULT NULL,
       `gender` varchar(255) DEFAULT NULL,
       `birth_year` int(11) DEFAULT NULL,
       PRIMARY KEY (`trip_id`),
       KEY `from_station_id` (`from_station_id`),
       KEY `to_station_id` (`to_station_id`),
       KEY `user_type` (`user_type`),
       KEY `stop_time` (`stop_time`),
       KEY `start_time` (`start_time`),
       KEY `birth_year` (`birth_year`)
     ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
    

Import the Divvy_Trips_2013_Preprocessed.csv file into this table. _Note: The Divvy_Trips_2013_Preprocessed.csv is quite large and it may not be possible to import the file via a simple interface like phpMyAdmin. Instead, consider uploading the CSV file to your server, logging in via SSH, changing the name of Divvy_Trips_2013_Preprocessed.csv to Divvy_Trips_2013.csv and running the following command:

    mysqlimport  --ignore-lines=1 \
    --fields-terminated-by=, \
    --columns='trip_id,start_time,stop_time,bike_id,from_station_id,to_station_id,user_type,gender,birth_year' \
    --local -u root -p divvy_2013 Divvy_Trips_2013.csv
  1. Next, upload api.php and credentials_example.php to the location of your choice on your server.
  2. Rename credentials_example.php to credentials.php and enter your database location and password.
  3. You are now ready to query the api using the parameters defined below.

DIY API Queries

The API is currently a single endpoint with a simple set of parameters.

Parameter Description Example Values
start_min Set the start time range minimum 2013-06-01 or 2013-06-01 12:00:00
start_max Set the start time range maximum 2013-06-01 or 2013-06-01 12:00:00
stop_min Set the stop time range minimum 2013-06-01 or 2013-06-01 12:00:00
stop_min Set the stop time range minimum 2013-06-01 or 2013-06-01 12:00:00
from_station_id Set the starting station id See the stations list for valid ids
to_station_id Set the ending station id See the stations list for valid ids
bike_id Set the bike id See the trips data for valid ids
trip_id1 Set the trip id See the trips data for valid ids
trip_id_min1 Set the minimum trip id See the trips data for valid ids
trip_id_max1 Set the maximum trip id See the trips data for valid ids
user_type Set the user type subscriber or customer
gender Set the user gender male or female
birth_year2 Set the user birth year Any year >= 0
birth_year_min2 Set the minimum birth year Any year >= 0
birth_year_max2 Set the maximum birth year Any year >= 0
age2 _Set the user age Any age >= 0
age_min2 Set the minimum age Any age >= 0
age_max2 Set the maximum age Any age >= 0
page3 The results page Any page >= 0
rpp3 Set the maximum age 0 <= rpp <= 100
callback A JSONP callback Any valid javascript method name.

Examples

For convenience the openLab has established a public endpoint for testing. The base endpoint URL is:

http://data.olab.io/divvy/api.php

While we will do our best to support public projects, this endpoint may be rate limited without notice if the need arises and users are encouraged to follow the instructions above to establish their own endpoints.

All query parameter strings build from that endpoint. If you install your own API, your endpoint URL will be different.

Select the first page of 25 results for for trips between 2013-06-01 and 2013-07-01 for males over the age of 50:

To get results 26 - 50 from the same query:

To get all trips taken by 33 year old females:

Station Data

Static stations are represented in the stations.json data. The stations.json data can be generated by the stations.php script. The stations endpoint takes no parameters.

Footnotes

MySQL Tricks

Create a table that includes duration:

CREATE VIEW Divvy_Trips_2013_With_Duration AS
SELECT *, (UNIX_TIMESTAMP(stop_time) - UNIX_TIMESTAMP(start_time)) AS duration 
FROM `Divvy_Trips_2013`

Footnotes

  1. If a trip_id parameter is passed along with a trip_id_min and / or trip_id_max parameter, the trip_id parameter is ignored and the range style parameters are preferred. 2 3

  2. Both age and birth_year select on the same birth_year column of the database. Since it's easier to think in terms of age, when both age and birth_year parameters are included, all birth_year parameters will be ignored in favor of the age parameter. Like trip_id, the corresponding range-based versions of the age and birth_year parameters will be used. 2 3 4 5 6

  3. Since this is a massive data set, it is not advisable to let a user return huge quantities of data with a single query. Instead, the trip results are broken down into pages of results. rpp is set to 100 by default and is also the default maximum. The page parameter determines which trip id to begin with. For instance, to return results starting with the 200th trip, one might pass rpp=100 and page=1. 2