
camhwilson/HealthyRide


Repository Overview

Almost every city has a bike-sharing service. In New York it's Citibike, in Chicago it's Divvy, in Pittsburgh it's HealthyRide. One late night in the 'Burgh I was riding one of these bikes home after a night out with the fellas and wondered, "How many other people have made this journey at this hour?"

This question formed the premise of this project.

Which neighborhoods are people coming from, and where are they going? Which hours of the day see spikes in arrivals? Which see spikes in departures? Which neighborhoods use these bikes the most? Which use them the least?

Key Files

1. resource_consolidation.py

All of the data this project runs on can be found on Pittsburgh's public data store. It's free. The data is stored in silos by economic quarter; the data service refers to these silos as resources. The data can be easily accessed by identifying the resource IDs that correspond to the various quarters and using those IDs to form the endpoint you send each request to.

This is what the Resource class in resource_consolidation.py does. Once it is instantiated, you call its "create_json" function, which builds a list of all the resource IDs found at the base URL and iterates through them one by one, extracting the data from each. Once it has iterated through all of the resources listed on the page, the function returns a list of JSON objects. This list can easily be written to a JSON file that lives on your computer while you're developing.
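The loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in resource_consolidation.py: it assumes the data service exposes a CKAN-style "datastore_search" action, and the host name, `build_endpoint`, `collect_records`, and the stub data are all hypothetical names made up for the example.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: placeholder host with a CKAN-style datastore_search action.
# Swap in the real base URL of the data service.
BASE_URL = "https://data.example.org/api/3/action/datastore_search"


def build_endpoint(resource_id, offset=0, limit=1000):
    """Return the URL that requests one page of rows from one resource."""
    query = urlencode({"resource_id": resource_id, "offset": offset, "limit": limit})
    return f"{BASE_URL}?{query}"


def collect_records(resource_ids, fetch_page, limit=1000):
    """Walk every resource page by page and return all rows as one list of dicts.

    fetch_page is any callable that takes a URL and returns the decoded JSON
    response, so the network layer (or an offline stub) can be swapped in.
    """
    records = []
    for resource_id in resource_ids:
        offset = 0
        while True:
            payload = fetch_page(build_endpoint(resource_id, offset, limit))
            rows = payload["result"]["records"]
            if not rows:  # an empty page means this resource is exhausted
                break
            records.extend(rows)
            offset += len(rows)
    return records


# Made-up rows used only to exercise the loop without touching the network.
STUB_ROWS = {
    "resource-a": [{"_id": 1}, {"_id": 2}, {"_id": 3}],
    "resource-b": [{"_id": 1}],
}


def stub_fetch(url):
    """Offline stand-in for the data service that honors offset and limit."""
    params = parse_qs(urlparse(url).query)
    rows = STUB_ROWS.get(params["resource_id"][0], [])
    offset = int(params["offset"][0])
    limit = int(params["limit"][0])
    return {"result": {"records": rows[offset:offset + limit]}}


all_rows = collect_records(["resource-a", "resource-b"], stub_fetch, limit=2)
```

Passing the fetch function in as an argument keeps the paging logic testable offline; in real use it could be a small wrapper around whatever HTTP client you prefer.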

A single record in this JSON format looks like this:

{'Trip id': '60100515',
'Bikeid': '70332',
'To station name': 'S 12th St & E Carson St',
'Usertype': 'Customer',
'Stoptime': '10/1/2018 0:17',
'From station name': 'First Ave & Smithfield St',
'Starttime': '10/1/2018 0:00',
'To station id': '1049',
'Tripduration': '1017',
'_id': 1,
'From station id': '1003'}

I created a custom data type based on this format that is hosted in trip.py, the next important file I'll walk you through.
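Note that every field in the sample record except `_id` arrives as a string. As a hedged sketch, the timestamps and duration could be converted to typed values like this. The `parse_record` helper and the `TIME_FORMAT` constant are hypothetical, and the format string is an assumption inferred from the one sample record shown above.

```python
from datetime import datetime

# Assumption: all timestamps use the month/day/year hour:minute layout
# seen in the sample record.
TIME_FORMAT = "%m/%d/%Y %H:%M"


def parse_record(record):
    """Return a copy of one API record with typed time and duration fields."""
    parsed = dict(record)
    parsed["Starttime"] = datetime.strptime(record["Starttime"], TIME_FORMAT)
    parsed["Stoptime"] = datetime.strptime(record["Stoptime"], TIME_FORMAT)
    parsed["Tripduration"] = int(record["Tripduration"])  # seconds
    return parsed


# The sample record from the README.
sample = {
    "Trip id": "60100515",
    "Bikeid": "70332",
    "To station name": "S 12th St & E Carson St",
    "Usertype": "Customer",
    "Stoptime": "10/1/2018 0:17",
    "From station name": "First Ave & Smithfield St",
    "Starttime": "10/1/2018 0:00",
    "To station id": "1049",
    "Tripduration": "1017",
    "_id": 1,
    "From station id": "1003",
}

typed = parse_record(sample)
```

The timestamps only carry minute resolution, so the difference between `Stoptime` and `Starttime` will not always match `Tripduration` to the second.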

2. trip.py

In order to perform the types of operations I wanted to on the data, I decided to create a custom data type (class), a "Trip". This data type is based on the JSON format the API stores the data in, but adds a few attributes onto the end. Here is a list of the attributes, as they would appear for the trip shown above.

Attributes assigned at initialization:

Trip.tripid = '60100515' --> A unique identifying string of the trip

Trip.bikeid = '70332' --> A unique identifying ID of the bike

Trip.toname = 'S 12th St & E Carson St' --> The name of the station the trip ended at

Trip.usertype = 'Customer' --> The type of customer (Customer vs. User)

Trip.stoptime = '10/1/2018 0:17' --> The time, in datetime format, the trip ended

Trip.fromname = 'First Ave & Smithfield St' --> The name of the station the trip began at

Trip.starttime = '10/1/2018 0:00' --> The time, in datetime format, the trip began

Trip.toid = '1049' --> The unique identifying ID of the station the trip ended at

Trip.tripduration = '1017' --> The duration of the trip, in seconds

Trip._id = 1 --> The nth trip

Trip.fromid = '1003' --> The unique identifying ID of the station the trip began at

Attributes assigned post-initialization:

Trip.weekday = 0 --> Calculated using the start time of the trip, this attribute returns the day of week in integer form. 0 = Monday.

Trip.start_neighborhood = 'Downtown' --> The neighborhood the trip began in, looked up in the station dictionary defined in the same file. This dictionary groups station IDs by neighborhoods I defined.

Trip.end_neighborhood = 'Southside Flats' --> The neighborhood the trip ended in, looked up in the same station dictionary.
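To make the attribute list concrete, here is a minimal sketch of what such a class could look like. It is an illustration under assumptions, not the code in trip.py: the `STATION_NEIGHBORHOODS` dictionary holds only the two stations from the sample trip, and `neighborhood_of` is a hypothetical helper.

```python
from datetime import datetime

# Hypothetical station dictionary: only the two stations from the sample trip.
STATION_NEIGHBORHOODS = {
    "Downtown": {"1003"},
    "Southside Flats": {"1049"},
}


def neighborhood_of(station_id):
    """Return the neighborhood containing station_id, or None if unmapped."""
    for name, station_ids in STATION_NEIGHBORHOODS.items():
        if station_id in station_ids:
            return name
    return None


class Trip:
    def __init__(self, record):
        # Attributes copied straight from the API record.
        self.tripid = record["Trip id"]
        self.bikeid = record["Bikeid"]
        self.toname = record["To station name"]
        self.usertype = record["Usertype"]
        self.stoptime = record["Stoptime"]
        self.fromname = record["From station name"]
        self.starttime = record["Starttime"]
        self.toid = record["To station id"]
        self.tripduration = record["Tripduration"]
        self._id = record["_id"]
        self.fromid = record["From station id"]

        # Derived attributes. Assumption: timestamps look like '10/1/2018 0:00'.
        start = datetime.strptime(self.starttime, "%m/%d/%Y %H:%M")
        self.weekday = start.weekday()  # 0 = Monday
        self.start_neighborhood = neighborhood_of(self.fromid)
        self.end_neighborhood = neighborhood_of(self.toid)


trip = Trip({
    "Trip id": "60100515", "Bikeid": "70332",
    "To station name": "S 12th St & E Carson St", "Usertype": "Customer",
    "Stoptime": "10/1/2018 0:17", "From station name": "First Ave & Smithfield St",
    "Starttime": "10/1/2018 0:00", "To station id": "1049",
    "Tripduration": "1017", "_id": 1, "From station id": "1003",
})
```

Adding a new derived attribute then means adding one line at the bottom of `__init__`, which keeps every later analysis step working from the same objects.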

As you can see, this data type was built to expand. When I build on the project by analyzing the data in a different way, I usually begin by adding a post-initialization attribute. I've found it is much easier to measure attributes as far upstream as possible, and everything begins here.

If you want to contribute to this project and begin with exploratory analysis, the best way to do so is to create a list of these trip objects. You can then sort and filter the list using list comprehensions and begin to draw conclusions.
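As a rough illustration of that workflow, the sketch below filters and sorts a small list. The objects are `SimpleNamespace` stand-ins carrying the same attribute names as a Trip, and apart from the first entry their values are made up for the example.

```python
from types import SimpleNamespace

# Stand-ins for Trip objects; only the first mirrors the sample record.
trips = [
    SimpleNamespace(weekday=0, start_neighborhood="Downtown",
                    end_neighborhood="Southside Flats", tripduration="1017"),
    SimpleNamespace(weekday=0, start_neighborhood="Southside Flats",
                    end_neighborhood="Downtown", tripduration="640"),
    SimpleNamespace(weekday=5, start_neighborhood="Downtown",
                    end_neighborhood="Downtown", tripduration="2210"),
]

# Filter: Monday trips that began Downtown.
monday_from_downtown = [
    t for t in trips
    if t.weekday == 0 and t.start_neighborhood == "Downtown"
]

# Sort: longest trips first. Durations arrive as strings, so cast before comparing.
longest_first = sorted(trips, key=lambda t: int(t.tripduration), reverse=True)

# Project: just the durations, in seconds, of trips that ended Downtown.
durations_into_downtown = [
    int(t.tripduration) for t in trips if t.end_neighborhood == "Downtown"
]
```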

3. analytics.ipynb

analytics.ipynb is where I feature all of the visualizations that support my project. So far, they are as follows.

  1. Weekday Analytics - plots that display patterns of arrivals and departures by neighborhood (figure: Downtown Arrivals and Departures)
  2. Neighborhood Analytics - similar to Weekday Analytics but more complicated - plots that display patterns of arrivals and departures from and to neighborhoods (figures: Downtown Arrivals by Neighborhood, Downtown Departures by Neighborhood)

You may notice that these visualizations correspond to folders in this repository. This is the way I have chosen to structure the repository. Folders within the main repository contain logic that is specific to a visualization. Files that exist outside these folders are there for a reason: they apply across visualizations.
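The counts behind arrival and departure plots like these can be sketched as follows. This is one possible approach, not the logic in the visualization folders: `hourly_counts` is a hypothetical helper, the time format is an assumption based on the sample record, and every trip after the first is made up.

```python
from collections import Counter
from datetime import datetime
from types import SimpleNamespace

TIME_FORMAT = "%m/%d/%Y %H:%M"  # assumption, inferred from the sample record


def hourly_counts(trips, neighborhood, direction):
    """Count trips per hour of day leaving or reaching a neighborhood."""
    if direction == "departures":
        place_attr, time_attr = "start_neighborhood", "starttime"
    elif direction == "arrivals":
        place_attr, time_attr = "end_neighborhood", "stoptime"
    else:
        raise ValueError("direction must be 'departures' or 'arrivals'")
    return Counter(
        datetime.strptime(getattr(t, time_attr), TIME_FORMAT).hour
        for t in trips
        if getattr(t, place_attr) == neighborhood
    )


# Stand-ins for Trip objects; only the first mirrors the sample record.
trips = [
    SimpleNamespace(start_neighborhood="Downtown", end_neighborhood="Southside Flats",
                    starttime="10/1/2018 0:00", stoptime="10/1/2018 0:17"),
    SimpleNamespace(start_neighborhood="Downtown", end_neighborhood="Oakland",
                    starttime="10/1/2018 8:05", stoptime="10/1/2018 8:30"),
    SimpleNamespace(start_neighborhood="Southside Flats", end_neighborhood="Downtown",
                    starttime="10/1/2018 8:10", stoptime="10/1/2018 8:40"),
    SimpleNamespace(start_neighborhood="Downtown", end_neighborhood="Downtown",
                    starttime="10/2/2018 8:45", stoptime="10/2/2018 9:02"),
]

departures = hourly_counts(trips, "Downtown", "departures")
arrivals = hourly_counts(trips, "Downtown", "arrivals")
```

Each result maps an hour of the day (0 to 23) to a trip count, which is the shape a bar chart of arrivals or departures needs.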

If you want to contribute

Want to contribute? This is how I envision it working.

  1. Read the above (lol). You will need a really good working understanding of how to create a copy of the data on your computer, and how I've structured "the trip" datatype.
  2. After you clone the project, you'll want to create your own branch.
  3. Configure the file paths passed to the "sys.path.insert()" calls. I want this to eventually configure automatically; this is an area for improvement in the repo. Maybe I will revisit it someday.
  4. After the filepaths have been configured, everything on the repository should be in working order. Run api_data_to_json.ipynb to create a copy of the data locally on your computer. You will see this file pop up in your directory as "data.json".
  5. Once you have a working copy of the data on your computer, you are set to begin exploratory analysis. You should perform this in exploratory_analysis.py, which is already set to read the data.json file you just created.
  6. When you have an idea of a point you want to prove through visualization, create a folder. I've done the majority of my development in jupyter notebooks and translated those files into regular python files, but you may have a different workflow.
  7. Push the branch! I'll review the changes and feature your visualization in analytics.ipynb.
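For the path configuration and data loading steps above, one possible way to avoid hand-editing file paths is sketched below. It is a suggestion, not how the repository currently works: `add_repo_root_to_path` and `load_records` are hypothetical helpers, and it assumes that trip.py sits at the repository root and that data.json holds a list of record dictionaries.

```python
import json
import sys
from pathlib import Path


def add_repo_root_to_path(start, marker="trip.py"):
    """Walk upward from start until a folder containing marker is found,
    put that folder on sys.path, and return it."""
    start = Path(start).resolve()
    for folder in [start, *start.parents]:
        if (folder / marker).exists():
            if str(folder) not in sys.path:
                sys.path.insert(0, str(folder))
            return folder
    raise FileNotFoundError(f"no folder containing {marker} above {start}")


def load_records(path):
    """Read the locally saved copy of the data (assumed to be a JSON list)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Inside a visualization folder this could be called as `root = add_repo_root_to_path(Path.cwd())` followed by `records = load_records(root / "data.json")`, after which `import trip` would resolve without a hard-coded path.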
