This is an exploratory analysis of the San Francisco public service cases over the past few years.
Please see Analysis.ipynb for details.
The problem I am investigating is predicting the status of each case request from its other characteristics (such as request category, source, and location).
The analysis consists of four main parts:
- Importing & Editing Data
  - Importing CSV and Overview of the DataFrame
  - Analysis of the Null Values
  - Choosing the Problem/Characteristics to Investigate
- Exploratory Data Analysis
  - Overview of Statuses of Requests
  - Statuses of Requests and Request Categories
  - Statuses of Requests and Request Sources
  - Statuses of Requests and Request Locations
- Determining and Implementing the ML Model
  - Cleaning & Restructuring Data (Re-aggregated Categories & One-hot Encoding)
  - Random Forest Classification - Request Categories & Sources (as Input)
  - K-Nearest Neighbors Classification - Request Locations
  - Stacking ML Models
- Further Exploratory Analysis
  - Exploring & Transforming the Opened and Closed Time Columns
  - Distribution of the Days Elapsed for All the Requests
  - Days Elapsed v. Re-aggregated Categories
  - Days Elapsed v. Request Sources
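The modelling steps outlined above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the column names (`Category`, `Source`, `Latitude`, `Longitude`, `Opened`, `Closed`, `Status`) and the toy data are hypothetical, and scikit-learn's `StackingClassifier` is used as one possible way to stack a random forest (one-hot-encoded categories and sources) with a k-nearest-neighbors model (coordinates). The notebook may combine its models differently. The sketch also shows one way to compute days elapsed from the opened and closed timestamps.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: column names and values are assumptions for illustration only.
# Labels are random, so any accuracy from this sketch is meaningless.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Category": rng.choice(["Street Cleaning", "Graffiti", "Abandoned Vehicle"], n),
    "Source": rng.choice(["Phone", "Web", "Mobile"], n),
    "Latitude": rng.uniform(37.70, 37.81, n),
    "Longitude": rng.uniform(-122.51, -122.36, n),
    "Opened": pd.Timestamp("2016-01-01") + pd.to_timedelta(rng.integers(0, 365, n), unit="D"),
})
df["Closed"] = df["Opened"] + pd.to_timedelta(rng.integers(0, 60, n), unit="D")
df["Status"] = rng.choice(["Closed", "Open"], n)

# Days elapsed between a request being opened and closed.
df["DaysElapsed"] = (df["Closed"] - df["Opened"]).dt.days

X = df[["Category", "Source", "Latitude", "Longitude"]]
y = df["Status"]

# Random forest: uses only the one-hot-encoded category and source columns.
rf = make_pipeline(
    ColumnTransformer([("onehot", OneHotEncoder(handle_unknown="ignore"), ["Category", "Source"])]),
    RandomForestClassifier(n_estimators=100, random_state=0),
)

# KNN: uses only the standardized coordinates.
knn = make_pipeline(
    ColumnTransformer([("coords", StandardScaler(), ["Latitude", "Longitude"])]),
    KNeighborsClassifier(n_neighbors=5),
)

# Stacking: a logistic regression learns how to combine the two models'
# cross-validated predicted probabilities.
stack = StackingClassifier(
    estimators=[("rf", rf), ("knn", knn)],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X, y)
preds = stack.predict(X.head(5))
print(preds)
```

Giving each base model only the columns it was designed for mirrors the split in the outline (categories and sources for the random forest, locations for KNN). The stacking layer then decides how much weight each model's predictions deserve.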