This project was originally started for the course IDS 702: Data Engineering. Its purpose is to analyze a dataset using pandas and a cloud-hosted notebook, inside a Python project template with a strong build system. The build system provides a basic feedback loop every time I update the project.
Colab Notebook Link

Access the Python notebook in Colab directly by clicking the link above.
To generate the analysis from this repository, start by cloning it. Then make sure every requirement in requirements.txt is installed, since these libraries are needed to run the Python script.
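One way to confirm the requirements are fulfilled before running the script is a small check like the sketch below. It is not part of the repository: the helper name `missing_requirements` is hypothetical, and it assumes requirements.txt lists one package per line in the usual pip format.

```python
import re
from importlib import metadata
from pathlib import Path


def missing_requirements(path="requirements.txt"):
    """Return the package names in a requirements file that are not installed.

    Hypothetical helper for illustration; only the presence of each package
    is checked, not whether the installed version satisfies the pin.
    """
    missing = []
    for raw in Path(path).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        # Keep only the distribution name, dropping extras, version pins and markers.
        name = re.split(r"[\[<>=!~; ]", line, maxsplit=1)[0]
        if not name:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Calling `missing_requirements()` from the project root returns an empty list when everything is installed. Any names it returns can be installed with pip before running main.py.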
A. Run the script in Python using main.py, then review the statistics and visualizations.

B. Run in Google Colab: navigate to Colab and log in. Use the notebook link above and run the first cell to clone the repository and navigate to the project directory.
For more information on the insights from this analysis, see Summary.md.