This project is a data analysis of a songs dataset for a startup called Sparkify. The analytics cover both the songs and the user activity on Sparkify's new music streaming app. Sparkify wants to understand what songs its users are listening to. To serve this purpose, the data needs to be modeled so that it can be queried easily.
- Converting raw data into a well-structured data warehouse.
- Creating a database schema of fact and dimension tables.
- Building the ETL pipelines.
- Python3
- Run create_tables.py.
- Run etl.py.
- Tweak etl.ipynb if you like.
- Test.ipynb > checks the updates in the warehouse.
- sql_queries.py > contains the basic queries (drop tables, create tables, insert and select).
- etl.py > etl pipelines.
- etl.ipynb > notebook with the same goal as etl.py, except that it takes a closer look at the data rather than running the whole process.
- create_tables.py > where the database connection is opened and the queries are executed.
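
The split between sql_queries.py (query strings) and create_tables.py (connection and execution) can be sketched roughly as below. This is only an illustration under stated assumptions, not the project's actual code: the table names (`songplays`, `songs`, `users`), their columns, and the helper functions `create_tables`, `load` and `top_songs` are all hypothetical, and an in-memory SQLite database from the Python standard library is swapped in so that the example runs without any server, which is not necessarily the database this project uses.

```python
import sqlite3

# Hypothetical query strings, in the spirit of sql_queries.py.
# The fact table is dropped first because it references the dimensions.
DROP_QUERIES = [
    "DROP TABLE IF EXISTS songplays",
    "DROP TABLE IF EXISTS songs",
    "DROP TABLE IF EXISTS users",
]

CREATE_QUERIES = [
    # Dimension tables: descriptive attributes of users and songs.
    "CREATE TABLE IF NOT EXISTS users ("
    "user_id INTEGER PRIMARY KEY, first_name TEXT, level TEXT)",
    "CREATE TABLE IF NOT EXISTS songs ("
    "song_id TEXT PRIMARY KEY, title TEXT, duration REAL)",
    # Fact table: one row per listening event, pointing at the dimensions.
    "CREATE TABLE IF NOT EXISTS songplays ("
    "songplay_id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "start_time TEXT NOT NULL, "
    "user_id INTEGER NOT NULL REFERENCES users (user_id), "
    "song_id TEXT REFERENCES songs (song_id))",
]

INSERT_USER = "INSERT OR REPLACE INTO users VALUES (?, ?, ?)"
INSERT_SONG = "INSERT OR IGNORE INTO songs VALUES (?, ?, ?)"
INSERT_SONGPLAY = (
    "INSERT INTO songplays (start_time, user_id, song_id) VALUES (?, ?, ?)"
)

TOP_SONGS = (
    "SELECT s.title, COUNT(*) AS plays "
    "FROM songplays sp JOIN songs s ON s.song_id = sp.song_id "
    "GROUP BY s.title ORDER BY plays DESC, s.title"
)


def create_tables(conn):
    """Drop and recreate every table, in the spirit of create_tables.py."""
    for query in DROP_QUERIES + CREATE_QUERIES:
        conn.execute(query)
    conn.commit()


def load(conn, songs, events):
    """Tiny ETL step: insert song rows, then one user and one songplay per event."""
    conn.executemany(INSERT_SONG, songs)
    for event in events:
        conn.execute(
            INSERT_USER, (event["user_id"], event["first_name"], event["level"])
        )
        conn.execute(
            INSERT_SONGPLAY, (event["ts"], event["user_id"], event["song_id"])
        )
    conn.commit()


def top_songs(conn):
    """Return (title, play count) pairs, most played first."""
    return conn.execute(TOP_SONGS).fetchall()
```

Keeping the SQL strings apart from the code that executes them means the schema can be changed in one place, while the table-creation script and the ETL script both import the same definitions.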