GitHub - Abishek3896/Data-Modelling-with-PostgresQL: I built an ETL Pipeline to fetch data from JSON records into structured Database columns

Introduction:

The purpose of this project is to create a Postgres database with tables designed to optimize queries for song play analysis. It was built for a startup called Sparkify, which wants to analyze the data it has been collecting on songs and user activity in its new music streaming app.

Database Schema design and ETL process:

Database Design

The database schema is a star schema: the songplay table is the fact table, and the artist, user, song, and time tables are the dimension tables. The ETL process consists of transferring data from JSON files in two local directories into these tables in Postgres.
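
To make the star layout concrete, here is a minimal sketch of how the create statements could be organized as a list of query strings. The table names (songplays, users, songs, artists, time) and every column are assumptions chosen for illustration; the actual definitions live in sql_queries.py. So that the sketch runs without a database server, it executes the statements against an in-memory SQLite database as a stand-in; in the project they would run against Postgres through a driver such as psycopg2, where the fact table key would more likely be a SERIAL column.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions for illustration.
# Dimension tables come first so the fact table can reference them.
CREATE_TABLE_QUERIES = [
    """CREATE TABLE IF NOT EXISTS users (
        user_id INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name TEXT
    )""",
    """CREATE TABLE IF NOT EXISTS artists (
        artist_id TEXT PRIMARY KEY,
        name TEXT
    )""",
    """CREATE TABLE IF NOT EXISTS songs (
        song_id TEXT PRIMARY KEY,
        title TEXT,
        artist_id TEXT REFERENCES artists (artist_id)
    )""",
    """CREATE TABLE IF NOT EXISTS time (
        start_time TIMESTAMP PRIMARY KEY,
        hour INTEGER,
        weekday INTEGER
    )""",
    # Fact table: one row per song play, pointing at each dimension.
    """CREATE TABLE IF NOT EXISTS songplays (
        songplay_id INTEGER PRIMARY KEY,
        start_time TIMESTAMP REFERENCES time (start_time),
        user_id INTEGER REFERENCES users (user_id),
        song_id TEXT REFERENCES songs (song_id),
        artist_id TEXT REFERENCES artists (artist_id)
    )""",
]


def create_schema(conn):
    """Run every create statement on the given DB-API connection."""
    cur = conn.cursor()
    for query in CREATE_TABLE_QUERIES:
        cur.execute(query)
    conn.commit()


def list_tables(conn):
    """Return the table names in the SQLite stand-in, sorted by name."""
    cur = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_schema(connection)
    print(list_tables(connection))
```

Keeping the statements in one list means the script that creates the tables only has to loop over them, and the order of the list guarantees that each dimension exists before the fact table refers to it.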

Files in Repository:

data files
create_tables.py (connects to the sparkify database and creates the tables)
sql_queries.py (queries to drop, create, and insert into the tables)
etl.py (pipeline that transfers data into the tables)
etl.ipynb (test notebook for the procedure implemented in etl.py)
test.ipynb (test notebook that queries the tables to check that the data was inserted)

Execution:

Step 1: Open a terminal (Linux) or command prompt (Windows) and go to the project directory.
Step 2: Run the sql_queries.py script.
Step 3: Run the create_tables.py script to connect to the sparkify database and create the tables.
Step 4: Run the etl.py script to run the pipeline, which extracts, transforms, and loads data from the data files into the tables.
Step 5: Inspect the populated tables by running the test.ipynb notebook.
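
The extract and transform part of the pipeline can be sketched roughly as follows. This is not the code in etl.py: the function names, the field names (song_id, title, artist_id), and the assumption that each file holds one JSON object per line are all hypothetical and only illustrate the general shape of walking a data directory and turning records into row tuples ready for an insert query.

```python
import json
from pathlib import Path


def find_json_files(root):
    """Return every .json file under root, searching subdirectories too."""
    return sorted(str(path) for path in Path(root).rglob("*.json"))


def read_records(path):
    """Read one JSON object per non-empty line (assumed file layout)."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def to_song_row(record):
    """Pick the columns for a song row; the field names are hypothetical."""
    return (record.get("song_id"), record.get("title"), record.get("artist_id"))


def extract_rows(root, transform):
    """Apply transform to every record found under root."""
    rows = []
    for path in find_json_files(root):
        for record in read_records(path):
            rows.append(transform(record))
    return rows
```

Each tuple returned by extract_rows could then be passed as the parameters of an insert query, executed through a database cursor, to complete the load step.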
