This is a project I made to teach myself the concept of an ETL pipeline, based on this page by Vivek Chaudhary.
ETL stands for Extract, Transform and Load: a set of processes that extracts data from one or more input sources, transforms or cleans it into the appropriate format, and finally loads it into an output destination such as a database, data mart, or data warehouse.
In this project, the data is extracted from a PostgreSQL database, cleaned with Pandas methods, and loaded back into a database.
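To make that flow concrete, here is a minimal sketch of such a pipeline using SQLAlchemy and Pandas. The table names (`raw_orders`, `clean_orders`), the connection details, and the cleaning steps are placeholders for illustration, not the actual logic in `main.py`.

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection URL: user "postgres", the password chosen during
# installation, the default port 5432, and a hypothetical "etl_demo" database.
engine = create_engine(
    "postgresql+psycopg2://postgres:your_password@localhost:5432/etl_demo"
)

# Extract: read the source table into a DataFrame.
raw = pd.read_sql("SELECT * FROM raw_orders", engine)

# Transform: example cleaning steps (drop incomplete rows, normalise column names).
clean = raw.dropna()
clean.columns = [c.strip().lower() for c in clean.columns]

# Load: write the cleaned data to a destination table.
clean.to_sql("clean_orders", engine, if_exists="replace", index=False)
```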
- Download and run the PostgreSQL installer: https://www.postgresql.org/, or get it through your distribution's package manager if you're using Linux.
- Leave the default port (5432) and the other default values during the installation.
- You will be asked to provide a password for the superuser (postgres); remember this password, because it is used later in the database connection (see the connection check after this list).
- Make sure Python is installed (https://www.python.org/).
- Install the necessary packages with the command: `pip install sqlalchemy pandas psycopg2`
- Python 3.9 and NumPy 1.19.4 have trouble running this code, so use Python 3.8 and NumPy 1.19.3: `pip install numpy==1.19.3`
- Run the `main.py` script.
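As mentioned in the installation steps above, the superuser password and the default port end up in the connection URL. A quick, hedged sketch of a connection check, assuming the `postgres` superuser, the password chosen during installation, and the default `postgres` database:

```python
from sqlalchemy import create_engine, text

# Default superuser "postgres", the password set during installation,
# the default port 5432, and the default "postgres" database (all assumptions).
engine = create_engine(
    "postgresql+psycopg2://postgres:your_password@localhost:5432/postgres"
)

# If this prints a version string, the driver and credentials are working.
with engine.connect() as conn:
    print(conn.execute(text("SELECT version()")).scalar())
```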