PythonETL

Python ETL samples using Docker.

Environment

  1. Homebrew: manages macOS dependencies (such as pip and MySQL).
     http://brew.sh/
  2. pip: manages Python dependencies.
     http://docs.python-guide.org/en/latest/starting/install/osx/
     sudo easy_install pip
  3. Docker: creates containers.
     https://docs.docker.com/docker-for-mac/
     Install and start Docker.

Prerequisites

  1. Install PostgreSQL using Homebrew:
     brew install postgres
  2. Install psycopg2 using pip:
     sudo pip install psycopg2
  3. Start the PostgreSQL container:
     postgresql_container.sh
  4. Run the individual Python scripts:
     python create_db.py
     python create_table.py
     python insert_data.py
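
The three scripts above are not shown in this README, so the following is only a minimal sketch of what a create-database, create-table, insert-data flow with psycopg2 might look like. The connection settings, the database name `etl_sample`, the table name `people`, its columns, and the sample rows are all hypothetical; the real values depend on what `postgresql_container.sh` configures.

```python
"""Hypothetical sketch of a create_db / create_table / insert_data flow."""

# Assumed connection settings; adjust to match the running container.
DB_SETTINGS = {
    "host": "localhost",
    "port": 5432,
    "user": "postgres",
    "password": "postgres",
}
DB_NAME = "etl_sample"   # hypothetical database name
TABLE_NAME = "people"    # hypothetical table name

CREATE_TABLE_SQL = f"""
CREATE TABLE IF NOT EXISTS {TABLE_NAME} (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    age INTEGER
)
"""

# psycopg2 uses %s placeholders and escapes the values itself.
INSERT_SQL = f"INSERT INTO {TABLE_NAME} (name, age) VALUES (%s, %s)"

SAMPLE_ROWS = [("Alice", 30), ("Bob", 25)]  # made-up illustration data


def create_database():
    """Create DB_NAME if it does not exist yet."""
    import psycopg2  # imported here so the module loads without the driver

    conn = psycopg2.connect(dbname="postgres", **DB_SETTINGS)
    conn.autocommit = True  # CREATE DATABASE cannot run inside a transaction
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1 FROM pg_database WHERE datname = %s", (DB_NAME,))
            if cur.fetchone() is None:
                cur.execute(f'CREATE DATABASE "{DB_NAME}"')
    finally:
        conn.close()


def create_table_and_insert(rows):
    """Create the table if needed, then insert the given (name, age) rows."""
    import psycopg2

    conn = psycopg2.connect(dbname=DB_NAME, **DB_SETTINGS)
    try:
        with conn:  # commits on success, rolls back on an exception
            with conn.cursor() as cur:
                cur.execute(CREATE_TABLE_SQL)
                cur.executemany(INSERT_SQL, rows)
    finally:
        conn.close()
```

With the container running, calling `create_database()` followed by `create_table_and_insert(SAMPLE_ROWS)` would perform the three steps in sequence. Keeping the SQL in module-level constants makes the statements easy to review without a live database.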

Guidelines for working on this Repo

  1. Always work off a new branch.
  2. Create pull requests against the "dev" branch.

Tasks

  1. Create 2 sample data files with 10 records. These two files will be loaded into the database.
  2. Create Python scripts to generate a large amount of data (do not upload large data files to GitHub).
  3. Create scripts to install and start MySQL.
  4. Create scripts to create a database and table in MySQL.
  5. Create a Python ETL script.
  6. Create a script to automatically run the Python ETL script.
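
As a starting point for the data-generation task, here is a minimal sketch that writes synthetic records to a CSV file using only the standard library. The column layout (`id`, `name`, `age`), the name list, and the function names are assumptions made for illustration; the repo does not define a schema for the sample data.

```python
import csv
import random

# Hypothetical pool of names used to fill the "name" column.
NAMES = ["Alice", "Bob", "Carol", "Dave", "Eve", "Frank"]


def generate_rows(count, seed=0):
    """Yield `count` synthetic (id, name, age) records, one at a time."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    for record_id in range(1, count + 1):
        yield (record_id, rng.choice(NAMES), rng.randint(18, 90))


def write_csv(path, count, seed=0):
    """Write a header plus `count` generated records to `path`."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "name", "age"])
        # Rows are streamed from the generator, so memory use stays flat
        # even for very large counts.
        writer.writerows(generate_rows(count, seed))
    return count
```

For example, `write_csv("large_data.csv", 1_000_000)` would produce a one-million-row file locally. Since large data files should not be uploaded, the output path is a natural candidate for `.gitignore`.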