Census Data Pipeline with Apache Airflow

Overview

This project demonstrates an ETL pipeline using Apache Airflow. The pipeline:

  1. Downloads census data.
  2. Transforms and validates it.
  3. Loads the processed data into a PostgreSQL database.
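
The transform-and-validate step could be sketched roughly as below. This is a minimal illustration using only the standard library, not the repository's actual code: the column names `city` and `population`, the function name `transform_and_validate`, and the sample rows are all assumptions made for the example.

```python
import csv
import io

# Hypothetical column names; the real census file may use different headers.
REQUIRED_COLUMNS = ("city", "population")


def transform_and_validate(rows, min_population=0):
    """Return cleaned rows, skipping records that fail basic checks."""
    cleaned = []
    for row in rows:
        # Validate: every required column must be present and non-empty.
        if any(not (row.get(col) or "").strip() for col in REQUIRED_COLUMNS):
            continue
        # Validate: population must parse as an integer.
        try:
            population = int(row["population"])
        except ValueError:
            continue
        # Validate: drop values below the threshold (negatives by default).
        if population < min_population:
            continue
        # Transform: trim whitespace and cast the numeric field.
        cleaned.append({"city": row["city"].strip(), "population": population})
    return cleaned


# Made-up sample rows: one valid, one missing a city, one non-numeric.
sample = io.StringIO("city,population\nSpringfield,1200\n,300\nShelbyville,n/a\n")
result = transform_and_validate(csv.DictReader(sample))
print(result)
```

Only the first sample row passes both checks; the other two are dropped rather than raising, which keeps one bad record from failing the whole run.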

Project Structure

airflow-census-pipeline/
├── dags/                    # DAG scripts
│   └── census_pipeline.py   # Main DAG script
├── data/                    # Data files
│   ├── city_census.csv
│   └── filtered_census.csv
├── .gitignore               # Git ignore rules
├── airflow.cfg              # Airflow configuration
├── requirements.txt         # Python dependencies
└── README.md                # Project documentation

Prerequisites

  • OS: Ubuntu (WSL)
  • Python: 3.9+
  • Airflow: 2.x
  • PostgreSQL

Setup Instructions

  1. Clone the repository:
    git clone https://github.com/kostas696/airflow-census-pipeline.git
    cd airflow-census-pipeline
    
  2. Set up a virtual environment:
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    
  3. Initialize Airflow:
    export AIRFLOW_HOME=$(pwd)
    airflow db init
    airflow users create --username admin --password admin --firstname Admin --lastname User --role Admin --email admin@example.com
    
  4. Run the scheduler and webserver:
    airflow scheduler &
    airflow webserver &
    
  5. Open the Airflow UI at http://localhost:8080, log in with the admin user created in step 3, and trigger the DAG.

License

This project is licensed under the MIT License.
