This project demonstrates an ETL pipeline using Apache Airflow. The pipeline:
- Downloads census data.
- Transforms and validates it.
- Loads the processed data into a PostgreSQL database.
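The transform-and-validate stage, for example, can be sketched as a plain Python function that the corresponding DAG task would wrap. This is a minimal illustration only; the column names (`city`, `population`), the threshold, and the filter rule are assumptions, not the project's actual logic:

```python
import csv
import io

def transform(raw_csv: str, min_population: int = 100_000) -> list[dict]:
    """Validate rows and keep cities above a population threshold.

    Column names and the threshold are illustrative assumptions.
    """
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    # Basic validation: each row needs a city name and a numeric population.
    valid = [r for r in rows if r.get("city") and r.get("population", "").isdigit()]
    return [r for r in valid if int(r["population"]) >= min_population]

sample = "city,population\nSpringfield,50000\nShelbyville,150000\n"
print(transform(sample))  # only Shelbyville clears the 100,000 threshold
```

Keeping the task logic in plain functions like this makes it testable outside Airflow; the DAG then only handles scheduling and ordering.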
```
airflow-census-pipeline/
├── dags/                   # DAG scripts
│   └── census_pipeline.py  # Main DAG script
├── data/                   # Data files
│   ├── city_census.csv
│   └── filtered_census.csv
├── .gitignore              # Git ignore rules
├── airflow.cfg             # Airflow configuration
├── requirements.txt        # Python dependencies
└── README.md               # Project documentation
```
- OS: Ubuntu (WSL)
- Python: 3.9+
- Airflow: 2.x
- PostgreSQL
- Clone the repository:

  ```bash
  git clone https://github.com/kostas696/airflow-census-pipeline.git
  cd airflow-census-pipeline
  ```
- Set up a virtual environment and install dependencies:

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt
  ```
- Initialize Airflow and create an admin user:

  ```bash
  export AIRFLOW_HOME=$(pwd)
  airflow db init
  airflow users create --username admin --password admin --firstname Admin --lastname User --role Admin --email admin@example.com
  ```
- Run the scheduler and webserver:

  ```bash
  airflow scheduler &
  airflow webserver &
  ```
- Open the Airflow UI at http://localhost:8080, log in with the credentials created above, and trigger the DAG.
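The DAG can also be unpaused and triggered from the command line instead of the UI. This assumes the DAG id is `census_pipeline`, matching the script name:

```shell
# DAG id assumed to match the filename dags/census_pipeline.py
airflow dags unpause census_pipeline
airflow dags trigger census_pipeline
```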
This project is licensed under the MIT License.