This project collects bike rental data and integrates it with weather data fetched from an API. The solution is built with Docker, Dagster, PostgreSQL, and Python, and models the data as a Kimball star schema with Slowly Changing Dimensions (SCD). The pipeline is designed to run hourly and generates over 1 million data points per day.
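To make the SCD idea concrete, here is a minimal sketch of a Type 2 update on a city dimension. The README does not say which SCD type the project uses, so Type 2 is an assumption, and the `CityRow` fields and the `apply_scd2` helper are hypothetical names chosen for illustration rather than code from the repository.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class CityRow:
    """One version of a city in a hypothetical city dimension."""
    city_key: int                         # surrogate key, unique per version
    city_name: str                        # natural key
    latitude: float
    longitude: float
    valid_from: datetime
    valid_to: Optional[datetime] = None   # None marks the current version


def apply_scd2(rows: List[CityRow], city_name: str, latitude: float,
               longitude: float, now: datetime) -> List[CityRow]:
    """Return the dimension rows after applying one incoming record (SCD Type 2)."""
    current = next(
        (r for r in rows if r.city_name == city_name and r.valid_to is None),
        None,
    )
    # Nothing changed: keep the history exactly as it is.
    if current and (current.latitude, current.longitude) == (latitude, longitude):
        return list(rows)
    # Close the current version (if any) and append a new current version.
    updated = [replace(r, valid_to=now) if r is current else r for r in rows]
    next_key = max((r.city_key for r in rows), default=0) + 1
    updated.append(CityRow(next_key, city_name, latitude, longitude, valid_from=now))
    return updated
```

With this approach, fact rows keep pointing at the surrogate key that was current when they were loaded, so historical rentals stay tied to the attribute values that applied at the time.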
- Docker: Containerizes the entire application, with services running independently.
- Dagster: Automates and orchestrates the pipeline to run hourly.
- PostgreSQL: Stores both historical rental data and real-time weather data.
- Python: Handles data extraction, transformation, and loading (ETL).
- Extract: Gather latitude and longitude for cities worldwide, then fetch weather data using an API.
- Transform: Create and automate SQL `INSERT` statements for PostgreSQL.
- Load: Insert both historical and real-time data into PostgreSQL.
- Orchestrate: Run all services in Docker containers, managed by Dagster.
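The Transform step above can be sketched roughly as follows. This is an illustration only: the `build_insert` helper, the `weather_fact` table, and its columns are hypothetical, and the `%s` placeholder style assumes a driver such as psycopg2, which the README does not name.

```python
from typing import Any, Dict, List, Tuple


def build_insert(table: str, rows: List[Dict[str, Any]]) -> Tuple[str, List[tuple]]:
    """Build one parameterized INSERT statement plus one parameter tuple per row."""
    if not rows:
        raise ValueError("rows must not be empty")
    # Column order is taken from the first row; every row must have these keys.
    columns = list(rows[0].keys())
    placeholders = ", ".join(["%s"] * len(columns))
    # Table and column names are interpolated directly, so they must come from
    # trusted code, never from user input. Values are passed as parameters.
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    params = [tuple(row[col] for col in columns) for row in rows]
    return sql, params


# Hypothetical weather records for two cities.
sample_rows = [
    {"city_id": 1, "temp_c": 21.5},
    {"city_id": 2, "temp_c": 18.0},
]
sql, params = build_insert("weather_fact", sample_rows)
```

In the Load step, a pair like this could then be passed to a database cursor, for example `cursor.executemany(sql, params)` with psycopg2. Keeping values out of the SQL string avoids quoting bugs and SQL injection.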
- Docker and Python 3.8+ should be installed on your machine.
- Clone the repository: `git clone https://github.com/extrm-gn/DE-Bike-rental.git`
- Change into the project directory: `cd DE-Bike-rental`
- Build and start the containers: `docker-compose up --build`