The pipeline collects data from the Surfline API and exports a CSV file to S3. The most recent file in S3 is then downloaded and ingested into the Postgres data warehouse: a temp table is created, and only the unique rows are inserted into the data tables. Airflow handles orchestration and is hosted locally with docker-compose and MySQL. Postgres also runs locally in a Docker container. The data dashboard runs locally with Plotly.
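A minimal sketch of the ingest step described above: download the most recently modified CSV from S3, copy it into a temp table, and insert only the rows not already present in the main table. The bucket name, local path, connection settings, and table names (surf_report, staging) are placeholders, not the project's actual values.

```python
# Hypothetical sketch of the S3 -> Postgres ingest step.
# Bucket, paths, credentials, and table names are placeholders.
import boto3
import psycopg2

BUCKET = "surf-data-bucket"          # placeholder bucket name
LOCAL_PATH = "/tmp/latest_surf.csv"  # placeholder download path


def download_latest_csv():
    """Find the most recently modified object in the bucket and download it."""
    s3 = boto3.client("s3")
    objects = s3.list_objects_v2(Bucket=BUCKET)["Contents"]
    latest = max(objects, key=lambda obj: obj["LastModified"])
    s3.download_file(BUCKET, latest["Key"], LOCAL_PATH)
    return LOCAL_PATH


def ingest_csv(csv_path):
    """Copy the CSV into a temp table, then insert only the new rows."""
    conn = psycopg2.connect(
        host="localhost", dbname="surf", user="postgres", password="postgres"
    )
    with conn, conn.cursor() as cur:
        # Temp table mirrors the structure of the main data table.
        cur.execute("CREATE TEMP TABLE staging (LIKE surf_report INCLUDING ALL);")
        with open(csv_path) as f:
            cur.copy_expert("COPY staging FROM STDIN WITH CSV HEADER", f)
        # Keep only rows that do not already exist in the main table.
        cur.execute(
            """
            INSERT INTO surf_report
            SELECT * FROM staging
            EXCEPT
            SELECT * FROM surf_report;
            """
        )
    conn.close()


if __name__ == "__main__":
    ingest_csv(download_latest_csv())
```

In the actual pipeline these two functions would run as separate Airflow tasks rather than a single script; the EXCEPT-based insert is one straightforward way to keep the data tables free of duplicate rows.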
Airflow Basics:
Airflow DAG: Coding your first DAG for Beginners
Running Airflow 2.0 with Docker in 5 mins
S3 Basics:
Setting Up Airflow Tasks To Connect Postgres And S3
How to Upload files to AWS S3 using Python and Boto3
Docker Basics:
Build your first pipeline DAG | Apache airflow for beginners
Run Airflow 2.0 via Docker | Minimal Setup | Apache airflow for beginners
Plotly: