The objective of this project is to orchestarate a data pipeline using Airflow which runs in docker to acquire data from a subreddit - r/dataengineering using reddit's API, cleanse acquired data and finally load reporting level data to Amazon RDS MySQL table.
- Airflow: Workflow orchestration management platform
- AWS S3: Object storage service to store raw, cleansed and aggregated formats of data
- AWS RDS: Relational data service to store final aggregated - reporting layer data in a table
- AWS IAM: Identity and Access management service to create roles to access AWS S3