Tokyo Olympics - Data Engineering Project

A simple data engineering project for Olympic data analysis on the Azure stack.

Data Source

The dataset contains details of 11,000 athletes who participated in the Tokyo Olympics. (The source link is attached to the header.)

It consists of the following files:

An Azure account with the following resources is required:
- Azure Storage Account: data layer
- Azure Data Factory: orchestrator
- Azure Databricks: Spark engine for data processing
- Azure Synapse: data analysis and dashboards
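
These resources form a layered flow: Data Factory lands raw files in the storage account, Databricks reads them and writes a processed copy, and Synapse queries the processed layer. As a minimal sketch of that layering, assuming the storage account has Data Lake Storage Gen2 enabled, the helper below builds `abfss://` URIs for each layer. The account name, container names, and dataset names are hypothetical placeholders, not values taken from this repository.

```python
# Hypothetical names: one ADLS Gen2 account with a container per layer.
STORAGE_ACCOUNT = "tokyoolympicsdata"        # hypothetical account name
RAW_CONTAINER = "raw-data"                   # hypothetical: landed by Data Factory
TRANSFORMED_CONTAINER = "transformed-data"   # hypothetical: written by Databricks


def abfss_path(container: str, relative_path: str, account: str = STORAGE_ACCOUNT) -> str:
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative_path.lstrip('/')}"


def layer_paths(dataset: str) -> dict:
    """Return where a dataset lands in the raw layer and where its processed copy goes."""
    return {
        "raw": abfss_path(RAW_CONTAINER, f"{dataset}.csv"),
        "transformed": abfss_path(TRANSFORMED_CONTAINER, dataset),
    }


if __name__ == "__main__":
    for name in ["athletes", "medals"]:  # hypothetical dataset names
        print(name, layer_paths(name))
```

Keeping the path construction in one helper means the raw and processed locations only need to change in one place if the account or container names differ.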

To run the project:
- Deploy the code in azure_data_factory/ to run the ingestion pipelines.
- Import the notebook from adb_notebooks/ into Azure Databricks and run it to populate the processed data layer.
- Use Azure Synapse to create the dashboard views with the SQL scripts in sql/.
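
As a hedged illustration of the processing and serving steps, the sketch below uses plain Python as a local stand-in. `transform` mimics the kind of cleanup a Databricks notebook might perform before writing to the processed layer (trimming text, casting counts to integers), and `sqlite3` stands in for Synapse SQL when defining a dashboard view. This is not the repository's notebook or SQL script: the column names, the `medals` table, the `medal_table` view, and the sample rows (made-up values, not real results) are all hypothetical.

```python
import csv
import io
import sqlite3

# Made-up sample rows (not real results); the column names are assumptions.
RAW_MEDALS_CSV = """Team,Gold,Silver,Bronze
 Team A ,3,1,0
Team B,5,0,2
Team C,3,2,4
"""

# Dashboard-style query against the hypothetical view.
DASHBOARD_QUERY = "SELECT team, gold, total FROM medal_table ORDER BY gold DESC, total DESC, team"


def transform(raw_csv: str) -> list:
    """Stand-in for the Databricks step: trim team names and cast medal counts to int."""
    rows = []
    for rec in csv.DictReader(io.StringIO(raw_csv)):
        rows.append((rec["Team"].strip(), int(rec["Gold"]), int(rec["Silver"]), int(rec["Bronze"])))
    return rows


def build_view(rows: list) -> sqlite3.Connection:
    """Stand-in for the Synapse step: load processed rows and expose a view with totals."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE medals (team TEXT, gold INTEGER, silver INTEGER, bronze INTEGER)")
    conn.executemany("INSERT INTO medals VALUES (?, ?, ?, ?)", rows)
    # The view derives the total; ordering is left to the consuming query.
    conn.execute(
        "CREATE VIEW medal_table AS "
        "SELECT team, gold, silver, bronze, gold + silver + bronze AS total FROM medals"
    )
    return conn


if __name__ == "__main__":
    conn = build_view(transform(RAW_MEDALS_CSV))
    for row in conn.execute(DASHBOARD_QUERY):
        print(row)
```

The view leaves ordering to the query that reads it, since SQL engines generally do not guarantee that an `ORDER BY` inside a view carries through to its consumers.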

Possible improvements:
- Create CI/CD pipelines that package the source code as a wheel and deploy it to the Databricks cluster.
- Create pipelines that deploy the Azure resources using Infrastructure as Code (IaC).
- Optimize the ADF pipelines with parameterization and parallelism, and add schedules to extract data at regular intervals.
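
In Data Factory, parameterization and parallelism usually mean one parameterized copy pipeline driven by a ForEach activity, whose `isSequential` and `batchCount` settings control how many iterations run at once. Regular extraction would then come from a schedule trigger on the pipeline. The Python sketch below only mirrors that idea locally and is not ADF code: `copy_file`, `run_ingestion`, the file names, and the base URLs are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Hypothetical file names standing in for a pipeline parameter
# that a ForEach activity would iterate over.
SOURCE_FILES = ["athletes", "coaches", "medals"]


def copy_file(file_name: str, source_base: str, sink_base: str) -> str:
    """One parameterized copy step: identical logic, different file_name per call."""
    source = f"{source_base.rstrip('/')}/{file_name}.csv"
    sink = f"{sink_base.rstrip('/')}/{file_name}.csv"
    # A real pipeline would transfer the data here; this sketch only returns the mapping.
    return f"{source} -> {sink}"


def run_ingestion(files: List[str], source_base: str, sink_base: str, batch_count: int = 2) -> List[str]:
    """Run copy steps concurrently, at most batch_count at a time (like ForEach batchCount)."""
    with ThreadPoolExecutor(max_workers=batch_count) as pool:
        # pool.map yields results in input order, regardless of completion order.
        return list(pool.map(lambda name: copy_file(name, source_base, sink_base), files))


if __name__ == "__main__":
    for line in run_ingestion(SOURCE_FILES, "https://example.com/tokyo", "raw-data/"):
        print(line)
```

Adding a new source file then only requires extending the parameter list, not duplicating copy logic.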

Repository contents:
- adb_notebooks/
- azure_data_factory/
- data/
- images/
- sql/
- .gitignore
- LICENSE
- README.md