
dugar-tarun/olympics-data-engineering


Tokyo Olympics - Data Engineering Project

A simple data engineering project for Olympic data analysis on the Azure stack.

The dataset contains details of 11,000 athletes who participated in the Tokyo Olympics (link attached to the header).

It consists of the following files:

  • Athletes
  • Coaches
  • EntriesGender
  • Medals
  • Teams
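Each of the five files is a small tabular dataset. As a minimal sketch, assuming the files are exported as CSV files named after the list above (Athletes.csv, Coaches.csv and so on) into one raw folder, they could be loaded into memory as below. The file naming and the load_raw_files helper are hypothetical and not part of the repository.

```python
import csv
from pathlib import Path

# The five datasets listed in this README.
DATASETS = ["Athletes", "Coaches", "EntriesGender", "Medals", "Teams"]


def load_raw_files(raw_dir):
    """Hypothetical loader: read <name>.csv for each dataset into a list of dicts.

    Assumes one CSV file per dataset with a header row; missing files are skipped.
    """
    tables = {}
    for name in DATASETS:
        path = Path(raw_dir) / f"{name}.csv"
        if not path.exists():
            continue
        with path.open(newline="", encoding="utf-8") as f:
            # Every value stays a string here; typing happens in the processing step.
            tables[name] = list(csv.DictReader(f))
    return tables
```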

Architecture

Figure: High-level architecture

Prerequisites

  • An Azure account with the following resources:

    • Azure Storage Account - data layer
    • Azure Data Factory - orchestrator
    • Azure Databricks - Spark engine for data processing
    • Azure Synapse - data analysis and dashboards
  • Deploy the code in azure_data_factory/ to run the ingestion pipelines.
  • Import the notebook into Azure Databricks and run it to populate the processed data layer.
  • Use Azure Synapse to create views for the dashboard with the SQL script in sql/.
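The notebook itself is not reproduced here. As a rough stand-in for the kind of transformation the processed data layer might hold, the sketch below casts the EntriesGender counts to integers and derives each discipline's female share. The column names Discipline, Female, Male and Total are assumptions about the file, the sample rows are placeholders rather than real data, and the logic is plain Python instead of Spark so that it stays self-contained.

```python
import csv
import io


def process_entries_gender(raw_csv_text):
    """Hypothetical transform: cast counts to int and add a female-share column.

    Assumes columns Discipline, Female, Male, Total in the raw EntriesGender file.
    """
    processed = []
    for row in csv.DictReader(io.StringIO(raw_csv_text)):
        female, male, total = int(row["Female"]), int(row["Male"]), int(row["Total"])
        processed.append({
            "Discipline": row["Discipline"].strip(),
            "Female": female,
            "Male": male,
            "Total": total,
            # Guard against division by zero for disciplines with no entries.
            "FemaleShare": round(female / total, 3) if total else 0.0,
        })
    return processed


# Placeholder rows for illustration only.
raw = "Discipline,Female,Male,Total\nSport A,3,1,4\nSport B,0,5,5\n"
for r in process_entries_gender(raw):
    print(r["Discipline"], r["FemaleShare"])
```

In Spark the same idea would be a column cast followed by a derived column; the typed output is what the Synapse views would then query.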

Next Steps

  • Create CI/CD pipelines to package the source code as a wheel and deploy it to the Databricks cluster.
  • Create pipelines to deploy the Azure resources using Infrastructure as Code (IaC).
  • Optimize the ADF pipelines with parameterization and parallelism, and add schedules to extract data at regular intervals.
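To illustrate the parameterization idea: instead of one hard-coded copy activity per file, a single generic copy pipeline could take the file name as a parameter and be driven by a list of items. The sketch below only builds such an items list; the raw-data prefix, the .csv extension and the fileName/sinkPath field names are hypothetical. In ADF, a ForEach activity iterating over a list like this can run its iterations in parallel when isSequential is set to false, with batchCount limiting concurrency.

```python
import json

# The five datasets listed in this README.
DATASETS = ["Athletes", "Coaches", "EntriesGender", "Medals", "Teams"]


def build_copy_items(datasets, sink_prefix="raw-data"):
    """Build one parameter set per file for a hypothetical parameterized copy pipeline."""
    return [
        {"fileName": f"{name}.csv", "sinkPath": f"{sink_prefix}/{name}.csv"}
        for name in datasets
    ]


items = build_copy_items(DATASETS)
print(json.dumps(items, indent=2))
```

Adding a sixth file then means extending one list rather than cloning a pipeline.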
