andrewwkimm/RAWG-Data-Pipeline

RAWG-Data-Pipeline

A data pipeline I am working on, built with RAWG's Database API, to explore video game trends through Streamlit.

Architecture Diagram

pipeline

The pipeline is orchestrated by Airflow in the following steps:

  1. The data is extracted from RAWG's API and parsed
  2. The data is loaded into Google Cloud Storage through a programmatic connection
  3. The data is modeled with dbt for use in BigQuery
  4. Streamlit connects to BigQuery and visualizes the data
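
As a rough illustration of step 1, the sketch below builds a request URL for RAWG's games endpoint and flattens one page of the response into rows. It is not the repository's code: the helper names, the selected fields (`id`, `name`, `released`, `rating`) and the page size are assumptions made for illustration. The network call is kept in its own function so the parsing can be tried on a hand-made payload.

```python
import json
import urllib.parse
import urllib.request

# Base URL of RAWG's games listing endpoint.
RAWG_GAMES_URL = "https://api.rawg.io/api/games"

# Fields kept from each game record (an assumed selection, not the repo's).
FIELDS = ("id", "name", "released", "rating")


def build_url(api_key, page=1, page_size=40):
    """Return the URL for one page of games (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"key": api_key, "page": page, "page_size": page_size}
    )
    return f"{RAWG_GAMES_URL}?{query}"


def parse_games(payload):
    """Flatten the 'results' list of a response into a list of plain dicts."""
    return [
        {field: game.get(field) for field in FIELDS}
        for game in payload.get("results", [])
    ]


def fetch_page(api_key, page=1):
    """Download and parse one page; this performs a real HTTP request."""
    with urllib.request.urlopen(build_url(api_key, page)) as response:
        return parse_games(json.load(response))
```

Calling `parse_games` on a payload whose records lack some of the selected fields keeps the row and fills the gaps with `None`, which may be easier to load downstream than rows of uneven shape.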

Terraform is used to manage the Google Cloud Platform infrastructure, while Docker containerizes the orchestration done by Airflow.

CircleCI is used for the CI/CD pipeline, ensuring that new code does not break the build and that changes are reflected in Streamlit.

An example of how the dashboard will look:

dashboard

Requirements

Poetry is used for dependency management, and the required dependencies are outlined in the pyproject.toml and poetry.lock files. Refer to Poetry's documentation for the installation guide.

Getting started

Install the dependencies with Poetry from the root of the repository. The project is already defined in pyproject.toml, so running poetry init is not needed.

$ poetry install

Set the environment variable for the API key in the terminal. You will need to register on RAWG's site to get a key. Do the same for the service account key after creating an account with GCP.

$ export API_KEY="enter-your-key-here"
$ export SERVICE_KEY="gcp-service-key.json"
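
On the Python side, these variables can be read back through `os.environ`. The sketch below is an assumption about how a script might do this, not code taken from the repository; `load_settings` is a hypothetical helper that fails early with a readable message when a variable is missing.

```python
import os

# Names of the environment variables exported above.
REQUIRED_VARS = ("API_KEY", "SERVICE_KEY")


def load_settings(environ=None):
    """Return the required variables as a dict, or raise if any is unset."""
    # Accept a mapping for testing; default to the real process environment.
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Checking both variables up front means a missing key surfaces before any request or upload is attempted.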

Build out the infrastructure for Google Cloud Platform using Terraform. See Terraform's installation guide for how to install it on your machine (I am using Debian).

$ terraform init
$ terraform apply

When you are done using the project, remember to destroy the infrastructure to avoid potential charges.

$ terraform destroy

Orchestrate the tasks for the pipeline via Airflow, which also provides a web UI to interact with its services. The webserver and the scheduler are both long-running processes, so run them in separate terminals.

$ poetry run airflow webserver
$ poetry run airflow scheduler

Launch the web app dashboard with Streamlit. It reads from the data models in BigQuery and serves the dashboard on your local machine.

$ poetry run streamlit run streamlit/app.py
