
A Python ELT pipeline for extracting music metadata via the Spotify API and loading it into BigQuery tables.


Rishav273/spotify-python-bigquery-ELT


spotify-python-bigquery-ELT-project

An ELT project that collects music metadata via the Spotify API, loads it into BigQuery tables, and visualizes it with Data Studio.

The extract and load tasks are handled by the "extract_and_load_data.py" file, which uses the BigQuery client library and the pandas_gbq package. Data is extracted with Python's spotipy API client and loaded into a partitioned BigQuery table. Logging is applied throughout to monitor each step of the pipeline.
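The extract-and-load flow described above could be sketched roughly as follows. This is a minimal illustration, not the contents of "extract_and_load_data.py": the function names, the row column names, and the `load_date` partition column are all assumptions made for the example. The field names read from the track object (`id`, `name`, `artists`, `album`, `popularity`, `duration_ms`) follow the Spotify track object as returned by spotipy. The pandas and pandas_gbq imports are deferred so that the flattening step can be run without credentials.

```python
import logging
from datetime import date

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("extract_and_load")  # hypothetical logger name


def flatten_track(track: dict, load_date: date) -> dict:
    """Flatten one Spotify track object into a single flat row.

    The output column names are illustrative, not the project's schema.
    """
    return {
        "track_id": track["id"],
        "track_name": track["name"],
        "artist_names": ", ".join(a["name"] for a in track.get("artists", [])),
        "album_name": track.get("album", {}).get("name"),
        "popularity": track.get("popularity"),
        "duration_ms": track.get("duration_ms"),
        # Assumed partition column; its type must match the destination table.
        "load_date": load_date,
    }


def build_rows(tracks: list, load_date: date) -> list:
    """Flatten a batch of track objects and log how many were processed."""
    rows = [flatten_track(t, load_date) for t in tracks]
    logger.info("Flattened %d tracks for load date %s", len(rows), load_date)
    return rows


def load_rows(rows: list, destination_table: str, project_id: str) -> None:
    """Append rows to a BigQuery table via pandas_gbq (needs credentials)."""
    # Imported here so the functions above run without these packages.
    import pandas as pd
    import pandas_gbq

    frame = pd.DataFrame(rows)
    pandas_gbq.to_gbq(
        frame, destination_table, project_id=project_id, if_exists="append"
    )
    logger.info("Loaded %d rows into %s", len(frame), destination_table)
```

In a real run, the `tracks` list would come from a spotipy client call, and `load_rows` would be given a "dataset.table" name and a project ID taken from the configuration file.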

All transformation steps are done using SQL in BigQuery. The resulting data is stored in views and used to build dynamic dashboards in Google Data Studio. A YAML configuration file documents the pipeline parameters and allows them to be changed as needed.
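As a rough illustration of how configuration parameters and SQL views can fit together, the sketch below builds a `CREATE OR REPLACE VIEW` statement from a config dictionary. The dictionary stands in for the parsed YAML file; its keys, the view name, and the column names (`artist_names`, `popularity`) are hypothetical and are not taken from the project's actual configuration or queries.

```python
def build_view_sql(config: dict) -> str:
    """Build a BigQuery view definition from configuration values.

    All keys and column names here are assumptions for illustration.
    """
    prefix = f"{config['project_id']}.{config['dataset']}"
    source = f"{prefix}.{config['source_table']}"
    view = f"{prefix}.{config['view_name']}"
    return (
        f"CREATE OR REPLACE VIEW `{view}` AS\n"
        "SELECT\n"
        "  artist_names,\n"
        "  COUNT(*) AS track_count,\n"
        "  AVG(popularity) AS avg_popularity\n"
        f"FROM `{source}`\n"
        "GROUP BY artist_names"
    )


# Stand-in for values that would be read from the YAML configuration file.
example_config = {
    "project_id": "my-project",
    "dataset": "spotify",
    "source_table": "tracks_raw",
    "view_name": "artist_popularity",
}

sql = build_view_sql(example_config)
print(sql)
```

The resulting statement could then be submitted with the BigQuery client library, for example `google.cloud.bigquery.Client().query(sql).result()`, and the view used as a Data Studio data source.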

Some screenshots have been attached below:

image

image

Future improvements:

  • Collecting more metadata on the genres of the tracks. Currently the Spotify API doesn't expose any endpoint relating to track genre.
  • Trying out tools built for large-scale data processing, such as Apache Beam, as an alternative to pandas DataFrames, for ingesting higher volumes of data per batch.

