spotify-python-bigquery-ELT-project

An ELT project in which music metadata is collected via the Spotify API, loaded into BigQuery tables, and visualized in Data Studio.

The Extract and Load tasks were done in the "extract_and_load_data.py" file, using the BigQuery client library and the pandas_gbq package. Data was extracted with spotipy, a Python client for the Spotify API, and loaded into a partitioned BigQuery table. Logging was also applied so that every step of the pipeline can be monitored.
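The extract-and-load flow can be sketched roughly as follows. This is a minimal illustration and not the contents of "extract_and_load_data.py": the function names, the chosen columns, the playlist source, and the `extracted_on` partition column are all assumptions made for the example. It also assumes spotipy credentials are supplied through the `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` environment variables and that the partitioned destination table already exists.

```python
from __future__ import annotations

import logging
from datetime import date

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("extract_and_load")  # hypothetical logger name


def flatten_track(track: dict, extracted_on: date) -> dict:
    """Reduce one Spotify track object to a flat row (columns are assumed)."""
    return {
        "track_id": track["id"],
        "track_name": track["name"],
        "artist_name": track["artists"][0]["name"],
        "album_name": track["album"]["name"],
        "popularity": track.get("popularity"),
        "duration_ms": track.get("duration_ms"),
        # Assumed partition column: the date the row was extracted.
        "extracted_on": extracted_on.isoformat(),
    }


def extract_playlist_rows(playlist_id: str, extracted_on: date) -> list[dict]:
    """Page through a playlist with spotipy and flatten every track."""
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())
    rows = []
    page = sp.playlist_items(playlist_id, limit=100)
    while page:
        for item in page["items"]:
            if item.get("track"):
                rows.append(flatten_track(item["track"], extracted_on))
        page = sp.next(page)  # returns None when there is no further page
    logger.info("Extracted %d rows from playlist %s", len(rows), playlist_id)
    return rows


def load_rows(rows: list[dict], destination_table: str, project_id: str) -> None:
    """Append the rows to an existing BigQuery table via pandas_gbq."""
    import pandas as pd
    import pandas_gbq

    df = pd.DataFrame(rows)
    df["extracted_on"] = pd.to_datetime(df["extracted_on"])
    pandas_gbq.to_gbq(
        df, destination_table, project_id=project_id, if_exists="append"
    )
    logger.info("Loaded %d rows into %s", len(df), destination_table)
```

Keeping the flattening step separate from the API and BigQuery calls means the row shape can be checked locally without credentials.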

All transformation steps were done using SQL in BigQuery. The resulting data was stored in views and used to build dynamic dashboards in Google Data Studio. A YAML configuration file was used to document parameters and change them as necessary.
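As an illustration of how a YAML configuration could drive the transformation step, the sketch below builds a view definition from config values and runs it with the BigQuery client. The config keys (`project_id`, `dataset`, `raw_table`, `view_name`), the column names, and the aggregation itself are hypothetical; the project's actual views and config layout are not shown in this README.

```python
def build_view_sql(cfg: dict) -> str:
    """Build a CREATE OR REPLACE VIEW statement from config values."""
    source = f"{cfg['project_id']}.{cfg['dataset']}.{cfg['raw_table']}"
    view = f"{cfg['project_id']}.{cfg['dataset']}.{cfg['view_name']}"
    return (
        f"CREATE OR REPLACE VIEW `{view}` AS\n"
        "SELECT artist_name,\n"
        "       COUNT(DISTINCT track_id) AS track_count,\n"
        "       AVG(popularity) AS avg_popularity\n"
        f"FROM `{source}`\n"
        "GROUP BY artist_name"
    )


def load_config(path: str) -> dict:
    """Read the YAML configuration file (requires the PyYAML package)."""
    import yaml

    with open(path) as fh:
        return yaml.safe_load(fh)


def create_view(cfg: dict) -> None:
    """Run the view definition in BigQuery and wait for it to finish."""
    from google.cloud import bigquery

    client = bigquery.Client(project=cfg["project_id"])
    client.query(build_view_sql(cfg)).result()
```

With a config file containing those four keys, `create_view(load_config("config.yaml"))` would (re)create the view; the file name here is also just an example.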

Some screenshots have been attached below:

Future improvements:

Collecting more metadata on track genres. Currently the Spotify API doesn't expose any endpoint relating to track genre.
Trying out big-data-resilient tools such as Apache Beam as an alternative to pandas DataFrames, for ingesting higher volumes of data per batch.

Docs Referenced:

Spotipy Python docs - https://spotipy.readthedocs.io/en/master/
pandas_gbq - https://pandas-gbq.readthedocs.io/en/latest/
Google BigQuery docs - https://cloud.google.com/bigquery/docs/reference/libraries-overview
Data Studio - https://developers.google.com/datastudio

