Spotify Data Analysis and Music Recommendation System 🎵📊

This project combines a comprehensive analysis of Spotify data, carried out with Python, SQL, and Tableau, with a machine learning-based music recommendation system. The data is analyzed with SQL and Python, visualized through Tableau dashboards, and extended with a music recommendation model.

Overview

This project provides detailed insights into Spotify data through interactive visualizations created using Tableau and analytical queries executed with SQL. By analyzing various metrics and trends, users can gain valuable insights into the musical landscape, track popularity, and streaming behavior.

Project Details

Utilized the Onyx Data DataDNA Dataset Challenge - Spotify Most Streamed Songs 2023 Dataset - October 2023 to meticulously analyze and visualize Spotify's musical landscape, providing a deep dive into the most streamed songs on the platform in 2023.
Employed a combination of descriptive statistics, exploratory data analysis techniques, and SQL queries to extract meaningful insights from the dataset, uncovering nuanced patterns and trends within Spotify's vast catalog of music.
Leveraged Tableau as the primary data visualization tool to create interactive dashboards that offer comprehensive insights into various aspects of Spotify's musical ecosystem, including track popularity, streaming behavior, and audio attributes.
Explored diverse dimensions of the dataset, such as artist presence, track inventory, stream metrics, and sonic landscape, to provide a holistic understanding of Spotify's music trends and user preferences.
Designed visually engaging dashboards with intuitive navigation features to present findings in a clear and accessible manner, enabling stakeholders to easily interpret complex data and make informed decisions.
Demonstrated proficiency in data analysis and visualization by creating compelling visualizations that not only showcase key metrics but also tell a coherent story about the evolving dynamics of the music industry.
Provided actionable insights into track popularity dynamics, streaming trends across different time periods and geographical regions, and the impact of audio attributes such as tempo and acoustic profiles on user engagement.
Integrated SQL queries to perform additional analysis, including data aggregation, filtering, and joining, augmenting the insights gained from Tableau visualizations with deeper data exploration and manipulation capabilities.
Developed a music recommendation system using machine learning techniques to suggest songs based on user input.
Contributed to advancing knowledge in the field of music analytics by applying rigorous analytical techniques to a real-world dataset, thereby enabling stakeholders in the music industry to gain valuable insights and make strategic decisions based on data-driven evidence.

Tableau Data Vizzes

| Analytical Dashboard | Description |
| --- | --- |
| Holistic Insights | The Holistic Insights dashboard provides a comprehensive overview of musical trends and metrics, including KPIs such as artistic presence, track inventory, and stream metrics. |
| Sonic Overview | The Sonic Overview dashboard offers detailed analysis of the sonic landscape, including minor and major stream modality distribution, cross-platform song rankings, and scatter plot charts. |
| Audio Analytics Showcase | The Audio Analytics Showcase dashboard features bar charts illustrating key stream dominance, BPM elite selection, and acoustic profile selection, along with temporal trends of added tracks. |
| Stream Metrics | The Stream Metrics dashboard provides in-depth analysis of streaming metrics, including total streams, average BPM analysis, peak streamed tracks, leading streamed artists, and playlist curated track stream analysis. |

SQL Analysis

Data Exploration

Conducted exploratory data analysis using SQL queries to gain deeper insights into the Spotify Dataset.
Analyzed track popularity dynamics, streaming trends, and audio attributes through SQL-based data manipulation and aggregation techniques.
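As an illustration of the aggregation-style queries described above, the following sketch runs a SQL query through Python's built-in `sqlite3` module. The table name `tracks`, its columns, and the sample rows are invented for this example and are not taken from the project's actual schema.

```python
import sqlite3

# In-memory database with a hypothetical `tracks` table (names and rows are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracks (track_name TEXT, artist_name TEXT, streams INTEGER, bpm INTEGER)"
)
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?, ?, ?)",
    [
        ("Song A", "Artist 1", 500, 120),
        ("Song B", "Artist 1", 300, 100),
        ("Song C", "Artist 2", 900, 140),
    ],
)

# Aggregate per artist: number of tracks, total streams, and average BPM.
query = """
    SELECT artist_name,
           COUNT(*)     AS track_count,
           SUM(streams) AS total_streams,
           AVG(bpm)     AS avg_bpm
    FROM tracks
    GROUP BY artist_name
    ORDER BY total_streams DESC
"""
artist_stats = conn.execute(query).fetchall()
for row in artist_stats:
    print(row)
conn.close()
```

The same `GROUP BY` pattern extends to filtering (`WHERE`) and joining against other tables once the real schema is known.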

Schema Diagram

The schema diagram provides a visual representation of the database structure, illustrating the relationships between different entities and attributes within the Spotify dataset.

ML and Python Analysis

Spotify Music Recommendation System

This project presents a Music Recommendation System built using the Spotify dataset. The system leverages advanced data analysis and machine-learning techniques to recommend songs based on user preferences.

Features

Data Extraction: Uses Spotipy to fetch song data from the Spotify Web API.
Exploratory Data Analysis (EDA): Identifies key features and patterns in the Spotify dataset.
Feature Engineering: Selects relevant features to build an accurate recommendation model.
Recommendation System: Recommends songs based on user-input songs using cosine similarity.

EDA and Data Visualization

This project involves in-depth exploration and visualization of Spotify's dataset, using Python and statistical methods to derive insights.

Objective: Utilize Pandas for data manipulation, NumPy for computations, Matplotlib for detailed visualization, and Seaborn for aesthetic enhancements to uncover insights from Spotify's music catalog.
Key Analyses:
- Popularity Analysis: Identify top and least popular songs, examining user preferences.
- Correlation Studies: Use heatmaps to explore relationships between audio features like loudness, energy, and acousticness.
- Regression Analysis: Investigate correlations among specific attributes to understand their impact on song popularity.
- Temporal Trends: Visualize song distribution since 1992, analyze changes in song duration over time, and track duration variations across different genres.
- Genre Dynamics: Highlight top genres by popularity, offering insights into global music consumption trends and evolutionary patterns in genre preferences.
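The correlation and temporal analyses listed above could be sketched in Pandas roughly as follows. The column names (`year`, `loudness`, `energy`, `acousticness`, `duration_ms`) and the six sample rows are assumptions made for illustration; the real dataset has many more rows and may name its columns differently.

```python
import pandas as pd

# Tiny made-up sample standing in for the real dataset.
songs = pd.DataFrame({
    "year": [1992, 1992, 2005, 2005, 2020, 2020],
    "loudness": [-12.0, -10.0, -8.0, -7.0, -5.0, -4.0],
    "energy": [0.40, 0.50, 0.60, 0.70, 0.80, 0.90],
    "acousticness": [0.80, 0.70, 0.50, 0.40, 0.20, 0.10],
    "duration_ms": [260000, 240000, 230000, 210000, 190000, 170000],
})

# Pairwise Pearson correlations between audio features (the input to a heatmap).
audio_features = ["loudness", "energy", "acousticness"]
corr = songs[audio_features].corr()

# Average duration per release year, converted from milliseconds to minutes.
duration_by_year = songs.groupby("year")["duration_ms"].mean() / 60000

print(corr.round(2))
print(duration_by_year.round(2))
```

Passing `corr` to `seaborn.heatmap` would render it as a heatmap, and `duration_by_year.plot()` would draw the duration trend with Matplotlib.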

Key Insights

- Top 10 Most Popular Songs: Visualizes the top 10 most popular songs on Spotify based on their popularity score.
- Top Genres by Popularity: Highlights the most popular genres on Spotify, giving an overview of listener preferences based on genre popularity metrics.
- Correlation Heatmap: Illustrates the correlation between different audio features and song popularity, aiding in identifying relationships.
- Loudness vs. Energy: Examines how loudness and energy relate to each other in songs, highlighting their influence on musical characteristics.
- Popularity vs. Acousticness: Analyzes how a song's popularity correlates with its acousticness, providing insights into listener preferences.
- Total Songs Since 1992: Visualizes the growth in the number of songs added to Spotify since 1992, showing trends in music production and streaming.
- Change in Duration: Shows how the average duration of songs has changed over time, reflecting shifts in music consumption and production trends.
- Duration by Genre: Compares the duration of songs across different genres, offering insights into genre-specific trends and listener preferences.
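The popularity and genre summaries behind charts like these could be computed along the following lines. This is a minimal sketch: the column names (`name`, `genre`, `popularity`, `duration_ms`) and the sample rows are hypothetical.

```python
import pandas as pd

# Made-up sample rows; the real analysis runs on the full dataset.
songs = pd.DataFrame({
    "name": ["Track A", "Track B", "Track C", "Track D", "Track E"],
    "genre": ["rock", "pop", "rock", "jazz", "pop"],
    "popularity": [72, 91, 65, 40, 85],
    "duration_ms": [210000, 180000, 240000, 300000, 200000],
})

# Top N songs by popularity score (N = 3 here; the chart above shows 10).
top_songs = songs.nlargest(3, "popularity")[["name", "popularity"]]

# Mean popularity and mean duration (in minutes) per genre, most popular first.
genre_summary = (
    songs.groupby("genre")
    .agg(
        mean_popularity=("popularity", "mean"),
        mean_duration_min=("duration_ms", lambda ms: ms.mean() / 60000),
    )
    .sort_values("mean_popularity", ascending=False)
)

print(top_songs)
print(genre_summary)
```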

Steps to Build the Recommendation System

Data Extraction:

Fetch detailed song information using Spotify's Web API. This involves accessing attributes such as track name, artist, duration, and audio features.

Example:

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=os.environ["SPOTIFY_CLIENT_ID"],
                                                           client_secret=os.environ["SPOTIFY_CLIENT_SECRET"]))
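With a client like `sp` in hand, fetching one song's attributes might look like the sketch below. The helper name `fetch_song` and the choice of returned fields are assumptions for illustration, not the repository's actual code. It relies on Spotipy's `search` and `audio_features` methods; access to the audio features endpoint may depend on your Spotify app's permissions.

```python
def fetch_song(sp, name, year):
    """Look up one track by name and year and return a flat dict of its attributes.

    `sp` is an authenticated spotipy.Spotify client, such as the one created above.
    Returns None when the search has no match.
    """
    results = sp.search(q=f"track:{name} year:{year}", type="track", limit=1)
    items = results["tracks"]["items"]
    if not items:
        return None

    track = items[0]
    # audio_features returns a list with one entry per requested track ID;
    # an entry can be None when Spotify has no analysis for the track.
    features = sp.audio_features([track["id"]])[0] or {}

    song = {
        "name": track["name"],
        "artist": track["artists"][0]["name"],
        "year": year,
        "duration_ms": track["duration_ms"],
        "popularity": track["popularity"],
    }
    # Copy over the numeric audio attributes that are present in the response.
    for key in ("danceability", "energy", "loudness", "tempo", "valence"):
        if key in features:
            song[key] = features[key]
    return song
```

Calling `fetch_song(sp, "Lithium", 1992)` would then return a single dictionary that can be appended to a list and turned into a DataFrame.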

Feature Engineering:
- Select essential features for recommendation, including danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo, duration, and popularity.
- These features are crucial for understanding the musical characteristics that drive user preferences and recommendations.
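Because these features live on very different scales (tempo in beats per minute, loudness in decibels, duration in milliseconds), it usually helps to standardize them before comparing songs. The sketch below is one way to do that with NumPy; the helper name `build_feature_matrix` and the assumption that duration is stored in a `duration_ms` column are illustrative, not taken from the repository.

```python
import numpy as np
import pandas as pd

# Feature columns named in the list above ("duration" is assumed to be stored as duration_ms).
FEATURE_COLUMNS = [
    "danceability", "energy", "key", "loudness", "mode", "speechiness",
    "acousticness", "instrumentalness", "liveness", "valence", "tempo",
    "duration_ms", "popularity",
]

def build_feature_matrix(df, columns=FEATURE_COLUMNS):
    """Return a z-scored NumPy matrix with one row per song and one column per feature."""
    values = df[columns].to_numpy(dtype=float)
    mean = values.mean(axis=0)
    std = values.std(axis=0)
    std[std == 0] = 1.0  # constant columns would otherwise divide by zero
    return (values - mean) / std
```

After this step every column has mean 0 and, unless it is constant, standard deviation 1, so no single feature dominates a distance or similarity computation.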

Model Training and Recommendation:

Use a similarity measure such as cosine similarity to compare the feature vectors of the user-selected songs with the rest of the dataset and recommend the closest tracks.

Example:

user_selected_songs = [
    {'name': 'Come As You Are', 'year': 1991},
    {'name': 'Smells Like Teen Spirit', 'year': 1991},
    {'name': 'Lithium', 'year': 1992}
]

recommended_songs = recommend_songs(user_selected_songs, all_songs_df)
print("Recommended Songs:")
for song in recommended_songs:
    print(f"- {song[0]} ({song[1]})")
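The repository contains the full implementation of `recommend_songs`. As a rough idea of how a cosine-similarity recommender with this call signature could work, here is a minimal sketch. It assumes that `all_songs_df` has `name` and `year` columns and that every other column is a numeric audio feature; the sample songs and values are made up.

```python
import numpy as np
import pandas as pd

def recommend_songs(selected_songs, songs_df, n_recommendations=2):
    """Return (name, year) tuples for the songs most similar to the selection.

    Illustrative only: assumes songs_df has 'name' and 'year' columns and that
    every other column is a numeric audio feature.
    """
    feature_cols = [c for c in songs_df.columns if c not in ("name", "year")]
    features = songs_df[feature_cols].to_numpy(dtype=float)

    # Boolean mask marking the rows that match the user's selected songs.
    wanted = {(s["name"], s["year"]) for s in selected_songs}
    is_selected = np.array(
        [(name, year) in wanted for name, year in zip(songs_df["name"], songs_df["year"])]
    )
    if not is_selected.any():
        return []

    # Represent the user's taste as the mean feature vector of the selection.
    profile = features[is_selected].mean(axis=0)

    # Cosine similarity between the profile and every song in the catalog.
    norms = np.linalg.norm(features, axis=1) * np.linalg.norm(profile)
    similarity = features @ profile / np.where(norms == 0, 1.0, norms)

    # Rank by similarity, skipping the songs the user already picked.
    order = np.argsort(-similarity)
    top = [i for i in order if not is_selected[i]][:n_recommendations]
    return [(songs_df["name"].iloc[i], int(songs_df["year"].iloc[i])) for i in top]

all_songs_df = pd.DataFrame({
    "name": ["Come As You Are", "Lithium", "Song X", "Song Y", "Song Z"],
    "year": [1991, 1992, 1994, 2001, 2010],
    "energy": [0.8, 0.7, 0.75, 0.2, 0.9],
    "acousticness": [0.1, 0.2, 0.15, 0.9, 0.6],
})
picks = [{"name": "Come As You Are", "year": 1991}, {"name": "Lithium", "year": 1992}]
print(recommend_songs(picks, all_songs_df))
```

Averaging the selected songs into one profile vector keeps the sketch short. In practice the features would typically be standardized first so that large-valued columns do not dominate the similarity.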

Model Evaluation:
- Assess the recommendation system's performance using metrics such as precision, recall, and F1-score.
- These metrics gauge how accurately the system predicts user preferences and similarity between songs.
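For a single user, these metrics can be computed by comparing the recommended songs against songs the user is known to like. The sketch below assumes such a ground-truth list is available (for example, held-out listening history); the function name and the sample titles are illustrative. In the example, 2 of the 4 recommendations are relevant and 2 of the 3 relevant songs are found.

```python
def precision_recall_f1(recommended, relevant):
    """Compute precision, recall, and F1 for one list of recommendations.

    `recommended` is what the system returned; `relevant` is the set of songs
    the user is known to like.
    """
    recommended = set(recommended)
    relevant = set(relevant)
    hits = len(recommended & relevant)

    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if hits else 0.0
    return precision, recall, f1

p, r, f1 = precision_recall_f1(
    recommended=["Song X", "Song Z", "Song Q", "Song R"],
    relevant=["Song X", "Song Z", "Song Y"],
)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```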
Deployment:
- Deploy the recommendation system as a web application using frameworks like Flask or Django.
- This step involves integrating the recommendation model into a user-friendly interface accessible via web browsers or mobile apps.
- Example:
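```
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/recommend', methods=['POST'])
def recommend():
    # Recommendation logic here: the JSON body is a list of {'name', 'year'} songs
    return jsonify(recommend_songs(request.get_json(), all_songs_df))
```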

Installation

To set up and run the Spotify data analysis and recommendation system on your local machine, follow these steps:

Clone the repository:

git clone https://github.com/yourusername/spotify-data-analysis-and-recommendation.git
cd spotify-data-analysis-and-recommendation

Install the required libraries:
```
pip install -r requirements.txt
```
Set up Spotify API credentials:
- Create an app on the Spotify for Developers dashboard.
- Save your Client ID and Client Secret.
- Set the environment variables:
```
export SPOTIFY_CLIENT_ID='your_client_id'
export SPOTIFY_CLIENT_SECRET='your_client_secret'
```
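Before creating the Spotipy client, it can be useful to check that both variables are actually set, so that a missing credential fails with a clear message rather than a `KeyError`. The helper below is a hypothetical convenience, not part of the repository.

```python
import os

def load_spotify_credentials(env=os.environ):
    """Return (client_id, client_secret) or raise a clear error if either is unset."""
    missing = [
        name for name in ("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET")
        if not env.get(name)
    ]
    if missing:
        raise RuntimeError(f"Missing environment variable(s): {', '.join(missing)}")
    return env["SPOTIFY_CLIENT_ID"], env["SPOTIFY_CLIENT_SECRET"]
```

The returned pair can be passed as `client_id` and `client_secret` to `SpotifyClientCredentials`.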

Usage

To use the recommendation system, follow these steps:

Import necessary libraries:

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd
import numpy as np
from collections import defaultdict
from sklearn.metrics import euclidean_distances
from scipy.spatial.distance import cdist
import difflib
import os

Authenticate and initialize Spotipy:

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=os.environ["SPOTIFY_CLIENT_ID"],
                                                           client_secret=os.environ["SPOTIFY_CLIENT_SECRET"]))

Define functions to fetch song data and calculate recommendations (full code provided in the repository).
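One helper such code typically needs is a lookup that finds a user-entered song in the local dataset even when the title is slightly misspelled. Since `difflib` is among the imports above, the sketch below uses `difflib.get_close_matches` for that. The function name `find_song`, the `name` and `year` columns, and the cutoff value are assumptions for illustration.

```python
import difflib
import pandas as pd

def find_song(name, year, songs_df, cutoff=0.6):
    """Return the row whose title best matches `name` for the given year, or None."""
    candidates = songs_df[songs_df["year"] == year]
    titles = [title.lower() for title in candidates["name"]]
    # get_close_matches ranks titles by similarity ratio and drops those below cutoff.
    matches = difflib.get_close_matches(name.lower(), titles, n=1, cutoff=cutoff)
    if not matches:
        return None
    return candidates.iloc[titles.index(matches[0])]

catalog = pd.DataFrame({
    "name": ["Come As You Are", "Smells Like Teen Spirit", "Lithium"],
    "year": [1991, 1991, 1992],
})
match = find_song("smells like teen spirt", 1991, catalog)
print(match["name"] if match is not None else "not found")
```

When no local match is found, the code could fall back to fetching the song from the Spotify Web API.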

Get song recommendations:

recommended_songs = recommend_songs([
    {'name': 'Come As You Are', 'year': 1991},
    {'name': 'Smells Like Teen Spirit', 'year': 1991},
    {'name': 'Lithium', 'year': 1992},
    {'name': 'All Apologies', 'year': 1993},
], all_songs_df)

Data Source

The dataset used in this project comes from the Onyx Data DataDNA Dataset Challenge (Spotify Most Streamed Songs 2023, October 2023). It covers the most streamed songs on Spotify during 2023.
Key Dataset Information:
- The dataset comprises 21 cleaned columns covering track metadata, streaming metrics, and audio attributes.
- With approximately 181K records, it supports detailed examination of track popularity dynamics, streaming trends, and user preferences.
Kaggle Spotify Datasets:
- Spotify Tracks and Artists
- Spotify Artists and Tracks Datasets

Analytical Approach

This project adopts a rigorous analytical approach combining descriptive and exploratory data analysis techniques, alongside SQL queries, to uncover intricate patterns and insights within the Spotify dataset. An array of visualization tools, including bar charts, scatter plots, and tables, are employed to present findings in a coherent and comprehensible manner.

Explore Further

For more visualizations and projects, visit my Tableau Public profile. Discover deeper insights into data analysis and visualization techniques through interactive dashboards and engaging storytelling.

Technologies Used

Data Visualization Libraries: Matplotlib, Seaborn

Machine Learning Libraries: scikit-learn, SciPy

Other Tools: Python, Pandas, NumPy, Spotipy, SQL, Tableau

Conclusion

In conclusion, the Spotify Data Insights project has provided a comprehensive exploration of Spotify's musical ecosystem through the combined use of Tableau for visualization and SQL for analysis. Throughout this project, we have:

Conducted detailed analysis and visualization of Spotify's most streamed songs in 2023, offering valuable insights into track popularity dynamics, streaming trends, and audio attributes.
Leveraged Tableau's interactive dashboards to present complex data in a clear and accessible manner, facilitating easy interpretation and decision-making for stakeholders in the music industry.
Employed SQL queries to perform additional data exploration and manipulation, enhancing the depth of analysis and uncovering nuanced patterns within the Spotify dataset.
Contributed to advancing knowledge in the field of music analytics by applying rigorous analytical techniques to a real-world dataset, enabling stakeholders to make data-driven decisions and strategic planning.
Demonstrated proficiency in data analysis, visualization, and storytelling, showcasing the evolving dynamics of the music industry and highlighting opportunities for innovation and growth.

Recommendations

As we move forward, here are some recommendations for further exploration and public awareness:

Continuously update and expand the dataset to capture evolving trends and dynamics within the music industry, enabling stakeholders to stay informed and adapt to changing consumer preferences.
Explore collaborative opportunities with other data-driven platforms and industries to leverage synergies and uncover new insights that drive innovation and growth.
Engage with the broader community through knowledge sharing and collaboration, fostering a culture of transparency and openness that encourages collective learning and development.
Advocate for data-driven decision-making and evidence-based strategies within the music industry, promoting a culture of innovation and experimentation that drives sustainable growth and success.

License

This project is licensed under the MIT License. See the LICENSE file for details.
