Collection of data science projects
Recommending Twitch live streamers with an emphasis on the role of the most recent interaction
- Implemented Factorizing Personalized Markov Chains and Personalized Ranking Metric Embedding in TensorFlow.
- Evaluated the models using Precision@K and mean reciprocal rank, and concluded that recent interactions and geographical distance are important for recommending the next streamer to watch.
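
The two evaluation metrics named above can be sketched in plain Python. This is a minimal illustration, not the project's evaluation code; the function names and the string item IDs are assumptions made for the example.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommended items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none appears."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[str]], relevants: Sequence[Set[str]]
) -> float:
    """Average reciprocal rank over all users (hypothetical input layout)."""
    scores = [reciprocal_rank(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0
```

For next-item recommendation there is usually a single relevant item per user (the streamer actually watched next), in which case the relevant set holds one element.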
Classifying songs using Logistic Regression and Spotify API and updating a YouTube playlist using YouTube Data API
- Labeled 1300+ songs into 3 categories and collected audio feature data using the Spotify API in Python
- Applied feature engineering and Principal Component Analysis to create a dataset of 114 features
- Achieved a weighted F1 score of 0.68 using a logistic regression model
- Created a workflow to add new songs to a SQLite database and update the YouTube playlist automatically using the YouTube Data API
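
For readers unfamiliar with the weighted F1 score reported above, the sketch below shows how it is commonly computed: a per-class F1, averaged with each class weighted by its share of the true labels. It is an illustration only; the function name is an assumption, and the project itself may have used a library implementation instead.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Support-weighted average of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    support = Counter(y_true)      # number of true samples per class
    predicted = Counter(y_pred)    # number of predictions per class
    total = len(y_true)

    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = predicted.get(label, 0)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n_true / total  # weight by class frequency
    return score
```

Weighting by support means a class with many songs contributes more to the final score than a rare one, which matters when the three categories are not equally sized.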
Predicting Metacritic user ratings with data available before movies are released such as critic reviews, critic ratings, and genre.
- Web scraping movie info, critic reviews, and user reviews from Metacritic using Python BeautifulSoup and storing them in a SQLite database.
- Extracting text sentiment features using nltk.sentiment.SentimentIntensityAnalyzer and keywords using KeyBERT.
- Analyzing scraped data using plotly.
- Predicting user rating with lightgbm.LGBMRegressor and hyperparameter tuning with Optuna.
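
The storage step of the scraping pipeline above could look roughly like the following, using Python's built-in sqlite3 module. The table name, columns, and helper names are hypothetical, chosen only for illustration; the project's actual schema is not described here.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical row layout: (movie_title, critic, score, review_text)
Review = Tuple[str, str, int, str]


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (assumed) critic_reviews table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS critic_reviews (
            movie_title TEXT NOT NULL,
            critic      TEXT NOT NULL,
            score       INTEGER,
            review_text TEXT,
            PRIMARY KEY (movie_title, critic)
        )
        """
    )


def save_reviews(conn: sqlite3.Connection, reviews: Iterable[Review]) -> int:
    """Insert scraped reviews, skipping duplicates; return rows actually added."""
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO critic_reviews "
            "(movie_title, critic, score, review_text) VALUES (?, ?, ?, ?)",
            reviews,
        )
    return conn.total_changes - before
```

Using a primary key together with `INSERT OR IGNORE` keeps the scraper safe to re-run: reviews that are already stored are skipped instead of duplicated.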
Exploratory data analysis project using Tableau to analyze San Diego County collisions since 2016
- Retrieving geo coordinates using Google Geocoding API
- Storing data in a SQLite database
- Creating interactive visuals with Tableau
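
As a rough sketch of the geocoding step listed above, the helpers below build a Google Geocoding API request URL and pull the coordinates out of a JSON response. The helper names are assumptions, the HTTP call itself is left out, and a real API key is required to use the URL.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address: str, api_key: str) -> str:
    """Return the request URL for one address (api_key is a placeholder)."""
    query = urlencode({"address": address, "key": api_key})
    return f"{GEOCODE_ENDPOINT}?{query}"


def extract_lat_lng(response_text: str) -> Optional[Tuple[float, float]]:
    """Return (lat, lng) of the first result, or None if nothing was found."""
    payload = json.loads(response_text)
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return float(location["lat"]), float(location["lng"])
```

Keeping URL construction and response parsing separate from the network call makes both easy to test offline, and the returned coordinates can then be written to the SQLite table that Tableau reads from.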