Stochastic1017/Stochastic1017

Welcome to my GitHub page!

MS in Statistics | BS in Mathematics and Statistics

Project Portfolio

Here are some of my notable projects:

  • Clustering Spotify Podcasts with NLP-Driven Insights

    • Scraped details of $\approx$ 284,481 episodes from 818 podcasts using a Selenium and Spotify API pipeline.
    • Preprocessed and tokenized podcast descriptions with NLTK, including lemmatization and stopword removal.
    • Developed metrics to quantify directional, overlap, and diversity similarity, and engineered a recommendation system.
    • Deployed a Dash app for podcast clustering and personalized recommendations.
  • Predicting Flight Delays and Cancellations: An Integrated Analysis of Airport Data and Weather Data

    • Automated scraping of 23 GB of airport data and 30 GB of weather data using Selenium.
    • Joined the datasets using reverse geocoding, Haversine-distance matching, and UTC-normalized timestamps.
    • Trained random forest models, achieving $\approx$ 25 min test RMSE for delays and $\approx$ 98% test accuracy for cancellations.
    • Developed scalable workflows on Google Cloud and deployed an interactive web app using Dash.
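
The Haversine-based alignment mentioned above can be illustrated with a minimal sketch. It assumes the goal is to pair each airport with its nearest weather station by great-circle distance; the function names `haversine_km` and `nearest_station` and the coordinates in the usage note are hypothetical, not taken from the project.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(airport, stations):
    """Return the key of the station closest to the airport.

    airport: (lat, lon) tuple; stations: dict mapping name -> (lat, lon).
    Hypothetical helper: a linear scan, adequate for small station lists.
    """
    return min(stations, key=lambda name: haversine_km(*airport, *stations[name]))
```

For example, `nearest_station((43.14, -89.34), {"A": (43.0, -89.4), "B": (41.9, -87.9)})` selects `"A"`. A linear scan is the simplest choice; a spatial index would likely be preferable at the data volumes described.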
  • Modeling and Forecasting Walmart Stock Prices: A Comparative Analysis of ARMA and GARCH Approaches

    • Fitted ARIMA and GARCH models with the tseries and fGarch packages in R to analyze Walmart stock price volatility.
    • Developed and validated ensemble models through residual diagnostics and forecast evaluation.
    • Achieved an RMSE of $\approx$ 0.01 for log-returns and $\approx$ \$1.17 for closing prices when comparing a 10-day forecast against unseen actual prices.
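
The forecast evaluation above relies on two standard quantities, log-returns and RMSE. The project itself used R; the following is only a minimal Python sketch of those two definitions, with hypothetical helper names `log_returns` and `rmse`.

```python
import math


def log_returns(prices):
    """Log-return series r_t = log(P_t / P_{t-1}) for a list of positive prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

RMSE is expressed in the units of the series it is computed on, so the value for closing prices is in dollars while the value for log-returns is unitless.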
  • Statistical Modeling and Deployment of Body Fat Percentage Prediction System

    • Implemented an anomaly detection and imputation strategy using a prior body fat estimation model.
    • Constructed a stepwise regression model using goodness-of-fit and Holm-Bonferroni-corrected F-tests to control Type I errors.
    • Developed a multiple linear regression model ($R^2$ = 0.6592, RMSE = 4.38) with residual diagnostics.
    • Deployed an interactive Dash app with comprehensive explanations, detailed visuals, and predictions.
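
The Holm-Bonferroni correction mentioned above is a step-down procedure over sorted p-values. The sketch below shows the generic procedure only; it is not the project's code, and the function name `holm_bonferroni` and the default `alpha` are assumptions.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Step-down rule: sort p-values ascending and compare the k-th smallest
    (k = 1..m) with alpha / (m - k + 1); stop at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):  # rank is 0-based, so k = rank + 1
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject
```

With `p_values = [0.01, 0.04, 0.03, 0.005]` and `alpha = 0.05`, the thresholds are 0.0125, 0.0167, and 0.025 in turn, so the two smallest p-values are rejected and the procedure stops at 0.03.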
  • Outlier Detection and De-noising for Audio-Based Neural Network Language Classification

    • Developed an ensemble outlier detector using Isolation Forest and Local Outlier Factor in Scikit-Learn.
    • Designed a speech detection and spectral gating algorithm with Librosa and NoiseReduce for noise suppression.
    • Processed 1320 parallel jobs for anomaly detection and de-noising using Linux Bash and HTCondor.
    • Trained a preliminary CNN on a random sample in TensorFlow, improving test accuracy by 2% and AUC by 4%.
  • Finding Lyman Break Galaxy cB58 Resemblances Using High-Throughput Computing

    • Identified noisy spectra matching Lyman Break Galaxy cB58 from Sloan Digital Sky Survey (SDSS) datasets.
    • Computed similarity between spectra using distance metrics implemented in R.
    • Processed 2459 parallel jobs over 281 GB of data using Linux Bash and HTCondor.
  • Supervised Machine Learning Model to Predict Gender Based on First Names

    • Performed feature pre-processing and one-hot alphabet encoding using NumPy and Pandas.
    • Implemented an ensemble gradient boosting model with fine-tuned hyper-parameters using Scikit-Learn.
    • Achieved approximately 80% accuracy, precision, recall, and AUC.
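
One plausible reading of the one-hot alphabet encoding above is a fixed-length, per-position encoding of each name's letters. The sketch below uses plain Python lists rather than NumPy or Pandas, and the function name `one_hot_name`, the `max_len` parameter, and the padding and truncation behavior are all assumptions made for illustration.

```python
import string

ALPHABET = string.ascii_lowercase  # 26 letters; other characters are ignored


def one_hot_name(name, max_len=10):
    """Encode a name as a flat 0/1 vector of length max_len * 26.

    Position i of the name sets one entry in the i-th block of 26 slots.
    Names shorter than max_len are zero-padded; longer names are truncated.
    """
    vec = [0] * (max_len * len(ALPHABET))
    letters = [ch for ch in name.lower() if ch in ALPHABET][:max_len]
    for pos, ch in enumerate(letters):
        vec[pos * len(ALPHABET) + ALPHABET.index(ch)] = 1
    return vec
```

For instance, `one_hot_name("Ab", max_len=3)` yields a 78-element vector with ones at index 0 (`a` in position 0) and index 27 (`b` in position 1). A fixed-length vector of this kind can be passed directly to a tree-based classifier.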

Technical Skills

Programming Languages

Developer Tools

Computing

Miscellaneous
