We analyse the Stack Overflow data using several data science and machine learning techniques.
We use dimensionality reduction, specifically principal component analysis (PCA), to discover the skills or technologies that explain most of the variance in the Stack Overflow data, i.e. the technology trends.
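As a rough illustration of the PCA step, the sketch below finds the first principal component of a tiny, made-up user-by-tag activity matrix using power iteration on the covariance matrix. It is written in plain Python for readability and is not the repository's R code; the function name `first_principal_component`, the tag names and the activity counts are all assumptions made up for this example.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, explained variance ratio) of the first principal component.

    `rows` is a list of observations (e.g. users), each a list of feature values
    (e.g. activity per tag). Hypothetical helper, for illustration only.
    """
    n, d = len(rows), len(rows[0])
    # Centre every column on its mean so the covariance is computed correctly.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiplying by the covariance matrix converges
    # to the eigenvector with the largest eigenvalue. This assumes the starting
    # vector is not orthogonal to that eigenvector.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue / total_variance


# Toy data (made up): rows are users, columns are activity counts per tag.
tags = ["python", "pandas", "java"]
activity = [[5, 4, 0], [4, 5, 1], [0, 1, 5], [1, 0, 4]]

direction, ratio = first_principal_component(activity)
for tag, loading in zip(tags, direction):
    print(f"{tag}: {loading:+.2f}")
print(f"explained variance ratio: {ratio:.2f}")
```

Tags whose loadings share a sign tend to vary together across users, while opposite signs indicate technologies that rarely co-occur. In R, the built-in `prcomp` function provides a full PCA.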
We then place those technology trends in a geographical context for Switzerland.
We employ statistical inference to build a new derived `ratings` dataset providing user skill ratings. Finally, we build a recommender system using collaborative filtering (CF) with a model-based low-rank matrix factorization (LRMF) approach to predict user skill ratings.
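To make the LRMF idea concrete, here is a minimal sketch that factorizes a small, made-up set of user skill ratings with stochastic gradient descent and L2 regularization. It is plain Python and independent of the repository's R and caret implementation; the names `factorize` and `predict`, the hyperparameter values and the ratings themselves are assumptions chosen for illustration.

```python
import random


def factorize(ratings, n_users, n_skills, rank=2, lr=0.01, reg=0.05,
              epochs=2000, seed=0):
    """Fit factor matrices P (users) and Q (skills) so that the dot product
    P[u] . Q[s] approximates every known rating (u, s, r)."""
    rng = random.Random(seed)
    # Small random initial factors; all-zero factors would never move.
    P = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_skills)]
    for _ in range(epochs):
        for u, s, r in ratings:
            err = r - sum(p * q for p, q in zip(P[u], Q[s]))
            for k in range(rank):
                p, q = P[u][k], Q[s][k]
                # Gradient step on the squared error with an L2 penalty.
                P[u][k] += lr * (err * q - reg * p)
                Q[s][k] += lr * (err * p - reg * q)
    return P, Q


def predict(P, Q, u, s):
    """Predicted rating of skill s for user u."""
    return sum(p * q for p, q in zip(P[u], Q[s]))


# Toy ratings (made up): (user index, skill index, rating on a 1-5 scale).
# Users 0 and 1 rate skills 0 and 1 highly; users 2 and 3 prefer skill 2.
known = [(0, 0, 5), (0, 1, 4), (0, 2, 1),
         (1, 0, 4), (1, 2, 1),
         (2, 0, 1), (2, 1, 2), (2, 2, 5),
         (3, 1, 1), (3, 2, 4)]

P, Q = factorize(known, n_users=4, n_skills=3)
# Predict the two ratings that were not observed.
print("user 1, skill 1:", round(predict(P, Q, 1, 1), 2))
print("user 3, skill 0:", round(predict(P, Q, 3, 0), 2))
```

The rank controls how many latent factors describe each user and skill. In practice it would be tuned together with the learning rate and regularization strength, for example by cross-validation.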
- The `create_dataset.r` script will automatically download, extract, parse and clean the Stack Overflow data files. It will also construct the new derived `ratings` dataset.
- The `skillability.r` script contains all the data science analysis code. A new LRMF model implementation is integrated with the popular `caret` machine learning library for calibration, training and prediction.
- The `skillability.pdf` file contains the final project report.