This repo contains two lesson tracks I made on key topics in unsupervised learning: Dimensionality Reduction and Clustering. Each track dives deep into the theory, mathematical foundations, and practical implementation using scikit-learn, numpy, pandas, and scipy.
In the Dimensionality Reduction track, we explore various matrix factorization-based methods for reducing the dimensionality of data. The lessons include:
- Theory and Mathematics: Understand the mathematical concepts behind dimensionality reduction techniques based on matrix factorization and neighborhood graphs.
- Principal Component Analysis (PCA): Learn how PCA, Sparse PCA, and Kernel PCA reduce dimensionality while retaining variance, and how to apply them using scikit-learn.
- Non-Negative Matrix Factorization (NMF): Dive into NMF, a powerful technique for extracting meaningful features from non-negative data.
- Practical Implementation: Hands-on tutorials using numpy, scipy, and scikit-learn to apply these techniques to real-world datasets.
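
As a rough preview of what the lessons above cover, here is a minimal sketch that applies PCA, Kernel PCA, and NMF with scikit-learn. It is not taken from the lessons themselves: the digits dataset, the 500-sample subset, the 10-component target, and the RBF `gamma` value are all assumptions chosen only to keep the example small and quick to run. Sparse PCA is left out to keep the sketch short.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import NMF, PCA, KernelPCA

# Assumption: the bundled digits dataset stands in for "real-world data".
# Its pixel intensities are non-negative, which NMF requires.
X, _ = load_digits(return_X_y=True)
X = X[:500]  # small subset (500 samples, 64 features) to keep the run fast

# Linear PCA: project onto the 10 directions of highest variance.
pca = PCA(n_components=10, random_state=0)
X_pca = pca.fit_transform(X)
explained = pca.explained_variance_ratio_.sum()  # fraction of variance retained

# Kernel PCA: the same idea in an implicit RBF feature space.
# gamma is an illustrative guess, not a tuned value.
kpca = KernelPCA(n_components=10, kernel="rbf", gamma=1e-3)
X_kpca = kpca.fit_transform(X)

# NMF: factorize X ~ W @ H with W >= 0 and H >= 0.
nmf = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)  # per-sample activations
H = nmf.components_       # non-negative basis "parts"

print(f"PCA retained variance: {explained:.2%}")
print("Shapes:", X_pca.shape, X_kpca.shape, W.shape, H.shape)
```

PCA reports how much variance the reduced representation keeps, while NMF instead yields a non-negative parts-based factorization, so the two are usually judged by different criteria.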
The Clustering track focuses on the theory and implementation of various clustering algorithms. The lessons cover:
- Clustering Theory: An overview of clustering concepts, including different clustering paradigms and how to evaluate clustering performance.
- k-Means Clustering: Learn the k-Means algorithm, including its assumptions, limitations, and practical usage.
- Hierarchical Clustering: Explore agglomerative and divisive hierarchical clustering methods.
- DBSCAN: Understand the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm for discovering clusters in spatial data.
- Other Clustering Algorithms: A look at other clustering methods available in scikit-learn, such as Mean-Shift and Spectral Clustering.
- Implementation in Python: Tutorials on implementing and applying these clustering algorithms using scikit-learn.
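
To give a flavor of the clustering lessons listed above, the following is a minimal sketch that fits k-Means, agglomerative clustering, and DBSCAN on the same synthetic data and scores each against the known labels. The blob centers, `eps`, `min_samples`, and the choice of the adjusted Rand index as the evaluation metric are assumptions made for illustration, not values taken from the lessons.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Assumption: three well-separated Gaussian blobs as toy data.
X, y_true = make_blobs(
    n_samples=300,
    centers=[[0, 0], [5, 5], [-5, 5]],
    cluster_std=0.5,
    random_state=0,
)

models = {
    # k-Means needs the number of clusters up front.
    "k-Means": KMeans(n_clusters=3, n_init=10, random_state=0),
    # Agglomerative (bottom-up hierarchical) clustering, Ward linkage by default.
    "Agglomerative": AgglomerativeClustering(n_clusters=3),
    # DBSCAN infers the number of clusters from density; label -1 marks noise.
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5),
}

labels = {name: model.fit_predict(X) for name, model in models.items()}

# Adjusted Rand index: 1.0 means perfect agreement with the true labels.
scores = {name: adjusted_rand_score(y_true, lab) for name, lab in labels.items()}

for name, score in scores.items():
    n_found = len(set(labels[name]) - {-1})
    print(f"{name}: {n_found} clusters, ARI = {score:.3f}")
```

On real data there are usually no ground-truth labels, in which case an internal metric such as `sklearn.metrics.silhouette_score` is a common substitute.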