\documentclass[12pt]{article}
%opening
\title{Coursera - Machine Learning}
\author{www.Stats-Lab.com}
\begin{document}
\maketitle
\section{K-means Clustering}
\begin{itemize}
\item In this exercise, you will implement the K-means algorithm and use it for image compression.
\item You will first start on an example 2D dataset that will help you gain an intuition of
how the K-means algorithm works. After that, you will use the K-means algorithm for image
compression by reducing the number of colors that occur in an image to only those that are most common in that image.
\item You will be using ex7.m for this part of the exercise.
\end{itemize}
\subsection{Implementing K-means}
\begin{itemize}
\item The K-means algorithm is a method to automatically cluster similar data examples together.
\item K-means is an iterative procedure: it starts with an initial guess for the centroids, then refines that guess by repeatedly assigning each example to its closest centroid and recomputing each centroid from the examples assigned to it.
\item The inner-loop of the algorithm repeatedly carries out two steps:
\begin{itemize}
\item[(i)] Assigning each training example to its closest centroid
\item[(ii)] Recomputing the mean of each centroid using the points assigned to it.
\end{itemize}
\item The K-means algorithm will always converge to some final set of means for the centroids.
Note that the converged solution may not always be ideal and depends on the initial setting of the centroids.
\item Therefore, in practice the K-means algorithm is usually run a few times with different random initializations.
\item One way to choose between these different solutions from different random initializations is to choose the one with the lowest cost function value (distortion).
\item \textbf{Random initialization:} The initial centroid assignments for the example dataset in \texttt{ex7.m} were designed so that you will see the same figure as in Figure 1.
\item In practice, a good strategy for initializing the centroids is to select random examples from the training set.
\end{itemize}
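The procedure above (the two-step inner loop, centroids initialized from random training examples, and several restarts keeping the lowest-distortion solution) can be sketched as follows. Note this is an illustrative Python sketch, not the course's Octave/MATLAB code in \texttt{ex7.m}; the function names \texttt{kmeans} and \texttt{best\_of\_restarts} are our own.

```python
import random

def closest_centroid(x, centroids):
    # Index of the centroid nearest to point x (squared Euclidean distance).
    return min(range(len(centroids)),
               key=lambda j: sum((xi - ci) ** 2
                                 for xi, ci in zip(x, centroids[j])))

def kmeans(X, k, iters=10, seed=0):
    # Initialize centroids by selecting k random training examples.
    rng = random.Random(seed)
    centroids = [list(x) for x in rng.sample(X, k)]
    for _ in range(iters):
        # (i) Assign each training example to its closest centroid.
        assign = [closest_centroid(x, centroids) for x in X]
        # (ii) Recompute each centroid as the mean of its assigned points.
        for j in range(k):
            members = [X[i] for i in range(len(X)) if assign[i] == j]
            if members:  # keep the old centroid if nothing was assigned to it
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    # Distortion: mean squared distance from each point to its centroid.
    assign = [closest_centroid(x, centroids) for x in X]
    cost = sum(sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[a]))
               for x, a in zip(X, assign)) / len(X)
    return centroids, assign, cost

def best_of_restarts(X, k, restarts=5):
    # Run K-means with several random initializations and keep the
    # solution with the lowest distortion.
    runs = [kmeans(X, k, seed=s) for s in range(restarts)]
    return min(runs, key=lambda r: r[2])
```

Because each restart may converge to a different local optimum, comparing runs by distortion is what lets the multiple-initialization strategy recover from a bad initial guess.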
\end{document}