---
title: "Unsupervised Learning"
author: "Jonathan Rosenblatt"
date: "April 12, 2015"
output: html_document
---
Some utility functions:
```{r utility}
library(magrittr) # for the %>% pipe used throughout
l2 <- function(x) x^2 %>% sum %>% sqrt
l1 <- function(x) abs(x) %>% sum
MSE <- function(x) x^2 %>% mean
# Matrix norms:
frobenius <- function(A) norm(A, type="F")
spectral <- function(A) norm(A, type="2")
```
__Note__: `foo::bar` means that function `bar` is part of the `foo` package.
With this syntax, there is no need to load (`library`) the package.
If a line does not run, you may need to install the package: `install.packages('foo')`.
Packages installed from sources other than CRAN (like GitHub or Bioconductor) include a commented installation line.
__Note__: RStudio currently does not autocomplete function arguments when using the `::` syntax.
# Learning Distributions
## Gaussian Density Estimation
```{r generate data}
# Generate a random covariance matrix:
p <- 10
Sigma <- bayesm::rwishart(nu = 100, V = diag(p))$W
lattice::levelplot(Sigma)
# Sample from a multivariate Gaussian:
n <- 1e3
means <- 1:p
X1 <- mvtnorm::rmvnorm(n = n, sigma = Sigma, mean = means)
dim(X1)
# Estimate parameters and compare to the truth:
estim.means <- colMeans(X1) # recall truth is (1,...,10)
plot(estim.means~means); abline(0,1, lty=2)
estim.cov <- cov(X1)
plot(estim.cov~Sigma); abline(0,1, lty=2)
estim.cov.errors <- Sigma - estim.cov
lattice::levelplot(estim.cov.errors)
lattice::levelplot(estim.cov.errors/Sigma) # percentage error
frobenius(estim.cov.errors)
# Now try the same while playing with n and p.
```
Other covariance estimators (robust, fast,...)
```{r covariances}
# Robust covariance
estim.cov.1 <- MASS::cov.rob(X1)$cov
estim.cov.errors.1 <- Sigma - estim.cov.1
lattice::levelplot(estim.cov.errors.1)
lattice::levelplot(estim.cov.errors.1/Sigma) # percentage error
frobenius(estim.cov.errors.1)
# Nearest neighbour cleaning of outliers
estim.cov.2 <- covRobust::cov.nnve(X1)$cov
estim.cov.errors.2 <- Sigma - estim.cov.2
lattice::levelplot(estim.cov.errors.2)
frobenius(estim.cov.errors.2)
# Minimum Covariance Determinant: a robust, high-breakdown estimator
estim.cov.3 <- robustbase::covMcd(X1)$cov
estim.cov.errors.3 <- Sigma - estim.cov.3
lattice::levelplot(estim.cov.errors.3)
frobenius(estim.cov.errors.3)
# Another robust covariance estimator
estim.cov.4 <- robustbase::covComed(X1)$cov
estim.cov.errors.4 <- Sigma - estim.cov.4
lattice::levelplot(estim.cov.errors.4)
frobenius(estim.cov.errors.4)
```
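Note that `covMcd` and `covComed` are robust rather than regularized estimators. For an actual shrinkage (regularized) estimator, here is a minimal sketch using the `corpcor` package; the package choice is an assumption on my part, and any shrinkage estimator would do:
```{r shrinkage covariance}
# install.packages('corpcor')
# Shrink the sample covariance towards a diagonal target (Schafer-Strimmer):
estim.cov.5 <- corpcor::cov.shrink(X1)
estim.cov.errors.5 <- Sigma - estim.cov.5
lattice::levelplot(estim.cov.errors.5)
frobenius(estim.cov.errors.5)
```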
## Non-parametric density estimation
To my knowledge, no implementation will even attempt dimensions higher than 6.
See [here](http://vita.had.co.nz/papers/density-estimation.pdf) for a review.
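In low dimensions, however, kernel density estimation is routine. A minimal sketch with base R's `density` (univariate) and `MASS::kde2d` (bivariate), applied to the Gaussian sample from above:
```{r kernel density}
# Univariate KDE, Gaussian kernel, automatic bandwidth:
dens.1 <- density(X1[,1])
plot(dens.1, main = "KDE of the first coordinate")
# Bivariate KDE, evaluated on a grid:
dens.2 <- MASS::kde2d(X1[,1], X1[,2])
image(dens.2); contour(dens.2, add = TRUE)
```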
## Graphical Models
[TODO]
See R's graphical modeling [task view](http://cran.r-project.org/web/views/gR.html).
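Until this section is written, here is a minimal sketch of one common approach: estimating a Gaussian graphical model by sparse precision (inverse covariance) estimation with the `glasso` package. The package choice and the regularization parameter `rho` are assumptions for illustration only:
```{r glasso}
# install.packages('glasso')
# The graphical lasso: an L1-penalized estimate of the precision matrix.
glasso.1 <- glasso::glasso(s = cov(X1), rho = 0.1)
# Zeros in the estimated precision matrix encode conditional independencies:
lattice::levelplot(glasso.1$wi)
```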
## Association rules
Note: Visualization examples are taken from the arulesViz [vignette](http://cran.r-project.org/web/packages/arulesViz/vignettes/arulesViz.pdf)
```{r association rules}
library(arules)
data("Groceries")
inspect(Groceries[1:2])
summary(Groceries)
rules <- arules::apriori(Groceries, parameter = list(support=0.001, confidence=0.5))
summary(rules)
rules %>% sort(by='lift') %>% head(2) %>% inspect
# For a rule {A => B} we denote:
# support: P(A AND B)
# confidence: P(B|A)
# lift: P(A AND B) / [P(A)P(B)]
# Select a subset of rules
rule.subset <- subset(rules, subset = rhs %pin% "yogurt")
inspect(rule.subset)
# Visualize rules:
library(arulesViz)
plot(rules)
subrules <- rules[quality(rules)$confidence > 0.8]
plot(subrules, method="matrix", measure="lift", control=list(reorder=TRUE))
plot(subrules, method="matrix", measure=c("lift", "confidence"), control=list(reorder=TRUE))
plot(subrules, method="grouped")
plot(rules, method="grouped", control=list(k=50))
subrules2 <- head(sort(rules, by="lift"), 10)
plot(subrules2, method="graph", control=list(type="items"))
plot(subrules2, method="graph")
# Export rules graph to use with other software:
# saveAsGraph(head(sort(rules, by="lift"),1000), file="rules.graphml")
rule.1 <- rules[1]
inspect(rule.1)
plot(rule.1, method="doubledecker", data = Groceries)
```
See also the `prim.box` function in the `prim` package for more algorithms that learn association rules.
# Dimensionality Reduction
## PCA
Note: the example blends [Gaston Sanchez](http://gastonsanchez.com/blog/how-to/2012/06/17/PCA-in-R.html) and the [University of Oregon's Geography dept.](http://geog.uoregon.edu/GeogR/topics/pca.html).
Get some data
```{r PCA data}
?USArrests
plot(USArrests) # basic plot
corrplot::corrplot(cor(USArrests), method = "ellipse") # slightly fancier
# As a correlation graph:
cor.1 <- cor(USArrests)
qgraph::qgraph(cor.1)
qgraph::qgraph(cor.1, layout = "spring", posCol = "darkgreen", negCol = "darkmagenta")
```
```{r prepare data}
USArrests.1 <- USArrests[,-3] %>% scale # note the scaling, which is required by some functions down the road...
```
```{r PCA}
pca1 <- prcomp(USArrests.1, scale. = TRUE) # The main workhorse.
pca1$rotation # loadings
# Now score the states:
USArrests.1[
pca1$x %>% extract(,1) %>% which.max
,] # Fewest arrests
USArrests.1[
pca1$x %>% extract(,1) %>% which.min
,] # Most arrests
pca1$x %>% extract(,1) %>% sort %>% head
pca1$x %>% extract(,1) %>% sort %>% tail
```
Interpretation:
- PC1 seems to capture the overall crime rate.
- PC2 seems to distinguish between sexual and non-sexual crimes.
- Florida is the most "arrestful" state; North Dakota is the least.
Projecting on first two PCs:
```{r visualizing PCA}
library(ggplot2) # for graphing
pcs <- as.data.frame(pca1$x)
ggplot(data = pcs, aes(x = PC1, y = PC2, label = rownames(pcs))) +
geom_hline(yintercept = 0, colour = "gray65") +
geom_vline(xintercept = 0, colour = "gray65") +
geom_text(colour = "red", alpha = 0.8, size = 6) +
ggtitle("PCA plot of USA States - Crime Rates")
```
The biplot:
```{r biplot}
biplot(pca1) #ugly!
# library(devtools)
# install_github("vqv/ggbiplot")
ggbiplot::ggbiplot(pca1, labels = rownames(USArrests.1)) # better!
```
The scree-plot:
```{r screeplot}
screeplot(pca1)
ggbiplot::ggscreeplot(pca1)
```
So clearly the main differentiation is along the first component, which captures the overall crime level in each state (and not a particular type of crime).
Visualize the scoring as a projection of the states' attributes onto the factors.
```{r visualize contributions to factors}
# get parameters of component lines (after Everitt & Rabe-Hesketh)
pc.load <- pca1$rotation
slope <- pc.load[2, ]/pc.load[1, ]
USArrests.2 <- USArrests[,1:2] %>% scale
mn <- apply(USArrests.2, 2, mean)
intcpt <- mn[2] - (slope * mn[1])
# scatter plot with the two new axes added
par(pty = "s") # square plotting frame
xlim <- range(USArrests.2) # overall min, max
plot(USArrests.2, xlim = xlim, ylim = xlim, pch = 16, col = "purple") # both axes same length
abline(intcpt[1], slope[1], lwd = 2) # first component solid line
abline(intcpt[2], slope[2], lwd = 2, lty = 2) # second component dashed
legend("right", legend = c("PC 1", "PC 2"), lty = c(1, 2), lwd = 2, cex = 1)
# projections of points onto PC 1
y1 <- intcpt[1] + slope[1] * USArrests.2[, 1]
x1 <- (USArrests.2[, 2] - intcpt[1])/slope[1]
y2 <- (y1 + USArrests.2[, 2])/2
x2 <- (x1 + USArrests.2[, 1])/2
segments(USArrests.2[, 1], USArrests.2[, 2], x2, y2, lwd = 2, col = "purple")
```
Visualize the loadings (ok... we are already doing factor analysis without noticing...)
```{r visualize PCA}
# install.packages('GPArotation')
pca.qgraph <- qgraph::qgraph.pca(USArrests.1, factors = 2, rotation = "varimax")
plot(pca.qgraph)
qgraph::qgraph(pca.qgraph, posCol = "darkgreen", layout = "spring", negCol = "darkmagenta",
edge.width = 2, arrows = FALSE)
```
More implementations of PCA:
```{r many PCA implementations, eval=FALSE}
# Fast solutions:
gmodels::fast.prcomp()
# More detail in output:
FactoMineR::PCA()
# For flexibility in algorithms and visualization:
ade4::dudi.pca()
# Another one...
amap::acp()
```
Principal tensor analysis:
[TODO]
```{r PTA, eval=FALSE}
PTAk::PTAk()
```
## sPCA
```{r sPCA}
# Compute a robust covariance matrix to use as the Gram input:
state.similarity <- MASS::cov.rob(USArrests.1)$cov
spca1 <- elasticnet::spca(state.similarity, K=2,type="Gram",sparse="penalty",trace=TRUE, para=c(0.06,0.16))
spca1$loadings
```
## kPCA
[TODO]
```{r kPCA, eval=FALSE}
kernlab::kpca()
```
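Meanwhile, a minimal working sketch with `kernlab`, assuming a Gaussian (RBF) kernel; the `sigma` value is an arbitrary choice for illustration:
```{r kPCA sketch}
library(kernlab)
kpca.1 <- kpca(USArrests.1, kernel = "rbfdot", kpar = list(sigma = 0.2), features = 2)
plot(rotated(kpca.1), xlab = "kPC 1", ylab = "kPC 2", type = 'n')
text(rotated(kpca.1), labels = rownames(USArrests.1), col = 'red')
```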
## Random Projections
[TODO]
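A Gaussian random projection needs only base R: multiply the data by a matrix of i.i.d. Gaussian entries, scaled so that squared norms are preserved in expectation (the Johnson-Lindenstrauss idea). A minimal sketch onto two dimensions:
```{r random projection sketch}
set.seed(1)
rp.k <- 2 # target dimension
R <- matrix(rnorm(ncol(USArrests.1) * rp.k), ncol = rp.k) / sqrt(rp.k)
rp.1 <- USArrests.1 %*% R
plot(rp.1, xlab = "RP 1", ylab = "RP 2", type = 'n')
text(rp.1, labels = rownames(USArrests.1), col = 'red')
```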
## MDS
Classical MDS
```{r MDS}
# We first need a dissimilarity matrix/graph:
state.disimilarity <- dist(USArrests.1)
mds.1 <- stats::cmdscale(state.disimilarity)
plot(mds.1, pch = 19)
abline(h=0, v=0, lty=2)
text(mds.1, pos = 4, labels = rownames(USArrests.1), col = 'tomato')
# Compare with PCA (first two PCs):
points(pca1$x[,1:2], col='red', pch=19, cex=0.5)
# So classical MDS with Euclidean distance is the same as PCA restricted to two dimensions!
```
Note: also see `cluster::daisy()` for more dissimilarity measures.
Let's try other stress functions for MDS.
Sammon's stress:
```{r Sammon MDS}
mds.2 <- MASS::sammon(state.disimilarity)
plot(mds.2$points, pch = 19)
abline(h=0, v=0, lty=2)
text(mds.2$points, pos = 4, labels = rownames(USArrests.1))
# Compare with PCA (first two PCs):
arrows(x0 = mds.2$points[,1], y0 = mds.2$points[,2], x1 = pca1$x[,1], y1 = pca1$x[,2], col='red')
# So Sammon's MDS with Euclidean distance is *not* the same as PCA on two dimensions.
```
Kruskal's stress:
```{r isoMDS}
mds.3 <- MASS::isoMDS(state.disimilarity)
plot(mds.3$points, pch = 19)
abline(h=0, v=0, lty=2)
text(mds.3$points, pos = 4, labels = rownames(USArrests.1))
# Compare with PCA (first two PCs):
arrows(x0 = mds.3$points[,1], y0 = mds.3$points[,2], x1 = pca1$x[,1], y1 = pca1$x[,2], col='red')
# So Kruskal's MDS with Euclidean distance is *not* the same as PCA on two dimensions.
```
## Isomap
```{r Isomap}
# Installing the package:
# source("http://bioconductor.org/biocLite.R")
# biocLite("RDRToolbox")
isomap.1 <- RDRToolbox::Isomap(USArrests.1)
plot(isomap.1$dim2)
abline(h=0, v=0, lty=2)
text(isomap.1$dim2, pos = 4, labels = rownames(USArrests.1))
# Compare with PCA (first two PCs):
arrows(x0 = isomap.1$dim2[,1], y0 = isomap.1$dim2[,2], x1 = pca1$x[,1], y1 = pca1$x[,2], col='red')
```
## Local Linear Embedding (LLE)
```{r LLE}
lle.1 <- RDRToolbox::LLE(USArrests.1, k=3)
plot(lle.1)
abline(h=0, v=0, lty=2)
text(lle.1, pos = 4, labels = rownames(USArrests.1))
# Compare with PCA (first two PCs):
arrows(x0 = lle.1[,1], y0 = lle.1[,2], x1 = pca1$x[,1], y1 = pca1$x[,2], col='red')
```
Well, LLE (with 3 neighbors) clearly disagrees with PCA. Why is this?
## LocalMDS
The only package I found is `localmds`, [here](https://github.com/hadley/localmds/blob/master/R/localmds.r).
It is currently under active development, so I am still waiting for a stable version.
## Principal Curves & Surfaces
```{r Principal curves}
princurve.1 <- princurve::principal.curve(USArrests.1, plot=TRUE)
princurve.1$s
points(princurve.1) # Projections of data on principal curve
whiskers <- function(from, to) segments(from[, 1], from[, 2], to[, 1], to[, 2])
whiskers(USArrests.1, princurve.1$s)
```
# Latent Space Generative Models
## Factor Analysis (FA)
No rotation
```{r FA}
fa.1 <- psych::principal(USArrests.1, nfactors = 2, rotate = "none")
fa.1
summary(fa.1)
biplot(fa.1, labels = rownames(USArrests.1))
# Numeric comparison with PCA:
fa.1$loadings
pca1$rotation
# Graph comparison: loadings encoded in colors
qgraph::qgraph(fa.1)
qgraph::qgraph(pca.qgraph) # for comparison
```
Varimax rotation
```{r varimax}
fa.2 <- psych::principal(USArrests.1, nfactors = 2, rotate = "varimax")
fa.2$loadings
fa.1$loadings
pca1$rotation
```
Notice the rotation has changed the interpretation of the factors.
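Also note that `psych::principal` is really a rotated PCA. For maximum-likelihood factor analysis proper, base R offers `factanal`; with only three variables at most one factor is identifiable, so the following is merely a sketch:
```{r factanal}
fa.ml <- factanal(USArrests.1, factors = 1)
fa.ml$loadings
```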
## Independent Component Analysis (ICA)
```{r ICA}
ica.1 <- fastICA::fastICA(USArrests.1, n.comp=2) # Also performs projection pursuit
plot(ica.1$S)
abline(h=0, v=0, lty=2)
text(ica.1$S, pos = 4, labels = rownames(USArrests.1))
# Compare with PCA (first two PCs):
arrows(x0 = ica.1$S[,1], y0 = ica.1$S[,2], x1 = pca1$x[,1], y1 = pca1$x[,2], col='red')
```
## Exploratory Projection Pursuit
```{r exploratory projection pursuit}
epp.1 <- REPPlab::EPPlab(USArrests.1)
plot(epp.1)
```
## Generative Topographic Map (GTM)
[TODO]
## Finite Mixture
```{r mixtures}
library(mixtools)
# Generate data:
# Note that component-wise independence is assumed.
k <- 2
mix.p <- 4
mix.probs <- rep(1/k,k)
mix.means <- seq(1,k*mix.p) %>% matrix(nrow = k, ncol = mix.p)
mix.sigma <- rep(1, k*mix.p) %>% matrix(nrow = k, ncol = mix.p)
x.mix <- mixtools::rmvnormmix(n=n, lambda =mix.probs, mu=mix.means, sigma = mix.sigma)
x.mix %>% dim
# Non parametric fit (initializing with true means)
mix.1 <- mixtools::npEM(x.mix, mu0 = mix.means, verb = TRUE)
plot(mix.1)
# Fit assuming the Gaussian distribution:
matrix2list <- function(x) split(x, rep(1:ncol(x), each = nrow(x)))
mix.means.list <- matrix2list(t(mix.means))
mix.2 <- mixtools::mvnormalmixEM(x.mix, k=2, mu=mix.means.list, verb = TRUE, epsilon = 1e-1)
summary(mix.2)
```
Read [this](http://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch20.pdf) for more information on finite mixtures.
## Hidden Markov Model (HMM)
```{r}
# Note: the HiddenMarkov::foo() syntax will not work with this package, so we load it with library().
library(HiddenMarkov)
# Generate data:
(hmm.transition <- matrix(c(1/2, 1/2, 0, 1/3, 1/3, 1/3, 0, 1/2, 1/2), byrow=TRUE, nrow=3))
hmm.probs <- rep(1,3)/3
hmm.distribution <- 'norm'
hmm.params <- list(mean=c(1, 6, 3), sd=c(0.2, 0.2, 0.2))
x <- dthmm(x = NULL, Pi = hmm.transition, delta = hmm.probs, distn = hmm.distribution, pm = hmm.params)
x <- simulate(x, nsim=n)
plot(x$x)
# Can you guess when the hidden state changed?
# Let's make this harder:
hmm.params <- list(mean=c(1, 6, 3), sd=rep(2,3))
x <- dthmm(NULL, hmm.transition, hmm.probs, hmm.distribution, hmm.params)
x <- simulate(x, nsim=n)
plot(x$x, type='h')
# Estimate parameters:
y <- BaumWelch(x)
summary(y)
# Compare with truth:
hmm.true.state <- x$y
hmm.predict.state <- Viterbi(y)
table(predict=hmm.predict.state, true=hmm.true.state)
```
# Clustering
Some tutorials on clustering with R can be found in
- [David Hitchcock](http://people.stat.sc.edu/Hitchcock/chapter6_R_examples.txt).
- [QuickR](http://www.statmethods.net/advstats/cluster.html).
- University of California, Riverside, [Institute of Integrative Genome Biology](http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#TOC-Clustering-and-Data-Mining-in-R).
- [Phil Spector's](http://www.stat.berkeley.edu/~s133/Cluster2a.html) class notes from Berkeley Stats dept.
- Montana State University's [Laboratory for Dynamic Synthetic Vegephenomenology](http://ecology.msu.montana.edu/labdsv/R/labs/lab13/lab13.html).
## K-Means
The following code is an adaptation from [David Hitchcock](http://people.stat.sc.edu/Hitchcock/chapter6_R_examples.txt).
```{r kmeans}
k <- 2
kmeans.1 <- stats::kmeans(USArrests.1, centers = k)
kmeans.1$cluster # cluster assignments
# Visualize using scatter plots of the original features:
pairs(USArrests.1, panel=function(x,y) text(x,y,kmeans.1$cluster))
# Visualize in the PC plane:
plot(pca1$x[,1], pca1$x[,2], xlab="PC 1", ylab="PC 2", type ='n', lwd=2)
text(pca1$x[,1], pca1$x[,2], labels=rownames(USArrests.1), cex=0.7, lwd=2, col=kmeans.1$cluster)
```
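The number of clusters was fixed at `k=2` quite arbitrarily. A common heuristic is to inspect the total within-cluster sum of squares as a function of k, looking for an "elbow"; a minimal sketch:
```{r choosing k}
wss <- sapply(1:8, function(kk) stats::kmeans(USArrests.1, centers = kk, nstart = 10)$tot.withinss)
plot(1:8, wss, type = 'b', xlab = "Number of clusters", ylab = "Total within-cluster SS")
```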
## K-Means++
Recall that K-Means++ is a smart initialization for K-Means.
The following code is adapted from the [r-help](https://stat.ethz.ch/pipermail/r-help/2012-January/300051.html) mailing list.
```{r kmeansPP}
kmpp <- function(X, k) {
  require('pracma') # for distmat()
  n <- nrow(X)
  C <- numeric(k)
  C[1] <- sample(1:n, 1)
  for (i in 2:k) {
    dm <- distmat(X, X[C, ])
    pr <- apply(dm, 1, min)^2 # sample proportionally to the squared distance
    pr[C] <- 0
    C[i] <- sample(1:n, 1, prob = pr)
  }
  kmeans(X, X[C, ])
}
# Examine output:
kmeans.2 <- kmpp(USArrests.1, k)
kmeans.2$cluster
```
## K-Medoids
```{r kmedoids}
kmed.1 <- cluster::pam(x = state.disimilarity, k = 2)
kmed.1$clustering
plot(pca1$x[,1], pca1$x[,2], xlab="PC 1", ylab="PC 2", type ='n', lwd=2)
text(pca1$x[,1], pca1$x[,2], labels=rownames(USArrests.1), cex=0.7, lwd=2, col=kmed.1$clustering)
```
Many other similarity measures can be found in `proxy::dist()`.
See `cluster::clara()` for a massive-data implementation of PAM.
## Hierarchical Clustering
```{r hierarchical clustering}
# Single linkage:
hier.1 <- hclust(state.disimilarity, method='single')
plot(hier.1, labels=rownames(USArrests.1), ylab="Distance")
# Complete linkage:
hier.2 <- hclust(state.disimilarity, method='complete')
plot(hier.2, labels=rownames(USArrests.1), ylab="Distance")
# Average linkage:
hier.3 <- hclust(state.disimilarity, method='average')
plot(hier.3, labels=rownames(USArrests.1), ylab="Distance")
# Fixing the number of clusters:
cut.2.2 <- cutree(hier.2, k=2)
cut.2.2 # printing the "clustering vector"
# Suppose we preferred a 5-cluster solution:
cut.2.5 <- cutree(hier.2, k=5)
cut.2.5 # printing the "clustering vector"
```
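As a quick sanity check, we can cross-tabulate the 2-cluster hierarchical solution against the earlier k-means assignments:
```{r compare clusterings}
table(hierarchical = cut.2.2, kmeans = kmeans.1$cluster)
```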
Visualizing clusters:
```{r visualize clusters}
# Visualize using scatter plots of the original features
pairs(USArrests.1, panel=function(x,y) text(x,y,cut.2.5))
# Visualize in the PC plane:
plot(pca1$x[,1], pca1$x[,2], xlab="PC 1", ylab="PC 2", type ='n', lwd=2)
text(pca1$x[,1], pca1$x[,2], labels=rownames(USArrests.1), cex=0.7, lwd=2, col=cut.2.5)
```
```{r agnes}
# install.packages('cluster')
library(cluster)
agnes.1 <- agnes(USArrests.1)
plot(agnes.1)
```
## QT Clustering
[TODO]
See [here](http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#TOC-Clustering-and-Data-Mining-in-R).
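Until then, here is a minimal sketch using `flexclust::qtclust`, which grows clusters subject to a maximum-diameter (radius) constraint; the package choice and the radius value are assumptions for illustration:
```{r qt clustering sketch}
# install.packages('flexclust')
qt.1 <- flexclust::qtclust(USArrests.1, radius = 2)
flexclust::clusters(qt.1) # cluster assignments
```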
## Fuzzy Clustering
[TODO]
See [here](http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#TOC-Clustering-and-Data-Mining-in-R).
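Meanwhile, a minimal sketch with `cluster::fanny`, which returns soft cluster memberships rather than hard assignments:
```{r fuzzy clustering sketch}
fanny.1 <- cluster::fanny(USArrests.1, k = 2)
head(fanny.1$membership) # soft memberships
fanny.1$clustering # hardened assignments
```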
## Self Organizing Maps (SOMs)
The following is adapted from [Shane Lynn](http://shanelynn.ie/index.php/self-organising-maps-for-customer-segmentation-using-r/).
More details in [this paper](http://www.jstatsoft.org/v21/i05/paper).
If you want hexagons instead of circles, see [this](http://stackoverflow.com/questions/19858729/r-package-kohonen-how-to-plot-hexagons-instead-of-circles-as-in-matlab-som-too).
```{r som}
library(kohonen)
som.1 <- kohonen::som(USArrests.1, grid = somgrid(6, 6, "hexagonal"))
```
Visualize the results.
We may want to keep [this figure](notes/art/som_simulation.png) in mind when interpreting SOMs:
```{r som visualization}
# Segments plot:
plot(som.1)
# Counts plot:
plot(som.1, type='counts')
# Quality plot:
plot(som.1, type='quality')
# Neighbours Distance plot:
plot(som.1, type='dist.neighbours')
# Property plots, one per variable:
property.plot <- function(k) plot(som.1, type='property', property = som.1$codes[,k], main = colnames(som.1$codes)[k])
property.plot(1)
property.plot(2)
property.plot(3)
# Clustering:
pretty_palette <- c('#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2')
som.1.cluster <- cutree(hclust(dist(som.1$codes)), 5)
plot(som.1, type="mapping", bgcol = pretty_palette[som.1.cluster], main = "Clusters")
add.cluster.boundaries(som.1, som.1.cluster)
```
For fancy visualization of `kohonen` SOMs, see [Seth Spielman's](https://github.com/geoss/som_visualization_r) code.
Other SOM implementations can be found in `som::som()` and `class::SOM()` but `kohonen` seems the most complete and well documented.
__Note__: many functions are called `som`. Be careful when loading packages, and make use of the `::` syntax.
## Spectral Clustering
```{r spectral clustering}
# install.packages('kernlab')
library(kernlab)
specc.1 <- specc(USArrests.1, centers = 2)
specc.1 # cluster assignments
```
## Model based (generative) clustering
```{r generative clustering}
library(mclust)
mclust.1 <- Mclust(USArrests.1)
summary(mclust.1)
# By default, the generative Gaussian distributions considered are:
# "EII": spherical, equal volume
# "VII": spherical, unequal volume
# "EEI": diagonal, equal volume and shape
# "VEI": diagonal, varying volume, equal shape
# "EVI": diagonal, equal volume, varying shape
# "VVI": diagonal, varying volume and shape
# "EEE": ellipsoidal, equal volume, shape, and orientation
# "EEV": ellipsoidal, equal volume and equal shape
# "VEV": ellipsoidal, equal shape
# "VVV": ellipsoidal, varying volume, shape, and orientation
# Plotting the BIC values (which is possible for generative methods)
plot(mclust.1, data=USArrests, what="BIC")
# The best solution is VEI with 3 clusters.
# The clustering:
mclust.1$classification
# This gives the probabilities of belonging to each cluster for every object:
round(mclust.1$z,2)
```
Visualizing the clusters:
```{r visualize generative clustering}
# Visualize using scatter plots of the original features
pairs(USArrests.1, panel=function(x,y) text(x, y, mclust.1$classification))
# Visualize in the PC plane:
plot(pca1$x[,1], pca1$x[,2], xlab="PC 1", ylab="PC 2", type ='n', lwd=2)
text(pca1$x[,1], pca1$x[,2], labels=rownames(USArrests.1), cex=0.7, lwd=2, col=mclust.1$classification)
```
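Since we now have several clustering solutions in hand, the adjusted Rand index gives a quick numeric measure of agreement between any two of them (here, the model-based clustering versus k-means):
```{r adjusted rand}
mclust::adjustedRandIndex(mclust.1$classification, kmeans.1$cluster)
```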