I would like to extend my sincere gratitude to the Unimelb MAST90083 2021S2 teaching team for providing me with the opportunity to work on this project, as well as for their guidance and feedback on my work.
In this project, we will implement the Expectation-Maximization (EM) algorithm to model the intensity distribution of an image. Specifically, we assume that the grayscale values of all pixels in the image follow a Gaussian Mixture Model (GMM) using three probability density functions (PDFs) with varying standard deviations, corresponding to the person, the camera, and the background.
The task involves first implementing the EM algorithm for GMM and then applying it iteratively to the image data to estimate the parameters (means, standard deviations, and mixing probabilities) for each Gaussian component. Additionally, the final part of the project requires assigning each pixel in the image to one of the Gaussian distributions based on the estimated parameters and visualizing the labeled image to compare it with the provided reference output. The complete code is included in 'em_algorithm.r'.
First, we import the necessary packages and load the image data into R.
# Packages
library(plot.matrix)
library(png)
library(fields)
# Image data
I <- readPNG("CM.png")
# Keep only the first channel of the image array
I <- I[, , 1]
# Rotate the matrix so that image() shows the picture in its original orientation
I <- t(apply(I, 2, rev))
# Plot image and density of image data
par(mfrow = c(1, 2))
image(I, col = gray((0:255) / 255))
plot(density(I))
In the case of a Gaussian Mixture Model (GMM) with $G$ components, each pixel value $x_i$ ($i = 1, \dots, n$) is drawn from one of $G$ normal densities. Let $Z_i \in \{1, \dots, G\}$ be the latent component label of pixel $i$, with mixing probabilities $P(Z_i = k) = \pi_k$ and $X_i \mid Z_i = k \sim N(\mu_k, \sigma_k^2)$, and write $\phi(x; \mu_k, \sigma_k^2)$ for the $N(\mu_k, \sigma_k^2)$ density.

The joint pdf of $X_i$ and $Z_i$ is
$$
f(x_i, z_i) = \prod_{k=1}^{G} \left[ \pi_k \, \phi(x_i; \mu_k, \sigma_k^2) \right]^{\mathbf{1}(z_i = k)}.
$$

The complete log-likelihood function is:
$$
\ell_c(\pi, \mu, \sigma^2) = \sum_{i=1}^{n} \sum_{k=1}^{G} \mathbf{1}(z_i = k) \left[ \log \pi_k + \log \phi(x_i; \mu_k, \sigma_k^2) \right].
$$

In the E-step of the EM algorithm, the expectation of the complete log-likelihood, given the data and the current estimates $\left(\pi^{(m)}, \mu^{(m)}, \sigma^{2(m)}\right)$, is:
$$
Q = \sum_{i=1}^{n} \sum_{k=1}^{G} p_{ik}^{(m)} \left[ \log \pi_k + \log \phi(x_i; \mu_k, \sigma_k^2) \right],
\qquad
p_{ik}^{(m)} = \frac{\pi_k^{(m)} \, \phi\!\left(x_i; \mu_k^{(m)}, \sigma_k^{2(m)}\right)}{\sum_{j=1}^{G} \pi_j^{(m)} \, \phi\!\left(x_i; \mu_j^{(m)}, \sigma_j^{2(m)}\right)}.
$$

In the M-step of the EM algorithm, we take the derivative of $Q$ with respect to each parameter and set it to zero.

In terms of $\pi_k$, the maximization is subject to the constraint $\sum_{k=1}^{G} \pi_k = 1$, which we handle with a Lagrange multiplier $\lambda$. Then we have an equation system with two equations and two unknowns ($\pi_k$ and $\lambda$):
$$
\sum_{i=1}^{n} \frac{p_{ik}^{(m)}}{\pi_k} + \lambda = 0,
\qquad
\sum_{k=1}^{G} \pi_k = 1.
$$
Solving this system gives us, at the $(m+1)$-th round of iteration, for $k = 1, \dots, G$,
$$
\pi_k^{(m+1)} = \frac{\sum_{i=1}^{n} p_{ik}^{(m)}}{n}.
$$

In terms of $\mu_k$, setting $\partial Q / \partial \mu_k = 0$ and solving this equation gives us, at the $(m+1)$-th round of iteration, for $k = 1, \dots, G$,
$$
\mu_k^{(m+1)} = \frac{\sum_{i=1}^{n} p_{ik}^{(m)} x_i}{\sum_{i=1}^{n} p_{ik}^{(m)}}.
$$

In terms of $\sigma_k^2$, setting $\partial Q / \partial \sigma_k^2 = 0$ and solving this equation gives us, at the $(m+1)$-th round of iteration, for $k = 1, \dots, G$,
$$
\sigma_k^{2\,(m+1)} = \frac{\sum_{i=1}^{n} p_{ik}^{(m)} \left(x_i - \mu_k^{(m+1)}\right)^2}{\sum_{i=1}^{n} p_{ik}^{(m)}}.
$$
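As a small numerical illustration of the E-step formula (with made-up pixel values and illustrative parameter values, not the fitted ones), the responsibilities $p_{ik}$ can be computed directly:
# Illustration of the E-step responsibilities p_ik for two made-up pixel values,
# using illustrative parameters pi, mu and sigma^2 (not estimates from the image)
x_demo <- c(0.18, 0.80)
pi_m <- c(0.25, 0.5, 0.25)
mu_m <- c(0.20, 0.85, 0.70)
sigmasq_m <- c(0.001, 0.001, 0.01)
num <- sapply(1:3, function(k) pi_m[k] * dnorm(x_demo, mu_m[k], sqrt(sigmasq_m[k])))
p_ik_demo <- num / rowSums(num)  # each row sums to 1
print(round(p_ik_demo, 4))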
With these parameter updates for each round of iteration, we can implement the EM algorithm in R. First, we build the EM algorithm function:
# EM algorithm function
GGM.EM <- function(X, w_init, mu_init, sigmasq_init, G) {
  # parameter X: data
  # parameter w_init: initial value of pi
  # parameter mu_init: initial value of mu
  # parameter sigmasq_init: initial value of sigmasq
  # parameter G: number of components
  # Stop condition: change in sum of mu's and sigma's less than e_min
  e_min <- 0.000001
  # Matrices holding the parameter estimates, one row per iteration
  w_iter <- matrix(0, nrow = 1, ncol = G)
  mu_iter <- matrix(0, nrow = 1, ncol = G)
  sigmasq_iter <- matrix(0, nrow = 1, ncol = G)
  # Put initial values in the first row
  w_iter[1, ] <- w_init
  mu_iter[1, ] <- mu_init
  sigmasq_iter[1, ] <- sigmasq_init
  # Initiate list of log likelihoods, and list of sums of mu's and sigma's
  log_liks <- c(0)
  sum_list <- c(0)
  # Initiate e: change in sum of mu's and sigma's; b: number of iterations
  e <- 1
  b <- 1
  # Main iteration of the EM algorithm
  while (e >= e_min) {
    # E-step
    # Calculate the joint probability of X and Z
    probXZ <- matrix(0, nrow = length(X), ncol = G)
    for (k in 1:G) {
      probXZ[, k] <- dnorm(X, mean = mu_iter[b, k], sd = sqrt(sigmasq_iter[b, k])) * w_iter[b, k]
    }
    # Calculate the conditional probability of Z given X (responsibilities)
    P_ik <- probXZ / rowSums(probXZ)
    # Calculate the incomplete log likelihood
    log_liks <- c(log_liks, sum(log(rowSums(probXZ))))
    # M-step
    # Calculate the parameter estimates for this round of iteration
    w_new <- colSums(P_ik) / sum(P_ik)
    mu_new <- numeric(G)
    for (k in 1:G) {
      mu_new[k] <- sum(P_ik[, k] * X) / sum(P_ik[, k])
    }
    sigmasq_new <- numeric(G)
    for (k in 1:G) {
      sigmasq_new[k] <- sum(P_ik[, k] * ((X - mu_new[k])^2)) / sum(P_ik[, k])
    }
    # Append the new parameter values
    w_iter <- rbind(w_iter, w_new)
    mu_iter <- rbind(mu_iter, mu_new)
    sigmasq_iter <- rbind(sigmasq_iter, sigmasq_new)
    # Update the list of sums of mu's and sigma's
    sum_list <- c(sum_list, sum(c(mu_new, sqrt(sigmasq_new))))
    # Calculate the absolute change in the sum of mu's and sigma's
    e <- abs(sum_list[length(sum_list)] - sum_list[length(sum_list) - 1])
    # Advance the iteration counter
    b <- b + 1
  }
  # Return the estimates and other useful quantities
  return(list(w_iter = w_iter,
              mu_iter = mu_iter,
              sigmasq_iter = sigmasq_iter,
              log_liks = log_liks,
              P_ik = P_ik,
              probXZ = probXZ))
}
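Before applying the function to the image, an optional sanity check (not part of the original analysis) is to run it on simulated data whose parameters are known; the simulation settings below are made up for illustration, and the last row of each returned matrix should be close to them.
# Optional check of GGM.EM on simulated data with known (made-up) parameters
set.seed(1)
X_sim <- c(rnorm(2000, mean = 0.20, sd = 0.05),
           rnorm(5000, mean = 0.85, sd = 0.03),
           rnorm(3000, mean = 0.70, sd = 0.10))
check <- GGM.EM(X_sim,
                w_init = c(1/3, 1/3, 1/3),
                mu_init = c(0.1, 0.9, 0.6),
                sigmasq_init = c(0.01, 0.01, 0.01),
                G = 3)
# Final estimates: means, standard deviations, and mixing proportions
print(round(tail(check$mu_iter, 1), 3))
print(round(sqrt(tail(check$sigmasq_iter, 1)), 3))
print(round(tail(check$w_iter, 1), 3))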
Next, we consider the initial values.
From the density plot, we see two clear peaks in the data: the first component has a mean around 0.2 and the second a mean around 0.85. The third component is not obvious, so we suspect it is blended into the distribution of the second component; we therefore set the mean of the third component to 0.7, where there is a small bump in the density plot. So we set the initial means to (0.2, 0.85, 0.7).
For the variances, the distribution of the first component appears narrow, so we give it a small variance. The right tail of the second peak is thin while its left tail is fat, which suggests that the component with mean 0.85 has a small variance and the component with mean 0.7 has a larger one. We therefore set the initial variances to (0.001, 0.001, 0.01).
For the mixing proportions, the component with mean 0.85 clearly accounts for more pixels than the other two, so we set the initial proportions to (0.25, 0.5, 0.25). A quick data-driven cross-check of these hand-picked values is sketched below.
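As an optional cross-check (not part of the original analysis), rough starting values can also be obtained by running k-means with three clusters on the pixel values; the snippet assumes the image matrix I loaded earlier.
# Optional: data-driven starting values from a k-means run with 3 clusters
km <- kmeans(as.vector(I), centers = 3, nstart = 5)
mu_km <- as.vector(km$centers)                        # cluster means
sigmasq_km <- tapply(as.vector(I), km$cluster, var)   # within-cluster variances
w_km <- as.vector(table(km$cluster)) / length(as.vector(I))  # cluster shares
print(round(cbind(mu_km, sigmasq_km, w_km), 4))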
Then we run the EM algorithm on the image data with these initial values.
# Flatten the image data into a one-dimensional vector
X <- as.vector(I)
# EM algorithm on the image data
EM_estimation <- GGM.EM(X,
                        w_init = c(0.25, 0.5, 0.25),
                        mu_init = c(0.20, 0.85, 0.70),
                        sigmasq_init = c(0.001, 0.001, 0.01),
                        G = 3)
para_estimation <- round(cbind(EM_estimation$mu_iter,
                               sqrt(EM_estimation$sigmasq_iter),
                               EM_estimation$w_iter),
                         4)
Then we have a look at the parameter estimates in each round of iteration.
# Parameter estimation - iteration
colnames(para_estimation) <- c("mu1","mu2","mu3","sigma1","sigma2","sigma3","pi1","pi2","pi3")
rownames(para_estimation) <- paste("Iter", 0:(nrow(para_estimation)-1))
print(para_estimation)
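Since GGM.EM also returns the incomplete log likelihood at every iteration, a quick plot (an extra check, not required for the estimates themselves) can confirm that it increases monotonically towards convergence.
# Incomplete log likelihood across iterations (drop the placeholder 0 at index 1)
ll <- EM_estimation$log_liks[-1]
plot(ll, type = "b", xlab = "Iteration", ylab = "Incomplete log likelihood")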
Then we look at the final parameter estimates.
# Parameter estimation - final
mu_final <- EM_estimation$mu_iter[nrow(EM_estimation$mu_iter),]
sigma_final <- sqrt(EM_estimation$sigmasq_iter[nrow(EM_estimation$sigmasq_iter),])
p_final <- EM_estimation$w_iter[nrow(EM_estimation$w_iter),]
# Show the parameter estimation
print(round(c(mu_final, sigma_final, p_final),4))
The output above gives the final estimates of the means, standard deviations, and mixing proportions for the three components.
Then we classify each pixel into a component based on its posterior probabilities; that is, pixel $i$ is assigned to the component $k$ that maximizes $p_{ik} = P(Z_i = k \mid x_i)$.
# Classify each pixel into the component with the largest posterior probability
G <- numeric(length(X))
P_ik <- EM_estimation$P_ik
probXZ <- EM_estimation$probXZ
for (i in 1:length(X)) {
  G[i] <- which.max(P_ik[i, ])
}
# Posterior expectation of the component means for each pixel (a soft labelling)
pdf <- numeric(length(X))
for (i in 1:length(X)) {
  pdf[i] <- sum(probXZ[i, ] / sum(probXZ[i, ]) * mu_final)
}
Then we plot the labeled pixels: the first image shows the hard component labels, and the second shows the per-pixel posterior expectation of the component means.
# Show the result image
image.plot(matrix(G, 398, 398))
image.plot(matrix(pdf, 398, 398))
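As a final sanity check (an extra step, not part of the required output), the share of pixels hard-assigned to each component can be compared with the estimated mixing proportions; the two should be broadly consistent.
# Share of pixels in each hard-assigned component vs. estimated mixing proportions
print(round(table(G) / length(G), 4))
print(round(p_final, 4))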