Not reproducible results with find.clusters #335

Deepak12Kaushik · 2022-08-01T13:49:43Z

I am using the find.clusters function on phenotypic data from wheat (my data set is similar to the USArrests dataset) to decide how many clusters to cut the dendrogram into. But the clusters change from run to run: if the first cluster has 4 members and the second has 2 in one run, repeating the call under the same conditions gives a first cluster with, say, 5 members, and so on. The results are not reproducible.

# df is my dataset
foo.BIC <- find.clusters(df, max.n.clust = 20, n.pca = 200, scale = FALSE,
                         stat = "BIC", method = "kmeans")
plot(foo.BIC$Kstat, type = "o", xlab = "number of clusters (K)", ylab = "BIC",
     col = "green", main = "Detection based on BIC")
points(5, foo.BIC$Kstat[5], pch = "x", cex = 3)
mtext("'X' indicates the actual number of clusters", side = 3)

foo.BIC$size
foo.BIC$grp

sanderdebacker · 2024-08-19T11:39:57Z

Posting my findings here because I was looking for an answer to a similar problem myself. Hopefully this is useful for other users.

I've found this in another thread:

Odd shapes of the decrease of BIC can occur for several reasons. The possible explanations I can think of are:
a) there are no clearly identifiable clusters in the data.
b) there are clusters to be identified, but not enough information to disentangle different values of k. In your case this seems very likely: there are few SNPs, and if half of them are specific to one individual they are not informative in terms of clusters.

Original reference:
https://lists.r-forge.r-project.org/pipermail/adegenet-forum/2011-June/000303.html

Otherwise, it would be worth increasing the number of k-means runs (n.start, default is 10) and the number of iterations for each run (n.iter, default is 1e5) to gain a bit of stability. Hopefully that makes your analysis reproducible.

EDIT: just as an example, for my data the analysis stabilised for n.start=1000 and n.iter=1e9
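
To illustrate the point about run-to-run stability, here is a minimal sketch in base R. It uses stats::kmeans directly on the built-in USArrests data rather than find.clusters, on the assumption that find.clusters with method = "kmeans" behaves the same way with respect to random starts (its n.start and n.iter arguments are documented as controlling the k-means runs). The seed value, the number of clusters and the helper same_partition() are arbitrary choices for this example, not part of adegenet.

```r
# Stand-in data (assumption): scaled USArrests instead of the wheat phenotypes.
x <- scale(USArrests)

# k-means starts from random centroids, so fixing the seed right before
# each call makes the result repeatable.
set.seed(1)  # arbitrary seed
fit_a <- kmeans(x, centers = 4, nstart = 50, iter.max = 1000)
set.seed(1)
fit_b <- kmeans(x, centers = 4, nstart = 50, iter.max = 1000)
identical(fit_a$cluster, fit_b$cluster)  # TRUE

# Without a fixed seed the cluster numbers are arbitrary: the same
# partition can come back with its labels in a different order.
fit_c <- kmeans(x, centers = 4, nstart = 50, iter.max = 1000)

# Hypothetical helper: TRUE when two groupings are the same partition
# up to a relabelling of the clusters.
same_partition <- function(g1, g2) {
  tab <- table(g1, g2)
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

# Cross-tabulation of the two groupings; one non-zero cell per row and
# column means only the labels differ.
table(fit_a$cluster, fit_c$cluster)
same_partition(fit_a$cluster, fit_c$cluster)
```

If the last call returns TRUE, only the cluster numbering changed between runs, and sorting or matching the labels is enough. If it returns FALSE, the partitions really differ, and a larger nstart (n.start in find.clusters) together with a set.seed() call just before the clustering is the first thing I would try.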
