[BUG] AutoGMM does poorly on well-separated clusters #975

Open
ebridge2 opened this issue Jul 15, 2022 · 0 comments
Labels
bug Something isn't working

Comments

@ebridge2
Collaborator

Expected Behavior

It should be possible to come up with an example use case for AutoGMM that I can demonstrate and justify for the book.

Actual Behavior

Neither the predicted number of clusters nor the resulting clusterings are accurate, or even close to accurate. AutoGMM seems to always favor an extremely high number of clusters (usually 8 or 9). I have played around with different settings for about 2 hours and cannot find one where GMM does appreciably better than K-means and the predicted number of clusters is even close to the true number.

Example Code

The data are 3 near-perfect Gaussians that are separated with extraordinarily high probability (the density of overlap between the different Gaussians is ~0). Even so, I cannot seem to get AutoGMM to give me a good clustering where KMeans does appreciably worse and AutoGMM gives me something within the ballpark of the true number of clusters. Maybe that's fine?

Step 1 generates the latent positions...

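import numpy as np
from graspologic.simulations import rdpg

# seed before any random draw so the cluster assignments are reproducible too
np.random.seed(1234)

# mixing proportions and cluster assignments
pi = np.array([0.33, 0.33, 0.34])
zs = np.random.choice([0, 1, 2], replace=True, p=pi, size=200)
# the means (one column per cluster)
mus = np.array([[-.7, .7, 0], [.3, .3, .8]])
# the covariances (one 2x2 slice per cluster along the last axis)
covars = np.stack(([[.005, .05], [.05, .8]], [[.005, -.05], [-.05, .8]], [[0.002, 0], [0, 0.002]]), axis=2)
# sample one 2-d latent position per node
Xtrue = np.array([np.random.multivariate_normal(mus[:,z], covars[:,:,z]) for z in zs])
P_rdpg = Xtrue @ Xtrue.T
A = rdpg(Xtrue)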

and plot it...

from graspologic.plot import pairplot

_ = pairplot(Xtrue, labels=zs)
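
One thing that may be worth checking about these latent positions, offered as a sketch rather than a diagnosis: in an RDPG the entries of P = X @ X.T are meant to be edge probabilities, so they should lie in [0, 1]. The snippet below regenerates latent positions with the same means and covariances (using a NumPy Generator, so the draws differ from the code above) and reports what fraction of P falls outside that range. How rdpg treats such entries is not checked here.

```python
import numpy as np

rng = np.random.default_rng(1234)  # assumption: any fixed seed will do

pi = np.array([0.33, 0.33, 0.34])
zs = rng.choice(3, p=pi, size=200)
mus = np.array([[-0.7, 0.7, 0.0], [0.3, 0.3, 0.8]])
covars = np.stack(
    ([[0.005, 0.05], [0.05, 0.8]],
     [[0.005, -0.05], [-0.05, 0.8]],
     [[0.002, 0.0], [0.0, 0.002]]),
    axis=2,
)
X = np.array([rng.multivariate_normal(mus[:, z], covars[:, :, z]) for z in zs])

# Probability matrix implied by the latent positions
P = X @ X.T

# Ignore the diagonal, since self-loops are usually not sampled
off_diag = ~np.eye(len(X), dtype=bool)
frac_below = np.mean(P[off_diag] < 0)
frac_above = np.mean(P[off_diag] > 1)
print(f"fraction of P below 0: {frac_below:.3f}")
print(f"fraction of P above 1: {frac_above:.3f}")
```

If a sizeable share of entries is out of range, the sampled graph cannot reflect the intended latent geometry, and shrinking the means and variances may give a fairer test of the clustering step.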

Step 2 spectrally embeds...

from graspologic.embed import AdjacencySpectralEmbed

Xhat = AdjacencySpectralEmbed(n_components=3).fit_transform(A)
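
The latent positions here have two columns, so the unmodified P = X @ X.T has rank at most 2, while the embedding asks for 3 components. Whether the extra dimension affects the clustering is not established in this issue; the sketch below only shows one way to inspect the spectrum. The helper `top_eigenvalues` and the stand-in `X_demo` are hypothetical names introduced for illustration; substitute `Xtrue` for `X_demo`.

```python
import numpy as np

def top_eigenvalues(X, k=4):
    """Return the k largest-magnitude eigenvalues of P = X @ X.T."""
    P = X @ X.T
    vals = np.linalg.eigvalsh(P)  # P is symmetric, so eigvalsh applies
    order = np.argsort(-np.abs(vals))
    return vals[order][:k]

rng = np.random.default_rng(1234)
X_demo = rng.normal(size=(200, 2))  # hypothetical stand-in for Xtrue
eigs = top_eigenvalues(X_demo)
print(eigs)  # two sizeable eigenvalues, the rest numerically zero
```

Running the same helper on the adjacency matrix A instead of P would show how much signal a third embedding dimension actually carries after sampling.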

Step 3 performs the clustering...

from graspologic.cluster.autogmm import AutoGMMCluster

autogmm_clust = AutoGMMCluster(max_components=10, random_state=1234)

labels_autogmm_erratic = autogmm_clust.fit_predict(Xhat)
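
To make "appreciably better than K-means" concrete, the comparison can be scored with the adjusted Rand index (ARI) against the true labels. The sketch below is a stand-in, not AutoGMM: it uses scikit-learn's `GaussianMixture` with a plain BIC sweep over 1 to 10 components, and it clusters the true latent positions rather than the spectral embedding, which gives a baseline for how well any GMM could do before embedding noise enters. The seed and the regenerated data are assumptions, so the draws differ from the code above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1234)
pi = np.array([0.33, 0.33, 0.34])
zs = rng.choice(3, p=pi, size=200)
mus = np.array([[-0.7, 0.7, 0.0], [0.3, 0.3, 0.8]])
covars = np.stack(
    ([[0.005, 0.05], [0.05, 0.8]],
     [[0.005, -0.05], [-0.05, 0.8]],
     [[0.002, 0.0], [0.0, 0.002]]),
    axis=2,
)
X = np.array([rng.multivariate_normal(mus[:, z], covars[:, :, z]) for z in zs])

# Fit one full-covariance GMM per candidate size and keep the lowest BIC
models = {
    k: GaussianMixture(n_components=k, covariance_type="full",
                       n_init=5, random_state=1234).fit(X)
    for k in range(1, 11)
}
best_k = min(models, key=lambda k: models[k].bic(X))
labels_gmm = models[best_k].predict(X)

# K-means is given the true number of clusters as a generous baseline
labels_km = KMeans(n_clusters=3, n_init=10, random_state=1234).fit_predict(X)

ari_gmm = adjusted_rand_score(zs, labels_gmm)
ari_km = adjusted_rand_score(zs, labels_km)
print(f"BIC-selected components: {best_k}")
print(f"ARI (GMM): {ari_gmm:.3f}  ARI (K-means): {ari_km:.3f}")
```

Swapping `X` for `Xhat` and `labels_gmm` for `labels_autogmm_erratic` would score the AutoGMM result from Step 3 on the same scale.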

Your Environment

  • Python version:
  • graspologic version:


@ebridge2 ebridge2 added the bug Something isn't working label Jul 15, 2022