Method to rename clusters #17

Open
gmaze opened this issue Feb 10, 2020 · 2 comments
Labels: API (API design), enhancement (New feature or request), priority (Top of the to-do list)

Comments

@gmaze
Member

gmaze commented Feb 10, 2020

Cluster IDs are set randomly by the classifier.
When several PCM configurations are run, the analysis is hard to follow because the cluster IDs change from one run to the next.
A simple solution is to sort cluster IDs using a metric computed from the training set, for instance:

  • the vertical average of features,
  • a value at a given depth,
  • the median latitude or longitude of a cluster,
  • etc.

This function is available in the Matlab PCM toolbox as a rename_labels function and should be implemented within pyXpcm as well.
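
The sorting idea described above can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the function name `relabel_by_metric`, its arguments, and the sample data are hypothetical, and it is neither the Matlab `rename_labels` function nor an existing pyXpcm method. It assumes one cluster ID and one metric value (here a latitude) per training sample.

```python
import numpy as np

def relabel_by_metric(labels, metric, reducer=np.median):
    """Return new labels so that cluster 0 has the lowest per-cluster metric.

    Hypothetical helper, not part of pyXpcm.
    labels : 1-D integer array of cluster IDs, one per training sample
    metric : 1-D array of the same length (e.g. latitude of each profile)
    """
    labels = np.asarray(labels)
    metric = np.asarray(metric)
    old_ids = np.unique(labels)  # sorted cluster IDs present in the data
    # One summary value per cluster, e.g. its median latitude
    scores = np.array([reducer(metric[labels == k]) for k in old_ids])
    # Old IDs reordered from lowest to highest score
    ranked = old_ids[np.argsort(scores, kind="stable")]
    # Map each old ID to its rank in that ordering
    mapping = {int(old): new for new, old in enumerate(ranked)}
    new_labels = np.array([mapping[int(k)] for k in labels])
    return new_labels, mapping

# Made-up example: three clusters with arbitrary IDs and sample latitudes
labels = np.array([2, 0, 1, 2, 0, 1])
latitude = np.array([10.0, 50.0, -30.0, 12.0, 54.0, -28.0])
new_labels, mapping = relabel_by_metric(labels, latitude)
```

Because the new IDs depend only on the chosen metric, two runs that find the same clusters under different random IDs would end up with the same labels after relabelling.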

@gmaze gmaze added enhancement New feature or request priority Top of the to-do list API API design labels Feb 10, 2020
@gmaze gmaze self-assigned this Feb 10, 2020
@sdat2

sdat2 commented Nov 19, 2020

I think this function solves this problem in sklearn:

import numpy as np

def sort_gmm_by_mean(gmm):
    """Reorder the components of a fitted sklearn GaussianMixture in place,
    so that component 0 has the lowest mean of the first feature."""
    # sorts so that the lowest is 0
    new_order = np.argsort(gmm.means_[:, 0])  # or gmm.means_.mean(axis=1)

    # Fancy indexing returns reordered copies, so no deepcopy is needed
    gmm.weights_ = gmm.weights_[new_order]
    gmm.means_ = gmm.means_[new_order]
    if gmm.covariance_type != "tied":
        # "tied" shares a single covariance matrix: nothing to reorder
        gmm.covariances_ = gmm.covariances_[new_order]
        gmm.precisions_ = gmm.precisions_[new_order]
        gmm.precisions_cholesky_ = gmm.precisions_cholesky_[new_order]

    return gmm

@gmaze
Member Author

gmaze commented Nov 20, 2020

Thanks @sdat2 for pointing this out!
This could indeed be much simpler to implement and would return sorted clusters by default.
Let's give this a try.
g
