Mismatching in labels of clusters and transition matrix #83

Open
martanit opened this issue May 23, 2023 · 2 comments

@martanit

Describe the bug

In SOAPify/Examples/LENS.ipynb, the tmat labels are assigned incorrectly when the cluster IDs returned by KMeans are not in ascending order (e.g. [C0=0, C2=2, C1=1]).
calculateTransitionMatrix returns a matrix whose rows and columns follow the sorted cluster IDs (e.g. for columns: C0=0 in col 0, C1=1 in col 1, C2=2 in col 2, ...), whereas the labels are assigned in the order in which the clusters are listed (C0 for col 0, C2 for col 1, C1 for col 2).
Sorting the labels fixes the problem. Change:


classifications = SOAPclassification(
    [], prepareData(classifiedFilteredLENS), [f"C{m[0]}" for m in minmax]
)

to:


classifications = SOAPclassification(
    [], prepareData(classifiedFilteredLENS), [f"C{m[0]}" for m in np.sort(minmax, axis=0)]
)

To reproduce the bug, change the random_state parameter of KMeans: this changes the cluster assignment order and, with it, the exchange probabilities shown under each label.
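The mismatch can be illustrated with a toy example. This is only a sketch: the contents of `minmax` and the hand-built count matrix below are hypothetical stand-ins, not the notebook's data or SOAPify's `calculateTransitionMatrix`. It assumes the first column of `minmax` holds the cluster ID and that row/column i of the matrix refers to cluster ID i.

```python
import numpy as np

# Hypothetical stand-in for `minmax`: one row per cluster, first column is the
# cluster ID in the order KMeans happened to list it (here 0, 2, 1).
minmax = np.array([[0, 0.0, 0.1], [2, 0.5, 0.9], [1, 0.1, 0.5]])

# Toy sequence of cluster IDs visited by one particle (not real LENS data).
trajectory = np.array([0, 0, 1, 1, 2, 2, 2, 1, 0])

# Count transitions; row/column i corresponds to cluster ID i (sorted order).
n_clusters = 3
counts = np.zeros((n_clusters, n_clusters), dtype=int)
for start, end in zip(trajectory[:-1], trajectory[1:]):
    counts[start, end] += 1

# Labels taken in listing order do not line up with the matrix axes...
unsorted_labels = [f"C{int(m[0])}" for m in minmax]
# ...while labels built from the sorted IDs do.
sorted_labels = [f"C{int(cid)}" for cid in np.sort(minmax[:, 0])]

print(unsorted_labels)  # column 1 would be labelled C2, but it holds cluster 1
print(sorted_labels)
```

With the unsorted labels, the counts in column 1 (cluster 1) would be reported under C2, which is why the probabilities appear to change with random_state.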

@MikkelDA

For me, implementing this leads to an RGB conversion error ("Invalid RGBA argument: 'C0.0'") because the labels are now built from floats instead of integers, so the value should be rounded inside the f-string. Here are some examples of what I mean; the last one contains my proposed correction.

# Currently
print([f"C{m[0]}" for m in minmax])
['C1', 'C2', 'C3', 'C0']

# Correction proposed by martanit
print([f"C{m[0]}" for m in np.sort(minmax, axis=0)])
['C0.0', 'C1.0', 'C2.0', 'C3.0']

# My change to proposed correction
print([f"C{round(m[0])}" for m in np.sort(minmax, axis=0)])
['C0', 'C1', 'C2', 'C3']

Thus the change should be to:

classifications = SOAPclassification(
    [], prepareData(classifiedFilteredLENS), [f"C{round(m[0])}" for m in np.sort(minmax, axis=0)]
)
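A likely explanation for the floats, sketched under the assumption that `minmax` is a list of rows mixing an integer cluster ID with float bounds (the values below are made up): `np.sort` first converts the rows to a single float array, so the IDs come back as floats. The last variant is an alternative that sorts only the integer IDs; it is not what the notebook does, just one way to avoid the conversion.

```python
import numpy as np

# Hypothetical `minmax`: (cluster ID, lower bound, upper bound) per cluster,
# with integer IDs and float bounds, listed in the order 1, 2, 3, 0.
minmax = [(1, 0.2, 0.4), (2, 0.4, 0.7), (3, 0.7, 1.0), (0, 0.0, 0.2)]

# np.sort converts the mixed int/float rows to one float array, so the IDs
# are floats and format as 'C0.0', 'C1.0', ...
float_labels = [f"C{m[0]}" for m in np.sort(minmax, axis=0)]

# Rounding, as proposed above, restores integer labels.
rounded_labels = [f"C{round(m[0])}" for m in np.sort(minmax, axis=0)]

# Alternative: sort the integer IDs directly, with no float conversion.
direct_labels = [f"C{cid}" for cid in sorted(m[0] for m in minmax)]

print(float_labels)
print(rounded_labels)
print(direct_labels)
```

Note that `np.sort(..., axis=0)` sorts every column independently, which is harmless here because only the first column is used for the labels.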

@martanit
Author

Yes, @MikkelDA, you are right. I also had that issue and forgot to add the round. Thanks!
