Add ability to extract latent representation from clustering algorithms #177
Comments
Hi there 👋, Thank you so much for your attention to PyPOTS! You can follow me on GitHub to receive the latest news about PyPOTS. If you find PyPOTS helpful to your work, please star⭐️ this repository. Your star is your recognition, and it helps more people notice PyPOTS and grows the PyPOTS community. It matters and is definitely a kind of contribution to the community. I have received your message and will respond ASAP. Thank you for your patience! 😃 Best,
Hey @vemuribv, I'm going to adjust the PyPOTS framework API to make the clustering models return their latent representations. On your end, could you please think about how PyPOTS could provide a more useful utility to help users calculate clustering validation measures? E.g., could you help integrate some of the metrics you mentioned in sklearn.clustering into pypots.utils.metrics? After your code is merged into the PyPOTS main branch, you will be listed as a PyPOTS contributor: https://pypots.com/about/#all-contributors
No problem. I'll make the model provide an option to return the values. Could you please add the visualization functions as well?
Thanks for your PR #179, Bhargav! Will review it 😃
Hey Bhargav, your PR #179 has been merged. Congrats! 👍
Awesome, thank you! What's the best place for the visualization functions? Also in pypots.utils.metrics?
Absolutely my pleasure ;-) Please put them in |
Great! I'm going to make the clustering models return their latent representations. Then you can write some unit tests for your functions.
Hi Bhargav, I've made VaDER and CRLI return their latent representations for clustering, as you requested in this issue. I've also written unit tests for our internal cluster validation metrics; for reference, see PyPOTS/tests/clustering/vader.py, lines 66 to 76 in 09b494d.
1. Feature description
The unsupervised clustering methods (VaDER & CRLI) should internally produce lower-dimensional latent representations of the input data, which are used for the final cluster assignments. Users should be able to extract these latent representations for downstream analysis.
2. Motivation
The ability to extract this latent representation would allow users to calculate internal clustering validation measures such as the silhouette coefficient, the gap statistic, and other indices. This is especially important when no ground-truth labels are available.
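To make the use case concrete, here is a minimal sketch of a silhouette coefficient computed directly on a latent representation. It uses only NumPy, and the `latent` array is synthetic stand-in data (an assumption for illustration), not output from VaDER or CRLI. The helper name `silhouette_coefficient` is also made up for this example; scikit-learn's `sklearn.metrics.silhouette_score` computes the same quantity.

```python
import numpy as np


def silhouette_coefficient(latent, labels):
    """Mean silhouette over all samples, using Euclidean distance.

    Hypothetical helper for illustration only; not part of PyPOTS.
    """
    latent = np.asarray(latent, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    # Pairwise Euclidean distances between all latent vectors.
    diffs = latent[:, None, :] - latent[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))

    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: its score is defined as 0
        # a: mean distance to the other members of the same cluster
        # (dist[i, i] is 0, so dividing by n_same - 1 excludes the point itself).
        a = dist[i, same].sum() / (n_same - 1)
        # b: mean distance to the nearest other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


# Synthetic stand-in for a latent representation: two separated blobs in 8 dimensions.
rng = np.random.default_rng(0)
latent = np.vstack([
    rng.normal(0.0, 1.0, size=(30, 8)),
    rng.normal(5.0, 1.0, size=(30, 8)),
])
labels = np.repeat([0, 1], 30)

good_score = silhouette_coefficient(latent, labels)
# Shuffling the assignments destroys the cluster structure, so the score should drop.
shuffled_score = silhouette_coefficient(latent, rng.permutation(labels))
print(f"matching labels: {good_score:.3f}, shuffled labels: {shuffled_score:.3f}")
```

No ground-truth labels are involved: the score depends only on the latent vectors and the predicted assignments, which is why access to the latent space is needed.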
3. Your contribution
I am not yet totally clear on the clustering architecture implementations in PyPOTS (though I'm starting to familiarize myself more). However, I think these latent representations are already baked into the code:
It may be as simple as including these in what's returned after running .cluster (or providing a separate function that extracts only the latent representations).
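A rough sketch of the kind of interface I have in mind is below. Everything in it is hypothetical: `ToyLatentClusterer` and the `return_latent` flag are made up for illustration and are not the PyPOTS API, the "encoder" is a plain PCA projection rather than the VaDER or CRLI networks, and the input is assumed to be fully observed (no missing values).

```python
import numpy as np


class ToyLatentClusterer:
    """Hypothetical stand-in for a clustering model; NOT the PyPOTS API."""

    def __init__(self, n_clusters, latent_dim, n_iters=20, seed=0):
        self.n_clusters = n_clusters
        self.latent_dim = latent_dim
        self.n_iters = n_iters
        self.seed = seed

    def _encode(self, X):
        # Flatten (n_samples, n_steps, n_features) and project onto the top
        # principal directions. A real model would use its learned encoder here.
        flat = X.reshape(len(X), -1)
        flat = flat - flat.mean(axis=0)
        _, _, vt = np.linalg.svd(flat, full_matrices=False)
        return flat @ vt[: self.latent_dim].T

    def _assign(self, latent, centers):
        # Index of the nearest center for every latent vector.
        d = np.linalg.norm(latent[:, None] - centers[None], axis=-1)
        return d.argmin(axis=1)

    def cluster(self, X, return_latent=False):
        latent = self._encode(X)
        rng = np.random.default_rng(self.seed)
        # Plain k-means in the latent space, initialized from random samples.
        centers = latent[rng.choice(len(latent), self.n_clusters, replace=False)]
        for _ in range(self.n_iters):
            labels = self._assign(latent, centers)
            for k in range(self.n_clusters):
                if np.any(labels == k):
                    centers[k] = latent[labels == k].mean(axis=0)
        labels = self._assign(latent, centers)
        # The requested feature: optionally hand back the latent vectors too.
        if return_latent:
            return labels, latent
        return labels


# Random stand-in data: 40 series, 10 time steps, 3 features.
X = np.random.default_rng(1).normal(size=(40, 10, 3))
model = ToyLatentClusterer(n_clusters=3, latent_dim=4)
labels, latent = model.cluster(X, return_latent=True)
print(labels.shape, latent.shape)
```

Keeping the default return value unchanged and adding an opt-in flag would avoid breaking existing callers of .cluster; a separate extraction method would work equally well.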