-
Notifications
You must be signed in to change notification settings - Fork 777
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix hierarchy viz and handle any form of distance matrix #1173
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR and the extensive docs, greatly appreciated! A few minor things to change before it will be merged (after CI has passed).
# Make sure its entries are non-negative | ||
if np.any(X < 0): | ||
raise ValueError("Distance matrix cannot contain negative values.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure whether this is necessary as this issue is handled and raised within Scipy
already. Perhaps remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the squareform
from SciPy doesn't raise error if the entries are non-negative.
X = np.array([1, -2, 3,])
squareform(X)
Output:
array([[ 0, 1, -2],
[ 1, 0, 3],
[-2, 3, 0]])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true but the linkage function will give the error but it's alright to keep it here.
I just pushed the changes you requested. I don't know why the test is failing. Let me know if I can help! |
The workflow is failing due to an issue with the testing pipeline. It automatically pulls pandas 2.0 which has a slight change that breaks some specific tests, unfortunately. Having said that, if it runs successfully locally for you, then I think it would be alright to merge it. |
Great, it runs successfully on my computer. |
Awesome, thanks for the work on this PR! It is highly appreciated. |
You are welcome! and thank you for this nice library! |
This fixes #923, fixes #1063 , and fixes #1021 which are just duplicates of the first one.
I added a utility function to make sure the distance matrix computed by a given distance_function is a condensed 1-D array. If not, the function convert it and return the result. It can be used directly as I did in
_bertopic.pyL877
:or to wrap around a function as I did in
_hierarchy.py#L132-L133
:I also added this information in the docstring to let the user know what is possible. I didn't find things to changes in the tutorials.
I added a testing notebook to test the code provided in the mentioned issues. (to be deleted after reviewing the PR)
Let me know if you have any comments!