Fix hierarchical_topics(...) when the distances between three clusters are the same #1929
Merged
MaartenGr merged 9 commits into MaartenGr:master from azikoss:fix-topic-hierarchy-issue-1907 on Jun 13, 2024
Commits (9):
58761df Adds functionality that makes sure that the distances between cluster… (azikoss)
8eadf3e Change the default noise values (azikoss)
3f027e9 Making sure that the values are always preserved in sorted order (azikoss)
a4bcff8 Fixing the unique distances - deleting the addition. (azikoss)
37fedac Simplified the code (azikoss)
149378c Typing fix (azikoss)
62ead9a fixing the docstring (azikoss)
7b0eb2d Merge remote-tracking branch 'upstream/master' into fix-topic-hierarc… (azikoss)
e9e0ee9 Merge branch 'master' into fix-topic-hierarchy-issue-1907 (azikoss)
Have you checked the actual values of the updated distance list? When I run it, I get the following updated values:
The last value is twice as big, which should not happen. I have a feeling the code for get_unique_distances could be simplified a bit. What about simply doing something like this:
This is a nice simplification.
Are we OK with changing distances that do not have a duplicate? E.g., check_dists([0, 0, 0, 0, 0, 0, 0, 1e-7], noise_max=1e-7) changes the last value, because otherwise the distances would not be in increasing order.
I had a bug in the code (it should assign, not add); that's why the last value was 2.00000008e+00.
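A minimal sketch of the trade-off described in this comment, not the code from this PR: the helper name make_strictly_increasing and its step parameter are hypothetical, and the nudge is deterministic here rather than random noise. It shows why a run of duplicates can force a later, non-duplicate value to change if the result has to stay strictly increasing.

```python
import numpy as np


def make_strictly_increasing(dists, step=1e-7):
    """Hypothetical helper: nudge a sorted array so that no two values are equal."""
    out = np.asarray(dists, dtype=float).copy()
    for i in range(1, out.shape[0]):
        if out[i] <= out[i - 1]:
            # Assign a value just above the previous one. Adding the previous
            # value onto the current one instead would inflate the result.
            out[i] = out[i - 1] + step
    return out


dists = [0, 0, 0, 0, 0, 0, 0, 1e-7]
result = make_strictly_increasing(dists, step=1e-7)
# The seven zeros are spread out in steps of 1e-7, so the final 1e-7 is no
# longer larger than its predecessor and has to be moved as well.
print(result)
```

With a smaller step, the nudged duplicates would stay below the last value and it could be left untouched; values that are already strictly increasing pass through unchanged.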
Hmmm, my preference would indeed be to keep them as is as long as it requires no more than one or two lines of code. I would like to simplify this as much as possible.
I simplified the code. Please have a look and let me know if you have any ideas.
Thanks for the changes! I just tested it a bunch of times and it all looks good to me. Thanks for simplifying the code. I'll re-run the workflow to check whether everything passes. If it does, I will go ahead and merge the PR.
The tests failed, I believe because you used list[float], which is not supported in Python 3.8. Removing that should make the tests pass, I think.
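For context, subscripting the built-in list type in an annotation only works at runtime from Python 3.9 onward. The comment suggests removing the annotation; one alternative, sketched here with a hypothetical function name, is to use typing.List, which also works on Python 3.8.

```python
from typing import List


def as_float_list(values) -> List[float]:
    """Hypothetical example: typing.List[float] is valid on Python 3.8,
    whereas list[float] raises a TypeError there when the annotation is evaluated."""
    return [float(v) for v in values]


converted = as_float_list([1, "2.5", 3.0])
```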
Ah, yeah, you are right! I just changed it! Thank you!