Enable HDBSCAN GPU training and CPU inference #6108
@@ -526,7 +526,15 @@ dependencies:
- statsmodels
- umap-learn==0.5.6
- pynndescent
- setuptools # Needed on Python 3.12 for dask-glm, which requires pkg_resources but Python 3.12 doesn't have setuptools by default
`dask-glm` was removed by PR #6028, so this is now unnecessary.
Looks great to me. I had a question about a comment, but that's all.
# These attributes have to be reassigned to the CPU model
# as the raw arrays because the reference HDBSCAN implementation
# reconstructs the objects from the raw arrays
This happens in the setters in the hdbscan library, right?
Essentially it happens in the getters. Here's the issue, taking the `CondensedTree` object as an example:
- We use the setter to assign a `CondensedTree` object to `self.condensed_tree_` when transferring from cuML to hdbscan.
- The getter for `self.condensed_tree_` checks whether it already has a value. If it does, it assumes the value is raw numpy arrays and creates another `CondensedTree` object without any value sanitization.

That's why I re-assigned the raw arrays, so that when the hdbscan library internally calls the getters, it reconstructs the object correctly.
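The getter behaviour described above can be illustrated with a toy sketch. This is not the hdbscan source: `ToyTree`, `ToyModel`, `tree_`, and `n_rows` are hypothetical names, and a plain list stands in for the raw numpy arrays. It only shows why assigning an already-built object breaks when the getter wraps whatever is stored.

```python
class ToyTree:
    """Hypothetical stand-in for a wrapper object such as CondensedTree."""

    def __init__(self, raw):
        # No sanitization: whatever is passed in is kept as the raw data.
        self.raw = raw

    def n_rows(self):
        # Only works when `raw` really is a sequence of rows.
        return len(self.raw)


class ToyModel:
    """Hypothetical model whose getter rebuilds the wrapper on each access."""

    def __init__(self):
        self._tree = None

    @property
    def tree_(self):
        if self._tree is None:
            raise AttributeError("model has not been fitted")
        # Assumes the stored value is raw data and wraps it again.
        return ToyTree(self._tree)

    @tree_.setter
    def tree_(self, value):
        self._tree = value


raw = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]

wrong = ToyModel()
wrong.tree_ = ToyTree(raw)  # assigning an already-built object
try:
    wrong.tree_.n_rows()    # getter produced a ToyTree wrapping a ToyTree
    double_wrap_failed = False
except TypeError:
    double_wrap_failed = True

right = ToyModel()
right.tree_ = raw           # assigning the raw data instead
rows = right.tree_.n_rows()
```

In the sketch, the first assignment fails on use because the getter wraps a wrapper, while assigning the raw data lets the getter rebuild a usable object.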
/merge
Until now, we supported all combinations of GPU/CPU interoperability except the one mentioned in the title. This was because the CPU HDBSCAN package was missing attribute setters. With scikit-learn-contrib/hdbscan#657, attribute setters are now available, which allows us to transfer GPU-trained attributes to the CPU model. This feature is available as part of `hdbscan=0.8.39`.