Error with high min_cluster_size - struct.error 'i' format requires.... #250
conda env pieces (let me know if you need more):
- scikit-learn 0.20.0 py36h4989274_1
- hdbscan 0.8.18 py36h7eb728f_0 conda-forge
- python 3.6.6 h5001a0f_3 conda-forge
My quick workaround is that you need to set ``min_samples`` to something explicit: with it left as ``None`` it will default to ``min_samples=min_cluster_size``, which means the algorithm will internally be hunting for the 5000 nearest neighbors of every point. That may be a little expensive, and is almost undoubtedly associated with this error.
That being said, it still should not be erroring like this. I don't know quite what has gone wrong, but it seems to be in the distribution of the nearest neighbor search stage. It is possible that setting ``core_dist_n_jobs=1`` may resolve the issue, but I honestly can't say. I will try to look into this when I get some time, but I can't promise a swift resolution beyond the workarounds offered here.
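For what it's worth, this flavor of ``struct.error`` typically surfaces when something larger than about 2 GiB is serialized through an interface whose length header is a signed 32-bit ``'i'`` field, as some multiprocessing/pickle code paths do when shipping data to worker processes; that would fit the parallel nearest neighbor stage being the culprit, though this is an assumption rather than a diagnosis. A minimal stdlib sketch of the limit:

```python
import struct

# A signed 32-bit 'i' field caps out at 2**31 - 1 (~2 GiB when used as
# a byte-length header); one past that raises the same error class seen
# in the traceback.
struct.pack("i", 2**31 - 1)  # fine
try:
    struct.pack("i", 2**31)
    message = None
except struct.error as exc:
    message = str(exc)
print(message)
# → 'i' format requires -2147483648 <= number <= 2147483647
```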
On Wed, Nov 7, 2018 at 4:20 PM Sarah Bird wrote:
conda env:
- scikit-learn 0.20.0 py36h4989274_1
- hdbscan 0.8.18 py36h7eb728f_0 conda-forge
- python 3.6.6 h5001a0f_3 conda-forge
No stress. I definitely am not going to use values in this range; I was just curious, and the error felt like maybe it could be reported more clearly, so I figured I'd post it. Thanks so much for the swift reply. Keep up the great work @lmcinnes. I am using hdbscan and umap extensively.
I was working with data that has ~100,000 rows and 2 columns. I was exploring increasing min_cluster_size to high values to watch the effect. At min_cluster_size=5000 I got the following error, which surprised me somewhat. I'm really not sure what's happening, or whether this is even an issue I should be reporting to HDBSCAN, so feel free to close if it doesn't look relevant.
Here are the plots for N=3000 and N=4000 [plots attached in the original issue]; that large cluster is very big, so I was expecting this to work. min_samples was set to None.