Is Random Seed Support Available for Reproducibility in cuML HDBSCAN? #6121

MohabGhobashy · 2024-10-23T14:27:27Z

I am using the cuML implementation of HDBSCAN for clustering and would like to ensure reproducibility across multiple runs. Is there currently any support for setting a random seed (e.g., via a random_state parameter) in the HDBSCAN algorithm to make the results deterministic?

If not, is there any plan to introduce such a feature in future releases?

divyegala · 2024-11-05T16:44:27Z

@MohabGhobashy Could you please explain your use case? How much do your results differ across runs? Please also provide a minimal reproducer if you have one.

In general, it's hard to provide exact reproducibility in highly parallel environments.
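For anyone wanting to quantify how much the results differ across runs, here is a minimal sketch, not an official cuML utility. The helper names (`same_partition`, `rand_index`, `compare_runs`) are hypothetical and exist only for illustration. It assumes cluster labels are plain integers with `-1` marking noise, and it treats noise as one ordinary group when computing the Rand index, which is a simplification. Cluster IDs can be permuted between runs without the clustering itself changing, so the comparison ignores the ID values.

```python
from collections import Counter


def same_partition(labels_a, labels_b):
    """Return True if both labelings group the points identically.

    Cluster IDs may differ between runs; only the grouping matters.
    Noise points (label -1) must be noise in both labelings.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    a_to_b, b_to_a = {}, {}
    for a, b in zip(labels_a, labels_b):
        if (a == -1) != (b == -1):
            return False
        # Require a consistent one-to-one mapping between cluster IDs.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which the two labelings agree.

    A pair agrees if it is grouped together in both labelings or
    separated in both. Noise (-1) is treated as one ordinary group.
    """
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("labelings must have the same length")
    if n < 2:
        return 1.0

    def pairs(counts):
        return sum(c * (c - 1) // 2 for c in counts)

    total = n * (n - 1) // 2
    joint = pairs(Counter(zip(labels_a, labels_b)).values())
    in_a = pairs(Counter(labels_a).values())
    in_b = pairs(Counter(labels_b).values())
    return (total + 2 * joint - in_a - in_b) / total


def compare_runs(cluster_fn, data, n_runs=3):
    """Run cluster_fn(data) n_runs times and compare each run to the first.

    cluster_fn is any callable returning one integer label per point.
    Returns a list of (same_partition, rand_index) tuples.
    """
    runs = [list(cluster_fn(data)) for _ in range(n_runs)]
    baseline = runs[0]
    return [(same_partition(baseline, r), rand_index(baseline, r)) for r in runs[1:]]


# Possible usage with cuML (requires a GPU; X is assumed to be a NumPy array,
# and min_cluster_size=15 is an arbitrary illustrative value):
#
#   from cuml.cluster import HDBSCAN
#   report = compare_runs(
#       lambda X: HDBSCAN(min_cluster_size=15).fit_predict(X).tolist(), X
#   )
```

A report in which every entry is `(True, 1.0)` means the runs produced the same grouping. Anything else gives a concrete number to include alongside a reproducer.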

MohabGhobashy added the "? - Needs Triage" (Need team to review and classify) and "question" (Further information is requested) labels on Oct 23, 2024
