Fix UMAP issues with large inputs #6245

Merged
merged 33 commits into from
Feb 13, 2025

viclafargue
Contributor

Answers #6204

Contributor

@wphicks wphicks left a comment

Looking good! Let's just make sure to IWYU for the new uses of uint64_t. Using C++ types (std::uint64_t) in our non-CUDA code would be a bonus, but it shouldn't block merge. I've also called out some spots where we could use uniform initialization syntax rather than a bare cast.
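
To illustrate the two review points above (including the header that declares the type you use, and preferring brace initialization to a bare cast), here is a minimal C++ sketch. The function `element_count` and its parameters are hypothetical and are not taken from the PR.

```cpp
#include <cstdint>  // include what you use: this header declares std::uint64_t

// Hypothetical helper (not from the PR): number of elements in an
// n_rows x n_cols matrix, computed in 64 bits.
inline std::uint64_t element_count(std::uint32_t n_rows, std::uint32_t n_cols)
{
  // Brace initialization widens each operand before the multiply.
  // Unlike a C-style cast such as (uint64_t)n_rows, it is rejected by the
  // compiler when the conversion would be narrowing.
  return std::uint64_t{n_rows} * std::uint64_t{n_cols};
}
```

With 100,000 rows and 100,000 columns the product no longer fits in 32 bits, which is the kind of large-input case the wider type is meant to cover.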

copy-pr-bot bot commented Jan 23, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@wphicks
Contributor

wphicks commented Jan 23, 2025

@viclafargue That last commit was unsigned. Could you sign it and push that up?

Member

@cjnolet cjnolet left a comment

@viclafargue and I discussed this briefly last week, but given the nature of the 64-bit hardcoded changes here, I would like to see at least a small benchmark before this is merged so that we can feel comfortable that this doesn’t have a huge impact on the runtime.

@divyegala divyegala requested a review from a team as a code owner February 3, 2025 18:42
@divyegala divyegala requested a review from vyasr February 3, 2025 18:42
@github-actions github-actions bot added the CMake label Feb 3, 2025
@divyegala divyegala changed the base branch from branch-25.02 to branch-25.04 February 3, 2025 18:43
@divyegala divyegala requested a review from a team as a code owner February 3, 2025 18:43
Member

@cjnolet cjnolet left a comment

Same issues as raft, but this is hardcoding types in perf-critical code. Please use templates; they allow us to switch quickly.
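
A minimal sketch of what templating on the index type could look like, as opposed to hardcoding it. The name `flat_offset` and the template parameter `idx_t` are illustrative assumptions, not identifiers from cuML or raft.

```cpp
#include <cstdint>

// Hypothetical routine (not from the PR): row-major offset of element
// (row, col). The index type is a template parameter, so callers choose the
// width instead of the function hardcoding std::uint64_t.
template <typename idx_t>
constexpr idx_t flat_offset(idx_t row, idx_t col, idx_t n_cols)
{
  return row * n_cols + col;
}
```

Switching widths is then a one-token change at the call site, for example `flat_offset<std::uint32_t>(...)` versus `flat_offset<std::uint64_t>(...)`.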

@github-actions github-actions bot removed the conda, Cython / Python, and CMake labels Feb 7, 2025
@wphicks
Contributor

wphicks commented Feb 7, 2025

/ok to test

@cjnolet
Member

cjnolet commented Feb 7, 2025

/ok to test

@viclafargue
Contributor Author

This benchmark was run a while back, just before the last changes. It demonstrates that there does not seem to be a performance drop when switching to uint64_t. However, it could still be preferable to implement a dispatching mechanism that would store the indices on 32 bits below a certain number of rows. To prevent any delay in merging this PR, I propose opening a separate PR based on this one to handle this properly.

UMAP_branch-25.04_bench.csv
UMAP_53d276c_bench.csv
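
The dispatching mechanism proposed in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the use of `n_rows * n_neighbors` as the size measure, and the `int32` maximum as the threshold are all hypothetical and not part of this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a routine templated on the index type. Here it
// only reports how many bytes the index array would occupy.
template <typename nnz_t>
std::size_t index_bytes(std::uint64_t n_rows, std::uint64_t n_neighbors)
{
  return static_cast<std::size_t>(n_rows * n_neighbors) * sizeof(nnz_t);
}

// Assumed threshold: 32-bit indices are enough when n_rows * n_neighbors
// does not exceed the largest int32 value. The division avoids overflowing
// the product while checking.
inline bool fits_in_32_bits(std::uint64_t n_rows, std::uint64_t n_neighbors)
{
  constexpr auto limit =
    static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
  return n_neighbors == 0 || n_rows <= limit / n_neighbors;
}

// Runtime dispatch: choose the template instantiation from the input size.
inline std::size_t dispatch_index_bytes(std::uint64_t n_rows,
                                        std::uint64_t n_neighbors)
{
  if (fits_in_32_bits(n_rows, n_neighbors)) {
    return index_bytes<std::int32_t>(n_rows, n_neighbors);
  }
  return index_bytes<std::uint64_t>(n_rows, n_neighbors);
}
```

Small inputs keep the narrower index type and its lower memory footprint, while inputs past the threshold fall back to 64-bit indices.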

@divyegala
Member

/ok to test

@jcrist
Member

jcrist commented Feb 12, 2025

/ok to test

@dantegd
Member

dantegd commented Feb 13, 2025

/ok to test

Member

@cjnolet cjnolet left a comment

Approving for now, but can you please create an issue to follow up on the hardcoding of uint64_t in umap.cu? It would be nice for us to figure out a good strategy for deciding whether to use uint64_t or int based on the dataset size, rather than hardcoding everywhere. cc @dantegd

@dantegd
Member

dantegd commented Feb 13, 2025

/merge

@rapids-bot rapids-bot bot merged commit 9c0166a into rapidsai:branch-25.04 Feb 13, 2025
71 checks passed
@viclafargue
Contributor Author

Here is the issue #6310

Labels
bug (Something isn't working), CUDA/C++, non-breaking (Non-breaking change)
7 participants