Avoid large device allocation in UMAP with nndescent #6292
Merged
Currently `NNDescent` returns two arrays:

- `graph.graph()`: (n x graph_degree) on host
- `graph.distances()`: (n x graph_degree) on device

Downstream, the rest of UMAP wants both of these to be device arrays of shape (n x n_neighbors).
Currently we copy `graph.graph()` to a temporary device array, then slice and copy it to the output array `out.knn_indices`.
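For context, the pre-change flow looks roughly like the sketch below. This is not the actual cuML code: the function name is made up, the indices are assumed to be `int64_t`, and raw CUDA runtime calls stand in for whatever RAFT/RMM utilities the real implementation uses.

```cpp
#include <cuda_runtime.h>
#include <cstdint>
#include <cstddef>

// Pre-change flow (sketch): copy the full (n x graph_degree) host graph into a
// temporary device buffer, then slice the first n_neighbors columns of each row
// into the (n x n_neighbors) output. The temporary buffer is the allocation
// this PR removes.
void copy_then_slice_on_device(const std::int64_t* host_graph,  // (n x graph_degree), host
                               std::int64_t* knn_indices,       // (n x n_neighbors), device
                               std::size_t n,
                               std::size_t graph_degree,
                               std::size_t n_neighbors)
{
  std::int64_t* tmp = nullptr;
  cudaMalloc(&tmp, n * graph_degree * sizeof(std::int64_t));
  cudaMemcpy(tmp, host_graph, n * graph_degree * sizeof(std::int64_t), cudaMemcpyHostToDevice);

  // Strided device-to-device copy keeps only the first n_neighbors columns of each row.
  cudaMemcpy2D(knn_indices, n_neighbors * sizeof(std::int64_t),  // dst, dst pitch
               tmp, graph_degree * sizeof(std::int64_t),         // src, src pitch
               n_neighbors * sizeof(std::int64_t), n,            // width (bytes), height (rows)
               cudaMemcpyDeviceToDevice);
  cudaFree(tmp);
}
```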
Ideally we'd force `graph_degree = n_neighbors` to avoid the slicing entirely (and reduce the size of the intermediate results). However, it seems there's currently a bug in `NNDescent` where reducing `graph_degree` to `n_neighbors` causes a significant decrease in result quality, so for now we need to keep the slicing around.

We can avoid allocating the temporary device array, though, by doing the slicing on host instead. This avoids allocating an (n x graph_degree) device array entirely; for large `n` this can be a significant savings (47 GiB on one test problem I was trying).
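A minimal sketch of the host-side slicing, under the same assumptions as above (hypothetical function name, `int64_t` indices, raw CUDA runtime calls rather than the real implementation): the slice is built in a host buffer and then copied to the device output in one contiguous transfer, so the only device memory touched is the (n x n_neighbors) output that UMAP needs anyway.

```cpp
#include <cuda_runtime.h>
#include <algorithm>
#include <cstdint>
#include <cstddef>
#include <vector>

// Post-change flow (sketch): slice on host, then do a single host-to-device copy.
// No (n x graph_degree) device buffer is ever allocated.
void slice_on_host_then_copy(const std::int64_t* host_graph,  // (n x graph_degree), host
                             std::int64_t* knn_indices,       // (n x n_neighbors), device
                             std::size_t n,
                             std::size_t graph_degree,
                             std::size_t n_neighbors)
{
  // Host-side slice: keep only the first n_neighbors columns of each row.
  std::vector<std::int64_t> sliced(n * n_neighbors);
  for (std::size_t i = 0; i < n; ++i) {
    const std::int64_t* row = host_graph + i * graph_degree;
    std::copy(row, row + n_neighbors, sliced.data() + i * n_neighbors);
  }

  // One contiguous host-to-device copy of the already-sliced data.
  cudaMemcpy(knn_indices, sliced.data(), n * n_neighbors * sizeof(std::int64_t),
             cudaMemcpyHostToDevice);
}
```

The extra host buffer here is only n x n_neighbors elements, which is smaller than the (n x graph_degree) device allocation being avoided.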
We still should fix the `graph_degree` issue, but for now this should help unblock running UMAP on very large datasets.