Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CAGRA graph degree manipulation #1472

Open
enp1s0 opened this issue Apr 28, 2023 · 1 comment
Open

CAGRA graph degree manipulation #1472

enp1s0 opened this issue Apr 28, 2023 · 1 comment
Assignees
Labels
enhancement New feature or request Vector Search

Comments

@enp1s0
Copy link
Member

enp1s0 commented Apr 28, 2023

As mentioned in #1435 (comment), it would be useful to have a functionality to reduce the degree of a pre-existing graph. In the CAGRA graph, we can decrease the degree by removing nodes from the neighbor list starting from the end. This process results in a graph equivalent to the one initially constructed with the given degree.

Implementation example

  1. Add a member function to cagra::index structure that replaces the graph.
template <typename T, typename IdxT>
struct index : ann::index {
// ...
  void update_graph(
    raft::device_matrix<IdxT, IdxT, row_major> graph
    ) {graph_ = graph;};

 private:
  raft::device_matrix<IdxT, IdxT, row_major> graph_;
};
  1. Add a function to update the degree of the graph in a given index (replace the graph in the index).
template <typename DataT, typename IdxT>
void update_graph_degree(
    raft::device_resources& res, // [in]
    index<DataT, IdxT>& index, // [in,out]
    const uint32_t new_degree // [in] (must be smaller than index.graph_degree())
) {
  auto old_graph = index.graph();
  auto new_graph = make_device_matrix(...);
  // make a new graph from the old graph
  index.update_graph(new_graph);
}

Alternative approach 1 (Add a member function that updates the degree and replaces the graph)

template <typename T, typename IdxT>
struct index : ann::index {
// ...
  void update_graph_degree(
    raft::device_resources& res, // [in]
    const uint32_t new_degree // [in] (must be smaller than index.graph_degree())
    );

 private:
  raft::device_matrix<IdxT, IdxT, row_major> graph_;
};
  • Less flexible than the example above.
  • Lack of uniformity concerning the current API. (the cagra::prune function is not a member function)

Alternative approach 2 (Add a function that creates a new index)

template <typename DataT, typename IdxT>
index<DataT, IdxT> update_graph_degree(
    raft::device_resources& res, // [in]
    const index<DataT, IdxT>& index, // [in]
    const uint32_t new_degree // [in] (must be smaller than index.graph_degree())
);
  • This approach makes a copy of the dataset and can take up much memory space.

Concern

The feature of decreasing the degree by removing nodes from the neighbor list starting from the end is only applicable to CAGRA or sorted kNN graphs and not to general graphs. However, the CAGRA search engine is available for general graphs, and the graph in an index can be general. In such cases, the update_graph_degree function should not be used but can be called, which would result in a misuse of the API.

Do you have any comments or suggestions, @tfeher @anaruse ?
If there are no issues with the approach I have proposed as an example, I will implement it.

@enp1s0 enp1s0 self-assigned this Apr 28, 2023
@enp1s0 enp1s0 added the enhancement New feature or request label Apr 28, 2023
@tfeher
Copy link
Contributor

tfeher commented May 1, 2023

Thanks @enp1s0 for describing various alternatives! I do not have a strong preference here. We could go with the variant that you think are the most useful.

Please keep in mind that we want to address issue #1479 soon, which means we would have a method of index construction that just stores a reference of the data (if it is already on device). This way we can avoid unnecessary copies, and index could be constructed from device array without large overhead. But even in that case it could be good to have explicit method to update the graph.

Considering the options which take new_graph_degree scalar parameter: ideally the graph degree could be decoupled from the stride of the neighborhood matrix (e.g. blas like ld param for the kernels). In that case we would not need to create a new copy, just update the stride param. But currently it is fine to go with the simpler option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Vector Search
Projects
None yet
Development

No branches or pull requests

3 participants