Approximate Nearest Neighbors #2780

Merged
30 commits
4124e5a
Multiple KNN strategies (implementing PQ)
viclafargue Sep 1, 2020
132acab
Multiple improvements
viclafargue Sep 2, 2020
fb82831
Adding nprobe parameter
viclafargue Sep 3, 2020
729b3a4
Adding support for GpuIndexIVFFlat and GpuIndexIVFScalarQuantizer
viclafargue Sep 4, 2020
2fa57fa
Completing documentation
viclafargue Sep 4, 2020
5c44a18
Small fixes
viclafargue Sep 7, 2020
8a46e42
Merge branch 'branch-0.16' into fea-multiple-knn-strategies
viclafargue Sep 7, 2020
9c05d87
Adding test
viclafargue Sep 8, 2020
fa7d004
Improving tests
viclafargue Sep 8, 2020
2c66690
Corrections & improvements
viclafargue Sep 9, 2020
51362a7
Check style
viclafargue Sep 9, 2020
5837cc8
Update changelog
viclafargue Sep 9, 2020
49bb435
Adding include
viclafargue Sep 9, 2020
0d3cfe1
Merge branch-0.16
viclafargue Sep 24, 2020
f39cc21
First part of requested changes
viclafargue Sep 24, 2020
1a3af42
ANN parameters creation in separate file
viclafargue Sep 25, 2020
636ce58
Updating ANN methods documentation
viclafargue Sep 25, 2020
71b84f9
Merge branch-0.17
viclafargue Oct 30, 2020
bbff2c2
update related to raft
viclafargue Nov 2, 2020
1ef5fd5
Merge branch 'branch-0.17' into fea-multiple-knn-strategies
viclafargue Nov 2, 2020
1bb95df
Update changelog
viclafargue Nov 2, 2020
5eb0060
Merge branch-0.18
viclafargue Dec 8, 2020
046a127
Automated parameter determination to Python code
viclafargue Dec 8, 2020
c60d54c
Update changelog according to PR name
viclafargue Dec 8, 2020
96bddef
Lower values for ivfpq testing + testing trim down
viclafargue Dec 9, 2020
8775a42
Merge branch 'branch-0.18' into fea-multiple-knn-strategies
viclafargue Dec 10, 2020
6a97101
Update ivfpq test
viclafargue Dec 10, 2020
7e8ee31
Force index memory release in tests
viclafargue Dec 11, 2020
937e799
Merge branch 'branch-0.18' into fea-multiple-knn-strategies
viclafargue Dec 11, 2020
902849d
Removing changelog update
cjnolet Dec 11, 2020
88 changes: 70 additions & 18 deletions cpp/include/cuml/neighbors/knn.hpp
@@ -36,36 +36,88 @@ enum MetricType {
  METRIC_Correlation
};


typedef enum {
  Bruteforce,
  PQ
} knnIndexType;


struct knnIndexParam {
  knnIndexType type;
  bool automated;

  size_t nlist;
  size_t M;
  size_t n_bits;
  bool usePrecomputedTables;
};
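
To make the roles of `M` and `n_bits` concrete, here is a minimal host-side sketch. It assumes the usual product quantization layout (as in FAISS), where each vector is split into `M` sub-vectors and each sub-vector is encoded with `n_bits` bits, so the dimensionality must be divisible by `M`. The struct is a local copy of the one in this diff; `pq_code_size_bytes`, `pq_dims_compatible` and `make_example_pq_params` are hypothetical helpers, not part of cuML, and the field meanings in the comments are assumptions.

```cpp
#include <cstddef>

// Local mirror of the declarations in this diff, so the sketch compiles
// without cuML. Field meanings in the comments are assumptions.
typedef enum { Bruteforce, PQ } knnIndexType;

struct knnIndexParam {
  knnIndexType type;
  bool automated;         // assumed: let the library choose the values below
  std::size_t nlist;      // assumed: number of inverted lists (coarse clusters)
  std::size_t M;          // number of sub-quantizers
  std::size_t n_bits;     // bits per sub-quantizer code
  bool usePrecomputedTables;
};

// Hypothetical helper: bytes needed to store one PQ-encoded vector.
inline std::size_t pq_code_size_bytes(const knnIndexParam &p) {
  return (p.M * p.n_bits + 7) / 8;  // round up to whole bytes
}

// Hypothetical helper: standard PQ needs D to split evenly into M sub-vectors.
inline bool pq_dims_compatible(std::size_t D, const knnIndexParam &p) {
  return p.M != 0 && D % p.M == 0;
}

// Hypothetical helper: example settings only, not recommended defaults.
inline knnIndexParam make_example_pq_params() {
  knnIndexParam p;
  p.type = PQ;
  p.automated = false;
  p.nlist = 1024;
  p.M = 8;
  p.n_bits = 8;
  p.usePrecomputedTables = false;
  return p;
}
```

With `M = 8` and `n_bits = 8`, a 64-dimensional float vector (256 bytes) would be stored as 8 bytes of codes under these assumptions.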


/**
 * @brief Flat C++ API function to perform a brute force knn on
 * a series of input arrays and combine the results into a single
 * output array for indexes and distances.
 *
 * @param[in] handle the cuml handle to use
 * @param[in] input vector of pointers to the input arrays
 * @param[in] sizes vector of sizes of input arrays
 * @param[in] D the dimensionality of the arrays
 * @param[in] search_items array of items to search of dimensionality D
 * @param[in] n number of rows in search_items
 * @param[out] res_I the resulting index array of size n * k
 * @param[out] res_D the resulting distance array of size n * k
 * @param[in] k the number of nearest neighbors to return
 * @param[in] rowMajorIndex are the index arrays in row-major order?
 * @param[in] rowMajorQuery are the query arrays in row-major order?
 * @param[in] metric distance metric to use. Euclidean (L2) is used by
 * default
 * @param[in] metric_arg the value of `p` for Minkowski (l-p) distances. This
 * is ignored if `metric` is not Minkowski.
 * @param[in] expanded should lp-based distances be returned in their expanded
 * form (e.g., without raising to the 1/p power).
 */
void brute_force_knn(cumlHandle &handle, std::vector<float *> &input,
                     std::vector<int> &sizes, int D, float *search_items, int n,
                     int64_t *res_I, float *res_D, int k,
                     bool rowMajorIndex = false, bool rowMajorQuery = false,
                     MetricType metric = MetricType::METRIC_L2,
                     float metric_arg = 2.0f, bool expanded = false);
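
The `metric_arg` and `expanded` parameters are easier to read next to a small reference. The sketch below is a host-side illustration of a Minkowski (l-p) distance with and without the final 1/p root, following the wording of the doc comment. `minkowski_distance` is a hypothetical name and this is not the GPU code path.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Host-side reference for a Minkowski (l-p) distance.
// When `expanded` is true, the final 1/p root is skipped.
inline float minkowski_distance(const std::vector<float> &a,
                                const std::vector<float> &b, float p,
                                bool expanded) {
  assert(a.size() == b.size());
  float acc = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    acc += std::pow(std::fabs(a[i] - b[i]), p);
  }
  return expanded ? acc : std::pow(acc, 1.0f / p);
}
```

For `p = 2` and the points (0, 0) and (3, 4), the non-expanded distance is 5 and the expanded one is 25. The root is monotonic, so both forms rank neighbors identically.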


/**
 * @brief Flat C++ API function to perform a knn search, using the
 * strategy selected in `params` (brute force or product quantization),
 * on a series of input arrays and combine the results into a single
 * output array for indexes and distances.
 *
 * @param[in] handle the cuml handle to use
 * @param[in] params the parameters for the chosen KNN strategy
 * @param[in] input vector of pointers to the input arrays
 * @param[in] sizes vector of sizes of input arrays
 * @param[in] D the dimensionality of the arrays
 * @param[in] search_items array of items to search of dimensionality D
 * @param[in] n number of rows in search_items
 * @param[out] res_I the resulting index array of size n * k
 * @param[out] res_D the resulting distance array of size n * k
 * @param[in] k the number of nearest neighbors to return
 * @param[in] rowMajorIndex are the index arrays in row-major order?
 * @param[in] rowMajorQuery are the query arrays in row-major order?
 * @param[in] metric distance metric to use. Euclidean (L2) is used by
 * default
 * @param[in] metric_arg the value of `p` for Minkowski (l-p) distances. This
 * is ignored if `metric` is not Minkowski.
 * @param[in] expanded should lp-based distances be returned in their expanded
 * form (e.g., without raising to the 1/p power).
 */
void perform_knn(cumlHandle &handle, ML::knnIndexParam* params,
                 std::vector<float *> &input, std::vector<int> &sizes,
                 int D, float *search_items, int n,
                 int64_t *res_I, float *res_D, int k,
                 bool rowMajorIndex = false, bool rowMajorQuery = false,
                 MetricType metric = MetricType::METRIC_L2,
                 float metric_arg = 2.0f, bool expanded = false);
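
The doc comment describes results from several input arrays being combined into single `n * k` index and distance arrays. The following host-side sketch illustrates that layout with a plain squared-L2 brute-force search. It is not the GPU implementation: `reference_knn` is a hypothetical name, its signature is simplified (const pointers, no handle or metric arguments), and it assumes that returned ids are global row numbers, with each partition offset by the sizes of the partitions before it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Host-side reference, NOT the GPU implementation: squared-L2 brute-force
// search over several row-major partitions, merged into one n * k result.
// Assumption: neighbor ids are global row numbers, i.e. each partition's
// local row index is offset by the total size of the partitions before it.
inline void reference_knn(const std::vector<const float *> &input,
                          const std::vector<int> &sizes, int D,
                          const float *search_items, int n, int64_t *res_I,
                          float *res_D, int k) {
  assert(input.size() == sizes.size());
  for (int q = 0; q < n; ++q) {
    std::vector<std::pair<float, int64_t>> cand;  // (distance, global id)
    int64_t offset = 0;
    for (std::size_t part = 0; part < input.size(); ++part) {
      for (int r = 0; r < sizes[part]; ++r) {
        float dist = 0.0f;
        for (int d = 0; d < D; ++d) {
          float diff = input[part][r * D + d] - search_items[q * D + d];
          dist += diff * diff;
        }
        cand.emplace_back(dist, offset + r);
      }
      offset += sizes[part];
    }
    assert(static_cast<int>(cand.size()) >= k);
    // Keep the k smallest distances, sorted in ascending order.
    std::partial_sort(cand.begin(), cand.begin() + k, cand.end());
    for (int j = 0; j < k; ++j) {
      res_D[q * k + j] = cand[j].first;  // row q, column j
      res_I[q * k + j] = cand[j].second;
    }
  }
}
```

Row `q` of `res_I` and `res_D` holds the `k` neighbors of query `q`, nearest first, which matches the `n * k` sizes given for the output parameters.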


/**
 * @brief Flat C++ API function to perform a knn classification using a
 * given vector of label arrays. This supports multilabel classification
18 changes: 18 additions & 0 deletions cpp/src/knn/knn.cu
@@ -49,6 +49,24 @@ void brute_force_knn(cumlHandle &handle, std::vector<float *> &input,
    rowMajorQuery, nullptr, metric, metric_arg, expanded);
}

void perform_knn(cumlHandle &handle, knnIndexParam* algo_params,
                 std::vector<float *> &input, std::vector<int> &sizes,
                 int D, float *search_items, int n,
                 int64_t *res_I, float *res_D, int k,
                 bool rowMajorIndex, bool rowMajorQuery,
                 MetricType metric, float metric_arg, bool expanded) {
  ASSERT(input.size() == sizes.size(),
         "input and sizes vectors must be the same size");

  std::vector<cudaStream_t> int_streams = handle.getImpl().getInternalStreams();

  MLCommon::Selection::perform_knn(
    algo_params, input, sizes, D, search_items, n, res_I, res_D, k,
    handle.getImpl().getDeviceAllocator(), handle.getImpl().getStream(),
    int_streams.data(), handle.getImpl().getNumInternalStreams(), rowMajorIndex,
    rowMajorQuery, nullptr, metric, metric_arg, expanded);
}
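
The only precondition checked before forwarding is that `input` and `sizes` have the same length. A caller could run a slightly stricter check on the host first. The sketch below is one possible version: `validate_partitions` is a hypothetical helper, and the null-pointer and positive-size checks are additions that do not appear in this diff.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical host-side check mirroring the ASSERT above, extended with
// null-pointer and size checks. Returns the total number of index rows.
inline std::size_t validate_partitions(const std::vector<float *> &input,
                                       const std::vector<int> &sizes) {
  if (input.size() != sizes.size()) {
    throw std::invalid_argument(
      "input and sizes vectors must be the same size");
  }
  std::size_t total = 0;
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (input[i] == nullptr) {
      throw std::invalid_argument("null input partition");
    }
    if (sizes[i] <= 0) {
      throw std::invalid_argument("partition size must be positive");
    }
    total += static_cast<std::size_t>(sizes[i]);
  }
  return total;
}
```

Throwing an exception rather than asserting is a design choice for this sketch only; it lets the caller recover from a malformed partition list.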

void knn_classify(cumlHandle &handle, int *out, int64_t *knn_indices,
                  std::vector<int *> &y, size_t n_index_rows,
                  size_t n_query_rows, int k) {