Initial cut for a cuVS Java API #450

Open
wants to merge 43 commits into base: branch-25.02

@chatman chatman commented Nov 8, 2024

A Java API for cuVS for easy integration into Apache Lucene or other Java-based projects.

Try:

./build.sh libcuvs
./build.sh java

To generate docs, run mvn javadoc:javadoc

Prerequisites:

  • JDK 22
  • Maven 3.9.6+

Todo:

  • Generate project panama classes using jextract on every build
  • Algorithms other than CAGRA
  • Prefiltering in CAGRA

Co-authored-by: Vivek Narang <vivek@searchscale.com>
@chatman chatman requested review from a team as code owners November 8, 2024 14:05
@chatman chatman requested a review from msarahan November 8, 2024 14:05

copy-pr-bot bot commented Nov 8, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the CMake label Nov 8, 2024
@chatman (Author) commented Nov 8, 2024

FYI @cjnolet ^
An ExampleApp.java is added as a starting point for the review.

@cjnolet cjnolet added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Nov 8, 2024
@cjnolet cjnolet changed the base branch from branch-24.10 to branch-24.12 November 8, 2024 16:06
@cjnolet (Member) commented Nov 8, 2024

/ok to test

@chatman chatman changed the title from "[WIP] Initial cut for a cuVS Java API" to "Initial cut for a cuVS Java API" Nov 18, 2024
@chatman (Author) commented Nov 18, 2024

@narangvivek10 Let's move CuVSResources to the cuvs package instead of common. That way we can keep the Panama internals out of the users' sight.

@@ -1,5 +1,6 @@
## common
__pycache__
.gitignore
Member

Please don't list .gitignore inside .gitignore itself. We need to know when this file changes.

@@ -0,0 +1,7 @@
export CMAKE_PREFIX_PATH=`pwd`/../cpp/build
Member

Can we put this logic in the top-level build.sh please? We do that with all of the other language support.

@@ -0,0 +1,697 @@
/*
Member

I know we've discussed having this generated automatically on each build in a future iteration. Can you please create a cuVS GitHub issue to do that?

public void testIndexingAndSearchingFlow() throws Throwable {

// Sample data and query
float[][] dataset = {{ 0.74021935f, 0.9209938f }, { 0.03902049f, 0.9689629f }, { 0.92514056f, 0.4463501f }, { 0.6673192f, 0.10993068f }};
Member

Hardcoded test data makes problems hard to debug and troubleshoot, and it also makes the tests brittle: we've found that changes in compute architectures can introduce flakiness and outright failures in tests with hardcoded values. Please generate the data instead of hardcoding it, and use bfknn to validate the results.

We added a randomized test in my latest commit. We noticed that for topK > 64 the returned values are incorrect: the neighbor indices fall outside the valid range for the dataset size.
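
For reference, the kind of randomized check discussed in this thread could be sketched as follows. This is only an illustration under stated assumptions: the class and method names (BruteForceCheck, randomDataset, bruteForceTopK, allInRange) are hypothetical and not taken from the PR, the exact search runs on the CPU as a stand-in for bfknn, and no cuVS API is called. It generates seeded random data, computes exact nearest neighbors by squared L2 distance, and checks that every neighbor id lies inside the dataset bounds, which is the condition reported as failing for topK > 64.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

// Hypothetical helper for illustration; not part of the cuVS Java API.
final class BruteForceCheck {

    // Seeded generator so a failing run can be reproduced exactly.
    static float[][] randomDataset(int rows, int dims, long seed) {
        Random random = new Random(seed);
        float[][] data = new float[rows][dims];
        for (float[] row : data) {
            for (int d = 0; d < dims; d++) {
                row[d] = random.nextFloat();
            }
        }
        return data;
    }

    // Squared Euclidean distance; the ordering matches plain L2 distance.
    static double squaredL2(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Exact top-k neighbor ids for one query, nearest first (CPU reference).
    static int[] bruteForceTopK(float[][] dataset, float[] query, int k) {
        Integer[] ids = new Integer[dataset.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        Arrays.sort(ids, Comparator.comparingDouble((Integer i) -> squaredL2(dataset[i], query)));
        int[] topK = new int[Math.min(k, ids.length)];
        for (int i = 0; i < topK.length; i++) {
            topK[i] = ids[i];
        }
        return topK;
    }

    // True when every neighbor id is a valid row index of the dataset.
    static boolean allInRange(int[] neighbors, int datasetSize) {
        for (int id : neighbors) {
            if (id < 0 || id >= datasetSize) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        float[][] dataset = randomDataset(1000, 8, 42L);
        float[] query = randomDataset(1, 8, 7L)[0];
        int[] expected = bruteForceTopK(dataset, query, 100); // topK > 64 on purpose
        System.out.println("in range: " + allInRange(expected, dataset.length));
    }
}
```

In an actual test, the ids returned by the index under test would be compared with `expected` (with some recall tolerance for approximate search), and `allInRange` would be asserted on the returned ids rather than on the reference.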

@cjnolet cjnolet changed the base branch from branch-24.12 to branch-25.02 December 12, 2024 04:27
narangvivek10 and others added 24 commits December 12, 2024 09:06
… etc. (#6)

* Ability to configure CAGRA compression parameters
* Enabling indexing threads
* Enabling RMM pool resource configuration
* Bug fixes for wrong values passed in index and search parameters
* Deallocation of resources using AutoCloseable
* Including a randomized test

Co-authored-by: Vivek Narang <vivek@searchscale.com>
Co-authored-by: Ishan Chattopadhyaya <ishan@apache.org>
Co-authored-by: Puneet Ahuja <puneet@searchscale.com>
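
As a rough illustration of the AutoCloseable item in the commit message above, the sketch below shows how try-with-resources releases handles deterministically. NativeHandle is a hypothetical stand-in invented for this example; it is not a class from the PR and it does not touch any native memory. It only records the order in which close() is called.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names come from the cuVS Java API.
final class AutoCloseableSketch {

    // Stand-in for a wrapper that would own a native handle.
    static final class NativeHandle implements AutoCloseable {
        private final String name;
        private final List<String> log;
        private boolean closed;

        NativeHandle(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        // Guards against use after the handle has been released.
        void use() {
            if (closed) {
                throw new IllegalStateException(name + " is already closed");
            }
        }

        // Idempotent: a second close() call does nothing.
        @Override
        public void close() {
            if (!closed) {
                closed = true;
                log.add("closed " + name);
            }
        }
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        // Resources are closed in reverse declaration order when the block exits,
        // so the index is released before the resources it depends on.
        try (NativeHandle resources = new NativeHandle("resources", log);
             NativeHandle index = new NativeHandle("index", log)) {
            resources.use();
            index.use();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [closed index, closed resources]
    }
}
```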
* Bruteforce API implementation

Co-authored-by: Vivek Narang <vivek@searchscale.com>
Labels
CMake; improvement (Improves an existing functionality); non-breaking (Introduces a non-breaking change)
Projects
Status: In Progress
3 participants