Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IVF-PQ tutorial notebook #1544

Merged
merged 26 commits into from
Jul 31, 2023

Conversation

achirkin
Copy link
Contributor

@achirkin achirkin commented May 23, 2023

Add a new folder notebooks and a notebook with an example of tweaking ivf-pq using a dataset from ann-benchmarks.com

@achirkin achirkin added feature request New feature or request non-breaking Non-breaking change 2 - In Progress Currenty a work in progress labels May 23, 2023
@achirkin achirkin changed the base branch from branch-23.06 to branch-23.08 May 23, 2023 07:48
@achirkin achirkin self-assigned this May 30, 2023
@github-actions github-actions bot removed the python label May 30, 2023
@tfeher
Copy link
Contributor

tfeher commented Jul 18, 2023

Let's dust this off for the coming release. I think the only thing missing is the last section about the parameters for index building. Since we will have an IVF-Flat notebook that discusses clustering parameters (n-clusters, trainset ratio), we can focus here on additional parameters for the product quantization.

@achirkin achirkin marked this pull request as ready for review July 20, 2023 12:07
@achirkin achirkin added 3 - Ready for Review and removed 2 - In Progress Currenty a work in progress labels Jul 20, 2023
@achirkin achirkin requested a review from tfeher July 20, 2023 12:08
Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Artem for the notebook! This looks great, a really detailed discussion of IVF-PQ search and training. While this content is great for the deep dive blog that we are planning, I am wondering whether this is the right level of detail for the example notebook. I feel that there would be a value in making it more concise. Let's wait what the team says.

notebooks/tutorial_ivf_pq.ipynb Show resolved Hide resolved
notebooks/tutorial_ivf_pq.ipynb Outdated Show resolved Hide resolved
}
],
"source": [
"bench_k = np.exp2(np.arange(10)).astype(np.int32)\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if we need a specific benchmark section here on k, It is good to point out in the text, that the number of neighbors are controlled by k, but actually benchmarking that could be postponed to the blog.

notebooks/tutorial_ivf_pq.ipynb Outdated Show resolved Hide resolved
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@@ -0,0 +1,1226 @@
{
Copy link
Member

@cjnolet cjnolet Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most notebooks that we publish for RAPIDS have longer descriptions before the first cell. I think this is a great lead-in sentence but can you please provide more details about what the notebook is going to do? Can you move the links and descriptions for the datasets up here as well so it's more immediately obvious where they came from and that the notebook is going to use them?


Reply via ReviewNB

notebooks/tutorial_ivf_pq.ipynb Show resolved Hide resolved
@@ -0,0 +1,1226 @@
{
Copy link
Member

@cjnolet cjnolet Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #2.    def show_properties(obj):

This is really nice. I almost wonder if this should eventually be added to pylibraft itself (and for the corresponding __str__() methods be overloaded to print it.


Reply via ReviewNB

notebooks/tutorial_ivf_pq.ipynb Show resolved Hide resolved
notebooks/tutorial_ivf_pq.ipynb Show resolved Hide resolved
notebooks/tutorial_ivf_pq.ipynb Show resolved Hide resolved
notebooks/tutorial_ivf_pq.ipynb Show resolved Hide resolved
notebooks/tutorial_ivf_pq.ipynb Show resolved Hide resolved
notebooks/tutorial_ivf_pq.ipynb Show resolved Hide resolved
@@ -0,0 +1,1226 @@
{
Copy link
Member

@cjnolet cjnolet Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"This is useful when in conjugation with add_data_on_build = True"

I think "This can also be useful when used in conjunction with...". Or I would suggest just removing it all together. The add_data_on_build is surely a convenience, but just because it's set to false doesn't mean you can assume anything more about how many vectors are ultimately going to be added to the index.


Reply via ReviewNB

Copy link
Contributor Author

@achirkin achirkin Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is a funny typo :)
Regarding the usefulness: I meant to say that if add_data_on_build = False, then kmeans_trainset_fraction doesn't make much sense: a user can just pass a smaller dataset to the same effect.
I've added a small note to clarify this.

@cjnolet
Copy link
Member

cjnolet commented Jul 21, 2023

@achirkin ReviewNB didn't allow me to submit a message w/ the review. It's really a delight reading through the tutorial. It's very in-depth and I think it's going to be a great resource for users. I was able to make my way through most of it, but not all of it. I'm going to try and read through the rest next week.

@achirkin
Copy link
Contributor Author

achirkin commented Jul 25, 2023

@cjnolet, @tfeher many thanks for the reviews! While I still plan to do some minor changes tomorrow, I think it's ready for another round.

@cjnolet
Copy link
Member

cjnolet commented Jul 27, 2023

@achirkin minor feedback from me at this point. End-to-end, this really is a great tutorial and I'm actually thinking we might want to consider keeping this as it is even after the blog is published (and maybe providing an additional notebook with more concise and simple usage examples for the blog?)

I think developers and users alike can benefit from this notebook as it gives a pretty thorough but readable overview of IVF-PQ. Thanks again for creating this!

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Artem! I think this notebook is good to go for 23.08 and we can revisit some portions for 23.10. Overall I think it's a great deep dive into ivf pq.

@raydouglass raydouglass merged commit c957037 into rapidsai:branch-23.08 Jul 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Development

Successfully merging this pull request may close these issues.

4 participants