Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify bench/ann scripts to Python based module #1642

Merged
merged 33 commits into from
Jul 26, 2023

Conversation

divyegala
Copy link
Member

Closes #1633

@divyegala divyegala added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jul 11, 2023
@github-actions github-actions bot added the cpp label Jul 11, 2023
@github-actions github-actions bot removed the python label Jul 13, 2023
cpp/bench/ann/scripts/run.py Outdated Show resolved Hide resolved
cpp/bench/ann/scripts/get_dataset.py Outdated Show resolved Hide resolved
Comment on lines 35 to 42
def convert_hdf5_to_fbin(path, normalize):
if normalize and "angular" in path:
p = subprocess.Popen(["python", "scripts/hdf5_to_fbin.py", "-n",
"%s" % path])
else:
p = subprocess.Popen(["python", "scripts/hdf5_to_fbin.py",
"%s" % path])
p.wait()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of invoking via a subprocess - can we just call the python function directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, that script doesn't have a callable function. Would you prefer I refactor the script to make it work or we can do it later?

divyegala and others added 4 commits July 14, 2023 09:46
Copy link
Member

@dantegd dantegd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few comments

cpp/bench/ann/scripts/data_export.py Outdated Show resolved Hide resolved
cpp/bench/ann/scripts/data_export.py Outdated Show resolved Hide resolved
cpp/bench/ann/scripts/data_export.py Outdated Show resolved Hide resolved
cpp/bench/ann/scripts/get_dataset.py Outdated Show resolved Hide resolved
Copy link
Contributor

@wphicks wphicks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just one suggestion to avoid old style string formatting, but otherwise I think this is good.

cpp/bench/ann/scripts/get_dataset.py Outdated Show resolved Hide resolved
Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes are looking good. I did a first-pass to provide some feedback.

cpp/bench/ann/scripts/plot.py Outdated Show resolved Hide resolved
dependencies.yaml Outdated Show resolved Hide resolved
docs/source/cuda_ann_benchmarks.md Outdated Show resolved Hide resolved
docs/source/cuda_ann_benchmarks.md Outdated Show resolved Hide resolved
docs/source/cuda_ann_benchmarks.md Outdated Show resolved Hide resolved
Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so close. I'm really really excited about this. The readme is looking great and the fact that we can now do these end-to-end are perfect. Most of my feedback now is about going just a little farther and easing everything possible for the new user.

docs/source/raft_ann_benchmarks.md Outdated Show resolved Hide resolved
docs/source/raft_ann_benchmarks.md Outdated Show resolved Hide resolved
docs/source/raft_ann_benchmarks.md Outdated Show resolved Hide resolved
docs/source/raft_ann_benchmarks.md Outdated Show resolved Hide resolved
docs/source/raft_ann_benchmarks.md Outdated Show resolved Hide resolved
# (1) prepare dataset
# download manually "Ground Truth" file of "Yandex DEEP"
# suppose the file name is deep_new_groundtruth.public.10K.bin
../../scripts/split_groundtruth.pl deep_new_groundtruth.public.10K.bin groundtruth
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should wrap this in python for consistency. It's just confusing seeing a bunch of python scripts and then seeing pearl.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should slip this task to a follow up PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that can work as an immediate follow-up. I'd prefer for it to still be worked into 23.08. We don't necessarily need to support every unique parameter combination initially- so long as we support what's needed to run a basic benchmark end-to-end for billion-scale.

docs/source/raft_ann_benchmarks.md Outdated Show resolved Hide resolved
scripts/ann-benchmarks/get_dataset.py Outdated Show resolved Hide resolved
Comment on lines +9 to +12
mamba env create --name raft_ann_benchmarks -f conda/environments/bench_ann_cuda-118_arch-x86_64.yaml
conda activate raft_ann_benchmarks

mamba install -c rapidsai libraft-ann-bench
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be realistic to setup a conda environment that does not depend on cuda conda packages and uses system cuda installation instead? I'd love to be able to use these scripts using docker containers with latest cuda drivers. In fact, I think this would be the main use case on devtech side: to test and adjust raft implementation for the upcoming hardware.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, with some testing we can come up with a more minimal environment that does not include any cuda conda packages. Let me know if we can work on this together.

@divyegala divyegala marked this pull request as ready for review July 25, 2023 01:20
@divyegala divyegala requested review from a team as code owners July 25, 2023 01:20
@divyegala divyegala requested a review from cjnolet July 25, 2023 19:22
Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks so much @divyegala!

@cjnolet
Copy link
Member

cjnolet commented Jul 26, 2023

/merge

@rapids-bot rapids-bot bot merged commit f99a418 into rapidsai:branch-23.08 Jul 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cpp improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
Development

Successfully merging this pull request may close these issues.

[DOC] Simplify bench-ann scripts and documentation
7 participants