-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] OPG infra and all-gather smoke test #820
Conversation
@@ -66,6 +66,7 @@ conda install -c nvidia -c rapidsai -c rapidsai-nightly -c conda-forge -c defaul | |||
distributed>=2.12.0 \ | |||
dask-cudf=${MINOR_VERSION} \ | |||
dask-cuda=${MINOR_VERSION} \ | |||
nccl>=2.5 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rlratzel anything else we should do in ci/
to enable OPG testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very familiar with what an OPG build needs, but does MPI need to be enabled too? If so, will you need to set the BUILD_MPI
cmake flag? If that's the case, you may also want to update the build.sh
in the root of cugraph so users (and scripts like this one) can build from source correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MPI can be enabled as long as the machine has:
- MPI installed
- multiple GPUs
- call OPG gtests while spanning multiple processes
Considering these requirements this cannot be set as a default path for users in build.sh
as of know. I can expose an option in build.sh
though.
My question was related to CI machines and capabilities to support this OPG testing from DevOps perspective though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question was related to CI machines and capabilities to support this OPG testing from DevOps perspective though.
@raydouglass do you know the answer to this question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multi GPU testing is not available in gpuCI and I don't think MPI is installed. So sounds like you should not set OPG testing as default.
@@ -104,6 +104,13 @@ set(CMAKE_EXE_LINKER_FLAGS "-Wl,--disable-new-dtags") | |||
option(BUILD_TESTS "Configure CMake to build tests" | |||
ON) | |||
|
|||
option(BUILD_MPI "Build with MPI" OFF) | |||
if (BUILD_MPI) | |||
find_package(MPI REQUIRED) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MPI must be installed on the machine if this flag is ON
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor suggestions and comments, intended mainly if this code is going to persist for a while.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Codecov Report
@@ Coverage Diff @@
## branch-0.14 #820 +/- ##
============================================
Coverage 58.14% 58.14%
============================================
Files 44 44
Lines 1264 1264
============================================
Hits 735 735
Misses 529 529 Continue to review full report at Codecov.
|
Use
-DBUILD_MPI=ON
to build and link OPG features.This stack is compatible with the OPG CSR Pagernak proto
In a few weeks, we may want to replace MPI by UCX (after we can access the comms in RAFT). This current stack allows adding and running what we already have in the meantime.