Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] OPG infra and all-gather smoke test #820

Merged
merged 10 commits into from
Apr 10, 2020

Conversation

afender
Copy link
Member

@afender afender commented Apr 9, 2020

  • get nccl from conda (with the option of using a local one)
  • cmake and setup.py find/include/link nccl.
  • find/include/link MPI
  • All gather test

Use -DBUILD_MPI=ON to build and link OPG features.

This stack is compatible with the OPG CSR Pagernak proto
In a few weeks, we may want to replace MPI by UCX (after we can access the comms in RAFT). This current stack allows adding and running what we already have in the meantime.

Screen Shot 2020-04-08 at 4 15 41 PM

@afender afender requested review from a team as code owners April 9, 2020 16:35
@afender afender changed the title [WIP][skip-ci] OPG infra and all-gather smoke test [WIP] OPG infra and all-gather smoke test Apr 9, 2020
@@ -66,6 +66,7 @@ conda install -c nvidia -c rapidsai -c rapidsai-nightly -c conda-forge -c defaul
distributed>=2.12.0 \
dask-cudf=${MINOR_VERSION} \
dask-cuda=${MINOR_VERSION} \
nccl>=2.5 \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rlratzel anything else we should do in ci/ to enable OPG testing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very familiar with what an OPG build needs, but does MPI need to be enabled too? If so, will you need to set the BUILD_MPI cmake flag? If that's the case, you may also want to update the build.sh in the root of cugraph so users (and scripts like this one) can build from source correctly.

Copy link
Member Author

@afender afender Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MPI can be enabled as long as the machine has:

  • MPI installed
  • multiple GPUs
  • call OPG gtests while spanning multiple processes

Considering these requirements this cannot be set as a default path for users in build.sh as of know. I can expose an option in build.sh though.

My question was related to CI machines and capabilities to support this OPG testing from DevOps perspective though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question was related to CI machines and capabilities to support this OPG testing from DevOps perspective though.

@raydouglass do you know the answer to this question?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi GPU testing is not available in gpuCI and I don't think MPI is installed. So sounds like you should not set OPG testing as default.

@@ -104,6 +104,13 @@ set(CMAKE_EXE_LINKER_FLAGS "-Wl,--disable-new-dtags")
option(BUILD_TESTS "Configure CMake to build tests"
ON)

option(BUILD_MPI "Build with MPI" OFF)
if (BUILD_MPI)
find_package(MPI REQUIRED)
Copy link
Member Author

@afender afender Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MPI must be installed on the machine if this flag is ON

@afender afender changed the title [WIP] OPG infra and all-gather smoke test [REVIEW] OPG infra and all-gather smoke test Apr 9, 2020
Copy link
Contributor

@rlratzel rlratzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor suggestions and comments, intended mainly if this code is going to persist for a while.

cpp/tests/test_utils.h Outdated Show resolved Hide resolved
cpp/tests/test_utils.h Show resolved Hide resolved
cpp/tests/test_utils.h Outdated Show resolved Hide resolved
cpp/tests/test_utils.h Outdated Show resolved Hide resolved
Copy link
Contributor

@rlratzel rlratzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@codecov-io
Copy link

codecov-io commented Apr 9, 2020

Codecov Report

Merging #820 into branch-0.14 will not change coverage by %.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##           branch-0.14     #820   +/-   ##
============================================
  Coverage        58.14%   58.14%           
============================================
  Files               44       44           
  Lines             1264     1264           
============================================
  Hits               735      735           
  Misses             529      529           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7b236d1...95af9e1. Read the comment docs.

@BradReesWork BradReesWork merged commit 8acb743 into rapidsai:branch-0.14 Apr 10, 2020
@afender afender deleted the opg_env branch April 5, 2021 18:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants