-
Notifications
You must be signed in to change notification settings - Fork 315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] OPG infra and all-gather smoke test #820
Merged
Merged
Changes from 8 commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
7966607
Merge pull request #38 from rapidsai/branch-0.13
afender 79bc376
Merge pull request #39 from rapidsai/branch-0.14
afender 35aba60
adding MPI to cmake
afender 1f02fc2
adding nccl
afender e4e71a9
added allgather test
afender a7e155e
adding test file
afender 6995671
changelog
afender c921b7e
Merge branch 'branch-0.14' into opg_env
afender 0dbc162
improved nccl/mpi direct macros in test suite
afender 95af9e1
Merge branch 'opg_env' of github.com:afender/cugraph into opg_env
afender File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -104,6 +104,13 @@ set(CMAKE_EXE_LINKER_FLAGS "-Wl,--disable-new-dtags") | |
option(BUILD_TESTS "Configure CMake to build tests" | ||
ON) | ||
|
||
option(BUILD_MPI "Build with MPI" OFF) | ||
if (BUILD_MPI) | ||
find_package(MPI REQUIRED) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. MPI must be installed on the machine if this flag is |
||
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${MPI_C_COMPILE_FLAGS}") | ||
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${MPI_CXX_COMPILE_FLAGS}") | ||
set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${MPI_CXX_LINK_FLAGS}") | ||
endif(BUILD_MPI) | ||
################################################################################################### | ||
# - cmake modules --------------------------------------------------------------------------------- | ||
|
||
|
@@ -302,6 +309,15 @@ add_dependencies(gunrock cugunrock) | |
|
||
set_property(TARGET gunrock PROPERTY IMPORTED_LOCATION ${CUGUNROCK_DIR}/lib/libgunrock.a) | ||
|
||
# - NCCL | ||
if(NOT NCCL_PATH) | ||
find_package(NCCL REQUIRED) | ||
else() | ||
message("-- Manually set NCCL PATH to ${NCCL_PATH}") | ||
set(NCCL_INCLUDE_DIRS ${NCCL_PATH}/include) | ||
set(NCCL_LIBRARIES ${NCCL_PATH}/lib/libnccl.so) | ||
endif(NOT NCCL_PATH) | ||
|
||
################################################################################################### | ||
# - library targets ------------------------------------------------------------------------------- | ||
|
||
|
@@ -383,7 +399,6 @@ add_dependencies(cugraph cugunrock) | |
|
||
################################################################################################### | ||
# - include paths --------------------------------------------------------------------------------- | ||
|
||
target_include_directories(cugraph | ||
PRIVATE | ||
"${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES}" | ||
|
@@ -399,6 +414,8 @@ target_include_directories(cugraph | |
"${CUHORNET_INCLUDE_DIR}/primitives" | ||
"${CMAKE_CURRENT_SOURCE_DIR}/src" | ||
"${CUGUNROCK_DIR}/include" | ||
"${NCCL_INCLUDE_DIRS}" | ||
"${MPI_CXX_INCLUDE_PATH}" | ||
PUBLIC | ||
"${CMAKE_CURRENT_SOURCE_DIR}/include" | ||
) | ||
|
@@ -407,7 +424,7 @@ target_include_directories(cugraph | |
# - link libraries -------------------------------------------------------------------------------- | ||
|
||
target_link_libraries(cugraph PRIVATE | ||
${CUDF_LIBRARY} ${RMM_LIBRARY} gunrock ${NVSTRINGS_LIBRARY} cublas cusparse curand cusolver cudart cuda ${LIBCYPHERPARSER_LIBRARY}) | ||
${CUDF_LIBRARY} ${RMM_LIBRARY} gunrock ${NVSTRINGS_LIBRARY} cublas cusparse curand cusolver cudart cuda ${LIBCYPHERPARSER_LIBRARY} ${MPI_CXX_LIBRARIES} ${NCCL_LIBRARIES}) | ||
if(OpenMP_CXX_FOUND) | ||
target_link_libraries(cugraph PRIVATE | ||
################################################################################################### | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
# Copyright (c) 2019, NVIDIA CORPORATION. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# | ||
|
||
# Based on FindPNG.cmake from cmake 3.14.3 | ||
|
||
#[=======================================================================[.rst: | ||
FindNCCL | ||
-------- | ||
|
||
Find libnccl, the NVIDIA Collective Communication Library. A hint to find NCCL | ||
can be provided by setting NCCL_INSTALL_DIR. | ||
|
||
Imported targets | ||
^^^^^^^^^^^^^^^^ | ||
|
||
This module defines the following :prop_tgt:`IMPORTED` target: | ||
|
||
``NCCL::NCCL`` | ||
The libnccl library, if found. | ||
|
||
Result variables | ||
^^^^^^^^^^^^^^^^ | ||
|
||
This module will set the following variables in your project: | ||
|
||
``NCCL_INCLUDE_DIRS`` | ||
where to find nccl.h , etc. | ||
``NCCL_LIBRARIES`` | ||
the libraries to link against to use NCCL. | ||
``NCCL_FOUND`` | ||
If false, do not try to use NCCL. | ||
``NCCL_VERSION_STRING`` | ||
the version of the NCCL library found | ||
|
||
#]=======================================================================] | ||
|
||
find_path(NCCL_NCCL_INCLUDE_DIR nccl.h HINTS ${NCCL_INSTALL_DIR} PATH_SUFFIXES include) | ||
|
||
#TODO: Does this need to support finding the static library? | ||
|
||
list(APPEND NCCL_NAMES nccl libnccl) | ||
set(_NCCL_VERSION_SUFFIXES 2) | ||
|
||
foreach(v IN LISTS _NCCL_VERSION_SUFFIXES) | ||
list(APPEND NCCL_NAMES nccl${v} libnccl${v}) | ||
endforeach() | ||
unset(_NCCL_VERSION_SUFFIXES) | ||
# For compatibility with versions prior to this multi-config search, honor | ||
# any NCCL_LIBRARY that is already specified and skip the search. | ||
if(NOT NCCL_LIBRARY) | ||
find_library(NCCL_LIBRARY_RELEASE NAMES ${NCCL_NAMES} HINTS ${NCCL_INSTALL_DIR} PATH_SUFFIXES lib) | ||
include(${CMAKE_ROOT}/Modules/SelectLibraryConfigurations.cmake) | ||
select_library_configurations(NCCL) | ||
mark_as_advanced(NCCL_LIBRARY_RELEASE) | ||
endif() | ||
unset(NCCL_NAMES) | ||
|
||
# Set by select_library_configurations(), but we want the one from | ||
# find_package_handle_standard_args() below. | ||
unset(NCCL_FOUND) | ||
|
||
if (NCCL_LIBRARY AND NCCL_NCCL_INCLUDE_DIR) | ||
set(NCCL_INCLUDE_DIRS ${NCCL_NCCL_INCLUDE_DIR} ) | ||
set(NCCL_LIBRARY ${NCCL_LIBRARY}) | ||
|
||
if(NOT TARGET NCCL::NCCL) | ||
add_library(NCCL::NCCL UNKNOWN IMPORTED) | ||
set_target_properties(NCCL::NCCL PROPERTIES | ||
INTERFACE_INCLUDE_DIRECTORIES "${NCCL_INCLUDE_DIRS}") | ||
if(EXISTS "${NCCL_LIBRARY}") | ||
set_target_properties(NCCL::NCCL PROPERTIES | ||
IMPORTED_LINK_INTERFACE_LANGUAGES "C" | ||
IMPORTED_LOCATION "${NCCL_LIBRARY}") | ||
endif() | ||
endif() | ||
endif () | ||
|
||
if (NCCL_NCCL_INCLUDE_DIR AND EXISTS "${NCCL_NCCL_INCLUDE_DIR}/nccl.h") | ||
file(STRINGS "${NCCL_NCCL_INCLUDE_DIR}/nccl.h" nccl_major_version_str REGEX "^#define[ \t]+NCCL_MAJOR[ \t]+[0-9]+") | ||
string(REGEX REPLACE "^#define[ \t]+NCCL_MAJOR[ \t]+([0-9]+)" "\\1" nccl_major_version_str "${nccl_major_version_str}") | ||
|
||
file(STRINGS "${NCCL_NCCL_INCLUDE_DIR}/nccl.h" nccl_minor_version_str REGEX "^#define[ \t]+NCCL_MINOR[ \t]+[0-9]+") | ||
string(REGEX REPLACE "^#define[ \t]+NCCL_MINOR[ \t]+([0-9]+)" "\\1" nccl_minor_version_str "${nccl_minor_version_str}") | ||
|
||
file(STRINGS "${NCCL_NCCL_INCLUDE_DIR}/nccl.h" nccl_patch_version_str REGEX "^#define[ \t]+NCCL_PATCH[ \t]+[0-9]+") | ||
string(REGEX REPLACE "^#define[ \t]+NCCL_PATCH[ \t]+([0-9]+)" "\\1" nccl_patch_version_str "${nccl_patch_version_str}") | ||
|
||
file(STRINGS "${NCCL_NCCL_INCLUDE_DIR}/nccl.h" nccl_suffix_version_str REGEX "^#define[ \t]+NCCL_SUFFIX[ \t]+\".*\"") | ||
string(REGEX REPLACE "^#define[ \t]+NCCL_SUFFIX[ \t]+\"(.*)\"" "\\1" nccl_suffix_version_str "${nccl_suffix_version_str}") | ||
|
||
set(NCCL_VERSION_STRING "${nccl_major_version_str}.${nccl_minor_version_str}.${nccl_patch_version_str}${nccl_suffix_version_str}") | ||
|
||
unset(nccl_major_version_str) | ||
unset(nccl_minor_version_str) | ||
unset(nccl_patch_version_str) | ||
unset(nccl_suffix_version_str) | ||
endif () | ||
|
||
include(${CMAKE_ROOT}/Modules/FindPackageHandleStandardArgs.cmake) | ||
find_package_handle_standard_args(NCCL | ||
REQUIRED_VARS NCCL_LIBRARY NCCL_NCCL_INCLUDE_DIR | ||
VERSION_VAR NCCL_VERSION_STRING) | ||
|
||
mark_as_advanced(NCCL_NCCL_INCLUDE_DIR NCCL_LIBRARY) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,75 @@ | ||
#include "gtest/gtest.h" | ||
#include <cugraph.h> | ||
#include "test_utils.h" | ||
#include <string.h> | ||
#include <mpi.h> | ||
#include <nccl.h> | ||
#include <thrust/device_vector.h> | ||
#include <thrust/functional.h> | ||
|
||
TEST(allgather, success) | ||
{ | ||
int p = 1, r = 0, dev = 0, dev_count = 0; | ||
MPICHECK(MPI_Comm_size(MPI_COMM_WORLD, &p)); | ||
MPICHECK(MPI_Comm_rank(MPI_COMM_WORLD, &r)); | ||
CUDA_RT_CALL(cudaGetDeviceCount(&dev_count)); | ||
|
||
// shortcut for device ID here | ||
// may need something smarter later | ||
dev = r%dev_count; | ||
// cudaSetDevice must happen before ncclCommInitRank | ||
CUDA_RT_CALL(cudaSetDevice(dev)); | ||
|
||
// print info | ||
printf("# Rank %2d - Pid %6d - device %2d\n", | ||
r, getpid(), dev); | ||
|
||
// NCCL init | ||
ncclUniqueId id; | ||
ncclComm_t comm; | ||
if (r == 0) NCCLCHECK(ncclGetUniqueId(&id)); | ||
MPICHECK(MPI_Bcast((void *)&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD)); | ||
NCCLCHECK(ncclCommInitRank(&comm, p, id, r)); | ||
MPICHECK(MPI_Barrier(MPI_COMM_WORLD)); | ||
|
||
//allocate device buffers | ||
int size = 3; | ||
float *sendbuff, *recvbuff; | ||
CUDA_RT_CALL(cudaMalloc(&sendbuff, size * sizeof(float))); | ||
CUDA_RT_CALL(cudaMalloc(&recvbuff, size*p * sizeof(float))); | ||
|
||
//init values | ||
thrust::fill(thrust::device_pointer_cast(sendbuff), | ||
thrust::device_pointer_cast(sendbuff + size), (float)r); | ||
thrust::fill(thrust::device_pointer_cast(recvbuff), | ||
thrust::device_pointer_cast(recvbuff + size*p), -1.0f); | ||
|
||
// ncclAllGather | ||
NCCLCHECK(ncclAllGather((const void*)sendbuff, (void*)recvbuff, size, ncclFloat, comm, cudaStreamDefault)); | ||
|
||
// expect each rankid printed size times in ascending order | ||
if (r == 0) { | ||
thrust::device_ptr<float> dev_ptr(recvbuff); | ||
std::cout.precision(15); | ||
thrust::copy(dev_ptr, dev_ptr + size*p, std::ostream_iterator<float>(std::cout, " ")); | ||
std::cout << std::endl; | ||
} | ||
|
||
//free device buffers | ||
CUDA_RT_CALL(cudaFree(sendbuff)); | ||
CUDA_RT_CALL(cudaFree(recvbuff)); | ||
|
||
//finalizing NCCL | ||
NCCLCHECK(ncclCommDestroy(comm)); | ||
} | ||
|
||
int main( int argc, char** argv ) | ||
{ | ||
testing::InitGoogleTest(&argc,argv); | ||
MPI_Init(&argc, &argv); | ||
rmmInitialize(nullptr); | ||
int rc = RUN_ALL_TESTS(); | ||
rmmFinalize(); | ||
MPI_Finalize(); | ||
return rc; | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rlratzel anything else we should do in
ci/
to enable OPG testing?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very familiar with what an OPG build needs, but does MPI need to be enabled too? If so, will you need to set the
BUILD_MPI
cmake flag? If that's the case, you may also want to update thebuild.sh
in the root of cugraph so users (and scripts like this one) can build from source correctly.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MPI can be enabled as long as the machine has:
Considering these requirements this cannot be set as a default path for users in
build.sh
as of know. I can expose an option inbuild.sh
though.My question was related to CI machines and capabilities to support this OPG testing from DevOps perspective though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@raydouglass do you know the answer to this question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multi GPU testing is not available in gpuCI and I don't think MPI is installed. So sounds like you should not set OPG testing as default.