Skip to content

Commit

Permalink
Let CMake handle NCCL detection instead of our handcrafted Python scr…
Browse files Browse the repository at this point in the history
…ipt. (#22930)

Summary:
 ---

How does the current code subsume all detections in the deleted `nccl.py`?

- The dependency of `USE_NCCL` on the OS and `USE_CUDA` is handled as dependency options in `CMakeLists.txt`.

- The main NCCL detection happens in [FindNCCL.cmake](https://github.com/pytorch/pytorch/blob/8377d4b32c12206a0f9401e81a5e5796c8fc01a8/cmake/Modules/FindNCCL.cmake), which is called by [nccl.cmake](https://github.com/pytorch/pytorch/blob/8377d4b32c12206a0f9401e81a5e5796c8fc01a8/cmake/External/nccl.cmake). When `USE_SYSTEM_NCCL` is false, the previous Python code defer the detection to `find_package(NCCL)`. The change in `nccl.cmake` retains this.

- `USE_STATIC_NCCL` in the previous Python code simply changes the name of the detected library. This is done in `IF (USE_STATIC_NCCL)`.

- Now we only need to look at how the lines below line 20 in `nccl.cmake` are subsumed. These lines list paths to header and library directories that NCCL headers and libraries may reside in and try to search these directories for the key header and library files in turn. These are done by `find_path` for headers and `find_library` for the library files in `FindNCCL.cmake`.
  * The call of [find_path](https://cmake.org/cmake/help/v3.8/command/find_path.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for headers in `<prefix>/include` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. Like the Python code, this commit sets `CMAKE_PREFIX_PATH` to search for `<prefix>` in `NCCL_ROOT_DIR` and home to CUDA.  `CMAKE_SYSTEM_PREFIX_PATH` includes the standard directories such as `/usr/local` and `/usr`. `NCCL_INCLUDE_DIR` is also specifically handled.

  * Similarly, the call of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for libraries in directories including `<prefix>/lib` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. But it also handles the edge cases intended to be solved in the Python code more properly:
     - It only searches for `<prefix>/lib64` (and `<prefix>/lib32`) if it is appropriate on the system.
     - It only searches for `<prefix>/lib/<arch>` for the right `<arch>`, unlike the Python code searches for `lib/<arch>` in a generic way (e.g., the Python code searches for `/usr/lib/x86_64-linux-gnu` but in reality systems have `/usr/lib/x86_64-some-customized-name-linux-gnu`, see https://unix.stackexchange.com/a/226180/38242 ).

 ---

Regarding for relevant issues:

- pytorch/pytorch#12063 and pytorch/pytorch#2877: These are properly handled, as explained in the updated comment.
- pytorch/pytorch#2941 does not changes NCCL detection specifically for Windows (it changed CUDA detection).
- b7e258f A versioned library detection is added, but the order is reversed: The unversioned library becomes preferred. This is because normally unversioned libraries are linked to versioned libraries and preferred by users, and local installation by users are often unversioned. Like the document of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) suggests:

> When using this to specify names with and without a version suffix, we recommend specifying the unversioned name first so that locally-built packages can be found before those provided by distributions.
Pull Request resolved: pytorch/pytorch#22930

Differential Revision: D16440275

Pulled By: ezyang

fbshipit-source-id: 11fe80743d4fe89b1ed6f96d5d996496e8ec01aa
  • Loading branch information
xuhdev authored and facebook-github-bot committed Jul 23, 2019
1 parent e4b75c6 commit 60c46dd
Show file tree
Hide file tree
Showing 6 changed files with 39 additions and 121 deletions.
11 changes: 9 additions & 2 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -110,8 +110,15 @@ option(USE_LITE_PROTO "Use lite protobuf instead of full." OFF)
option(USE_LMDB "Use LMDB" OFF)
option(USE_METAL "Use Metal for iOS build" ON)
option(USE_NATIVE_ARCH "Use -march=native" OFF)
option(USE_NCCL "Use NCCL" ON)
option(USE_SYSTEM_NCCL "Use system-wide NCCL" OFF)
cmake_dependent_option(
USE_NCCL "Use NCCL" ON
"USE_CUDA;UNIX;NOT APPLE" OFF)
cmake_dependent_option(
USE_STATIC_NCCL "Use static NCCL" OFF
"USE_NCCL" OFF)
cmake_dependent_option(
USE_SYSTEM_NCCL "Use system-wide NCCL" OFF
"USE_NCCL" OFF)
option(USE_NNAPI "Use NNAPI" OFF)
option(USE_NNPACK "Use NNPACK" ON)
option(USE_NUMA "Use NUMA (only available on Linux)" ON)
Expand Down
17 changes: 3 additions & 14 deletions cmake/External/nccl.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -2,22 +2,12 @@ if (NOT __NCCL_INCLUDED)
set(__NCCL_INCLUDED TRUE)

if (USE_SYSTEM_NCCL)
# if we have explicit paths passed from setup.py, use those
if (NCCL_INCLUDE_DIR)
# used by gloo cmake among others
SET(NCCL_INCLUDE_DIRS ${NCCL_INCLUDE_DIR})
SET(NCCL_LIBRARIES ${NCCL_SYSTEM_LIB})
set(NCCL_FOUND TRUE)
# NCCL_ROOT, NCCL_LIB_DIR, NCCL_INCLUDE_DIR will be accounted in the following line.
find_package(NCCL REQUIRED)
if (NCCL_FOUND)
add_library(__caffe2_nccl INTERFACE)
target_link_libraries(__caffe2_nccl INTERFACE ${NCCL_LIBRARIES})
target_include_directories(__caffe2_nccl INTERFACE ${NCCL_INCLUDE_DIRS})
else()
find_package(NCCL)
if (NCCL_FOUND)
add_library(__caffe2_nccl INTERFACE)
target_link_libraries(__caffe2_nccl INTERFACE ${NCCL_LIBRARIES})
target_include_directories(__caffe2_nccl INTERFACE ${NCCL_INCLUDE_DIRS})
endif()
endif()
else()
torch_cuda_get_nvcc_gencode_flag(NVCC_GENCODE)
Expand Down Expand Up @@ -57,5 +47,4 @@ if (NOT __NCCL_INCLUDED)
target_link_libraries(__caffe2_nccl INTERFACE ${NCCL_LIBRARIES})
target_include_directories(__caffe2_nccl INTERFACE ${NCCL_INCLUDE_DIRS})
endif()

endif()
43 changes: 24 additions & 19 deletions cmake/Modules/FindNCCL.cmake
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# Find the nccl libraries
#
# The following variables are optionally searched for defaults
# NCCL_ROOT_DIR: Base directory where all NCCL components are found
# NCCL_ROOT: Base directory where all NCCL components are found
# NCCL_INCLUDE_DIR: Directory where NCCL header is found
# NCCL_LIB_DIR: Directory where NCCL library is found
#
Expand All @@ -14,32 +14,37 @@
# install NCCL in the same location as the CUDA toolkit.
# See https://github.com/caffe2/caffe2/issues/1601

set(NCCL_ROOT_DIR $ENV{NCCL_ROOT_DIR} CACHE PATH "Folder contains NVIDIA NCCL")
set(NCCL_INCLUDE_DIR $ENV{NCCL_INCLUDE_DIR} CACHE PATH "Folder contains NVIDIA NCCL headers")
set(NCCL_LIB_DIR $ENV{NCCL_LIB_DIR} CACHE PATH "Folder contains NVIDIA NCCL libraries")
set(NCCL_VERSION $ENV{NCCL_VERSION} CACHE STRING "Version of NCCL to build with")

if ($ENV{NCCL_ROOT_DIR})
message(WARNING "NCCL_ROOT_DIR is deprecated. Please set NCCL_ROOT instead.")
endif()
list(APPEND NCCL_ROOT $ENV{NCCL_ROOT_DIR} ${CUDA_TOOLKIT_ROOT_DIR})
# Compatible layer for CMake <3.12. NCCL_ROOT will be accounted in for searching paths and libraries for CMake >=3.12.
list(APPEND CMAKE_PREFIX_PATH ${NCCL_ROOT})

find_path(NCCL_INCLUDE_DIRS
NAMES nccl.h
HINTS
${NCCL_INCLUDE_DIR}
${NCCL_ROOT_DIR}
${NCCL_ROOT_DIR}/include
${CUDA_TOOLKIT_ROOT_DIR}/include)
HINTS ${NCCL_INCLUDE_DIR})

IF ($ENV{USE_STATIC_NCCL})
MESSAGE(STATUS "USE_STATIC_NCCL detected. Linking against static NCCL library")
SET(NCCL_LIBNAME "libnccl_static.a")
ELSE()
if (USE_STATIC_NCCL)
MESSAGE(STATUS "USE_STATIC_NCCL is set. Linking with static NCCL library.")
SET(NCCL_LIBNAME "nccl_static")
if (NCCL_VERSION) # Prefer the versioned library if a specific NCCL version is specified
set(CMAKE_FIND_LIBRARY_SUFFIXES ".a.${NCCL_VERSION}" ${CMAKE_FIND_LIBRARY_SUFFIXES})
endif()
else()
SET(NCCL_LIBNAME "nccl")
ENDIF()
if (NCCL_VERSION) # Prefer the versioned library if a specific NCCL version is specified
set(CMAKE_FIND_LIBRARY_SUFFIXES ".so.${NCCL_VERSION}" ${CMAKE_FIND_LIBRARY_SUFFIXES})
endif()
endif()

find_library(NCCL_LIBRARIES
NAMES ${NCCL_LIBNAME}
HINTS
${NCCL_LIB_DIR}
${NCCL_ROOT_DIR}
${NCCL_ROOT_DIR}/lib
${NCCL_ROOT_DIR}/lib/x86_64-linux-gnu
${NCCL_ROOT_DIR}/lib64
${CUDA_TOOLKIT_ROOT_DIR}/lib64)
HINTS ${NCCL_LIB_DIR})

include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(NCCL DEFAULT_MSG NCCL_INCLUDE_DIRS NCCL_LIBRARIES)
Expand Down
6 changes: 3 additions & 3 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -138,7 +138,7 @@
# MIOPEN_LIBRARY
# specify where MIOpen is installed
#
# NCCL_ROOT_DIR
# NCCL_ROOT
# NCCL_LIB_DIR
# NCCL_INCLUDE_DIR
# specify where nccl is installed
Expand Down Expand Up @@ -187,7 +187,6 @@
from tools.setup_helpers.cmake import CMake
from tools.setup_helpers.cuda import CUDA_HOME, CUDA_VERSION
from tools.setup_helpers.cudnn import CUDNN_LIBRARY, CUDNN_INCLUDE_DIR
from tools.setup_helpers.nccl import NCCL_SYSTEM_LIB, NCCL_INCLUDE_DIR

try:
FileNotFoundError
Expand Down Expand Up @@ -387,7 +386,8 @@ def run(self):
else:
report('-- Not using MKLDNN')
if cmake_cache_vars['USE_NCCL'] and cmake_cache_vars['USE_SYSTEM_NCCL']:
report('-- Using system provided NCCL library at ' + NCCL_SYSTEM_LIB + ', ' + NCCL_INCLUDE_DIR)
report('-- Using system provided NCCL library at {}, {}'.format(cmake_cache_vars['NCCL_LIBRARIES'],
cmake_cache_vars['NCCL_INCLUDE_DIRS']))
elif cmake_cache_vars['USE_NCCL']:
report('-- Building NCCL library')
else:
Expand Down
7 changes: 0 additions & 7 deletions tools/setup_helpers/cmake.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,6 @@
check_env_flag, check_negative_env_flag)
from .cuda import USE_CUDA
from .dist_check import USE_DISTRIBUTED, USE_GLOO_IBVERBS
from .nccl import (USE_SYSTEM_NCCL, NCCL_INCLUDE_DIR, NCCL_ROOT_DIR,
NCCL_SYSTEM_LIB, USE_NCCL)
from .numpy_ import USE_NUMPY, NUMPY_INCLUDE_DIR


Expand Down Expand Up @@ -272,8 +270,6 @@ def generate(self, version, cmake_python_library, build_python, build_test, my_e
'USE_DISTRIBUTED': USE_DISTRIBUTED,
'USE_FBGEMM': not (check_env_flag('NO_FBGEMM') or
check_negative_env_flag('USE_FBGEMM')),
'USE_NCCL': USE_NCCL,
'USE_SYSTEM_NCCL': USE_SYSTEM_NCCL,
'USE_NUMPY': USE_NUMPY,
'USE_SYSTEM_EIGEN_INSTALL': 'OFF'
})
Expand All @@ -286,9 +282,6 @@ def generate(self, version, cmake_python_library, build_python, build_test, my_e
CMAKE_BUILD_TYPE=self._build_type,
INSTALL_TEST=build_test,
NUMPY_INCLUDE_DIR=escape_path(NUMPY_INCLUDE_DIR),
NCCL_INCLUDE_DIR=NCCL_INCLUDE_DIR,
NCCL_ROOT_DIR=NCCL_ROOT_DIR,
NCCL_SYSTEM_LIB=NCCL_SYSTEM_LIB,
CMAKE_INSTALL_PREFIX=install_dir,
CMAKE_C_FLAGS=cflags,
CMAKE_CXX_FLAGS=cflags,
Expand Down
76 changes: 0 additions & 76 deletions tools/setup_helpers/nccl.py

This file was deleted.

0 comments on commit 60c46dd

Please sign in to comment.