Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Let CMake handle NCCL detection instead of our handcrafted Python scr…
…ipt. (#22930) Summary: --- How does the current code subsume all detections in the deleted `nccl.py`? - The dependency of `USE_NCCL` on the OS and `USE_CUDA` is handled as dependency options in `CMakeLists.txt`. - The main NCCL detection happens in [FindNCCL.cmake](https://github.com/pytorch/pytorch/blob/8377d4b32c12206a0f9401e81a5e5796c8fc01a8/cmake/Modules/FindNCCL.cmake), which is called by [nccl.cmake](https://github.com/pytorch/pytorch/blob/8377d4b32c12206a0f9401e81a5e5796c8fc01a8/cmake/External/nccl.cmake). When `USE_SYSTEM_NCCL` is false, the previous Python code defer the detection to `find_package(NCCL)`. The change in `nccl.cmake` retains this. - `USE_STATIC_NCCL` in the previous Python code simply changes the name of the detected library. This is done in `IF (USE_STATIC_NCCL)`. - Now we only need to look at how the lines below line 20 in `nccl.cmake` are subsumed. These lines list paths to header and library directories that NCCL headers and libraries may reside in and try to search these directories for the key header and library files in turn. These are done by `find_path` for headers and `find_library` for the library files in `FindNCCL.cmake`. * The call of [find_path](https://cmake.org/cmake/help/v3.8/command/find_path.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for headers in `<prefix>/include` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. Like the Python code, this commit sets `CMAKE_PREFIX_PATH` to search for `<prefix>` in `NCCL_ROOT_DIR` and home to CUDA. `CMAKE_SYSTEM_PREFIX_PATH` includes the standard directories such as `/usr/local` and `/usr`. `NCCL_INCLUDE_DIR` is also specifically handled. * Similarly, the call of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for libraries in directories including `<prefix>/lib` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. But it also handles the edge cases intended to be solved in the Python code more properly: - It only searches for `<prefix>/lib64` (and `<prefix>/lib32`) if it is appropriate on the system. - It only searches for `<prefix>/lib/<arch>` for the right `<arch>`, unlike the Python code searches for `lib/<arch>` in a generic way (e.g., the Python code searches for `/usr/lib/x86_64-linux-gnu` but in reality systems have `/usr/lib/x86_64-some-customized-name-linux-gnu`, see https://unix.stackexchange.com/a/226180/38242 ). --- Regarding for relevant issues: - pytorch/pytorch#12063 and pytorch/pytorch#2877: These are properly handled, as explained in the updated comment. - pytorch/pytorch#2941 does not changes NCCL detection specifically for Windows (it changed CUDA detection). - b7e258f A versioned library detection is added, but the order is reversed: The unversioned library becomes preferred. This is because normally unversioned libraries are linked to versioned libraries and preferred by users, and local installation by users are often unversioned. Like the document of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) suggests: > When using this to specify names with and without a version suffix, we recommend specifying the unversioned name first so that locally-built packages can be found before those provided by distributions. Pull Request resolved: pytorch/pytorch#22930 Differential Revision: D16440275 Pulled By: ezyang fbshipit-source-id: 11fe80743d4fe89b1ed6f96d5d996496e8ec01aa
- Loading branch information