Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/nccl dso #5001

Merged
merged 6 commits into from
Oct 23, 2017
Merged

Feature/nccl dso #5001

merged 6 commits into from
Oct 23, 2017

Conversation

reyoung
Copy link
Collaborator

@reyoung reyoung commented Oct 22, 2017

cherry-pick @dzhwinter 's commit to this PR, complete cmake file, and some unit tests.

@reyoung reyoung requested a review from dzhwinter October 22, 2017 23:40
wangkuiyi
wangkuiyi previously approved these changes Oct 23, 2017
@@ -68,6 +68,25 @@ else()
if(NOT CUDNN_FOUND)
message(FATAL_ERROR "Paddle need cudnn to compile")
endif()
if (NOT NCCL_INCLUDE_DIR)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for teaching me the logic here in our offline discussion @reyoung . I think we can shorten L71-L82 into

if (NOT NCCL_INCLUDE_DIR)
  message(FATAL_ERROR "Paddle needs the NCCL header file")
endif()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

message(FATAL_ERROR "Paddle need nccl header to compile")
endif()
endif()
if (NOT WITH_DSO)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that we can marge the two conditions on L84 and L85 into one

if (NOT WITH_DSO AND NOT NCCL_LIBRARY)
  message(FATAL_ERROR "Paddle needs the NCCL libraries with WITH_DSO=OFF")
endif()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if(NOT ${CMAKE_SYSTEM_PROCESSOR})
set(TARGET_ARCH ${CMAKE_SYSTEM_PROCESSOR})
endif()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for curiosity, could we automatically download NCCL and put the header and library files into the build/third_party directory? We don't have to do this in this PR, but could just file an issue recording this idea.

Copy link
Collaborator Author

@reyoung reyoung Oct 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nccl will be packaged to official CUDA docker image.

We can download nccl header in our Paddle source code. But to keep it simple, we just assume the developers are installed CUDA and related libraries well.

NOTE @dzhwinter 's issue, https://gitlab.com/nvidia/cuda/issues/10#note_43614309

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we shouldn't. This PR just moves NCCL from build/third_party to the docker built-in environment. Partly of this PR is cherry-picked from #4818, which automate download NCCL and build it locally. In #4818, we have to build it early than any other module, so we need to add NCCL as a dependency, which is not correct comparing with DSO

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand your point -- some users might already have a special version of NCCL installed on their systems, so PaddlePaddle should try to use it.

@@ -157,6 +162,14 @@ void GetLapackDsoHandle(void** dso_handle) {
#endif
}

void GetNcclDsoHandle(void** dso_handle) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nccl => NCCL

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -157,6 +162,14 @@ void GetLapackDsoHandle(void** dso_handle) {
#endif
}

void GetNcclDsoHandle(void** dso_handle) {
#if defined(__APPLE__) || defined(__OSX__)
GetDsoHandleFromSearchPath(FLAGS_nccl_dir, "libnccl.dylib", dso_handle);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some future PR, we should rename Dso into DSO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should use DSO for abbreviation dynamic shared object.

* @param **dso_handle dso handler
*
*/
void GetNcclDsoHandle(void** dso_handle);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, but should we rename

GetNcclDsoHandle => dso::LoadNCCL
GetLapackDsoHandle => dso::LoadLAPACK

@reyoung reyoung merged commit 43c6ff2 into PaddlePaddle:develop Oct 23, 2017
@reyoung reyoung deleted the feature/nccl_dso branch October 28, 2017 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants