Tensorflow Serving build fails on GPU instance #336

Closed
sskgit opened this issue Feb 24, 2017 · 7 comments

sskgit commented Feb 24, 2017

System Info:
Ubuntu 16.04
CUDA 8.0
CuDNN 5

While running the build command
bazel build -c opt --config=cuda --spawn_strategy=standalone tensorflow_serving/...

I received this error:
ERROR: no such target '@org_tensorflow//third_party/gpus/crosstool:crosstool': target 'crosstool' not declared in package 'third_party/gpus/crosstool' defined by /home/gpuadmin/.cache/bazel/_bazel_gpuadmin/2f87378698dce1c5bd8cb597c1e87ca0/external/org_tensorflow/third_party/gpus/crosstool/BUILD.
INFO: Elapsed time: 0.175s

I looked at issue #186 and applied this suggestion from it to tools/bazel.rc:

  1. the crosstool in tools/bazel.rc is invalid (AFAIK). change @org_tensorflow//third_party/gpus/crosstool to @local_config_cuda//crosstool:toolchain.
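The substitution quoted above can be scripted. This is a minimal sketch, not an official fix: `fix_crosstool` is a hypothetical helper name, and it assumes the old label appears in the file exactly as written in the quote, with no explicit `:target` suffix.

```shell
# Sketch: apply the crosstool rename from #186 to a bazel.rc file.
# fix_crosstool is a hypothetical helper name, not part of TensorFlow Serving.
# Assumption: the old label is written without an explicit ":target" suffix.
fix_crosstool() {
  rc_file="$1"
  # Keep a .bak copy next to the file so the edit can be reverted.
  sed -i.bak \
    's#@org_tensorflow//third_party/gpus/crosstool#@local_config_cuda//crosstool:toolchain#' \
    "$rc_file"
}

# Usage, from the root of the serving checkout:
#   fix_crosstool tools/bazel.rc
```

Running it a second time changes nothing, because the old label no longer matches.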

After making that change, I got another error:
ERROR: /home/gpuadmin/.cache/bazel/_bazel_gpuadmin/2f87378698dce1c5bd8cb597c1e87ca0/external/org_tensorflow/tensorflow/contrib/nccl/BUILD:23:1: C++ compilation of rule '@org_tensorflow//tensorflow/contrib/nccl:python/ops/_nccl_ops.so' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command
external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter ... (remaining 77 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
In file included from external/org_tensorflow/tensorflow/contrib/nccl/kernels/nccl_manager.cc:15:0:
external/org_tensorflow/tensorflow/contrib/nccl/kernels/nccl_manager.h:23:44: fatal error: external/nccl_archive/src/nccl.h: No such file or directory
compilation terminated.
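One way to confirm what the compiler is complaining about is to check whether the header exists under Bazel's external directory. This is a rough diagnostic sketch: `has_nccl_header` is a hypothetical helper, and it assumes the `nccl_archive` repository lives under `external/` in the Bazel output base, as the paths in the log above suggest.

```shell
# Sketch: report whether Bazel's external nccl_archive contains src/nccl.h.
# has_nccl_header is a hypothetical helper; $1 is the Bazel output base.
has_nccl_header() {
  [ -f "$1/external/nccl_archive/src/nccl.h" ]
}

# Usage, from the root of the serving checkout:
#   has_nccl_header "$(bazel info output_base)" && echo found || echo missing
```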

I also referred to #225.

For the build I tried both of these commands:

bazel build -c opt --config=cuda tensorflow_serving/...

bazel build -c opt --config=cuda --spawn_strategy=standalone tensorflow_serving/...

The build still fails. Any suggestions for fixing this issue, or any other workaround?

@bjelkenhed

I got the same error: fatal error: external/nccl_archive/src/nccl.h: No such file or directory
compilation terminated.

After running:
bazel build -c opt --config=cuda tensorflow_serving/...

System Info:
Ubuntu 16.04
CUDA 8.0
CuDNN 5.1.5

jlertle commented Feb 25, 2017

To get around it, you can comment out the nccl dep in tensorflow/tensorflow/contrib/BUILD at line 42.

sskgit commented Feb 25, 2017

Commenting out the nccl dep in serving/tensorflow/tensorflow/contrib/BUILD (as mentioned in #327) resolved the issue. The build was successful.
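For anyone who wants to script this workaround, here is a minimal sketch. `disable_nccl_dep` is a hypothetical helper name, and the `contrib/nccl` pattern is an assumption about how the dep is written in the BUILD file. It matches by pattern rather than by line number because line 42 may not hold the dep in other revisions, so review the result before building.

```shell
# Sketch: comment out every line that mentions contrib/nccl in a BUILD file.
# disable_nccl_dep is a hypothetical helper name, not part of TensorFlow.
# Assumption: the dep label contains the text "contrib/nccl".
disable_nccl_dep() {
  build_file="$1"
  # Prefix each matching line with "# "; keep a .bak copy for reverting.
  sed -i.bak '/contrib\/nccl/ s/^/# /' "$build_file"
}

# Usage, from the root of the serving checkout:
#   disable_nccl_dep tensorflow/tensorflow/contrib/BUILD
```

The `.bak` copy makes it easy to restore the original file once the underlying issue is fixed. Note that this hides the dependency instead of providing the missing header.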

sskgit closed this as completed Feb 25, 2017

sskgit commented Feb 28, 2017

Thanks @jlertle for your suggestion.

kirilg (Contributor) commented Feb 28, 2017

Good to hear there is a workaround, but I think the underlying issue would need to be fixed. Reopening the issue so we can track it.

@sugartom

You can install the NCCL library as suggested in #327 (comment).

I saw exactly the same error message when installing TF Serving, and fixed it by following the above suggestion.

@gautamvasudevan (Collaborator)

Closing - please see the latest Docker examples for bringing up a build environment.
