[xla:gpu] Build with GPU support fails with linker error #10616

Open
pxanthopoulos opened this issue Mar 15, 2024 · 7 comments

I am trying to build XLA from source following the instructions found below, with Docker & GPU support:

https://openxla.org/xla/build_from_source

More specifically, I cloned the XLA repo and, from its parent directory, executed the following commands:

  1. docker run --gpus all --name xla_gpu -w /xla -it -d --rm -v ./xla:/xla tensorflow/build:latest-python3.9 bash

(I added the --gpus all flag because the configure script failed as it could not find nvidia-smi.)

  2. docker exec -it xla_gpu bash

  3. ./configure.py --backend=CUDA with output:

INFO:root:Found path to clang at /usr/lib/llvm-17/bin/clang
INFO:root:Running echo __clang_major__ | /usr/lib/llvm-17/bin/clang -E -P -
INFO:root:/usr/lib/llvm-17/bin/clang reports major version 17.
INFO:root:Trying to find path to nvidia-smi...
INFO:root:Found path to nvidia-smi at /usr/bin/nvidia-smi
INFO:root:Found CUDA compute capabilities: ['7.0', '8.0']
INFO:root:Some CUDA config versions and paths were not provided, so trying to find them using find_cuda_config.py
INFO:root:Writing bazelrc to /xla/xla_configure.bazelrc...

  4. bazel build --test_output=all --spawn_strategy=sandboxed //xla/...

This step failed with the following error message:

ERROR: /xla/xla/tsl/cuda/BUILD.bazel:277:11: no such target '@local_config_nccl//:nccl_headers': target 'nccl_headers' not declared in package '' defined by /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/local_config_nccl/BUILD (Tip: use `query "@local_config_nccl//:*"` to see all the targets in that package) and referenced by '//xla/tsl/cuda:nccl_stub'
INFO: Repository double_conversion instantiated at:
  /xla/WORKSPACE:19:15: in <toplevel>
  /xla/workspace2.bzl:111:19: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:622:21: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:506:20: in _tf_repositories
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:89:35: in <toplevel>
INFO: Repository com_google_benchmark instantiated at:
  /xla/WORKSPACE:19:15: in <toplevel>
  /xla/workspace2.bzl:111:19: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:615:28: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:46:14: in _initialize_third_party
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/benchmark/workspace.bzl:9:20: in repo
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:89:35: in <toplevel>
INFO: Repository nccl_archive instantiated at:
  /xla/WORKSPACE:19:15: in <toplevel>
  /xla/workspace2.bzl:111:19: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:622:21: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:402:20: in _tf_repositories
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:89:35: in <toplevel>
INFO: Repository cutlass_archive instantiated at:
  /xla/WORKSPACE:19:15: in <toplevel>
  /xla/workspace2.bzl:125:21: in workspace
  /xla/workspace2.bzl:46:20: in _tf_repositories
  /xla/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
  /xla/third_party/repo.bzl:89:35: in <toplevel>
INFO: Repository zlib instantiated at:
  /xla/WORKSPACE:19:15: in <toplevel>
  /xla/workspace2.bzl:111:19: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:622:21: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:384:20: in _tf_repositories
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:89:35: in <toplevel>
INFO: Repository jsoncpp_git instantiated at:
  /xla/WORKSPACE:19:15: in <toplevel>
  /xla/workspace2.bzl:111:19: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:622:21: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:376:20: in _tf_repositories
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:89:35: in <toplevel>
INFO: Repository nvtx_archive instantiated at:
  /xla/WORKSPACE:19:15: in <toplevel>
  /xla/workspace2.bzl:111:19: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:622:21: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:412:20: in _tf_repositories
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:89:35: in <toplevel>
ERROR: Analysis of target '//xla/tsl/cuda:nccl_stub' failed; build aborted: Analysis failed
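
The tip in the error message can be followed to confirm which targets the generated package actually declares before patching anything. The following is a minimal sketch, assuming `bazel query` prints one label per line and that each label ends in `:<target name>`; the helper name `has_target` is hypothetical and not part of Bazel or XLA.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `bazel query` output on stdin and succeeds
# if some label ends in ":<name>". Assumes one label per line; the name
# is interpreted as a basic regular expression by grep.
has_target() {
  local name="$1"
  grep -q -- ":${name}\$"
}

# Example (run inside the container, from /xla):
#   bazel query "@local_config_nccl//:*" | has_target nccl_headers \
#     || echo "nccl_headers is not declared in @local_config_nccl"
```

If the check fails, the target is missing from the generated BUILD file, which matches the "not declared in package" message above.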

I overcame this error by editing the file /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/local_config_nccl/BUILD referenced in the error message. I added the following to the end of this file:

alias(
    name = "nccl_headers",
    actual = "@nccl_archive//:nccl_headers",
    visibility = ["//visibility:public"],
)
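
Since the edited BUILD file lives under Bazel's output base, it may be regenerated when the external repository is re-fetched, in which case the alias has to be added again. Below is a minimal sketch of applying the same workaround idempotently. The function name `add_nccl_headers_alias` is hypothetical, and the approach is only a local workaround under these assumptions, not an upstream fix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the nccl_headers alias to a BUILD file,
# but only if the file does not already declare a target with that name.
add_nccl_headers_alias() {
  local build_file="$1"
  # Refuse to create a BUILD file that Bazel has not generated yet.
  [ -f "$build_file" ] || return 1
  # Nothing to do if an earlier run already added the alias.
  if grep -q 'name = "nccl_headers"' "$build_file"; then
    return 0
  fi
  cat >> "$build_file" <<'EOF'

alias(
    name = "nccl_headers",
    actual = "@nccl_archive//:nccl_headers",
    visibility = ["//visibility:public"],
)
EOF
}

# Example (run inside the container, from /xla):
#   add_nccl_headers_alias "$(bazel info output_base)/external/local_config_nccl/BUILD"
```

Using `bazel info output_base` avoids hard-coding the hash in the cache path, which differs between workspaces.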

Then I reran step 4 (the build command). After building ~39000 of the ~45000 targets, it failed with the following error message:

ERROR: /xla/xla/tools/BUILD:124:14: Linking xla/tools/convert_computation failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command (from target //xla/tools:convert_computation) external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-opt/bin/xla/tools/convert_computation-2.params

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform_cuda_only.lo(cuda_platform.o): in function `stream_executor::gpu::CudaPlatform::~CudaPlatform()':
cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatformD2Ev+0x18): undefined reference to `stream_executor::ExecutorCache::~ExecutorCache()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform_cuda_only.lo(cuda_platform.o): in function `stream_executor::gpu::CudaPlatform::~CudaPlatform()':
cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatformD0Ev+0x18): undefined reference to `stream_executor::ExecutorCache::~ExecutorCache()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform_cuda_only.lo(cuda_platform.o): in function `stream_executor::gpu::CudaPlatform::GetExecutor(stream_executor::StreamExecutorConfig const&)':
cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatform11GetExecutorERKNS_20StreamExecutorConfigE+0x1d): undefined reference to `stream_executor::ExecutorCache::Get(stream_executor::StreamExecutorConfig const&)'
/usr/bin/ld: cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatform11GetExecutorERKNS_20StreamExecutorConfigE+0x49): undefined reference to `stream_executor::ExecutorCache::GetOrCreate(stream_executor::StreamExecutorConfig const&, std::function<absl::lts_20230802::StatusOr<std::unique_ptr<stream_executor::StreamExecutor, std::default_delete<stream_executor::StreamExecutor> > > ()> const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform_cuda_only.lo(cuda_platform.o): in function `_GLOBAL__sub_I_cuda_platform.cc':
cuda_platform.cc:(.text.startup+0x6b): undefined reference to `stream_executor::ExecutorCache::ExecutorCache()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_executor_cuda_only.lo(cuda_executor.o): in function `stream_executor::gpu::GpuExecutor::GetKernel(stream_executor::MultiKernelLoaderSpec const&, stream_executor::Kernel*)':
cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor9GetKernelERKNS_21MultiKernelLoaderSpecEPNS_6KernelE+0x6c2): undefined reference to `stream_executor::KernelMetadata::set_registers_per_thread(int)'
/usr/bin/ld: cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor9GetKernelERKNS_21MultiKernelLoaderSpecEPNS_6KernelE+0x6f9): undefined reference to `stream_executor::KernelMetadata::set_shared_memory_bytes(int)'
/usr/bin/ld: cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor9GetKernelERKNS_21MultiKernelLoaderSpecEPNS_6KernelE+0x727): undefined reference to `stream_executor::Kernel::set_name(std::basic_string_view<char, std::char_traits<char> >)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_executor_cuda_only.lo(cuda_executor.o): in function `stream_executor::gpu::GpuExecutor::VlogOccupancyInfo(stream_executor::DeviceDescription const&, stream_executor::Kernel const&, stream_executor::ThreadDim const&, stream_executor::BlockDim const&)':
cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor17VlogOccupancyInfoERKNS_17DeviceDescriptionERKNS_6KernelERKNS_9ThreadDimERKNS_8BlockDimE+0x65): undefined reference to `stream_executor::KernelMetadata::registers_per_thread() const'
/usr/bin/ld: cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor17VlogOccupancyInfoERKNS_17DeviceDescriptionERKNS_6KernelERKNS_9ThreadDimERKNS_8BlockDimE+0x70): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<unsigned long, stream_executor::DeviceMemory<bool> >::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJmNS_12DeviceMemoryIbEEEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJmNS_12DeviceMemoryIbEEEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<unsigned long, unsigned long, stream_executor::DeviceMemory<bool> >::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJmmNS_12DeviceMemoryIbEEEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJmmNS_12DeviceMemoryIbEEEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, stream_executor::DeviceMemory<int>, int>::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJmmmmmmmmNS_12DeviceMemoryIiEEiEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJmmmmmmmmNS_12DeviceMemoryIiEEiEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<unsigned long, stream_executor::DeviceMemory<int>, int>::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJmNS_12DeviceMemoryIiEEiEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJmNS_12DeviceMemoryIiEEiEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<>::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, absl::lts_20230802::Span<unsigned long const>), stream_executor::gpu::GpuCommandBuffer::If(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, stream_executor::StreamExecutor*, stream_executor::DeviceMemory<bool>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>)::$_0>::_M_invoke(std::_Any_data const&, tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>&&, absl::lts_20230802::Span<unsigned long const>&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEN3tsl3gtl7IntTypeIN15stream_executor13CommandBuffer21ExecutionScopeId_tag_ElEENS1_4SpanIKmEEEZNS6_3gpu16GpuCommandBuffer2IfES9_PNS6_14StreamExecutorENS6_12DeviceMemoryIbEESt8functionIFS2_PS7_EEE3$_0E9_M_invokeERKSt9_Any_dataOS9_OSC_+0x76): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, absl::lts_20230802::Span<unsigned long const>), stream_executor::gpu::GpuCommandBuffer::IfElse(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, stream_executor::StreamExecutor*, stream_executor::DeviceMemory<bool>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>)::$_0>::_M_invoke(std::_Any_data const&, tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>&&, absl::lts_20230802::Span<unsigned long const>&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEN3tsl3gtl7IntTypeIN15stream_executor13CommandBuffer21ExecutionScopeId_tag_ElEENS1_4SpanIKmEEEZNS6_3gpu16GpuCommandBuffer6IfElseES9_PNS6_14StreamExecutorENS6_12DeviceMemoryIbEESt8functionIFS2_PS7_EESN_E3$_0E9_M_invokeERKSt9_Any_dataOS9_OSC_+0x76): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, absl::lts_20230802::Span<unsigned long const>), stream_executor::gpu::GpuCommandBuffer::Case(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, stream_executor::StreamExecutor*, stream_executor::DeviceMemory<int>, std::vector<std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>, std::allocator<std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)> > >)::$_0>::_M_invoke(std::_Any_data const&, tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>&&, absl::lts_20230802::Span<unsigned long const>&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEN3tsl3gtl7IntTypeIN15stream_executor13CommandBuffer21ExecutionScopeId_tag_ElEENS1_4SpanIKmEEEZNS6_3gpu16GpuCommandBuffer4CaseES9_PNS6_14StreamExecutorENS6_12DeviceMemoryIiEESt6vectorISt8functionIFS2_PS7_EESaISO_EEE3$_0E9_M_invokeERKSt9_Any_dataOS9_OSC_+0x150): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (stream_executor::CommandBuffer*, unsigned long), stream_executor::gpu::GpuCommandBuffer::For(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, stream_executor::StreamExecutor*, int, stream_executor::DeviceMemory<int>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>)::$_1>::_M_invoke(std::_Any_data const&, stream_executor::CommandBuffer*&&, unsigned long&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEPN15stream_executor13CommandBufferEmEZNS3_3gpu16GpuCommandBuffer3ForEN3tsl3gtl7IntTypeINS4_21ExecutionScopeId_tag_ElEEPNS3_14StreamExecutorEiNS3_12DeviceMemoryIiEESt8functionIFS2_S5_EEE3$_1E9_M_invokeERKSt9_Any_dataOS5_Om+0xb6): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, absl::lts_20230802::Span<unsigned long const>), stream_executor::gpu::GpuCommandBuffer::For(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, stream_executor::StreamExecutor*, int, stream_executor::DeviceMemory<int>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>)::$_0>::_M_invoke(std::_Any_data const&, tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>&&, absl::lts_20230802::Span<unsigned long const>&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEN3tsl3gtl7IntTypeIN15stream_executor13CommandBuffer21ExecutionScopeId_tag_ElEENS1_4SpanIKmEEEZNS6_3gpu16GpuCommandBuffer3ForES9_PNS6_14StreamExecutorEiNS6_12DeviceMemoryIiEESt8functionIFS2_PS7_EEE3$_0E9_M_invokeERKSt9_Any_dataOS9_OSC_+0x81): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o):gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEPN15stream_executor13CommandBufferEmEZNS3_3gpu16GpuCommandBuffer5WhileEN3tsl3gtl7IntTypeINS4_21ExecutionScopeId_tag_ElEEPNS3_14StreamExecutorENS3_12DeviceMemoryIbEESt8functionIFS2_SD_S5_EESI_IFS2_S5_EEE3$_1E9_M_invokeERKSt9_Any_dataOS5_Om+0xff): more undefined references to `stream_executor::KernelMetadata::shared_memory_bytes() const' follow
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ERROR: /xla/xla/tools/BUILD:53:14: Linking xla/tools/hex_floats_to_packed_literal failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command (from target //xla/tools:hex_floats_to_packed_literal) external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-opt/bin/xla/tools/hex_floats_to_packed_literal-2.params

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform_cuda_only.lo(cuda_platform.o): in function `stream_executor::gpu::CudaPlatform::~CudaPlatform()':
cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatformD2Ev+0x18): undefined reference to `stream_executor::ExecutorCache::~ExecutorCache()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform_cuda_only.lo(cuda_platform.o): in function `stream_executor::gpu::CudaPlatform::~CudaPlatform()':
cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatformD0Ev+0x18): undefined reference to `stream_executor::ExecutorCache::~ExecutorCache()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform_cuda_only.lo(cuda_platform.o): in function `stream_executor::gpu::CudaPlatform::GetExecutor(stream_executor::StreamExecutorConfig const&)':
cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatform11GetExecutorERKNS_20StreamExecutorConfigE+0x1d): undefined reference to `stream_executor::ExecutorCache::Get(stream_executor::StreamExecutorConfig const&)'
/usr/bin/ld: cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatform11GetExecutorERKNS_20StreamExecutorConfigE+0x49): undefined reference to `stream_executor::ExecutorCache::GetOrCreate(stream_executor::StreamExecutorConfig const&, std::function<absl::lts_20230802::StatusOr<std::unique_ptr<stream_executor::StreamExecutor, std::default_delete<stream_executor::StreamExecutor> > > ()> const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform_cuda_only.lo(cuda_platform.o): in function `_GLOBAL__sub_I_cuda_platform.cc':
cuda_platform.cc:(.text.startup+0x6b): undefined reference to `stream_executor::ExecutorCache::ExecutorCache()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_executor_cuda_only.lo(cuda_executor.o): in function `stream_executor::gpu::GpuExecutor::GetKernel(stream_executor::MultiKernelLoaderSpec const&, stream_executor::Kernel*)':
cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor9GetKernelERKNS_21MultiKernelLoaderSpecEPNS_6KernelE+0x6c2): undefined reference to `stream_executor::KernelMetadata::set_registers_per_thread(int)'
/usr/bin/ld: cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor9GetKernelERKNS_21MultiKernelLoaderSpecEPNS_6KernelE+0x6f9): undefined reference to `stream_executor::KernelMetadata::set_shared_memory_bytes(int)'
/usr/bin/ld: cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor9GetKernelERKNS_21MultiKernelLoaderSpecEPNS_6KernelE+0x727): undefined reference to `stream_executor::Kernel::set_name(std::basic_string_view<char, std::char_traits<char> >)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_executor_cuda_only.lo(cuda_executor.o): in function `stream_executor::gpu::GpuExecutor::VlogOccupancyInfo(stream_executor::DeviceDescription const&, stream_executor::Kernel const&, stream_executor::ThreadDim const&, stream_executor::BlockDim const&)':
cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor17VlogOccupancyInfoERKNS_17DeviceDescriptionERKNS_6KernelERKNS_9ThreadDimERKNS_8BlockDimE+0x65): undefined reference to `stream_executor::KernelMetadata::registers_per_thread() const'
/usr/bin/ld: cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor17VlogOccupancyInfoERKNS_17DeviceDescriptionERKNS_6KernelERKNS_9ThreadDimERKNS_8BlockDimE+0x70): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<unsigned long, stream_executor::DeviceMemory<bool> >::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJmNS_12DeviceMemoryIbEEEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJmNS_12DeviceMemoryIbEEEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<unsigned long, unsigned long, stream_executor::DeviceMemory<bool> >::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJmmNS_12DeviceMemoryIbEEEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJmmNS_12DeviceMemoryIbEEEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, stream_executor::DeviceMemory<int>, int>::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJmmmmmmmmNS_12DeviceMemoryIiEEiEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJmmmmmmmmNS_12DeviceMemoryIiEEiEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<unsigned long, stream_executor::DeviceMemory<int>, int>::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJmNS_12DeviceMemoryIiEEiEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJmNS_12DeviceMemoryIiEEiEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<>::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, absl::lts_20230802::Span<unsigned long const>), stream_executor::gpu::GpuCommandBuffer::If(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, stream_executor::StreamExecutor*, stream_executor::DeviceMemory<bool>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>)::$_0>::_M_invoke(std::_Any_data const&, tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>&&, absl::lts_20230802::Span<unsigned long const>&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEN3tsl3gtl7IntTypeIN15stream_executor13CommandBuffer21ExecutionScopeId_tag_ElEENS1_4SpanIKmEEEZNS6_3gpu16GpuCommandBuffer2IfES9_PNS6_14StreamExecutorENS6_12DeviceMemoryIbEESt8functionIFS2_PS7_EEE3$_0E9_M_invokeERKSt9_Any_dataOS9_OSC_+0x76): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, absl::lts_20230802::Span<unsigned long const>), stream_executor::gpu::GpuCommandBuffer::IfElse(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, stream_executor::StreamExecutor*, stream_executor::DeviceMemory<bool>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>)::$_0>::_M_invoke(std::_Any_data const&, tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>&&, absl::lts_20230802::Span<unsigned long const>&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEN3tsl3gtl7IntTypeIN15stream_executor13CommandBuffer21ExecutionScopeId_tag_ElEENS1_4SpanIKmEEEZNS6_3gpu16GpuCommandBuffer6IfElseES9_PNS6_14StreamExecutorENS6_12DeviceMemoryIbEESt8functionIFS2_PS7_EESN_E3$_0E9_M_invokeERKSt9_Any_dataOS9_OSC_+0x76): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, absl::lts_20230802::Span<unsigned long const>), stream_executor::gpu::GpuCommandBuffer::Case(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, stream_executor::StreamExecutor*, stream_executor::DeviceMemory<int>, std::vector<std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>, std::allocator<std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)> > >)::$_0>::_M_invoke(std::_Any_data const&, tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>&&, absl::lts_20230802::Span<unsigned long const>&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEN3tsl3gtl7IntTypeIN15stream_executor13CommandBuffer21ExecutionScopeId_tag_ElEENS1_4SpanIKmEEEZNS6_3gpu16GpuCommandBuffer4CaseES9_PNS6_14StreamExecutorENS6_12DeviceMemoryIiEESt6vectorISt8functionIFS2_PS7_EESaISO_EEE3$_0E9_M_invokeERKSt9_Any_dataOS9_OSC_+0x150): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (stream_executor::CommandBuffer*, unsigned long), stream_executor::gpu::GpuCommandBuffer::For(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, stream_executor::StreamExecutor*, int, stream_executor::DeviceMemory<int>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>)::$_1>::_M_invoke(std::_Any_data const&, stream_executor::CommandBuffer*&&, unsigned long&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEPN15stream_executor13CommandBufferEmEZNS3_3gpu16GpuCommandBuffer3ForEN3tsl3gtl7IntTypeINS4_21ExecutionScopeId_tag_ElEEPNS3_14StreamExecutorEiNS3_12DeviceMemoryIiEESt8functionIFS2_S5_EEE3$_1E9_M_invokeERKSt9_Any_dataOS5_Om+0xb6): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, absl::lts_20230802::Span<unsigned long const>), stream_executor::gpu::GpuCommandBuffer::For(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, stream_executor::StreamExecutor*, int, stream_executor::DeviceMemory<int>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>)::$_0>::_M_invoke(std::_Any_data const&, tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>&&, absl::lts_20230802::Span<unsigned long const>&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEN3tsl3gtl7IntTypeIN15stream_executor13CommandBuffer21ExecutionScopeId_tag_ElEENS1_4SpanIKmEEEZNS6_3gpu16GpuCommandBuffer3ForES9_PNS6_14StreamExecutorEiNS6_12DeviceMemoryIiEESt8functionIFS2_PS7_EEE3$_0E9_M_invokeERKSt9_Any_dataOS9_OSC_+0x81): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o):gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEPN15stream_executor13CommandBufferEmEZNS3_3gpu16GpuCommandBuffer5WhileEN3tsl3gtl7IntTypeINS4_21ExecutionScopeId_tag_ElEEPNS3_14StreamExecutorENS3_12DeviceMemoryIbEESt8functionIFS2_SD_S5_EESI_IFS2_S5_EEE3$_1E9_M_invokeERKSt9_Any_dataOS5_Om+0xff): more undefined references to `stream_executor::KernelMetadata::shared_memory_bytes() const' follow
clang: error: linker command failed with exit code 1 (use -v to see invocation)
INFO: Elapsed time: 2349.613s, Critical Path: 522.55s
INFO: 39625 processes: 16490 internal, 1 local, 23134 processwrapper-sandbox.
FAILED: Build did NOT complete successfully
@MuYu-zhi

I have hit the same error (see #10592) and am still awaiting a response.

@jtotzid

jtotzid commented May 1, 2024

Same problem...
If I had to guess, I would say there's a dependency declaration missing somewhere, but Bazel is black magic I dare not look at.

@jtotzid

jtotzid commented May 1, 2024

Something like

diff --git a/xla/stream_executor/cuda/BUILD b/xla/stream_executor/cuda/BUILD
index 2212fb622..bea1e01b9 100644
--- a/xla/stream_executor/cuda/BUILD
+++ b/xla/stream_executor/cuda/BUILD
@@ -75,6 +75,8 @@ cuda_only_cc_library(
             "//xla/stream_executor",
             "//xla/stream_executor:platform_manager",
             "//xla/stream_executor:stream_executor_interface",
+            "//xla/stream_executor:executor_cache",
+            "//xla/stream_executor:kernel",
             "//xla/stream_executor/gpu:gpu_driver_header",
             "//xla/stream_executor/gpu:gpu_executor_header",
             "//xla/stream_executor/platform",
diff --git a/xla/stream_executor/gpu/BUILD b/xla/stream_executor/gpu/BUILD
index f0843969d..348f89528 100644
--- a/xla/stream_executor/gpu/BUILD
+++ b/xla/stream_executor/gpu/BUILD
@@ -153,6 +153,7 @@ gpu_only_cc_library(
         ":gpu_types_header",
         "//xla/stream_executor",
         "//xla/stream_executor:stream_executor_interface",
+        "//xla/stream_executor:kernel",
         "@com_google_absl//absl/container:flat_hash_map",
         "@com_google_absl//absl/container:inlined_vector",
         "@com_google_absl//absl/functional:any_invocable",

gets pretty far, but eventually fails at link time as well with:

ERROR: /xla/xla/tests/BUILD:2472:12: Linking xla/tests/local_client_aot_test failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command (from target //xla/tests:local_client_aot_test) external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-opt/bin/xla/tests/local_client_aot_test-2.params

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
/usr/bin/ld: bazel-out/k8-opt/bin/external/tsl/tsl/profiler/backends/cpu/libtraceme_recorder_impl.lo(traceme_recorder.o): in function `void __gnu_cxx::new_allocator<tsl::profiler::TraceMeRecorder::ThreadLocalRecorder>::construct<tsl::profiler::TraceMeRecorder::ThreadLocalRecorder>(tsl::profiler::TraceMeRecorder::ThreadLocalRecorder*)':
traceme_recorder.cc:(.text._ZN9__gnu_cxx13new_allocatorIN3tsl8profiler15TraceMeRecorder19ThreadLocalRecorderEE9constructIS4_JEEEvPT_DpOT0_[_ZN9__gnu_cxx13new_allocatorIN3tsl8profiler15TraceMeRecorder19ThreadLocalRecorderEE9constructIS4_JEEEvPT_DpOT0_]+0x6a): undefined reference to `tsl::Env::Default()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::gpu::GpuCommandBuffer::Trace(stream_executor::Stream*, absl::lts_20230802::AnyInvocable<absl::lts_20230802::Status ()>)':
gpu_command_buffer.cc:(.text._ZN15stream_executor3gpu16GpuCommandBuffer5TraceEPNS_6StreamEN4absl12lts_2023080212AnyInvocableIFNS5_6StatusEvEEE+0x82): undefined reference to `tsl::Env::Default()'
/usr/bin/ld: gpu_command_buffer.cc:(.text._ZN15stream_executor3gpu16GpuCommandBuffer5TraceEPNS_6StreamEN4absl12lts_2023080212AnyInvocableIFNS5_6StatusEvEEE+0x10d): undefined reference to `tsl::Env::Default()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::gpu::GpuCommandBuffer::Finalize()':
gpu_command_buffer.cc:(.text._ZN15stream_executor3gpu16GpuCommandBuffer8FinalizeEv+0x273): undefined reference to `tsl::Env::Default()'
/usr/bin/ld: gpu_command_buffer.cc:(.text._ZN15stream_executor3gpu16GpuCommandBuffer8FinalizeEv+0x2be): undefined reference to `tsl::Env::Default()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_driver_cuda_only.a(cuda_driver.o):cuda_driver.cc:(.text._ZN15stream_executor3gpu9GpuDriver18GraphDebugDotPrintB5cxx11EP10CUgraph_stPKcb+0x93): more undefined references to `tsl::Env::Default()' follow

i.e., was tsl/platform/default/* not compiled?
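
When a link step fails like this, it can help to collapse the linker output into "which symbol is missing, and which archive(object) asks for it", since that points at the library whose deps are probably incomplete. Below is a minimal sketch, assuming GNU ld's message format as it appears in the logs in this thread; the function name `summarize_undefined` and the shortened sample log are illustrative only and are not part of XLA or Bazel.

```python
import re
from collections import defaultdict

# Tail of a GNU ld diagnostic, e.g.
#   ...: undefined reference to `tsl::Env::Default()'
#   ...: more undefined references to `tsl::Env::Default()' follow
UNDEFINED_RE = re.compile(r"undefined references? to `(?P<symbol>.+?)'(?: follow)?\s*$")

# The "archive(object)" pair ld prints when it names the referencing object file.
OBJECT_RE = re.compile(r"(?P<archive>[^\s:()]+\.(?:a|lo|so))\((?P<object>[^()]+\.o)\)")


def summarize_undefined(log_text):
    """Map each undefined symbol to the set of archive(object) pairs that reference it."""
    summary = defaultdict(set)
    current_object = None
    for line in log_text.splitlines():
        obj = OBJECT_RE.search(line)
        if obj:
            # Remember the most recent object; follow-up lines omit it.
            current_object = f"{obj.group('archive')}({obj.group('object')})"
        undefined = UNDEFINED_RE.search(line)
        if undefined and current_object:
            summary[undefined.group("symbol")].add(current_object)
    return dict(summary)


# Shortened sample in the same shape as the log above (paths trimmed for readability).
SAMPLE_LOG = "\n".join([
    "/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::gpu::GpuCommandBuffer::Finalize()':",
    "gpu_command_buffer.cc:(.text+0x273): undefined reference to `tsl::Env::Default()'",
    "/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_driver_cuda_only.a(cuda_driver.o):cuda_driver.cc:(.text+0x93): more undefined references to `tsl::Env::Default()' follow",
])

for symbol, objects in summarize_undefined(SAMPLE_LOG).items():
    print(symbol)
    for name in sorted(objects):
        print("  referenced by", name)
```

The summary only tells you who references the symbol; finding the target that defines it still requires looking at the BUILD files (or a Bazel query), which this sketch does not attempt.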

@neeldani

neeldani commented May 30, 2024

@pxanthopoulos were you able to find a solution? I am facing the same error when trying to build XLA from source for GPU:

Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Reading 'startup' options from /users/neeld2/xla/.bazelrc: --windows_enable_symlinks
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=198
INFO: Reading rc options for 'build' from /users/neeld2/xla/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /users/neeld2/xla/.bazelrc:
  'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --features=-force_no_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --experimental_link_static_libraries_once=false --incompatible_enforce_config_setting_visibility
INFO: Reading rc options for 'build' from /users/neeld2/xla/xla_configure.bazelrc:
  'build' options: --action_env CLANG_COMPILER_PATH=/usr/lib/llvm-17/bin/clang --repo_env CC=/usr/lib/llvm-17/bin/clang --repo_env BAZEL_COMPILER=/usr/lib/llvm-17/bin/clang --config nvcc_clang --action_env CLANG_CUDA_COMPILER_PATH=/usr/lib/llvm-17/bin/clang --action_env CUDA_TOOLKIT_PATH=/usr/local/cuda-12.3 --action_env TF_CUBLAS_VERSION=12.3.2 --action_env TF_CUDA_COMPUTE_CAPABILITIES=6.0 --action_env TF_CUDNN_VERSION=8 --repo_env TF_NEED_TENSORRT=0 --config nonccl --action_env LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64:/usr/local/cuda-12.3/lib64 --action_env PYTHON_BIN_PATH=/usr/bin/python --python_path /usr/bin/python --copt -Wno-sign-compare --copt -Wno-error=unused-command-line-argument --copt -Wno-gnu-offsetof-extensions --build_tag_filters -no_oss --test_tag_filters -no_oss
INFO: Found applicable config definition build:short_logs in file /users/neeld2/xla/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /users/neeld2/xla/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:nvcc_clang in file /users/neeld2/xla/.bazelrc: --config=cuda --action_env=TF_CUDA_CLANG=1 --action_env=TF_NVCC_CLANG=1 --@local_config_cuda//:cuda_compiler=nvcc
INFO: Found applicable config definition build:cuda in file /users/neeld2/xla/.bazelrc: --repo_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain --@local_config_cuda//:enable_cuda
INFO: Found applicable config definition build:nonccl in file /users/neeld2/xla/.bazelrc: --define=no_nccl_support=true
INFO: Found applicable config definition build:monolithic in file /users/neeld2/xla/.bazelrc: --define framework_shared_object=false --define tsl_protobuf_header_only=false --experimental_link_static_libraries_once=false
INFO: Found applicable config definition build:linux in file /users/neeld2/xla/.bazelrc: --host_copt=-w --copt=-Wno-all --copt=-Wno-extra --copt=-Wno-deprecated --copt=-Wno-deprecated-declarations --copt=-Wno-ignored-attributes --copt=-Wno-array-bounds --copt=-Wunused-result --copt=-Werror=unused-result --copt=-Wswitch --copt=-Werror=switch --copt=-Wno-error=unused-but-set-variable --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=dynamic_kernels --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /users/neeld2/xla/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
DEBUG: /users/neeld2/xla/third_party/py/python_repo.bzl:98:14: 
HERMETIC_PYTHON_VERSION variable was not set correctly, using default version.
Python 3.11 will be used.
To select Python version, either set HERMETIC_PYTHON_VERSION env variable in
your shell:
  export HERMETIC_PYTHON_VERSION=3.12
OR pass it as an argument to bazel command directly or inside your .bazelrc
file:
  --repo_env=HERMETIC_PYTHON_VERSION=3.12
DEBUG: /users/neeld2/xla/third_party/py/python_repo.bzl:109:10: Using hermetic Python 3.11
DEBUG: /users/neeld2/xla/third_party/repo.bzl:132:14: 
Warning: skipping import of repository 'llvm-raw' because it already exists.
DEBUG: /users/neeld2/.cache/bazel/_bazel_neeld2/1a2b1acac21e9debfa6c46a0a26cdb69/external/tsl/third_party/repo.bzl:132:14: 
Warning: skipping import of repository 'nvtx_archive' because it already exists.
DEBUG: /users/neeld2/xla/third_party/repo.bzl:132:14: 
Warning: skipping import of repository 'jsoncpp_git' because it already exists.
DEBUG: /users/neeld2/.cache/bazel/_bazel_neeld2/1a2b1acac21e9debfa6c46a0a26cdb69/external/bazel_tools/tools/cpp/lib_cc_configure.bzl:118:10: 
Auto-Configuration Warning: 'TMP' environment variable is not set, using 'C:\Windows\Temp' as default
DEBUG: /users/neeld2/.cache/bazel/_bazel_neeld2/1a2b1acac21e9debfa6c46a0a26cdb69/external/bazel_tools/tools/cpp/lib_cc_configure.bzl:118:10: 
Auto-Configuration Warning: 'TMP' environment variable is not set, using 'C:\Windows\Temp' as default
ERROR: /users/neeld2/xla/xla/tsl/cuda/BUILD.bazel:278:11: no such target '@local_config_nccl//:nccl_headers': target 'nccl_headers' not declared in package '' defined by /users/neeld2/.cache/bazel/_bazel_neeld2/1a2b1acac21e9debfa6c46a0a26cdb69/external/local_config_nccl/BUILD (Tip: use `query "@local_config_nccl//:*"` to see all the targets in that package) and referenced by '//xla/tsl/cuda:nccl_stub'
INFO: Repository boringssl instantiated at:
  /users/neeld2/xla/WORKSPACE:46:15: in <toplevel>
  /users/neeld2/xla/workspace2.bzl:135:21: in workspace
  /users/neeld2/xla/workspace2.bzl:64:20: in _tf_repositories
  /users/neeld2/xla/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
  /users/neeld2/xla/third_party/repo.bzl:89:35: in <toplevel>
ERROR: Analysis of target '//xla/tsl/cuda:nccl_stub' failed; build aborted: Analysis failed
INFO: Elapsed time: 51.639s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (283 packages loaded, 18469 targets configured)
    currently loading: @upb//
    Fetching repository @pypi_lit; starting 11s
    Fetching repository @double_conversion; starting
    Fetching https://storage.googleapis.com/mirror.tensorflow.org/github.com/google/boringssl/archive/c00d7ca810e93780bd0c8ee4eea28f4f2ea4bcdc.tar.gz; 11.5 MiB (27.8%)
    Fetching /users/neeld2/.cache/bazel/_bazel_neeld2/1a2b1acac21e9debfa6c46a0a26cdb69/external/double_conversion; Extracting v3.2.0.tar.gz
    Fetching repository @curl; starting
    Fetching /users/neeld2/.cache/bazel/_bazel_neeld2/1a2b1acac21e9debfa6c46a0a26cdb69/external/curl; Extracting curl-8.4.0.tar.gz
    Fetching repository @scip; Restarting.

I tried passing the --config monolithic option, but it didn't work.

@jtotzid

jtotzid commented May 31, 2024

@neeldani What does your configure step look like? It should be something like ./configure.py --backend=CUDA --nccl

@neeldani

This worked, thank you!
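
For anyone landing here with the same `nccl_headers` analysis error: the log above shows `--config nonccl` being read from xla_configure.bazelrc, and re-running configure with --nccl made the error go away. A quick way to check a generated bazelrc before starting a long build might look like the sketch below. It assumes (this is an inference from the log, not something confirmed by the XLA docs in this thread) that the presence of `--config nonccl` is what distinguishes the failing configuration; the function name `nccl_disabled` is hypothetical.

```python
import re

# Matches "--config nonccl" and "--config=nonccl" as whole tokens.
NONCCL_RE = re.compile(r"--config[= ]nonccl(?=\s|$)")


def nccl_disabled(bazelrc_text):
    """Return True if any non-comment line of the bazelrc text selects the nonccl config."""
    for line in bazelrc_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # Ignore commented-out options.
        if NONCCL_RE.search(stripped):
            return True
    return False


# Example with inline text; in practice you would pass the contents of
# xla_configure.bazelrc from your checkout (path assumed from the log above).
example = "build --config nvcc_clang\nbuild --config nonccl\n"
if nccl_disabled(example):
    print("nonccl is set; consider re-running ./configure.py --backend=CUDA --nccl")
```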

@Alon-Lau

Alon-Lau commented Jul 5, 2024

Quoting @jtotzid's May 1 comment above (the BUILD patch that "gets pretty far" but then fails to link xla/tests/local_client_aot_test with undefined references to `tsl::Env::Default()'):

So how can this linker error be resolved? I get the same error: undefined reference to `tsl::Env::Default()'
