Fix link in Readme #2

bhack · 2022-08-09T22:56:36Z

No description provided.

PiperOrigin-RevId: 515308128

#MIGRATION_3P_TRITON__GIT_TO_THIRD_PARTY
# Commits integrated
- 726bdb984f2bcb48adfaa341ee7b0263be227b98 [FRONTEND][BACKEND] Fix constexpr assignment ; revert #24... by Zahi Moudallal <128723247+zahimoud@users.noreply.github.com>
- 87a223d76fe32a28ca563c94215a95f505794c6d bump triton_shared (#2501) by Maksim Levental <maksim.levental@gmail.com>
- 721897fcc4f942aa97d2e9ba3787a5e213758177 upgrade llvm to `b1115f8c` (NFC) (#2403) by Mehdi Amini <mamini@nvidia.com>
- 05dc28be0e72dd496300a31b99a21a5a5118f8e9 [CI] refactor workflows (#2504) by Philippe Tillet <phil@openai.com>
- 376acb610b5888263ee61713ff0a71e1d5908d69 [BUILD] Fix macos x86 build (#2505) by Thomas Raoux <thomas.raoux@openai.com>
- 768fc1fcd98ecfc0892f8982b0bb009dd7bb11ea [FRONTEND] change hash to not require ptxas (#2476) by ian Bearman <ianb@microsoft.com>
- e36d1665ca2f816212fc80ee2633caa66a0066bf [BACKEND] Fix unsupported view op created during optimiza... by Thomas Raoux <thomas.raoux@openai.com>
- a980ec50f1ed3176e2603c25f73f0ddc031cf1d8 [BACKEND] Fixing f8e5m2 to bf16 conversion on A100 (#2508) by Zahi Moudallal <128723247+zahimoud@users.noreply.github.com>
- a4f373938c9a4ba67105c5394c168945af4c990e [RUNTIME] Filter out paths that don't exist in json group... by Horace He <chilli@meta.com>
- be1de890e1f9bdf0910521b5a536c332a1c1aa2f [BACKEND] Replace assert(0) with llvm::report_fatal_error... by Keren Zhou <kerenzhou@openai.com>
- 0d57820be9ca360cf62cc3a7dc21aecc45a1c53a update triton-shared ref (#2506) by ian Bearman <ianb@microsoft.com>
- bdf464e4a8f80ad6bd6a7b470cb3d36efd61c8a2 Make kernel_static_print test work when called twice. (#2... by Justin Lebar <justin.lebar@gmail.com>
- 30186f401ec52d9addac79a60f418792875f7d11 Fix segfault in assertion test. (#2520) by Justin Lebar <justin.lebar@gmail.com>
- dc9e3063d73d2410e1855e1ff258aa90a6158548 [HOPPER] Move to tl.make_block_ptr in flash_attention bac... by runseny <145632023+runseny@users.noreply.github.com>
- b0c166b9e3f2f58c0906fa41f261787ebf3fef0d [BACKEND] Fixing bug in elementwise conversion (#2517) by Zahi Moudallal <128723247+zahimoud@users.noreply.github.com>
- 4f4c07e7d586aae3daa802ce86a9aa935f8cda17 [CI] add text file containing LLVM commit hash by Ashay Rane <ashay@users.noreply.github.com>
- 7af27fadee0fce2218a1353feea2f76ea25ad005 update hash to 76ce4736721a by Phil Tillet <phil@openai.com>
- f192611ff3bdacb8d1d1cad084dfe4cd277a0ec9 Bump LLVM version to https://github.com/llvm/llvm-project... by Goran Flegar <gflegar@google.com>
PiperOrigin-RevId: 576212898

Imported from GitHub PR #6599
FP8 cublasLt matmul uses fast accumulation when the precision of both operands is DEFAULT. Otherwise, it falls back to high-precision accumulation.
Issue #6168
This PR is closely related to Flax PR google/flax#3416.
Copybara import of the project:
-- a4140da by shuw <shuw@nvidia.com>: Add FP8 fast accumulation support for cublasLt.
-- 9684568 by shuw <shuw@nvidia.com>: Improve based on review #1
-- e906d76 by shuw <shuw@nvidia.com>: Improve based on review #2
Merging this change closes #6599
FUTURE_COPYBARA_INTEGRATE_REVIEW=#6599 from wenscarl:fp8_fast_accumulation e906d76
PiperOrigin-RevId: 578904075

Imported from GitHub PR #6599
FP8 cublasLt matmul uses fast accumulation when the precision of both operands is DEFAULT. Otherwise, it falls back to high-precision accumulation.
Issue #6168
This PR is closely related to Flax PR google/flax#3416.
Copybara import of the project:
-- a4140da by shuw <shuw@nvidia.com>: Add FP8 fast accumulation support for cublasLt.
-- 9684568 by shuw <shuw@nvidia.com>: Improve based on review #1
-- e906d76 by shuw <shuw@nvidia.com>: Improve based on review #2
Merging this change closes #6599
COPYBARA_INTEGRATE_REVIEW=#6599 from wenscarl:fp8_fast_accumulation e906d76
PiperOrigin-RevId: 578948593
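The precision rule in PR #6599 boils down to a small predicate. This is a minimal sketch, not XLA's code: the `Precision` enum and `UseFp8FastAccumulation` function are hypothetical stand-ins for the per-operand values carried in XLA's `PrecisionConfig`. The resulting flag would then be handed to cuBLASLt, which exposes FP8 fast accumulation through the `CUBLASLT_MATMUL_DESC_FAST_ACCUM` matmul-descriptor attribute.

```cpp
#include <cassert>

// Hypothetical mirror of the per-operand precision setting
// (DEFAULT, HIGH, HIGHEST); the real values come from XLA's PrecisionConfig.
enum class Precision { kDefault, kHigh, kHighest };

// Returns true when FP8 fast accumulation may be used: only if BOTH operands
// request DEFAULT precision. If either operand asks for more, the matmul
// falls back to high-precision accumulation.
bool UseFp8FastAccumulation(Precision lhs, Precision rhs) {
  return lhs == Precision::kDefault && rhs == Precision::kDefault;
}
```

Requiring both operands to agree means a single HIGHEST annotation on either side is enough to opt out of the faster, less precise path.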

…art #2 PiperOrigin-RevId: 598915673

…art #2 PiperOrigin-RevId: 599037622

PiperOrigin-RevId: 629829581

PiperOrigin-RevId: 629867362

… to Initialize()
Imported from GitHub PR #12228
The first time a NormThunk is executed, it builds a cuDNN execution plan. This build step can hang if an NCCL collective is running at the same time. To fix this, I've moved the build step into thunk initialization. We have only observed this hang with cuDNN 9.
Here's a backtrace of the hang that this change fixes:
```
Thread 585 (Thread 0x7fb9391ff640 (LWP 41364) "main.py"):
#0 0x00007fd3d17cffd9 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fd3d17da24f in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007fd070967dfe in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007fd0709c928a in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007f1970d76102 in ?? () from /lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.1.0
#5 0x00007f1970f2c999 in ?? () from /lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.1.0
#6 0x00007f1970a7d4ab in ?? () from /lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.1.0
#7 0x00007f1970d0a9cb in ?? () from /lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.1.0
#8 0x00007fce60b2a98c in cudnn::backend::ExecutionPlan::finalize_internal() () from /lib/x86_64-linux-gnu/libcudnn_graph.so.9.1.0
#9 0x00007fce60aefbb1 in cudnn::backend::Descriptor::finalize() () from /lib/x86_64-linux-gnu/libcudnn_graph.so.9.1.0
#10 0x00007fce60b15bec in cudnnBackendFinalize () from /lib/x86_64-linux-gnu/libcudnn_graph.so.9.1.0
#11 0x00007fd2521b8f39 in cudnn_frontend::ExecutionPlanBuilder_v8::build() () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so
#12 0x00007fd2521734ba in stream_executor::gpu::(anonymous namespace)::GetExecPlanFromHeuristics(cudnn_frontend::OperationGraph_v8&&, stream_executor::gpu::(anonymous namespace)::CudnnHandle const&, bool) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so
#13 0x00007fd25216ff9b in stream_executor::gpu::CudnnSupport::NormRunnerFromDesc(stream_executor::Stream*, stream_executor::dnn::AlgorithmDesc const&, stream_executor::dnn::NormKind, double, stream_executor::dnn::TensorDescriptor const&, stream_executor::dnn::TensorDescriptor const&, stream_executor::dnn::TensorDescriptor const&, std::optional<stream_executor::dnn::TensorDescriptor>, std::optional<stream_executor::dnn::TensorDescriptor>, std::optional<stream_executor::dnn::TensorDescriptor>, std::optional<stream_executor::dnn::TensorDescriptor>, std::optional<stream_executor::dnn::TensorDescriptor>, std::optional<stream_executor::dnn::TensorDescriptor>) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so
#14 0x00007fd24e36b88b in stream_executor::dnn::NormOp::RunnerFromAlgorithmDesc(stream_executor::dnn::AlgorithmDesc const&, stream_executor::dnn::NormOp::Config, stream_executor::Stream*) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so
#15 0x00007fd24e36ae37 in stream_executor::dnn::LazyOpRunner<stream_executor::dnn::NormOp>::GetOrCreateRunner(stream_executor::dnn::NormOp::Config, stream_executor::Stream*)::{lambda()#1}::operator()() const () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so
#16 0x00007fd24e36adbc in void absl::lts_20230802::base_internal::CallOnceImpl<stream_executor::dnn::LazyOpRunner<stream_executor::dnn::NormOp>::GetOrCreateRunner(stream_executor::dnn::NormOp::Config, stream_executor::Stream*)::{lambda()#1}>(std::atomic<unsigned int>*, absl::lts_20230802::base_internal::SchedulingMode, stream_executor::dnn::LazyOpRunner<stream_executor::dnn::NormOp>::GetOrCreateRunner(stream_executor::dnn::NormOp::Config, stream_executor::Stream*)::{lambda()#1}&&) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so
#17 0x00007fd24e36a9bd in stream_executor::dnn::LazyOpRunner<stream_executor::dnn::NormOp>::GetOrCreateRunner(stream_executor::dnn::NormOp::Config, stream_executor::Stream*) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so
#18 0x00007fd24e369d29 in xla::gpu::RunGpuNorm(xla::gpu::GpuNormConfig const&, stream_executor::DeviceMemoryBase const&, stream_executor::DeviceMemoryBase const&, stream_executor::DeviceMemoryBase const&, std::optional<stream_executor::DeviceMemoryBase>, std::optional<stream_executor::DeviceMemoryBase>, std::optional<stream_executor::DeviceMemoryBase>, std::optional<stream_executor::DeviceMemoryBase>, std::optional<stream_executor::DeviceMemoryBase>, std::optional<stream_executor::DeviceMemoryBase>, stream_executor::DeviceMemoryBase const&, stream_executor::Stream*, xla::gpu::RunNormOptions) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so
#19 0x00007fd24e368be6 in xla::gpu::NormThunk::ExecuteOnStream(xla::gpu::Thunk::ExecuteParams const&) () from /usr/local/lib/python3.10/dist-packages/jaxlib/xla_extension.so
```
Copybara import of the project:
-- f535330 by Trevor Morris <tmorris@nvidia.com>: Fix hang with cudnn layer norm by moving cudnn init to Initialize()
Merging this change closes #12228
COPYBARA_INTEGRATE_REVIEW=#12228 from trevor-m:tmorris-norm-init f535330
PiperOrigin-RevId: 633220207
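The fix moves an expensive one-time build out of the execution path and into initialization; the backtrace shows the plan being built lazily (via `LazyOpRunner::GetOrCreateRunner`) from inside `NormThunk::ExecuteOnStream`. A minimal sketch of the eager pattern follows, assuming that `Initialize()` runs in a phase where no NCCL collective is in flight. `NormThunkSketch`, `ExecutionPlan` and `BuildPlan` are hypothetical names, not XLA's `NormThunk` API.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for the cuDNN execution plan; building the real one
// is the step that was observed to hang.
struct ExecutionPlan {
  int id;
};

// Sketch of the fix: the plan is built eagerly in Initialize() instead of
// lazily on the first ExecuteOnStream() call.
class NormThunkSketch {
 public:
  // Builds the plan once; calling Initialize() again is a no-op.
  void Initialize() {
    if (!plan_) plan_ = BuildPlan();
  }

  // Hot path: uses the prebuilt plan and never builds one itself.
  int ExecuteOnStream() const {
    if (!plan_) throw std::logic_error("Initialize() was not called");
    return plan_->id;
  }

  int num_builds() const { return num_builds_; }

 private:
  std::unique_ptr<ExecutionPlan> BuildPlan() {
    ++num_builds_;
    return std::make_unique<ExecutionPlan>(ExecutionPlan{42});
  }

  std::unique_ptr<ExecutionPlan> plan_;
  int num_builds_ = 0;
};
```

Making `ExecuteOnStream()` fail loudly when the plan is missing, rather than silently building it, keeps the expensive step from creeping back onto the execution path.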

Use std::aligned_storage_t trick to avoid default-initializing Node struct on a hot path.
name                                     old cpu/op   new cpu/op   delta
BM_SelectAndScatterF32/128/process_time   791µs ± 4%   720µs ± 2%  -8.93%
BM_SelectAndScatterF32/256/process_time  3.20ms ± 4%  2.96ms ± 2%  -7.46%
BM_SelectAndScatterF32/512/process_time  13.7ms ± 5%  12.8ms ± 2%  -6.80%
name                                     old time/op  new time/op  delta
BM_SelectAndScatterF32/128/process_time   790µs ± 5%   719µs ± 1%  -9.00%
BM_SelectAndScatterF32/256/process_time  3.20ms ± 3%  2.96ms ± 1%  -7.58%
BM_SelectAndScatterF32/512/process_time  13.2ms ± 4%  12.3ms ± 1%  -6.82%
PiperOrigin-RevId: 657741110

Use std::aligned_storage_t trick to avoid default-initializing Node struct on a hot path.
name                                     old cpu/op   new cpu/op   delta
BM_SelectAndScatterF32/128/process_time   791µs ± 4%   720µs ± 2%  -8.93%
BM_SelectAndScatterF32/256/process_time  3.20ms ± 4%  2.96ms ± 2%  -7.46%
BM_SelectAndScatterF32/512/process_time  13.7ms ± 5%  12.8ms ± 2%  -6.80%
name                                     old time/op  new time/op  delta
BM_SelectAndScatterF32/128/process_time   790µs ± 5%   719µs ± 1%  -9.00%
BM_SelectAndScatterF32/256/process_time  3.20ms ± 3%  2.96ms ± 1%  -7.58%
BM_SelectAndScatterF32/512/process_time  13.2ms ± 4%  12.3ms ± 1%  -6.82%
PiperOrigin-RevId: 658139935
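A minimal sketch of the trick named in the commit title, under stated assumptions: the commit shows neither the `Node` struct nor the SelectAndScatter loop, so `Node`, `UninitializedNodes`, `Emplace` and `At` are hypothetical. Raw aligned storage skips the per-element default constructor that a plain `Node nodes[N]` array would run, so a slot pays for construction only when it is actually used. The sketch needs C++17 for `std::launder`; note also that `std::aligned_storage_t` is deprecated as of C++23, where an `alignas(Node)` byte array is the usual replacement.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>

// Hypothetical node type. Default member initializers give it a non-trivial
// default constructor, so `Node nodes[N];` would initialize every element.
struct Node {
  float value = 0.0f;
  std::size_t index = 0;
};

// Fixed-capacity array of raw, suitably aligned slots. Declaring it runs no
// Node constructors; a slot is initialized only when Emplace() is called.
template <std::size_t N>
class UninitializedNodes {
 public:
  Node* Emplace(std::size_t i, float value) {
    assert(i < N);
    // Placement new constructs the Node directly in the raw slot.
    return ::new (static_cast<void*>(&storage_[i])) Node{value, i};
  }

  // Only valid for slots already filled by Emplace(). Node is trivially
  // destructible, so no explicit destructor calls are needed.
  Node& At(std::size_t i) {
    assert(i < N);
    return *std::launder(reinterpret_cast<Node*>(&storage_[i]));
  }

 private:
  std::aligned_storage_t<sizeof(Node), alignof(Node)> storage_[N];
};
```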

…r RunBackend. Both of these call into LLVM code that reads the compiler options. Fixes the following race:
```
WARNING: ThreadSanitizer: data race (pid=869815)
Read of size 1 at 0x7f8b24effc08 by thread T65:
    #0 llvm::cl::opt_storage<bool, false, false>::getValue() const /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1406:38 (xla_extension.so+0xa281417) (BuildId: 7f5d2098f168c4db)
    #1 llvm::cl::opt_storage<bool, false, false>::operator bool() const /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1410:38 (xla_extension.so+0xa281417)
    #2 llvm::CodeGenTargetMachineImpl::CodeGenTargetMachineImpl(llvm::Target const&, llvm::StringRef, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, llvm::Reloc::Model, llvm::CodeModel::Model, llvm::CodeGenOptLevel) /proc/self/cwd/external/llvm-project/llvm/lib/CodeGen/CodeGenTargetMachineImpl.cpp:97:7 (xla_extension.so+0xa281417)
    #3 llvm::X86TargetMachine::X86TargetMachine(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) /proc/self/cwd/external/llvm-project/llvm/lib/Target/X86/X86TargetMachine.cpp:236:7 (xla_extension.so+0x9803b80) (BuildId: 7f5d2098f168c4db)
    #4 llvm::RegisterTargetMachine<llvm::X86TargetMachine>::Allocator(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) /proc/self/cwd/external/llvm-project/llvm/include/llvm/MC/TargetRegistry.h:1258:16 (xla_extension.so+0x980757a) (BuildId: 7f5d2098f168c4db)
    #5 llvm::Target::createTargetMachine(llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) const /proc/self/cwd/external/llvm-project/llvm/include/llvm/MC/TargetRegistry.h:462:12 (xla_extension.so+0x94ba529) (BuildId: 7f5d2098f168c4db)
    #6 llvm::EngineBuilder::selectTarget(llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::SmallVectorImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const&) /proc/self/cwd/external/llvm-project/llvm/lib/ExecutionEngine/TargetSelect.cpp:88:18 (xla_extension.so+0x94ba529)
    #7 xla::cpu::JitCompiler::InferTargetMachine(llvm::TargetOptions const&, llvm::CodeGenOptLevel, std::optional<tsl::port::CPUFeature>) /proc/self/cwd/external/xla/xla/backends/cpu/codegen/jit_compiler.cc:88:12 (xla_extension.so+0x48d070f) (BuildId: 7f5d2098f168c4db)
    #8 xla::cpu::CpuCompiler::RunHloPasses(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /proc/self/cwd/external/xla/xla/service/cpu/cpu_compiler.cc:1017:3 (xla_extension.so+0x2f6dc47) (BuildId: 7f5d2098f168c4db)
    #9 xla::JitCompile(xla::XlaComputation const&, absl::lts_20230802::Span<xla::Shape const* const>, xla::ExecutableBuildOptions const&, xla::ExecutionOptions const&, xla::Compiler::CompileOptions const&, int, std::function<void (xla::HloModuleConfig&)>) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:749:3 (xla_extension.so+0x2f127e2) (BuildId: 7f5d2098f168c4db)
    #10 xla::TfrtCpuClient::Compile(xla::XlaComputation const&, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:842:3 (xla_extension.so+0x2f127e2)
    #11 xla::TfrtCpuClient::Compile(mlir::ModuleOp, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:888:10 (xla_extension.so+0x2f13da2) (BuildId: 7f5d2098f168c4db)
    #12 xla::ifrt::PjRtLoadedExecutable::Create(xla::ifrt::PjRtCompatibleClient*, mlir::ModuleOp, xla::CompileOptions, std::vector<tsl::RCReference<xla::ifrt::LoadedHostCallback>, std::allocator<tsl::RCReference<xla::ifrt::LoadedHostCallback>>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_executable.cc:258:3 (xla_extension.so+0xdd04d77) (BuildId: 7f5d2098f168c4db)
    #13 xla::ifrt::PjRtCompiler::Compile(std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_compiler.cc:97:10 (xla_extension.so+0xdcfd29b) (BuildId: 7f5d2098f168c4db)
    #14 xla::PyClient::CompileIfrtProgram(xla::nb_class_ptr<xla::PyClient>, std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:443:5 (xla_extension.so+0xc62a228) (BuildId: 7f5d2098f168c4db)
    #15 xla::PyClient::Compile(xla::nb_class_ptr<xla::PyClient>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, xla::CompileOptions, std::vector<nanobind::capsule, std::allocator<nanobind::capsule>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:466:10 (xla_extension.so+0xc62b514) (BuildId: 7f5d2098f168c4db)
Previous write of size 1 at 0x7f8b24effc08 by thread T66 (mutexes: write M0):
    #0 void llvm::cl::opt_storage<bool, false, false>::setValue<bool>(bool const&, bool) /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1401:11 (xla_extension.so+0x100bace9) (BuildId: 7f5d2098f168c4db)
    #1 void llvm::cl::opt<bool, false, llvm::cl::parser<bool>>::setDefaultImpl<bool, void>() /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h (xla_extension.so+0x100bace9)
    #2 llvm::cl::opt<bool, false, llvm::cl::parser<bool>>::setDefault() /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1474:32 (xla_extension.so+0x100bace9)
    #3 llvm::cl::Option::reset() /proc/self/cwd/external/llvm-project/llvm/lib/Support/CommandLine.cpp:460:3 (xla_extension.so+0x100cac0e) (BuildId: 7f5d2098f168c4db)
    #4 (anonymous namespace)::CommandLineParser::ResetAllOptionOccurrences() /proc/self/cwd/external/llvm-project/llvm/lib/Support/CommandLine.cpp:1478:17 (xla_extension.so+0x100cac0e)
    #5 llvm::cl::ResetAllOptionOccurrences() /proc/self/cwd/external/llvm-project/llvm/lib/Support/CommandLine.cpp:2831:17 (xla_extension.so+0x100caa72) (BuildId: 7f5d2098f168c4db)
    #6 xla::llvm_ir::LLVMCommandLineOptionsLock::LLVMCommandLineOptionsLock(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>> const&) /proc/self/cwd/external/xla/xla/service/llvm_ir/llvm_command_line_options.cc:70:5 (xla_extension.so+0x91d69f4) (BuildId: 7f5d2098f168c4db)
    #7 xla::cpu::CpuCompiler::RunBackend(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /proc/self/cwd/external/xla/xla/service/cpu/cpu_compiler.cc:1727:39 (xla_extension.so+0x2f781c8) (BuildId: 7f5d2098f168c4db)
    #8 xla::JitCompile(xla::XlaComputation const&, absl::lts_20230802::Span<xla::Shape const* const>, xla::ExecutableBuildOptions const&, xla::ExecutionOptions const&, xla::Compiler::CompileOptions const&, int, std::function<void (xla::HloModuleConfig&)>) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:754:19 (xla_extension.so+0x2f12883) (BuildId: 7f5d2098f168c4db)
    #9 xla::TfrtCpuClient::Compile(xla::XlaComputation const&, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:842:3 (xla_extension.so+0x2f12883)
    #10 xla::TfrtCpuClient::Compile(mlir::ModuleOp, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:888:10 (xla_extension.so+0x2f13da2) (BuildId: 7f5d2098f168c4db)
    #11 xla::ifrt::PjRtLoadedExecutable::Create(xla::ifrt::PjRtCompatibleClient*, mlir::ModuleOp, xla::CompileOptions, std::vector<tsl::RCReference<xla::ifrt::LoadedHostCallback>, std::allocator<tsl::RCReference<xla::ifrt::LoadedHostCallback>>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_executable.cc:258:3 (xla_extension.so+0xdd04d77) (BuildId: 7f5d2098f168c4db)
    #12 xla::ifrt::PjRtCompiler::Compile(std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_compiler.cc:97:10 (xla_extension.so+0xdcfd29b) (BuildId: 7f5d2098f168c4db)
    #13 xla::PyClient::CompileIfrtProgram(xla::nb_class_ptr<xla::PyClient>, std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:443:5 (xla_extension.so+0xc62a228) (BuildId: 7f5d2098f168c4db)
    #14 xla::PyClient::Compile(xla::nb_class_ptr<xla::PyClient>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, xla::CompileOptions, std::vector<nanobind::capsule, std::allocator<nanobind::capsule>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:466:10 (xla_extension.so+0xc62b514) (BuildId: 7f5d2098f168c4db)
```
PiperOrigin-RevId: 707655482
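In this trace, thread T66 holds a mutex (M0) inside `LLVMCommandLineOptionsLock` while it resets an LLVM option, but thread T65 reads the same option from `RunHloPasses` without holding any lock. A minimal sketch of the general remedy, assuming it is to make both compilation phases take the same options lock: every name below (`g_llvm_option`, `ScopedOptionsLock`, `RunHloPassesSketch`, `RunBackendSketch`) is hypothetical, and this is not XLA's actual `LLVMCommandLineOptionsLock` implementation.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical process-wide option standing in for an llvm::cl::opt<bool>,
// plus the mutex that guards it.
bool g_llvm_option = false;
std::mutex g_options_mutex;

// RAII guard: holds the mutex for its whole lifetime, sets the option on
// entry and restores the default before the mutex is released.
class ScopedOptionsLock {
 public:
  explicit ScopedOptionsLock(bool value) : lock_(g_options_mutex) {
    g_llvm_option = value;
  }
  // The destructor body runs before lock_ is destroyed, so the reset also
  // happens while the mutex is held.
  ~ScopedOptionsLock() { g_llvm_option = false; }

  ScopedOptionsLock(const ScopedOptionsLock&) = delete;
  ScopedOptionsLock& operator=(const ScopedOptionsLock&) = delete;

 private:
  std::lock_guard<std::mutex> lock_;
};

// Both phases take the lock before running code that reads the option, so a
// read on one thread can no longer overlap a write on another.
bool RunHloPassesSketch(bool value) {
  ScopedOptionsLock lock(value);
  return g_llvm_option;  // stands in for LLVM reading the option
}

bool RunBackendSketch(bool value) {
  ScopedOptionsLock lock(value);
  return g_llvm_option;
}
```

The trade-off is that concurrent compilations serialize on the mutex for as long as each phase holds the lock.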

Fixes the following TSAN race:
```
WARNING: ThreadSanitizer: data race (pid=899472)
Write of size 8 at 0x7f979e0f1cd8 by thread T69:
    #0 llvm::TargetRegistry::RegisterTargetMachine(llvm::Target&, llvm::TargetMachine* (*)(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool)) /proc/self/cwd/external/llvm-project/llvm/include/llvm/MC/TargetRegistry.h:827:27 (xla_extension.so+0x9803668) (BuildId: 6fa88e3910a5eb04)
    #1 llvm::RegisterTargetMachine<llvm::X86TargetMachine>::RegisterTargetMachine(llvm::Target&) /proc/self/cwd/external/llvm-project/llvm/include/llvm/MC/TargetRegistry.h:1250:5 (xla_extension.so+0x9803668)
    #2 LLVMInitializeX86Target /proc/self/cwd/external/llvm-project/llvm/lib/Target/X86/X86TargetMachine.cpp:69:43 (xla_extension.so+0x9803668)
    #3 llvm::InitializeNativeTarget() /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/TargetSelect.h:123:5 (xla_extension.so+0x48d2358) (BuildId: 6fa88e3910a5eb04)
    #4 xla::cpu::JitCompiler::Create(llvm::TargetOptions, xla::cpu::JitCompiler::Options, absl::lts_20230802::AnyInvocable<void (std::function<void ()>)>)::$_0::operator()() const /proc/self/cwd/external/xla/xla/backends/cpu/codegen/jit_compiler.cc:113:5 (xla_extension.so+0x48d2358)
    #5 xla::cpu::JitCompiler::Create(llvm::TargetOptions, xla::cpu::JitCompiler::Options, absl::lts_20230802::AnyInvocable<void (std::function<void ()>)>) /proc/self/cwd/external/xla/xla/backends/cpu/codegen/jit_compiler.cc:112:34 (xla_extension.so+0x48d209b) (BuildId: 6fa88e3910a5eb04)
    #6 xla::cpu::CpuCompiler::CompileLegacyCpuExecutable(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>) /proc/self/cwd/external/xla/xla/service/cpu/cpu_compiler.cc:1416:3 (xla_extension.so+0x2f716a0) (BuildId: 6fa88e3910a5eb04)
    #7 xla::cpu::CpuCompiler::RunBackend(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /proc/self/cwd/external/xla/xla/service/cpu/cpu_compiler.cc:1730:3 (xla_extension.so+0x2f7ae18) (BuildId: 6fa88e3910a5eb04)
    #8 xla::JitCompile(xla::XlaComputation const&, absl::lts_20230802::Span<xla::Shape const* const>, xla::ExecutableBuildOptions const&, xla::ExecutionOptions const&, xla::Compiler::CompileOptions const&, int, std::function<void (xla::HloModuleConfig&)>) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:759:19 (xla_extension.so+0x2f12915) (BuildId: 6fa88e3910a5eb04)
    #9 xla::TfrtCpuClient::Compile(xla::XlaComputation const&, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:847:3 (xla_extension.so+0x2f12915)
Previous read of size 8 at 0x7f979e0f1cd8 by thread T66:
    #0 llvm::Target::createTargetMachine(llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) const /proc/self/cwd/external/llvm-project/llvm/include/llvm/MC/TargetRegistry.h:460:10 (xla_extension.so+0x94ba6db) (BuildId: 6fa88e3910a5eb04)
    #1 llvm::EngineBuilder::selectTarget(llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::SmallVectorImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const&) /proc/self/cwd/external/llvm-project/llvm/lib/ExecutionEngine/TargetSelect.cpp:88:18 (xla_extension.so+0x94ba6db)
    #2 xla::cpu::JitCompiler::InferTargetMachine(llvm::TargetOptions const&, llvm::CodeGenOptLevel, std::optional<tsl::port::CPUFeature>) /proc/self/cwd/external/xla/xla/backends/cpu/codegen/jit_compiler.cc:88:12 (xla_extension.so+0x48d096f) (BuildId: 6fa88e3910a5eb04)
    #3 xla::cpu::CpuCompiler::RunHloPasses(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /proc/self/cwd/external/xla/xla/service/cpu/cpu_compiler.cc:1017:3 (xla_extension.so+0x2f70857) (BuildId: 6fa88e3910a5eb04)
    #4 xla::JitCompile(xla::XlaComputation const&, absl::lts_20230802::Span<xla::Shape const* const>, xla::ExecutableBuildOptions const&, xla::ExecutionOptions const&, xla::Compiler::CompileOptions const&, int, std::function<void (xla::HloModuleConfig&)>) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:754:3 (xla_extension.so+0x2f12874) (BuildId: 6fa88e3910a5eb04)
    #5 xla::TfrtCpuClient::Compile(xla::XlaComputation const&, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:847:3 (xla_extension.so+0x2f12874)
    #6 xla::TfrtCpuClient::Compile(mlir::ModuleOp, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:893:10 (xla_extension.so+0x2f13ef2) (BuildId: 6fa88e3910a5eb04)
```
PiperOrigin-RevId: 707666400

Fixes the following TSAN race:
```
WARNING: ThreadSanitizer: data race (pid=899472)
Write of size 8 at 0x7f979e0f1cd8 by thread T69:
    #0 llvm::TargetRegistry::RegisterTargetMachine(llvm::Target&, llvm::TargetMachine* (*)(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool)) /proc/self/cwd/external/llvm-project/llvm/include/llvm/MC/TargetRegistry.h:827:27 (xla_extension.so+0x9803668) (BuildId: 6fa88e3910a5eb04)
    #1 llvm::RegisterTargetMachine<llvm::X86TargetMachine>::RegisterTargetMachine(llvm::Target&) /proc/self/cwd/external/llvm-project/llvm/include/llvm/MC/TargetRegistry.h:1250:5 (xla_extension.so+0x9803668)
    #2 LLVMInitializeX86Target /proc/self/cwd/external/llvm-project/llvm/lib/Target/X86/X86TargetMachine.cpp:69:43 (xla_extension.so+0x9803668)
    #3 llvm::InitializeNativeTarget() /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/TargetSelect.h:123:5 (xla_extension.so+0x48d2358) (BuildId: 6fa88e3910a5eb04)
    #4 xla::cpu::JitCompiler::Create(llvm::TargetOptions, xla::cpu::JitCompiler::Options, absl::lts_20230802::AnyInvocable<void (std::function<void ()>)>)::$_0::operator()() const /proc/self/cwd/external/xla/xla/backends/cpu/codegen/jit_compiler.cc:113:5 (xla_extension.so+0x48d2358)
    #5 xla::cpu::JitCompiler::Create(llvm::TargetOptions, xla::cpu::JitCompiler::Options, absl::lts_20230802::AnyInvocable<void (std::function<void ()>)>) /proc/self/cwd/external/xla/xla/backends/cpu/codegen/jit_compiler.cc:112:34 (xla_extension.so+0x48d209b) (BuildId: 6fa88e3910a5eb04)
    #6 xla::cpu::CpuCompiler::CompileLegacyCpuExecutable(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>) /proc/self/cwd/external/xla/xla/service/cpu/cpu_compiler.cc:1416:3 (xla_extension.so+0x2f716a0) (BuildId: 6fa88e3910a5eb04)
    #7 xla::cpu::CpuCompiler::RunBackend(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /proc/self/cwd/external/xla/xla/service/cpu/cpu_compiler.cc:1730:3 (xla_extension.so+0x2f7ae18) (BuildId: 6fa88e3910a5eb04)
    #8 xla::JitCompile(xla::XlaComputation const&, absl::lts_20230802::Span<xla::Shape const* const>, xla::ExecutableBuildOptions const&, xla::ExecutionOptions const&, xla::Compiler::CompileOptions const&, int, std::function<void (xla::HloModuleConfig&)>) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:759:19 (xla_extension.so+0x2f12915) (BuildId: 6fa88e3910a5eb04)
    #9 xla::TfrtCpuClient::Compile(xla::XlaComputation const&, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:847:3 (xla_extension.so+0x2f12915)
Previous read of size 8 at 0x7f979e0f1cd8 by thread T66:
    #0 llvm::Target::createTargetMachine(llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) const /proc/self/cwd/external/llvm-project/llvm/include/llvm/MC/TargetRegistry.h:460:10 (xla_extension.so+0x94ba6db) (BuildId: 6fa88e3910a5eb04)
    #1 llvm::EngineBuilder::selectTarget(llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::SmallVectorImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const&) /proc/self/cwd/external/llvm-project/llvm/lib/ExecutionEngine/TargetSelect.cpp:88:18 (xla_extension.so+0x94ba6db)
    #2 xla::cpu::JitCompiler::InferTargetMachine(llvm::TargetOptions const&, llvm::CodeGenOptLevel, std::optional<tsl::port::CPUFeature>) /proc/self/cwd/external/xla/xla/backends/cpu/codegen/jit_compiler.cc:88:12 (xla_extension.so+0x48d096f) (BuildId: 6fa88e3910a5eb04)
    #3 xla::cpu::CpuCompiler::RunHloPasses(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /proc/self/cwd/external/xla/xla/service/cpu/cpu_compiler.cc:1017:3 (xla_extension.so+0x2f70857) (BuildId: 6fa88e3910a5eb04)
    #4 xla::JitCompile(xla::XlaComputation const&, absl::lts_20230802::Span<xla::Shape const* const>, xla::ExecutableBuildOptions const&, xla::ExecutionOptions const&, xla::Compiler::CompileOptions const&, int, std::function<void (xla::HloModuleConfig&)>) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:754:3 (xla_extension.so+0x2f12874) (BuildId: 6fa88e3910a5eb04)
    #5 xla::TfrtCpuClient::Compile(xla::XlaComputation const&, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:847:3 (xla_extension.so+0x2f12874)
    #6 xla::TfrtCpuClient::Compile(mlir::ModuleOp, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:893:10 (xla_extension.so+0x2f13ef2) (BuildId: 6fa88e3910a5eb04)
```
PiperOrigin-RevId: 707701032
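In this trace, the target-registry write (`llvm::InitializeNativeTarget()`, reached from `JitCompiler::Create` in `RunBackend`) races with a registry read (`createTargetMachine`, reached from `JitCompiler::InferTargetMachine` in `RunHloPasses`) on another thread. The commit message does not describe its fix. One common remedy for this class of race is to route every reader through the same one-time initializer. The sketch below illustrates that pattern under that assumption, using hypothetical names (`g_target_machine_ctor`, `MakeX86TargetMachine`, `EnsureNativeTargetInitialized`, `CreateTargetMachine`); it is not LLVM's or XLA's API.

```cpp
#include <cassert>

// Hypothetical registry slot, standing in for the constructor pointer that
// LLVMInitializeX86Target stores into the global llvm::Target registry.
using TargetMachineCtor = int (*)();
TargetMachineCtor g_target_machine_ctor = nullptr;

int MakeX86TargetMachine() { return 86; }  // hypothetical factory

// One-time, thread-safe initialization. C++11 guarantees that a
// function-local static is initialized exactly once even under concurrent
// calls, and other callers block until that initialization finishes.
void EnsureNativeTargetInitialized() {
  static const bool initialized = [] {
    g_target_machine_ctor = &MakeX86TargetMachine;  // the registry write
    return true;
  }();
  (void)initialized;
}

// Every reader goes through the initializer first, so the write above
// happens-before this read on all threads.
int CreateTargetMachine() {
  EnsureNativeTargetInitialized();
  return g_target_machine_ctor();
}
```

The key point is that the read path, not only the path that happens to initialize first, must pass through the initializer; otherwise a reader can still observe the registry mid-write.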

…r RunBackend. Both of these call into LLVM code that reads the compiler options. Fixes the following race:
```
WARNING: ThreadSanitizer: data race (pid=869815)
Read of size 1 at 0x7f8b24effc08 by thread T65:
    #0 llvm::cl::opt_storage<bool, false, false>::getValue() const /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1406:38 (xla_extension.so+0xa281417) (BuildId: 7f5d2098f168c4db)
    #1 llvm::cl::opt_storage<bool, false, false>::operator bool() const /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1410:38 (xla_extension.so+0xa281417)
    #2 llvm::CodeGenTargetMachineImpl::CodeGenTargetMachineImpl(llvm::Target const&, llvm::StringRef, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, llvm::Reloc::Model, llvm::CodeModel::Model, llvm::CodeGenOptLevel) /proc/self/cwd/external/llvm-project/llvm/lib/CodeGen/CodeGenTargetMachineImpl.cpp:97:7 (xla_extension.so+0xa281417)
    #3 llvm::X86TargetMachine::X86TargetMachine(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) /proc/self/cwd/external/llvm-project/llvm/lib/Target/X86/X86TargetMachine.cpp:236:7 (xla_extension.so+0x9803b80) (BuildId: 7f5d2098f168c4db)
    #4 llvm::RegisterTargetMachine<llvm::X86TargetMachine>::Allocator(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) /proc/self/cwd/external/llvm-project/llvm/include/llvm/MC/TargetRegistry.h:1258:16 (xla_extension.so+0x980757a) (BuildId: 7f5d2098f168c4db)
    #5 llvm::Target::createTargetMachine(llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) const /proc/self/cwd/external/llvm-project/llvm/include/llvm/MC/TargetRegistry.h:462:12 (xla_extension.so+0x94ba529) (BuildId: 7f5d2098f168c4db)
    #6 llvm::EngineBuilder::selectTarget(llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::SmallVectorImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const&) /proc/self/cwd/external/llvm-project/llvm/lib/ExecutionEngine/TargetSelect.cpp:88:18 (xla_extension.so+0x94ba529)
    #7 xla::cpu::JitCompiler::InferTargetMachine(llvm::TargetOptions const&, llvm::CodeGenOptLevel, std::optional<tsl::port::CPUFeature>) /proc/self/cwd/external/xla/xla/backends/cpu/codegen/jit_compiler.cc:88:12 (xla_extension.so+0x48d070f) (BuildId: 7f5d2098f168c4db)
    #8 xla::cpu::CpuCompiler::RunHloPasses(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /proc/self/cwd/external/xla/xla/service/cpu/cpu_compiler.cc:1017:3 (xla_extension.so+0x2f6dc47) (BuildId: 7f5d2098f168c4db)
    #9 xla::JitCompile(xla::XlaComputation const&, absl::lts_20230802::Span<xla::Shape const* const>, xla::ExecutableBuildOptions const&, xla::ExecutionOptions const&, xla::Compiler::CompileOptions const&, int, std::function<void (xla::HloModuleConfig&)>) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:749:3 (xla_extension.so+0x2f127e2) (BuildId: 7f5d2098f168c4db)
    #10 xla::TfrtCpuClient::Compile(xla::XlaComputation const&, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:842:3 (xla_extension.so+0x2f127e2)
    #11 xla::TfrtCpuClient::Compile(mlir::ModuleOp, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:888:10 (xla_extension.so+0x2f13da2) (BuildId: 7f5d2098f168c4db)
    #12 xla::ifrt::PjRtLoadedExecutable::Create(xla::ifrt::PjRtCompatibleClient*, mlir::ModuleOp, xla::CompileOptions, std::vector<tsl::RCReference<xla::ifrt::LoadedHostCallback>, std::allocator<tsl::RCReference<xla::ifrt::LoadedHostCallback>>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_executable.cc:258:3 (xla_extension.so+0xdd04d77) (BuildId: 7f5d2098f168c4db)
    #13 xla::ifrt::PjRtCompiler::Compile(std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_compiler.cc:97:10 (xla_extension.so+0xdcfd29b) (BuildId: 7f5d2098f168c4db)
    #14 xla::PyClient::CompileIfrtProgram(xla::nb_class_ptr<xla::PyClient>, std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:443:5 (xla_extension.so+0xc62a228) (BuildId: 7f5d2098f168c4db)
    #15 xla::PyClient::Compile(xla::nb_class_ptr<xla::PyClient>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, xla::CompileOptions, std::vector<nanobind::capsule, std::allocator<nanobind::capsule>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:466:10 (xla_extension.so+0xc62b514) (BuildId: 7f5d2098f168c4db)
Previous write of size 1 at 0x7f8b24effc08 by thread T66 (mutexes: write M0):
    #0 void llvm::cl::opt_storage<bool, false, false>::setValue<bool>(bool const&, bool) /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1401:11 (xla_extension.so+0x100bace9) (BuildId: 7f5d2098f168c4db)
    #1 void llvm::cl::opt<bool, false, llvm::cl::parser<bool>>::setDefaultImpl<bool, void>() /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h (xla_extension.so+0x100bace9)
    #2 llvm::cl::opt<bool, false, llvm::cl::parser<bool>>::setDefault() /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1474:32 (xla_extension.so+0x100bace9)
    #3 llvm::cl::Option::reset() /proc/self/cwd/external/llvm-project/llvm/lib/Support/CommandLine.cpp:460:3 (xla_extension.so+0x100cac0e) (BuildId: 
7f5d2098f168c4db) #4 (anonymous namespace)::CommandLineParser::ResetAllOptionOccurrences() /proc/self/cwd/external/llvm-project/llvm/lib/Support/CommandLine.cpp:1478:17 (xla_extension.so+0x100cac0e) #5 llvm::cl::ResetAllOptionOccurrences() /proc/self/cwd/external/llvm-project/llvm/lib/Support/CommandLine.cpp:2831:17 (xla_extension.so+0x100caa72) (BuildId: 7f5d2098f168c4db) #6 xla::llvm_ir::LLVMCommandLineOptionsLock::LLVMCommandLineOptionsLock(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>> const&) /proc/self/cwd/external/xla/xla/service/llvm_ir/llvm_command_line_options.cc:70:5 (xla_extension.so+0x91d69f4) (BuildId: 7f5d2098f168c4db) #7 xla::cpu::CpuCompiler::RunBackend(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /proc/self/cwd/external/xla/xla/service/cpu/cpu_compiler.cc:1727:39 (xla_extension.so+0x2f781c8) (BuildId: 7f5d2098f168c4db) #8 xla::JitCompile(xla::XlaComputation const&, absl::lts_20230802::Span<xla::Shape const* const>, xla::ExecutableBuildOptions const&, xla::ExecutionOptions const&, xla::Compiler::CompileOptions const&, int, std::function<void (xla::HloModuleConfig&)>) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:754:19 (xla_extension.so+0x2f12883) (BuildId: 7f5d2098f168c4db) #9 xla::TfrtCpuClient::Compile(xla::XlaComputation const&, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:842:3 (xla_extension.so+0x2f12883) #10 xla::TfrtCpuClient::Compile(mlir::ModuleOp, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:888:10 (xla_extension.so+0x2f13da2) (BuildId: 7f5d2098f168c4db) #11 xla::ifrt::PjRtLoadedExecutable::Create(xla::ifrt::PjRtCompatibleClient*, mlir::ModuleOp, xla::CompileOptions, std::vector<tsl::RCReference<xla::ifrt::LoadedHostCallback>, 
std::allocator<tsl::RCReference<xla::ifrt::LoadedHostCallback>>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_executable.cc:258:3 (xla_extension.so+0xdd04d77) (BuildId: 7f5d2098f168c4db) #12 xla::ifrt::PjRtCompiler::Compile(std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_compiler.cc:97:10 (xla_extension.so+0xdcfd29b) (BuildId: 7f5d2098f168c4db) #13 xla::PyClient::CompileIfrtProgram(xla::nb_class_ptr<xla::PyClient>, std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:443:5 (xla_extension.so+0xc62a228) (BuildId: 7f5d2098f168c4db) #14 xla::PyClient::Compile(xla::nb_class_ptr<xla::PyClient>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, xla::CompileOptions, std::vector<nanobind::capsule, std::allocator<nanobind::capsule>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:466:10 (xla_extension.so+0xc62b514) (BuildId: 7f5d2098f168c4db) ``` PiperOrigin-RevId: 707655482

…r RunBackend. Both of these call into LLVM code that reads the compiler options. Fixes the following race:

```
WARNING: ThreadSanitizer: data race (pid=869815)
  Read of size 1 at 0x7f8b24effc08 by thread T65:
    #0 llvm::cl::opt_storage<bool, false, false>::getValue() const /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1406:38 (xla_extension.so+0xa281417) (BuildId: 7f5d2098f168c4db)
    #1 llvm::cl::opt_storage<bool, false, false>::operator bool() const /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1410:38 (xla_extension.so+0xa281417)
    #2 llvm::CodeGenTargetMachineImpl::CodeGenTargetMachineImpl(llvm::Target const&, llvm::StringRef, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, llvm::Reloc::Model, llvm::CodeModel::Model, llvm::CodeGenOptLevel) /proc/self/cwd/external/llvm-project/llvm/lib/CodeGen/CodeGenTargetMachineImpl.cpp:97:7 (xla_extension.so+0xa281417)
    #3 llvm::X86TargetMachine::X86TargetMachine(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) /proc/self/cwd/external/llvm-project/llvm/lib/Target/X86/X86TargetMachine.cpp:236:7 (xla_extension.so+0x9803b80) (BuildId: 7f5d2098f168c4db)
    #4 llvm::RegisterTargetMachine<llvm::X86TargetMachine>::Allocator(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) /proc/self/cwd/external/llvm-project/llvm/include/llvm/MC/TargetRegistry.h:1258:16 (xla_extension.so+0x980757a) (BuildId: 7f5d2098f168c4db)
    #5 llvm::Target::createTargetMachine(llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) const /proc/self/cwd/external/llvm-project/llvm/include/llvm/MC/TargetRegistry.h:462:12 (xla_extension.so+0x94ba529) (BuildId: 7f5d2098f168c4db)
    #6 llvm::EngineBuilder::selectTarget(llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::SmallVectorImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>> const&) /proc/self/cwd/external/llvm-project/llvm/lib/ExecutionEngine/TargetSelect.cpp:88:18 (xla_extension.so+0x94ba529)
    #7 xla::cpu::JitCompiler::InferTargetMachine(llvm::TargetOptions const&, llvm::CodeGenOptLevel, std::optional<tsl::port::CPUFeature>) /proc/self/cwd/external/xla/xla/backends/cpu/codegen/jit_compiler.cc:88:12 (xla_extension.so+0x48d070f) (BuildId: 7f5d2098f168c4db)
    #8 xla::cpu::CpuCompiler::RunHloPasses(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /proc/self/cwd/external/xla/xla/service/cpu/cpu_compiler.cc:1017:3 (xla_extension.so+0x2f6dc47) (BuildId: 7f5d2098f168c4db)
    #9 xla::JitCompile(xla::XlaComputation const&, absl::lts_20230802::Span<xla::Shape const* const>, xla::ExecutableBuildOptions const&, xla::ExecutionOptions const&, xla::Compiler::CompileOptions const&, int, std::function<void (xla::HloModuleConfig&)>) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:749:3 (xla_extension.so+0x2f127e2) (BuildId: 7f5d2098f168c4db)
    #10 xla::TfrtCpuClient::Compile(xla::XlaComputation const&, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:842:3 (xla_extension.so+0x2f127e2)
    #11 xla::TfrtCpuClient::Compile(mlir::ModuleOp, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:888:10 (xla_extension.so+0x2f13da2) (BuildId: 7f5d2098f168c4db)
    #12 xla::ifrt::PjRtLoadedExecutable::Create(xla::ifrt::PjRtCompatibleClient*, mlir::ModuleOp, xla::CompileOptions, std::vector<tsl::RCReference<xla::ifrt::LoadedHostCallback>, std::allocator<tsl::RCReference<xla::ifrt::LoadedHostCallback>>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_executable.cc:258:3 (xla_extension.so+0xdd04d77) (BuildId: 7f5d2098f168c4db)
    #13 xla::ifrt::PjRtCompiler::Compile(std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_compiler.cc:97:10 (xla_extension.so+0xdcfd29b) (BuildId: 7f5d2098f168c4db)
    #14 xla::PyClient::CompileIfrtProgram(xla::nb_class_ptr<xla::PyClient>, std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:443:5 (xla_extension.so+0xc62a228) (BuildId: 7f5d2098f168c4db)
    #15 xla::PyClient::Compile(xla::nb_class_ptr<xla::PyClient>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, xla::CompileOptions, std::vector<nanobind::capsule, std::allocator<nanobind::capsule>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:466:10 (xla_extension.so+0xc62b514) (BuildId: 7f5d2098f168c4db)

  Previous write of size 1 at 0x7f8b24effc08 by thread T66 (mutexes: write M0):
    #0 void llvm::cl::opt_storage<bool, false, false>::setValue<bool>(bool const&, bool) /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1401:11 (xla_extension.so+0x100bace9) (BuildId: 7f5d2098f168c4db)
    #1 void llvm::cl::opt<bool, false, llvm::cl::parser<bool>>::setDefaultImpl<bool, void>() /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h (xla_extension.so+0x100bace9)
    #2 llvm::cl::opt<bool, false, llvm::cl::parser<bool>>::setDefault() /proc/self/cwd/external/llvm-project/llvm/include/llvm/Support/CommandLine.h:1474:32 (xla_extension.so+0x100bace9)
    #3 llvm::cl::Option::reset() /proc/self/cwd/external/llvm-project/llvm/lib/Support/CommandLine.cpp:460:3 (xla_extension.so+0x100cac0e) (BuildId: 7f5d2098f168c4db)
    #4 (anonymous namespace)::CommandLineParser::ResetAllOptionOccurrences() /proc/self/cwd/external/llvm-project/llvm/lib/Support/CommandLine.cpp:1478:17 (xla_extension.so+0x100cac0e)
    #5 llvm::cl::ResetAllOptionOccurrences() /proc/self/cwd/external/llvm-project/llvm/lib/Support/CommandLine.cpp:2831:17 (xla_extension.so+0x100caa72) (BuildId: 7f5d2098f168c4db)
    #6 xla::llvm_ir::LLVMCommandLineOptionsLock::LLVMCommandLineOptionsLock(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>> const&) /proc/self/cwd/external/xla/xla/service/llvm_ir/llvm_command_line_options.cc:70:5 (xla_extension.so+0x91d69f4) (BuildId: 7f5d2098f168c4db)
    #7 xla::cpu::CpuCompiler::RunBackend(std::unique_ptr<xla::HloModule, std::default_delete<xla::HloModule>>, stream_executor::StreamExecutor*, xla::Compiler::CompileOptions const&) /proc/self/cwd/external/xla/xla/service/cpu/cpu_compiler.cc:1727:39 (xla_extension.so+0x2f781c8) (BuildId: 7f5d2098f168c4db)
    #8 xla::JitCompile(xla::XlaComputation const&, absl::lts_20230802::Span<xla::Shape const* const>, xla::ExecutableBuildOptions const&, xla::ExecutionOptions const&, xla::Compiler::CompileOptions const&, int, std::function<void (xla::HloModuleConfig&)>) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:754:19 (xla_extension.so+0x2f12883) (BuildId: 7f5d2098f168c4db)
    #9 xla::TfrtCpuClient::Compile(xla::XlaComputation const&, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:842:3 (xla_extension.so+0x2f12883)
    #10 xla::TfrtCpuClient::Compile(mlir::ModuleOp, xla::CompileOptions) /proc/self/cwd/external/xla/xla/pjrt/cpu/cpu_client.cc:888:10 (xla_extension.so+0x2f13da2) (BuildId: 7f5d2098f168c4db)
    #11 xla::ifrt::PjRtLoadedExecutable::Create(xla::ifrt::PjRtCompatibleClient*, mlir::ModuleOp, xla::CompileOptions, std::vector<tsl::RCReference<xla::ifrt::LoadedHostCallback>, std::allocator<tsl::RCReference<xla::ifrt::LoadedHostCallback>>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_executable.cc:258:3 (xla_extension.so+0xdd04d77) (BuildId: 7f5d2098f168c4db)
    #12 xla::ifrt::PjRtCompiler::Compile(std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/pjrt_ifrt/pjrt_compiler.cc:97:10 (xla_extension.so+0xdcfd29b) (BuildId: 7f5d2098f168c4db)
    #13 xla::PyClient::CompileIfrtProgram(xla::nb_class_ptr<xla::PyClient>, std::unique_ptr<xla::ifrt::Program, std::default_delete<xla::ifrt::Program>>, std::unique_ptr<xla::ifrt::CompileOptions, std::default_delete<xla::ifrt::CompileOptions>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:443:5 (xla_extension.so+0xc62a228) (BuildId: 7f5d2098f168c4db)
    #14 xla::PyClient::Compile(xla::nb_class_ptr<xla::PyClient>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, xla::CompileOptions, std::vector<nanobind::capsule, std::allocator<nanobind::capsule>>) /proc/self/cwd/external/xla/xla/python/py_client.cc:466:10 (xla_extension.so+0xc62b514) (BuildId: 7f5d2098f168c4db)
```

PiperOrigin-RevId: 707721170
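In the trace above, thread T65 reads an LLVM `cl::opt` value while building a target machine from `RunHloPasses`, without holding any mutex, while thread T66 resets all options inside `LLVMCommandLineOptionsLock` from `RunBackend` (it holds mutex M0). The general remedy is for every path that reads the global options to hold the same lock as the path that writes them. The C++17 sketch below illustrates that idea under stated assumptions: the lock admits any number of concurrent holders that request identical option sets and makes a caller with different options wait until all holders are gone. `OptionsLock`, `g_active_options`, `CompileStep`, `CountMismatches`, and the flag strings are all hypothetical; this is not XLA's actual `LLVMCommandLineOptionsLock`.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for process-global compiler flags (the llvm::cl::opt
// values in the trace). Only read or written while an OptionsLock is held.
std::vector<std::string> g_active_options;

// Hypothetical RAII lock: many threads may hold it at once if they all request
// the same options; a thread needing different options waits until every
// current holder has released it, and only then rewrites the globals.
class OptionsLock {
 public:
  explicit OptionsLock(const std::vector<std::string>& options) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return num_active_ == 0 || active_ == options; });
    if (num_active_ == 0) {
      // No holder can be reading the globals now, so rewriting them is safe.
      active_ = options;
      g_active_options = options;
    }
    ++num_active_;
  }

  ~OptionsLock() {
    std::lock_guard<std::mutex> lock(mu_);
    if (--num_active_ == 0) cv_.notify_all();  // Wake callers with other options.
  }

  OptionsLock(const OptionsLock&) = delete;
  OptionsLock& operator=(const OptionsLock&) = delete;

 private:
  static inline std::mutex mu_;
  static inline std::condition_variable cv_;
  static inline int num_active_ = 0;
  static inline std::vector<std::string> active_;
};

// Hypothetical compile step standing in for either RunHloPasses or RunBackend:
// it takes the lock before reading the flags.
bool CompileStep(const std::vector<std::string>& options) {
  OptionsLock lock(options);
  return g_active_options == options;  // Flags must match what was requested.
}

// Runs compile steps with two different (made-up) option sets concurrently and
// returns how many of them observed flags other than their own.
int CountMismatches(int threads_per_config) {
  const std::vector<std::string> a = {"--hypothetical-flag-a"};
  const std::vector<std::string> b = {"--hypothetical-flag-b"};
  std::atomic<int> mismatches{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < threads_per_config; ++i) {
    threads.emplace_back([&] { if (!CompileStep(a)) ++mismatches; });
    threads.emplace_back([&] { if (!CompileStep(b)) ++mismatches; });
  }
  for (std::thread& t : threads) t.join();
  return mismatches.load();
}
```

Because the globals are rewritten only when no thread holds the lock, a reader can never see them reset underneath it, which is the interleaving the trace reports. This sketch makes no fairness guarantee: a steady stream of callers with the same options can starve a caller that needs different ones.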

Fix readme

734c747

joker-eph changed the title ~~Fix readme~~ Fix link in Readme Aug 10, 2022

joker-eph merged commit f5a8cfe into openxla:main Aug 10, 2022

copybara-service bot pushed a commit that referenced this pull request Mar 9, 2023

Add build_test for auto_sharding_runner (CL #2)

b98a068

PiperOrigin-RevId: 515308128

copybara-service bot mentioned this pull request Mar 9, 2023

Add build_test for auto_sharding_runner (CL #2) #1801

Closed

copybara-service bot pushed a commit that referenced this pull request Mar 10, 2023

Add build_test for auto_sharding_runner (CL #2)

13eaf02

PiperOrigin-RevId: 515308128

SandSnip3r mentioned this pull request May 23, 2023

[XLA GPU] Support for mix type gemm bias addition fusion #2859

Closed

copybara-service bot mentioned this pull request Oct 24, 2023

Import openai/triton from GitHub. #6507

Closed

wenscarl added a commit to wenscarl/xla that referenced this pull request Nov 1, 2023

Improve based on review openxla#2

e906d76

copybara-service bot mentioned this pull request Nov 2, 2023

PR #6599: Fp8 Fast Accumulation support for cublasLt #6730

Closed

copybara-service bot pushed a commit that referenced this pull request Jan 16, 2024

[xla:gpu] Do not use ncclSend and ncclRecv directly and use NcclApi part #2

f29ab1e

PiperOrigin-RevId: 598915673

copybara-service bot mentioned this pull request Jan 16, 2024

[xla:gpu] Do not use ncclSend and ncclRecv directly and use NcclApi part #2 #8523

Merged

copybara-service bot pushed a commit that referenced this pull request Jan 16, 2024

[xla:gpu] Do not use ncclSend and ncclRecv directly and use NcclApi part #2

adf7932

PiperOrigin-RevId: 598915673

copybara-service bot pushed a commit that referenced this pull request Jan 17, 2024

[xla:gpu] Do not use ncclSend and ncclRecv directly and use NcclApi part #2

8462b20

PiperOrigin-RevId: 598915673

copybara-service bot pushed a commit that referenced this pull request Jan 17, 2024

[xla:gpu] Do not use ncclSend and ncclRecv directly and use NcclApi part #2

0db54aa

PiperOrigin-RevId: 598915673

copybara-service bot pushed a commit that referenced this pull request Jan 17, 2024

[xla:gpu] Do not use ncclSend and ncclRecv directly and use NcclApi part #2

00e8017

PiperOrigin-RevId: 598915673

copybara-service bot pushed a commit that referenced this pull request Jan 17, 2024

[xla:gpu] Do not use ncclSend and ncclRecv directly and use NcclApi part #2

2e6a581

PiperOrigin-RevId: 598915673

copybara-service bot pushed a commit that referenced this pull request Jan 17, 2024

[xla:gpu] Do not use ncclSend and ncclRecv directly and use NcclApi part #2

84edc89

PiperOrigin-RevId: 598915673

copybara-service bot pushed a commit that referenced this pull request Jan 17, 2024

[xla:gpu] Do not use ncclSend and ncclRecv directly and use NcclApi part #2

588171c

PiperOrigin-RevId: 599037622

wenscarl added a commit to wenscarl/xla that referenced this pull request Apr 10, 2024

Improve based on review openxla#2

b6a7223

copybara-service bot pushed a commit that referenced this pull request May 1, 2024

[xla:cpu] NFC: Remove deprecated XLA:CPU mlir based codegen part #2

0ab75ec

PiperOrigin-RevId: 629829581

copybara-service bot mentioned this pull request May 1, 2024

[xla:cpu] NFC: Remove deprecated XLA:CPU mlir based codegen part #2 #12038

Merged

copybara-service bot pushed a commit that referenced this pull request May 1, 2024

[xla:cpu] NFC: Remove deprecated XLA:CPU mlir based codegen part #2

a05ff09

PiperOrigin-RevId: 629867362

copybara-service bot mentioned this pull request Jul 31, 2024

[xla:cpu] Optimize ThunkExecutor::Execute part #2 #15567

Merged

loislo mentioned this pull request Oct 2, 2024

[NVIDIA GPU] Use memcpy for intra-node all-to-all #15144

Closed



bhack commented Aug 9, 2022