Skip to content

[LLVM] Add GNU make jobserver support #145131

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

yxsamliu
Copy link
Collaborator

This patch introduces support for the jobserver protocol to control parallelism for device offloading tasks.

When running a parallel build with a modern build system like make -jN or ninja -jN, each Clang process might also be configured to use multiple threads for its own tasks (e.g., via --offload-jobs=4). This can lead to an explosion of threads (N * 4), causing heavy system load, CPU contention, and ultimately slowing down the entire build.

This patch allows Clang to act as a cooperative client of the build system's jobserver. It extends the --offload-jobs option to accept the value 'jobserver'. With the recent addition of jobserver support to the Ninja build system, this functionality now benefits users of both Make and Ninja.

When --offload-jobs=jobserver is specified, Clang's thread pool will:

  1. Parse the MAKEFLAGS environment variable to find the jobserver details.
  2. Before dispatching a task, acquire a job slot from the jobserver. If none are available, the worker thread will block.
  3. Release the job slot once the task is complete.

This ensures that the total number of active offload tasks across all Clang processes does not exceed the limit defined by the parent build system, leading to more efficient and controlled parallel builds.

Implementation:

  • A new library, llvm/Support/Jobserver, is added to provide a platform-agnostic client for the jobserver protocol, with backends for Unix (FIFO) and Windows (semaphores).
  • llvm/Support/ThreadPool and llvm/Support/Parallel are updated with a jobserver_concurrency strategy to integrate this logic.
  • The Clang driver and linker-wrapper are modified to recognize the 'jobserver' argument and enable the new thread pool strategy.
  • New unit and integration tests are added to validate the feature.

This patch introduces support for the jobserver protocol to control
parallelism for device offloading tasks.

When running a parallel build with a modern build system like `make -jN`
or `ninja -jN`, each Clang process might also be configured to use
multiple threads for its own tasks (e.g., via `--offload-jobs=4`). This
can lead to an explosion of threads (N * 4), causing heavy system load,
CPU contention, and ultimately slowing down the entire build.

This patch allows Clang to act as a cooperative client of the build
system's jobserver. It extends the `--offload-jobs` option to accept the
value 'jobserver'. With the recent addition of jobserver support to the
Ninja build system, this functionality now benefits users of both Make
and Ninja.

When `--offload-jobs=jobserver` is specified, Clang's thread pool will:
1. Parse the MAKEFLAGS environment variable to find the jobserver details.
2. Before dispatching a task, acquire a job slot from the jobserver. If
   none are available, the worker thread will block.
3. Release the job slot once the task is complete.

This ensures that the total number of active offload tasks across all
Clang processes does not exceed the limit defined by the parent build
system, leading to more efficient and controlled parallel builds.

Implementation:
- A new library, `llvm/Support/Jobserver`, is added to provide a
  platform-agnostic client for the jobserver protocol, with backends
  for Unix (FIFO) and Windows (semaphores).
- `llvm/Support/ThreadPool` and `llvm/Support/Parallel` are updated
  with a `jobserver_concurrency` strategy to integrate this logic.
- The Clang driver and linker-wrapper are modified to recognize the
  'jobserver' argument and enable the new thread pool strategy.
- New unit and integration tests are added to validate the feature.
@yxsamliu yxsamliu requested review from MaskRay, Artem-B and jhuber6 June 21, 2025 01:54
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' platform:windows llvm:support labels Jun 21, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 21, 2025

@llvm/pr-subscribers-platform-windows
@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-llvm-support

Author: Yaxun (Sam) Liu (yxsamliu)

Changes

This patch introduces support for the jobserver protocol to control parallelism for device offloading tasks.

When running a parallel build with a modern build system like make -jN or ninja -jN, each Clang process might also be configured to use multiple threads for its own tasks (e.g., via --offload-jobs=4). This can lead to an explosion of threads (N * 4), causing heavy system load, CPU contention, and ultimately slowing down the entire build.

This patch allows Clang to act as a cooperative client of the build system's jobserver. It extends the --offload-jobs option to accept the value 'jobserver'. With the recent addition of jobserver support to the Ninja build system, this functionality now benefits users of both Make and Ninja.

When --offload-jobs=jobserver is specified, Clang's thread pool will:

  1. Parse the MAKEFLAGS environment variable to find the jobserver details.
  2. Before dispatching a task, acquire a job slot from the jobserver. If none are available, the worker thread will block.
  3. Release the job slot once the task is complete.

This ensures that the total number of active offload tasks across all Clang processes does not exceed the limit defined by the parent build system, leading to more efficient and controlled parallel builds.

Implementation:

  • A new library, llvm/Support/Jobserver, is added to provide a platform-agnostic client for the jobserver protocol, with backends for Unix (FIFO) and Windows (semaphores).
  • llvm/Support/ThreadPool and llvm/Support/Parallel are updated with a jobserver_concurrency strategy to integrate this logic.
  • The Clang driver and linker-wrapper are modified to recognize the 'jobserver' argument and enable the new thread pool strategy.
  • New unit and integration tests are added to validate the feature.

Patch is 57.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145131.diff

18 Files Affected:

  • (modified) clang/include/clang/Driver/Options.td (+3-2)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+14-8)
  • (modified) clang/test/Driver/hip-options.hip (+6)
  • (modified) clang/test/Driver/linker-wrapper.c (+2)
  • (modified) clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp (+12-6)
  • (modified) clang/tools/clang-linker-wrapper/LinkerWrapperOpts.td (+2-1)
  • (added) llvm/include/llvm/Support/Jobserver.h (+141)
  • (modified) llvm/include/llvm/Support/ThreadPool.h (+4)
  • (modified) llvm/include/llvm/Support/Threading.h (+18)
  • (modified) llvm/lib/Support/CMakeLists.txt (+1)
  • (added) llvm/lib/Support/Jobserver.cpp (+257)
  • (modified) llvm/lib/Support/Parallel.cpp (+50-4)
  • (modified) llvm/lib/Support/ThreadPool.cpp (+89-3)
  • (modified) llvm/lib/Support/Threading.cpp (+5)
  • (added) llvm/lib/Support/Unix/Jobserver.inc (+195)
  • (added) llvm/lib/Support/Windows/Jobserver.inc (+75)
  • (modified) llvm/unittests/Support/CMakeLists.txt (+1)
  • (added) llvm/unittests/Support/JobserverTest.cpp (+436)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 0ffd8c40da7da..771b336b5585f 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -1243,8 +1243,9 @@ def offload_compression_level_EQ : Joined<["--"], "offload-compression-level=">,
   HelpText<"Compression level for offload device binaries (HIP only)">;
 
 def offload_jobs_EQ : Joined<["--"], "offload-jobs=">,
-  HelpText<"Specify the number of threads to use for device offloading tasks"
-           " during compilation.">;
+  HelpText<"Specify the number of threads to use for device offloading tasks "
+           "during compilation. Can be a positive integer or the string "
+           "'jobserver' to use the make-style jobserver from the environment.">;
 
 defm offload_via_llvm : BoolFOption<"offload-via-llvm",
   LangOpts<"OffloadViaLLVM">, DefaultFalse,
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 2bb42a319eccf..a2a206eec700f 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -9293,14 +9293,20 @@ void LinkerWrapper::ConstructJob(Compilation &C, const JobAction &JA,
   addOffloadCompressArgs(Args, CmdArgs);
 
   if (Arg *A = Args.getLastArg(options::OPT_offload_jobs_EQ)) {
-    int NumThreads;
-    if (StringRef(A->getValue()).getAsInteger(10, NumThreads) ||
-        NumThreads <= 0)
-      C.getDriver().Diag(diag::err_drv_invalid_int_value)
-          << A->getAsString(Args) << A->getValue();
-    else
-      CmdArgs.push_back(
-          Args.MakeArgString("--wrapper-jobs=" + Twine(NumThreads)));
+    StringRef Val = A->getValue();
+
+    if (Val.equals_insensitive("jobserver"))
+      CmdArgs.push_back(Args.MakeArgString("--wrapper-jobs=jobserver"));
+    else {
+      int NumThreads;
+      if (Val.getAsInteger(10, NumThreads) || NumThreads <= 0) {
+        C.getDriver().Diag(diag::err_drv_invalid_int_value)
+            << A->getAsString(Args) << Val;
+      } else {
+        CmdArgs.push_back(
+            Args.MakeArgString("--wrapper-jobs=" + Twine(NumThreads)));
+      }
+    }
   }
 
   const char *Exec =
diff --git a/clang/test/Driver/hip-options.hip b/clang/test/Driver/hip-options.hip
index a07dca3638565..1f2b1b4858b02 100644
--- a/clang/test/Driver/hip-options.hip
+++ b/clang/test/Driver/hip-options.hip
@@ -259,3 +259,9 @@
 // RUN:   --offload-arch=gfx1100 --offload-new-driver --offload-jobs=0x4 %s 2>&1 | \
 // RUN:   FileCheck -check-prefix=INVJOBS %s
 // INVJOBS: clang: error: invalid integral value '0x4' in '--offload-jobs=0x4'
+
+// RUN: %clang -### -Werror --target=x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
+// RUN:   --offload-arch=gfx1100 --offload-new-driver --offload-jobs=jobserver %s 2>&1 | \
+// RUN:   FileCheck -check-prefix=JOBSV %s
+// JOBSV: clang-linker-wrapper{{.*}} "--wrapper-jobs=jobserver"
+
diff --git a/clang/test/Driver/linker-wrapper.c b/clang/test/Driver/linker-wrapper.c
index 80b1a5745a123..de14f8cd29a13 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -114,6 +114,8 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN:   -fembed-offload-object=%t.out
 // RUN: clang-linker-wrapper --dry-run --host-triple=x86_64-unknown-linux-gnu --wrapper-jobs=4 \
 // RUN: --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=CUDA-PAR
+// RUN: clang-linker-wrapper --dry-run --host-triple=x86_64-unknown-linux-gnu --wrapper-jobs=jobserver \
+// RUN: --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=CUDA-PAR
 
 // CUDA-PAR: fatbinary{{.*}}-64 --create {{.*}}.fatbin
 
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index 0f1fa8b329fd6..fffbe054c1ca1 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -1420,12 +1420,18 @@ int main(int Argc, char **Argv) {
 
   parallel::strategy = hardware_concurrency(1);
   if (auto *Arg = Args.getLastArg(OPT_wrapper_jobs)) {
-    unsigned Threads = 0;
-    if (!llvm::to_integer(Arg->getValue(), Threads) || Threads == 0)
-      reportError(createStringError("%s: expected a positive integer, got '%s'",
-                                    Arg->getSpelling().data(),
-                                    Arg->getValue()));
-    parallel::strategy = hardware_concurrency(Threads);
+    StringRef Val = Arg->getValue();
+    if (Val.equals_insensitive("jobserver"))
+      parallel::strategy = jobserver_concurrency();
+    else {
+      unsigned Threads = 0;
+      if (!llvm::to_integer(Val, Threads) || Threads == 0) {
+        reportError(createStringError(
+            "%s: expected a positive integer or 'jobserver', got '%s'",
+            Arg->getSpelling().data(), Val.data()));
+      } else
+        parallel::strategy = hardware_concurrency(Threads);
+    }
   }
 
   if (Args.hasArg(OPT_wrapper_time_trace_eq)) {
diff --git a/clang/tools/clang-linker-wrapper/LinkerWrapperOpts.td b/clang/tools/clang-linker-wrapper/LinkerWrapperOpts.td
index 17fb9db35fe39..7b3f632e25a6a 100644
--- a/clang/tools/clang-linker-wrapper/LinkerWrapperOpts.td
+++ b/clang/tools/clang-linker-wrapper/LinkerWrapperOpts.td
@@ -53,7 +53,8 @@ def wrapper_time_trace_granularity : Joined<["--"], "wrapper-time-trace-granular
 
 def wrapper_jobs : Joined<["--"], "wrapper-jobs=">,
   Flags<[WrapperOnlyOption]>, MetaVarName<"<number>">,
-  HelpText<"Sets the number of parallel jobs to use for device linking">;
+  HelpText<"Sets the number of parallel jobs for device linking. Can be a "
+            "positive integer or 'jobserver'.">;
 
 def override_image : Joined<["--"], "override-image=">,
   Flags<[WrapperOnlyOption]>, MetaVarName<"<kind=file>">,
diff --git a/llvm/include/llvm/Support/Jobserver.h b/llvm/include/llvm/Support/Jobserver.h
new file mode 100644
index 0000000000000..0ddcc9bc7e472
--- /dev/null
+++ b/llvm/include/llvm/Support/Jobserver.h
@@ -0,0 +1,141 @@
+//===- llvm/Support/Jobserver.h - Jobserver Client --------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines a client for the GNU Make jobserver protocol. This allows
+// LLVM tools to coordinate parallel execution with a parent `make` process.
+//
+// The jobserver protocol is a mechanism for GNU Make to share its pool of
+// available "job slots" with the subprocesses it invokes. This is particularly
+// useful for tools that can perform parallel operations themselves (e.g., a
+// multi-threaded linker or compiler). By participating in this protocol, a
+// tool can ensure the total number of concurrent jobs does not exceed the
+// limit specified by the user (e.g., `make -j8`).
+//
+// How it works:
+//
+// 1. Establishment:
+//    A child process discovers the jobserver by inspecting the `MAKEFLAGS`
+//    environment variable. If a jobserver is active, this variable will
+//    contain a `--jobserver-auth=<value>` argument. The format of `<value>`
+//    determines how to communicate with the server.
+//
+// 2. The Implicit Slot:
+//    Every command invoked by `make` is granted one "implicit" job slot. This
+//    means a tool can always perform at least one unit of work without needing
+//    to communicate with the jobserver. This implicit slot should NEVER be
+//    released back to the jobserver.
+//
+// 3. Acquiring and Releasing Slots:
+//    On POSIX systems, the jobserver is implemented as a pipe. The
+//    `--jobserver-auth` value specifies either a path to a named pipe
+//    (`fifo:PATH`) or a pair of file descriptors (`R,W`). The pipe is
+//    pre-loaded with single-character tokens, one for each available job slot.
+//
+//    - To acquire an additional slot, a client reads a single-character token
+//      from the pipe.
+//    - To release a slot, the client must write the *exact same* character
+//      token back to the pipe.
+//
+//    It is critical that a client releases all acquired slots before it exits,
+//    even in cases of error, to avoid deadlocking the build.
+//
+// Example:
+//    A multi-threaded linker invoked by `make -j8` wants to use multiple
+//    threads. It first checks for the jobserver. It knows it has one implicit
+//    slot, so it can use one thread. It then tries to acquire 7 more slots by
+//    reading 7 tokens from the jobserver pipe. If it only receives 3 tokens,
+//    it knows it can use a total of 1 (implicit) + 3 (acquired) = 4 threads.
+//    Before exiting, it must write the 3 tokens it read back to the pipe.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_SUPPORT_JOBSERVER_H
+#define LLVM_SUPPORT_JOBSERVER_H
+
+#include "llvm/ADT/StringRef.h"
+#include <memory>
+#include <string>
+
+namespace llvm {
+
+/// A JobSlot represents a single job slot that can be acquired from or released
+/// to a jobserver pool. This class is move-only.
+class JobSlot {
+public:
+  /// Default constructor creates an invalid instance.
+  JobSlot() = default;
+
+  // Move operations are allowed.
+  JobSlot(JobSlot &&Other) noexcept : Value(Other.Value) { Other.Value = -1; }
+  JobSlot &operator=(JobSlot &&Other) noexcept {
+    if (this != &Other) {
+      this->Value = Other.Value;
+      Other.Value = -1;
+    }
+    return *this;
+  }
+
+  // Copy operations are disallowed.
+  JobSlot(const JobSlot &) = delete;
+  JobSlot &operator=(const JobSlot &) = delete;
+
+  /// Returns true if this instance is valid (either implicit or explicit).
+  bool isValid() const { return Value >= 0; }
+
+  /// Returns true if this instance represents the implicit job slot.
+  bool isImplicit() const { return Value == kImplicitValue; }
+
+  static JobSlot createExplicit(uint8_t V) {
+    return JobSlot(static_cast<int16_t>(V));
+  }
+
+  static JobSlot createImplicit() { return JobSlot(kImplicitValue); }
+
+  uint8_t getExplicitValue() const;
+  bool isExplicit() const { return isValid() && !isImplicit(); }
+
+private:
+  friend class JobserverClient;
+  friend class JobserverClientImpl;
+
+  JobSlot(int16_t V) : Value(V) {}
+
+  static constexpr int16_t kImplicitValue = 256;
+  int16_t Value = -1;
+};
+
+/// The public interface for a jobserver client.
+/// This client is a lazy-initialized singleton that is created on first use.
+class JobserverClient {
+public:
+  virtual ~JobserverClient();
+
+  /// Tries to acquire a job slot from the pool. On failure (e.g., if the pool
+  /// is empty), this returns an invalid JobSlot instance. The first successful
+  /// call will always return the implicit slot.
+  virtual JobSlot tryAcquire() = 0;
+
+  /// Releases a job slot back to the pool.
+  virtual void release(JobSlot Slot) = 0;
+
+  /// Returns the number of job slots available, as determined on first use.
+  /// This value is cached. Returns 0 if no jobserver is active.
+  virtual unsigned getNumJobs() const = 0;
+
+  /// Returns the singleton instance of the JobserverClient.
+  /// The instance is created on the first call to this function.
+  /// Returns a nullptr if no jobserver is configured or an error occurs.
+  static JobserverClient *getInstance();
+
+  /// Resets the singleton instance. For testing purposes only.
+  static void resetForTesting();
+};
+
+} // end namespace llvm
+
+#endif // LLVM_SUPPORT_JOBSERVER_H
diff --git a/llvm/include/llvm/Support/ThreadPool.h b/llvm/include/llvm/Support/ThreadPool.h
index 9272760fc140a..3f0013700392d 100644
--- a/llvm/include/llvm/Support/ThreadPool.h
+++ b/llvm/include/llvm/Support/ThreadPool.h
@@ -16,6 +16,7 @@
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/Config/llvm-config.h"
 #include "llvm/Support/Compiler.h"
+#include "llvm/Support/Jobserver.h"
 #include "llvm/Support/RWMutex.h"
 #include "llvm/Support/Threading.h"
 #include "llvm/Support/thread.h"
@@ -184,6 +185,7 @@ class LLVM_ABI StdThreadPool : public ThreadPoolInterface {
   void grow(int requested);
 
   void processTasks(ThreadPoolTaskGroup *WaitingForGroup);
+  void processTasksWithJobserver();
 
   /// Threads in flight
   std::vector<llvm::thread> Threads;
@@ -212,6 +214,8 @@ class LLVM_ABI StdThreadPool : public ThreadPoolInterface {
 
   /// Maximum number of threads to potentially grow this pool to.
   const unsigned MaxThreadCount;
+
+  JobserverClient *TheJobserver = nullptr;
 };
 #endif // LLVM_ENABLE_THREADS
 
diff --git a/llvm/include/llvm/Support/Threading.h b/llvm/include/llvm/Support/Threading.h
index d3fe0a57ee44e..88846807f111a 100644
--- a/llvm/include/llvm/Support/Threading.h
+++ b/llvm/include/llvm/Support/Threading.h
@@ -142,6 +142,11 @@ constexpr bool llvm_is_multithreaded() { return LLVM_ENABLE_THREADS; }
     /// the thread shall remain on the actual CPU socket.
     LLVM_ABI std::optional<unsigned>
     compute_cpu_socket(unsigned ThreadPoolNum) const;
+
+    /// If true, the thread pool will attempt to coordinate with a GNU Make
+    /// jobserver, acquiring a job slot before processing a task. If no
+    /// jobserver is found in the environment, this is ignored.
+    bool UseJobserver = false;
   };
 
   /// Build a strategy from a number of threads as a string provided in \p Num.
@@ -210,6 +215,19 @@ constexpr bool llvm_is_multithreaded() { return LLVM_ENABLE_THREADS; }
     return S;
   }
 
+  /// Returns a thread strategy that attempts to coordinate with a GNU Make
+  /// jobserver. The number of active threads will be limited by the number of
+  /// available job slots. If no jobserver is detected in the environment, this
+  /// strategy falls back to the default hardware_concurrency() behavior.
+  inline ThreadPoolStrategy jobserver_concurrency() {
+    ThreadPoolStrategy S;
+    S.UseJobserver = true;
+    // We can still request all threads be created, as they will simply
+    // block waiting for a job slot if the jobserver is the limiting factor.
+    S.ThreadsRequested = 0; // 0 means 'use all available'
+    return S;
+  }
+
   /// Return the current thread id, as used in various OS system calls.
   /// Note that not all platforms guarantee that the value returned will be
   /// unique across the entire system, so portable code should not assume
diff --git a/llvm/lib/Support/CMakeLists.txt b/llvm/lib/Support/CMakeLists.txt
index 45d961e994a1a..daacab128b498 100644
--- a/llvm/lib/Support/CMakeLists.txt
+++ b/llvm/lib/Support/CMakeLists.txt
@@ -205,6 +205,7 @@ add_llvm_component_library(LLVMSupport
   InstructionCost.cpp
   IntEqClasses.cpp
   IntervalMap.cpp
+  Jobserver.cpp
   JSON.cpp
   KnownBits.cpp
   KnownFPClass.cpp
diff --git a/llvm/lib/Support/Jobserver.cpp b/llvm/lib/Support/Jobserver.cpp
new file mode 100644
index 0000000000000..a0a3eeb64be28
--- /dev/null
+++ b/llvm/lib/Support/Jobserver.cpp
@@ -0,0 +1,257 @@
+//===- llvm/Support/Jobserver.cpp - Jobserver Client Implementation -------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Support/Jobserver.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/Statistic.h"
+#include "llvm/Config/llvm-config.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/Error.h"
+#include "llvm/Support/raw_ostream.h"
+
+#include <atomic>
+#include <memory>
+#include <mutex>
+#include <new>
+
+#define DEBUG_TYPE "jobserver"
+
+using namespace llvm;
+
+namespace {
+struct JobserverConfig {
+  enum Mode {
+    None,
+    PosixFifo,
+    PosixPipe,
+    Win32Semaphore,
+  };
+  Mode TheMode = None;
+  std::string Path;
+  int ReadFD = -1;
+  int WriteFD = -1;
+};
+} // namespace
+
+namespace {
+Expected<JobserverConfig> parseNativeMakeFlags(StringRef MakeFlags);
+} // namespace
+
+class JobserverClientImpl : public JobserverClient {
+  bool IsInitialized = false;
+  std::atomic<bool> HasImplicitSlot{true};
+  unsigned NumJobs = 0;
+
+public:
+  JobserverClientImpl(const JobserverConfig &Config);
+  ~JobserverClientImpl() override;
+
+  JobSlot tryAcquire() override;
+  void release(JobSlot Slot) override;
+  unsigned getNumJobs() const override { return NumJobs; }
+
+  bool isValid() const { return IsInitialized; }
+
+private:
+#if defined(LLVM_ON_UNIX)
+  int ReadFD = -1;
+  int WriteFD = -1;
+  std::string FifoPath;
+#elif defined(_WIN32)
+  void *Semaphore = nullptr;
+#endif
+};
+
+// Include the platform-specific parts of the class.
+#if defined(LLVM_ON_UNIX)
+#include "Unix/Jobserver.inc"
+#elif defined(_WIN32)
+#include "Windows/Jobserver.inc"
+#endif
+
+JobserverClient::~JobserverClient() = default;
+
+uint8_t JobSlot::getExplicitValue() const {
+  assert(isExplicit() && "Cannot get value of implicit or invalid slot");
+  return static_cast<uint8_t>(Value);
+}
+
+static std::once_flag GJobserverOnceFlag;
+static std::unique_ptr<JobserverClient> GJobserver;
+
+/// This is the main entry point for acquiring a jobserver client. It uses a
+/// std::call_once to ensure the singleton `GJobserver` instance is created
+/// safely in a multi-threaded environment. On first call, it reads the
+/// `MAKEFLAGS` environment variable, parses it, and attempts to construct and
+/// initialize a `JobserverClientImpl`. If successful, the global instance is
+/// stored in `GJobserver`. Subsequent calls will return the existing instance.
+JobserverClient *JobserverClient::getInstance() {
+  std::call_once(GJobserverOnceFlag, []() {
+    LLVM_DEBUG(
+        dbgs()
+        << "JobserverClient::getInstance() called for the first time.\n");
+    const char *MakeFlagsEnv = getenv("MAKEFLAGS");
+    if (!MakeFlagsEnv) {
+      errs() << "Warning: failed to create jobserver client due to MAKEFLAGS "
+                "environment variable not found\n";
+      return;
+    }
+
+    LLVM_DEBUG(dbgs() << "Found MAKEFLAGS = \"" << MakeFlagsEnv << "\"\n");
+
+    auto ConfigOrErr = parseNativeMakeFlags(MakeFlagsEnv);
+    if (Error Err = ConfigOrErr.takeError()) {
+      errs() << "Warning: failed to create jobserver client due to invalid "
+                "MAKEFLAGS environment variable: "
+             << toString(std::move(Err)) << "\n";
+      return;
+    }
+
+    JobserverConfig Config = *ConfigOrErr;
+    if (Config.TheMode == JobserverConfig::None) {
+      errs() << "Warning: failed to create jobserver client due to jobserver "
+                "mode missing in MAKEFLAGS environment variable\n";
+      return;
+    }
+
+    if (Config.TheMode == JobserverConfig::PosixPipe) {
+#if defined(LLVM_ON_UNIX)
+      if (!areFdsValid(Config.ReadFD, Config.WriteFD)) {
+        errs() << "Warning: failed to create jobserver client due to invalid "
+                  "Pipe FDs in MAKEFLAGS environment variable\n";
+        return;
+      }
+#endif
+    }
+
+    auto Client = std::make_unique<JobserverClientImpl>(Config);
+    if (Client->isValid()) {
+      LLVM_DEBUG(dbgs() << "Jobserver client created successfully!\n");
+      GJobserver = std::move(Client);
+    } else
+      errs() << "Warning: jobserver client initialization failed.\n";
+  });
+  return GJobserver.get();
+}
+
+/// For testing purposes only. This function resets the singleton instance by
+/// destroying the existing client and re-initializing the `std::once_flag`.
+/// This allows tests to simulate the first-time initialization of the
+/// jobserver client multiple times.
+void JobserverClient::resetForTesting() {
+  GJobserver.reset();
+  // Re-construct the std::once_flag in place to reset the singleton state.
+  new (&GJobserverOnceFlag) std::once_flag();
+}
+
+namespace {
+/// A helper function that checks if `Input` starts with `Prefix`.
+/// If it does, it removes the prefix from `Input`, assigns the remainder to
+/// `Value`, and returns true. Otherwise, it returns false.
+bool getPrefixedValue(StringRef Input, StringRef Prefix, StringRef &Value) {
+  if (Input.consume_front(Prefix)) {
+    Value = Input;
+    return true;
+  }
+  return false;
+}
+
+/// A helper function to parse a string in the format "R,W" where R and W are
+/// ...
[truncated]

Copy link

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions c,h,inc,cpp -- llvm/include/llvm/Support/Jobserver.h llvm/lib/Support/Jobserver.cpp llvm/lib/Support/Unix/Jobserver.inc llvm/lib/Support/Windows/Jobserver.inc llvm/unittests/Support/JobserverTest.cpp clang/lib/Driver/ToolChains/Clang.cpp clang/test/Driver/linker-wrapper.c clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp llvm/include/llvm/Support/ThreadPool.h llvm/include/llvm/Support/Threading.h llvm/lib/Support/Parallel.cpp llvm/lib/Support/ThreadPool.cpp llvm/lib/Support/Threading.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Support/Unix/Jobserver.inc b/llvm/lib/Support/Unix/Jobserver.inc
index 48599cf11..c22e99baa 100644
--- a/llvm/lib/Support/Unix/Jobserver.inc
+++ b/llvm/lib/Support/Unix/Jobserver.inc
@@ -14,9 +14,9 @@
 #include <cassert>
 #include <cerrno>
 #include <fcntl.h>
+#include <string.h>
 #include <sys/stat.h>
 #include <unistd.h>
-#include <string.h>
 
 namespace {
 /// Returns true if the given file descriptor is a FIFO (named pipe).
@@ -166,8 +166,8 @@ void JobserverClientImpl::release(JobSlot Slot) {
   }
 
   uint8_t Token = Slot.getExplicitValue();
-  LLVM_DEBUG(dbgs() << "Releasing explicit token '" << (char)Token
-                    << "' to FD " << WriteFD << ".\n");
+  LLVM_DEBUG(dbgs() << "Releasing explicit token '" << (char)Token << "' to FD "
+                    << WriteFD << ".\n");
 
   // For FIFO-based jobservers, the write FD might not be open yet.
   // Open it on the first release.
diff --git a/llvm/lib/Support/Windows/Jobserver.inc b/llvm/lib/Support/Windows/Jobserver.inc
index f7fd6cf89..b73e98932 100644
--- a/llvm/lib/Support/Windows/Jobserver.inc
+++ b/llvm/lib/Support/Windows/Jobserver.inc
@@ -21,7 +21,7 @@
 /// successfully, the client is marked as initialized.
 JobserverClientImpl::JobserverClientImpl(const JobserverConfig &Config) {
   Semaphore = (void *)::OpenSemaphoreA(SEMAPHORE_MODIFY_STATE | SYNCHRONIZE,
-                                      FALSE, Config.Path.c_str());
+                                       FALSE, Config.Path.c_str());
   if (Semaphore != nullptr)
     IsInitialized = true;
 }
@@ -39,7 +39,8 @@ JobserverClientImpl::~JobserverClientImpl() {
 /// If the wait times out, it means no slots are available, and an invalid
 /// slot is returned.
 JobSlot JobserverClientImpl::tryAcquire() {
-  if (!IsInitialized) return JobSlot();
+  if (!IsInitialized)
+    return JobSlot();
 
   // First, grant the implicit slot.
   if (HasImplicitSlot.exchange(false, std::memory_order_acquire)) {
@@ -61,7 +62,8 @@ JobSlot JobserverClientImpl::tryAcquire() {
 /// by one using `ReleaseSemaphore`, making the slot available to other
 /// processes.
 void JobserverClientImpl::release(JobSlot Slot) {
-  if (!IsInitialized || !Slot.isValid()) return;
+  if (!IsInitialized || !Slot.isValid())
+    return;
 
   if (Slot.isImplicit()) {
     [[maybe_unused]] bool was_already_released =

Copy link
Member

@Artem-B Artem-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may need to be discussed in a wider forum.

While I do not think that build job management is compiler's job, cooperation with the outer build system may be useful. I may be persuaded that it's worth it.

The question is -- is that useful enough to warrant feature creep in clang driver? Perhaps it may be better to give external tools a way to manage offloading compilation sub-jobs. E.g. we could produce a json with the actions and input/output dependencies and let the build tool schedule them.

It's certainly feasible for a "classic" CUDA compilation before the new driver, but I can see that it may be problematic for a more complicated new-driver offloading machinery that does a lot more under the hood. Integration with the build tools will also be an issue. Given that the build server does exist, cooperating with it may be a relatively easy compromise.

@jhuber6
Copy link
Contributor

jhuber6 commented Jun 23, 2025

Yeah, I think overall it might be better to have a way to split up these offload compilations more discretely. Likely we'd want the ability to compile the device code portion independently and then bundle it back together or something. The changes here seem really invasion for such a niche application.

@yxsamliu
Copy link
Collaborator Author

GNU make jobserver support is not limited to offload parallel jobs, it can also be used by LTO in lld.

Currently LLVM has parallel and thread pool support but lack a mechanism to coordinate with the build system (make, ninja, msbuild), which cause them to overload the system when enabled with the build systems. The issue with --offload-jobs is just a demonstration of the generic limitation of the LLVM parallel support.

I will open an RFC at LLVM discourse about whether LLVM needs GNU make jobserver support.

@yxsamliu yxsamliu changed the title [Clang][Driver] Add jobserver support for --offload-jobs [LLVM] Add GNU make jobserver support Jun 24, 2025
@yxsamliu
Copy link
Collaborator Author

Comment on lines +137 to +138
auto Releaser =
make_scope_exit([&] { TheJobserver->release(std::move(Slot)); });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have exceptions, so I think this should just release it directly rather than using scope_exit.

Comment on lines -82 to +89
// in order for wait() to properly detect that even if the queue is
// in order for wait() to a properly detect that even if the queue is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems wrong.

break; // Successfully acquired a slot.

// If the jobserver is busy, yield to let other threads run.
std::this_thread::yield();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still going to have pretty high CPU usage. We have an exponential backoff helper in llvm/Support/ExponentialBackoff.h. Although ideally this would be event based.

Comment on lines +218 to +219
// The SlotReleaser is destroyed here, returning the token to the jobserver.
// The loop then repeats to process the next job.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not try to get another task while keeping the slot?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category llvm:support platform:windows
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants