Implement GNU Make jobserver client protocol support in Ninja #2506

digit-google · 2024-10-01T08:24:31Z

This PR implements the GNU Make jobserver client protocol in Ninja. This implements all client modes for Posix (pipe and fifo) and Windows (semaphore).

This protocol allows several participating processes to coordinate parallel build tasks / processing work.
For example, GNU Make, and now Ninja, use it to control how many parallel commands they dispatch at any given time. The Rust compiler and linker, and some C++ compilers (e.g. Clang has -flto=jobserver), use that to control how many parallel threads in a single invocation. The protocol is also implemented by the cargo Rust tool (but the latter only sets the CARGO_MAKEFLAGS environment variable).

Client mode is useful when Ninja is invoked as part of a more complex build, that launches several build tasks in parallel (e.g. recursive Make or CMake invocations). In this mode, Ninja detects that MAKEFLAGS contains --jobserver-auth or --jobserver-fds options, and uses the job slot pool to control its own dispatch of parallel build commands. It also passes the MAKEFLAGS value to child processes to let them participate in the coordination protocol.

This also includes a new script misc/jobserver_pool.py that can be used as a standalone job slot pool implementation, which can be used any client directly for testing.

This has been tested on large Fuchsia build plans, with certain build configurations that launch 24 sub-Ninja builds from a top-level Ninja build plan. With remote builders enabled, this reduces the total time from 22minutes to 12minutes.

This work is inspired by contributions from many other developers, including @hundeboll (see PR #2450), @mcprat (see PR #2474) and @stefanb2 (PR #1140) to name a few.

EDIT: (Removed mention of server mode as this has been pushed to a future PR).

kalvdans

Some minor comments on the code; I haven't actually tried your branch.

misc/jobserver_pool.py

src/jobserver.cc

src/jobserver.h

Neustradamus · 2024-10-12T10:38:23Z

To follow this important PR.

src/jobserver.cc

misc/jobserver_pool.py

src/jobserver-posix.cc

nanonyme · 2024-10-15T13:45:51Z

Does this work together with -flto=auto such that ninja limits flto threads when working as job server?

nekopsykose · 2024-10-15T17:18:20Z

Does this work together with -flto=auto such that ninja limits flto threads when working as job server?

for gcc, it should yes. the gcc lto wrapper basically emits a makefile and calls make on it to run the ltrans jobs, and that sub-make will take jobserver tokens from the ninja jobserver in this pr.

for clang, there is no equivalent- no part of the thinlto process or lld itself takes jobserver tokens (but maybe i missed something), that would require implementing the protocol (reading env vars, having lld and perhaps also the linker plugin used in other linkers spawn threads only based on grabbed tokens, ..)

kaspar030 · 2024-10-17T08:07:46Z

Just to report: I'm developing laze, a build system calling out to Ninja for actual building.

One of laze's core features is its native support for large build matrizes of mostly similar build configurations (e.g., build something for dozens to hundreds of slightly different embedded devices).

In one project, there's a test application that calls out to cmake for building a library, and cmake calls Ninja recursively. So when building this application for 10 boards at the same time (one laze command resulting in one Ninja call), previously, this would on my laptop call 10 cmake+Ninja simultaneously, and each sub-Ninja would run up to 10 gcc instance. While there's enough RAM so this actually finishes, my laptop becomes unresponsive, typing laggy, ...

Using the Ninja binary as built by this PRs CI output and with NINJA_JOBSERVER=1, the number of gcc instances stays at or below 10. Which IMO is the expected behavior and this PR finally adds this to Ninja.

I gave this a hyperfine run to see if total build time is affected, and it seems like the jobserver version is slightly slower, within margin of error:

tests/pkg/relic on  add_laze_buildfiles took 40s371ms 
❯ hyperfine -p "rm -Rf ../../../build" "laze b"
Benchmark 1: laze b
  Time (mean ± σ):     41.419 s ±  1.242 s    [User: 241.710 s, System: 39.031 s]
  Range (min … max):   38.262 s … 42.548 s    10 runs
 
tests/pkg/relic on  add_laze_buildfiles took 6m57s586ms 
❯ NINJA_JOBSERVER=0 hyperfine -p "rm -Rf ../../../build" "laze b"
Benchmark 1: laze b
  Time (mean ± σ):     41.102 s ±  0.968 s    [User: 245.940 s, System: 38.790 s]
  Range (min … max):   38.645 s … 41.908 s    10 runs

(but, the jobserver version did not make typing here laggy ... 🙂)

sw · 2024-10-17T13:30:31Z

Using --jobserver without an argument doesn't work for me.

c:\>ninja --jobserver
ninja: fatal: invalid -j parameter

c:\>ninja --jobserver=0
ninja: error: loading 'build.ninja': The system cannot find the file specified.

c:\>ninja --jobserver=1
ninja: error: loading 'build.ninja': The system cannot find the file specified.

Also, setting NINJA_JOBSERVER and then trying to limit the number of parallel builds on the command line doesn't work, but that seems to be intended ("Explicit parallelism (-j), ignoring NINJA_JOBSERVER environment variable."). Maybe that's what @thesamesam alluded to in #1139. I don't really see a reason for this - is it so the child processes don't accidentally try to be jobservers as well?

With ninja -jX --jobserver=1, it seems to work as expected. We are using Ninja alongside CMake on Windows in a project with many ExternalProjects, which up to now would cause the X*N problem. So I hope that a solution can finally be merged.

robUx4 · 2024-10-18T10:34:39Z

I gave a test on the VLC contrib build which builds in parallel more than a hundred libraries using autotools (make), CMake (ninja) and meson (ninja).

The maximum number of threads seems to be respected on my local machine.

In the CI things are working properly as well in Debian and Ubuntu.

On one machine it logs:

ninja: Jobserver mode detected: k -j48 --jobserver-auth=4,5

On the other:

ninja: Jobserver mode detected: k -j64 -Orecurse --jobserver-auth=3,4

nanonyme · 2024-10-18T12:19:20Z

Using --jobserver without an argument doesn't work for me.
c:\>ninja --jobserver
ninja: fatal: invalid -j parameter

c:\>ninja --jobserver=0
ninja: error: loading 'build.ninja': The system cannot find the file specified.

c:\>ninja --jobserver=1
ninja: error: loading 'build.ninja': The system cannot find the file specified.
Also, setting NINJA_JOBSERVER and then trying to limit the number of parallel builds on the command line doesn't work, but that seems to be intended ("Explicit parallelism (-j), ignoring NINJA_JOBSERVER environment variable."). Maybe that's what @thesamesam alluded to in #1139. I don't really see a reason for this - is it so the child processes don't accidentally try to be jobservers as well?

With ninja -jX --jobserver=1, it seems to work as expected. We are using Ninja alongside CMake on Windows in a project with many ExternalProjects, which up to now would cause the X*N problem. So I hope that a solution can finally be merged.

This sounds like an inconvenient limitation. Why couldn't ninja be jobserver with environment variable when explicit -jN is set?

digit-google · 2024-10-18T16:09:40Z

Thanks, the reason why --jobserver does not work on Windows for @sw is interesting. On this platform, we use our own src/getopt.c implementation which, apparently, only supports optional arguments for short options, not long one.

Besides that, the getopt_long() manpage states that arguments for long options should be provided as --option=arg or --option arg only, but does not say how optional arguments should be processed. This means that something like ninja --jobserver <target> is ambiguous, as it would technically be interpreted as equivalent to ninja --jobserver=<target> which will likely fail.

I am going to get rid of the problem by making --jobserver a simple flag, and adding --jobserver-mode=<mode> to specify the mode instead (so --jobserver-mode=0 will be needed to disable the feature even if NINJA_JOBSERVER is defined in the environment).

Apart from that, @nanonyme is correct that this was to avoid child processes to become jobserver themselves by accident. However, this can be solved by ensuring that NINJA_JOBSERVER is never defined in the these processes, which is simpler, so I'll change this too.

Quick question regarding behavior: Currently:

Using an explicit -j1 disables jobserver client mode, as well as pool mode, as this is interpreted by the client not wanting parallel dispatch).
Using an explicit -j0 disables jobserver pool mode, but not client mode, as this is interpreted by the client asking for "infinite parallelism", which to me seems only useful to see how bad the system reacts under heavy load.

If anyone thinks this is not reasonable or would create a problem for their workflow, let me know. I selected these conditions on a hunch since I never use these myself (well except -j1 in very rare cases).

kepstin · 2024-10-21T16:26:50Z

Is there any chance that jobserver mode could be made "automatic" by default? I.e. act as a jobserver client if a jobserver is available from the environment, otherwise (if -j is set) start a jobserver pool?

This would simplify the use of ninja quite a bit - you don't have to worry about remembering to set an extra ninja command-line parameter or environment variable to ensure that recursive builds, rust, gcc lto., etc. parallelize properly.

jhasse · 2024-10-21T17:48:03Z

Hm ... for automatic detection of the client: Should this be done by checking MAKEFLAGS?
And whether to automatically spawn a server: Build edges could specify if they are a jobserver client and if one edge with that option is part of the build, ninja would activate the jobserver automatically.

Some more general comments about this PR:

I don't like the new environment variable NINJA_JOBSERVER, we should keep them to a minimum and I don't see why it is needed.
I would only implement the newer (better) fifo mode on Linux. That way the --jobserver-mode wouldn't be needed.

configure.py

doc/manual.asciidoc

jhasse · 2024-10-21T17:53:22Z

doc/manual.asciidoc

+
+- Dry-run (i.e. `-n` or `--dry-run`) is not enabled.
+
+- `-j1` (no parallelism) is not used on the command line.


hm ... why though?

Because -j1 is how you tell Ninja you do not want to launch parallel jobs at all. So using it in the current implementation disables both client and pool mode at the same time (it doesn't make sense to use a pool of one job slot).

Similarly, -j0 means "infinite parallelism", so it disables pool mode, but not client mode.

I can change that if you prefer, but I believe these are sane defaults. I clarified this behavior in the documentation though.

But as a client wouldn't couldn't I still be limited by jobserver? I.e. waiting on a token to become available. But I guess that isn't supported yet as nothing will wake up ninjas main loop in that case?

The possible behaviors are the following when a jobserver pool is in place in MAKEFLAGS:

A) -j1 disables parallel tasks and ignores the pool
B) -j2 and above (also -j0) ignore the command-line job count, and use the pool to control parallelism.
C) Without a command-line -j parameter, use the pool.

D) -j1 is ignored, Ninja uses the pool to control parallelism.
E) -j2 and above (also -j0) ignore the pool and use the command-line job count instead.

The current PR implements A) and B) because -j1 is the only way to tell Ninja that we do not want parallelism, and the only reason to do that would be for debugging a build or because the system is very constrained (e.g. not enough RAM). Hence, ignoring it if a jobserver pool is in place seems unhelpful.

The reason B) exists (explicit job counts are ignored when the pool is in place) is because many build scripts will invoke Ninja with an explicit count, oblivious to the fact that a jobserver is in place. Doing this allows everything to work transparently without modifying tons of configuration files or scripts in complex multi-build systems.

In theory, you can setup a pool with a depth of 1, but since each client receives an implicit job slot when it starts, whether -j1 ignores the pool or not results in exactly the same behavior whether A) or D) is implemented.

A) sounds pretty reasonable; but I think B) should generate a warning.

FWIW, there is already an Info() message printed in this case "jobserver detected ...", unless --quiet is specified.

I think I favor C (if I understand it correctly that it means: with -j always ignore the pool).

It's simple. Less code, less documentation.

It's intuitive: When I specify a job count, I would want ninja to respect that and not override the behavior based on an environment variable.

We don't change behavior with an update. ninja 1.12 uses 42 jobs if you pass -j42, ninja 1.13 should too.

We do change the behavior without -j - but that currently already means "auto detect" and I think it's expectable for that auto detect mechanism to improve in an update.

People might currently use -j<some low number> as a workaround to calm ninja because it doesn't respect the jobserver. We shouldn't encourage them to keep a workaround in, so encourage the removal.

src/build.h

src/jobserver-win32.cc

src/jobserver.h

src/ninja.cc

digit-google · 2024-10-23T18:32:56Z

To answer @kepstin's question. Client mode is already setup automatically by inspecting the MAKEFLAGS variable. If it detects that Ninja is called within the context of another jobserver pool, it will automatically use it to control how it dispatches parallel jobs, and will also pass the variable to sub-commands as well to let them participate in the coordination protocol.

On the other hand, starting the pool in Ninja automatically seems risky to me, because it can change much more than Ninja's behavior (all sub-commands as well). For example, there are rare cases (unbalanced builds) where this will result in a slower build overall (e.g. as described here).

That's why I believe the NINJA_JOBSERVER environment variable is a good compromise here. You can set it to 1 and Ninja will automatically start a pool (if not already running in a client context). It is something that also allows this to work without modifying build systems and scripts, or wrappers such a cmake --build or meson compile and many others).

I know @jhasse dislikes environment variables, but I believe that this is one of the rare cases where the benefits outweigh the annoyances.

digit-google · 2024-10-23T18:40:39Z

To answer @jhasse's latest comment now, client mode is already started automatically by looking at MAKEFLAGS, there is nothing to do to use it, and NINJA_JOBSERVER is only here to control when Ninja implements the pool itself.

Adding a special syntax to the Ninja build plan to indicate that specific actions support the protocol is doable, but it will mean the feature won't be usable until all Ninja generators support it, which may be a veeeeery long time. I don't think it's really useful here. That might be useful to explicitly disable it for certain commands (though on Posix one can simply start the command with MAKEFLAGS= ...., Win32 is more complicated).

For Posix, I think it is far preferrable to use --jobserver-mode=pipe as the default at the moment as many multi-build systems run on older distributions that do not have GNU Make 4.4 yet (which is the one which implements fifo mode). Even my Debian 12-based Linux distribution at work is only providing GNU Make 4.3 today.

I imagine we could switch to fifo transparently in a few years when GNU Make 4.4+ is more widespread. But since there is absolutely no performance difference between the two modes, and that our client implementation supports both transparently, when Ninja is not the pool implementation, pipe seems a reasonable choice for now.

eli-schwartz · 2024-10-23T19:03:52Z

That might be useful to explicitly disable it for certain commands (though on Posix one can simply start the command with MAKEFLAGS= ...., Win32 is more complicated).

This is why ninja should support a dedicated syntax to run a build rule with specific environment variables set -- trivial on POSIX, annoying on Win32 so would require invoking processes two different ways depending on whether the build plan has demanded an environment variable, iirc.

Not all ninja generators permit specifying arbitrary shell command syntax for build rules. For precisely the reason of predictable cross platform behavior.

jhasse · 2024-10-24T16:45:36Z

Even my Debian 12-based Linux distribution at work is only providing GNU Make 4.3 today.

That's not because GNU Make 4.4 is brand new (it's 2 years old actually), but because Debian is such a bad distribution. If we wanted to be pragmatic we could have merged the first PR 8 years ago.

digit-google · 2024-10-24T17:39:43Z

That's not because GNU Make 4.4 is brand new (it's 2 years old actually), but because Debian is such a bad distribution. If we wanted to be pragmatic we could have merged the first PR 8 years ago.

Well, just like build systems there are only two types of Linux distributions: those that people complain about, and those that nobody uses ;-)

htot · 2024-10-25T09:02:14Z

People: the world is bigger than just your distributions packages. The demand for jobserver support becomes relevant when you are using a build system that builds many different packages using combinations of make, ninja and possibly other methods, like Yocto's bitbake (if you are unfamiliar with that just imagine the build system that builds all of debian's packages). This is why it is unrealistic to convert everything to ninja, there are 1000's of packages to be built.

Now Yocto installs it's own make and ninja version, master contains make 4.4.1. To get the jobserver working in bitbake I patched it based on a suggestion from the Yocto developers, and forked one of stephan's versions. This has been working for at least a year.

I would expect a similar need for jobserver in buildroot.

The importance is that if you have a bitbake build machine that does m parallel makes/ninja's on n cores (+ ht), you are going to need about m x n x 1GB RAM (per user building world).

nanonyme · 2024-10-25T10:10:14Z

In any case I wouldn't expect stable distributions to pick up 1.12.0 or above as update. The scheduling changes means that some legacy builds break. So this PR is not going to apply to historic but future distro releases. In orher words, parties who get this are highly likely to have new enough make.

ArsenArsen · 2024-10-27T09:35:02Z

For Posix, I think it is far preferrable to use --jobserver-mode=pipe as the default at the moment as many multi-build systems run on older distributions that do not have GNU Make 4.4 yet (which is the one which implements fifo mode). Eveno I'm not convinced. my Debian 12-based Linux distribution at work is only providing GNU Make 4.3 today.

Odds are that time will start passing, even for Debian, enough
to get make 4.4 at around the same time or before the hypothetical Ninja release that'd include this patch, though.

So I'm not sure it's actually advantageous to default to pipe. I'm exporting NINJA_JOBSERVER=fifo for my testing.

digit-google · 2024-11-11T12:18:27Z

@nanonyme: Good points, done!

digit-google · 2024-11-11T14:42:28Z

Regarding the last pushes, I had to modify the parallelism of the jobserver test from 10 to 4 parallel tasks, as it failed on CI builders, which probably run on low-powered VMs.

nanonyme · 2024-11-13T22:11:18Z

For the record, I think even the client-mode alone will already be highly useful for the use case when there is a build orchestrator which runs multiple builds in parallel and limits overall CPU usage by mounting the fifo into multiple isolated build sandboxes. Looking forward to this landing.

chriselrod · 2024-11-14T10:59:47Z

For the record, I think even the client-mode alone will already be highly useful for the use case when there is a build orchestrator which runs multiple builds in parallel and limits overall CPU usage by mounting the fifo into multiple isolated build sandboxes. Looking forward to this landing.

I'm using cmake as my build system for a library with a bunch of feature-level #ifdefs (e.g., different code paths for avx512f, avx2, generic, etc), so I use make to configure and run different builds with the different feature levels, and using gcc vs clang. When using ninja as the build system, I need to make -j1 on low core count systems. I'm looking forward to being able to make -j(nproc).

nanonyme · 2024-11-14T11:35:59Z

For the record, I think even the client-mode alone will already be highly useful for the use case when there is a build orchestrator which runs multiple builds in parallel and limits overall CPU usage by mounting the fifo into multiple isolated build sandboxes. Looking forward to this landing.

I'm using cmake as my build system for a library with a bunch of feature-level #ifdefs (e.g., different code paths for avx512f, avx2, generic, etc), so I use make to configure and run different builds with the different feature levels, and using gcc vs clang. When using ninja as the build system, I need to make -j1 on low core count systems. I'm looking forward to being able to make -j(nproc).

Sure. It just came up here but was not said explicitly that there are users who want to build multiple autotools, cmake and meson projects in parallel through build orchestration tooling. For this to work without overloading system, all make and ninja processes must be in jobserver client mode.

nanonyme · 2024-11-19T17:35:03Z

Sorry for spamming everyone but @jhasse what does 1.13.0 milestone actually mean in terms of expected calendar time? I see there's still some PR's with 1.12.2 milestone targeting master as well.

jhasse · 2024-11-23T19:52:27Z

It means the next major release. You can get an idea of how often these happen by looking at the history of the releases - probably about 1 to 2 years.

1.12.2 won't target master but the commits will be cherry-picked on the release branch.

digit-google · 2025-01-07T14:12:43Z

Hello, I know that @jhasse is preparing the 1.13 release. Can we submit this PR into the repository now?

jhasse · 2025-01-07T21:07:05Z

I think there are a few leftovers for the command line configuration which can be removed. Can you remove those?

This implements a GNU jobserver token pool that will be used for testing the upcoming jobserver Ninja client implementation. Note that the implementation is basic and doesn't try to deal with broken protocol clients (which release more tokens than they acquired). Supporting them would require something vastly more complex that would monitor the state of the pipe/fifo at all times.

digit-google · 2025-01-10T12:21:36Z

I just uploaded a rebased version of the PR, but I fail to see what left-overs you are referring to. Can you add some comments in the code to point at them?

jhasse · 2025-01-10T17:23:20Z

src/jobserver.h

+    /// this also accepts "0" for kModeNone. and "1" for kModeDefault
+    /// which is useful when reading the value from an environment
+    /// variable.
+    static std::pair<bool, Mode> ModeFromString(const std::string& str);


This method for example, I think it's unused besides tests?

Ah, gotcha. Thanks, I removed all of Jobserver::{ModeToString(), ModeFromString(), GetValidModesAsString() and GetNativeValidModesAsString()) from the PR then.

This adds two new classes related to GNU jobserver support and related unit-tests: `Jobserver::Slot` models a single job slot, which includes both the "implicit" slot value assigned to each process spawned by Make (or the top-level pool implementation), as well as "explicit" values that come from the Posix pipe, or Win32 semaphore decrements. `Jobserver::Config` models the Jobserver pool implementation to use based on the value of the `MAKEFLAGS` environment variable.

This adds a new interface class for jobserver clients, providing a way to acquire and release job slots easily. Creating a concrete instance takes a Jobserver::Config as argument, which is used to pick the appropriate implementation and initialize it. This commit includes both Posix and Win32 implementations.

Detect that the environment variable MAKEFLAGS specifies a jobserver pool to use, and automatically use it to control build parallelism when this is the case. This is disabled is `--dry-run` or an explicit `-j<COUNT>` is passed on the command-line. Note that the `-l` option used to limit dispatch based on the overall load factor will still be in effect if used. + Use default member initialization for BuildConfig struct. + Add a new regression test suite that uses the misc/jobserver_pool.py script that was introduced in a previous commit, to verify that everything works properly.

mcprat

I was thinking now would be an appropriate time for some thoughts before this is merged...

mcprat · 2025-01-25T01:11:40Z

src/jobserver-posix.cc

+bool SetCloseOnExecFd(int fd) {
+  int flags = fcntl(fd, F_GETFD, 0);
+  if (!(flags & FD_CLOEXEC)) {
+    int ret = fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
+    if (ret < 0)
+      return false;
+  }
+  return true;
+}
+
+// Duplicate the descriptor and make the result non-blocking and
+// close-on-exec.
+bool DuplicateDescriptor(int from_fd, int* to_fd) {
+  int new_fd = dup(from_fd);
+  if (new_fd < 0) {
+    return false;
+  }
+  if (!SetNonBlockingFd(new_fd) || !SetCloseOnExecFd(new_fd)) {
+    ::close(new_fd);
+    return false;
+  }
+  *to_fd = new_fd;
+  return true;
+}
+


If I understand correctly, this is something that only a server instance needs to do...

No, making the descriptors themselves non-blocking would mean the inherited versions passed to Ninja sub-commands would also be non-blocking, which will break many tools that may want to read them synchronously (e.g. using a thread pool).

Again, I'm not suggesting that duplicating and forcing non-blocking should not be done, it should. I'm saying that it's the server that does or will do this and not the client, so it would be done already and the client would be doing it again. The descriptors on the client side should already be redirecting to duplicates on the server side, if I understand correctly... I might be off though...

in the case of backwards compatibility before the version of make where they ensured that there is no blocking (before version 4.2 if I remember right), the blocking flag should be checked first, but still no duplicating by a client...

I'm afraid I don't understand what you are saying.

Why would clients assume that the server has set the descriptors in blocking or non-blocking mode, when passed without --jobserver-auth=R,W ? That seems an implementation detail that is left out of the specification for the protocol.

mcprat · 2025-01-25T01:28:40Z

src/ninja.cc

-#include "deps_log.h"
 #include "clean.h"
 #include "command_collector.h"
 #include "debug_flags.h"
 #include "depfile_parser.h"
+#include "deps_log.h"


can we keep unrelated changes out? it may affect how easily it can be backported

mcprat · 2025-01-25T01:32:33Z

src/ninja.cc

+#ifndef _WIN32
+    if (jobserver_config.mode == Jobserver::Config::kModePipe) {
+      status->Warning(
+          "Jobserver 'pipe' mode detected, a pool that implements 'fifo' mode "
+          "would be more reliable!");
+    }
+#endif


I can't help but express that this warning is just obnoxious. The cases where fifo is "better" are extremely fringe and nearly arbitrary for the vast majority of usage...

the people screaming about it in the issue page were suffering from a blocked read or otherwise mistaken

I would prefer just printing of the status of what type of jobserver implementation is used as it initialized

Why did Make add the fifo method then?

The warning is obnoxious, because the only way you can get the warning is if software external to Ninja doesn't offer fifo support, perhaps because you installed GNU Make from your distro and installed ninja from the prebuilt third-party binaries available on PyPI.

fifo is more reliable e.g. for cases of nested wrapping when GNU Make doesn't know which process to expose the pipe to, or for badly written Makefiles. It makes it more likely that any subprocesses actually correctly detect the jobserver. But if you're currently using the pipe because that is all your infrastructure supports, you're going to very likely read this as a "smartass lecture", change nothing about your infrastructure, but possibly receive a bad mood from your interaction with the Ninja software.

It doesn't matter to me, personally. If I ever see this warning I will mentally redirect it to /dev/null and give it the passing thought "eh, those ninja developers sure are the world's biggest optimists".

It is performance theatre. I can't even make up my mind whether I care about the topic enough to beg for the warning to be removed. The most frustrating part of this is the fact that reviewing this diff hunk expends mental energy better expended on getting jobserver support integrated.

If I ever see this warning I will mentally redirect it to /dev/null [...]

Hm ... in that case I think I'm more in favor of not adding 'pipe' mode support at all.

I agree with @eli-schwartz here. The warning was only added because @jhasse insisted on it as a condition to keep fifo support (see conversation for full details), but it is 99.5% annoying the user for nearly no practical benefit, so I am in favor of removing it too.

As a reminder, the reason we support both pipe and fifo is because some distros, such as Debian or Ubuntu, still do not ship with GNU Make 4.4 (which is the one introducing fifo mode) and won't until likely the end of the year. And developers prefer to stick to official packages whenever possible.

The difference in complexity between supporting one or two modes is rather limited to a little bit of parsing, and initialization of the pipe descriptors to be used at runtime, and do not affect Ninja much otherwise. I don't see a good reason not to support both.

Why did Make add the fifo method then?

so people don't have to type the + in front of their rules

so multiple top-level executions of make building different things can share the same job slots

I can't think of anything else... and none of that is directly related to "stablity"

so people don't have to type the + in front of their rules

so multiple top-level executions of make building different things can share the same job slots

Sounds "better" to me.

I don't really understand this. You wanted the warning so that people who use ninja and don't have a new enough GNU Make are warned that it would be best if they updated their Make.

Actually I want people to stop using distributions that don't upgrade important packages like make in 2.5 years.
Where is your nagging on the Debian bug tracker? You probably don't do that because the Debian bug tracker is an outdated, ugly, annoying and hard-to-use pile of garbage.

Actually I want people to stop using distributions that don't upgrade important packages like make in 2.5 years. Where is your nagging on the Debian bug tracker? You probably don't do that because the Debian bug tracker is an outdated, ugly, annoying and hard-to-use pile of garbage.

This is a a shortsighted pipe dream. Nearly all current (most recent), let alone supported versions of linux in the enterprise space are on 4.3 or earlier. For example, RHEL 8 is still is on make 4.2. RHEL 9 (current) is on 4.3. 10 hasn't even been released yet so Red Hat Enterprise Linux users de facto don't have make 4.4 yet. make is fundamental piece of any linux distribution and (rightfully!) upgraded later than other system packages for any given distribution as it affects the way the entire distribution is built in most cases.

There is zero chance make 4.4 becomes widely available to the majority of Linux distributions in any time that could be considered "soon". If pipe support is ripped out at this point, you might as well just cancel the whole PR as it's useless without it.

This PR has drug on long enough. This feature has been desired/needed for years and is in a good place, though pipe support is a must have. It should probably just be merged now, and we can iterate it in future PRs if we want to tweak it further.

[...] then won't be able to enable fifo mode until they ensure that all protocol clients (compiler, linker, tools, whatever) also support this mode [...]

I think this will happen later rather than sooner if we make the impression that the pipe mode is an alternative implementation with equivalent support and not a deprecated old method that shouldn't be used.

What do you think about the following changes:

Only mention --jobserver-auth=fifo: in manual.asciidoc

Remove unit tests for pipe mode.

Remove pipe mode from *.py files (that would be a lot more than 50 lines I think).

Print a warning when pipe mode is used.

@jhasse here is your previous comment agreeing that implementing pipe mode is reasonable at least for now (and by not documenting it, we can therefore freely remove it at some point in the future).

Developers of other protocol clients (such as compilers and linkers) certainly need to be informed about this need. End users who use both ninja and a compiler / linker / other tool, need some fallback (deprecated old method) and are unfortunately beholden to "stable release" distros. They cannot directly do anything about the warning but will gladly take the warning if it means they can get such as Make and Ninja to share a common jobserver protocol while waiting for an updated Make to reach them.

I would happily nag the Debian bug tracker (which I don't find to be outdated, ugly, annoying, or hard-to-use, despite that it uses old-fashioned HTML without any interactive javascript) if I thought it would help. They will unfortunately, with tremendous regret, tell me that there is nothing they can do -- the Debian Freeze Policy forbids them from modifying core toolchain packages in Debian Bookworm later than 2023-01-12 (yes I know that is quite early, it is barely two and a half months after GNU Make 4.4 was released with the first ever implementation of the fifo mode) and they are as powerless as I am to do anything about it.

The status quo of three months ago was that everyone agreed that merging pipe support with a warning nag was an acceptable compromise that unblocked people's ability to utilize jobservers in production while satisfying the Ninja developers' desire to avoid being locked into supporting pipe support forever. It would be fantastic if we could all recognize that this is a solved topic and simply move on from here without relitigating.

Actually I want people to stop using distributions that don't upgrade important packages like make in 2.5 years. Where is your nagging on the Debian bug tracker? You probably don't do that because the Debian bug tracker is an outdated, ugly, annoying and hard-to-use pile of garbage.

Better to focus your energy on things within your span of control.

This is a a shortsighted pipe dream. Nearly all current (most recent), let alone supported versions of linux in the enterprise space are on 4.3 or earlier. For example, RHEL 8 is still is on make 4.2. RHEL 9 (current) is on 4.3. 10 hasn't even been released yet so Red Hat Enterprise Linux users de facto don't have make 4.4 yet. make is fundamental piece of any linux distribution and (rightfully!) upgraded later than other system packages for any given distribution as it affects the way the entire distribution is built in most cases.

Yocto 5.0 (LTS released 2024.04) has make 4.4.1. The question is why are they still at ninja 1.11.1.

There is zero chance make 4.4 becomes widely available to the majority of Linux distributions in any time that could be considered "soon". If pipe support is ripped out at this point, you might as well just cancel the whole PR as it's useless without it.

There are a lot IIoT distributions generated by Yocto, each specialized for a certain hardware. So, here you are probably referring to desktop distributions, or tier 1 distributions. I don't want to sound like a smart ass, but the applications of make and and ninja have a way lager scope than you might realize at first sight.
And on top of that, the application of a jobserver is most important exactly to distributions as they will want to build packages built by make and by ninja mixed, with the total number of compile jobs coordinated. So typically Yocto (and Buildroot) users might benefit from this first.

This PR has drug on long enough. This feature has been desired/needed for years and is in a good place, though pipe support is a must have. It should probably just be merged now, and we can iterate it in future PRs if we want to tweak it further.

Indeed, I am using a fork of ninja 1.12 + patches for almost 2 years in my Yocto setup. I think it would be best to move forward instead of setting up more road blocks.

jhasse mentioned this pull request Oct 1, 2024

Add GNU make jobserver client support #1139

Open

kalvdans reviewed Oct 4, 2024

View reviewed changes

misc/jobserver_pool.py Outdated Show resolved Hide resolved

misc/jobserver_pool.py Show resolved Hide resolved

src/jobserver.cc Outdated Show resolved Hide resolved

src/jobserver.h Outdated Show resolved Hide resolved

digit-google force-pushed the jobserver branch from 2e12658 to d80c054 Compare October 6, 2024 17:14

dhewg mentioned this pull request Oct 12, 2024

tools/ninja: update to 1.12.1 openwrt/openwrt#16424

Closed

sw reviewed Oct 12, 2024

View reviewed changes

src/jobserver.cc Outdated Show resolved Hide resolved

misc/jobserver_pool.py Outdated Show resolved Hide resolved

src/jobserver-posix.cc Outdated Show resolved Hide resolved

digit-google force-pushed the jobserver branch from d80c054 to be0ea05 Compare October 13, 2024 13:48

digit-google force-pushed the jobserver branch 2 times, most recently from 4c73cd6 to 146c55d Compare October 18, 2024 22:20

jhasse reviewed Oct 21, 2024

View reviewed changes

digit-google force-pushed the jobserver branch 2 times, most recently from d837d97 to be139bc Compare October 23, 2024 18:14

jhasse mentioned this pull request Oct 24, 2024

Add GNU make jobserver client support #1140

Closed

digit-google changed the title ~~Implement GNU Make jobserver protocol support in Ninja (client + server modes)~~ Implement GNU Make jobserver client protocol support in Ninja Nov 11, 2024

digit-google force-pushed the jobserver branch 3 times, most recently from 981bff4 to 3c347f5 Compare November 11, 2024 14:26

nanonyme mentioned this pull request Nov 13, 2024

feature: Support make jobserver protocol apache/buildstream#1968

Open

MuellerSeb mentioned this pull request Nov 14, 2024

Ninja 1.12.0 scikit-build/ninja-python-distributions#245

Open

jhasse added this to the 1.13.0 milestone Nov 15, 2024

orlitzky mentioned this pull request Nov 24, 2024

ninja uses more jobs than specified by makeflags sagemath/sage#38950

Open

digit-google force-pushed the jobserver branch 2 times, most recently from 6e1b038 to d18af81 Compare November 26, 2024 12:28

bonzini mentioned this pull request Dec 6, 2024

Ninja: default pool depth for link.exe mesonbuild/meson#13937

Open

orlitzky mentioned this pull request Dec 20, 2024

Feature request: GNU Make jobserver support rui314/mold#1381

Closed

digit-google force-pushed the jobserver branch from d18af81 to f32ae9c Compare January 10, 2025 12:20

jhasse reviewed Jan 10, 2025

View reviewed changes

digit-google added 3 commits January 13, 2025 11:06

digit-google force-pushed the jobserver branch from f32ae9c to ffac509 Compare January 13, 2025 10:07

mcprat reviewed Jan 25, 2025

View reviewed changes


		- Dry-run (i.e. `-n` or `--dry-run`) is not enabled.

		- `-j1` (no parallelism) is not used on the command line.

Implement GNU Make jobserver client protocol support in Ninja #2506

Are you sure you want to change the base?

Implement GNU Make jobserver client protocol support in Ninja #2506

Conversation

digit-google commented Oct 1, 2024 • edited Loading

kalvdans left a comment

Choose a reason for hiding this comment

Neustradamus commented Oct 12, 2024

nanonyme commented Oct 15, 2024

nekopsykose commented Oct 15, 2024

kaspar030 commented Oct 17, 2024

sw commented Oct 17, 2024 • edited Loading

robUx4 commented Oct 18, 2024

nanonyme commented Oct 18, 2024

digit-google commented Oct 18, 2024

kepstin commented Oct 21, 2024

jhasse commented Oct 21, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

digit-google commented Oct 23, 2024

digit-google commented Oct 23, 2024

eli-schwartz commented Oct 23, 2024 • edited Loading

jhasse commented Oct 24, 2024

digit-google commented Oct 24, 2024

htot commented Oct 25, 2024 • edited Loading

nanonyme commented Oct 25, 2024 • edited Loading

ArsenArsen commented Oct 27, 2024

digit-google commented Nov 11, 2024

digit-google commented Nov 11, 2024

nanonyme commented Nov 13, 2024

chriselrod commented Nov 14, 2024 • edited Loading

nanonyme commented Nov 14, 2024 • edited Loading

nanonyme commented Nov 19, 2024

jhasse commented Nov 23, 2024 • edited Loading

digit-google commented Jan 7, 2025

jhasse commented Jan 7, 2025

digit-google commented Jan 10, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mcprat left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eli-schwartz Jan 27, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

digit-google commented Oct 1, 2024 •

edited

Loading

sw commented Oct 17, 2024 •

edited

Loading

eli-schwartz commented Oct 23, 2024 •

edited

Loading

htot commented Oct 25, 2024 •

edited

Loading

nanonyme commented Oct 25, 2024 •

edited

Loading

chriselrod commented Nov 14, 2024 •

edited

Loading

nanonyme commented Nov 14, 2024 •

edited

Loading

jhasse commented Nov 23, 2024 •

edited

Loading

eli-schwartz Jan 27, 2025 •

edited

Loading