Add GNU make jobserver client support #1139
- add new TokenPool interface
- GNU make implementation for TokenPool parses and verifies the magic information from the MAKEFLAGS environment variable
- RealCommandRunner tries to acquire TokenPool
  * if no token pool is available then there is no change in behaviour
- When a token pool is available then RealCommandRunner behaviour changes as follows
  * CanRunMore() only returns true if TokenPool::Acquire() returns true
  * StartCommand() calls TokenPool::Reserve()
  * WaitForCommand() calls TokenPool::Release()

Documentation for GNU make jobserver: http://make.mad-scientist.net/papers/jobserver-implementation/

Fixes ninja-build#1139
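For readers unfamiliar with the proposal, here is a minimal sketch of what a TokenPool interface along the lines described above might look like. The method names Acquire()/Reserve()/Release() come from the commit message; the Setup() factory and everything else are illustrative assumptions, not the PR's actual code.

```cpp
// Hedged sketch of a TokenPool interface matching the description above.
// Acquire()/Reserve()/Release() are named in the commit message; the
// Setup() factory is an assumption for illustration only.
struct TokenPool {
  virtual ~TokenPool() {}

  // True if a token is available so another command may be started.
  virtual bool Acquire() = 0;

  // Marks the token obtained by the last successful Acquire() as in use.
  virtual void Reserve() = 0;

  // Returns a token to the jobserver once a command has completed.
  virtual void Release() = 0;

  // Hypothetical factory: inspects MAKEFLAGS and returns a GNU make
  // implementation when valid jobserver information is found, or a null
  // pointer so RealCommandRunner keeps its unchanged behaviour.
  static TokenPool* Setup();
};
```

With such an interface, CanRunMore() would only return true when Acquire() succeeds, StartCommand() would call Reserve(), and WaitForCommand() would call Release(), exactly as listed above.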
I have tested this implementation over the last few weeks in two different recursive GNU make based build systems that originally had M+1 GNU make instances:
FYI: google/kati was used to convert existing single-makefile GNU make parts to Ninja build files.
Thanks for the patch! We've discussed this on the mailing list a few times (e.g. here https://groups.google.com/forum/#!searchin/ninja-build/jobserver/ninja-build/PUlsr7-jpI0/Ga19TOg1c14J). Ninja works best if it knows about the whole build. Now that kati exists, one can convert those to ninja files and munge them up to have a single build manifest (that's Android's transition strategy from Make to Ninja -- they use kati to get everything converted to Ninja files, and then they're incrementally converting directories to use something-not-make -- and then kati produces parts of their Ninja files and the new thing produces parts of the ninja files.) Is your use case that you have recursive makefiles?
I could have guessed that this has been discussed before, because I'm surely not the first person facing such a situation. Here are my reasons for requesting this:
IMHO my patch provides a good solution, considering
wow +1
Another possible reason for having jobserver support in ninja seems to be LTO support in gcc: -flto=jobserver tells gcc to use GNU make's jobserver mode to determine the number of parallel jobs. The alternative is to spawn a fixed number of jobs with, e.g., -flto=16.
I would like to have this feature merged too; I simply cannot convert all projects to ninja-build because I'm not allowed to do that. @stefanb2 Thanks a lot for your work
Can I just add my voice to the list of people who would like this to be merged? At my company we also use a nested build system, and this patch makes ninja behave very nicely indeed. We're not in a position to make ninja build everything yet.
Please note that from a quick glance at the commit on @stefanb2's branch, I expect it doesn't work on Windows, where Make uses a different setup. |
@glandium correct: in the Windows build a no-op token pool implementation is included. But I fail to see why this would be a relevant reason for rejecting this pull request. That said, I'm pretty sure that it would be possible to provide an update that implements the token protocol used by Windows GNU make 4.x. Probably
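For what it's worth, here is a rough sketch of what such a Windows client could look like, assuming GNU Make 4.x on Windows advertises its jobserver as a named Win32 semaphore in MAKEFLAGS. The class and method names are illustrative assumptions, not code proposed in any PR, and flag parsing and error handling are omitted.

```cpp
// Sketch only: assumes the Windows jobserver is a named semaphore whose name
// Make passes down via MAKEFLAGS. Not the actual implementation from the PR.
#include <windows.h>

class WinJobserverClient {
 public:
  // Opens the semaphore named by Make; returns false if it is unavailable.
  bool Init(const char* sem_name) {
    sem_ = OpenSemaphoreA(SEMAPHORE_ALL_ACCESS, FALSE, sem_name);
    return sem_ != NULL;
  }

  // Tries to take one job token without blocking.
  bool Acquire() {
    return WaitForSingleObject(sem_, 0) == WAIT_OBJECT_0;
  }

  // Returns one job token to the pool.
  void Release() {
    ReleaseSemaphore(sem_, 1, NULL);
  }

 private:
  HANDLE sem_ = NULL;
};
```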
This would be really useful too when invoking ninja as part of another build tool, such as cargo.
This should be very useful for super-project builds. In our large code base, due to different compiler/environment configurations, we cannot include all projects in one single ninja build, so we have 1 top-level and N sub-projects built by ninja; this configuration triggers the Y*N problem.
+1 - this is highly interesting for parallel builds with catkin_tools. Note that in the catkin_tools scenario, it is not easy to merge the individual build.ninja files into a hierarchy of subninja files, because
@nico I would like to add my voice to having GNU make jobserver support in ninja. Meta-buildsystems like OpenEmbedded (Yocto), OpenWrt, Buildroot and a lot of others are affected. Such build systems will typically have this sequence per package they build:

And they will repeat that sequence for each and every package that is needed to build the final system image.

Once all packages have been built and installed in the staging location, a system image is assembled.

Now, that was the quick overview. Since a system can be made of a lot of packages, we want to build as many packages in parallel as possible. So, if we have an 8-core machine, we would want to build up to 8 jobs in parallel, which means

For example, if 8 ninja-based packages are built in parallel and they do not share a job-server, the build host ends up heavily overloaded.

And as has already been explained in previous posts in this thread, not every package is based on ninja.

Thanks for reading so far! :-)
This reverts commit 0e6689d. Parallel builds are broken due to a mix of Make/Ninja and the job server not being operational. See ninja-build/ninja#1139 Signed-off-by: Anas Nashif <anas.nashif@intel.com>
+1. We also face this issue of Y*N ninjas while using CMake.
You can say the same thing about all the existing functionality ninja has. My opinion is that it isn't fair or reasonable to ask this PR to be a special exception, but it would be fair iff someone wrote an end-to-end testing suite, then asked the jobserver PR to include jobserver coverage in it.
Frankly, that PR would be fine to me, even without a full regression test suite, if it didn't spread tricky signal-handling code in what look like unrelated parts of the source tree. This is a hackish design that is bound to be a maintenance nightmare for anyone that accepts it into their git repository. I assume that's why @jhasse, who has very very limited bandwidth to maintain Ninja, has not felt confident in accepting it. And for full disclosure, I am not an official Ninja maintainer in any way, but I maintain my own Ninja fork for the Fuchsia project in order to support a number of important additional features. While I do plan to implement jobserver support there too, it will not be based on this PR for exactly this reason.
The short answer: I'm not waiting for anything. The long answer: This contribution is a side product of migrating the internal code base at my former workplace to Android N. The Android N build system introduced the kati-ninja combo, which had severe negative impacts on build performance. These were not acceptable for the company, so I looked into adding jobserver client support to ninja. This turned out to be rather simple, and the build performance problem was solved. As the resulting changes had already been paid for, I requested permission to contribute them upstream. IMHO there is nothing for me to do. Either
Kitware (CMake's authors) also maintain https://github.com/Kitware/ninja, which is a fork/build with this PR, and the ninja you can install from PyPI (https://pypi.org/project/ninja/) is actually this fork.
Ditto. We have been using our fork as both (1) a staging area for features in review and (2) the version built and distributed on PyPI. For context, the distribution of both
Ninja has a PR for adding make jobserver support [1] that has been a widely debated PR for many... many years. Given that many people have forked to incorporate this PR, and it claims to solve a problem we have (OOM on gcc processes), it seems like it would be worthwhile using a well maintained fork instead of the main project. This is not a one way door. If we find that the project goes unmaintained, doesn't build, or otherwise has problems, we can always go back to using mainline. Of the forks that have pulled this in, there are:

- The Fuchsia project [2]. Their targets seem more specific and less generic, although their improvements seem more extensive.
- Kitware [3], which maintains a fork of ninja.
- Docker [4]

[1] ninja-build/ninja#1139
[2] https://fuchsia.googlesource.com/third_party/github.com/ninja-build/ninja/+/refs/heads/main/README.fuchsia
[3] https://github.com/Kitware/ninja
[4] https://github.com/dockbuild/ninja-jobserver

'''
EXTRA_OEMESON_COMPILE:append = " \
    --ninja-args='--tokenpool-master=fifo' \
"
PARALLEL_MAKE = "-j 20"
BB_NUMBER_THREADS = "20"
'''

Signed-off-by: Ed Tanous <ed@tanous.net>
What's the current status on this? I'm interested in it from the meta build system perspective, where many different projects written in different languages and using different build systems are all compiled in a coordinated manner. Without make jobserver client support in ninja, meta build systems are forced to make one of the following terrible trade-offs:
If Ninja and other build systems supported the jobserver protocol, there would be another option:
To my knowledge, Ninja is the only real hold-out to make this a practical possibility. GNU Make and Rust's Cargo already support being jobserver clients.

The current status is that after @stefanb2's PR died the death of eternally pending review, @hundeboll reimplemented it two weeks ago in #2450 and it has been approved and scheduled for inclusion in ninja 1.13.0 (but the merge button hasn't been hit). No jobserver master support, only client support, but this is probably not a worry for you. It would be nice if the new PR had linked to the issue as well, but it is what it is.
Hi all, I opened PR #2474 as a heavy rework of #2450 by @hundeboll, making us co-authors. This is an attempt to expedite the process of achieving a simple, efficient, style-compliant, specification-precise, and therefore acceptable implementation of a client for the GNU jobserver, in order to finally have solid upstream support for it. In the PR, support for Windows is excluded, and preprocessor macros are in place to handle that exclusion and prepare spots where there would be Windows-specific declarations if or when Windows support is added later. There are multiple reasons for this, including the fact that I'm unable to build and test on Windows, and wanting a shorter PR that's easier to review. I'm also aware that unit tests are currently missing. Also, my version of the implementation supports older versions of Make like 4.3 and 4.2, including the simple (anonymous) pipe method. After reading through this entire thread I have noticed a lot of confusion regarding that original method, including some false claims. However, I do understand that there was a reason to add the fifo method, even though it may be a minor or insignificant difference for most use cases. The code that allows support for previous versions is a very small diff compared to a version with only support for the fifo method, and I made sure to write the client so that parsing the newer method is attempted first. I have tested mostly my own use case, which does not exercise environment/FD propagation to a subprocess of a subprocess, but I did at least one test that did so, with a Makefile that calls ninja on a build.ninja that calls ninja again to run a different build.ninja. There are many other small changes that I outlined in the first post of the PR that deserve attention, so adding support for older versions of Make is NOT the only goal here. I have spent the last week or so self-reviewing and fixing minor issues found by the CI, so now would be a good time for review. Thanks in advance!
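To make the two negotiation styles concrete, here is a rough sketch (illustrative only, not code from #2450 or #2474) of how a client might probe MAKEFLAGS, trying the newer fifo form first and falling back to the older file-descriptor pair used by Make 4.2/4.3. The struct and function names are assumptions for illustration.

```cpp
// Illustrative sketch: parse jobserver information from MAKEFLAGS, preferring
// the newer fifo:<path> form (GNU Make 4.4+) over the older "R,W" pipe pair.
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

struct JobserverConfig {
  std::string fifo_path;  // non-empty if the fifo method was found
  int read_fd = -1;       // valid if the pipe method was found
  int write_fd = -1;
};

static bool ParseMakeflags(JobserverConfig* out) {
  const char* makeflags = getenv("MAKEFLAGS");
  if (!makeflags)
    return false;

  // Newer method: --jobserver-auth=fifo:/path/to/fifo
  const char* fifo = strstr(makeflags, "--jobserver-auth=fifo:");
  if (fifo) {
    fifo += strlen("--jobserver-auth=fifo:");
    out->fifo_path.assign(fifo, strcspn(fifo, " "));
    return true;
  }

  // Older method: --jobserver-auth=R,W (or --jobserver-fds=R,W), a pair of
  // inherited pipe file descriptors.
  const char* fds = strstr(makeflags, "--jobserver-auth=");
  if (!fds)
    fds = strstr(makeflags, "--jobserver-fds=");
  if (fds && sscanf(strchr(fds, '=') + 1, "%d,%d",
                    &out->read_fd, &out->write_fd) == 2)
    return true;

  return false;
}
```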
I just uploaded PR #2506 as a complete rework of the previous PRs. This was motivated by the fact that the latest version of #2474 degraded performance for my own builds significantly. With my version, some of my builds go from 5m10s to 4m40s, and some others from 22m49s to 11m56s. My PR includes unit tests and even a small regression test suite. However, I think it would be important to get a reference real-world multi-build that could be used to verify that whatever implementation ends up in the Ninja source tree works correctly, while still being buildable in only a few minutes on a low-end or moderate developer machine (let's go for 8 cores and 16 GiB of RAM). Any suggestions for such a multi-build? Any recursive Ninja invocations (either from Meson or CMake) would fit the bill, I think. Or maybe something that uses Make at the top level, but then builds multiple things with Ninja.
so this "degraded performance"... is it just that I came up with a scaling method for capacity instead of "infinite"? then the performance would be the same if I put it back to infinite perhaps? maybe I should have mentioned my system has only 4 cores the jobserver protocol is supposed to be simple but what I'm seeing instead is as verbose and unreadable as you can possibly make it, pulling out all the tricks you have ever learned, for what benefit honestly? don't get me wrong, I'm sure it works great, it just looks vastly over-engineered at first glance |
This project is a collection of inner projects driven by an outer project: https://gitlab.kitware.com/paraview/common-superbuild

You'll want to use the
That's what I tried initially with your PR, but the resulting binary didn't provide any improvements to my build, except very reduced volatility. Here are the corresponding raw numbers I had at the time for a tiny Fuchsia build, in case you are interested:
But note that with your latest commit, things are far worse, unfortunately. You didn't even explain why you decided to change the capacity logic, nor provide any meaningful numbers for its usefulness. And again, you didn't provide any tests verifying the basic correctness of your code. I know you have worked considerably on this, and addressed a lot of review comments, but you didn't address all the issues. The sad truth is that this was still not good enough, even performance-wise, at least for my use case. Hence the motivation to start something else.
That's just your opinion. At this point, I am more interested in finding a common ground that we can all use to verify that whatever lands in the final Ninja tree works correctly for everyone. Just because PR #2506 works for me doesn't mean it would work for you or other people, hence why it is so important to get good common regression tests here. And fix whatever needs to be fixed otherwise.
Ah ok... so there is much more going on here...
I was following the initial complaint or concern about the value being "infinite", so I tried to determine a working finite value (or at least I thought it was working) that coincides with the whole reason the "capacity" value was set up in the first place in 8a2575e: to gently reach maximum parallelism without an instantaneous load which can shock the system. What you're calling "volatility" in this exact case may be on purpose... or am I misunderstanding something? The value I put is supposed to represent a capacity of

So do you know which optimization is improving the performance, and where? I would assume it has something to do with how the "acquire" and "release" functions handle the data. I would like to learn something and be corrected and informed instead of just bypassed. Perhaps extreme optimizations are necessary in one or a few spots, but it would be nice if we could determine where it actually matters so other spots in the code can return to simplicity. I understand that in the world of optimization, speed requires a sacrifice of otherwise saved code size, but it just seems a bit overboard.
It would be nice if someone could have let me know.

I was waiting for a sign that anything I was doing had even a chance of being accepted, something like "ok, everything looks good, now we need ____", before I spend significantly more time on the finishing requirements other than what downstream projects need, including documentation. The age of this thread was kept in mind...

Without understanding the difference or having explanations to consider, opinions are all that remain.

As was I, and still am. Looking forward to trying it and working with you, despite mixed feelings so far.
After experimenting today, I have good and bad news :-)

The good news is that, thanks to @mathstuf, I have been able to experiment with https://gitlab.kitware.com/paraview/common-superbuild which shows a deterministic though modest improvement when

Here are my reproduction steps, where I am using the

# Clone superbuild repository.
git clone https://gitlab.kitware.com/paraview/common-superbuild.git
# Configure a self-test build with a few projects with CMake in the `_build` directory.
cd common-superbuild/selftest
cmake -B _build -S. \
-DENABLE_zlib=1 \
-DENABLE_zstd=1 \
-DENABLE_szip=1 \
-DENABLE_snappy=1 \
-DENABLE_ninja=1 \
-DENABLE_jsoncpp=1 \
-DENABLE_freetype=1 \
-DENABLE_imath=1 \
-DENABLE_bzip2=1 \
-DENABLE_lz4=1 \
-GNinja
# Run Ninja to download and extract all project sources (this can be *long* and not parallel)
/usr/bin/time ninja -C _build download-all
# There is no configure-all target, so let's do it manually.
# This is also *long* and not parallel.
# NOTE: Adding certain projects (e.g. `llvm` or `meson`) requires building + installing
# python3, as well as all its dependencies, during their configure step. This skews benchmarked
# numbers tremendously, so avoid them if possible, or just disable the following two
# lines (benchmarks will include the times for the long non-parallel `configure` steps though :-()
CONFIGURE_TARGETS=$(ninja -C _build -t inputs all | grep -e '-configure$')
/usr/bin/time ninja -C _build ${CONFIGURE_TARGETS}
# Rename _build to _build0, so we can reuse its content in multiple builds.
# Unfortunately, `ninja -C _build clean` or `ninja -C _build -t clean` do not clean properly.
mv _build _build0
# Now benchmark Ninja without and with --jobserver
# The preparation step restores the _build directory to its pristine configured state.
hyperfine --prepare "rm -rf _build && cp -rp _build0 _build" \
"ninja -C _build" \
"ninja -C _build --jobserver" At first, naively running this on my powerful work machine doesn't show any significant difference:
But this is not surprising. This build is simply not very parallel and can use all the CPU cores it needs on this machine, with or without a jobserver. Hence the same overall build times.

The interesting part is using Linux cgroups to restrict the number of CPU cores available to the commands, and here we see a difference. I am on a Debian system and using

When using 8 cores only (e.g.

When using 4 cores only (e.g.

Here are the benchmark results for the 8-core case:
And for the 4-core case:
That's it for the good news.
Notice however that the volatility of the jobserver builds in the previous section is high, though I can reproduce the same mean over several hyperfine runs locally :-/

Now, I also wanted to get a better understanding of what's going on, and also try to get something that is more parallel, so I ended up creating my own tiny super-build system. Its main benefit is the ability to generate a

You can find it at https://github.com/digit-google/ninja-recursive-build, the

Now for the bad news: I can see a jobserver-related reproducible performance regression with this new build system. As before, running with all cores shows no significant difference (see below).
While those numbers are interesting, IMHO your test case does not really represent why people need jobserver support in ninja. There are multi-level build systems out there that blindly translate "component sub-build invocation" into a new independent parallel build, e.g.

The final goal is that the top-level build tool provides a job server with T tokens and any build tool instance, including the top-level instance, adheres to the limit, i.e. in total there are never more than T build steps running in parallel (not counting the build tool instances themselves, because each of them should be waiting for their children to complete).
I think I have an explanation for this phenomenon though. As far as I know, none of the proposed implementations changes the fact that Ninja only decides to launch new commands when at least one of them has completed. In other words, if a job slot becomes available in the pool, Ninja will not see it until one of its currently running subprocesses ends, even if it has capacity to launch more of them. (This is not how GNU Make works, by the way; it is capable of waiting for both process completion and a new token in the FIFO at the same time, at least on Posix, not sure about Windows; see the sketch after this example.) Here's an example to illustrate this. Let's imagine a top-level build that invokes two sub-builds, each of which has two parallel tasks, but one of them has short ones, and the other has long ones. If there are enough cores for everyone, everything can be dispatched in parallel, which would look like:
Which can be summarized as:
Now let's imagine that there are only 2 cores available on the machine this runs on. Without a jobserver, 4 processes are started at the same time, and the OS will control parallelism by preempting them as it sees fit. Assuming these are purely CPU-bound tasks, the most optimal scheduling would end up with a build time that is similar to this timeline:
Now, if a jobserver is used to synchronize the parallelism, it is possible to end up in a situation where each sub-build only gets one token initially:
The first sub-build completes, and reuses its token to finish its second task:
At the end of
And a long build time is the result, due to poor scheduling.
This would end up with a better build time, even though not the optimal one that can be achieved without a jobserver.
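To make the contrast with GNU Make concrete, here is a simplified, hypothetical sketch (not code from any of the PRs discussed here) of a scheduler step that waits on the jobserver FIFO and on running subprocesses at the same time, so a token released by a sibling build can wake the scheduler even before one of its own commands finishes. The function name and structure are illustrative assumptions.

```cpp
// Simplified illustration only; Ninja's real SubprocessSet logic is more
// involved. The idea: include the jobserver FIFO fd in the same poll() set as
// the subprocess pipes, so a freed token can trigger new work immediately.
#include <poll.h>
#include <vector>

// Blocks until either a subprocess fd is readable (a command finished or
// produced output) or the jobserver FIFO has a token available to read.
// Returns true if a token became readable.
bool WaitForTokenOrCommand(int jobserver_fd,
                           const std::vector<int>& subprocess_fds) {
  std::vector<pollfd> fds;
  fds.push_back(pollfd{jobserver_fd, POLLIN, 0});
  for (int fd : subprocess_fds)
    fds.push_back(pollfd{fd, POLLIN, 0});

  if (poll(fds.data(), fds.size(), -1) <= 0)
    return false;  // error (e.g. interrupted by a signal)

  // Index 0 is the jobserver FIFO: a readable byte means a token is free.
  return (fds[0].revents & POLLIN) != 0;
}
```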
Some feedback, since I could roll this feature into the Ninja binary used by the Fuchsia build and changed our builder configurations to use it. The Fuchsia CI bots build around 90 different build configurations of the project; only a few of them actually launch Ninja sub-builds from the top-level one (to generate SDKs that require binaries from different CPU architectures and API levels, then merge the results into a final SDK archive, in case you're curious :-)). I could enable the jobserver feature for all build configurations and perform clean build time comparisons:
These confirm the savings I could observe locally on my own workstation (these are builds running on CI bots of various CPU/RAM dimensions, with remote builds enabled to boot, so we run everything with

The jobserver feature is now enabled by default on all Fuchsia SDK build configurations. I addressed the few review comments on PR #2506 that were sent, and would welcome any more feedback. I would be grateful if anyone could use the latest binaries generated by the GitHub CI and see if this improves their own multi-build times, or report issues they are facing.

Regarding the sub-optimal scheduling described in my previous comment: solving this will require changing the way
I've been using it for the last week or so. The only quirk was the interaction between

Yeah, I'd hate to see it delayed further, especially given that nobody so far (hopefully more can test) has raised issues with the changes. It also means that we're not fighting for interest/attention on a topic that deserves its own space, I think?
We don't use the '+' in front because it doesn't use the jobserver from our make call, because of ninja: ninja-build/ninja#1139 (cherry picked from commit 0b2008e)

edited:
- DCMAKE_VERBOSE_MAKEFILE is placed differently on 3.0
- 3.0 doesn't build libplacebo with meson
- 3.0 doesn't have basu
- 3.0 doesn't have librist
- 3.0 doesn't have medialibrary
As long as ninja is the only build execution tool, the current
ninja -jN
implementation works fine. But when you try to convert parts of an existing recursive GNU make based SW build system to ninja, then you have the following situation:
Simply calling `ninja -jY` isn't enough, because then the ninja instances will try to run Y*N jobs, plus the X jobs from the GNU make instances, causing the build host to overload. Relying on -lZ to fix this issue is sub-optimal, because load average is sometimes too slow to reflect the actual situation on the build host.
It would be nice if GNU make jobserver client support could be added to Ninja. Then the N ninja instances would cooperate with the M GNU make instances and on the build host only X jobs would be executed at one time.