ideas for a gnumake jobserver version 2 #1
Comments
workaround: add proxy jobservers
child_2 acquires tokens through child_1 from make
we see the "actual fd pipes" in
|
see also https://github.com/ocaml/opam/wiki/Spec-for-GNU-make-jobserver-support
i still believe it's more elegant to solve this with a central TCP tokenpool server, as TCP works on all systems, and we don't need "a million fds" ("a pair of fds is required for each process") |
simple solution: |
long discussion of problems with the jobserver protocol |
We created our own build jobserver implementation at my day job, using Unix datagram sockets. Unfortunately Windows doesn't support those - that doesn't affect us (we don't compile on Windows), but it would be nice if the mechanism worked the same way on all platforms. Unfortunately using TCP connections is more complexity and overhead. And we didn't do it as just a "token-grabbing" mechanism. Instead, the client requests N slots, and the server responds with how many it can actually use. And the client's request has more info than just the N - it also specifies the type of job it is and the target name, so the jobserver can make better scheduling decisions, because in practice not all jobs are equal. And we specified how OOM works as well, so we can kill+delay jobs during OOM state without failing the overall build. I think that part is key, because memory is just as much a limited resource as CPU or I/O. BTW, we do also scan for dead processes that didn't release their jobs so we can reclaim their allocated resources, but to be honest I'm not sure it really matters... if a job failed in such a way that it didn't return the resources, then the overall build will fail anyway. |
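a minimal sketch of the "request N slots, get back how many you can use" idea from the comment above. the names `SlotRequest`, `grant` and `per_type_limit` are hypothetical, and the per-type cap is only one assumed way to use the job type for scheduling; the commenter's actual implementation is not shown in this thread.

```python
from dataclasses import dataclass


@dataclass
class SlotRequest:
    wanted: int    # N slots the client would like to use
    job_type: str  # e.g. "compile" or "link" (hypothetical categories)
    target: str    # target name, available for scheduling and logging


def grant(request, free_slots, per_type_limit):
    """Return how many slots the client may actually use (possibly 0)."""
    # assumed policy: some job types get a cap, e.g. memory-hungry link jobs
    cap = per_type_limit.get(request.job_type, request.wanted)
    return max(0, min(request.wanted, free_slots, cap))
```

for example, with 6 free slots and a cap of 2 for link jobs, `grant(SlotRequest(8, "link", "chrome"), 6, {"link": 2})` grants 2 slots, so the client can scale its own parallelism down instead of blocking on single tokens.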
in my case (chromium, qtwebengine), ninja tolerates crashed workers and only fails later, when output files are missing. hence ...
|
see also NixOS/nixpkgs#143820 |
gnumake jobserver version 1
https://www.gnu.org/software/make/manual/html_node/Job-Slots.html
https://www.gnu.org/software/make/manual/html_node/POSIX-Jobserver.html
https://www.gnu.org/software/make/manual/html_node/Windows-Jobserver.html
jobserver + jobclient implementations
gnumake
https://github.com/stefanb2/ninja/tree/topic-tokenpool-master
jobclient implementations
https://github.com/olsner/jobclient
https://github.com/milahu/gnumake-jobclient-js
https://github.com/milahu/gnumake-jobclient-py
history
the jobserver feature was added to gnumake in year 1999 in this commit
bug: tokens are lost when workers crash
when workers crash without proper cleanup code
they will not release their tokens
and other jobs can hang forever, waiting for tokens
writing reliable cleanup code is hard
it's easier to write a jobserver with active monitoring of child processes:
when a worker process crashes without releasing its tokens,
the jobserver notices the exit and treats the tokens as free again
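a minimal sketch of such active monitoring, assuming the jobserver itself spawns the workers. `MonitoredTokenPool` is a hypothetical name, and the sketch only watches direct children through their `subprocess.Popen` handles; watching a whole process tree would need something like psutil.

```python
import subprocess
import sys


class MonitoredTokenPool:
    """Token pool that remembers which worker holds which tokens (hypothetical design)."""

    def __init__(self, total):
        self.free = total
        self.held = {}  # pid -> (Popen handle, number of tokens held)

    def spawn(self, argv, tokens=1):
        """Start a worker if enough tokens are free; return the Popen or None."""
        self.reap()
        if tokens > self.free:
            return None
        proc = subprocess.Popen(argv)
        self.free -= tokens
        self.held[proc.pid] = (proc, tokens)
        return proc

    def reap(self):
        """Reclaim tokens from every worker that has exited, cleanly or not."""
        for pid, (proc, tokens) in list(self.held.items()):
            if proc.poll() is not None:  # exited, crashed, or was killed
                self.free += tokens
                del self.held[pid]


if __name__ == "__main__":
    pool = MonitoredTokenPool(total=2)
    # this worker dies abruptly and never runs any cleanup code
    crasher = pool.spawn([sys.executable, "-c", "import os; os._exit(1)"])
    crasher.wait()
    pool.reap()
    print(pool.free)  # the crashed worker's token is back in the pool
```

the worker never has to release anything: the pool owns the bookkeeping, so a crash cannot leak a token.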
limitation: child processes must inherit file descriptors
for example this fails with python's subprocess.Popen,
which by default closes all fds except stdin, stdout and stderr in the child (close_fds=True)
the only requirement should be:
inherit environment variables of the parent process
(env vars like MAKEFLAGS)
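a sketch of the workaround that version 1 currently forces on python callers (POSIX only): parse the fd pair out of MAKEFLAGS and hand it to the child explicitly via `pass_fds`. the helper names `jobserver_fds` and `spawn_with_jobserver` are hypothetical; the regex covers the `--jobserver-auth=R,W` form and the older `--jobserver-fds=R,W` form, and deliberately ignores the `fifo:PATH` form, which needs no fd inheritance.

```python
import os
import re
import subprocess
import sys


def jobserver_fds(makeflags):
    """Return (read_fd, write_fd) from MAKEFLAGS, or None if no fd pair is present."""
    m = re.search(r"--jobserver-(?:auth|fds)=(\d+),(\d+)", makeflags)
    return (int(m.group(1)), int(m.group(2))) if m else None


def spawn_with_jobserver(argv, env=None):
    """Start a child that can still reach the jobserver pipe."""
    env = dict(os.environ if env is None else env)
    fds = jobserver_fds(env.get("MAKEFLAGS", ""))
    # pass_fds keeps exactly these descriptors open in the child;
    # without it Popen closes everything except stdin, stdout and stderr
    return subprocess.Popen(argv, env=env, pass_fds=fds or ())
```

every wrapper between make and the real worker has to repeat this step, which is why a protocol that only depends on inherited environment variables would be simpler for clients.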
other job servers
aka job queues
https://github.com/gearman/gearmand
https://github.com/celery/celery
https://github.com/beanstalkd/beanstalkd
see also:
message passing
aka message queues
https://stackoverflow.com/questions/731233/activemq-or-rabbitmq-or-zeromq-or
prototype implementation
pymake: python implementation of gnu make
http://benjamin.smedbergs.us/pymake/
https://github.com/securesystemslab/pkru-safe-mozjs/tree/main/mozjs/build/pymake
https://github.com/linuxlizard/pymake
Parse GNU Makefiles with Python
psutil: get process tree
https://psutil.readthedocs.io/en/latest/#processes
https://github.com/milahu/nix-build-profiler (example use of psutil)
requirements
work on windows → no named pipes.
must support TCP (or a more abstract message-passing system like rabbitmq)
authentication. the jobserver can require workers to auth with a session cookie.
the session cookie is passed via env-var to the workers
sandboxing. work in limited environments, for example in a nix-build sandbox
active monitoring of the process tree.
libraries like psutil allow this at almost-zero cost
needed to monitor "bad" clients:
crashed clients who don't release their tokens
clients who ignore the jobserver and produce 100% cpu load
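a rough sketch of how the TCP and cookie requirements could fit together, not a finished design. the line protocol (`acquire`, `release`, `welcome`, `busy`, ...) and the names `TokenServer`, `TokenHandler` and `request` are all assumptions made for illustration. tokens are tied to the connection, so a client that crashes gives its tokens back when the OS closes its socket.

```python
import hmac
import secrets
import socket
import socketserver
import threading


class TokenHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # first line from the client must be the session cookie
        cookie = self.rfile.readline().strip()
        if not hmac.compare_digest(cookie, self.server.cookie.encode()):
            self.wfile.write(b"denied\n")
            return
        self.wfile.write(b"welcome\n")
        held = 0
        try:
            for line in self.rfile:  # ends when the client disconnects or dies
                cmd = line.strip()
                if cmd == b"acquire":
                    if self.server.tokens.acquire(blocking=False):
                        held += 1
                        self.wfile.write(b"ok\n")
                    else:
                        self.wfile.write(b"busy\n")
                elif cmd == b"release" and held:
                    held -= 1
                    self.server.tokens.release()
                    self.wfile.write(b"ok\n")
                else:
                    self.wfile.write(b"error\n")
        except OSError:
            pass  # connection reset: fall through to the cleanup below
        finally:
            # tokens still held by this connection go back to the pool
            for _ in range(held):
                self.server.tokens.release()


class TokenServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, addr, total):
        super().__init__(addr, TokenHandler)
        self.cookie = secrets.token_hex(16)  # would be passed to workers via an env var
        self.tokens = threading.Semaphore(total)


def request(addr, cookie, commands):
    """Demo client: authenticate, send commands, return the replies."""
    with socket.create_connection(addr) as sock, sock.makefile("rwb") as f:
        f.write(cookie.encode() + b"\n")
        f.flush()
        if f.readline().strip() != b"welcome":
            raise PermissionError("jobserver rejected the session cookie")
        replies = []
        for cmd in commands:
            f.write(cmd.encode() + b"\n")
            f.flush()
            replies.append(f.readline().strip().decode())
        return replies
```

a worker would read the server address and the cookie from environment variables set by the jobserver, so nothing but the environment has to be inherited. this does not cover clients who ignore the jobserver entirely; that still needs process-tree monitoring.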
targets
build systems:
gnumake, cmake, ninja, cargo, meson, bazel ...
related: Build Systems à la Carte