Parallel make doesn't work on a clean checkout #935

Closed
osresearch opened this issue Dec 16, 2020 · 8 comments · May be fixed by #984

Comments

@osresearch
Collaborator

There are some components that do not correctly depend on the cross compiler, so a make for the clean checkout fails. Easy fixes are the various tools in $(COREBOOT_UTIL_DIR) like cbmem, superiotool and inteltool, which need to wait until the cross compiler is available.

@bwachter

bwachter commented Jan 4, 2021

I'm currently running into this trying to build one of the x230 pull requests. -j isn't propagated to the subdirectories, and setting MAKE_JOBS breaks bootstrapping of the toolchain. Without anything set everything runs as -j1, though, which takes ages.

At least some of that seems to have been introduced in the past 6 months. I remember having some initial issues getting parallel builds to work back in summer, but once I had that working a full build took about 5 minutes (including bootstrapping the toolchain). So far I've been waiting for about an hour and still haven't gotten past the toolchain bootstrapping.

@tlaurion
Collaborator

tlaurion commented Jan 5, 2021

@bwachter documentation has been updated here, rendered here, to state that CPUS=YY should be specified on the make BOARD=XXX Heads board build statement.

I'm still a bit confused here, since the main Heads Makefile takes the output of nproc to populate the CPUS variable when it is not specified on the initial make call, and then passes it along.

That hack was implemented because some modules don't tolerate having -j forced on them, while others play along. The idea was to pass CPUS down to the other makefiles (modules/*) where they play fair, and to go single threaded for modules that don't play well.
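The scheme described above can be sketched in shell. This is only an illustration under assumptions, not the Heads Makefile itself: the helper name `jobs_for_module` is made up, and which modules count as "not playing well" (here `newt`) versus "playing fair" (here `coreboot`) is assumed purely for the example.

```shell
#!/bin/sh
# Default CPUS to the processor count unless the caller already set it,
# mirroring the "use nproc if not specified" behaviour described above.
CPUS="${CPUS:-$(nproc 2>/dev/null || echo 1)}"

# Hypothetical helper: print the job count to use for one module.
# The module names below are assumptions chosen for illustration.
jobs_for_module() {
    case "$1" in
        newt) echo 1 ;;        # assumed not parallel-safe: single threaded
        *)    echo "$CPUS" ;;  # assumed parallel-safe: pass CPUS down
    esac
}

# Show the make invocation each module would get (nothing is built here).
for module in coreboot newt; do
    echo "$module: make -j$(jobs_for_module "$module")"
done
```

Note that this only controls parallelism inside each module; it says nothing about building several modules at the same time.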

@osresearch?

@osresearch
Collaborator Author

The issue with CPUS=xxx on the make command line is that it does not spawn multiple top-level jobs. So while the individual components might be built in parallel, there is no parallelism across the modules. On a rebuild after a make real.clean (so the cross compilers are intact) only one module is built at a time. On my build machine this makes the difference between a 90 second rebuild and a much longer process.

make real.clean && time make V=1 CPUS=128
...
real	30m2.657s
user	25m18.422s
sys	7m0.333s

versus

make real.clean && time make V=1 -j128
...
real	1m27.378s
user	30m26.307s
sys	6m12.245s
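One rough way to read these two timings is the ratio (user + sys) / real, which approximates how many cores were kept busy on average. The sketch below computes it from the figures above; the helper names `to_seconds` and `busy_cores` are made up for this example, and the ratio is only an approximation of utilisation.

```shell
#!/bin/sh
# Convert a `time` value such as "30m2.657s" into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Rough number of cores kept busy: (user + sys) / real.
# Arguments: real, user, sys (each in the "XmY.ZZZs" format printed by time).
busy_cores() {
    real=$(to_seconds "$1")
    user=$(to_seconds "$2")
    sys=$(to_seconds "$3")
    awk -v r="$real" -v u="$user" -v s="$sys" \
        'BEGIN { printf "%.1f\n", (u + s) / r }'
}

echo "CPUS=128: about $(busy_cores 30m2.657s 25m18.422s 7m0.333s) cores busy"
echo "-j128:    about $(busy_cores 1m27.378s 30m26.307s 6m12.245s) cores busy"
```

With these numbers the CPUS=128 run works out to roughly one busy core, while the -j128 run keeps about 25 busy, which matches the observation that only one module is built at a time without top-level jobs.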

@tlaurion
Collaborator

@osresearch The problem with -j24 is that there is no locking between interdependent tasks.

Here, I launched the same build on CircleCI with CPUS=24 (functional) and with -j24 (failing badly with missing files).

The line rm -rf build/x230-hotp-maximized/* build/log/* && make -j24 V=1 BOARD=x230-hotp-maximized || touch /tmp/failed_build in CircleCI lets all the logs that were created be output in the next CI task; they would otherwise be unreadable, since -j24 ruins the output.

So the next task picks up the created logs and outputs them on CI with delimiters: if [[ -f /tmp/failed_build ]]; then find ./build/ -name "*.log" -type f -mmin -1|while read log; do echo ""; echo '==>' "$log" '<=='; echo ""; cat $log;done; exit 1;else echo "Not failing. Continuing..."; fi

The resulting logs are concatenated here with ==> and <== separators.
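For readability, the same two-step idea (record the failure, then dump the recent logs with separators) can be written out as functions. This is a sketch and not the actual CircleCI configuration: the function names `dump_recent_logs` and `report_build` are hypothetical, and it uses `read -r` and a quoted `"$log"` so that paths with spaces or backslashes are handled.

```shell
#!/bin/sh
# Print every *.log file modified within the last minute under a directory,
# each preceded by a "==> path <==" banner.
dump_recent_logs() {
    dir="$1"
    find "$dir" -name '*.log' -type f -mmin -1 | while IFS= read -r log; do
        echo ""
        echo "==> $log <=="
        echo ""
        cat "$log"
    done
}

# Hypothetical CI step: if the failure marker exists, show the logs and
# return non-zero so the job fails only after the logs were printed.
report_build() {
    marker="$1"
    dir="$2"
    if [ -f "$marker" ]; then
        dump_recent_logs "$dir"
        return 1
    fi
    echo "Not failing. Continuing..."
}
```

In a CI step this could be called as `report_build /tmp/failed_build ./build`, using the marker file and build directory from the commands quoted above.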

@tlaurion
Collaborator

@osresearch: I'm redoing a build without decompressing your cache file. The build failed on CircleCI for your latest commit because the host binaries referenced by the fed config.status (sed, gawk and others) were not found at the provided paths.

A CI build is happening here

@tlaurion
Collaborator

tlaurion commented Nov 30, 2021

@osresearch it seems the culprit was fixed by commenting out MAKE_JOBS in the global Makefile on PR #984 (and deleting the configure cache, which doesn't work under CircleCI).

Then a weird race condition happens only in the newt module build, which can be worked around by forcing that module to build with a single job. I will open a replacement PR once the workarounds for the new CircleCI limitations pass.

@tlaurion
Collaborator

tlaurion commented Dec 1, 2021

@osresearch @bwachter: I partly fixed this issue in commit 5e4309c, where MAKE_JOBS was commented out in the main Makefile.
Improvements welcome.

tlaurion added a commit to tlaurion/heads that referenced this issue Dec 1, 2021
…impact on the now working build system prior of mergin linuxboot#1035 for tracibility of linuxboot#935 and linuxboot#984
@tlaurion
Collaborator

tlaurion commented Feb 5, 2022

Tag me to reopen. CircleCI now builds in parallel with 36 cores in under 45 minutes on a clean checkout.

@tlaurion tlaurion closed this as completed Feb 5, 2022