Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bazel: the captured cmake dependency does not build efficiently #65894

Closed
knz opened this issue May 31, 2021 · 11 comments · Fixed by #71735
Closed

bazel: the captured cmake dependency does not build efficiently #65894

knz opened this issue May 31, 2021 · 11 comments · Fixed by #71735
Labels
A-build-system C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-productivity Severe issues that impede the productivity of CockroachDB developers. T-dev-inf

Comments

@knz
Copy link
Contributor

knz commented May 31, 2021

Describe the problem

When running bazel for the first time after a c++ compiler update, or a version change etc, it needs to re-build its embedded (captured) copy of cmake again.

However, the build process for cmake is veeeerrrryyyyyyy looooooonnnnnngggggg
(upwards of 10 minutes)

The main problem is that the build flags do not include -j or something similar to parallelize the build, so we're seeing hundreds of c++ files compiled one after another.

To Reproduce

Build from scratch without a cached pre-built cmake.

Expected behavior

The cmake builds uses all 32 cores on my computer.

Additional data / screenshots

htop output:
image

Epic CRDB-8036

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-productivity Severe issues that impede the productivity of CockroachDB developers. labels May 31, 2021
@rickystewart
Copy link
Collaborator

The behavior is actually "correct" inasmuch as we should only be using multiple CPU's if we can have Bazel parallelize the workload, which it can't meaningfully if sub-tasks are arbitrarily using as many cores as they want under the scenes.

In terms of making this faster I'm more interested in patching rules_foreign_cc to download a pre-bootstrapped cmake from an appropriate place rather than building from scratch.

@knz
Copy link
Contributor Author

knz commented Jun 2, 2021

it can't meaningfully if sub-tasks are arbitrarily using as many cores as they want under the scenes.

Ok but how do you explain that the cmake build inside the bazel sandbox does not parallelize, whereas it does when run outside of the sandbox?

I'm more interested in patching rules_foreign_cc to download a pre-bootstrapped cmake from an appropriate place rather than building from scratch.

That would be lovely.

@rickystewart
Copy link
Collaborator

it can't meaningfully if sub-tasks are arbitrarily using as many cores as they want under the scenes.

Ok but how do you explain that the cmake build inside the bazel sandbox does not parallelize, whereas it does when run outside of the sandbox?

Not really sure what this question means. Doesn't your own observation answer this question?

The main problem is that the build flags do not include -j or something similar to parallelize the build, so we're seeing hundreds of c++ files compiled one after another.

@rickystewart
Copy link
Collaborator

bazel-contrib/rules_foreign_cc#329 for reference on the overarching issue here.

@knz
Copy link
Contributor Author

knz commented Jun 5, 2021

ok so the root cause is that bazel does not implement the job pool mechanism that GNU make supports (and I think cmake too). That's pretty unfortunate.

In any case, the issue at the top remains. Building cmake costs 5-10 minutes and is currently responsible for 60% of the run time of the "Bazel CI" target in TC.

@irfansharif
Copy link
Contributor

I'm seeing this too for local builds. The majority of the time spent in a bazel build is actually spent building cmake from scratch, which is surprising.

[2,367 / 3,189] BootstrapGNUMake external/rules_foreign_cc/toolchains/make [for host]; 379s darwin-sandbox ... (16 actions running)

@knz
Copy link
Contributor Author

knz commented Jul 13, 2021

Is there a way to tell Bazel to use the host cmake instead of baking a new one? At least for local builds?

@rickystewart
Copy link
Collaborator

rickystewart commented Jul 14, 2021

Yeah, this should do it:

diff --git a/WORKSPACE b/WORKSPACE
index 4066d7f09d..a232d4355d 100644
--- a/WORKSPACE
+++ b/WORKSPACE
@@ -172,7 +172,11 @@ git_repository(
 
 load("@rules_foreign_cc//foreign_cc:repositories.bzl", "rules_foreign_cc_dependencies")
 
-rules_foreign_cc_dependencies()
+rules_foreign_cc_dependencies(
+    register_built_tools = False,
+    register_default_tools = False,
+    register_preinstalled_tools = True,
+)
 
 # Load custom toolchains.
 load("//build/toolchains:REPOSITORIES.bzl", "toolchain_dependencies")

Might be a way to do it with command-line options instead of updating the workspace, not sure what that is yet though.

@knz
Copy link
Contributor Author

knz commented Jul 14, 2021

for now, I'll take that as a workaround. thank you!

(Still would be interesting to lower the time of the CI target, but for that I support your idea to download a pre-built binary)

@rickystewart
Copy link
Collaborator

(Still would be interesting to lower the time of the CI target, but for that I support your idea to download a pre-built binary)

I took a look and rules_foreign_cc already kind of supports this, but 1) I'm not sure if they've verified FreeBSD compat and 2) they haven't registered pre-built make binaries to download (only cmake and ninja, not that we're using the latter). Probably would be a pretty small work item to fix this all told.

@knz
Copy link
Contributor Author

knz commented Jul 14, 2021

I'm not sure if they've verified FreeBSD compat

I can use your workaround if need be

craig bot pushed a commit that referenced this issue Oct 20, 2021
71720: sql/pgwire: fix error code for decoding binary array r=otan a=rafiss

This matches Postgres.

Release note: None

71735: bazel: use preinstalled cmake/make for rules_foreign_cc tasks  r=rail a=rickystewart

This allows us to avoid bootstrapping `make` and `cmake` for `c-deps`
builds. It does require you to have `cmake` and `make` installed
locally, but I don't think that's a huge deal.

Closes #65894.
Closes #71734.

Release note: None

Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
@craig craig bot closed this as completed in 867b8fd Oct 20, 2021
irfansharif pushed a commit to irfansharif/cockroach that referenced this issue Oct 26, 2021
This allows us to avoid bootstrapping `make` and `cmake` for `c-deps`
builds. It does require you to have `cmake` and `make` installed
locally, but I don't think that's a huge deal.

Closes cockroachdb#65894.
Closes cockroachdb#71734.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-build-system C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-productivity Severe issues that impede the productivity of CockroachDB developers. T-dev-inf
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants