-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
No way to use toolchains correctly in executable rule output #19645
Comments
Any updates on this? |
I'm very confused about your use case, hopefully we can figure out what makes sense. Typically, the intent of toolchains is that they are used during build actions: a c++ compiler is called, an analyzer is run, whatever. It sounds like you want some files from a toolchain, but you want your build rule to bundle those into the outputs? We've used toolchains for that in a few cases (the Python runtime is probably the best example), but it's always been a tricky fit, because it's using the tools as an output to be passed along. Maybe you don't want a toolchain, you want a direct (target configured) dependency on the tools, so that they can be bundled into your rule's output? |
I agree, but see my numerous linked examples in the footnote of the opening post. This mis-pattern is quite widespread, because folks (apparently) want to use toolchains in tests or bazel-run scripts (ie: the output of executable rules). This is a feature request for an improvement to bazel toolchains/exec_groups to make it possible to resolve toolchain for a rule's target/parent. Or alternatively we loudly declare this will 'never' be possible, and (as a community), we go and rewrite these examples to avoid toolchains. I'm ok with either path to address this issue - right now I'm kind of stuck waiting for a decision. Folks "expect" this to work, but it doesn't - and if I'm going to delete their toolchain then I'd at least like to be able to link to this PR (or some other doc) as evidence for why that was the correct change to their rules. |
Toolchains (uniquely) have two options for their dependencies:
So part of the solution is probably to be more careful of the target/exec distinction, here. It's unfortunately easy to get this wrong in the default case, since the base case is that the host platform is used as both the only exec platform and as the default target platform. I agree that there is probably more we can be doing to make this cleaner. I'm going to add this item to our backlog for tracking purposes. If you have any specific features you'd like to request, we'd be happy to see a proposal or further discussion to move this along. |
I think we're talking past each other. I (believe I am) familiar with cfg=target/cfg=exec and when to use them. This does not help with the problem described in this issue. I'll explain one example here in detail, to see if that helps: rules_oci has a target that can be used similar to This executable rule in this case is the following: def _impl(ctx):
crane = ctx.toolchains["@rules_oci//oci:crane_toolchain_type"]
yq = ctx.toolchains["@aspect_bazel_lib//lib:yq_toolchain_type"]
if not ctx.file.image.is_directory:
fail("image attribute must be a oci_image or oci_image_index")
_, _, _, maybe_digest, maybe_tag = util.parse_image(ctx.attr.repository)
if maybe_digest or maybe_tag:
fail("`repository` attribute should not contain digest or tag. got: {}".format(ctx.attr.repository))
executable = ctx.actions.declare_file("push_%s.sh" % ctx.label.name)
files = [ctx.file.image]
substitutions = {
"{{crane_path}}": crane.crane_info.binary.short_path,
"{{yq_path}}": yq.yqinfo.bin.short_path,
"{{image_dir}}": ctx.file.image.short_path,
"{{fixed_args}}": " ".join(_quote_args(["--repository", ctx.attr.repository])),
}
if ctx.attr.remote_tags:
files.append(ctx.file.remote_tags)
substitutions["{{tags}}"] = ctx.file.remote_tags.short_path
ctx.actions.expand_template(
template = ctx.file._push_sh_tpl,
output = executable,
is_executable = True,
substitutions = substitutions,
)
runfiles = ctx.runfiles(files = files)
runfiles = runfiles.merge(yq.default.default_runfiles)
runfiles = runfiles.merge(crane.default.default_runfiles)
return DefaultInfo(executable = executable, runfiles = runfiles)
# Refactored slightly from the original for clarity
oci_push = rule(
implementation = _impl,
attrs = _attrs,
toolchains = [
"@rules_oci//oci:crane_toolchain_type",
"@aspect_bazel_lib//lib:yq_toolchain_type",
],
executable = True,
) To summarise the relevant parts: this rule gets the paths for two tools ( But this use of toolchains is incorrect when cross-compiling! Specifically: When bazel host platform != exec platform (eg: run bazel on macos with a linux RBE cluster), the rule implementation above will run on the exec (linux) platform, but the output executable needs to work on the host (macos) platform. The toolchains will incorrectly resolve to the exec (linux) platform, and the executable output will be a script that points to linux tools that will fail when run on macos. The correct But there is NO WAY to resolve the toolchain for the target of a rule! ^^This is the bazel limitation described in this github issue. If there is anything I can do to expand or clarify the above explanation, please suggest it because it seems to be subtle/difficult for folks to recognise the issue here and I'd like to point folks to this comment for enlightenment. Afaik cfg=target is not relevant here, since there is nowhere to put that that will affect toolchain resolution - the Afaics this incorrect use (or misunderstanding) of toolchains is moderately common (see my examples in widely used rules), and can impact both bazel-run and bazel tests (both executable rules). Ideally, we improve bazel so that toolchains can be used in these important use cases. |
This looks very related to #21805, where I suggest the |
Related, but not the same - just to prevent an overzealous close-as-duplicate. I agree |
Hey @anguslees, I did some experimenting, and it looks like what you want is possible by doing the following:
The outgoing edge dependency can make use of the following transition: def _transition_exec_to_target_impl(settings, attr):
return {
"//command_line_option:extra_execution_platforms": settings["//command_line_option:platforms"],
}
transition_exec_to_target = transition(
implementation = _transition_exec_to_target_impl,
inputs = ["//command_line_option:platforms"],
outputs = ["//command_line_option:extra_execution_platforms"],
) I will send out a PR against rules_oci soon to get That said, I do think that there is a bit of voodoo magic going on. Transitioning on command line flags is a bit funky. Especially because we're doing it on |
Bazel doesn't provide a very elegant way to specify that a toolchain should be resolved in such a way that it can be executed on a *target* platform: bazelbuild/bazel#19645 To work around this, we can move the toolchains into a small helper rule named oci_push_tools(). We can use an outgoing edge transition on the oci_push() side to set --extra_execution_platforms equal to --platforms, thereby causing the toolchains to be resolved the way we want. Though I think using transitions for this is pretty slick, it's a shame that this has to be done by patching up command line options.
Oh wow - I remember considering this, but I think I must have just incorrectly concluded that transitioning Fwiw on bazel 6.4.0, I needed to force the //command_line_option:platforms labels to strings first: def _transition_exec_to_target_impl(settings, attr):
return {
"//command_line_option:extra_execution_platforms": [str(p) for p in settings["//command_line_option:platforms"]],
} Combined with the |
Bazel doesn't provide a very elegant way to specify that a toolchain should be resolved in such a way that it can be executed on a *target* platform: bazelbuild/bazel#19645 To work around this, we can pass in the toolchains through ordinary labels and apply an outgoing edge transition to set --extra_execution_platforms equal to --platforms. This causes the toolchains to be resolved the way we want. Though I think using transitions for this is pretty slick, it's a shame that this has to be done by patching up command line options.
Instead of using a transition to add new execution platforms, consider using a flag to activate them (once I sit down and implement the proposal, hopefully this week). |
The toolchain_files() and current_rust_toolchain() rules return files for the current 'exec' platform. This is perfecly reasonable if you want to use these tools as part of ctx.actions.run(). But if you want to use these as part of 'data = []' (runfiles), they must be built for the target. This change adds a helper rule, toolchain_files_for_target(), which can take the aforementioned rules as an input, transitioning it so that it yields files for the target platform. I'm not happy with how this rule is implemented, but there is little we can do about that until bazelbuild/bazel#19645 is addressed.
The toolchain_files() and current_rust_toolchain() rules return files for the current 'exec' platform. This is perfecly reasonable if you want to use these tools as part of ctx.actions.run(). But if you want to use these as part of 'data = []' (runfiles), they must be built for the target. This change adds a helper rule, toolchain_files_for_target(), which can take the aforementioned rules as an input, transitioning it so that it yields files for the target platform. I'm not happy with how this rule is implemented, but there is little we can do about that until bazelbuild/bazel#19645 is addressed.
The toolchain_files() and current_rust_toolchain() rules return files for the current 'exec' platform. This is perfecly reasonable if you want to use these tools as part of ctx.actions.run(). But if you want to use these as part of 'data = []' (runfiles), they must be built for the target. This change adds a helper rule, toolchain_files_for_target(), which can take the aforementioned rules as an input, transitioning it so that it yields files for the target platform. I'm not happy with how this rule is implemented, but there is little we can do about that until bazelbuild/bazel#19645 is addressed.
The toolchain_files() and current_rust_toolchain() rules return files for the current 'exec' platform. This is perfecly reasonable if you want to use these tools as part of ctx.actions.run(). But if you want to use these as part of 'data = []' (runfiles), they must be built for the target. This change adds a helper rule, toolchain_files_for_target(), which can take the aforementioned rules as an input, transitioning it so that it yields files for the target platform. I'm not happy with how this rule is implemented, but there is little we can do about that until bazelbuild/bazel#19645 is addressed.
See bazel-contrib/bazel-lib#875 for how it can be simplified.
Not very toolchain, only those who want to be used in cfg=target. Note that a toolchain can have binary for both cfg=exec and cfg=target. For example, python toolcahin can have python-bytecode-compiler with cfg=exec and python-interpreter with cfg=target. For a py_binary rule, it use python-bytecode-compiler to generate bytecode and package a binary that runs python-interpreter with the generated bytecode. How do you handle this case without having seperate toolchain for python-bytecode-compiler and python-interpreter? @EdSchouten's transition approach although seems to work but it does not really. There are cases it won't work. See bazel-contrib/rules_oci#590 (comment) |
As described earlier in this github comment thread, the toolchain author can't know if their toolchain is going to be used in a test/run (cfg=target), or used directly in an action (cfg=exec). So yes, we would have to pre-emptively declare and configure every 'tool' toolchain twice, just in case.
This is ~exactly what transition and toolchains were designed to support aiui! The classic case is a C compiler, that simultaneously needs a compiler for 'exec' platform, and a libc for 'target'. This is why a single toolchain resolution matches exec and target simultaneously, and the idea is that the toolchain has some attributes with cfg=exec (for files that are executed as part of build), and cfg=target (for files that are copied opaquely into the output). I agree that this is not how rules_python uses toolchains. In an alternate reality, we would have a single python toolchain-type that matched on exec and target, and this toolchain would have a python interpreter for both bytecode generation at build-time (matching the 'exec' constraints) and another python interpreter for copying into runfiles (matching 'target' constraints). I think we're talking in circles, and a github issue is perhaps not the best forum for a discussion. Happy to discuss further on bazel slack, or elsewhere. |
A toolchain should have a very clear API. For example, if it provides a binary, it should be clearly documented in the API that is should be used as part of the build action or as part of the output artifact.
The transition solution is completely wrong. See bazel-contrib/rules_oci#590 (comment) and bazel-contrib/rules_oci#590 (comment) I think we are confused with two different concepts, 1) toolchain used to run as part of the build action; 2) a (possibly prebuilt) binary that is registered through toolchain. For 1), suppose a toolchain provides CcCompileInfo(compile = a function) where CcCompileInfo.compile(srcs = "a.cc") returns a DefaultInfo that is a binary. When defining a cc_toolchain, you can provide a exec binary, e.g., gcc and implment For 2), we basically want a magic binary that works on very supported platform. One can acheive it by using alias(name="cc-compiler", actual = select(...)) However, this is not flexible enough as one may want to register their own compiler. Then we can use toolchain that accept a toolchain_type(name="binary")
toolchain_def(name="def-{platfrom}", binary = "{platfrom}_gcc", target_compatible_with={platfrom})
toolchain(name="cc-{platfrom}", toolchain="def-{platfrom}", toolchain_type="binary")
resolved_cc_binary(name="resolved_cc") # resolved_cc_binary use toolchain_type binary to retrive the binary. Then the user should just directly use _tool = attr.label(cfg="exec", executable=True, default = ":resolved_cc") If they want to use it in the output artifact _runner = attr.label(cfg="target", executable=True, default = ":resolved_cc") |
Let me see if I understand this correctly: in the current model of Bazel, a toolchain has two configurations: the one it runs on and the one it targets (e.g. a Rust compiler can run on x64 and produce ARM binaries). So each toolchain has to be tagged with these two configurations and Bazel selects one that matches the exec and target configurations and it assumes that the reason why one uses a toolchain is to run it during the build and consume its output. However, some people want to use the loose coupling afforded by toolchains to choose what to incorporate into their actual outputs: say, one may want to produce an SDK that runs on x64 and package an x64 to ARM compiler into it. In this case, the right thing to do is to match the "target" configuration of Bazel to the "exec" configuration of the toolchain. Bazel currently can't to that. The "two toolchain" workaround is to wrap every toolchain in two toolchain rules: one that's "honest" and one that "lies", in the above case saying that the target configuration of the compiler is x64. This is obviously wrong because, well, that's not what the toolchain does. @katre any clever ideas here? I can imagine a lot of things but you have a lot more state about this in your head than I do. |
There is nothing wrong and no lies. See my above comment. It is totally fine for a toolchain to provides binaries for target and/or exec. Normally, compiler toolchain provides exec binaries to run at build time and runtime toolchain provides target binaries to use at runtime. The issue is that people misunderstood the concept such that they use the wrong binary. If you use a java runtime toolchain, you should not run |
For Java, that's arguably the case since the JDK and the JVM are different, but what about a C++ compiler or just a regular tool? If you accept the above SDK use case as valid, Bazel currently forces you to put the exec configuration of the C++ toolchain into its I think the conceptual difference between us is that you consider "toolchain to be used in the build" and "toolchain to be packaged as the result of the build" different things; in my mind, they are just one "thing" but two different use cases: it's true that one needs to provide much more rich information about the toolchain to be invoked during the build than just to package it, but a list of files the toolchain is comprised of is necessary for both and it feels silly to duplicate the exec platform compatibility information and the set of files like this. It's a reasonable workaround, but it requires the owner of every toolchain to cooperate and very public interfaces should be very narrow. |
I agree with @BoleynSu here. The current approach is conceptually sound, the problem is that it's apparently too verbose or too hard to use correctly for simple cases that don't need the full flexibility afforded by toolchains. That's where the |
I don't think Bazel allows one to match the constraints of the target platform to the exec constraints of a toolchian, does it? (except by the awkward transition above #19645 (comment) ) There is some merit in the argument that "toolchain as functionality" and "toolchain as bunch of files" are different, though. The awkwardness here is that if it's to work, every author of a toolchain for a particular type must cooperate and it's extra boilerplate for all of them. |
Exactly. This is also the reason I've been highly skeptical of that approach. There won't be a way you can use your Mac to craft a Linux x86 SDK that targets some RTOS for ARM. |
@lberki Firstly, note that the transition approach is broken.
For your use case, I think you will need define a proper toolchain by bundling all the binaries(compiler/linker), configs(flags/config files), artifacts(static libraries/dynamtic libraries/source code libraries) instead of relying on the binaries provided by some other toolchain. @EdSchouten
|
This is a pretty interesting discussion, and the distinction between "toolchain which provides a binary as a tool for actions" and "toolchain which provides a binary to be packaged in an artifact" is a pretty subtle one. I think there are several patterns in this thread that could be applied to rules, using the current APIs. So, I think the next step is to figure out what is the correct outcome of this issue? Idea 1: Someone (possibly @BoleynSu?) could write up some documentation for bazel.build describing when this pattern is useful, and what rule authors need to do to use it. Idea 2: Possibly simultaneously, someone (@BoleynSu? @fmeum?) could write some helper macros for constructing these distinct varieties of toolchains (updating the documentation from the step above). Question 1: Is there an actual request for changes in Bazel (presumably in toolchain resolution) here? From the discussion so far, this feels like a problem in 1) documenting the underlying patterns and when they are useful, and 2) convincing rule authors to use them to make these available to users. Is that a reasonable summary? |
The only goal I have here is that this use case is doable somehow. I don't have a preference towards changing Bazel or providing libraries to do so in principle, however, in this case, the former seems more reasonable because (in order of what I think is the strongest to the weakest argument):
|
@katre @lberki I would prefer to make no changes to Bazel if these can be implemented as library (to make Bazel less complex). I can write some docs and would be great if @fmeum can also help. There is no need to ask very toolchain to be usable with the wrapper. For example, if I define a cc_toolchain, I may or may not want to expose the compiler to user of the toolchain, i.e., CcToolchainInfo(compile=a function to use the binary). Only if I do want to expose the binary, we can register the compiler binary to "//gcc:toolchain_type" and use "//gcc:resolved_binary" with cfg="exec" in cc_toolchain. Also I want to emphasis that the transition_exec_to_target approach is broken. See bazel-contrib/rules_oci#590 (comment) There are aleady a few PRs based on such hack merged and some pendding. @EdSchouten could you hold on pending PRs? |
I'm not sure "making Bazel less complex" is the right thing to optimze for here. It's a good idea to optimize for "less complexity is seen by the people who use Bazel", but that complexity is not just that of Bazel, but also that of all external Bazel code they depend on and that includes toolchain type definitions and their implementations. IOW, if we can buy simplicity in many toolchains with a little additional wrinkle in Bazel, that's a good deal. That said, your argument that toolchains are not necessarily single groups of files is a good one for arguing against finding a solution in Bazel (for example, even the C++ compiler is not just a single group of file because Bazel has the ability to take different sets of inputs for e.g. compile and link actions). |
Coming back to this: @lberki, I can see your point, but I'm very unclear if there's an actual request besides "make this easier". I'm happy to review and discuss a concrete proposal, but I'm not sure the Configurability team can actually spend any time on this in the near future. |
@katre fair enough; I think the insight above that a toolchain doesn't even necessarily need to provide any files to be a toolchain (which manifests in the code as toolchains being free to provider any provider, I think what may make sense is being able to ask Bazel "give me a toolchain whose host configuration matches my target configuration", but given that it has a workaround in Starlark (declaring two toolchains and two toolchain types) and that we'd have to decide what target configuration that toolchain should have, I don't think we should treat this as a priority. I guess if there was a concrete proposal to do so, @katre would entertain it, wouldn't he? |
I would be very happy to read and discuss a concrete proposal. |
Related issue bazelbuild/rules_shell#4 (comment) |
Description of the feature request:
A common pattern1 I see in rules is toolchain declarations that effectively capture a tool path, and templated executable/test rules that resolve the toolchain and substitute the tool path into the executable file output.
Unfortunately this is incorrect! When the exec and target platforms differ, the toolchain will be resolved for the exec platform, but the executable/test output needs to be for the target platform.
As far as I can see, there is no way to make toolchain resolution / platform transitions do the correct thing here. Since executable rules are how bazel tests work, this is a pretty major gap in the toolchains/platforms story. I can't file bugs against rules, because there is no reasonable solution to suggest. This is a feature request for a solution that is (a) correct, and also ideally (b) easy/obvious.
One onerous current workaround is to declare two toolchains for all such cases. One that uses exec_compatible_with/target_compatible_with as usual, and is used when the toolchain is invoked by an action directly. And a second toolchain that uses target_compatible_with only, which can be used in executable rule output. This is moderately terrible, since this duplicates work for everybody, confuses the meaning of exec_compatible_with vs target_compatible_with, and toolchain authors can't anticipate when their toolchain needs to be used in this way - undoing the loose coupling between author and consumer that is kind of the point of toolchains.
An incorrect (afaics) version of the two-toolchains approach is to only declare target_compatible_with toolchain, then create a "current_foo_toolchain" rule that resolves the toolchain for a particular target, and returns the toolchain provider through a rule attribute - see eg python toolchain. This is incorrect in general because another toolchain that depends on this toolchain cannot be resolved at the same time for the same exec+target combination (ie:
toolchains=[foo]
with a dependency on another rule withtoolchains=[bar]
, is not the same astoolchains=[foo, bar]
).As a possible implementation, I'd love to be able to do
exec_group(cfg='target')
, since I thinkctx.exec_groups["target"].toolchains
is an intuitive and 'obvious' way to address the above.Another option is to not use toolchains for what are effectively artifacts. Rule authors obviously like the loose coupling between toolchain declaration and use however, so this potential path would require widespread advocacy/documentation, and perhaps a canonical meta-library for an "artifact toolchain".
See also slack thread: https://bazelbuild.slack.com/archives/CA31HN1T3/p1690176577746239
Which category does this issue belong to?
Core
What underlying problem are you trying to solve with this feature?
It should be possible to resolve and refer to the correct toolchain for an executable rule output.
Which operating system are you running Bazel on?
macos and Linux
What is the output of
bazel info release
?6.3.2
If
bazel info release
returnsdevelopment version
or(@non-git)
, tell us how you built Bazel.No response
What's the output of
git remote get-url origin; git rev-parse master; git rev-parse HEAD
?No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
No response
Footnotes
Examples of (incorrect) toolchains used for executable rules in rules_js, rules_docker, rules_oci, gazelle ↩
The text was updated successfully, but these errors were encountered: