-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a possibility to lazily map a depset #13132
Comments
cc @brandjon |
Neither the native C++ rules nor any other native ruleset actually have this optimization currently. The nonconstructive proof of this is that
CppLinkActionBuilder.build() and observe that the input linker inputs and derived sets are expanded—sometimes multiple times!
While saving analysis resources is always tempting, implementation of this proposal would be complex. ulfjack's work a few years ago, culminating in 873ba13, went in a direction opposite that which this proposal would demand, requiring the analysis inputs exposed by an action to be |
Hey, this is an excellent and well-researched issue as it was filed! I'll put on my Uber TL hat and carefully not recommend any particular course of action other than mentioning that this occurred to me a number of times, always in similar contexts, I can't recall which ones they were :( This will become a problem when we'll rewrite the C++ rules in Starlark, though (or maybe even earlier, if some other language has the same problem as Rust) I think in addition to the |
I'll agree with @benjaminp that it sounds difficult to add Starlark evaluation to the code paths that determine action inputs, although @lberki and @hlopko would know more about that than me. In the meantime, let's see if there are any motivating cases that can't reasonably be addressed by reworking the rule set to group provider information appropriately for the consumer. (Either way, we won't have bandwidth for this anytime soon.) |
I thought that adding Starlark evaluation to action input processing is not that onerous because we already have something similar with command line expansion already, although error handling would be more difficult in the action input case (it would suck if My main worry is that then we'd have the ability to build up complex lazy data structures from Starlark, which would make the analysis/execution phases even more complicated than they already are. |
Precisely. Also, perhaps there are fingerprinting concerns as well? Starlark code cannot be reliably fingerprinted, and this led to some annoyance and workarounds for command lines. |
...and fingerprinting also :) I think these issues are not insurmountable because in the worst case, we can enumerate the nested set and fingerprint the results [1] and verify it that it doesn't throw anything, but it does make it less of an obviously good idea: the extra enumeration is more complexity and more CPU time. 1: If it's deterministic, which one can't guarantee because we do have a way for Starlark to be affected by .d file parsing / include scanning. |
This is something that also came into my mind a while ago (although in the context of native rules) and I'd like to see getting implemented (my context was, in I have a very hacky prototype in Yannic#1006 that implements this for native mapper functions. Fingerprinting seems to be missing there, but I think I also had a version somewhere that included the identity of the mapper function into the fingerprint. I don't know what it would take to get this to a mergeable state or to make it work for Starlark functions. |
Lazily mapped depsets is certainly a feature I'd like to have in the future. Especially if we can completely prevent depset flattening (probably with some other primitive features). Also some of the parameters in ctx.args interfaces can be done via lazy mapping. The main issue of designing such feature are Starlark closures, which at the moment may be overtriggering on bzl file changes. That said, such a feature is a "big gun". For example the need for it in protos (mentioned in a comment by Yannic earlier in the thread) was completely removed by simplifying protos. |
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale. |
Hi dear Bazel folks,
My feature request is to add a possibility to apply a mapping function from F -> T onto a depset in order to lazily produce a depset of T from a depset of F. But of course I don't need this specific solution, I only need a solution for the problem I'm trying to solve.
The problem I'm trying to solve?
In
rules_rust
we allow depending on rules providingCcInfo
(provider used by C++ rules).CcInfo
contains a depset of linker inputs throughcc_info.linking_context.linker_inputs
. Each linker input contains a collections of LibraryToLink structs. We have to remember transitive LinkerInputs in Rust provider to generate correct linking command line, the simplified (very much simplified) algorithm is this:The good news is that we can produce the command line here lazily (that means without having to iterate the depset in analysis) through
map_each
parameter onargs.add_all
.map_each
accepts a function that receives single parameter and returns a list of flags. This approach is safe to use from the build speed and memory use perspective, as the function cannot capture anything from its environment, it can only map the attribute to a return value.What we cannot do lazily is to collect input artifacts for the linking action. We have to take the same depset of LinkerInputs struct from the example above and pick a specific library artifact roughtly following the same algorithm.
Is there a workaround?
Yes, to flatten the nested set in analysis.
Why do I need to solve this problem?
Without a solution it's not possible to depend on C++ rules from a Starlark rule in a memory and cpu efficient way. In addition,
rules_rust
have a very similar design challenge that C++ rules solved with LinkerInputs, and we would like to use the same solution.Why it hasn't been a problem yet?
Because C++ rules are implemented in Java where they can use
Iterables.transform
: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/rules/cpp/LinkerInputs.java;l=572?q=tolibraryartifacts. All existing Starlark rules that integrate with C++ rules (for examplecc_shared_library
) flatten the nested set.Why I think it's fine to map a depset?
In the past it was feared to call a user defined Starlark function in the execution phase. That fear is gone, we do exactly this in args. For
rules_rust
use case it's enough to impose exactly the same restrictions as on args - only functions with a single parameter are allowed.CC @lberki @meisterT @oquenchil @comius
Thank you all and have a lovely day :)
The text was updated successfully, but these errors were encountered: