Add a possibility to lazily map a depset #13132

hlopko · 2021-03-01T14:05:28Z

Hi dear Bazel folks,

My feature request is to add a possibility to apply a mapping function from F -> T onto a depset in order to lazily produce a depset of T from a depset of F. But of course I don't need this specific solution, I only need a solution for the problem I'm trying to solve.

The problem I'm trying to solve?

In rules_rust we allow depending on rules providing CcInfo (provider used by C++ rules). CcInfo contains a depset of linker inputs through cc_info.linking_context.linker_inputs. Each linker input contains a collections of LibraryToLink structs. We have to remember transitive LinkerInputs in Rust provider to generate correct linking command line, the simplified (very much simplified) algorithm is this:

flags = []
for linker_input in linker_inputs.to_list():
  for lib in linker_input.libraries_to_link:
    if lib.alwayslink:
        flags.extend([
            "-C",
            "link-arg=-Wl,--whole-archive",
            "-C",
            ("link-arg=%s" % lib.static_library.path),
            "-C",
            "link-arg=-Wl,--no-whole-archive",
        ])
     if lib.static_library:
        flags.extend(["-lstatic=%s" % get_lib_name(lib.static_library)])
     else:
        flags.extend(["-ldylib=%s" % lib.dynamic_library)])
  if linker_input.user_link_flags:
    flags.extend(linker_input.user_link_flags)

The good news is that we can produce the command line here lazily (that means without having to iterate the depset in analysis) through map_each parameter on args.add_all. map_each accepts a function that receives single parameter and returns a list of flags. This approach is safe to use from the build speed and memory use perspective, as the function cannot capture anything from its environment, it can only map the attribute to a return value.

What we cannot do lazily is to collect input artifacts for the linking action. We have to take the same depset of LinkerInputs struct from the example above and pick a specific library artifact roughtly following the same algorithm.

Is there a workaround?

Yes, to flatten the nested set in analysis.

Why do I need to solve this problem?

Without a solution it's not possible to depend on C++ rules from a Starlark rule in a memory and cpu efficient way. In addition, rules_rust have a very similar design challenge that C++ rules solved with LinkerInputs, and we would like to use the same solution.

Why it hasn't been a problem yet?

Because C++ rules are implemented in Java where they can use Iterables.transform: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/rules/cpp/LinkerInputs.java;l=572?q=tolibraryartifacts. All existing Starlark rules that integrate with C++ rules (for example cc_shared_library) flatten the nested set.

Why I think it's fine to map a depset?

In the past it was feared to call a user defined Starlark function in the execution phase. That fear is gone, we do exactly this in args. For rules_rust use case it's enough to impose exactly the same restrictions as on args - only functions with a single parameter are allowed.

CC @lberki @meisterT @oquenchil @comius

Thank you all and have a lovely day :)

The text was updated successfully, but these errors were encountered:

meisterT · 2021-03-01T14:11:38Z

cc @brandjon

benjaminp · 2021-03-01T15:23:11Z

Neither the native C++ rules nor any other native ruleset actually have this optimization currently. The nonconstructive proof of this is that CppLinkAction takes a NestedSet<Artifact> of inputs:

bazel/src/main/java/com/google/devtools/build/lib/rules/cpp/CppLinkAction.java

Line 156 in 7d78371

NestedSet<Artifact> inputs,

One can also trace CppLinkActionBuilder.build() and observe that the input linker inputs and derived sets are expanded—sometimes multiple times!

While saving analysis resources is always tempting, implementation of this proposal would be complex. ulfjack's work a few years ago, culminating in 873ba13, went in a direction opposite that which this proposal would demand, requiring the analysis inputs exposed by an action to be NestedSet<Artifact> rather than Iterable<Artifact>. This proposal would mean allowing Starlark execution deep in Bazel's core execution code. Should Starlark be invoked any time Bazel wants an action's inputs or should the output be cached after the first time to avoid overhead? Even assuming that change was allowed and made, it's not clear to me without a lot of thinking that the C++ rules could be adapted to use lazy input support or whether the benefits would be large enough.

lberki · 2021-03-01T16:26:53Z

Hey, this is an excellent and well-researched issue as it was filed!

I'll put on my Uber TL hat and carefully not recommend any particular course of action other than mentioning that this occurred to me a number of times, always in similar contexts, I can't recall which ones they were :( This will become a problem when we'll rewrite the C++ rules in Starlark, though (or maybe even earlier, if some other language has the same problem as Rust)

I think in addition to the Iterables.transform() trick you mentioned, we could in the past rely on cooperation from the other rule set so that it exposes the right set of artifacts as nested set (in this case, the depset of linker inputs). That looks like something that would make sense for C++ rules in general and would be a less intrusive change than the proposed lazy mapping of nested sets.

brandjon · 2021-03-02T16:20:50Z

I'll agree with @benjaminp that it sounds difficult to add Starlark evaluation to the code paths that determine action inputs, although @lberki and @hlopko would know more about that than me. In the meantime, let's see if there are any motivating cases that can't reasonably be addressed by reworking the rule set to group provider information appropriately for the consumer. (Either way, we won't have bandwidth for this anytime soon.)

lberki · 2021-03-03T07:42:53Z

I thought that adding Starlark evaluation to action input processing is not that onerous because we already have something similar with command line expansion already, although error handling would be more difficult in the action input case (it would suck if Action.getInputs() suddenly threw EvalException)

My main worry is that then we'd have the ability to build up complex lazy data structures from Starlark, which would make the analysis/execution phases even more complicated than they already are.

brandjon · 2021-03-03T18:42:09Z

(it would suck if Action.getInputs() suddenly threw EvalException)

Precisely. Also, perhaps there are fingerprinting concerns as well? Starlark code cannot be reliably fingerprinted, and this led to some annoyance and workarounds for command lines.

lberki · 2021-03-15T08:02:03Z

...and fingerprinting also :) I think these issues are not insurmountable because in the worst case, we can enumerate the nested set and fingerprint the results [1] and verify it that it doesn't throw anything, but it does make it less of an obviously good idea: the extra enumeration is more complexity and more CPU time.

1: If it's deterministic, which one can't guarantee because we do have a way for Starlark to be affected by .d file parsing / include scanning.

Yannic · 2021-04-15T21:19:26Z

This is something that also came into my mind a while ago (although in the context of native rules) and I'd like to see getting implemented (my context was, in ProtoInfo, to lazily transform NestedSet<ProtoSource> transitiveSources into NestedSet<Artifact> by applying src -> src.getSourceFile()).

I have a very hacky prototype in Yannic#1006 that implements this for native mapper functions. Fingerprinting seems to be missing there, but I think I also had a version somewhere that included the identity of the mapper function into the fingerprint. I don't know what it would take to get this to a mergeable state or to make it work for Starlark functions.

comius · 2023-08-22T08:00:10Z

Lazily mapped depsets is certainly a feature I'd like to have in the future. Especially if we can completely prevent depset flattening (probably with some other primitive features). Also some of the parameters in ctx.args interfaces can be done via lazy mapping.

The main issue of designing such feature are Starlark closures, which at the moment may be overtriggering on bzl file changes.

That said, such a feature is a "big gun". For example the need for it in protos (mentioned in a comment by Yannic earlier in the thread) was completely removed by simplifying protos.

github-actions · 2024-10-26T01:32:52Z

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.

hlopko added the type: feature request label Mar 1, 2021

oquenchil added the team-Starlark-Integration Issues involving Bazel's integration with Starlark, excluding builtin symbols label Mar 1, 2021

hlopko mentioned this issue Mar 1, 2021

Stop flattening depsets in analysis bazelbuild/rules_rust#622

Open

brandjon added team-Build-Language P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) and removed team-Starlark-Integration Issues involving Bazel's integration with Starlark, excluding builtin symbols labels Mar 2, 2021

hlopko mentioned this issue Jul 28, 2021

propagate additional inputs from native dependencies bazelbuild/rules_rust#858

Merged

brandjon added untriaged team-Rules-API API for writing rules/aspects: providers, runfiles, actions, artifacts and removed team-Build-Language labels Nov 4, 2022

comius added P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) untriaged labels Aug 22, 2023

github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Oct 26, 2024

tjgq added not stale Issues or PRs that are inactive but not considered stale and removed stale Issues or PRs that are stale (no activity for 30 days) labels Oct 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a possibility to lazily map a depset #13132

Add a possibility to lazily map a depset #13132

hlopko commented Mar 1, 2021

meisterT commented Mar 1, 2021

benjaminp commented Mar 1, 2021

lberki commented Mar 1, 2021

brandjon commented Mar 2, 2021

lberki commented Mar 3, 2021

brandjon commented Mar 3, 2021

lberki commented Mar 15, 2021

Yannic commented Apr 15, 2021

comius commented Aug 22, 2023

github-actions bot commented Oct 26, 2024

Add a possibility to lazily map a depset #13132

Add a possibility to lazily map a depset #13132

Comments

hlopko commented Mar 1, 2021

The problem I'm trying to solve?

Is there a workaround?

Why do I need to solve this problem?

Why it hasn't been a problem yet?

Why I think it's fine to map a depset?

meisterT commented Mar 1, 2021

benjaminp commented Mar 1, 2021

lberki commented Mar 1, 2021

brandjon commented Mar 2, 2021

lberki commented Mar 3, 2021

brandjon commented Mar 3, 2021

lberki commented Mar 15, 2021

Yannic commented Apr 15, 2021

comius commented Aug 22, 2023

github-actions bot commented Oct 26, 2024