Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a possibility to lazily map a depset #13132

Open
hlopko opened this issue Mar 1, 2021 · 10 comments
Open

Add a possibility to lazily map a depset #13132

hlopko opened this issue Mar 1, 2021 · 10 comments
Labels
not stale Issues or PRs that are inactive but not considered stale P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Rules-API API for writing rules/aspects: providers, runfiles, actions, artifacts type: feature request

Comments

@hlopko
Copy link
Member

hlopko commented Mar 1, 2021

Hi dear Bazel folks,

My feature request is to add a possibility to apply a mapping function from F -> T onto a depset in order to lazily produce a depset of T from a depset of F. But of course I don't need this specific solution, I only need a solution for the problem I'm trying to solve.

The problem I'm trying to solve?

In rules_rust we allow depending on rules providing CcInfo (provider used by C++ rules). CcInfo contains a depset of linker inputs through cc_info.linking_context.linker_inputs. Each linker input contains a collections of LibraryToLink structs. We have to remember transitive LinkerInputs in Rust provider to generate correct linking command line, the simplified (very much simplified) algorithm is this:

flags = []
for linker_input in linker_inputs.to_list():
  for lib in linker_input.libraries_to_link:
    if lib.alwayslink:
        flags.extend([
            "-C",
            "link-arg=-Wl,--whole-archive",
            "-C",
            ("link-arg=%s" % lib.static_library.path),
            "-C",
            "link-arg=-Wl,--no-whole-archive",
        ])
     if lib.static_library:
        flags.extend(["-lstatic=%s" % get_lib_name(lib.static_library)])
     else:
        flags.extend(["-ldylib=%s" % lib.dynamic_library)])
  if linker_input.user_link_flags:
    flags.extend(linker_input.user_link_flags)

The good news is that we can produce the command line here lazily (that means without having to iterate the depset in analysis) through map_each parameter on args.add_all. map_each accepts a function that receives single parameter and returns a list of flags. This approach is safe to use from the build speed and memory use perspective, as the function cannot capture anything from its environment, it can only map the attribute to a return value.

What we cannot do lazily is to collect input artifacts for the linking action. We have to take the same depset of LinkerInputs struct from the example above and pick a specific library artifact roughtly following the same algorithm.

Is there a workaround?

Yes, to flatten the nested set in analysis.

Why do I need to solve this problem?

Without a solution it's not possible to depend on C++ rules from a Starlark rule in a memory and cpu efficient way. In addition, rules_rust have a very similar design challenge that C++ rules solved with LinkerInputs, and we would like to use the same solution.

Why it hasn't been a problem yet?

Because C++ rules are implemented in Java where they can use Iterables.transform: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/rules/cpp/LinkerInputs.java;l=572?q=tolibraryartifacts. All existing Starlark rules that integrate with C++ rules (for example cc_shared_library) flatten the nested set.

Why I think it's fine to map a depset?

In the past it was feared to call a user defined Starlark function in the execution phase. That fear is gone, we do exactly this in args. For rules_rust use case it's enough to impose exactly the same restrictions as on args - only functions with a single parameter are allowed.

CC @lberki @meisterT @oquenchil @comius

Thank you all and have a lovely day :)

@oquenchil oquenchil added the team-Starlark-Integration Issues involving Bazel's integration with Starlark, excluding builtin symbols label Mar 1, 2021
@meisterT
Copy link
Member

meisterT commented Mar 1, 2021

cc @brandjon

@benjaminp
Copy link
Collaborator

Neither the native C++ rules nor any other native ruleset actually have this optimization currently. The nonconstructive proof of this is that CppLinkAction takes a NestedSet<Artifact> of inputs:

One can also trace CppLinkActionBuilder.build() and observe that the input linker inputs and derived sets are expanded—sometimes multiple times!

While saving analysis resources is always tempting, implementation of this proposal would be complex. ulfjack's work a few years ago, culminating in 873ba13, went in a direction opposite that which this proposal would demand, requiring the analysis inputs exposed by an action to be NestedSet<Artifact> rather than Iterable<Artifact>. This proposal would mean allowing Starlark execution deep in Bazel's core execution code. Should Starlark be invoked any time Bazel wants an action's inputs or should the output be cached after the first time to avoid overhead? Even assuming that change was allowed and made, it's not clear to me without a lot of thinking that the C++ rules could be adapted to use lazy input support or whether the benefits would be large enough.

@lberki
Copy link
Contributor

lberki commented Mar 1, 2021

Hey, this is an excellent and well-researched issue as it was filed!

I'll put on my Uber TL hat and carefully not recommend any particular course of action other than mentioning that this occurred to me a number of times, always in similar contexts, I can't recall which ones they were :( This will become a problem when we'll rewrite the C++ rules in Starlark, though (or maybe even earlier, if some other language has the same problem as Rust)

I think in addition to the Iterables.transform() trick you mentioned, we could in the past rely on cooperation from the other rule set so that it exposes the right set of artifacts as nested set (in this case, the depset of linker inputs). That looks like something that would make sense for C++ rules in general and would be a less intrusive change than the proposed lazy mapping of nested sets.

@brandjon
Copy link
Member

brandjon commented Mar 2, 2021

I'll agree with @benjaminp that it sounds difficult to add Starlark evaluation to the code paths that determine action inputs, although @lberki and @hlopko would know more about that than me. In the meantime, let's see if there are any motivating cases that can't reasonably be addressed by reworking the rule set to group provider information appropriately for the consumer. (Either way, we won't have bandwidth for this anytime soon.)

@brandjon brandjon added team-Build-Language P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) and removed team-Starlark-Integration Issues involving Bazel's integration with Starlark, excluding builtin symbols labels Mar 2, 2021
@lberki
Copy link
Contributor

lberki commented Mar 3, 2021

I thought that adding Starlark evaluation to action input processing is not that onerous because we already have something similar with command line expansion already, although error handling would be more difficult in the action input case (it would suck if Action.getInputs() suddenly threw EvalException)

My main worry is that then we'd have the ability to build up complex lazy data structures from Starlark, which would make the analysis/execution phases even more complicated than they already are.

@brandjon
Copy link
Member

brandjon commented Mar 3, 2021

(it would suck if Action.getInputs() suddenly threw EvalException)

Precisely. Also, perhaps there are fingerprinting concerns as well? Starlark code cannot be reliably fingerprinted, and this led to some annoyance and workarounds for command lines.

@lberki
Copy link
Contributor

lberki commented Mar 15, 2021

...and fingerprinting also :) I think these issues are not insurmountable because in the worst case, we can enumerate the nested set and fingerprint the results [1] and verify it that it doesn't throw anything, but it does make it less of an obviously good idea: the extra enumeration is more complexity and more CPU time.

1: If it's deterministic, which one can't guarantee because we do have a way for Starlark to be affected by .d file parsing / include scanning.

@Yannic
Copy link
Contributor

Yannic commented Apr 15, 2021

This is something that also came into my mind a while ago (although in the context of native rules) and I'd like to see getting implemented (my context was, in ProtoInfo, to lazily transform NestedSet<ProtoSource> transitiveSources into NestedSet<Artifact> by applying src -> src.getSourceFile()).

I have a very hacky prototype in Yannic#1006 that implements this for native mapper functions. Fingerprinting seems to be missing there, but I think I also had a version somewhere that included the identity of the mapper function into the fingerprint. I don't know what it would take to get this to a mergeable state or to make it work for Starlark functions.

@brandjon brandjon added untriaged team-Rules-API API for writing rules/aspects: providers, runfiles, actions, artifacts and removed team-Build-Language labels Nov 4, 2022
@comius comius added P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) untriaged labels Aug 22, 2023
@comius
Copy link
Contributor

comius commented Aug 22, 2023

Lazily mapped depsets is certainly a feature I'd like to have in the future. Especially if we can completely prevent depset flattening (probably with some other primitive features). Also some of the parameters in ctx.args interfaces can be done via lazy mapping.

The main issue of designing such feature are Starlark closures, which at the moment may be overtriggering on bzl file changes.

That said, such a feature is a "big gun". For example the need for it in protos (mentioned in a comment by Yannic earlier in the thread) was completely removed by simplifying protos.

Copy link

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Oct 26, 2024
@tjgq tjgq added not stale Issues or PRs that are inactive but not considered stale and removed stale Issues or PRs that are stale (no activity for 30 days) labels Oct 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
not stale Issues or PRs that are inactive but not considered stale P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Rules-API API for writing rules/aspects: providers, runfiles, actions, artifacts type: feature request
Projects
None yet
Development

No branches or pull requests

9 participants