Minimize Param usage in Task identities #6845

stuhood · 2018-11-30T17:34:57Z

While working on #5869 and thinking about #6478, it became clearer (although it had been in the back of my mind since #5788 (comment)) that we will need to be very careful about where Params are used for options.

In particular: even when they are partitioned into scopes/Subsystems, option values used in node identities via Params are a very big hammer for invalidation. Because two nodes with different Params are ... different nodes, node dirtying has no effect on them. So toggling any option value that is consumed in a subgraph would invalidate that entire subgraph. This got me thinking about how we could consume and filter options to make them smaller (if possible) before using them in Task identities.
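A toy model of the problem, assuming (hypothetically) that a node's identity is just the rule name plus every Param it was requested with. `ToyGraph`, `lint_target` and the `test_timeout` option are illustrative names, not Pants APIs. Toggling an option that the rule never reads still produces a brand-new node.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical node identity: rule name plus every Param, sorted for stability.
NodeKey = Tuple[str, Tuple[Tuple[str, Any], ...]]


class ToyGraph:
    """Toy memoizing graph; NOT the Pants engine API."""

    def __init__(self) -> None:
        self.memo: Dict[NodeKey, Any] = {}
        self.runs = 0

    def request(self, rule: Callable[[Dict[str, Any]], Any], params: Dict[str, Any]) -> Any:
        key: NodeKey = (rule.__name__, tuple(sorted(params.items())))
        if key not in self.memo:
            self.runs += 1
            self.memo[key] = rule(params)
        return self.memo[key]


def lint_target(params: Dict[str, Any]) -> str:
    # Consumes only two Params, but all of them end up in its identity.
    return f"lint {params['address']} with {params['linter_version']}"


g = ToyGraph()
options = {"address": "src:lib", "linter_version": "1.0", "test_timeout": 30}
g.request(lint_target, options)
g.request(lint_target, {**options, "test_timeout": 60})  # unrelated option toggled
print(g.runs)  # 2
```

The second request cannot reuse or dirty the first node: it is simply a different key.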

One idea that I had (and implemented) yesterday was to move the requesting of the "arguments" to a Task function out of impl Task and into impl Select, and then to use the arg values, rather than the Params used to compute them, in the identity of the requested Task node. This means that the identity of a Task node becomes: 1) the function name, 2) the actual arg values that will be used to run it, and 3) the Param values that are necessary to satisfy its Get requests.
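A sketch of that three-part identity under the same toy assumptions: `task_identity` and `lint_args` are hypothetical helpers, and "Select" is modeled as a plain function that computes the Task's arguments from the Params before the identity is formed.

```python
from typing import Any, Dict, FrozenSet, Tuple


def task_identity(
    func_name: str,
    args: Tuple[Any, ...],
    params: Dict[str, Any],
    get_param_names: FrozenSet[str],
) -> Tuple[Any, ...]:
    # 1) function name, 2) computed arg values, 3) only the Params the Gets need.
    get_params = tuple(sorted((k, v) for k, v in params.items() if k in get_param_names))
    return (func_name, args, get_params)


def lint_args(params: Dict[str, Any]) -> Tuple[Any, ...]:
    # Stand-in for the "Select" step: compute the Task's arguments from Params.
    return (params["address"], params["linter_version"])


opts_a = {"address": "src:lib", "linter_version": "1.0", "test_timeout": 30}
opts_b = {**opts_a, "test_timeout": 60}

# No Gets: the unrelated option never reaches the identity, so the node is shared.
id_a = task_identity("lint", lint_args(opts_a), opts_a, frozenset())
id_b = task_identity("lint", lint_args(opts_b), opts_b, frozenset())
print(id_a == id_b)  # True

# A Get that needs test_timeout puts that Param back into the identity.
id_c = task_identity("lint", lint_args(opts_a), opts_a, frozenset({"test_timeout"}))
id_d = task_identity("lint", lint_args(opts_b), opts_b, frozenset({"test_timeout"}))
print(id_c == id_d)  # False
```

The last two lines show why the filtering only helps for arguments: whatever a Get needs still has to be part of the identity.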

In theory, this allows more reuse/dirtying of Task nodes: if an argument to a Task is computed from a "large Param" like a set of options, it can be filtered and scoped before it is included in the node identity. But this optimization [0] (in theory: more on this later) does not extend to Gets. The reason is that once we've begun running a Task, we definitely want the running of that Task memoized. As an extreme example, we don't want to start running a Task and only memoize it once it has completed: that would mean that we would have the absolute minimum number of nodes in the graph, but any concurrent requests for the same node would each run it in parallel until they were ready to store their values.
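A minimal sketch of the two memoization strategies, using Python threads as a stand-in for concurrent node requests. `MemoOnCompletion` and `MemoOnStart` are hypothetical names and the engine is not implemented this way; the point is only that registering an in-flight `Future` before running lets concurrent requests share a single run.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class MemoOnCompletion:
    """Stores a value only once the computation has finished (the rejected design)."""

    def __init__(self):
        self.values = {}
        self.runs = 0
        self.lock = threading.Lock()

    def request(self, key, compute):
        with self.lock:
            if key in self.values:
                return self.values[key]
            self.runs += 1
        # Concurrent callers for the same key can all reach this point and run it.
        value = compute()
        with self.lock:
            self.values[key] = value
        return value


class MemoOnStart:
    """Registers an in-flight Future as soon as the computation begins."""

    def __init__(self):
        self.futures = {}
        self.runs = 0
        self.lock = threading.Lock()

    def request(self, key, compute):
        with self.lock:
            future = self.futures.get(key)
            is_owner = future is None
            if is_owner:
                future = Future()
                self.futures[key] = future
                self.runs += 1
        if is_owner:
            try:
                future.set_result(compute())
            except Exception as exc:
                future.set_exception(exc)
        # Non-owners block here until the single in-flight run completes.
        return future.result()


def slow_task():
    time.sleep(0.05)
    return "done"


on_completion = MemoOnCompletion()
on_start = MemoOnStart()
for memo in (on_completion, on_start):
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(lambda _: memo.request("lint", slow_task), range(4)))
    print(type(memo).__name__, "ran the task", memo.runs, "time(s)")
```

`MemoOnStart` always runs the task once; how many times `MemoOnCompletion` runs it depends on thread scheduling, but with overlapping requests it is usually more than once.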

So: I think that we should land the change to move arg computation from Task to Select, because "if it is used carefully", it will mean that more nodes are reused. But what does "careful usage" look like? It looks like ensuring that as much as possible arrives as an argument to a Task, rather than later being requested as a Get. An open topic is: how do we appropriately encourage usage of args rather than Gets?

[0] The change is actually slightly slower, because it needs to make more Keys out of Values for use in identities. But not prohibitively so, I think.


stuhood · 2018-12-03T19:12:20Z

Posted a draft of "Task node identity uses computed arguments" as #6858. Will be working on other things this week.

stuhood · 2018-12-03T19:17:45Z

Another open question: would doing more work in Select, and less in Task justify beginning to memoize Select again? Should benchmark.

stuhood · 2018-12-03T20:30:14Z

I just realized that #6858 actually generalizes to the entire rule graph! Unfortunately, while thinking about what that means, I realized that #6858 might actually be a pessimization in some cases.

The generalization is: before injecting a Param into a subgraph, it's a good idea to "minimize"/"shrink"/"simplify" it as much as possible. Looking at the rule graph, if you have a Param X which is "small"/"stable", and a Param Y that is "large"/"dynamic", what you'd like to do is "simplify" Y into X before injecting it into the identity of a subgraph.

But the reason why #6858 might be a pessimization is that "simpler" is not actually well defined for Params (hence all the scare quotes in here). For example, a common case in #6858 will be that for a Task that Selects a HydratedTarget, we will translate from a Param Address to a Param HydratedTarget to use as an argument to the Task. But we know intuitively that a HydratedTarget represents a "larger"/"more-complex"/"less-stable" Param than Address does, and so this "simplification" might mean that we duplicate Task nodes in cases where we would rather have reused them to allow for dirtying.
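A sketch of why the Address-to-HydratedTarget translation can hurt reuse, using simplified stand-in dataclasses: the real `Address` and `HydratedTarget` types carry much more, and `sources_digest` is a hypothetical field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    spec: str


@dataclass(frozen=True)
class HydratedTarget:
    # Simplified stand-in: the address plus a (hypothetical) digest of its sources.
    address: Address
    sources_digest: str


addr = Address("src/python/lib")
before_edit = HydratedTarget(addr, sources_digest="abc123")
after_edit = HydratedTarget(addr, sources_digest="def456")  # a source file changed

# Identity built from the "small"/stable Param: unchanged by the edit.
key_by_address = ("compile", addr)
print(key_by_address == ("compile", Address("src/python/lib")))  # True

# Identity built from the "large"/dynamic Param: the edit yields a new node.
print(("compile", before_edit) == ("compile", after_edit))  # False
```

Keyed by the `Address`, the same node survives the edit and can be dirtied and re-run in place; keyed by the `HydratedTarget`, the edit produces a brand-new node and the old one is never reused.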

Nonetheless, I think that this might still be good news. If it's possible to define "smaller"/"simpler" for Params (perhaps: fewer inputs?), we should be able to apply graph rewrites during rule graph creation to completely solve the issue in the description (not just for Selects, but also for Gets).
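One possible reading of "fewer inputs", sketched as counting the transitive inputs of each Param type in a hypothetical rule graph. The `COMPUTED_FROM` table and the `SourceContents` type are made up for illustration; the real rule graph is built differently.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, Iterable

# Hypothetical rule-graph fragment: the Param types each type is computed from.
COMPUTED_FROM: Dict[str, FrozenSet[str]] = {
    "Address": frozenset(),
    "SourceContents": frozenset({"Address"}),
    "HydratedTarget": frozenset({"Address", "SourceContents"}),
}


@lru_cache(maxsize=None)
def transitive_inputs(param_type: str) -> FrozenSet[str]:
    # All types this Param (transitively) depends on; assumes the graph is acyclic.
    result = set(COMPUTED_FROM[param_type])
    for dep in COMPUTED_FROM[param_type]:
        result |= transitive_inputs(dep)
    return frozenset(result)


def simplest(candidates: Iterable[str]) -> str:
    # "Smaller" = fewer transitive inputs; ties broken by name for determinism.
    return min(candidates, key=lambda t: (len(transitive_inputs(t)), t))


print(simplest(["HydratedTarget", "Address"]))  # Address
```

A rule-graph rewrite along these lines would inject the `Address` into the subgraph's identity and compute the `HydratedTarget` inside it.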

stuhood · 2018-12-14T20:53:28Z

This doesn't feel like a blocker for M1... going to punt for a little while, but leave it in the project.

…er than a Param (#10827)

### Problem

As described on #10062: any change to option values (including the CLI specs and passthrough args) currently completely invalidates all `@rules` that consume `Subsystem`s, because the "identities" (memoization keys) of the involved `@rules` change. As more heavy lifting has begun to depend on options, this has become more obvious and problematic.

### Solution

As sketched in #10062 (comment), move to providing the `OptionsBootstrapper` (and in future, perhaps much smaller wrappers around the "args" and "env" instead) via a new uncacheable `SessionValues` intrinsic.

More generally, the combination of `Params` for values that _should_ affect the identity of a `@rule`, and `SessionValues` for values that should _not_ affect the identity of a `@rule`, seems to be a sufficient solution to the problem described on #6845. The vast majority of values consumed by `@rule`s should be computed from `Params`, so it's possible that the env/args will be the only values we ever provide via `SessionValues`: TBD.

### Result

The case described in #10062 (comment) no longer invalidates the consumers of `Subsystem`s, and in general, only the `Subsystem`s that are affected by an option change should be invalidated in memory (although: #10834).

Fixes #10062 and fixes #6845.

[ci skip-build-wheels]
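A toy model of the Params/SessionValues split, assuming nothing about the real `SessionValues` API: an uncacheable step reads the raw args every session, and only the small value derived from them participates in the identity of a cacheable `@rule`. `ToyEngine`, `--linter-version` and `--test-timeout` are hypothetical.

```python
from typing import Any, Dict, Tuple


class ToyEngine:
    """Hypothetical model of Params vs. SessionValues; NOT the Pants engine API."""

    def __init__(self) -> None:
        self.memo: Dict[Tuple[Any, ...], str] = {}
        self.runs: Dict[str, int] = {}

    def _count(self, name: str) -> None:
        self.runs[name] = self.runs.get(name, 0) + 1

    def run_session(self, session_values: Dict[str, Dict[str, str]]) -> str:
        # Uncacheable step: re-evaluated every session because it reads the
        # raw args, which are deliberately kept out of any node identity.
        self._count("parse_options")
        linter_version = session_values["args"].get("--linter-version", "1.0")

        # Cacheable step: its identity is the rule name plus the small derived
        # value, so unrelated flags (e.g. --test-timeout) cannot change it.
        key = ("lint", linter_version)
        if key not in self.memo:
            self._count("lint")
            self.memo[key] = f"linted with {linter_version}"
        return self.memo[key]


engine = ToyEngine()
engine.run_session({"args": {"--linter-version": "1.0", "--test-timeout": "30"}})
engine.run_session({"args": {"--linter-version": "1.0", "--test-timeout": "60"}})
engine.run_session({"args": {"--linter-version": "2.0", "--test-timeout": "60"}})
print(engine.runs)  # {'parse_options': 3, 'lint': 2}
```

Only the cheap, uncacheable parsing step reruns on every change; the `lint` node is invalidated only when the value it actually consumes changes.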

stuhood added the engine label Nov 30, 2018

stuhood added the P3 - M1 label Nov 30, 2018

stuhood mentioned this issue Dec 3, 2018

Task node identity uses computed arguments #6858

Closed

stuhood self-assigned this Dec 3, 2018

stuhood mentioned this issue Dec 3, 2018

Implement rules that directly produce Subsystems from options #5869

Closed

stuhood removed their assignment Dec 3, 2018

stuhood removed the P3 - M1 label Dec 14, 2018

stuhood mentioned this issue Mar 5, 2019

Support options consumption during legacy BuildGraph hydration #7316

Closed

stuhood mentioned this issue Apr 23, 2019

Update engine README for Params #7600

Merged

stuhood mentioned this issue Sep 12, 2020

Performance tuning for inference during test iteration #10062

Closed

stuhood mentioned this issue Sep 22, 2020

OptionsBootstrapper is provided via a new SessionValues facility rather than a Param #10827

Merged

stuhood closed this as completed in #10827 Sep 22, 2020

stuhood mentioned this issue Jun 17, 2021

Re-enable concurrent runs for pantsd in v2 #7654

Open
