Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimize Param usage in Task identities #6845

Closed
stuhood opened this issue Nov 30, 2018 · 4 comments · Fixed by #10827
Closed

Minimize Param usage in Task identities #6845

stuhood opened this issue Nov 30, 2018 · 4 comments · Fixed by #10827

Comments

@stuhood
Copy link
Member

stuhood commented Nov 30, 2018

While working on #5869 and thinking about #6478, it became clearer (although it had been in the back of my mind since #5788 (comment)) that we will need to be very careful where Params are used for options.

In particular: even when they are partitioned into scopes/Subsystems, option values used in node identities via Params represent a very big hammer for invalidation: because two nodes with different parameters are ... different nodes, node dirtying can have no effect on them. So toggling any option value that is consumed in a subgraph would invalidate that subgraph. This got me to thinking about how we could consume and filter options to make them smaller (if possible) before using them in Task identities.

One idea that I had (and implemented) yesterday, was to move the requesting of the "arguments" to a Task function out of impl Task, and into impl Select, and then using the arg values in the identity of the requested Task node, rather than the params used to compute them. This means that the identity of a Task node becomes 1) the function name, 2) the actual arg values that will be used to run it, 3) the Param values that are necessary to satisfy its Get requests.

In theory, this allows more reuse/dirtying of Task nodes, because if an argument to a Task is computed from a "large Param" like a set of options, this allows it to be filtered and scoped before it is included in the node identity. But this optimization ([0] in theory: more on this later) does not extend to Gets. The reason is that once we've begun running a Task, we definitely want the running of that Task memoized: as an extreme example, we don't want to start running it in and only memoize it once it has completed (that would mean that we would have the absolute minimum number of nodes in the graph, but any concurrent attempts to request a node would try to run in parallel until they were ready to store their values).

So: I think that we should land the change to move arg computation from Task to Select, because "if it is used carefully", it will mean that more nodes are reused. But what does "careful usage" look like? It looks like ensuring that as much as possible arrives as an argument to a Task, rather than later being requested as a Get. An open topic is: how do we appropriately encourage usage of args rather than Gets?

[0] The change is actually slightly slower because it needs to make more Keys out of Values for use in identities. But not prohibitively so I don't think.

@stuhood
Copy link
Member Author

stuhood commented Dec 3, 2018

Posted a draft of "Task node identity uses computed arguments" as #6858. Will be working on other things this week.

@stuhood stuhood self-assigned this Dec 3, 2018
@stuhood
Copy link
Member Author

stuhood commented Dec 3, 2018

Another open question: would doing more work in Select, and less in Task justify beginning to memoize Select again? Should benchmark.

@stuhood
Copy link
Member Author

stuhood commented Dec 3, 2018

I just realized that #6858 actually generalizes to the entire rule graph! Unfortunately, while thinking about what that means, I realized that #6858 might actually be a pessimization in some cases.

The generalization is: before injecting a Param into a subgraph, it's a good idea to "minimize"/"shrink"/"simplify" it as much as possible. Looking at the rule graph, if you have a Param X which is "small"/"stable", and a Param Y that is "large"/"dynamic", what you'd like to do is "simplify" Y into X before injecting it into the identity of a subgraph.

But the reason why #6858 might be a pessimization is that "simpler" is not actually well defined for Params (hence all the scare quotes in here). For example, a common case in #6858 will be that for a Task that Selects a HydratedTarget, we will translate from a Param Address to a Param HydratedTarget to use as an argument to the Task. But we know intuitively that a HydratedTarget represents a "larger"/"more-complex"/"less-stable" Param than Address does, and so this "simplification" might mean that we duplicate Task nodes in cases where we would rather have reused them to allow for dirtying.


Nonetheless, I think that this might still be good news. If it's possible to define "smaller"/"simpler" for Params (perhaps: fewer inputs?), we should be able to apply graph rewrites during rule graph creation to completely solve the issue in the description (not just for Selects, but also for Gets).

@stuhood stuhood removed their assignment Dec 3, 2018
@stuhood stuhood removed the P3 - M1 label Dec 14, 2018
@stuhood
Copy link
Member Author

stuhood commented Dec 14, 2018

This doesn't feel like a blocker for M1... going to punt for a little while, but leave it in the project.

stuhood added a commit that referenced this issue Sep 22, 2020
…er than a Param (#10827)

### Problem

As described on #10062: any change to options values (including the CLI specs and passthrough args) currently completely invalidate all `@rules` that consume `Subsystem`s, because the "identities" (memoization keys) of the involved `@rules` change. As more heavy lifting has begun to depend on options, this has become more obvious and problematic.

### Solution

As sketched in #10062 (comment), move to providing the `OptionsBootstrapper` (and in future, perhaps much smaller wrappers around the "args" and "env" instead) via a new uncacheable `SessionValues` intrinsic.

More generally, the combination of `Params` for values that _should_ affect the identity of a `@rule`, and `SessionValues` for values that should _not_ affect the identity of a `@rule` seems to be a sufficient solution to the problem described on #6845. The vast majority of values consumed by `@rule`s should be computed from `Params`, so it's possible that the env/args will be the only values we ever provide via `SessionValues`: TBD.

### Result

The case described in #10062 (comment) no longer invalidates the consumers of `Subsystem`s, and in general, only the `Subsystem`s that are affected by an option change should be invalidated in memory (although: #10834).

Fixes #10062 and fixes #6845.

[ci skip-build-wheels]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant