-
-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minimize Param usage in Task identities #6845
Comments
Posted a draft of "Task node identity uses computed arguments" as #6858. Will be working on other things this week. |
Another open question: would doing more work in |
I just realized that #6858 actually generalizes to the entire rule graph! Unfortunately, while thinking about what that means, I realized that #6858 might actually be a pessimization in some cases. The generalization is: before injecting a But the reason why #6858 might be a pessimization is that "simpler" is not actually well defined for Nonetheless, I think that this might still be good news. If it's possible to define "smaller"/"simpler" for |
This doesn't feel like a blocker for M1... going to punt for a little while, but leave it in the project. |
…er than a Param (#10827) ### Problem As described on #10062: any change to options values (including the CLI specs and passthrough args) currently completely invalidate all `@rules` that consume `Subsystem`s, because the "identities" (memoization keys) of the involved `@rules` change. As more heavy lifting has begun to depend on options, this has become more obvious and problematic. ### Solution As sketched in #10062 (comment), move to providing the `OptionsBootstrapper` (and in future, perhaps much smaller wrappers around the "args" and "env" instead) via a new uncacheable `SessionValues` intrinsic. More generally, the combination of `Params` for values that _should_ affect the identity of a `@rule`, and `SessionValues` for values that should _not_ affect the identity of a `@rule` seems to be a sufficient solution to the problem described on #6845. The vast majority of values consumed by `@rule`s should be computed from `Params`, so it's possible that the env/args will be the only values we ever provide via `SessionValues`: TBD. ### Result The case described in #10062 (comment) no longer invalidates the consumers of `Subsystem`s, and in general, only the `Subsystem`s that are affected by an option change should be invalidated in memory (although: #10834). Fixes #10062 and fixes #6845. [ci skip-build-wheels]
While working on #5869 and thinking about #6478, it became clearer (although it had been in the back of my mind since #5788 (comment)) that we will need to be very careful where
Params
are used for options.In particular: even when they are partitioned into scopes/
Subsystems
, option values used in node identities viaParams
represent a very big hammer for invalidation: because two nodes with different parameters are ... different nodes, node dirtying can have no effect on them. So toggling any option value that is consumed in a subgraph would invalidate that subgraph. This got me to thinking about how we could consume and filter options to make them smaller (if possible) before using them in Task identities.One idea that I had (and implemented) yesterday, was to move the requesting of the "arguments" to a Task function out of
impl Task
, and intoimpl Select
, and then using the arg values in the identity of the requestedTask
node, rather than the params used to compute them. This means that the identity of aTask
node becomes 1) the function name, 2) the actual arg values that will be used to run it, 3) theParam
values that are necessary to satisfy itsGet
requests.In theory, this allows more reuse/dirtying of Task nodes, because if an argument to a
Task
is computed from a "largeParam
" like a set of options, this allows it to be filtered and scoped before it is included in the node identity. But this optimization ([0] in theory: more on this later) does not extend toGets
. The reason is that once we've begun running aTask
, we definitely want the running of thatTask
memoized: as an extreme example, we don't want to start running it in and only memoize it once it has completed (that would mean that we would have the absolute minimum number of nodes in the graph, but any concurrent attempts to request a node would try to run in parallel until they were ready to store their values).So: I think that we should land the change to move arg computation from
Task
toSelect
, because "if it is used carefully", it will mean that more nodes are reused. But what does "careful usage" look like? It looks like ensuring that as much as possible arrives as an argument to aTask
, rather than later being requested as aGet
. An open topic is: how do we appropriately encourage usage of args rather thanGet
s?[0] The change is actually slightly slower because it needs to make more
Keys
out ofValues
for use in identities. But not prohibitively so I don't think.The text was updated successfully, but these errors were encountered: