Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dependees of Uncacheable nodes should be cleaned once per Session #10798

Closed
stuhood opened this issue Sep 17, 2020 · 0 comments · Fixed by #10814
Closed

Dependees of Uncacheable nodes should be cleaned once per Session #10798

stuhood opened this issue Sep 17, 2020 · 0 comments · Fixed by #10814

Comments

@stuhood
Copy link
Member

stuhood commented Sep 17, 2020

Currently, the UncacheableDependencies state is the same as the Dirty state, with two differences:

  1. In poll calls, we don't re-run UncacheableDependencies nodes unless they are marked Dirty.
  2. Dirty nodes go to the Clean state if their dependencies have not changed, but an UncacheableDependencies node goes back to UncacheableDependencies after being successfully cleaned.

The latter point is a potential silent performance issue. Because an UncacheableDependencies node does not go to the Clean state for the current session, any non-concurrent request within the same session for an UncacheableDependencies node will cause it to be re-cleaned (recursively down to the Uncacheable node that is somewhere below it). While cleaning nodes is fairly cheap (and the Uncacheable node will correctly one run once per session), it's possible that we will repeatedly re-clean an UncacheableDependencies node within one session... in the worst case this could involve n! node-cleaning-visits for a path of length n between a root and an Uncacheable node, although that should be almost impossible in practice due to the likely shape of the graph.

To fix this, we'll likely want to incorporate the SessionId into the UncacheableDependencies state (or allow a node to be marked "Clean but only for a particular session"), similar to what we do for Uncacheable. This would allow the UncacheableDependencies node to run once per session like Uncacheable does.

(as a medium term tangent: it's likely that the SessionId that we have on nodes should act more like a MVCC timestamp, which would allow for concurrent runs of a node for different sessions to have different values.)

stuhood added a commit that referenced this issue Sep 21, 2020
### Problem

As described in #10798, nodes that depend on `Uncacheable` nodes (those in the `UncacheableDependencies` state) might currently run multiple times per session, even though the `Uncacheable` nodes themselves will only run once. Although cleaning is fairly cheap, the worst case for how much total cleaning might hypothetically be done is quite large. 

### Solution

Similar to `Uncacheable` nodes, give `UncacheableDependencies` nodes a `RunId` that they are clean for, and clean them once per session by updating it. Add stats to allow for testing that nodes were cleaned, and confirm that re-running within a single session does not clean any nodes.

### Result

Fixes #10798.

[ci skip-build-wheels]
stuhood added a commit that referenced this issue Sep 21, 2020
### Problem

In order to resolve #10062, we're planning to remove the `OptionsBootstrapper` `Param`, and instead compute it using an uncacheable `@rule`. Doing so performantly relies on successfully "cleaning" (see #10798) all of the nodes above options parsing in cases where the output has not changed, but experimentation and inspection of the types involved in [construct_scope_*](https://github.com/pantsbuild/pants/blob/dadbcc5b7cb08493ff8f5b0c7f88e896e20a7d7e/src/python/pants/option/optionable.py#L40-L64) and [scope_options](https://github.com/pantsbuild/pants/blob/dadbcc5b7cb08493ff8f5b0c7f88e896e20a7d7e/src/python/pants/engine/internals/options_parsing.py#L23-L37) showed that there were missing equals implementations that prevented cleaning.

### Solution

Give `Subsystem` and `OptionsValueContainer` structural implementations of `__eq__`. Because they will only be used as return values / positional args (ie, the rust side `Value` struct), they don't need `__hash__`.

Additionally, remove `--parent-build-id`, which is not currently consumed for metrics, and which causes lots of invalidation, because every run ends up with new options values. If this facility needs to be revived, it's likely that it should be set only in the context of the `InteractiveProcess` intrinsic, rather than globally... and perhaps not using the options system at all.
 
### Result

Together with #10814 and another PR to follow, `Subsystem` nodes above an uncacheable `@rule` that computes the `OptionsBootstrapper` can be successfully cleaned, which causes early cutoff that allows the rest of the graph to be cleaned with a minimum number of nodes re-running.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant