Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance tuning for inference during test iteration #10062

Closed
benjyw opened this issue Jun 16, 2020 · 13 comments · Fixed by #10815 or #10827
Closed

Performance tuning for inference during test iteration #10062

benjyw opened this issue Jun 16, 2020 · 13 comments · Fixed by #10815 or #10827
Assignees
Milestone

Comments

@benjyw
Copy link
Contributor

benjyw commented Jun 16, 2020

Specifically, we want to optimize the:

  • time-to-test-rerun
@stuhood

This comment has been minimized.

@benjyw benjyw added this to the 2.0.x milestone Jul 16, 2020
@benjyw benjyw self-assigned this Jul 16, 2020
@stuhood stuhood changed the title Performance tuning Performance tuning for inference during test iteration Sep 9, 2020
@stuhood
Copy link
Member

stuhood commented Sep 9, 2020

I've highjacked this one to reference a particular case that is important: switching between test targets triggers more dep inference computation than I would expect.

For example: running:

./pants -ldebug test src/python/pants/util/filtering_test.py

and then:

./pants -ldebug test src/python/pants/util/eval_test.py

still causes the following rules to run the second time:

...
12:57:30.33 [DEBUG] Completed: Creating map of third party targets to Python modules
12:57:33.55 [DEBUG] Completed: Creating map of first party targets to Python modules
...

Looking at the rule graph for these (generated with ./pants --native-engine-visualize-to=$dir test), it looks like this is likely because those rules transitively depend on the OptionsBootstrapper, and the change in CLI args means that the previous value of the node cannot be used.

So this potentially relates to #10360, in that we might want to move more of options parsing into @rules to prevent this kind of invalidation. The OptionsBootstrapper is way too large a Param to be passing in from the top of the graph, but there is probably also a smaller fix available where we shrink the OptionsBootstrapper to just what is used to create Subsystems (which would basically just mean removing the Specs, I think).

@stuhood
Copy link
Member

stuhood commented Sep 12, 2020

Relates to #6845 (comment): the OptionsBootstrapper is a very large Param: making it smaller is good, but making the env/args an uncacheable node and moving all of options parsing into the engine via #10360 might be better. Depends what the overhead of uncacheable nodes is in practice.

@stuhood stuhood modified the milestones: 2.0.x, 2.0.0rc0 Sep 15, 2020
@stuhood
Copy link
Member

stuhood commented Sep 15, 2020

Bumped this one up to the RC: will aim to only tackle the quick fix here, unless someone thinks we should try to tackle both #10360 and this one together.

@benjyw
Copy link
Contributor Author

benjyw commented Sep 15, 2020

I think #10360 can wait for now. Let's get the quick fix in and see how much that fixes.

@stuhood stuhood assigned stuhood and unassigned benjyw Sep 15, 2020
@stuhood
Copy link
Member

stuhood commented Sep 17, 2020

Went on a tangent to write #10793 and open #10798.

I'm unsure how much of a performance issue #10798 is in practice, so I'm going to experiment with removing OptionsBootstrapper from the QueryRule we install for Goals, and computing it with an uncacheable @rule instead. If #10798 isn't a problem in practice, then computing the OB with an uncacheable @rule will be a great incremental step toward #10360, without the need to actually port as much code to rules yet.

@stuhood
Copy link
Member

stuhood commented Sep 18, 2020

I have a prototype of this working, with two or three small patches to clean up and land. Should be able to finish by EOD Monday.

stuhood added a commit that referenced this issue Sep 21, 2020
### Problem

In order to resolve #10062, we're planning to remove the `OptionsBootstrapper` `Param`, and instead compute it using an uncacheable `@rule`. Doing so performantly relies on successfully "cleaning" (see #10798) all of the nodes above options parsing in cases where the output has not changed, but experimentation and inspection of the types involved in [construct_scope_*](https://github.com/pantsbuild/pants/blob/dadbcc5b7cb08493ff8f5b0c7f88e896e20a7d7e/src/python/pants/option/optionable.py#L40-L64) and [scope_options](https://github.com/pantsbuild/pants/blob/dadbcc5b7cb08493ff8f5b0c7f88e896e20a7d7e/src/python/pants/engine/internals/options_parsing.py#L23-L37) showed that there were missing equals implementations that prevented cleaning.

### Solution

Give `Subsystem` and `OptionsValueContainer` structural implementations of `__eq__`. Because they will only be used as return values / positional args (ie, the rust side `Value` struct), they don't need `__hash__`.

Additionally, remove `--parent-build-id`, which is not currently consumed for metrics, and which causes lots of invalidation, because every run ends up with new options values. If this facility needs to be revived, it's likely that it should be set only in the context of the `InteractiveProcess` intrinsic, rather than globally... and perhaps not using the options system at all.
 
### Result

Together with #10814 and another PR to follow, `Subsystem` nodes above an uncacheable `@rule` that computes the `OptionsBootstrapper` can be successfully cleaned, which causes early cutoff that allows the rest of the graph to be cleaned with a minimum number of nodes re-running.
@stuhood stuhood reopened this Sep 21, 2020
@stuhood
Copy link
Member

stuhood commented Sep 21, 2020

The remaining work for this is moving the OptionsBootstrapper from being a Param to being per-session information.

At a semantic level, we want to make the args and env into Session-specific information, such that rather than affecting the identity of Nodes in the graph, they instead only affect their values. I prototyped that via #10827 by providing the args/env via an uncacheable singleton (before working on the prerequisite PRs: #10815, #10814), and there is a path forward to allowing for concurrent sessions by storing per-session values in Uncacheable(Dependencies). But there are some design questions to be answered for the API.

From an end-user API perspective (e.g.: SchedulerSession.request), we need to decide how this information should be provided. My feeling is that although we could continue to accept a "per-session" value like this as a magical Param type that doesn't affect the identity of nodes (maybe by stripping it out of the SchedulerSession.request args and setting it in a Session-specific spot), that might complicate the explanation of Params in general. It's important that Params are easy to explain, because they are the bread and butter of how nodes are identified in the graph, and AFAICT the "per-session" feature described here will be much more obscure, and more of a footnote of the API. We've made it this far without having such a thing, so we're unlikely to be swimming in usecases for it.

So I'm thinking that instead of a magical Param, this might be better as a new (and maybe private, for now) @rule type. Strawdesign:

# installed for a particular type
def PerSessionValueRule(typ: Type) -> Rule: ..

# set in the session constructor, and available to be used in `requests` made using the session
session = scheduler.new_session(.., OptionsBootstrapper.create(..))
# may consume the per-session type without it affecting the identity of the request
session.request(MyGoal, Specs(..))

A benefit to this sketch is that it helps to differentiate the identity of your request (the Params you use) from information that will not be part of the identity. This moves the "args" and "env" closer to being environmental inputs similar to the filesystem. It remains (relatively) easy to explain that: "a node is identified by its definition and the Params that it consumes", rather than certain types of Params being magical.

A downside is that this might complicate usage of the SchedulerSession.request and RuleRunner.request APIs when compared to using some magic to decide which Params are per-session (but this is fuzzier, because you need to be able to explain the magic).

@Eric-Arellano
Copy link
Contributor

This moves the "args" and "env" closer to being environmental inputs similar to the filesystem

I like this modeling.

A downside is that this might complicate usage of the SchedulerSession.request and RuleRunner.request APIs when compared to using some magic to decide which Params are per-session

Right now, you can change the args dynamically in a test for the exact same RuleRunner. We use this a lot to test the behavior of something with the flag on vs. the flag off.

Do you expect that instead the options would be similar to a "bootstrap option", that you have to create a new RuleRunner every time? I'm trying to understand how both approaches impact tests.

@stuhood
Copy link
Member

stuhood commented Sep 21, 2020

Do you expect that instead the options would be similar to a "bootstrap option", that you have to create a new RuleRunner every time? I'm trying to understand how both approaches impact tests.

You'd need a new SchedulerSession, or you'd need to mutate your current one. If we followed the builder pattern for both of them, that might look like:

rule_runner = rule_runner.with_session_value(OptionsBootstrapper(..))

...which would return a copy of the RuleRunner+SchedulerSession.

@Eric-Arellano
Copy link
Contributor

That's what I was thinking with a classmethod like that.

But would you then have to recreate all your test setup, like rule_runner.create_file()? I think the build root would change, which is a bummer.

I wonder if we could do something like:

rule_runner.with_session_value(OptionsBootstrapper(..), preserve_files=True)

@stuhood
Copy link
Member

stuhood commented Sep 21, 2020

But would you then have to recreate all your test setup, like rule_runner.create_file()? I think the build root would change, which is a bummer.

Just depends on what you want to do with the API. I would not expect rule_runner.with_session_value(..) to be equivalent to constructing a new RuleRunner from scratch (that's what the constructor is for), so it would feel reasonable to me for the buildroot to stick around.

It's also possible that because some of the other methods of RuleRunner are (necessarily) side-effecting (create_file), that RuleRunner should use mutability, even though the SchedulerSession does not. That might look like:

class RuleRunner:
    def set_session_value(value: ..) -> None:
        self.session = self.session.with_session_value(value)

@Eric-Arellano
Copy link
Contributor

Cool. If we could keep the buildroot around, then that sounds like an acceptable API to me. In many ways, more explicit with how to pass args, rather than "sometimes you can include this in your rule_runner.request() but sometimes you can't".

stuhood added a commit that referenced this issue Sep 22, 2020
…er than a Param (#10827)

### Problem

As described on #10062: any change to options values (including the CLI specs and passthrough args) currently completely invalidate all `@rules` that consume `Subsystem`s, because the "identities" (memoization keys) of the involved `@rules` change. As more heavy lifting has begun to depend on options, this has become more obvious and problematic.

### Solution

As sketched in #10062 (comment), move to providing the `OptionsBootstrapper` (and in future, perhaps much smaller wrappers around the "args" and "env" instead) via a new uncacheable `SessionValues` intrinsic.

More generally, the combination of `Params` for values that _should_ affect the identity of a `@rule`, and `SessionValues` for values that should _not_ affect the identity of a `@rule` seems to be a sufficient solution to the problem described on #6845. The vast majority of values consumed by `@rule`s should be computed from `Params`, so it's possible that the env/args will be the only values we ever provide via `SessionValues`: TBD.

### Result

The case described in #10062 (comment) no longer invalidates the consumers of `Subsystem`s, and in general, only the `Subsystem`s that are affected by an option change should be invalidated in memory (although: #10834).

Fixes #10062 and fixes #6845.

[ci skip-build-wheels]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants