Performance tuning for inference during test iteration #10062

benjyw · 2020-06-16T01:33:50Z

Specifically, we want to optimize the:

time-to-test-rerun

The text was updated successfully, but these errors were encountered:

stuhood · 2020-09-09T20:04:42Z

I've highjacked this one to reference a particular case that is important: switching between test targets triggers more dep inference computation than I would expect.

For example: running:

./pants -ldebug test src/python/pants/util/filtering_test.py

and then:

./pants -ldebug test src/python/pants/util/eval_test.py

still causes the following rules to run the second time:

...
12:57:30.33 [DEBUG] Completed: Creating map of third party targets to Python modules
12:57:33.55 [DEBUG] Completed: Creating map of first party targets to Python modules
...

Looking at the rule graph for these (generated with ./pants --native-engine-visualize-to=$dir test), it looks like this is likely because those rules transitively depend on the OptionsBootstrapper, and the change in CLI args means that the previous value of the node cannot be used.

So this potentially relates to #10360, in that we might want to move more of options parsing into @rules to prevent this kind of invalidation. The OptionsBootstrapper is way too large a Param to be passing in from the top of the graph, but there is probably also a smaller fix available where we shrink the OptionsBootstrapper to just what is used to create Subsystems (which would basically just mean removing the Specs, I think).

stuhood · 2020-09-12T00:41:34Z

Relates to #6845 (comment): the OptionsBootstrapper is a very large Param: making it smaller is good, but making the env/args an uncacheable node and moving all of options parsing into the engine via #10360 might be better. Depends what the overhead of uncacheable nodes is in practice.

stuhood · 2020-09-15T17:33:04Z

Bumped this one up to the RC: will aim to only tackle the quick fix here, unless someone thinks we should try to tackle both #10360 and this one together.

benjyw · 2020-09-15T19:25:00Z

I think #10360 can wait for now. Let's get the quick fix in and see how much that fixes.

stuhood · 2020-09-17T00:43:38Z

Went on a tangent to write #10793 and open #10798.

I'm unsure how much of a performance issue #10798 is in practice, so I'm going to experiment with removing OptionsBootstrapper from the QueryRule we install for Goals, and computing it with an uncacheable @rule instead. If #10798 isn't a problem in practice, then computing the OB with an uncacheable @rule will be a great incremental step toward #10360, without the need to actually port as much code to rules yet.

stuhood · 2020-09-18T21:36:34Z

I have a prototype of this working, with two or three small patches to clean up and land. Should be able to finish by EOD Monday.

### Problem In order to resolve #10062, we're planning to remove the `OptionsBootstrapper` `Param`, and instead compute it using an uncacheable `@rule`. Doing so performantly relies on successfully "cleaning" (see #10798) all of the nodes above options parsing in cases where the output has not changed, but experimentation and inspection of the types involved in [construct_scope_*](https://github.com/pantsbuild/pants/blob/dadbcc5b7cb08493ff8f5b0c7f88e896e20a7d7e/src/python/pants/option/optionable.py#L40-L64) and [scope_options](https://github.com/pantsbuild/pants/blob/dadbcc5b7cb08493ff8f5b0c7f88e896e20a7d7e/src/python/pants/engine/internals/options_parsing.py#L23-L37) showed that there were missing equals implementations that prevented cleaning. ### Solution Give `Subsystem` and `OptionsValueContainer` structural implementations of `__eq__`. Because they will only be used as return values / positional args (ie, the rust side `Value` struct), they don't need `__hash__`. Additionally, remove `--parent-build-id`, which is not currently consumed for metrics, and which causes lots of invalidation, because every run ends up with new options values. If this facility needs to be revived, it's likely that it should be set only in the context of the `InteractiveProcess` intrinsic, rather than globally... and perhaps not using the options system at all. ### Result Together with #10814 and another PR to follow, `Subsystem` nodes above an uncacheable `@rule` that computes the `OptionsBootstrapper` can be successfully cleaned, which causes early cutoff that allows the rest of the graph to be cleaned with a minimum number of nodes re-running.

stuhood · 2020-09-21T21:32:54Z

The remaining work for this is moving the OptionsBootstrapper from being a Param to being per-session information.

At a semantic level, we want to make the args and env into Session-specific information, such that rather than affecting the identity of Nodes in the graph, they instead only affect their values. I prototyped that via #10827 by providing the args/env via an uncacheable singleton (before working on the prerequisite PRs: #10815, #10814), and there is a path forward to allowing for concurrent sessions by storing per-session values in Uncacheable(Dependencies). But there are some design questions to be answered for the API.

From an end-user API perspective (e.g.: SchedulerSession.request), we need to decide how this information should be provided. My feeling is that although we could continue to accept a "per-session" value like this as a magical Param type that doesn't affect the identity of nodes (maybe by stripping it out of the SchedulerSession.request args and setting it in a Session-specific spot), that might complicate the explanation of Params in general. It's important that Params are easy to explain, because they are the bread and butter of how nodes are identified in the graph, and AFAICT the "per-session" feature described here will be much more obscure, and more of a footnote of the API. We've made it this far without having such a thing, so we're unlikely to be swimming in usecases for it.

So I'm thinking that instead of a magical Param, this might be better as a new (and maybe private, for now) @rule type. Strawdesign:

# installed for a particular type
def PerSessionValueRule(typ: Type) -> Rule: ..

# set in the session constructor, and available to be used in `requests` made using the session
session = scheduler.new_session(.., OptionsBootstrapper.create(..))
# may consume the per-session type without it affecting the identity of the request
session.request(MyGoal, Specs(..))

A benefit to this sketch is that it helps to differentiate the identity of your request (the Params you use) from information that will not be part of the identity. This moves the "args" and "env" closer to being environmental inputs similar to the filesystem. It remains (relatively) easy to explain that: "a node is identified by its definition and the Params that it consumes", rather than certain types of Params being magical.

A downside is that this might complicate usage of the SchedulerSession.request and RuleRunner.request APIs when compared to using some magic to decide which Params are per-session (but this is fuzzier, because you need to be able to explain the magic).

Eric-Arellano · 2020-09-21T21:40:31Z

This moves the "args" and "env" closer to being environmental inputs similar to the filesystem

I like this modeling.

A downside is that this might complicate usage of the SchedulerSession.request and RuleRunner.request APIs when compared to using some magic to decide which Params are per-session

Right now, you can change the args dynamically in a test for the exact same RuleRunner. We use this a lot to test the behavior of something with the flag on vs. the flag off.

Do you expect that instead the options would be similar to a "bootstrap option", that you have to create a new RuleRunner every time? I'm trying to understand how both approaches impact tests.

stuhood · 2020-09-21T21:44:37Z

Do you expect that instead the options would be similar to a "bootstrap option", that you have to create a new RuleRunner every time? I'm trying to understand how both approaches impact tests.

You'd need a new SchedulerSession, or you'd need to mutate your current one. If we followed the builder pattern for both of them, that might look like:

rule_runner = rule_runner.with_session_value(OptionsBootstrapper(..))

...which would return a copy of the RuleRunner+SchedulerSession.

Eric-Arellano · 2020-09-21T21:46:29Z

That's what I was thinking with a classmethod like that.

But would you then have to recreate all your test setup, like rule_runner.create_file()? I think the build root would change, which is a bummer.

I wonder if we could do something like:

rule_runner.with_session_value(OptionsBootstrapper(..), preserve_files=True)

stuhood · 2020-09-21T21:54:06Z

But would you then have to recreate all your test setup, like rule_runner.create_file()? I think the build root would change, which is a bummer.

Just depends on what you want to do with the API. I would not expect rule_runner.with_session_value(..) to be equivalent to constructing a new RuleRunner from scratch (that's what the constructor is for), so it would feel reasonable to me for the buildroot to stick around.

It's also possible that because some of the other methods of RuleRunner are (necessarily) side-effecting (create_file), that RuleRunner should use mutability, even though the SchedulerSession does not. That might look like:

class RuleRunner:
    def set_session_value(value: ..) -> None:
        self.session = self.session.with_session_value(value)

Eric-Arellano · 2020-09-21T21:55:45Z

Cool. If we could keep the buildroot around, then that sounds like an acceptable API to me. In many ways, more explicit with how to pass args, rather than "sometimes you can include this in your rule_runner.request() but sometimes you can't".

…er than a Param (#10827) ### Problem As described on #10062: any change to options values (including the CLI specs and passthrough args) currently completely invalidate all `@rules` that consume `Subsystem`s, because the "identities" (memoization keys) of the involved `@rules` change. As more heavy lifting has begun to depend on options, this has become more obvious and problematic. ### Solution As sketched in #10062 (comment), move to providing the `OptionsBootstrapper` (and in future, perhaps much smaller wrappers around the "args" and "env" instead) via a new uncacheable `SessionValues` intrinsic. More generally, the combination of `Params` for values that _should_ affect the identity of a `@rule`, and `SessionValues` for values that should _not_ affect the identity of a `@rule` seems to be a sufficient solution to the problem described on #6845. The vast majority of values consumed by `@rule`s should be computed from `Params`, so it's possible that the env/args will be the only values we ever provide via `SessionValues`: TBD. ### Result The case described in #10062 (comment) no longer invalidates the consumers of `Subsystem`s, and in general, only the `Subsystem`s that are affected by an option change should be invalidated in memory (although: #10834). Fixes #10062 and fixes #6845. [ci skip-build-wheels]

This comment has been minimized.

Sign in to view

benjyw assigned gshuflin Jul 7, 2020

Eric-Arellano added the performance label Jul 16, 2020

benjyw added this to the 2.0.x milestone Jul 16, 2020

stuhood unassigned gshuflin Jul 16, 2020

benjyw self-assigned this Jul 16, 2020

stuhood changed the title ~~Performance tuning~~ Performance tuning for inference during test iteration Sep 9, 2020

stuhood modified the milestones: 2.0.x, 2.0.0rc0 Sep 15, 2020

stuhood assigned stuhood and unassigned benjyw Sep 15, 2020

This was referenced Sep 19, 2020

Useful equality for relevant options types #10815

Merged

Clean nodes with uncacheable dependencies once per session #10814

Merged

stuhood closed this as completed in #10815 Sep 21, 2020

stuhood reopened this Sep 21, 2020

stuhood mentioned this issue Sep 22, 2020

OptionsBootstrapper is provided via a new SessionValues facility rather than a Param #10827

Merged

stuhood closed this as completed in #10827 Sep 22, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance tuning for inference during test iteration #10062

Performance tuning for inference during test iteration #10062

benjyw commented Jun 16, 2020 •

edited by stuhood

Loading

This comment has been minimized.

stuhood commented Sep 9, 2020 •

edited

Loading

stuhood commented Sep 12, 2020 •

edited

Loading

stuhood commented Sep 15, 2020

benjyw commented Sep 15, 2020

stuhood commented Sep 17, 2020 •

edited

Loading

stuhood commented Sep 18, 2020

stuhood commented Sep 21, 2020 •

edited

Loading

Eric-Arellano commented Sep 21, 2020

stuhood commented Sep 21, 2020 •

edited

Loading

Eric-Arellano commented Sep 21, 2020

stuhood commented Sep 21, 2020 •

edited

Loading

Eric-Arellano commented Sep 21, 2020

Performance tuning for inference during test iteration #10062

Performance tuning for inference during test iteration #10062

Comments

benjyw commented Jun 16, 2020 • edited by stuhood Loading

This comment has been minimized.

stuhood commented Sep 9, 2020 • edited Loading

stuhood commented Sep 12, 2020 • edited Loading

stuhood commented Sep 15, 2020

benjyw commented Sep 15, 2020

stuhood commented Sep 17, 2020 • edited Loading

stuhood commented Sep 18, 2020

stuhood commented Sep 21, 2020 • edited Loading

Eric-Arellano commented Sep 21, 2020

stuhood commented Sep 21, 2020 • edited Loading

Eric-Arellano commented Sep 21, 2020

stuhood commented Sep 21, 2020 • edited Loading

Eric-Arellano commented Sep 21, 2020

benjyw commented Jun 16, 2020 •

edited by stuhood

Loading

stuhood commented Sep 9, 2020 •

edited

Loading

stuhood commented Sep 12, 2020 •

edited

Loading

stuhood commented Sep 17, 2020 •

edited

Loading

stuhood commented Sep 21, 2020 •

edited

Loading

stuhood commented Sep 21, 2020 •

edited

Loading

stuhood commented Sep 21, 2020 •

edited

Loading