Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow specifying multiple Params to a Get() #7490

Closed
cosmicexplorer opened this issue Apr 3, 2019 · 8 comments · Fixed by #16668
Closed

allow specifying multiple Params to a Get() #7490

cosmicexplorer opened this issue Apr 3, 2019 · 8 comments · Fixed by #16668
Assignees

Comments

@cosmicexplorer
Copy link
Contributor

cosmicexplorer commented Apr 3, 2019

Note: #12946 is prework for this change.


Currently, we can depend on any number of params in the rule graph in an @rule declaration, which may be specified either via Params when invoking scheduler.product_request(), via "singleton" rules provided at scheduler construction, or as outputs of other rules (see src/python/pants/engine/README.md). However, when we want to invoke a ruleset inside the body of another rule, we can only use rulesets requiring exactly one param due to the current limitations of the await Get(...) syntax. We would like to do:

@rule
def _d() -> D:
  return D()

@rule
def _e(c: C, d: D) -> E:
  return E(c, d)

@rule
def _c(b: B) -> C:
  return C(b)

@rule
def _f(c: C, d: D) -> F:
  return F(c, d)

@rule
async def _a(b: B) -> A:
  c = await Get(C, B, b) # we can do this now
  d = await Get(D) # (if D requires no params) -- this won't work right now
  e = await Get(Em Params((C, c), (D, d))) # this is a proposed syntax
  f = await Get(F, Params(C(...), D(...))) # proposed syntax for inline construction
  # ...

This would likely require editing pants.engine.selectors.Get to allow for a variable number of subjects, which would also involve updating the _RuleVisitor in rules.py to understand the new Params() syntax in arguments for Get nodes. There would then likely need to be some small modifications on the rust side where we parse the output of a PyGeneratorResponse (or whatever that struct is called), and also a change where we then generate Gets in order to correctly hook these up into the rule graph. Since we know that this multiple dispatch works already in @rule parameter selectors, it's not likely to require major changes to support this in await Get(...).

@stuhood
Copy link
Member

stuhood commented Apr 3, 2019

d = yield Get(D) # (if D requires no params) -- this won't work right now

Should probably not work (EDIT: See below)... that should be accomplished via a parameter selector to the rule instead, as that would allow the work to compute D to proceed concurrently before the rule starts running. But having an error message to that effect would be helpful.

EDIT: We've since (Dec 2021) seen that it is useful to be able to make some operations lazy by moving them under conditionals in your @rule body, so this is worth supporting too.

@cosmicexplorer
Copy link
Contributor Author

cosmicexplorer commented Apr 3, 2019

Agreed on that, we'd probably want to raise an error message to that effect if we see anything like:

  • Get[D]()
  • Get[D](Params())

In general though, we won't be able to catch all of these potential concurrency losses with local heuristics. A very silly form of inter-procedural optimization added to the _RuleVisitor would potentially be able to recognize not just zero-param Gets, but also something like:

@rule
def f(b: B, c: C) -> A:
  d = yield Get[D](B, b) # this could just be another selector
  # ...

and suggest instead:

@rule
def f(b: B, c: C, d: D) -> A:
  # ...

Calling it "inter-procedural optimization" is a bit much -- however, we definitely could (in _RuleVisitor):

  1. collect the @rule parameter selectors before we descend into the body of f
  2. when analyzing a Get() node, check if all of its params are already provided as parameter selectors to f
  3. raise an error describing how to modify the rule to move the input into a parameter selector

This would disallow some currently-valid constructions, but we can either add syntax to Get() to get around that, or we could not (I think this is probably better) and the user has to use an intermediate datatype to avoid the check from triggering (which we could describe in a lengthy, descriptive error message).

@stuhood
Copy link
Member

stuhood commented Apr 3, 2019

In general though, we won't be able to catch all of these potential concurrency losses with local heuristics. A very silly form of inter-procedural optimization added to the _RuleVisitor would potentially be able to recognize not just zero-param Gets, but also something like:

Mm, I don't think that that is worth doing, because it would catch only the very most basic cases... catching anything more advanced would require flow analysis, and I don't think that that is worth it.

EDIT: Oh... or maybe not. But still low priority I think.

@cosmicexplorer
Copy link
Contributor Author

Low priority but perhaps a high benefit / effort ratio since we'd just be adding an if statement when we process Get() nodes! And it would also happen to cover the case of yield Get(D, Params()) mentioned above automatically, which is cool.

@Eric-Arellano
Copy link
Contributor

Eric-Arellano commented Jan 1, 2020

I updated the PR description and comments to use the new syntax for rules and typed Get statements.

How feasible is implementing this?

@rule
async def _a(b: B) -> A:
  c = await Get[C](B, b) # we can do this now
  e = await Get[E](Params((C, c), (D, d))) # this is a proposed syntax
  f = await Get[F](Params(C(...), D(...))) # proposed syntax for inline construction
  # ...

The conversation in this thread became focused on how to handle yield Get[A](), which is a separate conversation better saved for a followup, imo.

Eric-Arellano added a commit that referenced this issue Feb 13, 2020
## Problem

When running `./pants test src/python/pants/rules/core/run_test.py`, someone who has never used Pants before would expect Pants to only test `run_test.py`. However, Pants will end up testing all of `rules/core:tests` and users must specify `./pants test src/python/pants/rules/core/run_test.py --pytest-args='-k RunTest'` to get what they want.

At the same time, we must still use the metadata from the owning target in order to prepare the test, such as resolving `dependencies`.

## Solution

Preserve the `OriginSpec` and pass it down all the way to `python_test_runner.py` so that it may pass more precise arguments to Pytest. Previously, we always passed every single file belonging to the `sources` of the target. Now, when given a filesystem spec, we will pass the files explicitly specified (which are a subset of the target's `sources`).

In order to preserve the `OriginSpec`, we make several changes.

**These changes are clunky due to the lack of multiple params in `await Get` (#7490). This PR helps to guide that design, though, while adding a useful feature needed today.**

### `AddressesWithOrigins`

The test `@goal_rule` now requests `AddressesWithOrigins`, rather than `Addresses`, so that we may preserve the original specs.

Here, we add a method to go from `AddressesWithOrigins -> AddressWithOrigin`.

### `HydratedTargetWithOrigin`

This allows us to have a hydrated target without losing the `OriginSpec` information. 

We need to be able to use `HydratedTarget` because this is how we access the underlying `TargetAdaptor` necessary to get the `TestTarget` union working. The original `AddressWithOrigin` from the `@goal_rule` isn't hydrated enough.

### `TargetAdaptorWithOrigin`

Finally, we add `TargetAdaptorWithOrigin`, along with a type for each distinct `TargetAdaptor`, such as `PythonTestsAdaptorWithOrigin`.

We need so many new classes for the union mechanism to work properly. `python_test_runner.py` must have some way to signal that it only works with `python_tests` targets, which we previously did by registering `UnionRule(TestTarget, PythonTestsAdaptor)`. But, we still need to preserve the `OriginSpec`, so we introduce `PythonTestsAdaptorWithOrigin`.

#### Rejected alternative: field on `TargetAdaptor`

One alternative design is to insert the `OriginSpec` proactively on the `TargetAdaptor`. This means that we would not need to introduce any `TargetAdaptorWithOrigin` types.

However, this approach suffers from not being typed. Given the type-driven API of rules, we want some structured way to specify that `python_test_runner.py` _must_ have the `OriginSpec` preserved, rather than checking for some optional field defined on the `TargetAdaptor`.

## Result

`./pants test src/python/pants/rules/core/run_test.py` only runs over `run_test.py` now. 

In a followup, we will add precise file args to `fmt2` and `lint2`.
@stuhood stuhood self-assigned this Apr 12, 2020
@stuhood stuhood changed the title allow specifying multiple Params to a yield Get() allow specifying multiple Params to a Get() Apr 12, 2020
@Eric-Arellano
Copy link
Contributor

I hit a need tonight for being able to await Get for a singleton.

After a refactor I've done to Pytest coverage, getting the CoverageConfig file is now a singleton rule. But that setup is non-trivial, as we map the transitive closure of every source file -> source root. I don't want to add this as a param to the rule because it's a waste of performance if coverage is not being used.

@stuhood
Copy link
Member

stuhood commented Oct 11, 2021

#12946 now describes some useful prework for this one, so we should probably consider this one to be blocked on that one.

@stuhood stuhood self-assigned this Aug 24, 2022
@stuhood
Copy link
Member

stuhood commented Aug 24, 2022

An alternative syntax that I quite like:

output = await Get(Output, {input1: Input1, input2: Input2})

This works because Get inputs already must always have eq/hash. Putting the declared type of the value in the dict value also has the benefit of looking like a type-ascription (and is growable, if we want to add more information in the dict value: such as whether a param is optionally consumed).

A downside though is the extra cost of creating a dict (and actually hashing and potentially comparing the values) when using multiple parameters. I think that using multiple parameters will be rare enough (for example in #13682: only at the boundaries between platforms) that this will not be an issue.

stuhood added a commit that referenced this issue Aug 24, 2022
As described in #12946 (comment), `DependencyKey` being abstract makes changing the number of provided parameters to a `Get` more challenging than necessary.

To prepare for #12946 and #7490, this change converts `DependencyKey` to a concrete type containing a `smallvec` of provided parameters.

[ci skip-build-wheels]
stuhood added a commit that referenced this issue Aug 26, 2022
)

In order to support differentiating `@rule` subgraphs for different "environments" as described in the design doc for #13682, `@rules` need to be able inject more than one parameter into any particular subgraph.

As a concrete example: the `test` `@rules` need to be able to request "a `TestResult` for this `FieldSet` in this `Environment`". This change adds support for multiple parameters to a `Get`, such that that use case would be spelled out as:
```python
test_result = await Get(TestResult, {field_set: FieldSet, environment: Environment})
``` 

Fixes #7490.

[ci skip-build-wheels]
cczona pushed a commit to cczona/pants that referenced this issue Sep 1, 2022
As described in pantsbuild#12946 (comment), `DependencyKey` being abstract makes changing the number of provided parameters to a `Get` more challenging than necessary.

To prepare for pantsbuild#12946 and pantsbuild#7490, this change converts `DependencyKey` to a concrete type containing a `smallvec` of provided parameters.

[ci skip-build-wheels]
cczona pushed a commit to cczona/pants that referenced this issue Sep 1, 2022
…tsbuild#16668)

In order to support differentiating `@rule` subgraphs for different "environments" as described in the design doc for pantsbuild#13682, `@rules` need to be able inject more than one parameter into any particular subgraph.

As a concrete example: the `test` `@rules` need to be able to request "a `TestResult` for this `FieldSet` in this `Environment`". This change adds support for multiple parameters to a `Get`, such that that use case would be spelled out as:
```python
test_result = await Get(TestResult, {field_set: FieldSet, environment: Environment})
``` 

Fixes pantsbuild#7490.

[ci skip-build-wheels]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants