-
-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update engine README for Params #7600
Conversation
e3cc42b
to
29253d1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Stu! This was my first time reading the doc and I found it very useful.
One source of `Params` is the root of a request, where a `Param` type that may be provided by a caller | ||
of the engine can be declared using a `RootRule`. Installing a `RootRule` is sometimes necessary to | ||
seal the rule graph in cases where a `Param` could only possibly be computed outside of the rule graph | ||
and then passed in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this. I think it would help to have an example specific to the idea of a RootRule
. My main confusion is what does "root" mean in "root of a request"?
29253d1
to
28eec9c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating this! I think it could still use some clarification, as I'm more versed in engine concepts than most people who are going to read this, and I still couldn't wrap my brain around Params based on this text.
@@ -17,63 +17,69 @@ Once the engine is instantiated with a valid set of `@rule`s, a caller can synch | |||
computation of any of the product types provided by those `@rule`s by calling: | |||
|
|||
```python | |||
# Request a ThingINeed (a `Product`) for the thing_i_have (a `Subject`). | |||
# Request a ThingINeed (a `Product`) for a thing_i_have (a `Param`). | |||
thing_i_need, = scheduler.product_request(ThingINeed, [thing_i_have]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the comma after thing_i_need
a typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No: it's unpacking a single result from a list. I feel like I like this syntax better than:
thing_i_need = scheduler.product_request(ThingINeed, [thing_i_have])[0]
... but if we want to avoid this, we can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thing_i_need = scheduler.product_request(ThingINeed, [thing_i_have])[0]
is less surprising to me. I recommend using that style both in documentation and source code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comma is subtle. Perhaps .pop()
as the idiom is a compromise, but it would seem nicer to assert a single item which the tuple unpack does. Maybe wrapping this case up in an API pulls its weight and scheduler.single_product_request(ThingINeed, [thing_i_have])
should be a thing (I think it was at some point?). Maybe request
and request_single
would be better names - the product bit is perhaps redundant - the only thing you can request is a product.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's always pants.util.collections.assert_single_element()
! That is what I have used for this exact purpose.
c5d21e9
to
2869046
Compare
Resolved a few points: thank you. Please take another look! |
|
||
The return value of an `@rule` is known as a `Product`. At some level, you can think | ||
of `(product_type, params_set)` as a "key" that uniquely identifies a particular `Product` value | ||
and `@rule` execution. If an `@rule` is able to produce a `Product` without consuming any `Params`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence confused me because I thought it was closely related to the prior sentence. I recommend adding a transition word like Further, if an @rule
.
The return value of an `@rule` is known as a `Product`. At some level, you can think | ||
of (`params_set`, `product_type`) as a "key" that uniquely identifies a particular `Product` value | ||
and `@rule` execution. And as expected, if an `@rule` is able to produce a `Product` without consuming | ||
any `Params`, then the `@rule` will run exactly once, and the value that it produces will be a singleton. | ||
|
||
#### Example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rereading this again, I echo this suggestion. The example doesn't seem closely related to the actual topic of Rule ambiguity
.
In practical use, using basic types like `StringType` or `IntType` does not provide enough | ||
information to disambiguate between various types of data. So declaring small `datatype` | ||
definitions to provide a unique and descriptive type is strongly recommended: | ||
In practical use, builtin types like `str` or `int` do not provide enough information to disambiguate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps to create compound types and to disambiguate between various types of data
? I do think the idea of collections is an important one to explain.
When reading this, I wasn't immediately thinking "huh how would I make a list of ints?" But that's an important thing to know it's possible, so I think it's worth proactively mentioning it.
`tests/python/pants_test/test_base.py` to generate and execute a scheduler from a given set of | ||
rules. | ||
The recommended way to install `@rules` is to return them as a list from a `def rules()` definition | ||
in a plugin's `register.py` file. Unit tests can either invoke `@rules` with fully mocked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Short example would be helpful of what the def rules()
looks like. You don't need to show the implementation of the rules, just the signatures. Something like this.
class List:
@rule(List, [Console, List.Options, Specs])
def list_targets(console, list_options, specs):
...
def rules():
return [
list_targets,
]
I'd think of this example like an ~integration example. You showed above how the building blocks work like datatype
and @rule
. Here, you show how it all comes together.
@@ -240,7 +231,7 @@ Work stalled on the later phases of the `RoundEngine` and talks re-booted about | |||
it stood and proposed the idea of a "tuple-engine". With some license taken in representation, this | |||
idea took the `RoundEngine` to the extreme of generating a round for each target-task pair. The | |||
pair formed the tuple of schedulable work and this concept combined with others to form the design | |||
[here][tuple-design]. | |||
[here][https://docs.google.com/document/d/1OARyIZSnw6XQiPlMydi57l_tS_JbFTJH6KLX61kPInI/edit?usp=sharing]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing!
src/rust/engine/src/rule_graph.rs
Outdated
@@ -674,38 +674,30 @@ impl<'t> GraphMaker<'t> { | |||
if satisfiable_entries.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could these Rust changes be split out into a separate PR, so that this one becomes solely a doc update?
src/python/pants/engine/README.md
Outdated
`Subject` values as it runs (see the section on `Get` requests below). Because the subject for | ||
an `@rule` is chosen by callers, a `Subject` can be of any (hashable) type that a user might want | ||
to compute a product for. | ||
When an `@rule` runs, the set of `Param`s that it transitively consumes make up the unique identity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this read as?: When an "at-rule"
Otherwise it seems it should be: When a @rule
- and if so there are several other an @rule
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do read these as "at-rule", in general. Part of that is to distinguish them from rules in other systems.
src/python/pants/engine/README.md
Outdated
an `@rule` is chosen by callers, a `Subject` can be of any (hashable) type that a user might want | ||
to compute a product for. | ||
When an `@rule` runs, the set of `Param`s that it transitively consumes make up the unique identity | ||
for that instance of the `@rule`. Any hashable type with useful equality may be used as a `Param`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
transitively? Is that true? If it is true is it useful to communicate here? If @rules behave like pure functions, only the Params the rule directly consumes should make up its identity. Likewise for Get requests which are just requests to execute an unknown-named rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The transitive bit is somewhat true: it looks like you commented on the previous revision though: I think I've made it slightly clearer now.
And yes, they are pure functions, but we begin memoization of the execution of the @rule
before we know what its arguments will be: at that point we have only the Params
that will be used to compute its arguments (and which we have determined are necessary via rule graph compilation).
I looked briefly at memoizing using the arguments: there is more info here: #6845 ... still an open optimization question.
src/python/pants/engine/README.md
Outdated
The return value of an `@rule` is known as a `Product`. At some level, you can think | ||
of (`params_set`, `product_type`) as a "key" that uniquely identifies a particular `Product` value | ||
and `@rule` execution. And as expected, if an `@rule` is able to produce a `Product` without consuming | ||
any `Params`, then the `@rule` will run exactly once, and the value that it produces will be a singleton. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the spirit of the comment above, if its true rules just behave like pure functions, alot of specialized terminology introduced here / by names in the codebase - could be explained in a short-circuited way for the portion of the audience familiar with all this.
any other dependencies. On the other hand, if the type _doesn't_ match, the engine doesn't give up: | ||
it will next look for any other registered `@rule`s that can compute an IntType Product for the | ||
Subject (and so on, recursively). | ||
When the engine encounters this `@rule` while compiling the rule graph for `str`-producing-`@rules`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once more on the pure function theme. Given that rules are just pure functions, there are really just two interesting bits:
- How does the engine decide to call a function (you address a bit of this here).
- How does the user call a function (Get).
@@ -17,63 +17,69 @@ Once the engine is instantiated with a valid set of `@rule`s, a caller can synch | |||
computation of any of the product types provided by those `@rule`s by calling: | |||
|
|||
```python | |||
# Request a ThingINeed (a `Product`) for the thing_i_have (a `Subject`). | |||
# Request a ThingINeed (a `Product`) for a thing_i_have (a `Param`). | |||
thing_i_need, = scheduler.product_request(ThingINeed, [thing_i_have]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comma is subtle. Perhaps .pop()
as the idiom is a compromise, but it would seem nicer to assert a single item which the tuple unpack does. Maybe wrapping this case up in an API pulls its weight and scheduler.single_product_request(ThingINeed, [thing_i_have])
should be a thing (I think it was at some point?). Maybe request
and request_single
would be better names - the product bit is perhaps redundant - the only thing you can request is a product.
of `Params`. For example, if there was an `@rule` that could produce an `int` without consuming any | ||
`Params` at all (ie, a singleton), then that `@rule` would always be chosen first. If all `@rules` to | ||
produce `int`s required at least one `Param`, then the engine would next see whether the input `Params` | ||
contained an `int`, or whether there were any `@rules` that required only one `Param`, then two |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence buries the lede a bit: you start by talking about needing to produce an int from Params, but here you imply that the int itself can be a Param.
Does the output of some other rule count as a Param, or are Params just the things that are injected into the boundary of the graph "from outside"?
In other words, if I have rule1
that takes a Foo and returns a Bar, and then rule2
that takes a Bar and returns a Baz, is it correct to refer to the Foo as a Param? What about the Bar?
Sorry to keep banging on this, but this README is going to be extremely useful and important, so it's best to make sure it's crystal clear.
|
||
The engine executes your `@rule`s in order to (recursively) compute a `Product` of the requested | ||
type for a given `Subject`. This recursive type search leads to a very loosely coupled (and yet | ||
type for a set of `Param`s. This recursive type search leads to a loosely coupled (and yet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we give formal definitions of Product and Param here? So that there's something to hang the subsequent explanation and examples off?
@@ -17,63 +17,69 @@ Once the engine is instantiated with a valid set of `@rule`s, a caller can synch | |||
computation of any of the product types provided by those `@rule`s by calling: | |||
|
|||
```python | |||
# Request a ThingINeed (a `Product`) for the thing_i_have (a `Subject`). | |||
# Request a ThingINeed (a `Product`) for a thing_i_have (a `Param`). | |||
thing_i_need, = scheduler.product_request(ThingINeed, [thing_i_have]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's always pants.util.collections.assert_single_element()
! That is what I have used for this exact purpose.
Graphviz: | ||
To help visualize executions, the engine can render both the static rule graph that is compiled | ||
on startup, and also the content of the `Graph` that is produced while `@rules` run. This generates | ||
`dot` files that can be rendered using Graphviz: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
### Problem While iterating in another branch, I noticed that a shadowed/unreachable `@rule` was not triggering an error. Most likely this issue was introduced when we began monomorphizing `@rules`. ### Solution Fix the error for unreachable `@rules`. Additionally, backport a simplification to the `@rule` selection algorithm from #7600 which makes it significantly easier to describe. ### Result Better error messages when `@rules` are shadowed or otherwise unreachable.
Does pants/src/python/pants/engine/selectors.py Lines 15 to 25 in 5eed2e7
|
I'd also like to get #7679 (review) incorporated here as a "how to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love it!
# Delete this line to force a full CI run for documentation-only changes. [ci skip] # Documentation-only change.
2869046
to
1837a65
Compare
Thank you everyone for the useful feedback: unfortunately I don't have the time to invest in continuing to edit this right now, but I've needed to refer folks to it recently to help explain v2. I'm going to land as is with @benjyw 's shipit, and hope that I can loop back around to improve it soon. |
Problem
The engine
README.md
had not been updated for the merge of "subjects" andVariants
intoParams
(#6170).Solution
Overhaul the explanations to refer to
Params
, and expand on howParams
are propagated from dependents to dependencies.Additionally, simplify and document the codepath in rule graph construction that selects which
@rule
dependencies to use.