Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update engine README for Params #7600

Merged
merged 2 commits into from
Aug 2, 2019

Conversation

stuhood
Copy link
Member

@stuhood stuhood commented Apr 20, 2019

Problem

The engine README.md had not been updated for the merge of "subjects" and Variants into Params (#6170).

Solution

Overhaul the explanations to refer to Params, and expand on how Params are propagated from dependents to dependencies.

Additionally, simplify and document the codepath in rule graph construction that selects which @rule dependencies to use.

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Stu! This was my first time reading the doc and I found it very useful.

src/python/pants/engine/README.md Outdated Show resolved Hide resolved
src/python/pants/engine/README.md Outdated Show resolved Hide resolved
src/python/pants/engine/README.md Show resolved Hide resolved
src/python/pants/engine/README.md Outdated Show resolved Hide resolved
src/python/pants/engine/README.md Outdated Show resolved Hide resolved
src/python/pants/engine/README.md Show resolved Hide resolved
src/python/pants/engine/README.md Show resolved Hide resolved
One source of `Params` is the root of a request, where a `Param` type that may be provided by a caller
of the engine can be declared using a `RootRule`. Installing a `RootRule` is sometimes necessary to
seal the rule graph in cases where a `Param` could only possibly be computed outside of the rule graph
and then passed in.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this. I think it would help to have an example specific to the idea of a RootRule. My main confusion is what does "root" mean in "root of a request"?

src/python/pants/engine/README.md Show resolved Hide resolved
src/python/pants/engine/README.md Show resolved Hide resolved
Copy link
Contributor

@benjyw benjyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this! I think it could still use some clarification, as I'm more versed in engine concepts than most people who are going to read this, and I still couldn't wrap my brain around Params based on this text.

@@ -17,63 +17,69 @@ Once the engine is instantiated with a valid set of `@rule`s, a caller can synch
computation of any of the product types provided by those `@rule`s by calling:

```python
# Request a ThingINeed (a `Product`) for the thing_i_have (a `Subject`).
# Request a ThingINeed (a `Product`) for a thing_i_have (a `Param`).
thing_i_need, = scheduler.product_request(ThingINeed, [thing_i_have])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the comma after thing_i_need a typo?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No: it's unpacking a single result from a list. I feel like I like this syntax better than:

thing_i_need = scheduler.product_request(ThingINeed, [thing_i_have])[0]

... but if we want to avoid this, we can.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thing_i_need = scheduler.product_request(ThingINeed, [thing_i_have])[0] is less surprising to me. I recommend using that style both in documentation and source code.

Copy link
Contributor

@jsirois jsirois Apr 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comma is subtle. Perhaps .pop() as the idiom is a compromise, but it would seem nicer to assert a single item which the tuple unpack does. Maybe wrapping this case up in an API pulls its weight and scheduler.single_product_request(ThingINeed, [thing_i_have]) should be a thing (I think it was at some point?). Maybe request and request_single would be better names - the product bit is perhaps redundant - the only thing you can request is a product.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's always pants.util.collections.assert_single_element()! That is what I have used for this exact purpose.

src/python/pants/engine/README.md Outdated Show resolved Hide resolved
src/python/pants/engine/README.md Show resolved Hide resolved
src/python/pants/engine/README.md Outdated Show resolved Hide resolved
src/python/pants/engine/README.md Outdated Show resolved Hide resolved
@stuhood
Copy link
Member Author

stuhood commented Apr 21, 2019

Resolved a few points: thank you. Please take another look!


The return value of an `@rule` is known as a `Product`. At some level, you can think
of `(product_type, params_set)` as a "key" that uniquely identifies a particular `Product` value
and `@rule` execution. If an `@rule` is able to produce a `Product` without consuming any `Params`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence confused me because I thought it was closely related to the prior sentence. I recommend adding a transition word like Further, if an @rule.

The return value of an `@rule` is known as a `Product`. At some level, you can think
of (`params_set`, `product_type`) as a "key" that uniquely identifies a particular `Product` value
and `@rule` execution. And as expected, if an `@rule` is able to produce a `Product` without consuming
any `Params`, then the `@rule` will run exactly once, and the value that it produces will be a singleton.

#### Example
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rereading this again, I echo this suggestion. The example doesn't seem closely related to the actual topic of Rule ambiguity.

In practical use, using basic types like `StringType` or `IntType` does not provide enough
information to disambiguate between various types of data. So declaring small `datatype`
definitions to provide a unique and descriptive type is strongly recommended:
In practical use, builtin types like `str` or `int` do not provide enough information to disambiguate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps to create compound types and to disambiguate between various types of data? I do think the idea of collections is an important one to explain.

When reading this, I wasn't immediately thinking "huh how would I make a list of ints?" But that's an important thing to know it's possible, so I think it's worth proactively mentioning it.

`tests/python/pants_test/test_base.py` to generate and execute a scheduler from a given set of
rules.
The recommended way to install `@rules` is to return them as a list from a `def rules()` definition
in a plugin's `register.py` file. Unit tests can either invoke `@rules` with fully mocked
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short example would be helpful of what the def rules() looks like. You don't need to show the implementation of the rules, just the signatures. Something like this.

class List:

  @rule(List, [Console, List.Options, Specs])
  def list_targets(console, list_options, specs):
    ...
  
  def rules():
    return [
      list_targets,
    ]

I'd think of this example like an ~integration example. You showed above how the building blocks work like datatype and @rule. Here, you show how it all comes together.

@@ -240,7 +231,7 @@ Work stalled on the later phases of the `RoundEngine` and talks re-booted about
it stood and proposed the idea of a "tuple-engine". With some license taken in representation, this
idea took the `RoundEngine` to the extreme of generating a round for each target-task pair. The
pair formed the tuple of schedulable work and this concept combined with others to form the design
[here][tuple-design].
[here][https://docs.google.com/document/d/1OARyIZSnw6XQiPlMydi57l_tS_JbFTJH6KLX61kPInI/edit?usp=sharing].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing!

@@ -674,38 +674,30 @@ impl<'t> GraphMaker<'t> {
if satisfiable_entries.is_empty() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these Rust changes be split out into a separate PR, so that this one becomes solely a doc update?

`Subject` values as it runs (see the section on `Get` requests below). Because the subject for
an `@rule` is chosen by callers, a `Subject` can be of any (hashable) type that a user might want
to compute a product for.
When an `@rule` runs, the set of `Param`s that it transitively consumes make up the unique identity
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this read as?: When an "at-rule"
Otherwise it seems it should be: When a @rule - and if so there are several other an @rule

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do read these as "at-rule", in general. Part of that is to distinguish them from rules in other systems.

an `@rule` is chosen by callers, a `Subject` can be of any (hashable) type that a user might want
to compute a product for.
When an `@rule` runs, the set of `Param`s that it transitively consumes make up the unique identity
for that instance of the `@rule`. Any hashable type with useful equality may be used as a `Param`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transitively? Is that true? If it is true is it useful to communicate here? If @rules behave like pure functions, only the Params the rule directly consumes should make up its identity. Likewise for Get requests which are just requests to execute an unknown-named rule.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transitive bit is somewhat true: it looks like you commented on the previous revision though: I think I've made it slightly clearer now.

And yes, they are pure functions, but we begin memoization of the execution of the @rule before we know what its arguments will be: at that point we have only the Params that will be used to compute its arguments (and which we have determined are necessary via rule graph compilation).

I looked briefly at memoizing using the arguments: there is more info here: #6845 ... still an open optimization question.

The return value of an `@rule` is known as a `Product`. At some level, you can think
of (`params_set`, `product_type`) as a "key" that uniquely identifies a particular `Product` value
and `@rule` execution. And as expected, if an `@rule` is able to produce a `Product` without consuming
any `Params`, then the `@rule` will run exactly once, and the value that it produces will be a singleton.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the spirit of the comment above, if its true rules just behave like pure functions, alot of specialized terminology introduced here / by names in the codebase - could be explained in a short-circuited way for the portion of the audience familiar with all this.

any other dependencies. On the other hand, if the type _doesn't_ match, the engine doesn't give up:
it will next look for any other registered `@rule`s that can compute an IntType Product for the
Subject (and so on, recursively).
When the engine encounters this `@rule` while compiling the rule graph for `str`-producing-`@rules`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once more on the pure function theme. Given that rules are just pure functions, there are really just two interesting bits:

  1. How does the engine decide to call a function (you address a bit of this here).
  2. How does the user call a function (Get).

@@ -17,63 +17,69 @@ Once the engine is instantiated with a valid set of `@rule`s, a caller can synch
computation of any of the product types provided by those `@rule`s by calling:

```python
# Request a ThingINeed (a `Product`) for the thing_i_have (a `Subject`).
# Request a ThingINeed (a `Product`) for a thing_i_have (a `Param`).
thing_i_need, = scheduler.product_request(ThingINeed, [thing_i_have])
Copy link
Contributor

@jsirois jsirois Apr 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comma is subtle. Perhaps .pop() as the idiom is a compromise, but it would seem nicer to assert a single item which the tuple unpack does. Maybe wrapping this case up in an API pulls its weight and scheduler.single_product_request(ThingINeed, [thing_i_have]) should be a thing (I think it was at some point?). Maybe request and request_single would be better names - the product bit is perhaps redundant - the only thing you can request is a product.

of `Params`. For example, if there was an `@rule` that could produce an `int` without consuming any
`Params` at all (ie, a singleton), then that `@rule` would always be chosen first. If all `@rules` to
produce `int`s required at least one `Param`, then the engine would next see whether the input `Params`
contained an `int`, or whether there were any `@rules` that required only one `Param`, then two
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence buries the lede a bit: you start by talking about needing to produce an int from Params, but here you imply that the int itself can be a Param.

Does the output of some other rule count as a Param, or are Params just the things that are injected into the boundary of the graph "from outside"?

In other words, if I have rule1 that takes a Foo and returns a Bar, and then rule2 that takes a Bar and returns a Baz, is it correct to refer to the Foo as a Param? What about the Bar?

Sorry to keep banging on this, but this README is going to be extremely useful and important, so it's best to make sure it's crystal clear.


The engine executes your `@rule`s in order to (recursively) compute a `Product` of the requested
type for a given `Subject`. This recursive type search leads to a very loosely coupled (and yet
type for a set of `Param`s. This recursive type search leads to a loosely coupled (and yet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we give formal definitions of Product and Param here? So that there's something to hang the subsequent explanation and examples off?

@@ -17,63 +17,69 @@ Once the engine is instantiated with a valid set of `@rule`s, a caller can synch
computation of any of the product types provided by those `@rule`s by calling:

```python
# Request a ThingINeed (a `Product`) for the thing_i_have (a `Subject`).
# Request a ThingINeed (a `Product`) for a thing_i_have (a `Param`).
thing_i_need, = scheduler.product_request(ThingINeed, [thing_i_have])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's always pants.util.collections.assert_single_element()! That is what I have used for this exact purpose.

Graphviz:
To help visualize executions, the engine can render both the static rule graph that is compiled
on startup, and also the content of the `Graph` that is produced while `@rules` run. This generates
`dot` files that can be rendered using Graphviz:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could link to #7024 or #7509 here.

stuhood added a commit that referenced this pull request May 7, 2019
### Problem

While iterating in another branch, I noticed that a shadowed/unreachable `@rule` was not triggering an error. Most likely this issue was introduced when we began monomorphizing `@rules`.

### Solution

Fix the error for unreachable `@rules`. Additionally, backport a simplification to the `@rule` selection algorithm from #7600 which makes it significantly easier to describe.

### Result

Better error messages when `@rules` are shadowed or otherwise unreachable.
@Eric-Arellano
Copy link
Contributor

Does engine.selectors.Get need to be updated to longer say Subject anymore too in its datatype and its docstring?

class Get(datatype([
('product', _type_field),
('subject_declared_type', _type_field),
'subject',
])):
"""Experimental synchronous generator API.
May be called equivalently as either:
# verbose form: Get(product_type, subject_declared_type, subject)
# shorthand form: Get(product_type, subject_type(subject))
"""

@stuhood
Copy link
Member Author

stuhood commented Jul 30, 2019

I'd also like to get #7679 (review) incorporated here as a "how to test @rules" section.

Copy link
Contributor

@benjyw benjyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it!

# Delete this line to force a full CI run for documentation-only changes.
[ci skip]  # Documentation-only change.
@stuhood
Copy link
Member Author

stuhood commented Aug 2, 2019

Thank you everyone for the useful feedback: unfortunately I don't have the time to invest in continuing to edit this right now, but I've needed to refer folks to it recently to help explain v2. I'm going to land as is with @benjyw 's shipit, and hope that I can loop back around to improve it soon.

@stuhood stuhood merged commit cdb2207 into pantsbuild:master Aug 2, 2019
@stuhood stuhood deleted the stuhood/params-docs branch August 2, 2019 19:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants