Update engine README for Params #7600
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,63 +17,74 @@ Once the engine is instantiated with a valid set of `@rule`s, a caller can synch | |
computation of any of the product types provided by those `@rule`s by calling: | ||
|
||
```python | ||
# Request a ThingINeed (a `Product`) for the thing_i_have (a `Subject`). | ||
# Request a ThingINeed (a `Product`) for a thing_i_have (a `Param`). | ||
thing_i_need, = scheduler.product_request(ThingINeed, [thing_i_have]) | ||
``` | ||
|
||
The engine then takes care of concurrently executing all dependencies of the matched `@rule`s to | ||
produce the requested value. | ||
|
||
### Products and Subjects | ||
### Products and Params | ||
|
||
The engine executes your `@rule`s in order to (recursively) compute a `Product` of the requested | ||
type for a given `Subject`. This recursive type search leads to a very loosely coupled (and yet | ||
type for a set of `Param`s. This recursive type search leads to a loosely coupled (and yet | ||
Review comment: Can we give formal definitions of `Product` and `Param` here, so that there's something to hang the subsequent explanation and examples off? |
||
still statically checked) form of dependency injection. | ||
|
||
When an `@rule` runs, it runs for a particular `Subject` value, which is part of the unique | ||
identity for that instance of the `@rule`. An `@rule` can request dependencies for different | ||
`Subject` values as it runs (see the section on `Get` requests below). Because the subject for | ||
an `@rule` is chosen by callers, a `Subject` can be of any (hashable) type that a user might want | ||
to compute a product for. | ||
When an `@rule` runs, it requires a set of `Param`s that the engine has determined are needed | ||
to compute its transitive `@rule` dependencies. So although an `@rule` might not have a particular | ||
`Param` type in its signature, it might depend on another `@rule` that does need that `Param`, and | ||
would thus need that `Param` in order to run. To see which `Params` the engine needs to run each | ||
`@rule`, refer to the `Visualization` section below. | ||
|
||
The return value of an `@rule` for a particular `Subject` is known as a `Product`. At some level, | ||
you can think of (`subject_value`, `product_type`) as a "key" that uniquely identifies a particular | ||
Product value and `@rule` execution. | ||
Any hashable type with useful equality may be used as a `Param`, and additional `Params` can be | ||
provided to an `@rule`'s dependencies via `Get` requests (see below). Each `Param` value in a set | ||
of `Params` is unique by type, so if `@rules` recursively introduce a particular `Param` type, | ||
there will still only be one value for that type in each `@rule`, but it will change as you move | ||
deeper into the dependency graph. | ||
|
||
The return value of an `@rule` is known as a `Product`. At some level, you can think | ||
of `(product_type, params_set)` as a "key" that uniquely identifies a particular `Product` value | ||
and `@rule` execution. If an `@rule` is able to produce a `Product` without consuming any `Params`, | ||
Review comment: This sentence confused me because I thought it was closely related to the prior sentence. I recommend adding a transition word like |
||
then the `@rule` will run exactly once, and the value that it produces will be a singleton. | ||
|
||
#### Example | ||
Review comment: Great! This is really helpful. We might want to move the ambiguity information to a new subsection called
Review comment: Rereading this again, I echo this suggestion. The example doesn't seem closely related to the actual topic of |
||
|
||
As a very simple example, you might register the following `@rule` that can compute a `String` | ||
Product given a single `Int` input. | ||
Product given a single `Int` argument. | ||
|
||
```python | ||
@rule(StringType, [IntType]) | ||
@rule(str, [int]) | ||
def int_to_str(an_int): | ||
return '{}'.format(an_int) | ||
return str(an_int) | ||
``` | ||
|
||
The first argument to the `@rule` decorator is the Product (ie, return) type for the `@rule`. The | ||
second argument is a list of parameter selectors that declare the types of the input parameters for | ||
the `@rule`. In this case, because the Product type is `StringType` and there is one parameter | ||
selector (for `IntType`), this `@rule` represents a conversion from `IntType` to `StrType`, with no | ||
other inputs. | ||
The first argument to the `@rule` decorator is the `Product` (ie, return) type for the `@rule`. The | ||
second argument is a list of "parameter selectors" that declare the types of the input parameters for | ||
the `@rule`. In this case, because the `Product` type is `str` and there is one parameter | ||
selector (for `int`), this `@rule` represents a conversion from `int` to `str`, with no other inputs. | ||
|
||
When the engine encounters this `@rule` while compiling the rule graph for `str`-producing-`@rules`, | ||
Review comment: Once more on the pure function theme. Given that rules are just pure functions, there are really just two interesting bits:
|
||
it will next look for the dependency `@rule` that can produce an `int` using the fewest `Params`. | ||
For example, if there was an `@rule` that could produce an `int` without consuming any | ||
`Params` at all (ie, a singleton), then that `@rule` would always be chosen first. If all `@rules` to | ||
produce `int`s required at least one `Param`, then the engine would next see whether the input `Params` | ||
contained an `int`, or whether there were any `@rules` that required only one `Param`, then two | ||
Review comment: This sentence buries the lede a bit: you start by talking about needing to produce an int from Params, but here you imply that the int itself can be a Param. Does the output of some other rule count as a Param, or are Params just the things that are injected into the boundary of the graph "from outside"? In other words, if I have
Sorry to keep banging on this, but this README is going to be extremely useful and important, so it's best to make sure it's crystal clear. |
||
`Params`, and so on. | ||
|
||
When the engine statically checks whether it can use this `@rule` to create a string for a | ||
Subject, it will first see whether there are any ways to get an IntType for that Subject. If | ||
the subject is already of `type(subject) == IntType`, then the `@rule` will be satisfiable without | ||
any other dependencies. On the other hand, if the type _doesn't_ match, the engine doesn't give up: | ||
it will next look for any other registered `@rule`s that can compute an IntType Product for the | ||
Subject (and so on, recursively). | ||
In cases where this search detects any ambiguity (generally because there are two or more `@rules` that | ||
can provide the same product with the same number of parameters), rule graph compilation will fail with | ||
a useful error message. | ||
|
||
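The "fewest `Params` wins, ties are an error" behavior described above can be sketched as follows. This is a deliberately flattened model, not the engine's rule graph compiler: it assumes each hypothetical `Rule` already lists the `Param` types it transitively consumes, and `choose_rule` and `AmbiguousRulesError` are names made up for this sketch.

```python
from collections import namedtuple

# Hypothetical, flattened view of a rule: the Product type it returns and the
# frozenset of Param types it (transitively) consumes.
Rule = namedtuple('Rule', ['name', 'product_type', 'param_types'])


class AmbiguousRulesError(Exception):
    """Raised when two or more rules tie for the fewest Params."""


def choose_rule(product_type, rules, available_param_types):
    available = frozenset(available_param_types)
    # Only rules that produce the right type from Params that are in scope qualify.
    candidates = [
        r for r in rules
        if r.product_type is product_type and r.param_types <= available
    ]
    if not candidates:
        raise LookupError('no rule can produce {}'.format(product_type.__name__))
    fewest = min(len(r.param_types) for r in candidates)
    best = [r for r in candidates if len(r.param_types) == fewest]
    if len(best) > 1:
        raise AmbiguousRulesError(
            'ambiguous rules for {}: {}'.format(
                product_type.__name__, ', '.join(sorted(r.name for r in best))))
    return best[0]
```

In this model a rule with an empty `param_types` set (a singleton) always beats a rule that needs one `Param`, and two candidates that each need one `Param` raise `AmbiguousRulesError`.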
### Datatypes | ||
|
||
In practical use, using basic types like `StringType` or `IntType` does not provide enough | ||
information to disambiguate between various types of data. So declaring small `datatype` | ||
definitions to provide a unique and descriptive type is strongly recommended: | ||
In practical use, builtin types like `str` or `int` do not provide enough information to disambiguate | ||
Review comment: Is the issue not providing enough information, or instead you can't easily compose complex data types like a
Currently this suggests it's wrong to ever use
Review comment: Hm, well after reading the example maybe both reasons are cause to use a
Review comment: Yea, it's both. A feature, not a bug =)
Review comment: Perhaps
When reading this, I wasn't immediately thinking "huh how would I make a list of ints?" But that's an important thing to know it's possible, so I think it's worth proactively mentioning it. |
||
between various types of data in `@rule` signatures, so declaring small `datatype` definitions to | ||
provide a unique and descriptive type is highly recommended: | ||
|
||
```python | ||
class FormattedInt(datatype(['content'])): pass | ||
|
||
@rule(FormattedInt, [IntType]) | ||
@rule(FormattedInt, [int]) | ||
def int_to_str(an_int): | ||
return FormattedInt('{}'.format(an_int)) | ||
|
||
|
@@ -105,29 +116,32 @@ class TypedDatatype(datatype([('field_name', Exactly(str, int))])): | |
``` | ||
|
||
Assigning a specific type to a field can be somewhat unidiomatic in Python, and may be unexpected or | ||
unnatural to use. Additionally, the engine already applies a form of implicit type checking by | ||
ensuring there is a unique path from subject to product when a product request is made. However, | ||
regardless of whether the object is created directly with type-checked fields or whether it's | ||
produced from a set of rules by the engine's dependency injection, it is extremely useful to | ||
formalize the assumptions made about the value of an object into a specific type, even if the type | ||
just wraps a single field. The `datatype()` function makes it simple and efficient to apply that | ||
strategy. | ||
unnatural to use. However, regardless of whether the object is created directly with type-checked | ||
fields or whether it's produced from a set of rules by the engine's dependency injection, it is | ||
extremely useful to formalize the assumptions made about the value of an object into a specific type, | ||
even if the type just wraps a single field. The `datatype()` function makes it simple and efficient | ||
to apply that strategy. | ||
|
||
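To illustrate why wrapping a single field in its own type is useful, here is a minimal sketch that uses `collections.namedtuple` as a stand-in for `datatype()`. It does not reproduce the real `datatype()` implementation; `_TypeAwareEq` and `Label` are hypothetical names. One detail worth noting: plain namedtuples with equal fields compare equal across types, so the sketch adds a type-aware `__eq__`.

```python
from collections import namedtuple


class _TypeAwareEq(object):
    """Mixin for this sketch: values are equal only if their types also match."""
    __slots__ = ()

    def __eq__(self, other):
        return type(self) is type(other) and tuple(self) == tuple(other)

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return hash((type(self), tuple(self)))


class FormattedInt(_TypeAwareEq, namedtuple('FormattedInt', ['content'])):
    __slots__ = ()


class Label(_TypeAwareEq, namedtuple('Label', ['content'])):
    __slots__ = ()


def int_to_formatted(an_int):
    # The return type says what the string means, not just that it is a string.
    return FormattedInt('{}'.format(an_int))
```

Both wrappers hold a single `str`, but `FormattedInt('3')` and `Label('3')` remain distinct values, so type-directed lookups cannot confuse them.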
### Parameter selectors and Gets | ||
### Gets and RootRules | ||
|
||
As demonstrated above, parameter selectors select `@rule` inputs in the context of a particular | ||
`Subject` (and its `Variants`: discussed below). But it is frequently necessary to "change" the | ||
subject and request products for subjects other than the one that the `@rule` is running for. | ||
As demonstrated above, parameter selectors select `@rule` arguments in the context of a set of `Params`. | ||
But where do `Params` come from? | ||
|
||
In cases where this is necessary, `@rule`s may be written as coroutines (ie, using the python | ||
`yield` statement) that yield "`Get` requests" that request products for other subjects. Just like | ||
`@rule` parameter selectors, `Get` requests instantiated in the body of an `@rule` are statically | ||
checked to be satisfiable in the set of installed `@rule`s. | ||
One source of `Params` is the root of a request, where a `Param` type that may be provided by a caller | ||
of the engine can be declared using a `RootRule`. Installing a `RootRule` is sometimes necessary to | ||
seal the rule graph in cases where a `Param` could only possibly be computed outside of the rule graph | ||
and then passed in. | ||
Review comment: I don't understand this. I think it would help to have an example specific to the idea of a |
||
|
||
The second case for introducing new `Params` occurs within the running graph when an `@rule` needs | ||
to pass values to its dependencies that are necessary to compute a product. In this case, `@rule`s may | ||
be written as coroutines (i.e., using the Python `yield` statement) that yield "`Get` requests" that request | ||
products for other `Params`. Just like `@rule` parameter selectors, `Get` requests instantiated in the | ||
body of an `@rule` are statically checked to be satisfiable in the set of installed `@rule`s. | ||
|
||
#### Example | ||
|
||
For example, you could declare an `@rule` that requests FileContent for each entry in a Files list, | ||
and then concatenates that content into a (typed) string: | ||
and then concatenates that content into a (datatype-wrapped) string: | ||
|
||
```python | ||
@rule(ConcattedFiles, [Files]) | ||
|
@@ -136,27 +150,27 @@ def concat(files): | |
yield ConcattedFiles(''.join(fc.content for fc in file_content_list)) | ||
``` | ||
|
||
This `@rule` declares that: "for any Subject for which we can compute `Files`, we can also compute | ||
`ConcattedFiles`". Each yielded `Get` request results in FileContent for a different File Subject | ||
from the Files list. | ||
This `@rule` declares that: "for any `Params` for which we can compute `Files`, we can also compute | ||
`ConcattedFiles`". Each yielded `Get` request results in FileContent for a different File `Param` | ||
from the Files list. And, happily, all of these requests can proceed in parallel. | ||
|
||
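The coroutine mechanics can be sketched without the engine. In the toy driver below, `Get`, `drive`, and `resolve` are hypothetical stand-ins, and the in-memory `FAKE_FS` dict is an assumption made so the example is self-contained. The driver resolves requests sequentially, whereas the engine can run them in parallel.

```python
from collections import namedtuple

File = namedtuple('File', ['path'])
FileContent = namedtuple('FileContent', ['content'])
ConcattedFiles = namedtuple('ConcattedFiles', ['content'])

# Hypothetical stand-in for a Get request: "give me product_type for this param".
Get = namedtuple('Get', ['product_type', 'param'])

FAKE_FS = {'a.txt': 'hello ', 'b.txt': 'world'}


def resolve(get):
    # Toy resolver: only knows how to produce FileContent for a File.
    assert get.product_type is FileContent
    return FileContent(FAKE_FS[get.param.path])


def drive(gen, resolve_func):
    """Feed results back into a rule coroutine until it yields its product."""
    value = next(gen)
    while True:
        if isinstance(value, Get):
            value = gen.send(resolve_func(value))
        elif isinstance(value, list) and all(isinstance(v, Get) for v in value):
            value = gen.send([resolve_func(v) for v in value])
        else:
            gen.close()
            return value  # anything that is not a Get is the final product


def concat(paths):
    contents = yield [Get(FileContent, File(p)) for p in paths]
    yield ConcattedFiles(''.join(fc.content for fc in contents))


result = drive(concat(['a.txt', 'b.txt']), resolve)
```

Here the rule body never calls `resolve` itself: it only describes what it needs, which is what lets a scheduler check those requests up front and decide how to execute them.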
### Advanced Param Usage | ||
|
||
### Variants | ||
Sometimes `@rule`s will need to consume multiple `Params` in order to tailor their output Products | ||
to their consumers. | ||
|
||
Certain `@rule`s will also need parameters provided by their dependents in order to tailor their output | ||
Products to their consumers. For example, a javac `@rule` might need to know the version of the java | ||
platform for a given dependent binary target (say Java 9), or an ivy `@rule` might need to identify a | ||
globally consistent ivy resolve for a test target. To allow for this the engine introduces the | ||
concept of `Variants`, which are passed recursively from dependents to dependencies. | ||
For example, a javac `@rule` might need to know the version of the java platform for a given | ||
dependent binary target, or an ivy `@rule` might need to identify a globally consistent ivy resolve | ||
for a test target. In both of these cases, the `@rule` requires two `Params` to be in scope. But | ||
because `Params` are implicitly propagated from dependents to dependencies, it's possible | ||
for these `Params` to be provided much higher in the graph, without intermediate `@rules` needing to | ||
be aware of them. | ||
|
||
If a Rule uses a `SelectVariants` Selector to indicate that a variant is required, consumers can use | ||
a `@[type]=[name]` address syntax extension to pass a variant that matches a particular configuration | ||
for a `@rule`. A dependency declared as `src/java/com/example/lib:lib` specifies no particular variant, but | ||
`src/java/com/example/lib:lib@java=java8` asks for the configured variant of the lib named "java8". | ||
The result would be that any subgraph that transitively consumed a `Param` to produce Java 11 (for | ||
example) would be safely isolated and distinct from one that produced Java 9. | ||
|
||
Additionally, it is possible to specify the "default" variants for an Address by installing an `@rule` | ||
function that can provide `Variants(default=..)`. Since the purpose of variants is to collect | ||
information from dependents, only default variant values which have not been set by a dependent | ||
will be used. | ||
_(This section needs an example, but that will have to wait for | ||
[#7490](https://github.com/pantsbuild/pants/issues/7490)!)_ | ||
|
||
## Internal API | ||
|
||
|
@@ -168,44 +182,32 @@ To compute a value for a Node, the engine uses the `Node.run` method starting fr | |
roots. If a Node needs more inputs, it requests them via `Context.get`, which will declare a | ||
dependency, and memoize the computation represented by the requested `Node`. | ||
|
||
The initial Nodes are [launched by the engine](https://github.com/pantsbuild/pants/blob/16d43a06ba3751e22fdc7f69f009faeb59a33930/src/rust/engine/src/scheduler.rs#L116-L126), | ||
but the rest of execution is driven by Nodes recursively calling `Context.get` to request their | ||
dependencies. | ||
This recorded `Graph` tracks all dependencies between `@rules` and builtin "intrinsic" rules that | ||
provide filesystem and network access. That dependency tracking allows for invalidation and dirtying | ||
of `Nodes` as their dependencies change. | ||
|
||
### Registering Rules | ||
## Registering Rules | ||
|
||
Currently, it is only possible to load rules into the pants scheduler in two ways: by importing and | ||
using them in `src/python/pants/bin/engine_initializer.py`, or by adding them to the list returned | ||
by a `rules()` method defined in `src/python/backend/<backend_name>/register.py`. Plugins cannot add | ||
new rules yet. Unit tests, however, can mix in `TestBase` from | ||
`tests/python/pants_test/test_base.py` to generate and execute a scheduler from a given set of | ||
rules. | ||
The recommended way to install `@rules` is to return them as a list from a `def rules()` definition | ||
in a plugin's `register.py` file. Unit tests can either invoke `@rules` with fully mocked | ||
Review comment: Short example would be helpful of what the class List:
@rule(List, [Console, List.Options, Specs])
def list_targets(console, list_options, specs):
...
def rules():
return [
list_targets,
] I'd think of this example like an ~integration example. You showed above how the building blocks work like |
||
dependencies via `pants_test.engine.util.run_rule`, or extend `pants_test.test_base.TestBase` to | ||
construct and execute a scheduler for a given set of rules. | ||
|
||
In general, there are two types of rules that you can define: | ||
|
||
1. an `@rule`, which has a single product type and selects its inputs as described above. | ||
2. a `RootRule`, which declares a type that can be used as a *subject*, which means it can be | ||
provided as an input to a `product_request()`. | ||
|
||
In more depth, a `RootRule` for some type is required when no other rule might provide that | ||
type (i.e. it is not provided as the product of any `@rule`) in some context. In the absence of a | ||
`RootRule`, any subject type involved in a request "at runtime" (i.e. via `product_request()`), | ||
would show up as an unused or impossible path in the rule graph. Another potential name for | ||
`RootRule` might be `ParamRule`, or something similar, as it can be thought of as saying that the | ||
type represents a sort of "public API entrypoint" via a `product_request()`. | ||
|
||
Note that `Get` requests do not require a `RootRule`, as their requests are statically verified when | ||
the `@rule` definition is parsed, so we know before runtime that they might be requested. | ||
2. a `RootRule`, which declares a type that a caller of the engine may provide as a `Param` in a | ||
call to `Scheduler.product_request(..)` (ie, at the "root" of the graph). | ||
|
||
This interface is being actively developed at this time and this documentation may be out of | ||
date. Please feel free to file an issue or pull request if you notice any outdated or incorrect | ||
information in this document! | ||
|
||
## Execution | ||
## Visualization | ||
|
||
The engine executes work concurrently wherever possible; to help visualize executions, a visualization | ||
tool is provided that, after executing a `Graph`, generates a `dot` file that can be rendered using | ||
Graphviz: | ||
To help visualize executions, the engine can render both the static rule graph that is compiled | ||
on startup, and also the content of the `Graph` that is produced while `@rules` run. This generates | ||
`dot` files that can be rendered using Graphviz: | ||
|
||
```console | ||
$ mkdir viz | ||
|
@@ -214,17 +216,6 @@ $ ls viz | |
run.0.dot | ||
``` | ||
|
||
## Native Engine | ||
|
||
The native engine is integrated into the pants codebase via `native.py` in | ||
this directory along with `build-support/bin/native/bootstrap.sh` which ensures a | ||
pants native engine library is built and available for linking. The glue is the | ||
sha1 hash of the native engine source code used as its version by the `Native` | ||
class. This hash is maintained by `build-support/bin/native/bootstrap.sh` and | ||
output to the `native_engine_version` file in this directory. Any modification | ||
to this resource file's location will need adjustments in | ||
`build-support/bin/native/bootstrap.sh` to ensure the linking continues to work. | ||
|
||
## History | ||
|
||
The need for an engine that could schedule all work as a result of linking required products to | ||
|
@@ -240,7 +231,7 @@ Work stalled on the later phases of the `RoundEngine` and talks re-booted about | |
it stood and proposed the idea of a "tuple-engine". With some license taken in representation, this | ||
idea took the `RoundEngine` to the extreme of generating a round for each target-task pair. The | ||
pair formed the tuple of schedulable work and this concept combined with others to form the design | ||
[here][tuple-design]. | ||
[here](https://docs.google.com/document/d/1OARyIZSnw6XQiPlMydi57l_tS_JbFTJH6KLX61kPInI/edit?usp=sharing). | ||
Review comment: Thanks for fixing! |
||
|
||
Meanwhile, need for fine-grained parallelism was acute to help speed up jvm compilation, especially | ||
in the context of scala and mixed scala & java builds. Twitter spiked on a project to implement | ||
|
Review comment: Is the comma after `thing_i_need` a typo?

Review comment: No: it's unpacking a single result from a list. I feel like I like this syntax better than: ... but if we want to avoid this, we can.

Review comment: `thing_i_need = scheduler.product_request(ThingINeed, [thing_i_have])[0]` is less surprising to me. I recommend using that style both in documentation and source code.

Review comment: The comma is subtle. Perhaps `.pop()` as the idiom is a compromise, but it would seem nicer to assert a single item, which the tuple unpack does. Maybe wrapping this case up in an API pulls its weight and `scheduler.single_product_request(ThingINeed, [thing_i_have])` should be a thing (I think it was at some point?). Maybe `request` and `request_single` would be better names - the product bit is perhaps redundant - the only thing you can request is a product.

Review comment: There's always `pants.util.collections.assert_single_element()`! That is what I have used for this exact purpose.
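For reference, the practical difference between the idioms discussed in this thread can be shown in plain Python. The `via_helper` function below is a hypothetical helper written for this sketch; it is not the implementation of `pants.util.collections.assert_single_element()`.

```python
def via_unpack(results):
    # Tuple unpacking: raises ValueError unless there is exactly one item.
    value, = results
    return value


def via_index(results):
    # Indexing: raises IndexError on an empty list, but silently ignores extras.
    return results[0]


def via_helper(results):
    # Hypothetical explicit helper: same guarantee as unpacking, clearer message.
    items = list(results)
    if len(items) != 1:
        raise ValueError('expected exactly one element, got {}'.format(len(items)))
    return items[0]
```

The trailing comma is easy to miss, but it is the only built-in form here that asserts a single result; `[0]` would hide a bug in which more than one product came back.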