Update documentation for @rule.

pantsbuild · Mar 31, 2018 · 56dd7bd · 56dd7bd
1 parent 390675a
commit 56dd7bd
Showing 1 changed file with 121 additions and 105 deletions.
diff --git a/src/python/pants/engine/README.md b/src/python/pants/engine/README.md
@@ -1,145 +1,138 @@
-# The New Engine
+# The (New) Engine
 
-## Scheduling
+## API
 
-In the current RoundEngine, work is scheduled and then later performed via the `Task` interface. In
-the new engine execution occurs via simple functions, with inputs selected via an input
-selection clause made up of `Selector` objects (described later).
+The end user API for the engine is based on the registration of `@rules`, which are functions
+or coroutines with statically declared inputs and outputs. A Pants (plugin) developer can write
+and install additional `@rule`s to extend the functionality of Pants.
 
-## History
-
-The need for an engine that could schedule all work as a result of linking required products to
-their producers in multiple rounds was identified sometime in the middle of 2013 as a result of
-new requirements on the `IdeaGen` task forced by the use of pants in the Twitter birdcage repo.  The
-design document for this "RoundEngine" is
-[here](https://docs.google.com/document/d/1MwOFcr4W6KbzPdbaj_ntJ36a0NRoiKyWLed0ziobsr4/edit#heading=h.rsohbvtm7zng).
-Some work was completed along these lines and an initial version of the `RoundEngine` was
-integrated into the pants mainline and is used today.
+The set of installed `@rule`s is statically checked as a closed world: this compilation step occurs
+on startup, and identifies all unreachable or unsatisfiable rules before execution begins. This
+allows most composition errors to be detected immediately, and also provides for easy introspection
+of the build. To inspect the set of rules that are installed and which product types can be
+computed, you can pass the `--native-engine-visualize-to=$dir` flag, which will write out a graph
+of reachable `@rules`.
 
-Work stalled on the later phases of the `RoundEngine` and talks re-booted about the future of the
-`RoundEngine`.  Foursquare folks had been thinking about general problems with the `RoundEngine` as
-it stood and proposed the idea of a "tuple-engine".  With some license taken in representation, this
-idea took the `RoundEngine` to the extreme of generating a round for each target-task pair.  The
-pair formed the tuple of schedulable work and this concept combined with others to form the design
-[here][tuple-design].
+Once the engine is instantiated with a valid set of `@rules`, a caller can synchronously request
+computation of any of the product types provided by those `@rules` by calling:
 
-Meanwhile, need for fine-grained parallelism was acute to help speed up jvm compilation, especially
-in the context of scala and mixed scala & java builds.  Twitter spiked on a project to implement
-a target-level scheduling system scoped to just the jvm compilation tasks.  This bore fruit and
-served as further impetus to get a "tuple-engine" designed and constructed to bring the benefits
-seen in the jvm compilers to the wider pants world of tasks.
-
-### API
-
-#### End User API
-
-The end user API for the engine is based on the registration of `Rules`, which are made up of:
-
-1. a `Product` or return type of a function,
-2. a list of dependency `Selectors` which match inputs to the function,
-3. the function itself.
-
-A `Rule` fully declares the inputs and outputs for its function: there is no imperative API for
-requesting additional inputs during execution of a function. While a tight constraint,
-this has the advantage of forcing decomposition of work into functions which are loosely
-coupled by only the types of their inputs and outputs, and which are naturally isolated, cacheable,
-and parallelizable.
-
-A function is guaranteed to execute only when all of its inputs are ready for use. The Scheduler
-considers executing a Rule when it determines that it needs to produce the declared
-output `Product` type of that function for a particular `Subject`. But the Scheduler will only
-actually run a Rule if it is able to (recursively) find sources for each of the
-function's inputs.
+```python
+# Request a ThingINeed (a `Product`) for the thing_i_have (a `Subject`).
+thing_i_need, = scheduler.product_request(ThingINeed, [thing_i_have])
+```
 
-See below for more information on `Products`, `Subjects`, and `Selectors`.
+The engine then takes care of concurrently executing all dependencies of the matched `@rules` to
+produce the requested value.
 
-#### Internal API
+### Products and Subjects
 
-Internally, the `Scheduler` uses end user `Rules` to create private `Node` objects and
-build a `Graph` of futures that links them to their dependency Nodes. A Node represents a unique
-computation and the data for a Node implicitly acts as its own key/identity.
+The return value of an `@rule` for a particular `Subject` is known as a `Product`. At some level, you
+can think of (`subject_value`, `product_type`) as a "key" that uniquely identifies a particular
+Product value. The engine executes your `@rules` in order to (recursively) compute a Product of the
+requested type for a given Subject.
 
-To compute a value for a Node, the Scheduler uses the `Node.run` method starting from requested
-roots. If a Node needs more inputs, it requests them via `Context.get`, which will declare a
-dependency, and memoize the computation represented by the `Node`.
+This recursive type search leads to a very loosely coupled (and yet still statically checked) form
+of dependency injection.
 
-The initial Nodes are [launched by the scheduler](https://github.com/pantsbuild/pants/blob/16d43a06ba3751e22fdc7f69f009faeb59a33930/src/rust/engine/src/scheduler.rs#L116-L126),
-but the rest of the scheduling is driven by Nodes recursively calling `Context.get` to request
-dependencies.
+#### Example
 
-### Products and Subjects
+As a very simple example, you might register the following `@rule` that can compute a `String`
+Product given a single `Int` input.
 
-A `Product` is a strongly typed value specific to a particular `Subject`. End user Rules execute
-in order to (recursively) compute a Product for a Subject: as a very simple example, one might
-register the following Rule that can compute a `String` Product given a single `Int` input
-by calling the `str` function:
+```python
+@rule(StringType, [Select(IntType)])
+def int_to_str(an_int):
+  return '{}'.format(an_int)
+```
 
-    @rule(StringType, [Select(IntType)])
-    def int_to_str(an_int):
-      return str(an_int)
+The first argument to the `@rule` decorator is the Product (ie, return) type for the @rule. The
+second argument is a list of `Selectors` that declare the types of the input arguments to the
+`@rule`. In this case, because the Product type is `StringType` and there is one `Selector`
+(`Select(IntType)`), this `@rule` represents a conversion from `IntType` to `StrType`, with no
+other inputs.
 
-When the Scheduler wants to decide whether it can use this Rule to create a string for a
+When the engine statically checks whether it can use this `@rule` to create a string for a
 Subject, it will first see whether there are any ways to get an IntType for that Subject. If
-the subject is already of `type(subject) == IntType`, then the Rule will be able to
-execute immediately. On the other hand, if the type _doesn't_ match, the Scheduler doesn't give up:
-it will next look for any other registered Rules that can compute an IntType Product for the
+the subject is already of `type(subject) == IntType`, then the @rule will be satisfiable without
+any other depenencies. On the other hand, if the type _doesn't_ match, the engine doesn't give up:
+it will next look for any other registered @rules that can compute an IntType Product for the
 Subject (and so on, recursively.)
 
-This recursive type search leads to some very interesting (and, admittedly, somewhat "magical")
-properties. If there is any path through the Rule graph that allows for conversion
-from one type to another, it will be found and executed.
+In practical use, using basic types like `StringType` or `IntType` does not provide enough
+information to disambiguate between various types of data: So declaring small `datatype`
+definitions to provide a unique and descriptive type is strongly recommended:
+
+```python
+class FormattedInt(datatype('FormattedInt', ['content'])): pass
 
-### Selectors
+@rule(FormattedInt, [Select(IntType)])
+def int_to_str(an_int):
+  return FormattedInt('{}'.format(an_int))
+```
+
+### Selectors and Gets
 
-As demonstrated above, the `Selector` classes select function inputs in the context of a particular
-`Subject` (and its `Variants`: discussed below). For example, it might select a `Product` for the given
-Subject (`Select`), or for other Subject(s) selected from fields of a Product (`SelectDependencies`,
-`SelectProjection`).
+As demonstrated above, the `Selector` classes select `@rule` inputs in the context of a particular
+`Subject` (and its `Variants`: discussed below). But it is frequently necessary to "change" the
+subject and request products for subjects other than the one that the `@rule` is running for.
 
-One very important thing to keep in mind is that Selectors like `SelectDependencies` and `SelectProjection`
-"change" the Subject within a particular subgraph. For example, `SelectDependencies`
-results in new subgraphs for each Subject in a list of values that was computed for some original Subject.
-Concretely, a Rule could use SelectDependencies to select FileContent for each entry in a Files list,
-and then concatentate that content into a string:
+In cases where this is necessary, `@rule`s may be written as coroutines (ie, using the python
+`yield` statement) that yield "`Get` requests" that request products for other subjects. Just like
+`@rule` parameter Selectors, `Get` requests instatiated in the body of an `@rule` are statically
+checked to be satisfiable in the set of installed `@rules`.
 
-    @rule(StringType, [SelectDependencies(FileContent, Files)])
-    def concat(file_content_list):
-      return ''.join(fc.content for fc in file_content_list)
+#### Example
 
-This Rule declares that: "for any Subject for which we can compute a 'Files' object, we can also
-compute a StringType". Each subgraph will contain an attempt to get FileContent for a different
-File Subject from the Files list.
+For example, you could declare an `@rule` that requests FileContent for each entry in a Files list,
+and then concatentates that content into a (typed) string:
 
-In practical use, using `StringType` or `IntType` directly would probably not provide enough information
-to disambiguate between various types of data: So declaring small `datatype` definitions to provide
-a unique and descriptive type is strongly recommended:
+```python
+@rule(ConcattedFiles, [Select(Files)])
+def concat(files):
+  file_content_list = yield [Get(FileContent, File(f)) for f in files]
+  yield ConcattedFiles(''.join(fc.content for fc in file_content_list))
+```
 
-    class ConcattedFiles(datatype('ConcattedFiles', ['content'])):
-      pass
+This @rule declares that: "for any Subject for which we can compute `Files`, we can also compute
+`ConcattedFiles`". Each yielded `Get` request results in FileContent for a different File Subject
+from the Files list.
 
 ### Variants
 
-Certain Rules will also need parameters provided by their dependents in order to tailor their output
-Products to their consumers.  For example, a javac planner might need to know
-the version of the java platform for a given dependent binary target (say Java 6), or an ivy Rule
-might need to identify a globally consistent ivy resolve for a test target.  To allow for this the
-engine introduces the concept of `variants`, which are passed recursively from dependents to
-dependencies.
+Certain @rules will also need parameters provided by their dependents in order to tailor their output
+Products to their consumers.  For example, a javac `@rule` might need to know the version of the java
+platform for a given dependent binary target (say Java 9), or an ivy @rule might need to identify a
+globally consistent ivy resolve for a test target.  To allow for this the engine introduces the
+concept of `Variants`, which are passed recursively from dependents to dependencies.
 
 If a Rule uses a `SelectVariants` Selector to indicate that a variant is required, consumers can use
 a `@[type]=[name]` address syntax extension to pass a variant that matches a particular configuration
-for a Rule. A dependency declared as `src/java/com/example/lib:lib` specifies no particular variant, but
+for a `@rule`. A dependency declared as `src/java/com/example/lib:lib` specifies no particular variant, but
 `src/java/com/example/lib:lib@java=java8` asks for the configured variant of the lib named "java8".
 
-Additionally, it is possible to specify the "default" variants for an Address by installing a Rule
-function that can provide `Variants(default=..)`. Again, since the purpose of variants is to collect
+Additionally, it is possible to specify the "default" variants for an Address by installing an @rule
+function that can provide `Variants(default=..)`. Since the purpose of variants is to collect
 information from dependents, only default variant values which have not been set by a dependent
 will be used.
 
+## Internal API
+
+Internally, the engine uses end user `@rules` to create private `Node` objects and
+build a `Graph` of futures that links them to their dependency Nodes. A Node represents a unique
+computation and the data for a Node implicitly acts as its own key/identity.
+
+To compute a value for a Node, the engine uses the `Node.run` method starting from requested
+roots. If a Node needs more inputs, it requests them via `Context.get`, which will declare a
+dependency, and memoize the computation represented by the requested `Node`.
+
+The initial Nodes are [launched by the engine](https://github.com/pantsbuild/pants/blob/16d43a06ba3751e22fdc7f69f009faeb59a33930/src/rust/engine/src/scheduler.rs#L116-L126),
+but the rest of execution is driven by Nodes recursively calling `Context.get` to request their
+dependencies.
+
 ## Execution
 
-The Scheduler executes work concurrently wherever possible; to help visualize executions, a visualization
-tool is provided that, after executing a `ProductGraph`, generates a `dot` file that can be rendered using
+The engine executes work concurrently wherever possible; to help visualize executions, a visualization
+tool is provided that, after executing a `Graph`, generates a `dot` file that can be rendered using
 Graphviz:
 
 ```console
@@ -159,3 +152,26 @@ class. This hash is maintained by `build-support/bin/native/bootstrap.sh` and
 output to the `native_engine_version` file in this directory. Any modification
 to this resource file's location will need adjustments in
 `build-support/bin/native/bootstrap.sh` to ensure the linking continues to work.
+
+## History
+
+The need for an engine that could schedule all work as a result of linking required products to
+their producers in multiple rounds was identified sometime in the middle of 2013 as a result of
+new requirements on the `IdeaGen` task forced by the use of pants in the Twitter birdcage repo.  The
+design document for this "RoundEngine" is
+[here](https://docs.google.com/document/d/1MwOFcr4W6KbzPdbaj_ntJ36a0NRoiKyWLed0ziobsr4/edit#heading=h.rsohbvtm7zng).
+Some work was completed along these lines and an initial version of the `RoundEngine` was
+integrated into the pants mainline and is used today.
+
+Work stalled on the later phases of the `RoundEngine` and talks re-booted about the future of the
+`RoundEngine`.  Foursquare folks had been thinking about general problems with the `RoundEngine` as
+it stood and proposed the idea of a "tuple-engine".  With some license taken in representation, this
+idea took the `RoundEngine` to the extreme of generating a round for each target-task pair.  The
+pair formed the tuple of schedulable work and this concept combined with others to form the design
+[here][tuple-design].
+
+Meanwhile, need for fine-grained parallelism was acute to help speed up jvm compilation, especially
+in the context of scala and mixed scala & java builds.  Twitter spiked on a project to implement
+a target-level scheduling system scoped to just the jvm compilation tasks.  This bore fruit and
+served as further impetus to get a "tuple-engine" designed and constructed to bring the benefits
+seen in the jvm compilers to the wider pants world of tasks.