Design v2 Target API #4535

stuhood · 2017-05-01T18:18:46Z

The current state of the "target" API that is consumed by @rules is very scattered, mostly due to a need for backward compatibility, and the incremental introduction of v2 to the rest of the codebase.

Now that @console_rules have begun to be used more widely, and this API (that was originally only designed to act as adaptors to allow pantsd to consume @rules for BUILD file parsing) is beginning to be used by end users writing @rules, it's time for another round of design.

A previous bit of design was done in https://docs.google.com/document/d/102EFbwk6cpM9-_4ZSYhMQYA0zL1UKbXzW-DCd8KsFeg/edit?usp=sharing, but some of the considerations there are stale: see comments on this ticket, and example usages of these APIs in the v2 python test runner and in v2 list for more information.

A quick explanation of the status quo as of ~~March 2018~~ ~~April 2019~~ August 2019 ~~February 2020:

None of these APIs are public yet, and they're all going to need to change... they came out of the engine experiment, and they support more features than we need. So we need to figure out which bits to prune.

Currently the hierarchy of construction (bottom to top) is:

HydratedStruct - A Struct that has been deserialized from a BUILD file. Currently this type supports inlining other Structs via address references: see this example ...where because the configurations field is expecting to receive Structs, the inline address reference there is expanded and inlined into
TargetAdaptor - A Struct subclass representing a target. So in the case of expanding a legacy build graph, the above step operates on concrete subclasses of TargetAdaptor. A TargetAdaptor still has sources represented as a PathGlobs object in a SourcesField wrapper: ie, the source globs haven't been expanded.
HydratedTarget - An object containing enough information to actually construct the legacy BuildGraph interface (with an EagerFilesetWithSpec object representing fully expanded and fingerprinted globs). As described on Declare v2 products on v1 Tasks #4769 and in the document linked above, this is not the interface we want to expose to anyone, as it is too granular to avoid expanding sources if the usecase doesn't require them.
LegacyHydratedTarget - a HydratedTarget but with a BuildFileAddress rather than an Address, which is needed for V1 to work properly.

The text was updated successfully, but these errors were encountered:

stuhood · 2017-06-07T19:31:27Z

Relates to #4641: people are running into the partial coverage of the previous target API in 1.3.0.

stuhood · 2018-02-06T19:23:04Z

Relates to #3991 ... it's possible that with v1 build file parsing "gone" in master, that approach might be more feasible than before.

stuhood · 2018-03-29T01:23:21Z

We should reevaluate this post #5580: it's possible that @jsirois's idea to make Structs inheritable/mergeable/etc with typed fields is totally excellent once we have easier @rule APIs.

stuhood · 2018-04-12T02:25:14Z

@jsirois : I had another thought here around how to do extensibility without subclassing (which remained an open question in the doc linked in the description).

I think that a somewhat natural way to constrain the types that are legal in a particular field while still allowing for extension, would be to have "named TypeConstraints unions". As an example: for determining which Configuration/Field types are legal on Structs, rule registration could mutate or aggregate a TypeConstraint called "LegalFieldTypes" (or "legal_field_types", depending). @rules which needed to consume "all legal field types" would use that constraint by name... meanwhile, the @rule graph would still need to have @rules installed that could satisfy each of the unioned members of the named constraint... so if you had a yield Get(HydratedField, LegalFieldTypes, x) call, each member of the LegalFieldTypes union would need to have an @rule that could provide a HydratedField.

This "named type unions" concept could also apply to other plugin extension points: @rule authors would add some possible type to a union, and then add an @rule to satisfy it.

stuhood · 2018-06-16T00:46:51Z

This "named type unions" concept could also apply to other plugin extension points: @rule authors would add some possible type to a union, and then add an @rule to satisfy it.

It occurs to me that in order for consuming rules to consume the output of this type of union, they'd need to be oblivious to the union members... and in order for that to happen and still be useful, the union members would probably all need a shared parent class. So... this might actually look a bit like subtyping with a closed universe of subclasses?

stuhood · 2018-09-09T21:55:23Z

This is related to #6449, because the implementation of dep inference relies on implementers being able to inject new Field types and rules to provide them.

I think that all of the sketches described above have begun to solidify in my mind around "subtyping with a closed universe of subclasses". But that will require a bit of refactoring of how unions work now, and I don't think we should start that until #5788 is completed.

So while it is a bit of a yakshave, I'm going to consider this blocked by #5788.

cosmicexplorer · 2018-09-10T17:00:00Z

I'm probably missing something, but:

This "named type unions" concept could also apply to other plugin extension points: @rule authors would add some possible type to a union, and then add an @rule to satisfy it.

I don't understand what the named union approach provides -- the requirement of a closed universe of possible types of input is already achieved by requiring a unique path from subject to product in the rule graph for every subj/prod pair requested.

It occurs to me that in order for consuming rules to consume the output of this type of union, they'd need to be oblivious to the union members... and in order for that to happen and still be useful, the union members would probably all need a shared parent class. So... this might actually look a bit like subtyping with a closed universe of subclasses?

Rules consuming some product A are already oblivious to the path from the subject to A? if we're thinking about requiring some common parent class anyway, it seems like instead we could make A just a normal datatype containing all of the info that rules consuming A would need, and rely on the fact that there must be a single unique path from subject to A for polymorphism? In the case of A = Field, I would think a rule author can make whatever kind of type F they want to represent their fields, and then define an @rule from F to A.

each member of the LegalFieldTypes union would need to have an @rule that could provide a HydratedField

This seems like exactly the guarantee we already have by requiring a unique path from product to subject. I'm looking at the doc linked up top and I cannot understand what the explicit, closed union concept provides over the implicit closed union that the rule graph already provides.

I have no problem with the idea, I just think requiring a unique path in the rule graph is a very simple model to use and debug and I would prefer to use the polymorphism from that already than require anyone to start subclassing anything. The property "this type subclasses this other type" means "this type provides some known set of methods/attrs" -- it seems like that can be achieved by using a datatype A which provides the methods/attrs you require, and then requiring that there be a unique path from whatever type the rule author can think of to A. I don't see what utility further narrowing that has as of now, it seems to make the model more complex. I could very easily be missing something obvious.

stuhood · 2018-09-10T17:06:28Z

Re-reading my own comments, I think that I flip-flopped a few times between discussing "output" types (like HydratedField) and "input" types (like Field / SourcesField / BundleField). It's not the case that the input types need to be subclasses of any particular type, because the @rule needn't actually consume them, and can just pass them from place to place.

Given that confusion, I think I'll probably refresh the design doc for another round of review.

stuhood · 2018-09-10T17:10:52Z

I don't understand what the named union approach provides -- the requirement of a closed universe of possible types of input is already achieved by requiring a unique path from subject to product in the rule graph for every subj/prod pair requested.

@cosmicexplorer : It provides extensibility into existing @rules that might not be aware of all of the types of a union. Concretely, it allows replacing:

if isinstance(x, SourcesField):
  hf = yield Get(HydratedField, SourcesField, x)
elif isinstance(x, BundleField):
  hf = yield Get(HydratedField, BundleField, x)
else:
  ..

with:

hf = yield Get(HydratedField, FieldsUnionType, x)

The former has a hardcoded list of legal fields, and so you cannot add additional Field types (or @rules to satisfy them) without editing that rule.

stuhood · 2018-09-20T20:52:08Z

I rebased the AggregationRule concept out of #6170 before landing it, because it doesn't quite fit this usecase, but was in the same area. Pushed it over here for reference: 25322b1

stuhood · 2019-09-27T19:52:32Z

EDIT: This question is resolved for now: see #8368!

From a discussion with @benjyw in https://pantsbuild.slack.com/archives/C0D7TNJHL/p1569606120265000, my feeling is that there might be a fundamental decision around how to use @union; do we (either, I think?):

allow for "runtime" typechecks against a @union? IE, give @rules the ability to check whether an object is a member of a @union?
add some sort of "default"ing facility to @union, so that if something isn't a member of the type, a different @rule runs.

The fundamental difference between these choices in a case like

pants/src/python/pants/rules/core/test.py

Lines 71 to 73 in 44d4ab5

    
           # NB: This has the effect of "casting" a TargetAdaptor to a member of the TestTarget union. If the 
        
           # TargetAdaptor is not a member of the union, it will fail at runtime with a useful error message. 
        
           result = yield Get(TestResult, TestTarget, target.adaptor)

is that the former would mean "don't request a TestResult if we can't test something", and the latter would mean: "you can always get a TestResult, but sometimes it will be a noop provided by a default implementation".

I'm leaning toward the former, because it's less magical, but it all relates to how strongly typed targets are in general.

benjyw · 2019-09-27T20:22:11Z

Agreed re the former probably being nicer.

illicitonion · 2019-09-30T11:07:24Z

I think we definitely want runtime checks; @console_rules should each have explicit boilerplate to exclude non-matching targets, because they should all probably handle that differently. We can probably provide some helper functions to make it less boilerplatey as the need arises.

cosmicexplorer · 2019-09-30T14:47:22Z

We can probably provide some helper functions to make it less boilerplatey as the need arises.

I think this is the approach we should take for all v2 functionality! It seems to have allowed us to make more progress on this sort of API question recently.

While I would really like to eventually figure out a way to express UnionRule-y relations in a way which somehow doesn't allow for the case where the union relation doesn't exist, since as of right now it seems like there's no clear way to do that, I think choosing specifically to allow the explicit check for now, with an eye to some way to clean that up later, sounds like a great approach.

stuhood · 2019-10-08T17:51:23Z

Closing the loop: #4535 (comment) was resolved in #8368!

stuhood · 2019-10-24T00:27:16Z

This is also related to #7022: it's possible that target definitions (and macros) should themselves be implemented as @rules, which would allow them to declare their dependencies.

stuhood · 2019-10-24T23:24:30Z

Attaching notes from a design discussion today.

The summary of these is that we think that there are roughly two levels to be concerned with:

Parsing: ie, how do I define the universe of legal things that can be parsed from BUILD files into typed Target objects. The notes suggest that:
- we should have two base classes: Field and Target.
  - A Target has a classmember that is a union of its legal Field types (we could maybe generate this synthetically like we do for Goal.Options), and would have a getter target.get(Field) for the instance of a field (or a default? or fields themselves have defaults, so it is infallible?)
  - Field might have a required class method Field.default(Target) that would create a default instance of the Field for that Target type (aka, "default Sources (glob) for PythonLibrary").
Codegen: ie, how do I go from a typed Target object of one type (ie ThriftTarget) to the type that a caller/dependee wants (ie, ScalaTarget or Classpath).

Both of these can be worked on roughly independently: the only issue with starting Codegen before Parsing is that the types that you would consume as input to codegenny-conversions would be our existing awkward placeholder types (adaptors/etc).

Additionally, we thought that dependency inference is likely to "just work" almost regardless of design. And we thought that it was not crystal clear "how many" unions we need to have for the lint/fmt case, and that more experimentation was necessary there.

cosmicexplorer · 2019-10-25T00:27:37Z

we should have two base classes: Field and Target.

I think that conceptually, we could absolutely use a @union for each of these and just create UnionRules. We could even go further, and make each of these into the @dataclasses Field and Target, where "adding a new Field type A" would then mean "defining a type A and writing a @ruleset mapping A -> Field".

In practice, I think a base class just seems like a better API, so I am definitely still behind using that approach. I think all of the above are actually reducible to each other anyway, so I don't think we would be losing anything or introducing any inconsistency by taking this approach now.

Super hype about the result of this discussion today!

### Problem `HydratedTarget` and `TargetAdaptor` currently both duplicate the exact same information in the fields `address` and `dependencies`, i.e. `hydrated_target.address == hydrated_target.adaptor.address` and `hydrated_target.dependencies == hydrated_target.adaptor.dependencies`. Why keep both these types, then? We need `HydratedTarget` as a simple wrapper around `TargetAdaptor` due to the engine being a type-driven API. Sometimes, we want a generic way to refer to a target, regardless of what the target type is, i.e. `HydratedTarget`. Other times, we do care about the target type, such as a `PythonTestsAdaptor` vs. `PythonLibraryAdaptor`. So, we need both types, but duplicating the information in both types makes the distinction between the two less clear—we want to emphasize that `HydratedTarget` is nothing more than a wrapper around `TargetAdaptor` for the purpose of having a generic target type for the engine. Duplicating the fields also complicates the implementation, e.g. making it harder to use `HydratedTarget` in tests. ### Solution Simplify `HydratedTarget` to have only one public* field of `adaptor`. Whenever we need to access the address or dependencies, we can use `hydrated_target.adaptor.address` and `hydrated_target.adaptor.dependencies`. *`HydratedTarget` gets passed around hundreds of times to the engine when constructing the build graph, so it needs to use `__eq__` hundreds of times to check for cache hits. To improve the speed of this, we override `HydratedTarget.__eq__` to short circuit if the addresses do not match. However, using `adaptor.address` instead of a direct field `._address` results in a slight performance hit when running `build-support/bin/mypy.py` (which runs on every target in Pants). Why? `TargetAdaptor.address` is ~10x slower than `HydratedTarget.address` (0.000_000_1 vs. 0.000_001 seconds). So, we add a private field `HydratedTarget._address` to avoid this performance regression when hashing. Performance stays the same - it takes 4.65 seconds to build the build graph when running `build-support/bin/mypy.py`. ### Result The difference between `HydratedTarget` and `TargetAdaptor` is hopefully more explicit. There are no (known) performance regressions for both V1 and V2. It still takes ~4.65 seconds to generate the build graph when running `build-support/bin/mypy.py`. This is sort of prework for #4535 as it simplifies the status quo.

## Problem See #4535 and the [recent design doc](https://docs.google.com/document/d/1nxPdvuzgCPKhTabhfYBN2tbcr85enSR83OpcvdfWXfY/edit). This design implements the main goals of the Target API: 1. Extensibility - add new target types. 2. Extensibility - add new fields to pre-existing target types. 3. Typed fields. * See [Typed Fields](https://docs.google.com/document/d/1nxPdvuzgCPKhTabhfYBN2tbcr85enSR83OpcvdfWXfY/edit#heading=h.ctjckb8e5t03) for a justification of this. 4. All fields are lazily hydrated/validated. 5. Has a utility for filtering based on required fields (see `test_has_fields()`) * This is important to how rules will consume the Target API. `python_test_runner.py` might say something like `if my_tgt.has_fields([PythonSources, Timeout])`. * See [Proposed Design](https://docs.google.com/document/d/1nxPdvuzgCPKhTabhfYBN2tbcr85enSR83OpcvdfWXfY/edit#heading=h.z97og7gj9rvs) for the importance of using Pure Python for this filtering, rather than engine code. 6. Works 100% with MyPy. 7. Nice `repr` and `dir` functions to allow for easy debugging. ## Solution Add `Target` and `Field` types. A `Target` type is a combination of several `Field`s that are valid _together_. Given a target, the two main expected API calls will be: ```python my_tgt.has_fields([PythonSources, Compatibility]) compatibility: Compatibility = my_tgt.get(Compatibility) ``` MyPy understands both of these calls, including `Target.get` thanks to generics. ### Lazy hydration - primitive vs. async fields About 5% of fields require the engine for hydration, e.g. `sources` and `bundles`. The other 95% of fields, though, are nothing more than a `str`, `List[str]`, `bool`, etc. We do not want to complicate accessing the majority of fields. While some field values will need to be accessed via `await Get`, the majority should not be this complex. So, we introduce `PrimitiveField` and `AsyncField` to distinguish between the two. Thanks to MyPy and autocomplete, rule authors will quickly know when working with an `AsyncField` that they cannot call `my_field.value` like they normally could. ### Extensibility - add new fields on pre-existing targets This is a new feature that we want as a better version of `tags`. To add a new field, plugin authors simply need to create the new field then register, for example, `UnionRule(PythonLibrary, TypeChecked)`. The Target constructor will recognize this new field and treat it as first-class. Core Pants code can safely ignore the field, while plugins can now use the new field. ## Followups 1. Add a way to go from `HydratedStruct -> Target`, so that we can actually use this with production code. 2. Add syntactic sugar for declaring new fields, e.g. add `BoolField` and `StringField`. 3. Add a mechanism to actually register new target types and their aliases, e.g. a new backend registration hook ```python def targets() -> Sequence[Type[Target]]: return (PythonLibrary, PythonTests) ``` 4. Add implementations for Python targets as a real-world use case. 5. Use the new Python target types in some rules like Black and Isort. 6. Allow going from a V2 `Target` to a V1 `Target`. 7. Improve error messages for invalid fields and targets, e.g. point to the actual target upon failure. 8. Remove `TargetAdaptor`.

Eric-Arellano · 2020-05-18T06:14:18Z

Closing now that the Target API has been in usage for about a month, including for codegen.

There's a bit of remaining work:

Finish documentation at https://pants.readme.io/docs/target-api.
Deprecate TargetAdaptor in favor of the Target API.

But the actual API is implemented and used in the wild.

stuhood added the engine label May 1, 2017

stuhood mentioned this issue May 1, 2017

Design strategy for externalizing Target fields #3560

Closed

stuhood added this to the 1.4.0 milestone Jun 7, 2017

stuhood mentioned this issue Jun 7, 2017

Broken SourcesField support for globs #4641

Closed

stuhood mentioned this issue Jun 7, 2017

Design unification of macros/CAOFs/struct-inheritance #3561

Closed

stuhood removed this from the 1.4.x milestone Jan 12, 2018

stuhood mentioned this issue Feb 5, 2018

Port the python pipeline to the v2 engine #5440

Closed

This was referenced Mar 11, 2018

Further --changed optimization #5579

Merged

make Target cacheable in the ProductGraph #3991

Closed

This was referenced May 21, 2018

Remove the BuildGraph reference from Target #5668

Closed

WIP: SnapshotBackedEagerFilesetWithSpec #5856

Closed

This was referenced Jun 5, 2018

Expose Target's sources attribute as a Snapshot to Tasks #5762

Closed

source attribute is automatically promoted to sources #5908

Merged

stuhood mentioned this issue Jun 19, 2018

Deprecate legacy default sources API #5987

Closed

stuhood mentioned this issue Aug 9, 2018

allow lambdas in @rules to perform Get requests somehow #5933

Closed

This was referenced Sep 9, 2018

Specifying deps outside of BUILD files (aka "dep inference") #6449

Closed

Add support for merging products produced by multiple promises #2526

Closed

Merge subjects and variants into Params, and remove Noop #6170

Merged

stuhood added the P3 - M3 label Sep 28, 2018

stuhood mentioned this issue Sep 16, 2019

Extract generalized rule to strip source roots #8284

Closed

stuhood mentioned this issue Sep 26, 2019

UnionRules don't sufficiently model shared superclasses #8343

Closed

illicitonion mentioned this issue Sep 27, 2019

Add black support as a v2 rule #8350

Merged

stuhood mentioned this issue Sep 27, 2019

Provide a way to get the closure (i.e. all flattened targets) for a TransitiveHydratedTarget #7726

Closed

illicitonion mentioned this issue Oct 8, 2019

Start creating binary goal with support for python #8399

Merged

stuhood mentioned this issue Oct 22, 2019

Use UnionMembership to reduce coupling in fmt and lint v2 rules #8490

Closed

stuhood mentioned this issue Oct 30, 2019

Add lint-v2 implementation for python with black, and script to be used in pre-commit hook #8553

Merged

cosmicexplorer mentioned this issue Dec 4, 2019

./pants --v2 export-v2 [HACKY PROTOTYPE -- NOT FOR MERGING] #8760

Closed

stuhood mentioned this issue Jan 24, 2020

Coverage #8978

Closed

cosmicexplorer mentioned this issue Feb 4, 2020

vague timeline for the v2 jvm backend #9064

Closed

10 tasks

stuhood mentioned this issue Feb 11, 2020

Replace BuildFileAddress with Address for HydratedTarget and TargetAdaptor #9100

Merged

Eric-Arellano mentioned this issue Feb 12, 2020

Respect precise file arguments with V2 ./pants test #9120

Merged

This was referenced Feb 20, 2020

[WIP] create a 'target_adaptors' entry in v2 plugins and backends #9153

Closed

add type-checked get_field() method to TargetAdaptor and consume v1 target payloads #9188

Closed

Eric-Arellano mentioned this issue Mar 3, 2020

Deduplicate HydratedTarget and TargetAdaptor #9219

Merged

Eric-Arellano mentioned this issue Mar 12, 2020

Add core abstractions for the V2 Target API #9284

Merged

stuhood assigned Eric-Arellano Apr 29, 2020

stuhood mentioned this issue Apr 29, 2020

Target Payload validation needs to be tightened / implemented #146

Closed

Eric-Arellano closed this as completed May 18, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Design v2 Target API #4535

Design v2 Target API #4535

stuhood commented May 1, 2017 •

edited by Eric-Arellano

Loading

stuhood commented Jun 7, 2017

stuhood commented Feb 6, 2018

stuhood commented Mar 29, 2018

stuhood commented Apr 12, 2018

stuhood commented Jun 16, 2018 •

edited

Loading

stuhood commented Sep 9, 2018

cosmicexplorer commented Sep 10, 2018

stuhood commented Sep 10, 2018

stuhood commented Sep 10, 2018

stuhood commented Sep 20, 2018

stuhood commented Sep 27, 2019 •

edited

Loading

benjyw commented Sep 27, 2019

illicitonion commented Sep 30, 2019

cosmicexplorer commented Sep 30, 2019 •

edited

Loading

stuhood commented Oct 8, 2019

stuhood commented Oct 24, 2019

stuhood commented Oct 24, 2019 •

edited

Loading

cosmicexplorer commented Oct 25, 2019 •

edited

Loading

Eric-Arellano commented May 18, 2020

Design v2 Target API #4535

Design v2 Target API #4535

Comments

stuhood commented May 1, 2017 • edited by Eric-Arellano Loading

stuhood commented Jun 7, 2017

stuhood commented Feb 6, 2018

stuhood commented Mar 29, 2018

stuhood commented Apr 12, 2018

stuhood commented Jun 16, 2018 • edited Loading

stuhood commented Sep 9, 2018

cosmicexplorer commented Sep 10, 2018

stuhood commented Sep 10, 2018

stuhood commented Sep 10, 2018

stuhood commented Sep 20, 2018

stuhood commented Sep 27, 2019 • edited Loading

benjyw commented Sep 27, 2019

illicitonion commented Sep 30, 2019

cosmicexplorer commented Sep 30, 2019 • edited Loading

stuhood commented Oct 8, 2019

stuhood commented Oct 24, 2019

stuhood commented Oct 24, 2019 • edited Loading

cosmicexplorer commented Oct 25, 2019 • edited Loading

Eric-Arellano commented May 18, 2020

stuhood commented May 1, 2017 •

edited by Eric-Arellano

Loading

stuhood commented Jun 16, 2018 •

edited

Loading

stuhood commented Sep 27, 2019 •

edited

Loading

cosmicexplorer commented Sep 30, 2019 •

edited

Loading

stuhood commented Oct 24, 2019 •

edited

Loading

cosmicexplorer commented Oct 25, 2019 •

edited

Loading