Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design v2 Target API #4535

Closed
stuhood opened this issue May 1, 2017 · 32 comments · Fixed by #7116
Closed

Design v2 Target API #4535

stuhood opened this issue May 1, 2017 · 32 comments · Fixed by #7116
Assignees

Comments

@stuhood
Copy link
Member

stuhood commented May 1, 2017

The current state of the "target" API that is consumed by @rules is very scattered, mostly due to a need for backward compatibility, and the incremental introduction of v2 to the rest of the codebase.

Now that @console_rules have begun to be used more widely, and this API (that was originally only designed to act as adaptors to allow pantsd to consume @rules for BUILD file parsing) is beginning to be used by end users writing @rules, it's time for another round of design.

A previous bit of design was done in https://docs.google.com/document/d/102EFbwk6cpM9-_4ZSYhMQYA0zL1UKbXzW-DCd8KsFeg/edit?usp=sharing, but some of the considerations there are stale: see comments on this ticket, and example usages of these APIs in the v2 python test runner and in v2 list for more information.


A quick explanation of the status quo as of March 2018 April 2019 August 2019 ~~February 2020:

None of these APIs are public yet, and they're all going to need to change... they came out of the engine experiment, and they support more features than we need. So we need to figure out which bits to prune.

Currently the hierarchy of construction (bottom to top) is:

  • HydratedStruct - A Struct that has been deserialized from a BUILD file. Currently this type supports inlining other Structs via address references: see this example ...where because the configurations field is expecting to receive Structs, the inline address reference there is expanded and inlined into
  • TargetAdaptor - A Struct subclass representing a target. So in the case of expanding a legacy build graph, the above step operates on concrete subclasses of TargetAdaptor. A TargetAdaptor still has sources represented as a PathGlobs object in a SourcesField wrapper: ie, the source globs haven't been expanded.
  • HydratedTarget - An object containing enough information to actually construct the legacy BuildGraph interface (with an EagerFilesetWithSpec object representing fully expanded and fingerprinted globs). As described on Declare v2 products on v1 Tasks #4769 and in the document linked above, this is not the interface we want to expose to anyone, as it is too granular to avoid expanding sources if the usecase doesn't require them.
  • LegacyHydratedTarget - a HydratedTarget but with a BuildFileAddress rather than an Address, which is needed for V1 to work properly.
@stuhood
Copy link
Member Author

stuhood commented Jun 7, 2017

Relates to #4641: people are running into the partial coverage of the previous target API in 1.3.0.

@stuhood
Copy link
Member Author

stuhood commented Feb 6, 2018

Relates to #3991 ... it's possible that with v1 build file parsing "gone" in master, that approach might be more feasible than before.

@stuhood
Copy link
Member Author

stuhood commented Mar 29, 2018

We should reevaluate this post #5580: it's possible that @jsirois's idea to make Structs inheritable/mergeable/etc with typed fields is totally excellent once we have easier @rule APIs.

@stuhood
Copy link
Member Author

stuhood commented Apr 12, 2018

@jsirois : I had another thought here around how to do extensibility without subclassing (which remained an open question in the doc linked in the description).

I think that a somewhat natural way to constrain the types that are legal in a particular field while still allowing for extension, would be to have "named TypeConstraints unions". As an example: for determining which Configuration/Field types are legal on Structs, rule registration could mutate or aggregate a TypeConstraint called "LegalFieldTypes" (or "legal_field_types", depending). @rules which needed to consume "all legal field types" would use that constraint by name... meanwhile, the @rule graph would still need to have @rules installed that could satisfy each of the unioned members of the named constraint... so if you had a yield Get(HydratedField, LegalFieldTypes, x) call, each member of the LegalFieldTypes union would need to have an @rule that could provide a HydratedField.

This "named type unions" concept could also apply to other plugin extension points: @rule authors would add some possible type to a union, and then add an @rule to satisfy it.

@stuhood
Copy link
Member Author

stuhood commented Jun 16, 2018

This "named type unions" concept could also apply to other plugin extension points: @rule authors would add some possible type to a union, and then add an @rule to satisfy it.

It occurs to me that in order for consuming rules to consume the output of this type of union, they'd need to be oblivious to the union members... and in order for that to happen and still be useful, the union members would probably all need a shared parent class. So... this might actually look a bit like subtyping with a closed universe of subclasses?

@stuhood
Copy link
Member Author

stuhood commented Sep 9, 2018

This is related to #6449, because the implementation of dep inference relies on implementers being able to inject new Field types and rules to provide them.

I think that all of the sketches described above have begun to solidify in my mind around "subtyping with a closed universe of subclasses". But that will require a bit of refactoring of how unions work now, and I don't think we should start that until #5788 is completed.

So while it is a bit of a yakshave, I'm going to consider this blocked by #5788.

@cosmicexplorer
Copy link
Contributor

I'm probably missing something, but:

This "named type unions" concept could also apply to other plugin extension points: @rule authors would add some possible type to a union, and then add an @rule to satisfy it.

I don't understand what the named union approach provides -- the requirement of a closed universe of possible types of input is already achieved by requiring a unique path from subject to product in the rule graph for every subj/prod pair requested.

It occurs to me that in order for consuming rules to consume the output of this type of union, they'd need to be oblivious to the union members... and in order for that to happen and still be useful, the union members would probably all need a shared parent class. So... this might actually look a bit like subtyping with a closed universe of subclasses?

Rules consuming some product A are already oblivious to the path from the subject to A? if we're thinking about requiring some common parent class anyway, it seems like instead we could make A just a normal datatype containing all of the info that rules consuming A would need, and rely on the fact that there must be a single unique path from subject to A for polymorphism? In the case of A = Field, I would think a rule author can make whatever kind of type F they want to represent their fields, and then define an @rule from F to A.

each member of the LegalFieldTypes union would need to have an @rule that could provide a HydratedField

This seems like exactly the guarantee we already have by requiring a unique path from product to subject. I'm looking at the doc linked up top and I cannot understand what the explicit, closed union concept provides over the implicit closed union that the rule graph already provides.

I have no problem with the idea, I just think requiring a unique path in the rule graph is a very simple model to use and debug and I would prefer to use the polymorphism from that already than require anyone to start subclassing anything. The property "this type subclasses this other type" means "this type provides some known set of methods/attrs" -- it seems like that can be achieved by using a datatype A which provides the methods/attrs you require, and then requiring that there be a unique path from whatever type the rule author can think of to A. I don't see what utility further narrowing that has as of now, it seems to make the model more complex. I could very easily be missing something obvious.

@stuhood
Copy link
Member Author

stuhood commented Sep 10, 2018

Re-reading my own comments, I think that I flip-flopped a few times between discussing "output" types (like HydratedField) and "input" types (like Field / SourcesField / BundleField). It's not the case that the input types need to be subclasses of any particular type, because the @rule needn't actually consume them, and can just pass them from place to place.

Given that confusion, I think I'll probably refresh the design doc for another round of review.

@stuhood
Copy link
Member Author

stuhood commented Sep 10, 2018

I don't understand what the named union approach provides -- the requirement of a closed universe of possible types of input is already achieved by requiring a unique path from subject to product in the rule graph for every subj/prod pair requested.

@cosmicexplorer : It provides extensibility into existing @rules that might not be aware of all of the types of a union. Concretely, it allows replacing:

if isinstance(x, SourcesField):
  hf = yield Get(HydratedField, SourcesField, x)
elif isinstance(x, BundleField):
  hf = yield Get(HydratedField, BundleField, x)
else:
  ..

with:

hf = yield Get(HydratedField, FieldsUnionType, x)

The former has a hardcoded list of legal fields, and so you cannot add additional Field types (or @rules to satisfy them) without editing that rule.

@stuhood
Copy link
Member Author

stuhood commented Sep 20, 2018

I rebased the AggregationRule concept out of #6170 before landing it, because it doesn't quite fit this usecase, but was in the same area. Pushed it over here for reference: 25322b1

@stuhood
Copy link
Member Author

stuhood commented Sep 27, 2019

EDIT: This question is resolved for now: see #8368!


From a discussion with @benjyw in https://pantsbuild.slack.com/archives/C0D7TNJHL/p1569606120265000, my feeling is that there might be a fundamental decision around how to use @union; do we (either, I think?):

  1. allow for "runtime" typechecks against a @union? IE, give @rules the ability to check whether an object is a member of a @union?
  2. add some sort of "default"ing facility to @union, so that if something isn't a member of the type, a different @rule runs.

The fundamental difference between these choices in a case like

# NB: This has the effect of "casting" a TargetAdaptor to a member of the TestTarget union. If the
# TargetAdaptor is not a member of the union, it will fail at runtime with a useful error message.
result = yield Get(TestResult, TestTarget, target.adaptor)
is that the former would mean "don't request a TestResult if we can't test something", and the latter would mean: "you can always get a TestResult, but sometimes it will be a noop provided by a default implementation".

I'm leaning toward the former, because it's less magical, but it all relates to how strongly typed targets are in general.

@benjyw
Copy link
Contributor

benjyw commented Sep 27, 2019

Agreed re the former probably being nicer.

@illicitonion
Copy link
Contributor

I think we definitely want runtime checks; @console_rules should each have explicit boilerplate to exclude non-matching targets, because they should all probably handle that differently. We can probably provide some helper functions to make it less boilerplatey as the need arises.

@cosmicexplorer
Copy link
Contributor

cosmicexplorer commented Sep 30, 2019

We can probably provide some helper functions to make it less boilerplatey as the need arises.

I think this is the approach we should take for all v2 functionality! It seems to have allowed us to make more progress on this sort of API question recently.

While I would really like to eventually figure out a way to express UnionRule-y relations in a way which somehow doesn't allow for the case where the union relation doesn't exist, since as of right now it seems like there's no clear way to do that, I think choosing specifically to allow the explicit check for now, with an eye to some way to clean that up later, sounds like a great approach.

@stuhood
Copy link
Member Author

stuhood commented Oct 8, 2019

Closing the loop: #4535 (comment) was resolved in #8368!

@stuhood
Copy link
Member Author

stuhood commented Oct 24, 2019

This is also related to #7022: it's possible that target definitions (and macros) should themselves be implemented as @rules, which would allow them to declare their dependencies.

@stuhood
Copy link
Member Author

stuhood commented Oct 24, 2019

Attaching notes from a design discussion today.

The summary of these is that we think that there are roughly two levels to be concerned with:

  1. Parsing: ie, how do I define the universe of legal things that can be parsed from BUILD files into typed Target objects. The notes suggest that:
    • we should have two base classes: Field and Target.
      • A Target has a classmember that is a union of its legal Field types (we could maybe generate this synthetically like we do for Goal.Options), and would have a getter target.get(Field) for the instance of a field (or a default? or fields themselves have defaults, so it is infallible?)
      • Field might have a required class method Field.default(Target) that would create a default instance of the Field for that Target type (aka, "default Sources (glob) for PythonLibrary").
  2. Codegen: ie, how do I go from a typed Target object of one type (ie ThriftTarget) to the type that a caller/dependee wants (ie, ScalaTarget or Classpath).

Both of these can be worked on roughly independently: the only issue with starting Codegen before Parsing is that the types that you would consume as input to codegenny-conversions would be our existing awkward placeholder types (adaptors/etc).

Additionally, we thought that dependency inference is likely to "just work" almost regardless of design. And we thought that it was not crystal clear "how many" unions we need to have for the lint/fmt case, and that more experimentation was necessary there.

IMG_20191024_144904

@cosmicexplorer
Copy link
Contributor

cosmicexplorer commented Oct 25, 2019

we should have two base classes: Field and Target.

I think that conceptually, we could absolutely use a @union for each of these and just create UnionRules. We could even go further, and make each of these into the @dataclasses Field and Target, where "adding a new Field type A" would then mean "defining a type A and writing a @ruleset mapping A -> Field".

In practice, I think a base class just seems like a better API, so I am definitely still behind using that approach. I think all of the above are actually reducible to each other anyway, so I don't think we would be losing anything or introducing any inconsistency by taking this approach now.

Super hype about the result of this discussion today!

@stuhood stuhood mentioned this issue Jan 24, 2020
Eric-Arellano added a commit that referenced this issue Mar 4, 2020
### Problem

`HydratedTarget` and `TargetAdaptor` currently both duplicate the exact same information in the fields `address` and `dependencies`, i.e. `hydrated_target.address == hydrated_target.adaptor.address` and `hydrated_target.dependencies == hydrated_target.adaptor.dependencies`.

Why keep both these types, then? We need `HydratedTarget` as a simple wrapper around `TargetAdaptor` due to the engine being a type-driven API. Sometimes, we want a generic way to refer to a target, regardless of what the target type is, i.e. `HydratedTarget`. Other times, we do care about the target type, such as a `PythonTestsAdaptor` vs. `PythonLibraryAdaptor`.

So, we need both types, but duplicating the information in both types makes the distinction between the two less clear—we want to emphasize that `HydratedTarget` is nothing more than a wrapper around `TargetAdaptor` for the purpose of having a generic target type for the engine. Duplicating the fields also complicates the implementation, e.g. making it harder to use `HydratedTarget` in tests.

### Solution

Simplify `HydratedTarget` to have only one public* field of `adaptor`. Whenever we need to access the address or dependencies, we can use `hydrated_target.adaptor.address` and `hydrated_target.adaptor.dependencies`.

*`HydratedTarget` gets passed around hundreds of times to the engine when constructing the build graph, so it needs to use `__eq__` hundreds of times to check for cache hits. To improve the speed of this, we override `HydratedTarget.__eq__` to short circuit if the addresses do not match. However, using `adaptor.address` instead of a direct field `._address` results in a slight performance hit when running `build-support/bin/mypy.py` (which runs on every target in Pants). Why? `TargetAdaptor.address` is ~10x slower than `HydratedTarget.address` (0.000_000_1 vs. 0.000_001 seconds). So, we add a private field `HydratedTarget._address` to avoid this performance regression when hashing. Performance stays the same - it takes 4.65 seconds to build the build graph when running `build-support/bin/mypy.py`.

### Result

The difference between `HydratedTarget` and `TargetAdaptor` is hopefully more explicit.

There are no (known) performance regressions for both V1 and V2. It still takes ~4.65 seconds to generate the build graph when running `build-support/bin/mypy.py`.

This is sort of prework for #4535 as it simplifies the status quo.
Eric-Arellano added a commit that referenced this issue Mar 13, 2020
## Problem

See #4535 and the [recent design doc](https://docs.google.com/document/d/1nxPdvuzgCPKhTabhfYBN2tbcr85enSR83OpcvdfWXfY/edit).

This design implements the main goals of the Target API:

1. Extensibility - add new target types.
2. Extensibility - add new fields to pre-existing target types.
3. Typed fields.
    * See [Typed Fields](https://docs.google.com/document/d/1nxPdvuzgCPKhTabhfYBN2tbcr85enSR83OpcvdfWXfY/edit#heading=h.ctjckb8e5t03) for a justification of this.
4. All fields are lazily hydrated/validated.
5. Has a utility for filtering based on required fields (see `test_has_fields()`)
     * This is important to how rules will consume the Target API. `python_test_runner.py` might say something like `if my_tgt.has_fields([PythonSources, Timeout])`.
     * See [Proposed Design](https://docs.google.com/document/d/1nxPdvuzgCPKhTabhfYBN2tbcr85enSR83OpcvdfWXfY/edit#heading=h.z97og7gj9rvs) for the importance of using Pure Python for this filtering, rather than engine code. 
6. Works 100% with MyPy.
7. Nice `repr` and `dir` functions to allow for easy debugging.

## Solution

Add `Target` and `Field` types. A `Target` type is a combination of several `Field`s that are valid _together_.

Given a target, the two main expected API calls will be:

```python
my_tgt.has_fields([PythonSources, Compatibility])
compatibility: Compatibility = my_tgt.get(Compatibility)
```

MyPy understands both of these calls, including `Target.get` thanks to generics.

### Lazy hydration - primitive vs. async fields

About 5% of fields require the engine for hydration, e.g. `sources` and `bundles`. The other 95% of fields, though, are nothing more than a `str`, `List[str]`, `bool`, etc.

We do not want to complicate accessing the majority of fields. While some field values will need to be accessed via `await Get`, the majority should not be this complex.

So, we introduce `PrimitiveField` and `AsyncField` to distinguish between the two. 

Thanks to MyPy and autocomplete, rule authors will quickly know when working with an `AsyncField` that they cannot call `my_field.value` like they normally could.

### Extensibility - add new fields on pre-existing targets

This is a new feature that we want as a better version of `tags`. To add a new field, plugin authors simply need to create the new field then register, for example, `UnionRule(PythonLibrary, TypeChecked)`.

The Target constructor will recognize this new field and treat it as first-class. Core Pants code can safely ignore the field, while plugins can now use the new field.

## Followups

1. Add a way to go from `HydratedStruct -> Target`, so that we can actually use this with production code.
2. Add syntactic sugar for declaring new fields, e.g. add `BoolField` and `StringField`.
3. Add a mechanism to actually register new target types and their aliases, e.g. a new backend registration hook

```python
def targets() -> Sequence[Type[Target]]:
   return (PythonLibrary, PythonTests)
```

4. Add implementations for Python targets as a real-world use case.
5. Use the new Python target types in some rules like Black and Isort.
6. Allow going from a V2 `Target` to a V1 `Target`.
7. Improve error messages for invalid fields and targets, e.g. point to the actual target upon failure.
8. Remove `TargetAdaptor`.
@Eric-Arellano
Copy link
Contributor

Closing now that the Target API has been in usage for about a month, including for codegen.

There's a bit of remaining work:

But the actual API is implemented and used in the wild.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants