Design v2 Target API #4535
Comments
Relates to #4641: people are running into the partial coverage of the previous target API in
Relates to #3991 ... it's possible that with v1 build file parsing "gone" in master, that approach might be more feasible than before.
@jsirois: I had another thought here around how to do extensibility without subclassing (which remained an open question in the doc linked in the description). I think that a somewhat natural way to constrain the types that are legal in a particular field, while still allowing for extension, would be to have "named TypeConstraints unions". As an example: for determining which

This "named type unions" concept could also apply to other plugin extension points:
It occurs to me that in order for consuming rules to consume the output of this type of union, they'd need to be oblivious to the union members... and in order for that to happen and still be useful, the union members would probably all need a shared parent class. So... this might actually look a bit like subtyping with a closed universe of subclasses?
This is related to #6449, because the implementation of dep inference relies on implementers being able to inject new

I think that all of the sketches described above have begun to solidify in my mind around "subtyping with a closed universe of subclasses". But that will require a bit of refactoring of how unions work now, and I don't think we should start that until #5788 is completed. So while it is a bit of a yak shave, I'm going to consider this blocked by #5788.
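To make "subtyping with a closed universe of subclasses" concrete, here is a minimal sketch. It is not how unions are implemented in the engine: `UnionRegistry`, `Sources`, `PythonSources`, `JavaSources`, and `describe` are all hypothetical names invented for illustration. The idea is that plugins register members against a shared parent class, registration is closed before rules run, and consuming code depends only on the parent.

```python
from typing import Dict, List, Tuple


class UnionRegistry:
    """Hypothetical registry: maps a union base class to its registered members."""

    def __init__(self) -> None:
        self._members: Dict[type, List[type]] = {}
        self._closed = False

    def register(self, base: type, member: type) -> None:
        if self._closed:
            raise RuntimeError("The universe of union members is already closed.")
        # Requiring a shared parent lets consumers stay oblivious to the members.
        if not issubclass(member, base):
            raise TypeError(f"{member.__name__} must subclass {base.__name__}.")
        self._members.setdefault(base, []).append(member)

    def close(self) -> None:
        """Stop accepting new members, e.g. once all plugins have been loaded."""
        self._closed = True

    def members(self, base: type) -> List[type]:
        return list(self._members.get(base, []))


class Sources:
    """Hypothetical shared parent class for the union."""

    def __init__(self, globs: Tuple[str, ...]) -> None:
        self.globs = globs


class PythonSources(Sources):
    pass


class JavaSources(Sources):
    pass


def describe(sources: Sources) -> str:
    # A consumer relies only on the parent class's attributes.
    return f"{type(sources).__name__} with globs {sources.globs}"


registry = UnionRegistry()
registry.register(Sources, PythonSources)
registry.register(Sources, JavaSources)
registry.close()
```

After `close()`, the set of legal members is fixed, which is the "closed universe" part; the `issubclass` check is the "subtyping" part.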
I'm probably missing something, but:
I don't understand what the named union approach provides -- the requirement of a closed universe of possible types of input is already achieved by requiring a unique path from subject to product in the rule graph for every subj/prod pair requested.
Rules consuming some product
This seems like exactly the guarantee we already have by requiring a unique path from product to subject. I'm looking at the doc linked up top and I cannot understand what the explicit, closed union concept provides over the implicit closed union that the rule graph already provides. I have no problem with the idea, I just think requiring a unique path in the rule graph is a very simple model to use and debug, and I would prefer to use the polymorphism we already get from that rather than require anyone to start subclassing anything. The property "this type subclasses this other type" means "this type provides some known set of methods/attrs" -- it seems like that can be achieved by using a datatype
Re-reading my own comments, I think that I flip-flopped a few times between discussing "output" types (like

Given that confusion, I think I'll probably refresh the design doc for another round of review.
@cosmicexplorer : It provides extensibility into existing
with:
The former has a hardcoded list of legal fields, and so you cannot add additional Field types (or
EDIT: This question is resolved for now: see #8368! From a discussion with @benjyw in https://pantsbuild.slack.com/archives/C0D7TNJHL/p1569606120265000, my feeling is that there might be a fundamental decision around how to use
The fundamental difference between these choices in a case like pants/src/python/pants/rules/core/test.py Lines 71 to 73 in 44d4ab5
I'm leaning toward the former, because it's less magical, but it all relates to how strongly typed targets are in general.
Agreed re the former probably being nicer.
I think we definitely want runtime checks;
I think this is the approach we should take for all v2 functionality! It seems to have allowed us to make more progress on this sort of API question recently. While I would really like to eventually figure out a way to express
Closing the loop: #4535 (comment) was resolved in #8368!
This is also related to #7022: it's possible that target definitions (and macros) should themselves be implemented as
Attaching notes from a design discussion today. The summary of these is that we think that there are roughly two levels to be concerned with:
Both of these can be worked on roughly independently: the only issue with starting Codegen before Parsing is that the types that you would consume as input to codegenny-conversions would be our existing awkward placeholder types (adaptors/etc). Additionally, we thought that dependency inference is likely to "just work" almost regardless of design. And we thought that it was not crystal clear "how many" unions we need to have for the
I think that conceptually, we could absolutely use a

In practice, I think a base class just seems like a better API, so I am definitely still behind using that approach. I think all of the above are actually reducible to each other anyway, so I don't think we would be losing anything or introducing any inconsistency by taking this approach now. Super hype about the result of this discussion today!
### Problem

`HydratedTarget` and `TargetAdaptor` currently duplicate the same information in the fields `address` and `dependencies`, i.e. `hydrated_target.address == hydrated_target.adaptor.address` and `hydrated_target.dependencies == hydrated_target.adaptor.dependencies`.

Why keep both types, then? We need `HydratedTarget` as a simple wrapper around `TargetAdaptor` because the engine is a type-driven API. Sometimes we want a generic way to refer to a target, regardless of what the target type is, i.e. `HydratedTarget`. Other times we do care about the target type, such as a `PythonTestsAdaptor` vs. a `PythonLibraryAdaptor`.

So, we need both types, but duplicating the information in both makes the distinction between the two less clear: we want to emphasize that `HydratedTarget` is nothing more than a wrapper around `TargetAdaptor` for the purpose of having a generic target type for the engine. Duplicating the fields also complicates the implementation, e.g. by making it harder to use `HydratedTarget` in tests.

### Solution

Simplify `HydratedTarget` to have only one public* field, `adaptor`. Whenever we need to access the address or dependencies, we can use `hydrated_target.adaptor.address` and `hydrated_target.adaptor.dependencies`.

*`HydratedTarget` gets passed to the engine hundreds of times when constructing the build graph, so `__eq__` is called hundreds of times to check for cache hits. To speed this up, we override `HydratedTarget.__eq__` to short-circuit if the addresses do not match. However, using `adaptor.address` instead of a direct field `._address` results in a slight performance hit when running `build-support/bin/mypy.py` (which runs on every target in Pants). Why? `TargetAdaptor.address` is ~10x slower than `HydratedTarget.address` (0.000_001 vs. 0.000_000_1 seconds). So, we add a private field `HydratedTarget._address` to avoid this performance regression when hashing.

Performance stays the same: it takes 4.65 seconds to build the build graph when running `build-support/bin/mypy.py`.

### Result

The difference between `HydratedTarget` and `TargetAdaptor` is hopefully more explicit.

There are no (known) performance regressions for either V1 or V2. It still takes ~4.65 seconds to generate the build graph when running `build-support/bin/mypy.py`.

This is sort of prework for #4535, as it simplifies the status quo.
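The wrapper described above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the actual Pants code: the `TargetAdaptor` here is a stand-in dataclass, and addresses are plain strings rather than real `Address` objects.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TargetAdaptor:
    """Stand-in for the parsed target struct (assumption: string addresses)."""

    address: str
    dependencies: Tuple[str, ...] = ()


class HydratedTarget:
    """Generic wrapper whose only public field is `adaptor`."""

    def __init__(self, adaptor: TargetAdaptor) -> None:
        self.adaptor = adaptor
        # Private copy so equality and hashing skip the slower adaptor lookup.
        self._address = adaptor.address

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, HydratedTarget):
            return NotImplemented
        # Short-circuit: targets with different addresses can never be equal.
        if self._address != other._address:
            return False
        return self.adaptor == other.adaptor

    def __hash__(self) -> int:
        return hash(self._address)


lib = HydratedTarget(TargetAdaptor("src/python:lib", ("3rdparty:six",)))
same_lib = HydratedTarget(TargetAdaptor("src/python:lib", ("3rdparty:six",)))
tests = HydratedTarget(TargetAdaptor("src/python:tests"))
```

Callers read the address and dependencies through the adaptor, e.g. `lib.adaptor.dependencies`, while `_address` stays an implementation detail of comparison and hashing.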
## Problem

See #4535 and the [recent design doc](https://docs.google.com/document/d/1nxPdvuzgCPKhTabhfYBN2tbcr85enSR83OpcvdfWXfY/edit).

This design implements the main goals of the Target API:

1. Extensibility - add new target types.
2. Extensibility - add new fields to pre-existing target types.
3. Typed fields.
    * See [Typed Fields](https://docs.google.com/document/d/1nxPdvuzgCPKhTabhfYBN2tbcr85enSR83OpcvdfWXfY/edit#heading=h.ctjckb8e5t03) for a justification of this.
4. All fields are lazily hydrated/validated.
5. Has a utility for filtering based on required fields (see `test_has_fields()`).
    * This is important to how rules will consume the Target API. `python_test_runner.py` might say something like `if my_tgt.has_fields([PythonSources, Timeout])`.
    * See [Proposed Design](https://docs.google.com/document/d/1nxPdvuzgCPKhTabhfYBN2tbcr85enSR83OpcvdfWXfY/edit#heading=h.z97og7gj9rvs) for the importance of using pure Python for this filtering, rather than engine code.
6. Works 100% with MyPy.
7. Nice `repr` and `dir` functions to allow for easy debugging.

## Solution

Add `Target` and `Field` types. A `Target` type is a combination of several `Field`s that are valid _together_.

Given a target, the two main expected API calls will be:

```python
my_tgt.has_fields([PythonSources, Compatibility])
compatibility: Compatibility = my_tgt.get(Compatibility)
```

MyPy understands both of these calls, including `Target.get` thanks to generics.

### Lazy hydration - primitive vs. async fields

About 5% of fields require the engine for hydration, e.g. `sources` and `bundles`. The other 95% of fields, though, are nothing more than a `str`, `List[str]`, `bool`, etc.

We do not want to complicate accessing the majority of fields. While some field values will need to be accessed via `await Get`, the majority should not be this complex.

So, we introduce `PrimitiveField` and `AsyncField` to distinguish between the two. Thanks to MyPy and autocomplete, rule authors will quickly know when working with an `AsyncField` that they cannot call `my_field.value` like they normally could.

### Extensibility - add new fields on pre-existing targets

This is a new feature that we want as a better version of `tags`.

To add a new field, plugin authors simply need to create the new field then register, for example, `UnionRule(PythonLibrary, TypeChecked)`. The Target constructor will recognize this new field and treat it as first-class. Core Pants code can safely ignore the field, while plugins can now use the new field.

## Followups

1. Add a way to go from `HydratedStruct -> Target`, so that we can actually use this with production code.
2. Add syntactic sugar for declaring new fields, e.g. add `BoolField` and `StringField`.
3. Add a mechanism to actually register new target types and their aliases, e.g. a new backend registration hook
    ```python
    def targets() -> Sequence[Type[Target]]:
        return (PythonLibrary, PythonTests)
    ```
4. Add implementations for Python targets as a real-world use case.
5. Use the new Python target types in some rules like Black and Isort.
6. Allow going from a V2 `Target` to a V1 `Target`.
7. Improve error messages for invalid fields and targets, e.g. point to the actual target upon failure.
8. Remove `TargetAdaptor`.
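As a rough illustration of the `has_fields` / `get` calls and of plugin-registered fields, here is a simplified sketch. It is not the implementation from this change: every field is treated as primitive (no `AsyncField`), and the `register_plugin_field` classmethod is a hypothetical stand-in for `UnionRule` registration.

```python
from typing import Any, ClassVar, Dict, Iterable, Optional, Tuple, Type, TypeVar, cast

_F = TypeVar("_F", bound="Field")


class Field:
    """One typed BUILD-file field (simplified stand-in; all fields are primitive)."""

    alias: ClassVar[str]

    def __init__(self, raw_value: Optional[Any]) -> None:
        self.value = raw_value


class Compatibility(Field):
    alias = "compatibility"


class Timeout(Field):
    alias = "timeout"


class TypeChecked(Field):
    """A field contributed by a plugin rather than by core code."""

    alias = "type_checked"


class Target:
    """A combination of field types that are valid together."""

    core_fields: ClassVar[Tuple[Type[Field], ...]] = ()
    # Hypothetical stand-in for union registration: plugin fields per target type.
    _plugin_fields: ClassVar[Dict[type, Tuple[Type[Field], ...]]] = {}

    def __init__(self, unhydrated_values: Dict[str, Any]) -> None:
        field_types = self.core_fields + Target._plugin_fields.get(type(self), ())
        known_aliases = {ft.alias for ft in field_types}
        unknown = set(unhydrated_values) - known_aliases
        if unknown:
            raise ValueError(f"Unrecognized fields: {sorted(unknown)}")
        self._fields: Dict[Type[Field], Field] = {
            ft: ft(unhydrated_values.get(ft.alias)) for ft in field_types
        }

    @classmethod
    def register_plugin_field(cls, field_type: Type[Field]) -> None:
        existing = Target._plugin_fields.get(cls, ())
        Target._plugin_fields[cls] = existing + (field_type,)

    def has_fields(self, field_types: Iterable[Type[Field]]) -> bool:
        return all(ft in self._fields for ft in field_types)

    def get(self, field_type: Type[_F]) -> _F:
        # The TypeVar lets a type checker infer the concrete field type.
        return cast(_F, self._fields[field_type])


class PythonLibrary(Target):
    core_fields = (Compatibility,)


PythonLibrary.register_plugin_field(TypeChecked)
my_tgt = PythonLibrary({"compatibility": ["CPython>=3.6"], "type_checked": True})
compatibility: Compatibility = my_tgt.get(Compatibility)
```

With this sketch, `my_tgt.has_fields([Compatibility, TypeChecked])` is true while `my_tgt.has_fields([Timeout])` is false, so a rule could filter targets without knowing their concrete target type.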
Closing now that the Target API has been in use for about a month, including for codegen. There's a bit of remaining work:
But the actual API is implemented and used in the wild.
The current state of the "target" API that is consumed by `@rule`s is very scattered, mostly due to a need for backward compatibility, and the incremental introduction of v2 to the rest of the codebase.

Now that `@console_rules` have begun to be used more widely, and this API (that was originally only designed to act as adaptors to allow `pantsd` to consume `@rules` for BUILD file parsing) is beginning to be used by end users writing `@rules`, it's time for another round of design.

A previous bit of design was done in https://docs.google.com/document/d/102EFbwk6cpM9-_4ZSYhMQYA0zL1UKbXzW-DCd8KsFeg/edit?usp=sharing, but some of the considerations there are stale: see comments on this ticket, and example usages of these APIs in the v2 python test runner and in v2 list for more information.
A quick explanation of the status quo as of ~~March 2018~~ ~~April 2019~~ ~~August 2019~~ February 2020:

None of these APIs are public yet, and they're all going to need to change... they came out of the engine experiment, and they support more features than we need. So we need to figure out which bits to prune.
Currently the hierarchy of construction (bottom to top) is:

1. `HydratedStruct` - A `Struct` that has been deserialized from a BUILD file. Currently this type supports inlining other `Structs` via address references: see this example ...where because the `configurations` field is expecting to receive `Structs`, the inline address reference there is expanded and inlined into
2. `TargetAdaptor` - A `Struct` subclass representing a target. So in the case of expanding a legacy build graph, the above step operates on concrete subclasses of `TargetAdaptor`. A `TargetAdaptor` still has sources represented as a `PathGlobs` object in a `SourcesField` wrapper: i.e., the source globs haven't been expanded.
3. `HydratedTarget` - An object containing enough information to actually construct the legacy `BuildGraph` interface (with an `EagerFilesetWithSpec` object representing fully expanded and fingerprinted globs). As described on Declare v2 products on v1 Tasks #4769 and in the document linked above, this is not the interface we want to expose to anyone, as it is too granular to avoid expanding sources if the use case doesn't require them.
4. `LegacyHydratedTarget` - A `HydratedTarget` but with a `BuildFileAddress` rather than an `Address`, which is needed for V1 to work properly.