-
-
Notifications
You must be signed in to change notification settings - Fork 649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update engine README for @dataclasses #9446
update engine README for @dataclasses #9446
Conversation
# Delete this line to force CI to run Clippy and the Rust tests. [ci skip-rust-tests] # No Rust changes made. # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests] # No JVM changes made.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for splitting this out and for taking the time to update this!
@@ -76,52 +76,80 @@ In cases where this search detects any ambiguity (generally because there are tw | |||
that can provide the same product with the same number of parameters), rule graph compilation will | |||
fail with a useful error message. | |||
|
|||
### Datatypes | |||
### How to Declare Engine `Param` Types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider something like How to declare engine newtypes
. John and Tom Dyas used that phrase the other day and I think it would be great to leverage here because so many people already understand the concept of a newtype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, ok. I'm pretty sure newtype
is a specifically Haskell concept, and I don't know what it means? Could you be more specific about what concept is supposed to be more common knowledge here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haskell has it, but so does Python with typing.NewType
https://docs.python.org/3/library/typing.html#newtype.
The concept is "I have some pre-existing type I like like int
, but I want to create a new distinct type that acts like int
but which must be explicitly used like MyIntType
."
The benefit of referencing newtype
is that devs who already know that concept now instantly make the connection between their prior knowledge and Pants. Maybe calling it new type
(with a space) + still explaining it, like this, would help people who still aren't familiar?
The engine makes heavy use of the concept of newtypes to disambiguate similar-looking types. For example, you might want to use
int
in several distinct domains, likeUserId
vs.ProjectId
. New types allow you to avoid accidentally usingUserId
when you meantProjectId
, because each is now a distinct type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cosmicexplorer The Rust ecosystem also uses the term "newtype" in explicit analogy to Haskell, even though there's not a newtype
keyword in Rust: https://doc.rust-lang.org/book/ch19-03-advanced-traits.html#using-the-newtype-pattern-to-implement-external-traits-on-external-types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might add a line saying something like "These small, unique classes are sometimes called 'newtypes', since they effectively wrap an existing type in a 'new type' that the engine will treat as distinct from the underlying type.", maybe with a link to that Rust documentation page.
#### Requirements for an Engine `Param` | ||
|
||
To use an instance of some class `C` as a `Param` to the engine: | ||
1. instances of `C` must be immutable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not technically. They should be immutable, but don't need to be. I recommend changing to should
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If instances of C
are mutable, then rules can start sharing state by mutating the parameters they're passed in, without notifying the engine that they have done so. We allow that to a very limited extent with the use of Workspace
, but even that object itself is immutable. Because incorrect behavior may result if C
is mutable, we would like to state C
being immutable as a precondition. #8548 tries to ensure this for the common case of @dataclass
params by ensuring @frozen_after_init
is used.
I think that C
needs to be immutable for the engine to work. I'm not sure that we explicitly check that, but #8548 definitely tries to do so. If there is a rewording that might make sense, I'd love to know -- perhaps we could add:
1. instances of `C` must be immutable, | |
1. instances of `C` must be immutable, (the engine will not be able to verify this) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, I agree on why we want to ensure things are immutable.
Maybe:
1. instances of `C` must be immutable (although the engine cannot validate this)
Why have this disclaimer? It warns plugin authors that the onus is on them to ensure immutability is respected.
|
||
To use an instance of some class `C` as a `Param` to the engine: | ||
1. instances of `C` must be immutable, | ||
2. `__hash__` and `__eq__` must be implemented such that when constructing two separate instances of `C` with the same argument list `...`, `C(...) == C(...)` and `hash(C(...)) == hash(C(...))`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Value-based equality" vs. "identity-based equality" would be good to mention.
__hash__
and__eq__
should be implemented to use value-based equality, rather than identity-based equality. That is, two distinct instances ofC
created with the same values should be equivalent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, thank you! I'll use that!
2. `__hash__` and `__eq__` must be implemented such that when constructing two separate instances of `C` with the same argument list `...`, `C(...) == C(...)` and `hash(C(...)) == hash(C(...))`. | ||
- This can be ignored for singleton `Param`s. | ||
|
||
#### Benefits of using `@dataclass` for a `Param` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can remove "Benefits of". We don't really need to make the case for dataclasses - I doubt many people will question our recommendation.
You could simply say "We recommend using frozen dataclasses to ergonomically create distinct newtypes understood by both Pants and by MyPy."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This being said, the link to Python 3's docs are great for people who aren't familiar already with dataclasses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing "benefits of" sounds great! Thank you!
# Pants requires that engine Params have a stable hash. This can be accomplished with the | ||
# `frozen=True` argument set in the `@dataclass` decorator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
Exactly(FieldType))`). The bare type name (e.g. `FieldType`) can also be used as a shorthand for | ||
`Exactly(FieldType)`. If the tuple form is used, the constructor will create your object, then raise | ||
an error if the field value does not satisfy the type constraint. | ||
`@frozen_after_init` can also be used with `@dataclass(unsafe_hash=True)` to create engine `Param`s which can modify their fields within the `__init__()` method. This is useful if you want to perform any normalization on the args to `__init__()`, such as allowing `List` and then casting this to an immutable `Tuple` in `__init__()`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! I recommend inverting the order of these sentences so that you lead with the why and then follow up with the how.
Often, you'll want a custom
__init__()
but to still have the dataclass be marked as frozen; for example, you may want to allow the user to passIterable[str]
in the constructor and then to convert it into an immutableTuple[str, ...]
.Dataclasses do not normally allow you to mix custom
__init__()
implementations withfrozen=True
, so we created a utility called@frozen_after_init
to get the best of both worlds.
self.elements = tuple(elements) | ||
|
||
x = ValidatedCollection([1, 2, 3]) | ||
assert x.elements == (1, 2, 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might even better express the intent:
assert x.elements == (1, 2, 3) | |
assert isinstance(x.elements, tuple) |
Closing because we want this documentation to live in https://pants.readme.io/docs/plugins-overview. Once those docs are done, I think we'll want to delete this README. I'll incorporate your (very helpful!) changes in those docs, and I'd love any feedback on them once ready :) |
Split out from #8548 -- strictly just updating the engine docs to cover the (no-longer) new recommendation for
@dataclass
es as engine Params.[ci skip-rust-tests] # No Rust changes made.
[ci skip-jvm-tests] # No JVM changes made.