Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update engine README for @dataclasses #9446

Conversation

cosmicexplorer
Copy link
Contributor

Split out from #8548 -- strictly just updating the engine docs to cover the (no-longer) new recommendation for @dataclasses as engine Params.

[ci skip-rust-tests] # No Rust changes made.

[ci skip-jvm-tests] # No JVM changes made.

# Delete this line to force CI to run Clippy and the Rust tests.
[ci skip-rust-tests]  # No Rust changes made.

# Delete this line to force CI to run the JVM tests.
[ci skip-jvm-tests]  # No JVM changes made.
Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for splitting this out and for taking the time to update this!

@@ -76,52 +76,80 @@ In cases where this search detects any ambiguity (generally because there are tw
that can provide the same product with the same number of parameters), rule graph compilation will
fail with a useful error message.

### Datatypes
### How to Declare Engine `Param` Types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider something like How to declare engine newtypes. John and Tom Dyas used that phrase the other day and I think it would be great to leverage here because so many people already understand the concept of a newtype.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, ok. I'm pretty sure newtype is a specifically Haskell concept, and I don't know what it means? Could you be more specific about what concept is supposed to be more common knowledge here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haskell has it, but so does Python with typing.NewType https://docs.python.org/3/library/typing.html#newtype.

The concept is "I have some pre-existing type I like like int, but I want to create a new distinct type that acts like int but which must be explicitly used like MyIntType."

The benefit of referencing newtype is that devs who already know that concept now instantly make the connection between their prior knowledge and Pants. Maybe calling it new type (with a space) + still explaining it, like this, would help people who still aren't familiar?

The engine makes heavy use of the concept of newtypes to disambiguate similar-looking types. For example, you might want to use int in several distinct domains, like UserId vs. ProjectId. New types allow you to avoid accidentally using UserId when you meant ProjectId, because each is now a distinct type.

Copy link
Contributor

@gshuflin gshuflin Apr 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cosmicexplorer The Rust ecosystem also uses the term "newtype" in explicit analogy to Haskell, even though there's not a newtype keyword in Rust: https://doc.rust-lang.org/book/ch19-03-advanced-traits.html#using-the-newtype-pattern-to-implement-external-traits-on-external-types

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might add a line saying something like "These small, unique classes are sometimes called 'newtypes', since they effectively wrap an existing type in a 'new type' that the engine will treat as distinct from the underlying type.", maybe with a link to that Rust documentation page.

#### Requirements for an Engine `Param`

To use an instance of some class `C` as a `Param` to the engine:
1. instances of `C` must be immutable,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not technically. They should be immutable, but don't need to be. I recommend changing to should.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If instances of C are mutable, then rules can start sharing state by mutating the parameters they're passed in, without notifying the engine that they have done so. We allow that to a very limited extent with the use of Workspace, but even that object itself is immutable. Because incorrect behavior may result if C is mutable, we would like to state C being immutable as a precondition. #8548 tries to ensure this for the common case of @dataclass params by ensuring @frozen_after_init is used.

I think that C needs to be immutable for the engine to work. I'm not sure that we explicitly check that, but #8548 definitely tries to do so. If there is a rewording that might make sense, I'd love to know -- perhaps we could add:

Suggested change
1. instances of `C` must be immutable,
1. instances of `C` must be immutable, (the engine will not be able to verify this)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I agree on why we want to ensure things are immutable.

Maybe:

1. instances of `C` must be immutable (although the engine cannot validate this)

Why have this disclaimer? It warns plugin authors that the onus is on them to ensure immutability is respected.


To use an instance of some class `C` as a `Param` to the engine:
1. instances of `C` must be immutable,
2. `__hash__` and `__eq__` must be implemented such that when constructing two separate instances of `C` with the same argument list `...`, `C(...) == C(...)` and `hash(C(...)) == hash(C(...))`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Value-based equality" vs. "identity-based equality" would be good to mention.

__hash__ and __eq__ should be implemented to use value-based equality, rather than identity-based equality. That is, two distinct instances of C created with the same values should be equivalent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, thank you! I'll use that!

2. `__hash__` and `__eq__` must be implemented such that when constructing two separate instances of `C` with the same argument list `...`, `C(...) == C(...)` and `hash(C(...)) == hash(C(...))`.
- This can be ignored for singleton `Param`s.

#### Benefits of using `@dataclass` for a `Param`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove "Benefits of". We don't really need to make the case for dataclasses - I doubt many people will question our recommendation.

You could simply say "We recommend using frozen dataclasses to ergonomically create distinct newtypes understood by both Pants and by MyPy."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This being said, the link to Python 3's docs are great for people who aren't familiar already with dataclasses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing "benefits of" sounds great! Thank you!

Comment on lines +109 to +110
# Pants requires that engine Params have a stable hash. This can be accomplished with the
# `frozen=True` argument set in the `@dataclass` decorator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

Exactly(FieldType))`). The bare type name (e.g. `FieldType`) can also be used as a shorthand for
`Exactly(FieldType)`. If the tuple form is used, the constructor will create your object, then raise
an error if the field value does not satisfy the type constraint.
`@frozen_after_init` can also be used with `@dataclass(unsafe_hash=True)` to create engine `Param`s which can modify their fields within the `__init__()` method. This is useful if you want to perform any normalization on the args to `__init__()`, such as allowing `List` and then casting this to an immutable `Tuple` in `__init__()`:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I recommend inverting the order of these sentences so that you lead with the why and then follow up with the how.

Often, you'll want a custom __init__() but to still have the dataclass be marked as frozen; for example, you may want to allow the user to pass Iterable[str] in the constructor and then to convert it into an immutable Tuple[str, ...].

Dataclasses do not normally allow you to mix custom __init__() implementations with frozen=True, so we created a utility called @frozen_after_init to get the best of both worlds.

self.elements = tuple(elements)

x = ValidatedCollection([1, 2, 3])
assert x.elements == (1, 2, 3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might even better express the intent:

Suggested change
assert x.elements == (1, 2, 3)
assert isinstance(x.elements, tuple)

@Eric-Arellano
Copy link
Contributor

Closing because we want this documentation to live in https://pants.readme.io/docs/plugins-overview. Once those docs are done, I think we'll want to delete this README.

I'll incorporate your (very helpful!) changes in those docs, and I'd love any feedback on them once ready :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants