-
-
Notifications
You must be signed in to change notification settings - Fork 648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
datatype()s can declare default values #6374
datatype()s can declare default values #6374
Conversation
569881f
to
6cdc180
Compare
I want to make sure you were aware of https://www.python.org/dev/peps/pep-0557/. One path to the stars is to standardize our codebase with dataclasses if our flip to python 3 can require at least 3.6 (backport here https://github.com/ericvsmith/dataclasses). |
Good point about dataclasses. One concern I have about our own data type implementation is that static type checkers like MyPy won’t understand it. I’m hoping once we drop Py2 we can start adopting Py3’s type hints, and we’ll get more meaningful results if we use dataclasses. Other than that concern, this looks like a very helpful and sensible change! |
I want to conform to existing standards as much as posssible and will review the links provided. |
In response to the comments, my plan for this PR now is to transition to use the dataclasses library for py2 and the py3 builtin one, and to get an idea of how and where this change affects perf. For things like checking whether args are strings or ints or something else that can be described in types, this can be partially checked statically (depending on the annotation quality and the type checker's abilities), so I can immediately see one reason why we might expect better perf from moving to the dataclass system (by ensuring those kinds of conditionals don't need even a probably-branch-predicted check at runtime). |
7b991d2
to
994dc49
Compare
After realizing that the |
I'm going to revert everything that's not strictly about default values and make it into a followup PR to make this easier to review. |
c87ae6d
to
98ec38d
Compare
I reverted 200 lines and kept this diff to the changes necessary to use default values, and a small number of examples, instead of any deeper code. Will publish a followup PR that allows us to completely remove a large number of My main concern right now is that the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I count 6 lines of __new__
code removed as a result of this feature (there were other __new__
's removed that were not needed in the 1st place). This seems like a high cost (+439/-117) for that. Are you convinced its not worth waiting for python3.6 and dataclasses?
'time', | ||
# TODO: an `optional(type_constraint=None)` type constraint factory which matches values | ||
# matching the given constraint, as well as None. | ||
('failure', None, None), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The degree of mysteriousness here is very high. The natural question from the reader is why ('failure', None, None),
instead of 'failure',
; ie: what does 'name',
above do that this does not or vice-versa? Maybe the TODO changes 'failure' in such away that None
is no longer tossed about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One alternative is to use the DatatypeFieldDecl()
constructor directly, with named arguments. One thing I did notice is that in the python 3 typing
module, the NamedTuple
class accepts precisely the syntax we currently (on master
) have for typed fields:
from typing import NamedTuple
ClassName = NamedTuple('ClassName', [('x', int)])
# versus
from pants.util.objects import datatype
class ClassName(datatype([('x', int)])): pass
# compare to:
from pants.util.objects import DatatypeFieldDecl as F
class ClassName(datatype([
F(field_name='failure', default_value=None, has_default_value=True),
])): pass
So while we have actually so far kept compatible field declaration syntax with NamedTuple
, this PR allowing mere positional tuples with this mysterious extended syntax as you've pointed out here goes past that. I think using the DatatypeFieldDecl
constructor directly with named arguments, as above, might help this situation, and ensure we make it clear when we're writing something incompatible.
Well, I can show what it's like with all of those TODOs implemented, which would make more convincing a case (mostly the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll step away and let others review this change.
This is back as a WIP after making several sweeping changes. Please take a look at the added tests in Note that we are not doing anything weird anymore with parsing arguments to |
0c72d76
to
7eded7d
Compare
@Eric-Arellano It will be very exciting when python 2 can be removed! I'm only complaining because I can't figure out how to run python 3 tests locally, or at least, I keep seeing: I spent some time investigating this yesterday but didn't end up anywhere, I'm sure I can figure it out I just haven't yet. |
This has gotten really large, and probably needs some discussion. In particular, at least one reviewer has resigned due to the size, and I'm concerned that this is overkill when compared to defining |
I think I can slim this down to a much smaller number of lines given the epiphany I mentioned earlier in setting default values. I've pushed up the current version to The intention with writing all of this code was not to have someone just accept it, but to then have a discussion about it after the iterative process of figuring out the interface I wanted for more complex typing. I did not have any idea in my head of what this could look like until I spent the time banging it out. It's also useful to note that 90% of the changes are just rearrangement of |
b68b46c
to
c622065
Compare
I wiped away all the extraneous stuff and avoided reordering anything in
Maybe defining The intent of the PR before I cut it down just now is not just to make it easy to add default values, but to add a syntax to declaratively specify what arguments are accepted, and what those arguments will eventually resolve to (which is actually completely orthogonal to default values -- I didn't realize this until I had been able to iterate on it). For example,
These type constraints are now composable. The goal is not to do any of the work that should be done in a rule, but to make it easier to look at a datatype definition and understand what values it accepts and what values it doesn't. For example, lists aren't hashable, but it's really not necessary for the argument to the constructor to be a tuple, in most cases. However, this is currently required for e.g. I can understand being concerned about complexity of implementation, but it seems like the result is fantastic -- consider the current definition of class AddressMapper(datatype([
'parser',
'build_patterns',
'build_ignore_patterns',
'exclude_target_regexps',
'subproject_roots',
])):
"""Configuration to parse build files matching a filename pattern."""
def __new__(cls,
parser,
build_patterns=None,
build_ignore_patterns=None,
exclude_target_regexps=None,
subproject_roots=None):
"""Create an AddressMapper.
Both the set of files that define a mappable BUILD files and the parser used to parse those
files can be customized. See the `pants.engine.parsers` module for example parsers.
:param parser: The BUILD file parser to use.
:type parser: An instance of :class:`pants.engine.parser.Parser`.
:param tuple build_patterns: A tuple of fnmatch-compatible patterns for identifying BUILD files
used to resolve addresses.
:param list build_ignore_patterns: A list of path ignore patterns used when searching for BUILD files.
:param list exclude_target_regexps: A list of regular expressions for excluding targets.
"""
build_patterns = tuple(build_patterns or ['BUILD', 'BUILD.*'])
build_ignore_patterns = tuple(build_ignore_patterns or [])
exclude_target_regexps = tuple(exclude_target_regexps or [])
subproject_roots = tuple(subproject_roots or [])
return super(AddressMapper, cls).__new__(
cls,
parser,
build_patterns,
build_ignore_patterns,
exclude_target_regexps,
subproject_roots
)
# ... Now consider the version that was in the previous version of this PR (with the exact same behavior, except we also check that our class AddressMapper(datatype([
('parser', SubclassesOf(Parser)),
('build_patterns', convert_default(tuple, default_value=('BUILD', 'BUILD.*'))),
('build_ignore_patterns', convert_default(tuple)),
('exclude_target_regexps', convert_default(tuple)),
('subproject_roots', convert_default(tuple)),
])):
"""Configuration to parse build files matching a filename pattern."" It's not cheating to remove the A final interesting part to me is that we can do something like: IntoTuple = convert_default(tuple)
class MyType(datatype([('arg', IntoTuple)])): pass This makes it easy to package up a lot of related coercion-esque logic into a single object that can be imported and shared across multiple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a reasonable step in a good direction - I think there's a little more we can kill off :)
('env', tuple), | ||
('output_files', tuple), | ||
('output_directories', tuple), | ||
# TODO: allow inferring a default value if a type is provided which has a 0-arg constructor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be happy to just remove this TODO; inferring defaults is scary...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussion on slack, agreed and will do.
('output_files', tuple), | ||
('output_directories', tuple), | ||
# TODO: allow inferring a default value if a type is provided which has a 0-arg constructor. | ||
F('env', Exactly(tuple, dict, type(None)), default_value=None), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this None
able? I would remove type(None)
from its type constraints, and default it to ()
or {}
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like a correct analysis and I will do that.
# NB: timeout_seconds covers the whole remote operation including queuing and setup. | ||
('timeout_seconds', Exactly(float, int)), | ||
('jdk_home', Exactly(text_type, type(None))), | ||
F('timeout_seconds', Exactly(float, int), default_value=_default_timeout_seconds), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inline _default_timeout_seconds
?
timeout_seconds=_default_timeout_seconds, | ||
jdk_home=None, | ||
): | ||
def __new__(cls, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this this implementation to be more work for me to read, rather than less... Before it massaged param, and then constructed an object as I'd expect; now it has multiple layers of validation, and the provision on the type definition for an intermediate state which would be invalid outside the constructor (having a dict as a field value)...
Maybe the answer here is to fall back to the automatically implemented __new__
, and have a with_env
class-function which does the env munging and passes the tuple to the constructor?
from pants.util.meta import AbstractClass | ||
|
||
|
||
# TODO: when we can restrict the python version to >= 3.6 in our python 3 shard, we can use the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly that's not a sufficient condition; we also need to have dropped py2 support entirely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I was thinking there could be some parallel development (not too much) of py3-only code which would be using dataclasses. I will elaborate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parallel development of Py2 and Py3 branches is heavily discouraged in Python community, so might not be the best approach. It used to be the recommend approach in the early days, and it led to extreme complexity for library developers.
So, the new best practice is using Python 3 first code that still works on Python 2, which is what we've been doing with the future
library and backports.
|
||
# NB: it is *not* recommended to rely on the ordering of the tuple returned by this method. | ||
@memoized_classproperty | ||
def _supertype_keyword_only_cached_constructor(cls): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One caller? Let's inline.
return super(DataType, cls).__new__(cls, **kw) | ||
return ctor | ||
|
||
def _replace(self, **kwds): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/kwds/kwargs/g
?
|
||
This method upholds 2 contracts: | ||
1. If no keyword arguments are provided, return the original object. | ||
2. Do not call __new__() -- the parent class's __new__() is used instead (skipping e.g. type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would much rather just not allow copying in __new__
?
@@ -316,6 +421,9 @@ def satisfied_by_type(self, obj_type): | |||
:rtype: bool | |||
""" | |||
|
|||
# TODO: This method could be extended to allow for coercion -- we could change the interface to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove the TODO; it's really specific detail for something which may or may not happen :)
If the argument `default_value` is provided (only by keyword), `has_default_value` is set to | ||
True, and the argument `default_value` is used as this field's `default_value`. If | ||
`default_value` is not provided (as when parsing a field declaration from a tuple), but | ||
`type_constraint` is a TypeConstraint with the property `has_default_value` evaluating to True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do any TypeConstraint
s actually have default values? If not, let's remove the abstract method, property, and this bit of pydoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right! Will do.
We can close this due to the progress made by @Eric-Arellano on #7074! |
Problem
It's true that we can define default values and other logic for the arguments of a datatype by overriding the
__new__
method, but it'sFalse
that it's the most ergonomic, declarative, or composable way to do that. What if we could reach for the stars?Solution
Result
There are a few elements that I have since removed and split off into separate changes to make this mergeable, but which would help to showcase how neat and compositional this feature is. For example, I have implemented (again, not in this PR) an
optional()
method producing a type constraint which will acceptNone
, or an object matching the given type constraint (if given), as well asconvert()
, which will check if an object matches a type constraint, or if not, coerce it in a structured way. This allows succinctly expressing a large variety of the usages ofdatatype()
in this repo which were previously overriding__new__
. With those changes, I was able to wipe away a lot of / almost all__new__
methods entirely, which seemed like a huge win in (de)clar(at)i(vi)ty.Also, I added a TODO to use python 3 dataclasses when we can restrict the python version on our python 3 test shard to
CPython>=3.6
.Please let me know your thoughts on the syntax and implementation!