type-check specific datatype fields concisely and remove the class name argument #5723
I'm pretty excited about this :) (And would possibly even suggest extracting this into its own library for re-use in the future.) Will leave some time for others to chime in.
src/python/pants/util/objects.py
for name, cls in field_decls.items():
  if isinstance(cls, TypeDecl):
    processed_type_decls[name] = cls
    continue
Rather than `continue; if`, this looks like it would sit pretty naturally as a chain of `elif`s?
I'm pretty sure you're right; I was just having trouble reading the `elif` version in my head when I wrote this bit, so I used `continue`. I will make this change.
I converted this into `Exactly.from_type_or_collection()` when I moved `TypeConstraint` into `objects.py`. This actually has the same control flow as the original, but using `return` instead of `continue` is a little bit saner (IMO); let me know if I'm wrong about that.
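A minimal sketch of the early-return shape described above, assuming a simplified `Exactly` class. The class body, its error messages, and the choice to accept any of `list`, `tuple`, or `set` as a collection of types are assumptions for illustration, not the code in this PR.

```python
class Exactly(object):
  """Hypothetical, simplified exact-type constraint (not the pants implementation)."""

  def __init__(self, *types):
    if not types:
      raise ValueError('Exactly requires at least one type')
    # Exact matching lets us store the allowed types in a frozenset.
    self._types = frozenset(types)

  @classmethod
  def from_type_or_collection(cls, decl):
    # Each case returns immediately, so no `continue` or `elif` chain is needed.
    if isinstance(decl, cls):
      # Already a constraint: use it unchanged.
      return decl
    if isinstance(decl, type):
      # A single type: the value must be exactly this type.
      return cls(decl)
    if isinstance(decl, (list, tuple, set)):
      # A collection: the value must be exactly one of these types.
      return cls(*decl)
    raise TypeError('expected a type, a collection of types, or an {}; got {!r}'.format(
      cls.__name__, decl))

  def satisfied_by(self, obj):
    return type(obj) in self._types
```

Note that `Exactly.from_type_or_collection(int)` rejects `True`, because `type(True)` is `bool`, not `int`.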
I'm on the fence about this in general, and we should talk about that before landing this. But one concrete thing that I think is probably not the right tradeoff if we do decide to do this is using a |
Did a first pass. I have a few style / assertion comments on tests and some suggestions for type constraints.
  SomeTypedDatatype()
  self.fail("should have errored: not providing all constructor fields")
except TypedDatatypeInstanceConstructionError as e:
  self.assertIn('my_val', str(e))
I'd like to see more tests that check parts of the error messages. I think you could elide some of the comments like `# not providing all the fields` if there were an assertion that checked for that information.
One pattern that might be helpful is to use the context manager provided by `assertRaises`. For example:
with self.assertRaises(MyError) as cm:
  call()
self.assertIn('my error is the best error', str(cm.exception))
Then you can rewrite the above try, fail, except, assert as:
with self.assertRaises(TypedDatatypeInstanceConstructionError) as cm:
  SomeTypedDatatype()
self.assertIn('my_val', str(cm.exception))
self.assertIn('missing field', str(cm.exception))  # or something like it
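A self-contained sketch of the helper-method variant of this pattern. `SomeTypedDatatype` and `TypedDatatypeInstanceConstructionError` here are hypothetical stand-ins built on `collections.namedtuple`, and the error wording is an assumption; the point is that `cm.exception` is the raised exception object, so substring assertions should run against `str(cm.exception)`.

```python
import unittest
from collections import namedtuple


class TypedDatatypeInstanceConstructionError(TypeError):
  """Hypothetical stand-in for the error raised by the datatype under review."""


class SomeTypedDatatype(namedtuple('SomeTypedDatatype', ['my_val'])):
  """Hypothetical datatype that names any missing fields in its error message."""
  __slots__ = ()

  def __new__(cls, *args, **kwargs):
    provided = set(cls._fields[:len(args)]) | set(kwargs)
    missing = [f for f in cls._fields if f not in provided]
    if missing:
      raise TypedDatatypeInstanceConstructionError(
        'missing field(s) for {}: {}'.format(cls.__name__, ', '.join(missing)))
    return super(SomeTypedDatatype, cls).__new__(cls, *args, **kwargs)


class SomeTypedDatatypeTest(unittest.TestCase):

  def assert_construction_error(self, expected_substrings, *args, **kwargs):
    """Hypothetical helper: construct with the given args and check the error text."""
    with self.assertRaises(TypedDatatypeInstanceConstructionError) as cm:
      SomeTypedDatatype(*args, **kwargs)
    # cm.exception is the exception instance; assert against its string form.
    message = str(cm.exception)
    for substring in expected_substrings:
      self.assertIn(substring, message)

  def test_missing_field(self):
    self.assert_construction_error(['my_val', 'missing field'])
```

It can be run with any `unittest`-compatible test runner.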
I really appreciate this feedback. I didn't make it clear in the OP, but I absolutely want to make sure error messages (and test cases) are deeply considered and rock solid for a feature like this. I didn't look deeply enough into assertRaises to realize that it yielded a context object and this seems really actionable. With respect to the code example you gave, I would probably do that and extract it into a helper method of the test class. The code samples you've provided are clear enough for me to act on.
I canonicalized the `__str__()` and `__repr__()` of `TypedDatatype`, as well as the messages in the exceptions it raises, and added tests for all the failure cases in 133ba06. Let me know if this is what you were looking for.
other_union_val = UnionFieldTypedDatatype(an_arg=3)
self.assertIn('UnionFieldTypedDatatype', repr(other_union_val))
self.assertIn('an_arg', repr(other_union_val))
self.assertIn('3', repr(other_union_val))
I like that you've included tests of the repr. I'd like the assertions to be a bit more concrete though. I think testing the full repr would make it clearer what the repr is supposed to look like. How would you feel about that?
Particularly since the ideal for a repr is a round-trippable string representation that, if evaluated as Python, would recreate the object.
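A sketch of what a full-repr assertion plus a round-trip check could look like, assuming a hypothetical namedtuple-based `UnionFieldTypedDatatype` whose `__repr__` emits keyword arguments. The `eval` round trip only works when the class name is resolvable in the evaluating namespace and every field value has an `eval`-able repr.

```python
from collections import namedtuple


class UnionFieldTypedDatatype(namedtuple('UnionFieldTypedDatatype', ['an_arg'])):
  """Hypothetical datatype whose repr can be passed back to eval()."""
  __slots__ = ()

  def __repr__(self):
    # Emit keyword arguments so the string reads like a constructor call.
    args = ', '.join('{}={!r}'.format(name, value) for name, value in zip(self._fields, self))
    return '{}({})'.format(type(self).__name__, args)


val = UnionFieldTypedDatatype(an_arg=3)
# Assert on the whole string rather than on fragments of it.
assert repr(val) == 'UnionFieldTypedDatatype(an_arg=3)'
# Round trip: evaluating the repr recreates an equal object.
assert eval(repr(val)) == val
```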
I 100% agree. I have thoughts on Stu's comment above (e.g. if there are no named args, then that would change the repr), but validating the repr in a structured way seems really appropriate for this use case.
I mentioned this in a response to your earlier comment: check 133ba06 to see if the testing added there addresses your concern here.
src/python/pants/util/objects.py
"""A wrapper over namedtuple which accepts a dict of field names and types.

This can be used to very concisely define classes which have fields that are
type-checked at construction.
Do we want exact type matching, or to allow subtypes? Currently, this allows subtypes. That might be tricky for products in v2. I think v2's type system is primarily based on exact matches of types, either as a singular type or as a type within a union.
If this is for datatypes outside v2, that is probably fine. Either way, I think the entry point here should call out how types are matched.
If you want, you could use the existing `TypeConstraint` classes that are used by the engine rules: https://github.com/pantsbuild/pants/blob/master/src/python/pants/engine/addressable.py#L31
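To make the distinction concrete, here is a minimal sketch contrasting the two matching rules with two hypothetical helper functions (they are not the `TypeConstraint` API linked above). Exact matching compares `type(obj)` against a set of types, which also permits a constant-time `frozenset` lookup; subtype matching delegates to `isinstance`. `Snapshot` and `BinarySnapshot` are placeholder classes.

```python
def matches_exactly(obj, types):
  """Exact matching: the object's concrete type must be one of `types`."""
  # `types` should be a frozenset, so membership is a single hash lookup
  # regardless of how many types are in the union.
  return type(obj) in types


def matches_subclasses(obj, types):
  """Subtype matching: an instance of any listed type, or of a subclass of one."""
  # isinstance() accepts a tuple of types and also accepts subclasses.
  return isinstance(obj, types)


class Snapshot(object):
  pass


class BinarySnapshot(Snapshot):
  pass


print(matches_subclasses(BinarySnapshot(), (Snapshot,)))       # True
print(matches_exactly(BinarySnapshot(), frozenset([Snapshot])))  # False
print(matches_exactly(True, frozenset([int])))                   # False: bool subclasses int
```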
I think the right move is probably to do exact type matching for the reasons you describe. I had seen the TypeConstraint classes but hadn't used them -- that seems like the right way to tackle this.
Also, for union types, exact type matching means you can use a set instead of a list to record the types that could match (and determine whether the type matches instantly), which in itself sounds like a fantastic argument for exact type checking.
Looked it over, and found `TypeConstraint` to be a drop-in replacement for the `TypeMatcher` I had, so I used that instead. I had to move `TypeConstraint` into `objects.py` to avoid a cyclic dependency, but it was easy enough to move and passes tests locally as of ae70e02. The only difference is that it doesn't convert its `self._types` into a `frozenset`; this seems like a tiny change, so I will probably add that.
src/python/pants/util/objects.py
except TypeDecl.ConstructionError as e:
  invalid_type_decls.append("in field '{}': {}".format(name, e))
  continue
if isinstance(cls, list):
Maybe check for `(set, list, tuple)`? Or do we want to enforce that it's a `list` specifically?
I'm not sure of the right answer to that. I do think this functionality should be extracted into a factory method of the TypeDecl class, which is easy and I can do.
See the new `Exactly.from_type_or_collection()`.
class YetAnotherNamedTypedDatatype(typed_datatype(
    'YetAnotherNamedTypedDatatype', {
      'nothing_special': str,
      'just_another_arg': StrOrInt,
You could also create an explicit union here.
That's true, and if I wanted to use a field like `StrOrInt`, I think that would be much more appropriate as its own `typed_datatype` (which could then have its own set of `@rule`s converting it into some other type, if necessary).
Using `Exactly.from_type_or_collection()`, I converted this into `[int, str]`, which is then converted into an `Exactly` type constraint.
I totally expected to have an extended discussion about the interface and/or impl before getting merged! I can see typed datatypes being very cool, and I want to help ensure that code that lots of stuff depends on is still code that we should feel free to reach in and improve. I hadn't remembered until you said this, but I have definitely seen the utility of not having field names, if each field is then required to have its own distinct type with respect to the other fields of the class
class SomeTypeOfRequest(typed_datatype('SomeTypeOfRequest', {
  'binary_snapshot': Snapshot,
  'source_snapshot': Snapshot,
})):
  pass
What assumptions are we making about the two fields? How do we know the
class BinarySnapshot(typed_datatype('BinarySnapshot', {
  'snapshot': Snapshot,
  'binary_path_globs': PathGlobs,
})):
  def get_binaries_path_stats(self):
    # ...
    pass
class SourceSnapshot(typed_datatype('SourceSnapshot', {
  'snapshot': Snapshot,
})):
  # maybe we don't need to do anything here,
  # but it's clear what this is supposed to represent
  pass
class SomeTypeOfRequest(typed_datatype('SomeTypeOfRequest', {
  'binary_snapshot': BinarySnapshot,
  'source_snapshot': SourceSnapshot,
})):
  pass
We could even make the
class SomeTypeOfRequest(typed_datatype('SomeTypeOfRequest', (BinarySnapshot, SourceSnapshot))):
  pass
req = SomeTypeOfRequest(BinarySnapshot(...), SourceSnapshot(...))
# this would work as a result of some (probably simple) metaprogramming
# we could apply a similar technique as Task.stable_name() to generate the accessor name
# (that would then mean that all fields of some type would be accessed
# using the same field name, which is extremely interesting)
print(req.binary_snapshot)
But if we are allowing multiple fields of the same type, I wouldn't want to be able to invoke the constructor with any positional arguments whatsoever; that seems to me to defeat the purpose of type checking entirely. I absolutely don't trust myself to always put positional arguments in the right order, and if multiple fields can have the same type we're back where we started (speaking personally). Thinking about it more, I find the idea of requiring distinct types really interesting, especially if we had magic property accessors generated from type names (totally not required -- but I really don't want to use Thinking about it even more, I might actually say that requiring distinct types for fields of a |
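A rough sketch of the two ideas above: rejecting duplicate field types, and deriving an accessor name from each type's name in the spirit of `Task.stable_name()`. The CamelCase-to-snake_case rule and the `primitive__` prefix for builtins are assumptions chosen to match accessor names that appear later in this thread (`primitive__int`, `some_class`); the helper names are hypothetical, and the sketch does no type checking of values.

```python
import re
from collections import namedtuple


def accessor_name_for_type(field_type):
  """Hypothetical: derive a snake_case accessor name from a type's name.

  BinarySnapshot -> binary_snapshot; builtins such as int -> primitive__int.
  """
  # Insert '_' before each capital letter that follows a lowercase letter or digit.
  snake = re.sub(r'(?<=[a-z0-9])([A-Z])', r'_\1', field_type.__name__).lower()
  # Assumption: builtin types get a 'primitive__' prefix.
  if field_type.__module__ in ('builtins', '__builtin__'):
    return 'primitive__' + snake
  return snake


def distinct_typed_fields(type_name, field_types):
  """Hypothetical: build a namedtuple whose fields are named after their distinct types."""
  field_types = tuple(field_types)
  if len(set(field_types)) != len(field_types):
    raise TypeError('{}: field types must be distinct, got {!r}'.format(type_name, field_types))
  return namedtuple(type_name, [accessor_name_for_type(t) for t in field_types])


class BinarySnapshot(object):
  pass


class SourceSnapshot(object):
  pass


SomeTypeOfRequest = distinct_typed_fields('SomeTypeOfRequest', (BinarySnapshot, SourceSnapshot))
req = SomeTypeOfRequest(BinarySnapshot(), SourceSnapshot())
print(req._fields)  # ('binary_snapshot', 'source_snapshot')
```

Because each accessor name is a pure function of the type, every field of a given type would be reached through the same name, which is the property discussed above.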
I think this is a fantastic idea, not at all in the scope of this PR, but if we iterate on it, it seems extremely plausible to me that we could make |
@stuhood I also missed your comment about "on the fence about this in general" -- sorry about that. I know you mentioned perf concerns as well as (it seemed like) a question about how this explicit type checking would interact with the implicit type checking done when the v2 engine does a select. I'm not sure about that either, I just think it would be complementary for now, by decoupling type checking from the other failure modes in I'm not sure that's addressing your concern. I would also like to address the perf concern, but it's not clear to me how to construct a benchmark other than generating a million types and doing a million checks -- since |
Moved |
I implemented the magical accessors and unique, positional fields as discussed above in 46c0f2d, then in e6e97d1 I added a decorator
class SomeClass(object):
  def __init__(self, arg):
    self._arg = arg
  def f(self):
    return self._arg
@typed_data(int, SomeClass)
class MyTypedData(SomeMixin):
  # source code...
  pass
# the above is equivalent to the below:
class MyTypedData(typed_datatype('MyTypedData', (int, SomeClass)), SomeMixin):
  # source code...
  pass
# you can then do:
data = MyTypedData(3, SomeClass('asdf'))
print(data.primitive__int)
print(data.some_class.f())
This seems pretty neat. I deleted all the
I added the |
src/python/pants/util/objects.py
#
# class MyTypedData(typed_datatype('MyTypedData', (int, str)), SomeMixin):
#   # source code...
def typed_data(*fields):
This is too magical, IMO... particularly the fact that it requires modifying the linters.
This comment makes sense. I have reverted the linter modification in 238ce3b.
After rebasing, the sha is aa9ba67.
src/python/pants/util/objects.py
def typed_datatype(type_name, field_decls):
  """A wrapper over namedtuple which accepts a dict of field names and types.
This no longer takes a dict.
I moved the contents of this docstring to the docstring of the `datatype()` function and added a `FIXME:` to edit it later (and "later" should be right now) in 2d1489e.
@property
def bin_path(self):
  return self.binary_location.primitive__str
This is not great syntax, unfortunately. I think this might have gone too far when it removed field names.
I would propose that the right way to land this would be to simplify it back to:
class JavacVersionExecutionRequest(datatype('JavacVersionExecutionRequest', [('bin_path', str)])):
I.e., it should be straightforward to use the `datatype` name and accept tuples in the field list. It should also be possible to remove the classname parameter from `datatype` and just make an anonymous class, on the assumption that it will be extended. Then you'd have:
class JavacVersionExecutionRequest(datatype([('bin_path', str)])):
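A minimal sketch of the proposed shape, assuming a hypothetical `datatype()` that takes only a field list, where each entry is either a bare name or a `(name, FieldType)` tuple, and returns an anonymous namedtuple-based class meant to be subclassed. Typed fields are checked with an exact type match at construction; untyped fields are not checked. The error type and message are assumptions, not the implementation that landed.

```python
from collections import namedtuple


def datatype(field_decls):
  """Hypothetical sketch: return an anonymous namedtuple-based class to subclass.

  Each entry of `field_decls` is either a bare field name or a
  (field_name, FieldType) tuple; typed fields must match FieldType exactly.
  """
  field_names = []
  field_types = {}
  for decl in field_decls:
    if isinstance(decl, tuple):
      name, field_type = decl
      field_types[name] = field_type
    else:
      name = decl
    field_names.append(name)

  class AnonymousDatatype(namedtuple('AnonymousDatatype', field_names)):
    __slots__ = ()

    def __new__(cls, *args, **kwargs):
      this = super(AnonymousDatatype, cls).__new__(cls, *args, **kwargs)
      # Exact match: a bool is rejected for an int field, a subclass for its base.
      for name, field_type in field_types.items():
        value = getattr(this, name)
        if type(value) is not field_type:
          raise TypeError('{}.{}: expected exactly {}, got {!r}'.format(
            cls.__name__, name, field_type.__name__, value))
      return this

  return AnonymousDatatype


class JavacVersionExecutionRequest(datatype([('bin_path', str)])):
  pass
```

Mixed declarations such as `datatype(['label', ('size', int)])` type-check `size` but leave `label` unchecked, so existing untyped `datatype()` call sites keep working once the class name argument is dropped.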
OK, this is pretty clear reasoning. I think you made this syntax clear in your comments above and I just wasn't sure what the end result would look like, and was a little iffy on how to munge classes in Python. The only utility of the decorator was to avoid having to spell out the same class name again, and your description above makes it clear to me how to address that. I really like the result you've proposed (especially the pretty natural integration with existing `datatype()` uses), so thanks for taking the time to make this clear.
yet again somehow everything Just Works
238ce3b to 7f319fe
OK, I just rebased and I think CI will pass; I will check back tomorrow.
Thanks for iterating. Nice cleanup.
CI was failing because I had left a FIXME comment in |
Thanks, looks good. Please fix up the description / commit message and I'll merge this tomorrow.
Updated the post and title.
### Problem
It's not clear how to make rule definitions available to the scheduler, and there's no discussion of the type checking added to `datatype()` fields in #5723.
### Solution
- Add a section on datatype fields, typing, and how they interact with the engine.
- Add a section on making rules available to the scheduler in and out of unit tests.
- Add a section describing types of rules after discussion on this PR.
Problem
See #5716. We had to introduce a ton of boilerplate to check argument types in some very simple usage of `datatype` subclasses in tests in #5703. This is one way to make that easier.
Solution
- Move `TypeConstraint` and subclasses from `src/python/pants/engine/addressable.py` to `src/python/pants/util/objects.py`, where they probably should have been anyway, to avoid an import cycle.
- Remove the class name argument from `datatype()` (and update all call sites).
- Allow `('field_name', FieldType)` in the list of fields to `datatype()` to have the field type-checked by an `Exactly(FieldType)` type constraint at construction.
- Canonicalize `__str__()` and `__repr__()` for `datatype` objects, and add testing for the str and repr.
- `datatype` construction.
Result
We can now concisely declare `datatype` objects without having to repeat the class name, and can opt in to a type check for specific fields.