-
-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Get errors for unhashable or mismatched types #7502
Improve Get errors for unhashable or mismatched types #7502
Conversation
Commits are useful to review independently. |
@@ -195,7 +195,7 @@ def test_union_rules(self): | |||
self.assertTrue(isinstance(a, A)) | |||
# Fails due to no union relationship from A -> UnionBase. | |||
expected_msg = """\ | |||
Exception: WithDeps(Inner(InnerEntry { params: {UnionWrapper}, rule: Task(Task { product: A, clause: [Select { product: UnionWrapper }], gets: [Get { product: A, subject: UnionA }, Get { product: A, subject: UnionB }], func: a_union_test(), cacheable: true }) })) did not declare a dependency on JustGet(Get { product: A, subject: A }) | |||
Exception: (A, [UnionWrapper], [Get(A, UnionA), Get(A, UnionB)], a_union_test()) for UnionWrapper did not declare a dependency on Get(A, A) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this less informative than the previous error. What are all of these comma-separated parts of a string? As a rule-author, I don't want to have to dig into the rust code in Pants to work out what these things all are... (The old message wasn't great either, but at least it had keys associated with each value...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this is currently the exact output for rule-graph visualization and tracing, but they should clearly evolve. This change mostly just got us into alignment between an error case and those other cases.
If we think it is in scope for this change, I can look at it here... it is mostly a python signature, and we could roughly shape it that way. The only issue is the Gets
and other used Params
. A strawperson refactoring of this example that doesn't lose any information might be:
def a_union_test(UnionWrapper) -> A { for UnionWrapper; Get(A, UnionA), Get(A, UnionB) }
or
def a_union_test -> A { for UnionWrapper; Get(A, UnionA), Get(A, UnionB) }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... I'm going to open a ticket to discuss this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, in the meantime, I'd like to merge this output as-is in order to make it obvious which cases exist, and where (based on these tests now actually using Display
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the stack trace in particular with rust type names as it lets me reconstruct the rust part in my head much more easily, which is useful for specifically an error trace. This seems like a large UI change wrapped in another functional change. I would really prefer that we solve the output problem in a followup PR addressing #7509 and considering the difference between what we want to display in stack traces vs what we want to display in pretty graphs, while scoping this one to the identify()
and related changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've dropped these changes from the PR and will wait to see what happens with #7509.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good as far as my limited understanding of this code goes.
8cd695e
to
c5a9244
Compare
c5a9244
to
b44073c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts on comments to add in places mostly. Being able to use rust enums more naturally is very exciting!
TypeId(c.to_id(res.product)), | ||
c.to_value(res.subject), | ||
c.identify(res.subject), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fantastic change. Is this relying on mostly the rust C representation of enums (with tag
and (in this case) get
fields), or is there some cffi magic going on to allow addressing unions in this deeply ergonomic way? Is there a link we could add documenting why this works to access our rust enums?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, I actually think we don't want to do this, as we probably want to colocate parsing the python response and producing the FFI object that represents it (so the below code is needless complexity for now). But I wrote this comment out anyway so I can link to it in #7367:
Could this part be moved into a helper method? Something like:
@memoized_property
def _generator_response_tags_enum(self):
class _Tags(enum([self._lib.Get, self._lib.GetMulti, self._lib.Broke, self._lib.Throw])): pass
return _Tags
def _set_generator_response(self, response, c, input_res, tag):
attr, value = self._generator_response_tags_enum(tag).resolve_for_enum_variant({
self._lib.Get: lambda: ('get', (
TypeId(c.to_id(input_res.product)),
c.to_value(input_res.subject),
c.identify(input_res.subject),
),
# ...
response.tag = tag
setattr(response, attr, value)
})
def extern_generator_send(...):
# ...
if isinstance(res, Get):
self._set_generator_response(response, c, res, self._lib.Get)
# ...
return response[0]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fantastic change. Is this relying on mostly the rust C representation of enums (with tag and (in this case) get fields), or is there some cffi magic going on to allow addressing unions in this deeply ergonomic way? Is there a link we could add documenting why this works to access our rust enums?
This is mostly relying on the C API that is generated by cbindgen. Kudos on integrating that.
ffi = getattr(self._ffi_module, 'ffi') | ||
_FFISpecification(ffi).register_cffi_externs() | ||
return ffi | ||
return getattr(self._ffi_module, 'ffi') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering that register_cffi_externs()
modifies the ffi
object and not the lib
object, I would think we would want to keep that call in this method to avoid spooky mutation of the ffi
object happening whenever we first happen to access the lib
object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, constructing the _FFISpecification
inside of self.ffi
triggered infinite recursion when I attempted to feed it self.lib
(which required self.ffi
, etc, etc).
GetMulti(TypeIdBuffer, HandleBuffer, IdentBuffer), | ||
// NB: Broke not Break because C keyword. | ||
Broke(Handle), | ||
Throw(Handle), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better!! Thank you for taking the time to figure out how to do this with cffi, it makes the code leagues more readable. It also obviates the need for that TODO without additional complexity in building the engine!
with_vec(self.ptr(), self.len() as usize, |vec| Self::lift(&vec[0])) | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all really great! I really like that we can manipulate python collections in this structured, efficient, and safe way.
"Expected exactly 1 item in Buffer, but had: {}", | ||
self.len() | ||
); | ||
with_vec(self.ptr(), self.len() as usize, |vec| Self::lift(&vec[0])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason this doesn't delegate to .to_vec()
? Like:
fn unwrap_one(&self) -> Output {
assert_equal!(self.len(), 1, "Expected exactly 1 item in Buffer, but had: {}", self.len());
self.to_vec()[0]
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with_vec
doesn't actually "create" a vec: instead, it "views" some memory as a Vec via:
unsafe { Vec::from_raw_parts(c_ptr, c_len, c_len) }
That means that by avoiding calling to_vec()
, we avoid actually copying the buffer containing the one item.
idents_ptr: *mut Ident, | ||
idents_len: u64, | ||
// A Handle to hold the underlying array alive. | ||
handle_: Handle, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this part isn't new, but could we add a comment on one of these fields explaining why handle_
needs to exist? Is it so python doesn't gc the pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct. Since CI is green here, I'm going to go ahead and merge this one, but can revisit.
/// | ||
/// The response from a call to extern_generator_send. Gets include Idents for their Handles | ||
/// in order to avoid roundtripping to intern them, and to eagerly trigger errors for unhashable | ||
/// types on the python side where possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good explanation.
@@ -213,6 +222,12 @@ def test_get_type_match_failure(self): | |||
# `a_typecheck_fail_test` above expects `wrapper.inner` to be a `B`. | |||
self.scheduler.product_request(A, [Params(TypeCheckFailWrapper(A()))]) | |||
|
|||
def test_unhashable_failure(self): | |||
"""Test that unhashable Get(...) params result in a structured error.""" | |||
expected_msg = """TypeError: unhashable type: 'list'""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like for this message to check more specifically which object has the unhashable type, but I can't see how to do that easily, so this is definitely specific enough for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks!
Problem
The error for unhashable types used as
Get
params is unfortunate, because the call toidentify
fails and returns a null pointer to rust, which triggers an error... somewhere else. See #7499.Solution
As suggested in #7499, have
extern_generator_send
directlyidentify
Get
params, meaning that failures to hash take advantage of the fact thatextern_generator_send
is already fallible. This avoids a significant number of calls back to python to identifyValues
(an optimization discussed in #7318 and #7114).Result
Improved usability. Fixes #7499.