-
-
Notifications
You must be signed in to change notification settings - Fork 648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
convert usages of TypeConstraint to TypeId for rule products in the engine #7114
convert usages of TypeConstraint to TypeId for rule products in the engine #7114
Conversation
I'm marking this WIP because I'm seeing failures locally induced by things in the native backend not being tuples (among other things) and would love to get #7115 in first so I can avoid that mishap. The code here works. One thing I noticed is that I would get:
So I believe the fact that we are running these datatypes from the native backend through the |
Hm... it should not be necessary to hash the return value of an |
Yes, that is the issue -- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect that the failing call shown in that stack should be converted into a call to externs::satisfied_by
Yes, that is the issue -- externs::satisfied_by was removed in this PR in order to convert TypeConstraint to TypeId. I believe this is the correct move, exchanging the ability to check for types satisfying constraints without hashing for a much simpler type checking model which runs much less python code to check whether a constraint is satisfied.
So, assuming that this PR were to swap externs::satisfied_by
from taking a TypeConstraint
to taking a TypeId
(and if it were renamed to externs::type_equals
, or something), it should be an equivalent amount of python code running... possibly less.
Also, note that there is a dict of interned types on the python side that doesn't appear to have been changed here: https://github.com/pantsbuild/pants/blob/master/src/python/pants/engine/native.py#L499-L500.
Having said that, since externs::product_type
doesn't hash the object instance, I don't see where any additional hashing of object instances is being introduced, so it's not clear why LLVMLinker
would go from not-hashed to hashed.
So perhaps there is some accidental additional hashing happening here? If so, it would be good to remove the additional hashing rather than blocking this on #7115.
Thank you for diving into this! That seems correct. |
54051b7
to
95de1ba
Compare
I tried to repro the errors I was seeing earlier, but they are currently not happening locally (and I didn't make anything else into tuples in the meantime, so ???). Once I can get this through Travis this should be reviewable. |
Very excited for this one! Thanks. |
We are seeing the error again, e.g. in this travis run -- will refer to the discussion above to diagnose and fix. |
@stuhood The error was because like you mentioned above, |
fd2f4bd
to
1cc87cb
Compare
@stuhood Green means go! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Some optional tweaks to the FFI boundary.
src/rust/engine/src/externs.rs
Outdated
/// | ||
#[repr(C)] | ||
pub struct Ident { | ||
pub struct ProductTypeId { | ||
pub hash: i64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably isn't strictly necessary to have a hash on ProductTypeId
... it should be ok to just use the derived hashcode (based on hashing the TypeId, which is distinct).
And given that, I'm not sure that ProductTypeId
needs to exist... could just be TypeId
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was able to convert this to just TypeId
by hashing it manually -- let me know if the PRODUCT_TYPE_ID_HASH_BUILDER
lazy static in interns.rs
looks right: https://github.com/pantsbuild/pants/pull/7114/files#diff-8ab5c4db2ca5c0325da0cb5aecefaff7R43
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant just #[derive(Hash)]
, I think. But I might have been missing something. Regardless, not worth another CI burn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh...well TypeId
already derives Hash
? This was just my best attempt at turning the TypeId
into an i64
for the InternKey
(I was actually extremely confused about this part and almost messaged the engine channel), and since we were already using the FNV
hash it seemed reasonable to use again. I'm not sure if the as i64
affects any of the hashing properties, or if there's an easier way to get an i64
from an instance of a hashable type.
src/rust/engine/src/interning.rs
Outdated
@@ -43,14 +43,32 @@ impl Interns { | |||
Interns::default() | |||
} | |||
|
|||
pub fn insert_product(&mut self, v: Value) -> Key { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be a separate method, or could just using insert
work? AFAIK, the type(some_type)
should just be type
, so calling insert
should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added in a hurry when I didn't know why identify()
was causing errors, this edit sounds right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we do want to make sure the type is stored in self._types
in native.py
, right? Are you thinking of using insert
without removing extern_product_type()
somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. Should the generator code (in https://github.com/pantsbuild/pants/blob/3a81e86eca33bcd16ca68838052ffeb25fdcc803/src/python/pants/engine/native.py#L417-450) memoize these in self._types
before returning them?
We have TypeIdBuffer
, although it doesn't look like it's being used anywhere:
pants/src/rust/engine/src/externs.rs
Lines 538 to 551 in 3a81e86
// Points to an array of TypeIds. | |
#[repr(C)] | |
pub struct TypeIdBuffer { | |
ids_ptr: *mut TypeId, | |
ids_len: u64, | |
// A Handle to hold the underlying array alive. | |
handle_: Handle, | |
} | |
impl TypeIdBuffer { | |
pub fn to_vec(&self) -> Vec<TypeId> { | |
with_vec(self.ids_ptr, self.ids_len as usize, |vec| vec.clone()) | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also thinking that we could add another argument to extern_identify()
which says to consider the handle as a type
and intern that instead of taking the type()
of the handle.
But I think the solution you're suggesting here with memoizing in extern_generator_send()
sounds like we would end up being able to remove all of the ffi calls in interning.rs
? Because that sounds pretty neat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pulled this thread in another PR: should have it out in a few minutes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding completely eliminating the need to call back to python in interning.rs
: I think that you could almost do it, but we'd probably need to make the PyGeneratorResponse
type more complicated, because we only hash the keys in Gets
, and not in Breaks
.
src/rust/engine/src/nodes.rs
Outdated
@@ -812,6 +807,8 @@ impl Task { | |||
.expect("edges for task exist.") | |||
.entry_for(&select_key) | |||
.unwrap_or_else(|| { | |||
// TODO: convert this to a Result, as well as all other panic!()s and .expects() that | |||
// assert graph logic which is not clearly going to be correct in all future cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I agree in general, the fix here is probably to improve the RuleGraph
's enums and types to make these operations infalliable, rather than turning this into a user-visible error. A failure here would be a bug in the RuleGraph
, rather than in a user's rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds good, will edit the comment to match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turned into a comment about returning enums to avoid flattening failure modes, let me know if that captures the intent.
src/python/pants/engine/rules.py
Outdated
|
||
class RuleIndex(datatype(['rules', 'roots'])): | ||
"""Holds a normalized index of Rules used to instantiate Nodes.""" | ||
|
||
@classmethod | ||
def create(cls, rule_entries): | ||
"""Creates a RuleIndex with tasks indexed by their output type.""" | ||
# TODO: make a defaultdict-like wrapper for OrderedDict (and other types?)! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sortof off-topic I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- remove `extern_satisfied_by()` - remove `extern_satisfied_by_type()` - add `extern_product_type()`
12a5152
to
82e04cc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
#7318) ### Problem #7114 set us up to use lighter-weight identification of product types in `externs::generator_send`. We can additionally avoid calling back to python to memoize product types by having methods return `TypeIds` rather than `Values` representing types. ### Solution Rather than a `ValueBuffer` containing type instances, memoize the types and return a `TypeIdBuffer`. ### Result Fewer calls back to python.
### Problem The error for unhashable types used as `Get` params is unfortunate, because the call to `identify` fails and returns a null pointer to rust, which triggers an error... somewhere else. See #7499. ### Solution As suggested in #7499, have `extern_generator_send` directly `identify` `Get` params, meaning that failures to hash take advantage of the fact that `extern_generator_send` is already fallible. This avoids a significant number of calls back to python to identify `Values` (an optimization discussed in #7318 and #7114). ### Result Improved usability. Fixes #7499.
Problem
See #4535 and #4005, in particular this comment on #4535.
TypeConstraint
is a pretty general construct that we would like to do more with, for example #6936, and as of the current discussion in #4535 we realize we can emulate union types in@rule
s without it, matching just against a specific type.Solution
output_constraint
in theRule
subclasses inrules.py
intooutput_type
, and add aSubclassesOf(type)
type check indatatype
fields in those classes to ensure this.satisfied_by()
andsatisfied_by_type()
externs, and add aproduct_type()
extern used to intern a pythontype
as aTypeId
.TypeConstraint
s passed to the engine for intrinsics (e.g.Snapshot
) intoTypeId
s.==
!fmt::Debug
forTypeId
to be the same asDisplay
(we may want to do these differently in the future, but it is very useful to see the actual type name when debugging).Result
Everything is the same, but now we don't have the additional complexity of
TypeConstraint
down in the engine when we can do simpleTypeId
equality. This lets us get more creative withTypeConstraint
on the python side, while type checking on the rust side is a little less complex (and probably more efficient to some degree).