Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Get errors for unhashable or mismatched types #7502

Merged
merged 5 commits into from
Apr 10, 2019

Conversation

stuhood
Copy link
Member

@stuhood stuhood commented Apr 4, 2019

Problem

The error for unhashable types used as Get params is unfortunate, because the call to identify fails and returns a null pointer to rust, which triggers an error... somewhere else. See #7499.

Solution

As suggested in #7499, have extern_generator_send directly identify Get params, meaning that failures to hash take advantage of the fact that extern_generator_send is already fallible. This avoids a significant number of calls back to python to identify Values (an optimization discussed in #7318 and #7114).

Result

Improved usability. Fixes #7499.

@stuhood
Copy link
Member Author

stuhood commented Apr 4, 2019

Commits are useful to review independently.

src/rust/engine/src/externs.rs Outdated Show resolved Hide resolved
src/rust/engine/src/rule_graph.rs Outdated Show resolved Hide resolved
src/rust/engine/src/rule_graph.rs Outdated Show resolved Hide resolved
@@ -195,7 +195,7 @@ def test_union_rules(self):
self.assertTrue(isinstance(a, A))
# Fails due to no union relationship from A -> UnionBase.
expected_msg = """\
Exception: WithDeps(Inner(InnerEntry { params: {UnionWrapper}, rule: Task(Task { product: A, clause: [Select { product: UnionWrapper }], gets: [Get { product: A, subject: UnionA }, Get { product: A, subject: UnionB }], func: a_union_test(), cacheable: true }) })) did not declare a dependency on JustGet(Get { product: A, subject: A })
Exception: (A, [UnionWrapper], [Get(A, UnionA), Get(A, UnionB)], a_union_test()) for UnionWrapper did not declare a dependency on Get(A, A)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this less informative than the previous error. What are all of these comma-separated parts of a string? As a rule-author, I don't want to have to dig into the rust code in Pants to work out what these things all are... (The old message wasn't great either, but at least it had keys associated with each value...)

Copy link
Member Author

@stuhood stuhood Apr 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this is currently the exact output for rule-graph visualization and tracing, but they should clearly evolve. This change mostly just got us into alignment between an error case and those other cases.

If we think it is in scope for this change, I can look at it here... it is mostly a python signature, and we could roughly shape it that way. The only issue is the Gets and other used Params. A strawperson refactoring of this example that doesn't lose any information might be:

def a_union_test(UnionWrapper) -> A { for UnionWrapper; Get(A, UnionA), Get(A, UnionB) }

or

def a_union_test -> A { for UnionWrapper; Get(A, UnionA), Get(A, UnionB) }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... I'm going to open a ticket to discuss this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, in the meantime, I'd like to merge this output as-is in order to make it obvious which cases exist, and where (based on these tests now actually using Display).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the stack trace in particular with rust type names as it lets me reconstruct the rust part in my head much more easily, which is useful for specifically an error trace. This seems like a large UI change wrapped in another functional change. I would really prefer that we solve the output problem in a followup PR addressing #7509 and considering the difference between what we want to display in stack traces vs what we want to display in pretty graphs, while scoping this one to the identify() and related changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've dropped these changes from the PR and will wait to see what happens with #7509.

src/rust/engine/src/externs.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@benjyw benjyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good as far as my limited understanding of this code goes.

@stuhood stuhood force-pushed the stuhood/improve-get-errors branch 2 times, most recently from 8cd695e to c5a9244 Compare April 6, 2019 15:15
@stuhood stuhood force-pushed the stuhood/improve-get-errors branch from c5a9244 to b44073c Compare April 6, 2019 15:28
Copy link
Contributor

@cosmicexplorer cosmicexplorer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts on comments to add in places mostly. Being able to use rust enums more naturally is very exciting!

TypeId(c.to_id(res.product)),
c.to_value(res.subject),
c.identify(res.subject),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fantastic change. Is this relying on mostly the rust C representation of enums (with tag and (in this case) get fields), or is there some cffi magic going on to allow addressing unions in this deeply ergonomic way? Is there a link we could add documenting why this works to access our rust enums?

Copy link
Contributor

@cosmicexplorer cosmicexplorer Apr 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, I actually think we don't want to do this, as we probably want to colocate parsing the python response and producing the FFI object that represents it (so the below code is needless complexity for now). But I wrote this comment out anyway so I can link to it in #7367:

Could this part be moved into a helper method? Something like:

    @memoized_property
    def _generator_response_tags_enum(self):
       class _Tags(enum([self._lib.Get, self._lib.GetMulti, self._lib.Broke, self._lib.Throw])): pass
       return _Tags

    def _set_generator_response(self, response, c, input_res, tag):
      attr, value = self._generator_response_tags_enum(tag).resolve_for_enum_variant({
        self._lib.Get: lambda: ('get', (
            TypeId(c.to_id(input_res.product)),
            c.to_value(input_res.subject),
            c.identify(input_res.subject),
          ),
        # ...
        response.tag = tag
        setattr(response, attr, value)
      })

    def extern_generator_send(...):
      # ...
              if isinstance(res, Get):
                self._set_generator_response(response, c, res, self._lib.Get)
              # ...
      return response[0]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fantastic change. Is this relying on mostly the rust C representation of enums (with tag and (in this case) get fields), or is there some cffi magic going on to allow addressing unions in this deeply ergonomic way? Is there a link we could add documenting why this works to access our rust enums?

This is mostly relying on the C API that is generated by cbindgen. Kudos on integrating that.

ffi = getattr(self._ffi_module, 'ffi')
_FFISpecification(ffi).register_cffi_externs()
return ffi
return getattr(self._ffi_module, 'ffi')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that register_cffi_externs() modifies the ffi object and not the lib object, I would think we would want to keep that call in this method to avoid spooky mutation of the ffi object happening whenever we first happen to access the lib object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, constructing the _FFISpecification inside of self.ffi triggered infinite recursion when I attempted to feed it self.lib (which required self.ffi, etc, etc).

GetMulti(TypeIdBuffer, HandleBuffer, IdentBuffer),
// NB: Broke not Break because C keyword.
Broke(Handle),
Throw(Handle),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better!! Thank you for taking the time to figure out how to do this with cffi, it makes the code leagues more readable. It also obviates the need for that TODO without additional complexity in building the engine!

with_vec(self.ptr(), self.len() as usize, |vec| Self::lift(&vec[0]))
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all really great! I really like that we can manipulate python collections in this structured, efficient, and safe way.

"Expected exactly 1 item in Buffer, but had: {}",
self.len()
);
with_vec(self.ptr(), self.len() as usize, |vec| Self::lift(&vec[0]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason this doesn't delegate to .to_vec()? Like:

  fn unwrap_one(&self) -> Output {
    assert_equal!(self.len(), 1, "Expected exactly 1 item in Buffer, but had: {}", self.len());
    self.to_vec()[0]
  }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with_vec doesn't actually "create" a vec: instead, it "views" some memory as a Vec via:

unsafe { Vec::from_raw_parts(c_ptr, c_len, c_len) }

That means that by avoiding calling to_vec(), we avoid actually copying the buffer containing the one item.

idents_ptr: *mut Ident,
idents_len: u64,
// A Handle to hold the underlying array alive.
handle_: Handle,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this part isn't new, but could we add a comment on one of these fields explaining why handle_ needs to exist? Is it so python doesn't gc the pointer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. Since CI is green here, I'm going to go ahead and merge this one, but can revisit.

///
/// The response from a call to extern_generator_send. Gets include Idents for their Handles
/// in order to avoid roundtripping to intern them, and to eagerly trigger errors for unhashable
/// types on the python side where possible.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good explanation.

@@ -213,6 +222,12 @@ def test_get_type_match_failure(self):
# `a_typecheck_fail_test` above expects `wrapper.inner` to be a `B`.
self.scheduler.product_request(A, [Params(TypeCheckFailWrapper(A()))])

def test_unhashable_failure(self):
"""Test that unhashable Get(...) params result in a structured error."""
expected_msg = """TypeError: unhashable type: 'list'"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like for this message to check more specifically which object has the unhashable type, but I can't see how to do that easily, so this is definitely specific enough for now.

Copy link
Contributor

@illicitonion illicitonion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants