Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up python bindings generator, and add support for sequences. #214

Merged
merged 1 commit into from
Sep 29, 2020

Conversation

rfk
Copy link
Collaborator

@rfk rfk commented Aug 9, 2020

(This isn't on the critical path for nimbus, so no rush on reviewing this; I was just messing around with an idea on a lazy weekend).

This is a significant refactor of the python bindings generator, with
the aim of making the generated code a bit cleaner and of better hiding
many of our implementation details from the public API. To help prove
out the approach, it adds previously-missing support for sequence types.

The key idea here is that, for any given ComponentInterface, we can
finitely enumerate all the possible types used in that interface
(including recursive types like sequences) and can give each such type
a unique name. This lets us define per-type helper methods on the internal
helper objects like RustBuffer, rather than putting these as "hidden"
methods on the public classes.

For example, for each type that lowers into a RustBuffer, there is
a corresponding RustBuffer.allocFrom{{ type_name }} classmethod for
lowering it and a RustBuffer.consumeInto{{ type_name }} for lifting it.

If you squint, this is a little bit like defining private traits and
implementing them for each type, except within the constraints of python's
much looser type system.

Fixes #39, among other things.

@rfk rfk requested a review from a team August 9, 2020 12:32
@@ -1,17 +1,13 @@
# Helpers for lifting/lowering primitive data types from/to a bytebuffer.
# Helpers for reading/writing types from/to a bytebuffer.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a side note, I tried our calling these "read/write" rather than "lift_from/lower_into", and TBH I think it makes things a bit simpler to understand.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to do this with all the backends.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we like this, I'd be happy to go through the other backends and make them all consistent. @linacambridge @tarikeshaq @eoger feel free to thumbs-up or thumbs-down this comment for feedback :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds really good. It did take me a while to wrap my head around the difference between lift, liftFrom, lower, and lowerInto at first, and especially since the thing that does the lifting and lowering is called a "reader" and a "writer", changing the names to read and write seems natural.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aye, I think we can call it a bit of an experiment in terminology that didn't work out (which, it's good to notice these things and change them before they get too entrenched!)

lift_from_py(&"buf", type_)?
),
Type::Sequence(type_) => format!(
"liftSequence({}, lambda buf: {})",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python lambdas are a pain, much of this is motivated by not wanting to build an elaborate series of nested lambdas for sequences-of-optional-types and things like that.

count -= 1
return items

def write{{ type_name|class_name_py }}(self, items):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a concrete example, in rondpoint this produces a writeSequenceEnumEnumeration method specialized for writing out sequences of Enumeration objects. We also get a writeRecDictionnaire method specialized for writing out Dictionnaire objects, etc.

@rfk
Copy link
Collaborator Author

rfk commented Aug 10, 2020

Just a note that I think we should land #215 before landing this one, given that it's the smaller of two concurrent python-related changes.

@rfk
Copy link
Collaborator Author

rfk commented Aug 20, 2020

This has been rebased and updated, adding support for maps along with additional support for strings.

@jhugman and @eoger , A++ work on the round-tripping tests, they helped me find a few places here where the python code was subtly doing the wrong thing.

@rfk rfk force-pushed the python-type-helpers branch 2 times, most recently from d071237 to 30a7289 Compare August 20, 2020 11:21
Copy link
Contributor

@jhugman jhugman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick drive by with some comments. Looking very nice.

# Test the efficacy of the string transport from rust. If this fails, but everything else
# works, then things are very weird.
wellKnown = st.well_known_string("python")
assert "uniffi 💚 python!" == wellKnown
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇

@@ -60,7 +60,7 @@ let rt = Retourneur()
// together, we've shown the correctness of the return leg.
let st = Stringifier()

// Test the effigacy of the string transport from rust. If this fails, but everything else
// Test the efficacy of the string transport from rust. If this fails, but everything else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whelp.

/// able to uniquely name a particular type and call a named helper function that is specific
/// to that type. We support this by defining a naming convention where each type gets a
/// unique name, constructed recursively from the names of its component types if any.
pub fn unique_name(&self) -> String {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Ok; I got stuck on uniqueness here; we really giving a mapping from type to a string identifier.

Suggest canonical_name or identifier or id.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I like canonical_name 👍

Comment on lines 10 to 13
# XXX TODO: if we have multiple instances of the python-side class
# that somehow share the same underlying handle, then the rust side
# of *all* of them will be poisoned whenever the first one is GC'd.
# Maybe we need a python-side map of handles to instances?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _uniffi_handle is generated by the ConcurrentHandleMap. Doesn't this make guarantees about this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think it's currently possible to get multiple foreign-language objects that reference the same _uniffi_handle, because the only way to get a foreign-language object is via calling a constructor to create a fresh instance. I'll remove this comment and we can reconsider all the details here if/when we come to being able to pass object instances in argument or return position.

return self._unpack_from(8, ">Q")

def putLong(self, v):
def write{{ type_name|class_name_py }}(self, v):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting the unique_name through class_name_py could be done in unique_name.

Comment on lines 157 to 169
{% when Type::String -%}
# The primitive string type.
# These write out as size-prefixed utf-8 bytes.

def read{{ type_name|class_name_py }}(self):
size = self._unpack_from(4, ">I")
utf8Bytes = self._read_bytes(size)
return utf8Bytes.decode("utf-8")

def write{{ type_name|class_name_py }}(self, v):
utf8Bytes = v.encode("utf-8")
self._pack_into(4, ">I", len(utf8Bytes))
self._write_bytes(utf8Bytes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sort of construction looks quite verbose. Just explicitly write out the primitives and simple types that can be known upfront, and filter out those types from iter_unique_types().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, I was a bit unsure about how this would come out. Part of what I was going for here, was to avoid emitting code for datatypes that aren't required, e.g. to not emit readUInt16 for code that doesn't use shorts. What do you think about keeping the cases in the match so that we only emit the types that are actually used, but making it easier to read by just writing the direct code rather than this {{ type_name|class_name_py }} templated layer?

@rfk
Copy link
Collaborator Author

rfk commented Aug 23, 2020

Tweaked based on feedback above.

I'd also like to wait for @linacambridge's panic-safety work from #221 to land before merging this, so I can rebase it on top of that.

return "RustBuffer(len={}, data={})".format(self.len, self.data[0:self.len])
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: update this to use an explicitly-sized type rather than ctypes.c_long.

@rfk
Copy link
Collaborator Author

rfk commented Sep 8, 2020

I plan to revisit this once #261 and #262 have landed, as I think those wrap up all of the cross-cutting bindings changes that I wanted to make in this short term.

@jhugman
Copy link
Contributor

jhugman commented Sep 17, 2020

Revisiting this after reviewing #279; this is where a lot of those ideas came from.

The second language is easier than the first, the third is easier than the second. Let's add that PR to the long list of PRs that should land before this work re-commences on this one?

@rfk
Copy link
Collaborator Author

rfk commented Sep 17, 2020

Let's add that PR to the long list of PRs that should land before this work re-commences on this one?

Yep! By the time we're finished, there's not going to be much left to this PR other than the python bits...which actually feels pretty right 👍

@rfk rfk force-pushed the python-type-helpers branch 2 times, most recently from 02853a8 to a566d8d Compare September 19, 2020 12:50
Copy link
Collaborator Author

@rfk rfk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alrighty, I think this is ready for a proper review, and I'm pleased that after a rebase it's almost entirely changes in the python-related code.

// NB. Numbers are all signed in kotlin. This makes roundtripping of unsigned numbers tricky to show.
// Uniffi does not generate unsigned types for kotlin, but the work tracked is
// in https://github.com/mozilla/uniffi-rs/issues/249. Tests using unsigned types are
// commented out for now.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I happened to notice this outdated comment while I was in here; unsigned types are included in the tests these days.

def _lowerInto(cls, v, buf):
{%- for field in rec.fields() %}
{{ "(v.{})"|format(field.name())|lower_into_py("buf", field.type_()) }}
{%- endfor %}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are now helper methods on the RustBuffer class and its friends, in order to hide them from consumers.

{%- endmatch -%},
_UniFFILib.{{ func.ffi_func().name() }}{% if func.arguments().len() > 0 %},{% endif %}{% call _arg_list_ffi_call(func) -%}
{%- endmatch -%}
,_UniFFILib.{{ func.ffi_func().name() }}{% if func.arguments().len() > 0 %},{% endif %}{% call _arg_list_ffi_call(func) -%}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This little comma tweak makes the call emit entirely on one line, which seemed to help with some formatting issues in the generated code.

@@ -20,8 +20,8 @@
import contextlib

{% include "RustBufferTemplate.py" %}

{% include "RustBufferHelper.py" %}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I split this up into two files, one for the reading and one for the writing. It would have been nice to group the operations by datatype rather than read/write, but didn't really work out with the structure of the abstractions here.

Copy link
Contributor

@jhugman jhugman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some comments and questions, but I'd be happy to see this land.

[0.0, 0.5, 0.25, 1.0, 1.0 / 3],
st.to_string_double,
rustyFloatToStr,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: file an issue for testing defaults in python.

{%- endmatch -%},
_UniFFILib.{{ func.ffi_func().name() }}{% if func.arguments().len() > 0 %},{% endif %}{% call _arg_list_ffi_call(func) -%}
{%- endmatch -%}
,_UniFFILib.{{ func.ffi_func().name() }}{% if func.arguments().len() > 0 %},{% endif %}{% call _arg_list_ffi_call(func) -%}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Askama templates begin to resemble perl regular expressions at some point, so would suggest moar whitespace and splitting the rest of the args on to another line.

Suggested change
,_UniFFILib.{{ func.ffi_func().name() }}{% if func.arguments().len() > 0 %},{% endif %}{% call _arg_list_ffi_call(func) -%}
, _UniFFILib.{{ func.ffi_func().name() -}}
{% if func.arguments().len() > 0 %}, {% call _arg_list_ffi_call(func) %}{% endif -%}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aye, this got pretty hairy. Some of the changes here were due to problems with indentation, but I'll take another look and see if I can clean it up.

{%- endmatch -%},
_UniFFILib.{{ func.ffi_func().name() }},{{- prefix }}{% if func.arguments().len() > 0 %},{% endif %}{% call _arg_list_ffi_call(func) %})
{%- endmatch -%}
,_UniFFILib.{{ func.ffi_func().name() }},{{ prefix }}{% if func.arguments().len() > 0 %},{% endif %}{% call _arg_list_ffi_call(func) -%}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also

Suggested change
,_UniFFILib.{{ func.ffi_func().name() }},{{ prefix }}{% if func.arguments().len() > 0 %},{% endif %}{% call _arg_list_ffi_call(func) -%}
, _UniFFILib.{{ func.ffi_func().name() }}, {{ prefix -}}
{% if func.arguments().len() > 0 %}, {% call _arg_list_ffi_call(func) %}{% endif -%}

{%- endfor %}
ctypes.POINTER(RustError)
{%- endmacro -%}
ctypes.POINTER(RustError),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, got it.

@@ -53,7 +53,7 @@ mod filters {
FFIType::Float64 => "ctypes.c_double".to_string(),
FFIType::RustCString => "ctypes.c_voidp".to_string(),
FFIType::RustBuffer => "RustBuffer".to_string(),
FFIType::RustError => "RustError".to_string(),
FFIType::RustError => "POINTER(RustError)".to_string(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my untrained eye, this doesn't look like Python.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, this at least needs ctypes.POINTER(RustError) otherwise the imports won't work out correctly.

# system.

{%- for typ in ci.iter_types() -%}
{%- let canonical_type_name = typ.canonical_name()|class_name_py -%}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙇‍♂️ 🏅

pub fn get_record_definition(&self, name: &str) -> Option<&Record> {
// TODO: probably we could store these internally in a HashMap to make this easier?
self.records.iter().find(|r| r.name == name)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, we only need this method, but not the other get_*_definition methods. I don't understand how this wasn't needed in the other bindings.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we only need get_record_definition but I added the others for consistency.

The reason they'd needed here is because the python bindings generate some code for type::Record instances as part of doing ci.iter_types, while e.g. the Kotlin bindings generate all the code for type::Record while doing ci.iter_record_definitions. For python, we need to be able to go from the type::Record(name) that we get when iterating the types, to the actual definition of the record with the given name.

…pes.

This is a significant refactor of the python bindings generator, with
the aim of making the generated code a bit cleaner and of better hiding
many of our implementation details from the public API. To help prove
out the approach, it adds previously-missing support for data types like
sequences and maps.

We use the new `TypeUniverse` structure to iterate over all types used
in the interface, and generate various internal helper functions as
methods on our utility classes. This keeps them out of the public API
as seen by consumers.

For example, for each type that lowers into a `RustBuffer`, there is
a corresponding `RustBuffer.allocFrom{{ type_name }}` staticmethod for
lowering it and a `RustBuffer.consumeInto{{ type_name }}` for lifting it.
@rfk
Copy link
Collaborator Author

rfk commented Sep 29, 2020

Thanks @jhugman!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Audit Python helper size conversions
3 participants