Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add default and converting constructors for all concrete Python types #464

Merged
merged 4 commits into from
Nov 17, 2016

Conversation

dean0x7d
Copy link
Member

@dean0x7d dean0x7d commented Oct 24, 2016

There is currently a functionality split between py::str() and py::object::str() which can be surprising:

const char *c_string = ...;
auto s1 = pt::str(c_string); // convert to str
std::string std_string = ...;
auto s2 = pt::str(std_string); // convert to str
py::object obj = ...;
auto s3 = pt::str(obj); // --> no conversion <--
auto s4 = obj.str(); // convert to str

This PR makes case s3 do the expected conversion (if needed). Case s4 is deprecated.

Update

This was extended to all concrete Python types represented in pybind11. New reinterpret_* functions were also added to streamline taking raw PyObject *. For example:

// New constructors
auto l1 = list(); // default construct empty list
auto l2 = list(obj); // convert `obj` to a list (throw on error)

// DEPRECATED
auto l3 = list(ptr, true/false); // borrow or steal `PyObject *` without any conversion

// In favor of
auto lb = reinterpret_borrow<list>(ptr);
auto ls = reinterpret_steal<list>(ptr);

Deprecations

  • obj.str() is deprecated in favor of str(obj).
  • T obj = ...; obj.check() is deprecated in favor of isinstance<T>(obj).
  • The T(handle, bool) constructor is deprecated in favor of reinterpret_borrow<T>() and reinterpret_steal<T>().
  • obj.repr() is removed in favor of repr(obj). There is no deprecation warning since this hasn't actually made it to a stable version yet.

Implementation

Making this work required changing the usual type check order, e.g.:

auto l = list(ptr, true);
if (l.check())
   // ...

The above check happens after a py::object type is created, however this would not work correctly with a converting constructor since py::str would be able to make a string representation of any object which would essentially make its .check() always true. This would also make using py::str as a function parameter very difficult since it would shadow every other overload.

To overcome this, type creation and checking are reversed with the following syntax:

if (isinstance<list>(ptr)) {
    auto l = reinterpret_borrow<list>(ptr);
    // ...
}

The py::isinstance<T>(obj) syntax was chosen because it can work uniformly for types derived from py::object as well as user types wrapped with py:class_.

class Pet { ... };
py::class_<Pet>(...);

m.def("is_pet", [](py::object obj) {
    return py::isinstance<Pet>(obj); // works as expected
});

Note that py::isinstance takes a shortcut and calls the usual Py*_Check() functions for py::object classes, while user classes go the long way with detail::get_type_info() and PyObject_IsInstance().

@aldanor
Copy link
Member

aldanor commented Oct 24, 2016

Hmm, I'm trying to figure, won't isinstance<array_t> actually try to create an array with a consequent data copy?

@dean0x7d
Copy link
Member Author

No, it won't. The third and fourth parameters of the PYBIND11_OBJECT_CVT macro are CheckFun and CvtStmt (the conversion). CheckFun was called in .check() and is now called in isinstance(). CvtStmt is called in the constructors and copy/move assignments. array_t just had a non_null check function without any conversion.

@wjakob
Copy link
Member

wjakob commented Oct 24, 2016

Cool, that seems like a nice idea. However, a few thoughts:

  1. It seems inconsistent to apply this change only to py::str but no other type wrappers.The logical consequence would IMHO also be that py::list turns an iterator into an actual list, and so on. (If the goal is to put this into v2.0, it should be done consistently everywhere.)
  2. Some of the changes complicate things (I'm never a fan of this), like the extra indirection for pybind11 type wrappers. I don't have a good alternative solution though.
  3. py::isinstance<> seems pretty long for something this important. How about py::is<> or py::is_a<>
  4. Good catch in PYBIND11_OBJECT_CVT :)
  5. Super-minor: inline not needed for templates. Typo: "aleady"

Best,
Wenzel

@dean0x7d
Copy link
Member Author

  1. Yes, I'm definitely in favor of having all the Python type wrappers have the appropriate constructors. This shouldn't be too much work, I'll get on it.
  2. You mean the type_caster specialization? That one is unfortunate and I was trying to find a way around it, but array_t just behaves so differently compared to the other py::object classes.
  3. Would isinstance really be used so much outside of type_caster? I was going for the same naming as Python's isinstance since that's exactly what it does. I think is could be confused with the Python keyword.
  4. I was very much surprised that not a single compiler issued a "dead code after return" warning for that function in PYBIND11_OBJECT_CVT.
  5. OK, will be fixed.

@dean0x7d
Copy link
Member Author

dean0x7d commented Oct 28, 2016

OK, so all concrete Python types now have converting and default constructors which match their Python counterparts. The pytype classes which model protocols (iterator, sequence) did not get any conversions or meaningful default constructors (which should be fine).

The only odd class is bytes. There are significant differences between Python 2.x and 3.x which make a general object -> bytes conversion pretty messy. There is already a str -> bytes converting constructor which should be enough for most cases, so I didn't want to complicate this too much.

There were a few things that needed to be adjusted because of the new converting constructors. See the commit messages for details.

Note for the naming of reinterpret_borrow<T>() and reinterpret_steal<T>(): these are long names, but should serve as warning that there are low-level things.

@dean0x7d dean0x7d changed the title Make py::str(obj) equivalent to its Python counterpart Add default and converting constructors for all concrete Python types Oct 28, 2016
@wjakob
Copy link
Member

wjakob commented Nov 3, 2016

this is conflicted now as well..

@dean0x7d
Copy link
Member Author

dean0x7d commented Nov 3, 2016

Rebased.

@aldanor
Copy link
Member

aldanor commented Nov 3, 2016

Sorry, I've broken your branch again :)

@dean0x7d
Copy link
Member Author

dean0x7d commented Nov 3, 2016

No worries :) Rebased and fixed.

@aldanor
Copy link
Member

aldanor commented Nov 3, 2016

This looks good all in all (although will sure trigger some deprecation warnings on existing code, but that's ok, better now than later), is there anything left planned to be done or should it get merged?

@dean0x7d
Copy link
Member Author

dean0x7d commented Nov 3, 2016

I don't have anything else planned here. I suppose it may be nice to make the new converting constructors explicit, but that would be a compilation error for existing code and this PR is probably disruptive enough as it is.

Copy link
Member

@wjakob wjakob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,

I did a pass over the PR and added a few minor comments.

Best,
Wenzel

if (c.check()) {
value = (void *) c;
if (isinstance<capsule>(h)) {
value = capsule(h, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not reinterpret_borrow here?

return_value_policy policy = return_value_policy::automatic_reference,
handle parent = handle()) {
template <typename T, detail::enable_if_t<detail::is_pyobject<T>::value, int> = 0>
T cast(const handle &handle) { return {handle, true}; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reinterpret_borrow?

@@ -1224,7 +1241,7 @@ class unpacking_collector {
int _[] = { 0, (process(args_list, std::forward<Ts>(values)), 0)... };
ignore_unused(_);

m_args = object(PyList_AsTuple(args_list.ptr()), false);
m_args = std::move(args_list);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an optimization, I assume?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way the conversion was done in the original PYBIND11_OBJECT_CVT macro, the move constructor was more efficient, so the std::move here resulted in smaller binary size. With the reworked CVT macro, the lvalue ref constructor is equally good, so the move isn't strictly needed any more, but it doesn't hurt either (args_list's lifetime ends here anyway).

@@ -98,7 +98,6 @@
#define PYBIND11_BYTES_FROM_STRING_AND_SIZE PyBytes_FromStringAndSize
#define PYBIND11_BYTES_AS_STRING_AND_SIZE PyBytes_AsStringAndSize
#define PYBIND11_BYTES_AS_STRING PyBytes_AsString
#define PYBIND11_BYTES_CHECK PyBytes_Check
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised these duplicate definitions didn't cause warnings before.

/* Copy flags from base (except baseship bit) */
flags = base_array.flags() & ~detail::npy_api::NPY_ARRAY_OWNDATA_;
flags = array(base, true).flags() & ~detail::npy_api::NPY_ARRAY_OWNDATA_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reinterpret_borrow?

// Derived classes may override this constructor for Python type conversions
object(handle h, bool borrowed) : handle(h) { if (borrowed) inc_ref(); }
// This one must be inherited as is -- it's always just a pure pointer assignment
object(handle h, bool borrowed, detail::noconvert) : object(h, borrowed) { }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this special detail::noconvert constructor is needed. Isn't having a true/false argument enough to uniquely specify what is meant? (i.e. just borrow/steal a reference, but don't run any conversions).

Or is there some version of this constructor, which additional does something to the object? In that case, it may be worth rethinking that.

@dean0x7d
Copy link
Member Author

dean0x7d commented Nov 5, 2016

Based on the comments, I think the constructors may be need to be reworked to avoid confusion. This is a bit of a longer post and apologies if I'm bikeshedding here. But I think it may be best to first define the interface for selecting convert/reinterpret & borrow/steal and work on the implementation from there.

Here is the situation right now:

auto l1 = list(ptr, true/false); // does a conversion
auto l2 = list(ptr, true/false, noconvert()); // just assigns the pointer

The first one does a conversion because that was the existing situation with the OBJECT_CVT macro and array_t, so I didn't want to mess with that. For library internals, I think it's important to be able to choose between the converting and non-converting constructor. The alternative is just a straight bool implementation:

list::list(PyObject *ptr, bool borrow, bool convert) {
    if (convert) {
        m_ptr = PySequence_List(ptr);
        if (!borrow) handle(ptr).dec_ref();
        if (!m_ptr) throw error_already_set();
    } else {
        m_ptr = ptr;
        if (borrow) inc_ref();
    }
}

But I feel that having something like

auto l = list(ptr, true, false);

would be very confusing and error-prone. It would also be more work for the optimizer to eliminate dead code. Hence reinterpret_*:

PyObject *ptr = ...;

// non-converting
auto lb = reinterpter_borrow<list>(ptr);
auto ls = reinterpter_steal<list>(ptr);

// converting
auto l1 = list(ptr, true/false); // borrow or steal
auto l2 = cast<list>(ptr); // borrow
auto l3 = handle(ptr).cast<list>(); // borrow

There are multiple ways of doing the converting cast, but only one has a choice of borrow or steal for a raw pointer/handle. That's reasonable since the converting casts are mostly aimed at object which automatically manages references. So let's just stick with with the l1 case.

I messed up by adding the named borrow/steal reinterpret functions because they have a different mechanism of selecting borrow/steal compared to the converting l1 case. This could be streamlined.

Alternative interface 1

One possibility is to just stick with bools and then have the following:

// non-converting
auto l = reinterpter_as<list>(ptr, true/false); // borrow/steal
// converting
auto l = list(ptr, true/false); // borrow/steal

Alternative interface 2

My ideal case would be to encode borrow/steal in the type system:

auto thing = handle(...); // considered a borrowed reference (no ref counting)
// or
auto thing = new_handle(...); // considered a new reference (no ref counting)
// or
auto thing = object(...); // automatically managed

Now we can overload on type for borrow/steal:

// non-converting
auto l = reinterpter_as<list>(thing); // the compiler knows to borrow/steal based on type
// converting
auto l = list(thing); // again, the compiler knows what to do

This shifts the responsibility of selecting the kind of reference from the consumption site to the creation site. Currently, we have:

PyObject *ptr = PySomething_New(...);
// ...
auto o = object(ptr, false); // manually keeping track that this is a new reference

Instead, the new/borrowed selection could be done right at creation:

new_handle h = PySomething_New(...);
// ...
auto o = object(h); // compiler automatically calls the steal overload

I was looking into it and this would actually be a backwards compatible change with just a deprecation of the object(handle, bool) constructor.

Again, I might just be bikeshedding here, so feel free to just replace this with something you feel would be reasonable.

@aldanor
Copy link
Member

aldanor commented Nov 6, 2016

But I feel that having something like

   auto l = list(ptr, true, false);

would be very confusing and error-prone.

I think the point here is that these arguments are most often than not fixed at compile-time (i.e. you most always just pass boolean literals true or false since you know if you're borrowing or stealing, converting or not). As such, reinterpret_* variants basically just encode this notion using separate names, which makes sense.

// Btw if array_t is the only exception that sticks out, we could probably consider changing it for the sake of consistency. I don't think there's anything special about it other than the fact that it's (somewhat arbitrarily) done the way it's done. IIRC OBJECT_CVT is only used for array_t, too.

@jagerman
Copy link
Member

jagerman commented Nov 6, 2016

Is there really a benefit of encoding what is, as far as I can tell, simply an option about how to construct the object in the type? It feels a bit over-engineered, and it's really a pretty small bit of (non-templated!) dead code elimination.

I think the current "bypass" constructor (as the code is now) seems reasonable and relatively simple. It makes the various derived class constructors simpler because they don't have to worry about whether they are being constructed in noconvert mode or not: they just provide a two-argument constructor that is only called in conversion-allowed mode. Code needing a non-converting constructor calls the "bypass" constructor that never hits the overridden constructors.

One small suggestion, though: it might be nicer to do:

struct noconvert_t {};
static constexpr noconvert_t noconvert{};

so that you can call with detail::noconvert instead of detail::noconvert(). (Although maybe there's some reason (or compilers where) that won't work?)

@dean0x7d
Copy link
Member Author

dean0x7d commented Nov 6, 2016

Is there really a benefit of encoding what is, as far as I can tell, simply an option about how to construct the object in the type? It feels a bit over-engineered, and it's really a pretty small bit of (non-templated!) dead code elimination.

My main hesitation is about introducing an intrusive change to an existing codebase. I would definitely not call the approach itself over-engineered -- encoding as much information into the type system is a key strength of C++ and it's very good practice. Compare the following interfaces:

// What happens if we mix up the argument order?
void set_date(int m, int d, int y); // runtime error
void set_date(month_t m, day_t d, year_t y); // compile-time error

As for benefits in this project specifically, I've started tinkering with the implementation and it has already revealed a bug in one of the type casters:

// member of a type_caster
static handle cast(const T& src, return_value_policy policy, handle parent) {
    if (!src)
        return none();
    return caster_type::cast(*src, policy, parent);
}

Anyone see the bug? I definitely missed it. handle usually plays the role of a borrowed reference, except for a few specific cases where it's considered a new reference. Without using the type system, this is just convention which we need to remember to apply consistently. Bugs are allowed to go to runtime.

But if we make this change:

using borrowed_ref = handle;
struct new_ref { PyObject *m_ptr; /*...*/ };

class object : handle { // *implicit* conversion to a borrowed reference
    // ...
    new_ref release() const; // *explicit* conversion to a new reference
};

And now we tell the type system that type_caster::cast must return a new reference:

static new_ref cast(/*...*/) { // <-- returning new_ref instead of handle
    if (!src)
        return none(); // <-- compile-time error: no implicit conversion from object to new_ref
    return caster_type::cast(*src, policy, parent);
}

The bug is prevented at compile time! And it's a simple fix:

static new_ref cast(/*...*/) {
    if (!src)
        return none().release();
    return caster_type::cast(*src, policy, parent);
}

@wjakob
Copy link
Member

wjakob commented Nov 6, 2016

I agree that

auto l = list(ptr, true, false);

seems like an overly confusing and bug-prone notation. However, I am also hesitant to sign off on the new_ref etc. type system redesign proposed by @dean0x7d above. It seems like a rather significant change. If drawn to its full conclusion, it seems to me that we should have additional new/borrowed reference versions of every type wrapper, which is certainly possible but probably not where we want to go.

In the short term, the IMO simplest thing would be to axe the py::array special case and just have a two-argument constructor (potentially private/protected) that is called by the (very nice and clear) reinterpret_borrowed and reinterpret_steal casts. I think it would be very confusing if those did any kind of automatic conversion as well, which means that to perform a conversion in practice, one would need to do

PyObject *v1 = ;
auto v2 = reinterpret_steal<object>(v1);
auto v3 = str(v2);

What do you think?

@dean0x7d
Copy link
Member Author

dean0x7d commented Nov 7, 2016

That sounds very reasonable. Axing that special case should simplify everything else. I'll get on it. Just to make sure I have everything straight:

  • Make reinterpret_borrow/steal the only way to create an object (or derived) from PyObject * (use a private/implementation detail constructor which only reinterpret_borrow/steal can invoke).
  • Deprecate the existing PyObject *, bool constructor for all object types.

Does that sound OK?

(Minor note: The type-based approach would not need an entire parallel hierarchy for new_ref. Technically, object doesn't need to inherit from handle either. But it would definitely be more involved than what's outlined above.)

@dean0x7d dean0x7d mentioned this pull request Nov 12, 2016
@aldanor
Copy link
Member

aldanor commented Nov 12, 2016

@dean0x7d So I was actually planning to do some work on #338 tomorrow, I wonder how it relates to the planned changes to array_t in this PR?

@wjakob
Copy link
Member

wjakob commented Nov 13, 2016

Sorry, I never responded. This sounds ok, yes.

@dean0x7d
Copy link
Member Author

dean0x7d commented Nov 13, 2016

Rebased and completed the last batch of changes (last 3 commits). I think that should be it.

@aldanor You can see the latest changes to array_t which I isolated in the last commit. This makes its converting ctor behave the same as other pytypes, i.e. throws on error with the original Python message. The ensure function is intended for type_caster and it silently discards the error. The special nature of array_t now boils down to 3 things:

  1. It needs its own type_caster which uses ensure (as opposed to a simple check like the other pytypes). It will need to stay this way.
  2. py::isinstance<array_t<T>>() is not supported. This can probably be resolved, but there is no quick solution in the numpy C API, so a something custom would need to be made.
  3. It doesn't have a meaningful default constructor like other concrete pytypes. Right now it's just nullptr.

The last two points can be changed, but that's probably best done as part of the array_t rework -- I don't want to interfere with that any further.

@wjakob
Copy link
Member

wjakob commented Nov 13, 2016

@aldanor: Could you please take a look and report whether the array_t related changes look good to you?

@aldanor
Copy link
Member

aldanor commented Nov 15, 2016

array_t changes looks reasonable I guess, less special cases is always good. It would be great have some additional tests for the new array_t semantics though.

It needs its own type_caster which uses ensure (as opposed to a simple check like the other pytypes). It will need to stay this way.

Agreed.

py::isinstance<array_t<T>>() is not supported. This can probably be resolved, but there is no quick solution in the numpy C API, so a something custom would need to be made.

Do we want it to be supported? Note that there's also the flags aka requirements. I'm not sure there's an existing numpy function that checks exactly what we need, so we can check (1) py::isinstance<array>, (2) check that dtypes are equivalent, (3) check that flags are satisfied. That should be good enough in most cases.

It doesn't have a meaningful default constructor like other concrete pytypes. Right now it's just nullptr.

Should it be an empty array or it doesn't make sense?

@wjakob
Copy link
Member

wjakob commented Nov 15, 2016

(This is conflicted now.)

@dean0x7d dean0x7d force-pushed the str branch 2 times, most recently from 8af677f to 1d416ce Compare November 16, 2016 01:01
@dean0x7d
Copy link
Member Author

Rebased, resolved conflicts and updated array/array_t implementation:

  1. Default constructors now produce empty arrays.
  2. py::isintance<array_t<T>>() now works (tests added). It checks for an array and the correct dtype. The flags are not checked because I'm not completely sure how this should work. Some flags are part of the template, some may or may not be added dynamically (owndata, writable) and some aren't relevant for the check (forcecast). Seeing as the flags will be reworked soon anyway, perhaps it would be best to tackle this then.
  3. Previously, I forgot to add a converting constructor for the plain py::array. Added that and the tests.

With this, the only special thing left is the type_caster for array_t but that must stay. After the deprecated constructors are removed, array_t can also make use of the PYBIND11_OBJECT_CVT macro to bring it completely in line with the other pytypes. Right now, the deprecated array_t(handle, bool) constructor conflicts with one from the macro (which is also deprecated but different). It should all clean up nicely once the deprecated bits are removed.

@wjakob
Copy link
Member

wjakob commented Nov 16, 2016

I was just going to merge this, but then the last one broke it yet again. Could I ask you to fix it one last time? ;) -- I promise to merge it immediately afterwards.

@dean0x7d
Copy link
Member Author

No problem, rebased.

@wjakob
Copy link
Member

wjakob commented Nov 16, 2016

I'm thinking of squashing this (fairly large) sequence of commits into a single one with a combined description -- does that sound okay?

@dean0x7d
Copy link
Member Author

dean0x7d commented Nov 16, 2016

I can go back and selectively squash it down to 3-4 commits (pytype conversion changes, reinterpret_*, array stuff). I think it might a bit too difficult to follow the charges if it's all just a single commit (if someone's looking at the history or if something needs to be reverted).

@wjakob
Copy link
Member

wjakob commented Nov 16, 2016

Ok, great -- then I will wait for that.

Allows checking the Python types before creating an object instead of
after. For example:
```c++
auto l = list(ptr, true);
if (l.check())
   // ...
```
The above is replaced with:
```c++
if (isinstance<list>(ptr)) {
    auto l = reinterpret_borrow(ptr);
    // ...
}
```

This deprecates `py::object::check()`. `py::isinstance()` covers the
same use case, but it can also check for user-defined types:
```c++
class Pet { ... };
py::class_<Pet>(...);

m.def("is_pet", [](py::object obj) {
    return py::isinstance<Pet>(obj); // works as expected
});
```
* Deprecate the `py::object::str()` member function since `py::str(obj)`
  is now equivalent and preferred

* Make `py::repr()` a free function

* Make sure obj.cast<T>() works as expected when T is a Python type

`obj.cast<T>()` should be the same as `T(obj)`, i.e. it should convert
the given object to a different Python type. However, `obj.cast<T>()`
usually calls `type_caster::load()` which only checks the type without
doing any actual conversion. That causes a very unexpected `cast_error`.
This commit makes it so that `obj.cast<T>()` and `T(obj)` are the same
when T is a Python type.

* Simplify pytypes converting constructor implementation

It's not necessary to maintain a full set of converting constructors
and assignment operators + const& and &&. A single converting const&
constructor will work and there is no impact on binary size. On the
other hand, the conversion functions can be significantly simplified.
The pytype converting constructors are convenient and safe for user
code, but for library internals the additional type checks and possible
conversions are sometimes not desired. `reinterpret_borrow<T>()` and
`reinterpret_steal<T>()` serve as the low-level unsafe counterparts
of `cast<T>()`.

This deprecates the `object(handle, bool)` constructor.

Renamed `borrowed` parameter to `is_borrowed` to avoid shadowing
warnings on MSVC.
* `array_t(const object &)` now throws on error
* `array_t::ensure()` is intended for casters —- old constructor is
  deprecated
* `array` and `array_t` get default constructors (empty array)
* `array` gets a converting constructor
* `py::isinstance<array_T<T>>()` checks the type (but not flags)

There is only one special thing which must remain: `array_t` gets
its own `type_caster` specialization which uses `ensure` instead
of a simple check.
@dean0x7d
Copy link
Member Author

OK, I think I squashed it down to the essentials. Let me know it that's OK.

@wjakob
Copy link
Member

wjakob commented Nov 17, 2016

Very nice -- thanks for the very clear commit messages.

@wjakob wjakob merged commit 4de2710 into pybind:master Nov 17, 2016
@dean0x7d dean0x7d deleted the str branch November 17, 2016 11:13
@aldanor
Copy link
Member

aldanor commented Nov 17, 2016

That's great, thanks! I can get on to the flags stuff now based on the new array_t.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants