Add default and converting constructors for all concrete Python types #464
Hmm, I'm trying to figure, won't |
No, it won't. The third and fourth parameters of the |
Cool, that seems like a nice idea. However, a few thoughts:
Best, |
OK, so all concrete Python types now have converting and default constructors which match their Python counterparts. The pytype classes which model protocols (iterator, sequence) did not get any conversions or meaningful default constructors (which should be fine). The only odd class is There were a few things that needed to be adjusted because of the new converting constructors. See the commit messages for details. Note for the naming of |
This is conflicted now as well.
Rebased. |
Sorry, I've broken your branch again :) |
No worries :) Rebased and fixed. |
This looks good all in all (although it will surely trigger some deprecation warnings on existing code, but that's ok, better now than later). Is there anything left planned to be done, or should it get merged?
I don't have anything else planned here. I suppose it may be nice to make the new converting constructors |
Hi,
I did a pass over the PR and added a few minor comments.
Best,
Wenzel
if (c.check()) {
value = (void *) c;
if (isinstance<capsule>(h)) {
value = capsule(h, true);
Why not `reinterpret_borrow` here?
return_value_policy policy = return_value_policy::automatic_reference,
handle parent = handle()) {
template <typename T, detail::enable_if_t<detail::is_pyobject<T>::value, int> = 0>
T cast(const handle &handle) { return {handle, true}; }
No `reinterpret_borrow`?
@@ -1224,7 +1241,7 @@ class unpacking_collector {
int _[] = { 0, (process(args_list, std::forward<Ts>(values)), 0)... };
ignore_unused(_);

m_args = object(PyList_AsTuple(args_list.ptr()), false);
m_args = std::move(args_list);
This is an optimization, I assume?
The way the conversion was done in the original `PYBIND11_OBJECT_CVT` macro, the move constructor was more efficient, so the `std::move` here resulted in smaller binary size. With the reworked `CVT` macro, the lvalue ref constructor is equally good, so the move isn't strictly needed any more, but it doesn't hurt either (`args_list`'s lifetime ends here anyway).
@@ -98,7 +98,6 @@
#define PYBIND11_BYTES_FROM_STRING_AND_SIZE PyBytes_FromStringAndSize
#define PYBIND11_BYTES_AS_STRING_AND_SIZE PyBytes_AsStringAndSize
#define PYBIND11_BYTES_AS_STRING PyBytes_AsString
#define PYBIND11_BYTES_CHECK PyBytes_Check
I'm surprised these duplicate definitions didn't cause warnings before.
/* Copy flags from base (except baseship bit) */
flags = base_array.flags() & ~detail::npy_api::NPY_ARRAY_OWNDATA_;
flags = array(base, true).flags() & ~detail::npy_api::NPY_ARRAY_OWNDATA_;
No `reinterpret_borrow`?
// Derived classes may override this constructor for Python type conversions
object(handle h, bool borrowed) : handle(h) { if (borrowed) inc_ref(); }
// This one must be inherited as is -- it's always just a pure pointer assignment
object(handle h, bool borrowed, detail::noconvert) : object(h, borrowed) { }
I don't understand why this special `detail::noconvert` constructor is needed. Isn't having a `true/false` argument enough to uniquely specify what is meant? (i.e. just borrow/steal a reference, but don't run any conversions).
Or is there some version of this constructor which additionally does something to the object? In that case, it may be worth rethinking that.
Based on the comments, I think the constructors may need to be reworked to avoid confusion. This is a bit of a longer post and apologies if I'm bikeshedding here. But I think it may be best to first define the interface for selecting convert/reinterpret & borrow/steal and work on the implementation from there. Here is the situation right now:

```c++
auto l1 = list(ptr, true/false); // does a conversion
auto l2 = list(ptr, true/false, noconvert()); // just assigns the pointer
```

The first one does a conversion because that was the existing situation with the

```c++
list::list(PyObject *ptr, bool borrow, bool convert) {
    if (convert) {
        m_ptr = PySequence_List(ptr);
        if (!borrow) handle(ptr).dec_ref();
        if (!m_ptr) throw error_already_set();
    } else {
        m_ptr = ptr;
        if (borrow) inc_ref();
    }
}
```

But I feel that having something like `auto l = list(ptr, true, false);` would be very confusing and error-prone. It would also be more work for the optimizer to eliminate dead code. Hence

```c++
PyObject *ptr = ...;
// non-converting
auto lb = reinterpret_borrow<list>(ptr);
auto ls = reinterpret_steal<list>(ptr);
// converting
auto l1 = list(ptr, true/false); // borrow or steal
auto l2 = cast<list>(ptr); // borrow
auto l3 = handle(ptr).cast<list>(); // borrow
```

There are multiple ways of doing the converting cast, but only one has a choice of borrow or steal for a raw pointer/handle. That's reasonable since the converting casts are mostly aimed at

I messed up by adding the named borrow/steal reinterpret functions because they have a different mechanism of selecting borrow/steal compared to the converting

Alternative interface 1

One possibility is to just stick with bools and then have the following:

```c++
// non-converting
auto l = reinterpret_as<list>(ptr, true/false); // borrow/steal
// converting
auto l = list(ptr, true/false); // borrow/steal
```

Alternative interface 2

My ideal case would be to encode borrow/steal in the type system:

```c++
auto thing = handle(...); // considered a borrowed reference (no ref counting)
// or
auto thing = new_handle(...); // considered a new reference (no ref counting)
// or
auto thing = object(...); // automatically managed
```

Now we can overload on type for borrow/steal:

```c++
// non-converting
auto l = reinterpret_as<list>(thing); // the compiler knows to borrow/steal based on type
// converting
auto l = list(thing); // again, the compiler knows what to do
```

This shifts the responsibility of selecting the kind of reference from the consumption site to the creation site. Currently, we have:

```c++
PyObject *ptr = PySomething_New(...);
// ...
auto o = object(ptr, false); // manually keeping track that this is a new reference
```

Instead, the new/borrowed selection could be done right at creation:

```c++
new_handle h = PySomething_New(...);
// ...
auto o = object(h); // compiler automatically calls the steal overload
```

I was looking into it and this would actually be a backwards compatible change with just a deprecation of the

Again, I might just be bikeshedding here, so feel free to just replace this with something you feel would be reasonable.
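The type-based borrow/steal idea can be tried out without Python at all. The following is only a sketch under stated assumptions: `FakePyObject`, its `refcount` field, and the two helper functions are hypothetical stand-ins, and `handle`, `new_handle`, and `object` here are toy types that borrow the names from the proposal, not pybind11's classes. It shows the constructor overload picking borrow or steal from the argument type alone.

```cpp
// Hypothetical stand-in for a Python object with a manual reference count.
struct FakePyObject { int refcount = 1; };

// Borrowed reference: a non-owning view that never touches the count.
struct handle {
    FakePyObject *ptr;
    explicit handle(FakePyObject *p) : ptr(p) {}
};

// New reference: whoever holds it owes exactly one decrement.
struct new_handle {
    FakePyObject *ptr;
    explicit new_handle(FakePyObject *p) : ptr(p) {}
};

// Owning wrapper: the overload that is selected decides borrow vs. steal.
class object {
public:
    explicit object(handle h) : ptr_(h.ptr) { ++ptr_->refcount; }  // borrow: add a reference
    explicit object(new_handle h) : ptr_(h.ptr) {}                 // steal: adopt the reference
    object(const object &) = delete;
    object &operator=(const object &) = delete;
    ~object() { --ptr_->refcount; }                                // always release one reference
private:
    FakePyObject *ptr_;
};

// Reference count observed while a borrowing wrapper is alive.
int count_while_borrowed(FakePyObject &raw) {
    object o{handle(&raw)};
    return raw.refcount;
}

// Reference count observed while a stealing wrapper is alive.
int count_while_stolen(FakePyObject &raw) {
    object o{new_handle(&raw)};
    return raw.refcount;
}
```

In this toy model, borrowing leaves the caller's reference intact after the wrapper is destroyed, while stealing consumes it. No boolean flag appears at the call site.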
I think the point here is that these arguments are more often than not fixed at compile-time (i.e. you almost always just pass boolean literals // Btw if |
Is there really a benefit of encoding what is, as far as I can tell, simply an option about how to construct the object in the type? It feels a bit over-engineered, and it's really a pretty small bit of (non-templated!) dead code elimination.

I think the current "bypass" constructor (as the code is now) seems reasonable and relatively simple. It makes the various derived class constructors simpler because they don't have to worry about whether they are being constructed in noconvert mode or not: they just provide a two-argument constructor that is only called in conversion-allowed mode. Code needing a non-converting constructor calls the "bypass" constructor that never hits the overridden constructors.

One small suggestion, though: it might be nicer to do:

```c++
struct noconvert_t {};
static constexpr noconvert_t noconvert{};
```

so that you can call with |
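A tag constant like this follows the same pattern as standard library tags such as `std::nullopt`: the call site names a ready-made constant rather than constructing a temporary tag. Below is a minimal sketch of a "bypass" constructor selected by such a tag. The `label` class and its members are hypothetical and only illustrate the overload mechanics, not pybind11's constructors.

```cpp
#include <string>
#include <utility>

// Empty tag type plus a constant instance, as in the suggestion above.
struct noconvert_t {};
static constexpr noconvert_t noconvert{};

// Hypothetical wrapper with a converting path and a tag-selected bypass path.
class label {
public:
    // Converting constructor: turns the input into the stored representation.
    explicit label(int raw) : text_(std::to_string(raw)), converted_(true) {}

    // Bypass constructor: stores the value as given, with no conversion step.
    label(std::string raw, noconvert_t) : text_(std::move(raw)), converted_(false) {}

    const std::string &text() const { return text_; }
    bool converted() const { return converted_; }

private:
    std::string text_;
    bool converted_;
};
```

With this shape, `label(42)` takes the converting path and `label(std::string("raw"), noconvert)` takes the bypass. The two paths are separate overloads, so neither needs a runtime branch on a flag.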
My main hesitation is about introducing an intrusive change to an existing codebase. I would definitely not call the approach itself over-engineered -- encoding as much information into the type system is a key strength of C++ and it's very good practice. Compare the following interfaces:

```c++
// What happens if we mix up the argument order?
void set_date(int m, int d, int y); // runtime error
void set_date(month_t m, day_t d, year_t y); // compile-time error
```

As for benefits in this project specifically, I've started tinkering with the implementation and it has already revealed a bug in one of the type casters:

```c++
// member of a type_caster
static handle cast(const T& src, return_value_policy policy, handle parent) {
    if (!src)
        return none();
    return caster_type::cast(*src, policy, parent);
}
```

Anyone see the bug? I definitely missed it. But if we make this change:

```c++
using borrowed_ref = handle;
struct new_ref { PyObject *m_ptr; /*...*/ };
class object : public handle { // *implicit* conversion to a borrowed reference
    // ...
    new_ref release() const; // *explicit* conversion to a new reference
};
```

And now we tell the type system that

```c++
static new_ref cast(/*...*/) { // <-- returning new_ref instead of handle
    if (!src)
        return none(); // <-- compile-time error: no implicit conversion from object to new_ref
    return caster_type::cast(*src, policy, parent);
}
```

The bug is prevented at compile time! And it's a simple fix:

```c++
static new_ref cast(/*...*/) {
    if (!src)
        return none().release();
    return caster_type::cast(*src, policy, parent);
}
```
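The compile-time claim can be checked in isolation with `std::is_convertible`. This is a sketch under assumptions: `FakePyObject` and `make_owned` are hypothetical, and `handle`, `new_ref`, and `object` are toy types modeled on the snippet above rather than pybind11's classes (in particular, `release()` is non-const here because it clears the owner).

```cpp
#include <type_traits>

// Hypothetical stand-in for a Python object with a manual reference count.
struct FakePyObject { int refcount = 1; };

// Borrowed reference: non-owning.
struct handle {
    FakePyObject *ptr;
    explicit handle(FakePyObject *p = nullptr) : ptr(p) {}
};
using borrowed_ref = handle;

// New reference: can only be produced explicitly.
struct new_ref {
    FakePyObject *ptr;
    explicit new_ref(FakePyObject *p) : ptr(p) {}
};

// Owning wrapper: drops its reference on destruction unless released.
class object : public handle {
public:
    explicit object(FakePyObject *p) : handle(p) {}
    object(const object &) = delete;
    object &operator=(const object &) = delete;
    ~object() { if (ptr) --ptr->refcount; }

    // Explicit hand-over of ownership; the wrapper no longer decrements.
    new_ref release() { new_ref r(ptr); ptr = nullptr; return r; }
};

// Implicit conversion to a borrowed reference is allowed (derived-to-base)...
static_assert(std::is_convertible<object, borrowed_ref>::value,
              "object converts implicitly to a borrowed reference");
// ...but there is no implicit path from an owning object to a new reference.
static_assert(!std::is_convertible<object, new_ref>::value,
              "object must be released explicitly to yield a new reference");

// Returns a new reference; `return tmp;` would fail to compile here.
new_ref make_owned(FakePyObject &raw) {
    object tmp(&raw);      // owns the single existing reference
    return tmp.release();  // ownership leaves tmp, so its destructor does nothing
}
```

In this model the reference count is still 1 after `make_owned` returns. Returning a plain `handle` from a temporary `object` would instead have dropped the only reference.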
I agree that
seems like an overly confusing and bug-prone notation. However, I am also hesitant to sign off on the In the short term, the IMO simplest thing would be to axe the
What do you think? |
That sounds very reasonable. Axing that special case should simplify everything else. I'll get on it. Just to make sure I have everything straight:
Does that sound OK? (Minor note: The type-based approach would not need an entire parallel hierarchy for |
Sorry, I never responded. This sounds ok, yes. |
Rebased and completed the last batch of changes (last 3 commits). I think that should be it. @aldanor You can see the latest changes to
The last two points can be changed, but that's probably best done as part of the |
@aldanor: Could you please take a look and report whether the |
Agreed.
Do we want it to be supported? Note that there's also the flags aka requirements. I'm not sure there's an existing numpy function that checks exactly what we need, so we can check (1)
Should it be an empty array, or does that not make sense?
(This is conflicted now.) |
8af677f to 1d416ce
Rebased, resolved conflicts and updated
With this, the only special thing left is the |
I was just going to merge this, but then the last one broke it yet again. Could I ask you to fix it one last time? ;) -- I promise to merge it immediately afterwards. |
No problem, rebased. |
I'm thinking of squashing this (fairly large) sequence of commits into a single one with a combined description -- does that sound okay? |
I can go back and selectively squash it down to 3-4 commits (pytype conversion changes, reinterpret_*, array stuff). I think it might be a bit too difficult to follow the changes if it's all just a single commit (if someone's looking at the history or if something needs to be reverted).
Ok, great -- then I will wait for that. |
Allows checking the Python types before creating an object instead of after. For example:

```c++
auto l = list(ptr, true);
if (l.check())
    // ...
```

The above is replaced with:

```c++
if (isinstance<list>(ptr)) {
    auto l = reinterpret_borrow<list>(ptr);
    // ...
}
```

This deprecates `py::object::check()`. `py::isinstance()` covers the same use case, but it can also check for user-defined types:

```c++
class Pet { ... };
py::class_<Pet>(...);

m.def("is_pet", [](py::object obj) {
    return py::isinstance<Pet>(obj); // works as expected
});
```
* Deprecate the `py::object::str()` member function since `py::str(obj)` is now equivalent and preferred

* Make `py::repr()` a free function

* Make sure `obj.cast<T>()` works as expected when T is a Python type

  `obj.cast<T>()` should be the same as `T(obj)`, i.e. it should convert the given object to a different Python type. However, `obj.cast<T>()` usually calls `type_caster::load()` which only checks the type without doing any actual conversion. That causes a very unexpected `cast_error`. This commit makes it so that `obj.cast<T>()` and `T(obj)` are the same when T is a Python type.

* Simplify pytypes converting constructor implementation

  It's not necessary to maintain a full set of converting constructors and assignment operators + const& and &&. A single converting const& constructor will work and there is no impact on binary size. On the other hand, the conversion functions can be significantly simplified.
The pytype converting constructors are convenient and safe for user code, but for library internals the additional type checks and possible conversions are sometimes not desired. `reinterpret_borrow<T>()` and `reinterpret_steal<T>()` serve as the low-level unsafe counterparts of `cast<T>()`. This deprecates the `object(handle, bool)` constructor. Renamed `borrowed` parameter to `is_borrowed` to avoid shadowing warnings on MSVC.
* `array_t(const object &)` now throws on error
* `array_t::ensure()` is intended for casters; the old constructor is deprecated
* `array` and `array_t` get default constructors (empty array)
* `array` gets a converting constructor
* `py::isinstance<array_t<T>>()` checks the type (but not flags)

There is only one special thing which must remain: `array_t` gets its own `type_caster` specialization which uses `ensure` instead of a simple check.
OK, I think I squashed it down to the essentials. Let me know if that's OK. |
Very nice -- thanks for the very clear commit messages. |
That's great, thanks! I can get on to the flags stuff now based on the new |
There is currently a functionality split between `py::str()` and `py::object::str()` which can be surprising:

This PR makes case `s3` do the expected conversion (if needed). Case `s4` is deprecated.

Update

This was extended to all concrete Python types represented in pybind11. New `reinterpret_*` functions were also added to streamline taking raw `PyObject *`. For example:

Deprecations

* `obj.str()` is deprecated in favor of `str(obj)`.
* `T obj = ...; obj.check()` is deprecated in favor of `isinstance<T>(obj)`.
* `T(handle, bool)` constructor is deprecated in favor of `reinterpret_borrow<T>()` and `reinterpret_steal<T>()`.
* `obj.repr()` is removed in favor of `repr(obj)`. There is no deprecation warning since this hasn't actually made it to a stable version yet.

Implementation

Making this work required changing the usual type check order, e.g.:

The above check happens after a `py::object` type is created. However, this would not work correctly with a converting constructor, since `py::str` would be able to make a string representation of any object, which would essentially make its `.check()` always true. This would also make using `py::str` as a function parameter very difficult since it would shadow every other overload.

To overcome this, type creation and checking are reversed with the following syntax:

The `py::isinstance<T>(obj)` syntax was chosen because it can work uniformly for types derived from `py::object` as well as user types wrapped with `py::class_`.

Note that `py::isinstance` takes a shortcut and calls the usual `Py*_Check()` functions for `py::object` classes, while user classes go the long way with `detail::get_type_info()` and `PyObject_IsInstance()`.
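The two-path dispatch described above can be sketched with `std::enable_if`, independent of Python. Everything in this example is a hypothetical model: `value`, `object_base`, `list_like`, `check_`, and `registered_name` are invented for illustration and do not mirror pybind11's internals. Only the shape of the dispatch (fast static check for wrapper types, generic lookup for user types) follows the description.

```cpp
#include <string>
#include <type_traits>

// Hypothetical dynamic value that carries its runtime type name.
struct value { std::string type_name; };

// Marker base for wrapper types that provide their own fast check.
struct object_base {};

// Wrapper type with a cheap, type-specific check (the "shortcut").
struct list_like : object_base {
    static bool check_(const value &v) { return v.type_name == "list"; }
};

// User type with no built-in check; it is found through a registry trait.
struct Pet {};

template <typename T> struct registered_name;  // hypothetical registry
template <> struct registered_name<Pet> {
    static const char *get() { return "Pet"; }
};

// Shortcut: wrapper types answer through their own static check.
template <typename T,
          typename std::enable_if<std::is_base_of<object_base, T>::value, int>::type = 0>
bool isinstance(const value &v) {
    return T::check_(v);
}

// Long way: user types are compared against the registered type information.
template <typename T,
          typename std::enable_if<!std::is_base_of<object_base, T>::value, int>::type = 0>
bool isinstance(const value &v) {
    return v.type_name == registered_name<T>::get();
}
```

Because the check is a free function template, it runs before any wrapper is constructed, which is the ordering a converting constructor needs.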