-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add format_descriptor<>
& npy_format_descriptor<>
PyObject *
specializations.
#4674
Conversation
…ct *>` to/from-python conversions.
I kind of stumbled into this unintentionally, TBH. I don't want to spend too much extra time. I'm still working on what I really need. Do you have concrete suggestions for changes here? (The production code change is actually very small and even fixes a(n inconsequential) bug: |
@EricCousineau-TRI has given a lot of thought to this. #2259 Can you take a look? |
include/pybind11/numpy.h
Outdated
/// Return dtype for the given typenum (one of the NPY_TYPES). | ||
/// https://numpy.org/devdocs/reference/c-api/array.html#c.PyArray_DescrFromType | ||
static dtype from_typenum(int typenum) { | ||
auto *ptr = detail::npy_api::get().PyArray_DescrFromType_(typenum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought dtype already have an explicit int ctor for this. Probably should have made it a static method in retrospect, but it's already there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing that out, changed. The explicit
ctor is fine IMO, I just didn't look around at all.
#1152 is mainly going into the eigen code, it looks like a lot of work. What I have is much more limited and really very simple. I don't want to expand the scope here. |
Trivial addition, but still in search for a meaningful test.
I just pushed 50eaa3a, which is what I really need. I thought getting into the numpy code will be a good way to add a meaningful test, but instead I got far off track. Does someone have an idea for a meaningful minimal test for the My original use case, which I forced through by making a small mess that I'm trying to clean up now: |
Not knowing any better (not being familiar with the buffer interface and numpy code), I decided to simply add a unit test that is 1. exactly focused on what I need, but 2. also comprehensive in that domain: 82ce80f |
npy_format_descriptor<PyObject *>
to enable py::array_t<PyObject *>
to/from-python conversions.format_descriptor<>
& npy_format_descriptor<>
PyObject *
support.
format_descriptor<>
& npy_format_descriptor<>
PyObject *
support.format_descriptor<>
& npy_format_descriptor<>
PyObject *
specializations.
…ters.h While working on pybind/pybind11#4674 I came to realize that this include is not needed here. Note however that the pybind11/cast.h changes under pybind/pybind11#4601 are still needed, therefore pybind11_abseil still requires current pybind11 master. PiperOrigin-RevId: 533182724
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this! A couple of comments for minor possible improvements.
@@ -287,6 +287,8 @@ handle eigen_encapsulate(Type *src) { | |||
template <typename Type> | |||
struct type_caster<Type, enable_if_t<is_eigen_dense_plain<Type>::value>> { | |||
using Scalar = typename Type::Scalar; | |||
static_assert(!std::is_pointer<Scalar>::value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you might be able to move these asserts to EigenProps, which would reduce the amount of redundant code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was ambivalent before and still am a little bit, but decided to keep the 5 static_assert()
:
Cons: 2-3 x 2 more lines of code.
Pros:
The static_assert()
s appear at or near the top of each of the type_caster
s, which makes them essentially a kind of documentation.
In case a static_assert()
fails, it will be much more obvious that the message we want to send is: the type_caster
does not support pointer types as scalar types.
include/pybind11/detail/common.h
Outdated
// Common message for `static_assert()`s, which are useful to easily preempt much less obvious | ||
// errors in code that does not support `format_descriptor<PyObject *>`. | ||
#define PYBIND11_MESSAGE_POINTER_TYPES_ARE_NOT_SUPPORTED \ | ||
"Pointer types (in particular `PyObject *`) are not supported." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make this message Eigen specific
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
I introduced pybind11/eigen/common.h to have central location for the message. (Previously I shied away from doing that, that's why the message wasn't specific. But that was really just taking a shortcut. What we have now this is definitely better.)
@@ -164,6 +164,8 @@ PYBIND11_WARNING_POP | |||
|
|||
template <typename Type> | |||
struct type_caster<Type, typename eigen_tensor_helper<Type>::ValidType> { | |||
static_assert(!std::is_pointer<typename Type::Scalar>::value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to not add the assert to eigen_tensor_helper instead to avoid duplicate lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
tests/test_buffers.cpp
Outdated
#include <pybind11/stl.h> | ||
|
||
#include "constructor_stats.h" | ||
#include "pybind11_tests.h" | ||
|
||
TEST_SUBMODULE(buffers, m) { | ||
|
||
#define PYBIND11_LOCAL_DEF(...) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it's worth changing, but I tend to not like macros that return values.
I would have written this as:
std::map<std::string, std::string> values;
#define ASSIGN_HELPER(...)
values[#__VA_ARGS__] = return py::format_descriptor<__VA_ARGS__>::format();
ASSIGN_HELPER(bool);
ASSIGN_HELPER(PyObject*);
return values[cpp_name];
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
tests/test_buffers.py
Outdated
], | ||
) | ||
def test_format_descriptor_format(cpp_name, expected_codes): | ||
assert m.format_descriptor_format(cpp_name) in expected_codes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add an assert that the format descriptor is a valid numpy format descriptor
https://numpy.org/doc/stable/reference/generated/numpy.format_parser.html#numpy-format-parser
assert np. format_parser(blah).dtype is not None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done-ish. See my comments in the test code.
What a lucky man I was to never have seen #1908 before.
I see Windows isn't happy about my latest version of the test. Fixing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add an assert that the format descriptor is a valid numpy format descriptor
@lalaland Thanks a lot for nudging me in that direction!
Using np. format_parser()
didn't work out, but via d432ce7 I got on the right track, and after that taught me what I needed to look for, I quickly found it here:
pybind11/include/pybind11/buffer_info.h
Lines 173 to 190 in d72ffb4
template <typename T, typename SFINAE = void> | |
struct compare_buffer_info { | |
static bool compare(const buffer_info &b) { | |
return b.format == format_descriptor<T>::format() && b.itemsize == (ssize_t) sizeof(T); | |
} | |
}; | |
template <typename T> | |
struct compare_buffer_info<T, detail::enable_if_t<std::is_integral<T>::value>> { | |
static bool compare(const buffer_info &b) { | |
return (size_t) b.itemsize == sizeof(T) | |
&& (b.format == format_descriptor<T>::value | |
|| ((sizeof(T) == sizeof(long)) | |
&& b.format == (std::is_unsigned<T>::value ? "L" : "l")) | |
|| ((sizeof(T) == sizeof(size_t)) | |
&& b.format == (std::is_unsigned<T>::value ? "N" : "n"))); | |
} | |
}; |
From there it was only a small step to add a public (not in namespace detail
) interface for the existing functionality (029b157), and then your validation idea basically fell into place.
The new public interface is also exactly what I need for a clean solution here:
PyObject **data = arr_from_list.mutable_data(); | ||
for (py::size_t i = 0; i < arr_size; i++) { | ||
assert(data[i] == nullptr); | ||
data[i] = py::cast<PyObject *>(objs[i].attr("value")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a silly question, but does this appropriately increase the reference count?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a silly question, this was something I was struggling with quite a bit:
pybind11/include/pybind11/cast.h
Lines 1062 to 1078 in d72ffb4
// Note that `cast<PyObject *>(obj)` increments the reference count of `obj`. | |
// This is necessary for the case that `obj` is a temporary, and could | |
// not possibly be different, given | |
// 1. the established convention that the passed `handle` is borrowed, and | |
// 2. we don't want to force all generic code using `cast<T>()` to special-case | |
// handling of `T` = `PyObject *` (to increment the reference count there). | |
// It is the responsibility of the caller to ensure that the reference count | |
// is decremented. | |
template <typename T, | |
typename Handle, | |
detail::enable_if_t<detail::is_same_ignoring_cvref<T, PyObject *>::value | |
&& detail::is_same_ignoring_cvref<Handle, handle>::value, | |
int> | |
= 0> | |
T cast(Handle &&handle) { | |
return handle.inc_ref().ptr(); | |
} |
…, in a future-proof way.
…cting than useful.
…format_descriptor<T>::format()` strings.
…::compare` more visible & accessible.
This reverts commit e9a289c.
@lalaland @Skylion007 this is done done now I think, could you please take another look? I already ran this with all clang sanitizers in the Google toolchain and also initiated Google-global testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor comment about the compare function, which I think is poorly named, but otherwise, looks good!
I like how we now have a full unit test to verify that the C++ types and numpy types are aligned accordingly
include/pybind11/buffer_info.h
Outdated
@@ -150,6 +153,11 @@ struct buffer_info { | |||
Py_buffer *view() const { return m_view; } | |||
Py_buffer *&view() { return m_view; } | |||
|
|||
template <typename T> | |||
static bool compare(const buffer_info &b) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be named has_same_format or something like that? compare sounds like a general == check, when this is only checking the format of the buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I totally agree, but what is a good name?
Here is my line of though that got me to the new name and source code comment:
"format" has so so many meanings already, to the uninitiated it likely means nothing. Additionally, the Py_buffer
format
strings may actually not be the same, because of the i/q/l ambiguity I was struggling with here.
Trying to start from the top down, what would a good source code comment be?
// Determines if the buffer item type is `T`.
Asking myself: Is that 100% accurate? What is "is" when it comes to aliases, such as long double
being effectively an alias for double
on some platforms? Or long
effectively being an alias for int
on some platforms, and long long
an alias for long
on others? (I think I actually stumbled over all of these ambiguities at various points in time far back.)
// Determines if the buffer item type is equivalent to `T`.
Asking myself for completeness: Do we have to worry about endianness? Searching for "endia", "most", "least", "signi" under https://docs.python.org/3/c-api/buffer.html doesn't pull up anything, which I take it means that we don't have to worry. It appears to be assumed that endianness conventions are never mixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the precise semantics here are the same as https://en.cppreference.com/w/cpp/types/is_same
// Determines if the butter item type is the same as T
(see std::is_same)
Asking myself for completeness: Do we have to worry about endianness? Searching for "endia", "most", "least", "signi" under https://docs.python.org/3/c-api/buffer.html doesn't pull up anything, which I take it means that we don't have to worry. It appears to be assumed that endianness conventions are never mixed.
Good point. Yeah, your code is actually technically incorrect as you aren't handling endianness. But it will only provide false negatives. The format strings do specify endianness. This is specified in https://docs.python.org/3/library/struct.html#byte-order-size-and-alignment
Note that if there is no endian specifier, the default is "native", so your code won't return any false positives.
Your current code will return false negatives. "@i" and "=i" for example are equivalent to "int", but your code will return false. "<i" will be equivalent to int on little-endian platforms. ">i" and "!i" on big-endian platforms. Etc, etc.
Not sure if this is worth handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, why is this a static method and not a member function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the precise semantics here are the same as https://en.cppreference.com/w/cpp/types/is_same
Not sure: last night I saw that std::is_same<double, long double>::value
with MSVC is false
, which is why I had to change my code to sizeof(long double) == sizeof(double)
(b09e75b), because that's what we really want to know.
Not sure if this is worth handling.
I'm pretty sure no: https://docs.python.org/3/c-api/buffer.html isn't explicit about it (unfortunately), but it's strongly implied that endianness is assumed not to be mixed. Therefore I think going into that in the context of buffer_info
will not help anything, on the contrary.
The situation is very different for serializing/deserializing arrays, then it has to be assumed endianness could be different between the producer and consumer, which could run on totally different platforms.
But the buffer interface is all about in-memory access. Hopefully we won't have one process (one address space) that mixes endianness. (It sounds like an idea from hell.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, why is this a static method and not a member function?
Then you can take a simple function pointer, vs a member function pointer. I think that's easier to work with in generic code.
Other than that, it doesn't really matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than that, it doesn't really matter?
It's rather uncommon to see a static member function that could be a member function since member functions have a better interface for general use. I actually can't think of a single example in the standard library where a static member function is used like this.
It's more of a style thing though, not major.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll try to change it to a member function asap. (I figure it's now or never, i.e. important to get right now.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: ef34d29
I agree the buffer_info
API looks better that way. The only trouble was/is, the syntax for dereferencing member function pointers badly rubs me the wrong way. (buffer.request().*((*equiv_table)[cpp_name]))()
, what a beauty!
@@ -595,3 +595,74 @@ def test_round_trip_float(): | |||
arr = np.zeros((), np.float64) | |||
arr[()] = 37.2 | |||
assert m.round_trip_float(arr) == 37.2 | |||
|
|||
|
|||
# HINT: An easy and robust way (although only manual unfortunately) to check for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be possible to do this automatically using weakrefs: https://docs.python.org/3/library/weakref.html#weakref.ref
I'm not sure it's worth changing for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be possible to do this automatically using weakrefs:
Yes, but: That only works when making strong assumptions about the reference-counting and garbage collection behavior. In practice, when targeting C Python only, that's usually fine. Fundamentally though such tests are creating tech debt that might get in the way of core interpreter developments in the future. Therefore I generally write tests involving weakref
and sys.getrefcount()
only as a last resort.
What would be great to have: a pytest feature, e.g. @pytest.mark.empirical_leak_check()
, with options like --empirical_leak_check=quick
and --empirical_leak_check=thorough
, that run each marked test in a loop for a few seconds and monitor RES
programmatically, with heuristics to decide how long to try (to deal with noise) and when to flag a potential leak or not. (It will probably be a bit of black magic to get to a useful stable implementation.)
…e_is_equivalent_to`. Add comment defining "equivalent" by example.
Just for the record: global testing passed as expected (internal test id OCL:533474823:BASE:533577337:1684538450117:2fa7449). I still need to work on the static-to-member-function change, but retesting globally will not be needed. |
…ber function, as suggested by @lalaland
The new member function was added with pybind/pybind11#4674 PiperOrigin-RevId: 534952040
Description
Fixes a curious omission: the
O
format anddtype.object
have been part of numpy ~forever (and therefore also of the buffer protocol, for all practical purposes).The
npy_format_descriptor<PyObject *>
specialization enablespy::array_t<PyObject *>
to/from-python conversions (see added tests).As a side-effect of adding the
npy_format_descriptor<PyObject *>
specialization, the runtime behavior of thetype_caster
s for Eigen types changes (see below). To guard against accidents,static_assert()
s are systematically added to all thosetype_caster
s.Also add
buffer_info::item_type_is_equivalent_to<T>()
to make the existingdetail::compare_buffer_info<T>::compare()
more visible & accessible.Note that the (small) change in pybind11/numpy.h fixes a long-standing (and inconsequential) bug:
pybind11_fail("Unsupported buffer format!")
called whilePyErr_Occurred()
is true.BTW: This PR turned into a much bigger project than anticipated, but the functional production code changes are actually really small. To see that, focus only on:
Everything else is just tests, safety guards, and boilerplate noise.
Example of the runtime behavior of the
type_caster
s for Eigen types before this PR:After this PR, the code used to produce that output does not compile anymore (
static_assert()
failure). Without the guards, there would be a buffer overrun (likely to trigger a segfault).Suggested changelog entry: