Structured arrays support #1
Conversation
Fixed segfault with nested descriptors: apparently you need to incref them before passing them on to the numpy API; the tests actually pass fine now.
(a8a3c53 to 84a5056)
auto buf_info = info;
if (!buf_info.ptr)
    // always allocate at least 1 element, same way as NumPy does it
    buf_info.ptr = std::calloc(std::max(info.size, 1ul), info.itemsize);
Do you see a way to allow allocating uninitialized memory, which is what the previous implementation did? I would say that in almost all cases, the user will allocate a buffer and then fill it with evaluations from some kind of numerical routine, so the initialization is just wasted.
Also, should this be (size_t) 1 instead of 1ul in the std::max call?
Re: 1ul, yeah, I guess I was just lazy :)
Re: malloc vs calloc, here's the only piece of code from numpy that refers to NPY_NEEDS_INIT:
if (data == NULL) {
    /*
     * Allocate something even for zero-space arrays
     * e.g. shape=(0,) -- otherwise buffer exposure
     * (a.data) doesn't work as it should.
     */
    if (nbytes == 0) {
        nbytes = descr->elsize;
    }
    /*
     * It is bad to have uninitialized OBJECT pointers
     * which could also be sub-fields of a VOID array
     */
    if (zeroed || PyDataType_FLAGCHK(descr, NPY_NEEDS_INIT)) {
        data = npy_alloc_cache_zero(nbytes);
    }
    else {
        data = npy_alloc_cache(nbytes);
    }
    if (data == NULL) {
        PyErr_NoMemory();
        goto fail;
    }
    fa->flags |= NPY_ARRAY_OWNDATA;
}
Realistically, if we never handle object dtypes, we can just use malloc instead of calloc, I guess.
>>> a = np.array(['foo', 'bar'], dtype=object)
>>> memoryview(a).format
'O'
Or maybe we can check if there's any 'O' in the format string and, if so, switch to calloc.
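A minimal sketch of that idea (the helper name and signature are made up for illustration, not the actual pybind11 code): zero-initialize only when the format string contains object pointers, otherwise hand back uninitialized memory.

#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: allocate a buffer for `size` elements of `itemsize`
// bytes each, zeroing it only if the format string mentions 'O' (objects).
void *allocate_buffer(std::size_t size, std::size_t itemsize, const std::string &format) {
    // always allocate at least one element, the same way NumPy does it
    std::size_t n = std::max(size, (std::size_t) 1);
    bool has_objects = format.find('O') != std::string::npos;
    return has_objects ? std::calloc(n, itemsize)    // object pointers must start as NULL
                       : std::malloc(n * itemsize);  // plain data can stay uninitialized
}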
I think it's fine to ignore object types, which are somewhat esoteric in any case.
@wjakob Hey, I think I've fixed all the issues with padding and such, and rebased onto the most recent master (this took me a good while since there were name clashes; I had to rename example18.* to example20.*, do some git filter-branch dark magic, etc.). Now I'm just following up on the rest of your comments here and it will then be done :) A few things:
It sure would be nice to see signatures like def f(arr : numpy.ndarray[dtype=Foo]) -> numpy.ndarray[dtype=Bar] instead of it just being
P.S. By the way, there are a few exception leaks in examples 5 and 18 in master; I've posted the issue to the main repo.
struct NestedStruct {
    SimpleStruct a;
    PackedStruct b;
} __attribute__((packed));
Don't think MSVC will like this attribute. Do you want to turn the PR into an official one at the pybind11 repository? That way, the CI build bots will try to compile your code on various platforms, which may uncover a few more things.
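For what it's worth, here is one possible sketch of how the packing could be made portable; the PACKED macro name and the placeholder struct bodies are illustrative, not what the branch currently does:

#include <cstdint>

// MSVC rejects __attribute__((packed)) but honours #pragma pack, so the
// attribute can be hidden behind a small macro (name chosen arbitrarily here).
#if defined(_MSC_VER)
#  define PACKED(cls) __pragma(pack(push, 1)) cls __pragma(pack(pop))
#else
#  define PACKED(cls) cls __attribute__((packed))
#endif

// Placeholder definitions standing in for the ones in example20.cpp.
struct SimpleStruct { bool x; std::uint32_t y; float z; };
PACKED(struct PackedStruct { bool x; std::uint32_t y; float z; });

PACKED(struct NestedStruct {
    SimpleStruct a;
    PackedStruct b;
});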
Oh, good catch, I'll fix that tonight and open a PR to pybind11 :)
// please see my other few comments above
I am greatly in favor of introducing a py::dtype object that can be used dynamically (in whatever way makes sense -- don't interpret my previous suggestion for an API too strictly).
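For instance, a rough sketch of what dynamic usage could look like (purely illustrative, not a committed API; all of the calls below are assumptions about what such a wrapper could offer):

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

namespace py = pybind11;

// Hypothetical usage: build dtypes dynamically and query them like any other
// Python object.
void describe(const py::buffer_info &info) {
    py::dtype from_info(info);   // dtype constructed from buffer metadata
    py::dtype from_str("i4");    // dtype constructed from a format string
    py::print(from_info.itemsize(), from_str.itemsize());
}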
@wjakob I believe I've addressed all of the comments here (last 6 commits), except moving dtype-related logic inside of a
I've also recalled that I forgot to add descriptor bindings for one important class of types: string fields. It's fairly straightforward actually, e.g. so that

template <size_t N> struct npy_format_descriptor<char[N]> {
    static PYBIND11_DESCR name() { return _("S") + _<N>(); }
    // more stuff
};

This compiles/links fine, however if you try to run it with an example, you get

Traceback (most recent call last):
  File "../example/example20.py", line 6, in <module>
    from example import (
ImportError: dlopen(../example/example.so, 2): Symbol not found: __ZN8pybind116detail10int_to_strILm0EJLm3EEE6digitsE
  Referenced from: ../example/example.so
  Expected in: flat namespace
 in ../example/example.so

So I can't use
Weird... sounds like a potential compiler issue. The whole point of
@wjakob Hmm, here's a very short example (this is clang on OSX, c++14). Should this work / compile?

#include <pybind11/pybind11.h>
#include <iostream>  // for std::cout

namespace pybind11 {
namespace detail {
template<size_t N> PYBIND11_DESCR foo() { return _("S") + _<N>(); }
}
}

int main() {
    std::cout << pybind11::detail::foo<3>().text() << std::endl;
}

Undefined symbols for architecture x86_64:
  "pybind11::detail::int_to_str<0ul, 3ul>::digits", referenced from:
      auto pybind11::detail::_<3ul>() in pb11-_-2001d1.o
ld: symbol(s) not found for architecture x86_64

(This is built without
Interestingly, it seems to work on gcc 5 on Linux (c++14), just checked.
Fixed (4a87933). The issue is that a reference to a constexpr value was taken, which requires it to actually exist at runtime. The commit now switches to pass-by-value at compile time.
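A minimal standalone illustration of the same pitfall (a toy example, not the pybind11 code itself):

#include <cstddef>
#include <iostream>

struct Digits { static constexpr std::size_t value = 3; };

// Binding the member to a reference ODR-uses it, which in C++14 requires an
// out-of-line definition ("constexpr std::size_t Digits::value;") to link.
std::size_t by_ref(const std::size_t &v) { return v; }

// Passing by value only reads the constant, so no definition is needed.
std::size_t by_val(std::size_t v) { return v; }

int main() {
    // by_ref(Digits::value);                        // may fail to link without the definition
    std::cout << by_val(Digits::value) << std::endl; // prints 3
}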
Cool, I'll check this tonight when I get back to my mac 👍 // Wonder why gcc5 was fine with it though?
I think whether this should give a linker error is implementation/optimization-specific behavior that is not covered by the spec.
@wjakob Your
I've also moved all numpy API-related stuff out of
Ok that sounds good! Please go ahead and create an official PR at the main repo once you have something.
@wjakob This is the final bit of this PR: I've pushed a preliminary implementation of
While I'm searching for the refcount bug, I'm looking forward to any comments and suggestions re: the API 😃
@aldanor: This looks fantastic. I would suggest one minor reorganization and one addition before wrapping it up: There is a part of the code which sets up the arguments for the dtype and then calls the dtype constructor.
It would make more sense to me if (instead of the static
(or something like that, feel free to modify) The other bit that would be super nice to have is an array constructor which can accept the dynamic dtype:
Agreed on having two more constructors, good points! (Should
@wjakob Okay, done: the refcount bug is fixed, plus a few constructors were added, see the latest commits:
Basically, it's now possible to do stuff like this:

auto arr1 = py::array_t<int>(10);
auto arr2 = py::array("int32", { 3, 2 });

without having to mess with buffer_info in such simple examples (a fuller sketch of a bound function using this follows below). I could probably fix the existing examples in the docs to use those.

Also, the const qualifier for the data pointer was dropped for array ctors -- I'm not absolutely sure about this yet, but it looks like a more reasonable default, I think? This will probably need to be revisited as part of the whole casting rework, but generally we do want to create and export arrays that don't own their data (e.g. wrappers to expose huge vectors in C++); I'd say that'd be one of the most common use cases.

It's also possible to add ctors from std::vector too, but that needs more thought re: casting and data ownership (the general numpy convention is that nothing is copied unless you explicitly ask for it, e.g. via
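As promised above, here is a sketch of how a bound function could hand back a freshly allocated typed array using the new count constructor; the function name and body are made up for illustration, under the assumption that the constructors shown above behave as described:

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <cstddef>

namespace py = pybind11;

// Hypothetical example: allocate a 1-D int array of length n and fill it in C++.
py::array_t<int> make_squares(std::size_t n) {
    auto result = py::array_t<int>(n);      // new constructor, no buffer_info needed
    auto info = result.request();           // direct access to the underlying buffer
    auto *data = static_cast<int *>(info.ptr);
    for (std::size_t i = 0; i < n; i++)
        data[i] = static_cast<int>(i * i);
    return result;
}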
This branch represents work-in-progress re: structured arrays support.
See example20.cpp, example20.py and example20.ref for the attached tests. Most of the implementation is contained in numpy.h.

- py::memoryview wrapper type with a constructor from buffer_info.
- py::array constructor-from-buffer logic had to be rewritten to account for structured types.
- PYBIND11_NUMPY_DTYPE macro for defining dtypes for structured arrays (a usage sketch is included after this list).
- npy_format_descriptor::value had to be changed to npy_format_descriptor::dtype(), since the typenums are only sufficient for built-in types; internally, it is converted to a descriptor anyway by numpy. However, this is backwards-compatible since the typenums for non-structured types are still accessible via ::value.
- format_descriptor::value had to be changed to format_descriptor::format() so as to be able to use numpy's logic to generate buffer format strings at runtime. However, the change is backwards-compatible since the format strings known at compile time are still accessible via ::value.
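A hedged usage sketch of the PYBIND11_NUMPY_DTYPE macro mentioned in the list above; the struct fields, module name, and bound function are assumptions loosely based on example20.cpp rather than a verbatim excerpt:

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <cstddef>
#include <cstdint>

namespace py = pybind11;

// Assumed struct layout; the actual test code may use different fields.
struct SimpleStruct {
    bool x;
    std::uint32_t y;
    float z;
};

// Module init boilerplate as used by the pybind11 version this branch targets.
PYBIND11_PLUGIN(structured_demo) {
    py::module m("structured_demo");
    // Register the struct so NumPy understands it as a structured dtype.
    PYBIND11_NUMPY_DTYPE(SimpleStruct, x, y, z);
    // Return an empty structured array of n elements to Python.
    m.def("make_array", [](std::size_t n) { return py::array_t<SimpleStruct>(n); });
    return m.ptr();
}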