Structured arrays support #1

aldanor · 2016-06-21T20:13:30Z

This branch represents work-in-progress re: structured arrays support.

See example20.cpp, example20.py and example20.ref for the attached tests. Most of the implementation is contained in numpy.h.

Added py::memoryview wrapper type with constructor from buffer_info.
py::array constructor from buffer logic had to be rewritten to account for structured types.
- Added PYBIND11_NUMPY_DTYPE macro for defining dtypes for structured arrays.
npy_format_descriptor::value had to be changed to npy_format_descriptor::dtype() since the typenums are only sufficient for built-in types. Internally, it is converted to a descriptor anyway by numpy. However, this is backwards-compatible since the typenums for non-structured types are still accessible via ::value.
format_descriptor::value had to be changed to format_descriptor::format() so as to be able to use numpy's logic to general buffer format strings at runtime. However, the change is backwards compatible since the format strings known at compile time are still accessible via ::value.

aldanor · 2016-06-21T23:36:18Z

Fixed segfault with nested descriptors, apparently you need to incref them before passing on to numpy API, the tests actually pass fine now.

wjakob · 2016-07-02T14:51:34Z

include/pybind11/numpy.h

+        auto buf_info = info;
+        if (!buf_info.ptr)
+            // always allocate at least 1 element, same way as NumPy does it
+            buf_info.ptr = std::calloc(std::max(info.size, 1ul), info.itemsize);


Do you see any way to provide a way to allocate uninitialized memory, which was what the previous implementation did? I would say that in almost all cases, the user will allocate a buffer and then fill it evaluations with some kind of numerical routine, so the initialization is just wasted.

also, should this be (size_t) 1 instead of 1ul in the std::max call?

Re: 1ul, yea, I guess I was just lazy :)

Re: malloc vs calloc, here's the only piece of code from numpy that refers to NPY_NEEDS_INIT:

if (data == NULL) { /* * Allocate something even for zero-space arrays * e.g. shape=(0,) -- otherwise buffer exposure * (a.data) doesn't work as it should. */ if (nbytes == 0) { nbytes = descr->elsize; } /* * It is bad to have uninitialized OBJECT pointers * which could also be sub-fields of a VOID array */ if (zeroed || PyDataType_FLAGCHK(descr, NPY_NEEDS_INIT)) { data = npy_alloc_cache_zero(nbytes); } else { data = npy_alloc_cache(nbytes); } if (data == NULL) { PyErr_NoMemory(); goto fail; } fa->flags |= NPY_ARRAY_OWNDATA; }

Realistically, if we never handle object dtypes, we can just use malloc instead of calloc I guess.

>>> a = np.array(['foo', 'bar'], dtype=object) >>> memoryview(a).format 'O'

Or maybe we can check if there's any 'O' in the format string, we switch to calloc.

I think it's fine to ignore object types, which is somewhat esoteric in any case.

aldanor · 2016-07-17T22:21:06Z

@wjakob Hey, I think I've fixed all issues with padding and stuff, rebased onto the most recent master (this took me a good while since there were name clashes, had to rename example18.* to example20.*, do some git filter-branch dark magic etc...), now just following up on the rest of your comments here and it will be then done :)

A few things:

Is there a great value in having PYBIND11_DESCR being constexpr (and hence descr not being constructible from runtime strings) other than it being fancy? I see it's only used to generate signatures in docstrings, that sounds quite runtime to me :)

It would be sure nice to see signatures like

def f(arr : numpy.ndarray[dtype=Foo]) -> numpy.ndarray[dtype=Bar]

instead of it just being "struct" for all user-defined types. Are there any workarounds around this?

Do you think it's worth introducing a py::dtype object? I don't think your previous suggestion (i.e. have methods like .append(offset, name, format)) would work since not all dtypes are structured, so you would need a whole hierarchy of c++ classes. But maybe moving padding stripping logic and a few other things into py::dtype would make sense (e.g., construct dtype from buffer format string, extract it from an array object, etc). Also there's now functions like py::dtype_of<T>() (maybe makes sense to make this a static function in py::dtype?) and py::array::dtype() that should naturally return py::dtype objects.
Re: static class variables, your suggestion wouldn't work, since they're not constexpr and this library is header-only, so there's no way to statically initialize them. Which implies they should be kept in those ugly static functions...

P.S. by the way there's a few exception leaks in examples 5 and 18 in master, I've posted the issue to the main repo

wjakob · 2016-07-18T08:56:49Z

example/example20.cpp

+struct NestedStruct {
+    SimpleStruct a;
+    PackedStruct b;
+} __attribute__((packed));


Don't think MSVC will like this attribute. Do you want to turn the PR into an official one at the pybind11 repository? That way, the CI build bots will try to compile your code on various platforms, which may uncover a few more things.

Oh, good catch, I'll fix that tonight and open a PR to pybind11 :)

// please see my other few comments above

wjakob · 2016-07-18T09:17:02Z

PYBIND11_DESCR really has to stay constexpr. This enables generating function signatures at compile time, which reduces the size of generated binaries considerably.

I am greatly in favor of introducing a py::dtype object that can be used dynamically (in whatever way makes sense -- don't interpret my previous suggestion for an API too strictly).

aldanor · 2016-07-18T23:11:13Z

@wjakob I believe I've addressed all of the comments here (last 6 commits), except moving dtype-related logic inside of a py::dtype wrapper -- working on that right now.

I've also recalled I forgot to add descriptor bindings for one important class of types: string fields; it's fairly straightforward actually (e.g. so that char[7] and std::array<char, 7> map to S7 in numpy and 7s in buffer protocol), however I ran into a weird problem with detail::_, maybe I don't fully understand how it works :)

template <size_t N> struct npy_format_descriptor<char[N]> {
    static PYBIND11_DESCR name() { return _("S") + _<N>(); } 
    // more stuff
};

This compiles/links fine, however if you try to run it with an example, you get

Traceback (most recent call last):
  File "../example/example20.py", line 6, in <module>
    from example import (
ImportError: dlopen(../example/example.so, 2): Symbol not found: __ZN8pybind116detail10int_to_strILm0EJLm3EEE6digitsE
  Referenced from: ../example/example.so
  Expected in: flat namespace
 in ../example/example.so

So I can't use _ on template parameters? 🐼

wjakob · 2016-07-18T23:28:57Z

Weird .. sounds like a potential compiler issue. The whole point of _ is that it can be used with template parameters :)

aldanor · 2016-07-19T07:37:57Z

@wjakob Hmm, a very short example (this is clang on OSX, c++14). Should this work / compile?

#include <pybind11/pybind11.h>

namespace pybind11 {
namespace detail {
template<size_t N> PYBIND11_DESCR foo() { return _("S") + _<N>(); }
}
}

int main() {
    std::cout << pybind11::detail::foo<3>().text() << std::endl;
}

Undefined symbols for architecture x86_64:
  "pybind11::detail::int_to_str<0ul, 3ul>::digits", referenced from:
      auto pybind11::detail::_<3ul>() in pb11-_-2001d1.o
ld: symbol(s) not found for architecture x86_64

(This is built without -undefined dynamic_lookup so it actually triggers a compile error and not a runtime one)

Interestingly, it seems to work on gcc 5 on Linux (c++14), just checked.

wjakob · 2016-07-19T10:00:53Z

Fixed (4a87933). The issue is that a reference of a constexpr value was taken, which requires it to actually exist at runtime. The commit now switches to pass-by-value at compile time.

aldanor · 2016-07-19T10:44:47Z

Cool, I'll check this tonight when I get back to my mac 👍

// Wonder why gcc5 was fine with it though?

wjakob · 2016-07-19T10:46:09Z

I think whether this should give a linker error is implementation/optimization-specific behavior that is not covered by the spec.

aldanor · 2016-07-19T23:59:01Z

@wjakob Your int_to_str fix seems to have worked, I've implemented format/numpy descriptors for char[N] and std::array<char, N> + tests (that's not directly recarray-related, but very dtype-related, and is a great addition to our numpy arsenal, idk if it's worth mentioning in the docs or not), works on both clang/osx and gcc5/linux.

I've also moved all numpy API-related stuff out of py::array into a separate container for the sake of clarity, it was started to get swamped with it. So now the last cleanup bit is finishing py::dtype, hope that won't take long, will keep you posted :)

wjakob · 2016-07-21T15:31:47Z

Ok that sounds good! Please go ahead and create an official PR at the main repo once you have something.

aldanor · 2016-07-23T21:01:28Z

@wjakob This is the final bit of this PR, I've pushed preliminary implementation of py::dtype which mostly works (all tests pass, but when the process closes there's a segfault in handle::dec_ref(), seems like refcount is off somewhere but I can seem to find our where exactly so far........ -- hopefully I can find it).

While I'm searching for the refcount bug, looking forward to any comments and suggestions re: the API 😃

wjakob · 2016-07-24T06:46:48Z

@aldanor: This looks fantastic. I would suggest one minor reorganization and one addition before wrapping it up:

There is a part of the code which sets up the arguments for the dtype and then calls the dtype constructor.

args["names"] = names; args["formats"] = formats; args["offsets"] = offsets;

It would make more sense to me if (instead of the static from_args function) there was a special dtype constructor which directly takes these lists, i.e.:

dtype::dtype(list names, list formats, list offsets, size_t itemsize)

(or something like that, feel free to modify)

The other bit that would be super nice to have is an array constructor which can accept the dynamic dtype:

array:array(const dtype &dt, size_t size, const void *ptr);

aldanor · 2016-07-24T08:31:04Z

Agreed on having two more constructors, good points! (~~maybe you can spot the ref count bug too, haha?~~ refcount bug fixed)

Should from_args stay too or should it become internal? It accepts a much wider range of things (everything that np.dtype does, and it's a lot if you check the docstring). The only reason it's a static function and not a ctor is that it takes a single object which would mix up with object's ctors

aldanor · 2016-07-24T23:18:12Z

@wjakob Okay, done, the refcount bug fixed, plus a few constructors added, see latest commits:

simplified buffer_info constructor for 1-D case
dtype constructor accepting names/formats/offsets/itemsize
array gained a constructor accepting a dtype
all copy/paste eliminated inarray / array_t constructors, added a whole bunch of them to cover most common cases (e.g. if you provide just the shape, strides will be auto-computed)

Basically, it's now possible to do stuff like this:

auto arr1 = py::array_t<int>(10);
auto arr2 = py::array("int32", { 3, 2 });

without having to mess with buffer_info in such simple examples.

I could probably fix the existing examples in the docs to use those.

Also, the const qualifier for the data pointer was dropped for array ctors -- I'm not absolutely sure about this yet but it looks a more reasonable default, I think? This will prob need to be revisited as a part of the whole casting rework, but generally we do want to create and export arrays that don't own their data (e.g. wrappers to expose huge vectors in c++), I'd say that'd be one of the most common use cases. It's also possible to add ctors from std::vector, too, but that needs more though re: casting and data ownership (the general numpy convention is that nothing is copied unless you explicitly ask for it, e.g. via ensurecopy flag; this would also mean though that such a constructor would be unsafe be default since its data could now reside in some external vector whose lifetime is not tied to the array). Maybe it's best to just have pointers to hint at the fact that it's not 100% safe and you should know what you're doing.

aldanor mentioned this pull request Jun 21, 2016

How to deal with structured/record arrays? pybind/pybind11#67

Closed

aldanor force-pushed the recarray branch 2 times, most recently from a8a3c53 to 84a5056 Compare July 2, 2016 13:53

wjakob reviewed Jul 2, 2016
View reviewed changes

aldanor force-pushed the recarray branch from 198642b to 8ce737f Compare July 17, 2016 21:40

wjakob reviewed Jul 18, 2016
View reviewed changes

aldanor force-pushed the recarray branch from 8ce737f to 0cc2774 Compare July 18, 2016 18:59

aldanor added 5 commits July 19, 2016 22:16

Change npy_format_descriptor typenum to static fn

a2aec09

Add buffer_info::as_pybuffer() method

0f4a7ef

Change format_descriptor::value to a static func

f06e8c8

Dump test output if the test runner fails

4719de4

Add memoryview type

1afd6f6

aldanor added 6 commits July 19, 2016 22:27

Make buffer_info::as_pybuffer a memoryview ctor

4ec176b

Store array requests in local variables in tests

21183e8

Fix a comment and wrong indentation

1f7747e

Make struct packing in example20 MSVC-compliant

dbeb5c2

Replace 4096B format buffer with std::string

72acf83

Remove redundant definition

55f3604

aldanor force-pushed the recarray branch from af647ef to 55f3604 Compare July 19, 2016 21:27

aldanor added 4 commits July 20, 2016 00:19

Add numpy wrappers for char[] and std::array<char>

826a046

Rename example20 -> example-numpy-dtypes

8637851

Code reordering / cleanup only

3e874fe

Cleanup: move numpy API bindings out of py::array

98e00e3

Initial implementation of py::dtype

87e7438

aldanor added 8 commits July 24, 2016 17:34

Fix a segfault where func object wasn't released

88d383a

Add dtype(names, offsets, formats, itemsize) ctor

6e75c44

Add all possible ctors for py::array

c1d2f12

Also add the new ctors to py::array_t

b27a3ba

Add a simplified buffer_info ctor for 1-D case

a09d05e

Add tests for new array/array_t ctors

0f576c6

Add tests for py::dtype ctors

9459e99

Add a few more dtype tests

55623e2

aldanor added 2 commits July 25, 2016 00:46

Allow nullptr in array ctors wherever possible

d96d564

Added dtype from const char pointer ctor

01dfc40

aldanor closed this Aug 1, 2016

aldanor mentioned this pull request Aug 1, 2016

py::dtype, buffer protocol improvements, structured types support pybind/pybind11#308

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Structured arrays support #1

Structured arrays support #1

aldanor commented Jun 21, 2016 •

edited

Loading

aldanor commented Jun 21, 2016

wjakob Jul 2, 2016

aldanor Jul 2, 2016

wjakob Jul 2, 2016

aldanor commented Jul 17, 2016 •

edited

Loading

wjakob Jul 18, 2016

aldanor Jul 18, 2016

wjakob commented Jul 18, 2016

aldanor commented Jul 18, 2016

wjakob commented Jul 18, 2016

aldanor commented Jul 19, 2016 •

edited

Loading

wjakob commented Jul 19, 2016

aldanor commented Jul 19, 2016

wjakob commented Jul 19, 2016

aldanor commented Jul 19, 2016 •

edited

Loading

wjakob commented Jul 21, 2016

aldanor commented Jul 23, 2016

wjakob commented Jul 24, 2016

aldanor commented Jul 24, 2016 •

edited

Loading

aldanor commented Jul 24, 2016 •

edited

Loading

Structured arrays support #1

Structured arrays support #1

Conversation

aldanor commented Jun 21, 2016 • edited Loading

aldanor commented Jun 21, 2016

wjakob Jul 2, 2016

Choose a reason for hiding this comment

aldanor Jul 2, 2016

Choose a reason for hiding this comment

wjakob Jul 2, 2016

Choose a reason for hiding this comment

aldanor commented Jul 17, 2016 • edited Loading

wjakob Jul 18, 2016

Choose a reason for hiding this comment

aldanor Jul 18, 2016

Choose a reason for hiding this comment

wjakob commented Jul 18, 2016

aldanor commented Jul 18, 2016

wjakob commented Jul 18, 2016

aldanor commented Jul 19, 2016 • edited Loading

wjakob commented Jul 19, 2016

aldanor commented Jul 19, 2016

wjakob commented Jul 19, 2016

aldanor commented Jul 19, 2016 • edited Loading

wjakob commented Jul 21, 2016

aldanor commented Jul 23, 2016

wjakob commented Jul 24, 2016

aldanor commented Jul 24, 2016 • edited Loading

aldanor commented Jul 24, 2016 • edited Loading

aldanor commented Jun 21, 2016 •

edited

Loading

aldanor commented Jul 17, 2016 •

edited

Loading

aldanor commented Jul 19, 2016 •

edited

Loading

aldanor commented Jul 19, 2016 •

edited

Loading

aldanor commented Jul 24, 2016 •

edited

Loading

aldanor commented Jul 24, 2016 •

edited

Loading