Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

py::dtype, buffer protocol improvements, structured types support #308

Merged
merged 87 commits into from
Aug 13, 2016

Conversation

aldanor
Copy link
Member

@aldanor aldanor commented Aug 1, 2016

Detailed discussions in aldanor#1 and #67.

Despite the fact that there's a lot of changes, I believe there's no user-facing breaking ones.

@aldanor
Copy link
Member Author

aldanor commented Aug 1, 2016

Let's see if this builds at all on MSVC now 🐼

@wjakob
Copy link
Member

wjakob commented Aug 1, 2016

Looks like no ;).

About your comment in the other thread regarding ownership: How about adding an extra default argument py::object owner = py::none() to all py::array / py::array_t<> constructors. If owner is specified, no copy is made, and NumPy will reference owner to ensure that the data memory region stays alive at least until the array is garbage collected. If owner is py::none, the behavior is identical to the current implementation.

@aldanor
Copy link
Member Author

aldanor commented Aug 1, 2016

Yea, a few things to fix of course... I'm on it :)

I was pondering about the whole ownership thing too and I'm not quite sure what's the best approach. It'd be nice if it "just worked" out of the box; the numpy-esque behaviour (e.g. how pandas does it, how cython does it) would be to avoid copying whenever possible and only do a copy when explicitly requested (or when ensurecopy flag is passed). Wonder if the ensure-copy flag belongs with the casting flags (e.g. strict casting, equivalent casting), or should it be a separate thing like an owner parameter as you suggested?

One of the most typical use case for providing views rather than copies, I think, would be something like this:

struct Data { std::vector<int> data; };

py::class_<Data>("Data", m)
    .def_property_readonly("data", [](Data& self) {  // non-const since we require a non-const ptr?
        return py::array_t<int>(self.data.data(), self.data.size(), /* owner = */ self);
    });

Did you have something like this in mind?

Note that a lot of things may still go bad, like vector being resized or something like that but that's inevitable and would be the user's fault.

@aldanor
Copy link
Member Author

aldanor commented Aug 1, 2016

Note that in theory it's also possible to create immutable/readonly views by disabling WRITEABLE flag in the array interface; so hypothetically you could get a view from a const pointer that's not writeable. I haven't ever tried using that though.

@wjakob
Copy link
Member

wjakob commented Aug 1, 2016

Do we both agree that an owner is definitely needed for reference counting purposes if the ENSURECOPY flag is not specified to NumPy? (by owner I mean what is passed via the obj argument to e.g. PyArray_NewFromDescr) -- without an owner, there is no mechanism to ensure that the referenced memory region stays alive while it is being referenced. The issue of who is allowed to read/write is also important but orthogonal to this issue.

Now let's say that the user wants to copy the array contents. In that case I don't see why it would be useful to specify an owner since the lifetimes are completely decoupled.

So as far as I can see, a default argument is sufficient. There is no need to add a ensurecopy flag analogous to the current strict casting/memory layout flags.

BTW: I am pretty sure that NumPy will allocate a new memory region if the "slave" NumPy array gets a reshape request, since it does not own that memory region and thus can't just go and free it.

Regarding your code example: to be super-strict, I think it would have to be

/* owner = */ py::cast(self)

@aldanor
Copy link
Member Author

aldanor commented Aug 2, 2016

@wjakob Argh, apparently numpy wants str objects in both Python 2 (bytes) and Python 3 (unicode), so neither pybind11::str or pybind11::bytes would actually work.

I can do it manually, but I had a few ideas re: bytes/str that could help in situations like this.

It would be nice to have functions (not necessarily free functions; maybe methods or operators) like to_bytes and to_str which would do something like this:

# py3

def to_bytes(s):
    if isinstance(s, bytes):
        return s
    return s.encode()

def to_str(s):
    if isinstance(s, str):
        return s
    return s.decode()

Or maybe operator str for bytes and operator bytes for str? Something along those lines...

You can kind of do it through casting to string currently, I believe, but it's a bit ugly.

@wjakob
Copy link
Member

wjakob commented Aug 2, 2016

Adding a conversion operator to str and bytes sounds good to me! For the encoding let's stick to utf8.

@wjakob
Copy link
Member

wjakob commented Aug 8, 2016

I'm very curious about the status of this PR -- are there any news? It seems it's almost ready to be merged.

@aldanor
Copy link
Member Author

aldanor commented Aug 13, 2016

@wjakob Apologies for the delay, real life occurred so I fell off the Internet for a while :) Indeed, it's almost done, there's a few tiny things to fix like str/unicode woes which I've mostly finished locally, will try to finish it up this weekend.

@wjakob
Copy link
Member

wjakob commented Aug 13, 2016

Oh, I hope everything is okay! -- no need to apologize. I look forward to shipping this :shipit:

@aldanor
Copy link
Member Author

aldanor commented Aug 13, 2016

@dean0x7d Yea that's exactly what I did :) Seems to work on Windows from the first glance; will push changes soon.

// Just out of wonder, I tried forcing MSVC to actually use the cast operators, define rvalue cast operators etc, but no matter what I did it would just ignore it. A very stubborn compiler..

@aldanor
Copy link
Member Author

aldanor commented Aug 13, 2016

Ok, all green now

@wjakob
Copy link
Member

wjakob commented Aug 13, 2016

This is ready to be merged now?

@aldanor
Copy link
Member Author

aldanor commented Aug 13, 2016

Yea, hmm I think so.

A few things are completely undocumented (like dtype class, the new array ctors etc), but I was planning to write that up after finishing the numpy ownership stuff.

I could probably compile a list of changes for the changelog though if that helps.

@wjakob
Copy link
Member

wjakob commented Aug 13, 2016

It's fine, we can do the documentation in a separate PR (this one has gotten big enough ;)). If you want to add a few lines to the changelog here, then that sounds good to me.

@wjakob
Copy link
Member

wjakob commented Aug 13, 2016

And don't forget to add yourself to the contributors list in README.md :)

@aldanor
Copy link
Member Author

aldanor commented Aug 13, 2016

@wjakob Done, feel free to reshuffle the changelog if needed 🐼

@wjakob
Copy link
Member

wjakob commented Aug 13, 2016

Great, thank you for the epic PR.

@aldanor
Copy link
Member Author

aldanor commented Aug 13, 2016

Yw -- more epic than I initially planned, that's for sure!🍸

@wjakob
Copy link
Member

wjakob commented Aug 13, 2016

💥 🎉 🎄

@wjakob wjakob merged commit 3c3533b into pybind:master Aug 13, 2016
@jagerman
Copy link
Member

With this merged, I can no longer run any test code (on my Debian Linux system with gcc 6). I get:

jagerman@rawls:~/src/upstream-pybind11/example⦗master⦘$ python3 example-methods-and-attributes.py 
Traceback (most recent call last):
  File "example-methods-and-attributes.py", line 6, in <module>
    from example import ExampleMandA
ImportError: Could not allocate string object!

@aldanor
Copy link
Member Author

aldanor commented Aug 13, 2016

@jagerman That's interesting since it doesn't seem to occur anywhere else. C++14? Python 3.5?

I've just rebuilt the test suite with gcc 6.1.0 on OS X (C++14 / 3.4.4), all tests seem to pass.

@jagerman
Copy link
Member

Yes, to both. With -std=c++11 it works, but not under -std=c++14.

@jagerman
Copy link
Member

I can set up a temporary account for you on a Debian testing system where this is happening, if you e-mail me an ssh public key (jason@imaginary.ca).

@aldanor
Copy link
Member Author

aldanor commented Aug 13, 2016

Ok, can reproduce in a VM, GCC 5.3.1, ubuntu Xenial, x64, Python 3.5.2 -- I'll take a look.

@jagerman
Copy link
Member

Okay, good (I'm not crazy) :)

@aldanor
Copy link
Member Author

aldanor commented Aug 13, 2016

@jagerman I think it's because the format strings for char[N] and std::array<char, N> look like garbage and fail to get UTF8-decoded:

T{=�=�+:a:=�;�+:b:}

This would be related to the compile-time string concatenation hacks which I'm not intimately familiar with (used in format descriptors here); it also explains why it fails on C++14 since PYBIND11_DESCR macro expands to constexpr auto then.

Maybe @wjakob could shed some light on this since there were some changes done recently?

@jagerman
Copy link
Member

jagerman commented Aug 14, 2016

The problem looks to me like your s.text() is returning an address to a temporary, i.e. your s isn't guaranteed to persist past the end of your function. If I replace those two function bodies with either of:

static const char *format() { static std::string s{std::to_string(N) + "s"}; return s.c_str(); }

or

static const char *format() { static PYBIND11_DESCR s = detail::_<N>() + detail::_("s"); return s.text(); }

everything works again.

@aldanor
Copy link
Member Author

aldanor commented Aug 15, 2016

@jagerman I believe this should fix it: #334

@jagerman
Copy link
Member

Yes, I've been following, it looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants