Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing pybind11::bytes() ambiguous overload issue and adding missing bytes type to test_constructors(). #2340

Merged
merged 1 commit into from
Jul 31, 2020

Conversation

rwgk
Copy link
Collaborator

@rwgk rwgk commented Jul 29, 2020

Two of the additions work, but pybind11::bytes(obj) generates this compilation error (tested with c++11 & c++2a):

/usr/local/google/home/rwgk/forked/pybind11/tests/test_pytypes.cpp:214:23: error: ambiguous conversion for functional-style cast from 'pybind11::detail::item_accessor' (aka 'accessor<accessor_policies::generic_item>') to 'py::bytes'
            "bytes"_a=py::bytes(d["bytes"]),
                      ^~~~~~~~~~~~~~~~~~~~
/usr/local/google/home/rwgk/forked/pybind11/include/pybind11/detail/../pytypes.h:957:21: note: candidate constructor
    PYBIND11_OBJECT(bytes, object, PYBIND11_BYTES_CHECK)
                    ^
/usr/local/google/home/rwgk/forked/pybind11/include/pybind11/detail/../pytypes.h:957:21: note: candidate constructor
/usr/local/google/home/rwgk/forked/pybind11/include/pybind11/detail/../pytypes.h:987:15: note: candidate constructor
inline bytes::bytes(const pybind11::str &s) {
              ^
1 error generated.

@YannickJadoul
Copy link
Collaborator

YannickJadoul commented Jul 30, 2020

Clue: the situation for py::str is almost exactly the same/symmetric, except that py::bytes has PYBIND11_OBJECT, while py::str has PYBIND11_OBJECT_CVT.

@YannickJadoul
Copy link
Collaborator

YannickJadoul commented Jul 30, 2020

Next clue: this makes things compile:

diff --git a/include/pybind11/pytypes.h b/include/pybind11/pytypes.h
index 536835f7..d1c528a6 100644
--- a/include/pybind11/pytypes.h
+++ b/include/pybind11/pytypes.h
@@ -802,7 +802,9 @@ PYBIND11_NAMESPACE_END(detail)
     PYBIND11_OBJECT_COMMON(Name, Parent, CheckFun) \
     /* This is deliberately not 'explicit' to allow implicit conversion from object: */ \
     Name(const object &o) : Parent(o) { } \
-    Name(object &&o) : Parent(std::move(o)) { }
+    Name(object &&o) : Parent(std::move(o)) { } \
+    template <typename Policy_> \
+    Name(const ::pybind11::detail::accessor<Policy_> &a) : Name(object(a)) { }
 
 #define PYBIND11_OBJECT_DEFAULT(Name, Parent, CheckFun) \
     PYBIND11_OBJECT(Name, Parent, CheckFun) \

What I don't get is:

  • Why does py::bytes not use PYBIND11_OBJECT_CVT? Is there a reason py::bytes cannot have a converting constructor?
  • Why does PYBIND11_OBJECT not call check_? The constructor taking an object basically says "OK, I trust this is the type you say it is", without checking, almost like a C++ reinterpret_cast (except segfault safe, because Python is dynamically typed and only the PyObject* gets stored anyway). Maybe there are some good reasons for having this, but I feel this should have a much more dangerously looking construct (reinterpret_as?) that shouts "I assume you know what you're doing".

EDIT: Some digging into the git history might explain, though?

@rwgk
Copy link
Collaborator Author

rwgk commented Jul 30, 2020

  • Why does PYBIND11_OBJECT not call check_?

One thing that comes to mind: in Python 2 int and long are different types. But that's the exception, not the rule.

@YannickJadoul
Copy link
Collaborator

One thing that comes to mind: in Python 2 int and long are different types. But that's the exception, not the rule.

int_ is still using PYBIND11_OBJECT_CVT, and uses the PYBIND11_LONG_CHECK macro for that (different between Python 2 and Python 3, for exactly the reason you describe).

class int_ : public object {
public:
PYBIND11_OBJECT_CVT(int_, object, PYBIND11_LONG_CHECK, PyNumber_Long)

@rwgk rwgk force-pushed the arg_bytes branch 2 times, most recently from 8d3fe95 to d4818c2 Compare July 30, 2020 22:09
@rwgk rwgk changed the title pybind11::bytes(obj): ambiguous conversion for functional-style cast Fixing pybind11::bytes() ambiguous overload issue and adding missing bytes type to test_constructors(). Jul 30, 2020
@rwgk
Copy link
Collaborator Author

rwgk commented Jul 30, 2020

I morphed this PR from a demo for the pybind11::bytes() ambiguous overload issue into a full PR that I want to merge.

Thanks @YannickJadoul for pointing out the fix!

You will see that I'm changing PYBIND11_OBJECT_COMMON. It isn't mentioned in the documentation, and isn't used directly anywhere in the Google codebase, which I take as an indication that it is probably rarely used externally.

@rwgk rwgk requested review from dean0x7d, bstaletic and YannickJadoul and removed request for dean0x7d July 30, 2020 22:56
@YannickJadoul
Copy link
Collaborator

Nice, getting some consistency/sanity into all of this :-)
Just a heads-up: I don't know when I will be able to do a full review. It might take a few days.

All in all, this seems fine to me, but I still want to see these two things answered (I'll dive into all blames and PRs) before I'm confident in saying that this is the way to go:

What I don't get is:

  • Why does py::bytes not use PYBIND11_OBJECT_CVT? Is there a reason py::bytes cannot have a converting constructor?

  • Why does PYBIND11_OBJECT not call check_? The constructor taking an object basically says "OK, I trust this is the type you say it is", without checking, almost like a C++ reinterpret_cast (except segfault safe, because Python is dynamically typed and only the PyObject* gets stored anyway). Maybe there are some good reasons for having this, but I feel this should have a much more dangerously looking construct (reinterpret_as?) that shouts "I assume you know what you're doing".

If anyone has any insights into this, though, do say so!

@rwgk
Copy link
Collaborator Author

rwgk commented Jul 30, 2020

Nice, getting some consistency/sanity into all of this :-)
Just a heads-up: I don't know when I will be able to do a full review. It might take a few days.

All in all, this seems fine to me, but I still want to see these two things answered (I'll dive into all blames and PRs) before I'm confident in saying that this is the way to go:

What I don't get is:

  • Why does py::bytes not use PYBIND11_OBJECT_CVT? Is there a reason py::bytes cannot have a converting constructor?
  • Why does PYBIND11_OBJECT not call check_? The constructor taking an object basically says "OK, I trust this is the type you say it is", without checking, almost like a C++ reinterpret_cast (except segfault safe, because Python is dynamically typed and only the PyObject* gets stored anyway). Maybe there are some good reasons for having this, but I feel this should have a much more dangerously looking construct (reinterpret_as?) that shouts "I assume you know what you're doing".

If anyone has any insights into this, though, do say so!

Thanks Yannick, I strongly feel that these questions are best addressed in separate PRs.
I'm really glad to finally get some solid ground under my feet with this PR. This PR is an important stepping stone for me to make progress with the other PRs I opened, and eventually #2198 which got me started on a small odyssey.

@YannickJadoul
Copy link
Collaborator

Thanks Yannick, I strongly feel that these questions are best addressed in separate PRs.

Oh, I see. I was/am a bit afraid of making things even more complicated, potentially forgetting to tie up loose ends. If py::bytes is not using PYBIND11_OBJECT_CVT by mistake, but there was a good reason to have PYBIND11_OBJECT not have a conversion from detail::accessor<Policy_>, we might be making things worse.

I'm really glad to finally get some solid ground under my feet with this PR. This PR is an important stepping stone for me to make progress with the other PRs I opened, and eventually #2198 which got me started on a small odyssey.

Right, I understand. Things have gotten complex very fast!
I do actually believe this is a correct fix.
My main concern is that this code was written and maintained by smart people before, and I'm suspicious about overlooking some complex reasoning why things are the way they are. Every "quick fix" I've wanted to do in pybind11 turned out to not be that straightforward at all ;-)

Regardless of that, I have a suggestion: what if we'd make a longer-running str-bytes-consistency branch in the pybind11 repo to collect all str/bytes/unicode/... changes? That would allow us to merge things fix by fix, see progress, but also still keep an eye on the global picture (and provide @wjakob with a comprehensive, complete overview at the end).

@rwgk
Copy link
Collaborator Author

rwgk commented Jul 31, 2020

Regardless of that, I have a suggestion: what if we'd make a longer-running str-bytes-consistency branch in the pybind11 repo to collect all str/bytes/unicode/... changes?

I have mixed feelings.
Thoughts:

  • I look at this PR as a harmless bug fix, as close as it gets.
  • It makes all PYBIND11_OBJECT macros consistent. — Note that "PYBIND11_OBJECT" does not appear in the documentation and therefore this family of macros can safely be considered an implementation detail; it appears in only 3 .h files.
  • It adds to an existing test.

That's really all.

Branching adds a lot of friction. Generally, I don't have great experience with branching. Note also Bertrand Meyer's Agile class (edx), it calls out branching as one of the top bad ideas.

To be completely thorough: the "str": u"" assignment for Python 2 in the modified test makes that part no longer rely on the chimera nature of pybind11::str (#2343 is nothing more than a proof of that, with just a small tweak to existing tests; I'm planning to drop that PR completely).

The chimera issue will be fixed with #2256, modified by following your suggestion to replace detail::PyUnicode_Check_Permissive with PyUnicode_Check. The only thing I still have to figure out for #2256 is to NOT disable implicit pybind11::bytes to pybind11:str conversion at the same time. If I can achieve that, #2256 will again be a non-behavior-changing pure bug fix.

#2256 will be the only additional step after this PR needed to make pybind11:str true to the intent that it encapsulates Python 3 str (it is not a chimera anymore), which will mean we're in a much better place, from which we can then safely experiment with using PYBIND11_OBJECT_CVT for pybind11::bytes, and adding calls to check_ (although I expect they will be redundant because we will not have a "permissive" CheckFun anymore).

I think branching will make this only more difficult, especially branching for this pure bug fix, and hopefully already the next step (#2256) will make things fall into place.

@YannickJadoul
Copy link
Collaborator

Related PR, introducing PYBIND11_OBJECT_CVT for py::str but PYBIND11_OBJECT for py::bytes: #464 (e18bc02)

@YannickJadoul
Copy link
Collaborator

Alright, Ralf and I have been going back in time 3 or 4 years, and found this in pybind11's history:

  • Why does py::bytes not use PYBIND11_OBJECT_CVT? Is there a reason py::bytes cannot have a converting constructor?

#464 introduced a rework of PYBIND11_OBJECT_CVT and PYBIND11_OBJECT. Quoting #464 (comment), the reason to not have py::bytes use PYBIND11_OBJECT_CVT is:

The only odd class is bytes. There are significant differences between Python 2.x and 3.x which make a general object -> bytes conversion pretty messy. There is already a str -> bytes converting constructor which should be enough for most cases, so I didn't want to complicate this too much.

We'll see in a later PR how easy it is to increase consistency here

* Why does `PYBIND11_OBJECT` not call `check_`? The constructor taking an `object` basically says "OK, I trust this is the type you say it is", without checking, almost like a C++ `reinterpret_cast` (except segfault safe, because Python is dynamically typed and only the `PyObject*` gets stored anyway). Maybe there are some good reasons for having this, but I feel this should have a much more dangerously looking construct (`reinterpret_as`?) that shouts "I assume you know what you're doing".

This still needs to be investigated (another PR to try things out), but there is no reason that PYBIND11_OBJECT wouldn't have a constructor taking a detail::accessor<Policy>, just like PYBIND11_OBJECT_CVT, as far as I understand. The PR that introduced that is #1076, and since it's a small PR fixing a bug report, a reasonable explanation is that the PYBIND11_OBJECT constructor taking detail::accessor<Policy> was overlooked. So +1 for adding it to PYBIND11_OBJECT as well (through PYBIND11_OBJECT_COMMON, in particular, as this PR does it).

Copy link
Collaborator

@YannickJadoul YannickJadoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words: let's get this in!

@YannickJadoul YannickJadoul changed the base branch from master to str-bytes-cleanup July 31, 2020 12:57
@YannickJadoul
Copy link
Collaborator

Merging into str-bytes-cleanup instead of master, so we have a long-lived tracking PR to see all changes to str/bytes

@YannickJadoul YannickJadoul merged commit 60cba1c into pybind:str-bytes-cleanup Jul 31, 2020
@rwgk rwgk deleted the arg_bytes branch July 31, 2020 13:08
rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Aug 1, 2020
YannickJadoul pushed a commit to YannickJadoul/pybind11 that referenced this pull request Aug 3, 2020
YannickJadoul pushed a commit to YannickJadoul/pybind11 that referenced this pull request Aug 3, 2020
YannickJadoul pushed a commit to YannickJadoul/pybind11 that referenced this pull request Aug 3, 2020
YannickJadoul pushed a commit to YannickJadoul/pybind11 that referenced this pull request Aug 3, 2020
rwgk pushed a commit that referenced this pull request Aug 4, 2020
rwgk pushed a commit that referenced this pull request Aug 7, 2020
rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Aug 12, 2020
rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Aug 12, 2020
…s one file).

The two other files changed with PR pybind#2340 are not affected by this partial rollback.

This partial rollback enables cherry-picking a commit from PR pybind#2380.
rwgk pushed a commit that referenced this pull request Aug 13, 2020
rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Aug 13, 2020
…s one file).

The two other files changed with PR pybind#2340 are not affected by this partial rollback.

This partial rollback enables cherry-picking a commit from PR pybind#2380.
YannickJadoul pushed a commit that referenced this pull request Aug 13, 2020
* Rolling back PR #2340 change to tests/test_pytypes.py (only this one file).

The two other files changed with PR #2340 are not affected by this partial rollback.

This partial rollback enables cherry-picking a commit from PR #2380.

* test_constructors() fix for Python 2.

Preparation for changing `pybind11::str` to only hold `PyUnicodeObject` (NOT also `bytes`).

Currently test_constructors passes with Python 2 only because `pybind11::str` can also hold a Python 2 `PyStringObject` (or the equivalent `PyBytesObject` in Python 3). Changing the test to exercise conversions for `PyUnicodeObject` makes it consistent between Python 2 and 3, and removes this small obstacle to the planned `pybind11::str` change.

Tests for `bytes` conversions will be added separately.

* Adding test_constructors test for bytes, on top of cherry-picked commit from PR #2380.
rwgk pushed a commit that referenced this pull request Aug 14, 2020
rwgk pushed a commit that referenced this pull request Aug 14, 2020
* Rolling back PR #2340 change to tests/test_pytypes.py (only this one file).

The two other files changed with PR #2340 are not affected by this partial rollback.

This partial rollback enables cherry-picking a commit from PR #2380.

* test_constructors() fix for Python 2.

Preparation for changing `pybind11::str` to only hold `PyUnicodeObject` (NOT also `bytes`).

Currently test_constructors passes with Python 2 only because `pybind11::str` can also hold a Python 2 `PyStringObject` (or the equivalent `PyBytesObject` in Python 3). Changing the test to exercise conversions for `PyUnicodeObject` makes it consistent between Python 2 and 3, and removes this small obstacle to the planned `pybind11::str` change.

Tests for `bytes` conversions will be added separately.

* Adding test_constructors test for bytes, on top of cherry-picked commit from PR #2380.
rwgk pushed a commit that referenced this pull request Aug 14, 2020
rwgk pushed a commit that referenced this pull request Aug 14, 2020
* Rolling back PR #2340 change to tests/test_pytypes.py (only this one file).

The two other files changed with PR #2340 are not affected by this partial rollback.

This partial rollback enables cherry-picking a commit from PR #2380.

* test_constructors() fix for Python 2.

Preparation for changing `pybind11::str` to only hold `PyUnicodeObject` (NOT also `bytes`).

Currently test_constructors passes with Python 2 only because `pybind11::str` can also hold a Python 2 `PyStringObject` (or the equivalent `PyBytesObject` in Python 3). Changing the test to exercise conversions for `PyUnicodeObject` makes it consistent between Python 2 and 3, and removes this small obstacle to the planned `pybind11::str` change.

Tests for `bytes` conversions will be added separately.

* Adding test_constructors test for bytes, on top of cherry-picked commit from PR #2380.
rwgk pushed a commit that referenced this pull request Aug 16, 2020
rwgk pushed a commit that referenced this pull request Aug 16, 2020
* Rolling back PR #2340 change to tests/test_pytypes.py (only this one file).

The two other files changed with PR #2340 are not affected by this partial rollback.

This partial rollback enables cherry-picking a commit from PR #2380.

* test_constructors() fix for Python 2.

Preparation for changing `pybind11::str` to only hold `PyUnicodeObject` (NOT also `bytes`).

Currently test_constructors passes with Python 2 only because `pybind11::str` can also hold a Python 2 `PyStringObject` (or the equivalent `PyBytesObject` in Python 3). Changing the test to exercise conversions for `PyUnicodeObject` makes it consistent between Python 2 and 3, and removes this small obstacle to the planned `pybind11::str` change.

Tests for `bytes` conversions will be added separately.

* Adding test_constructors test for bytes, on top of cherry-picked commit from PR #2380.
YannickJadoul pushed a commit to YannickJadoul/pybind11 that referenced this pull request Aug 16, 2020
YannickJadoul pushed a commit to YannickJadoul/pybind11 that referenced this pull request Aug 16, 2020
…2390)

* Rolling back PR pybind#2340 change to tests/test_pytypes.py (only this one file).

The two other files changed with PR pybind#2340 are not affected by this partial rollback.

This partial rollback enables cherry-picking a commit from PR pybind#2380.

* test_constructors() fix for Python 2.

Preparation for changing `pybind11::str` to only hold `PyUnicodeObject` (NOT also `bytes`).

Currently test_constructors passes with Python 2 only because `pybind11::str` can also hold a Python 2 `PyStringObject` (or the equivalent `PyBytesObject` in Python 3). Changing the test to exercise conversions for `PyUnicodeObject` makes it consistent between Python 2 and 3, and removes this small obstacle to the planned `pybind11::str` change.

Tests for `bytes` conversions will be added separately.

* Adding test_constructors test for bytes, on top of cherry-picked commit from PR pybind#2380.
rwgk pushed a commit that referenced this pull request Aug 17, 2020
rwgk pushed a commit that referenced this pull request Aug 17, 2020
* Rolling back PR #2340 change to tests/test_pytypes.py (only this one file).

The two other files changed with PR #2340 are not affected by this partial rollback.

This partial rollback enables cherry-picking a commit from PR #2380.

* test_constructors() fix for Python 2.

Preparation for changing `pybind11::str` to only hold `PyUnicodeObject` (NOT also `bytes`).

Currently test_constructors passes with Python 2 only because `pybind11::str` can also hold a Python 2 `PyStringObject` (or the equivalent `PyBytesObject` in Python 3). Changing the test to exercise conversions for `PyUnicodeObject` makes it consistent between Python 2 and 3, and removes this small obstacle to the planned `pybind11::str` change.

Tests for `bytes` conversions will be added separately.

* Adding test_constructors test for bytes, on top of cherry-picked commit from PR #2380.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants