-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changing pybind11::str to only hold PyUnicodeObject (NOT also bytes). #2380
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like this one. Should it be merged into str_bytes_cleanup
instead of master
?
Thanks Boris! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found a shorter solution (and more elegant (?); I like it better, at least).
The only other suggestion I have is to leave out PYBIND11_DISABLE_IMPLICIT_STR_FROM_BYTES
from this PR, and give it its own discussion. But maybe that's too nitpicky?
include/pybind11/cast.h
Outdated
#else | ||
template <typename T = type, enable_if_t<std::is_same<T, str>::value, int> = 0> | ||
bool load(handle src, bool /* convert */) { | ||
if (isinstance<bytes>(src)) { | ||
PyObject *str_from_bytes = PyUnicode_FromEncodedObject(src.ptr(), "utf-8", nullptr); | ||
if (!str_from_bytes) throw error_already_set(); | ||
value = reinterpret_steal<type>(str_from_bytes); | ||
return true; | ||
} | ||
if (!isinstance<type>(src)) | ||
return false; | ||
value = reinterpret_borrow<type>(src); | ||
return true; | ||
} | ||
|
||
template <typename T = type, enable_if_t<std::is_base_of<object, T>::value && !std::is_same<T, str>::value, int> = 0> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a while to figure out what's going on here. While technically correct, it's confusing that "template header" on line 1633 belongs to the template function on line 1651 rather than 1636.
It feels like there has to be a better way? Either by writing to separate #ifdef
s or by reversing the order and first putting the #ifndef
(or #if !defined(...)
) branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Next, this is also duplicated code with raw_str
. Couldn't this be simplified to this?
template <typename T = type, enable_if_t<std::is_same<T, str>::value, int> = 0>
bool load(handle src, bool /* convert */) {
if (isinstance<bytes>(src)) {
value = str(reinterpret_borrow<bytes>(src));
return true;
}
if (!isinstance<type>(src))
return false;
value = reinterpret_borrow<type>(src);
return true;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even smaller, combining my two remarks:
template <typename T = type, enable_if_t<std::is_base_of<object, T>::value, int> = 0>
bool load(handle src, bool /* convert */) {
#ifndef PYBIND11_DISABLE_IMPLICIT_STR_FROM_BYTES
if (std::is_same<T, str>::value && isinstance<bytes>(src)) {
value = str(reinterpret_borrow<bytes>(src));
return true;
}
#endif
if (!isinstance<type>(src))
return false;
value = reinterpret_borrow<type>(src);
return true;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, wow, apparently py::str(handle)
doesn't do the same as py::str(bytes)
!??
Anyway, like THIS, then:
template <typename T = type, enable_if_t<std::is_base_of<object, T>::value, int> = 0>
bool load(handle src, bool /* convert */) {
#ifndef PYBIND11_DISABLE_IMPLICIT_STR_FROM_BYTES
if (std::is_same<T, str>::value && isinstance<bytes>(src)) {
value = str(src);
return true;
}
#endif
if (!isinstance<type>(src))
return false;
value = reinterpret_borrow<type>(src);
return true;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, thanks!! The std::is_same
inside the load
function is much better! (I was trapped in not considering combining the compile-time condition with the runtime condition.)
value = str(src)
doesn't work with Python 2 or Python 3, but putting my original code there does the trick.
For completeness, with value = str(src)
the errors are:
Python2:
> m.pass_to_pybind11_str(malformed_utf8)
E RuntimeError: Unable to compute length of object
Python 3:
> assert m.pass_to_pybind11_str(b"Bytes") == 5 # implicit decode
E AssertionError: assert 8 == 5
E + where 8 = <built-in method pass_to_pybind11_str of PyCapsule object at 0x7f925c32d090>(b'Bytes')
E + where <built-in method pass_to_pybind11_str of PyCapsule object at 0x7f925c32d090> = m.pass_to_pybind11_str
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I already get where the Python 3 error originates! In Python 3, str(b"Bytes")
results in 'b"Bytes"'
!
Trying to understand the Python 2 error, but then there's maybe no way to avoid the C API call code duplication :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha. That's because the unexpected Python 2 error is a bug showing itself. I'll create a PR ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There we go: #2392
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Can't believe I'm still fixing Python 2 bugs ... :-| )
include/pybind11/cast.h
Outdated
@@ -1629,7 +1629,25 @@ struct pyobject_caster { | |||
template <typename T = type, enable_if_t<std::is_same<T, handle>::value, int> = 0> | |||
bool load(handle src, bool /* convert */) { value = src; return static_cast<bool>(value); } | |||
|
|||
#ifdef PYBIND11_DISABLE_IMPLICIT_STR_FROM_BYTES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite a horribly long name.
But actually, it's not related to this PR, right? Introducing it seems like it could be another discussion/PR, since maybe we ought to deprecate this str
-from-bytes
behavior so it can be removed in a pybind11 3.0.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not meant to be a public or permanent macro. I could also just put a couple comments marking the begin and end of the part that implements the implicit decoding. The macro has the advantage (IMO) that it stands out more and that we can slightly more easily experiment in the run-up for version 3.0.0 (you can just put a #define
right before). I don't feel very strongly about macro versus comment; only that we should have some demarkation.
…s one file). The two other files changed with PR pybind#2340 are not affected by this partial rollback. This partial rollback enables cherry-picking a commit from PR pybind#2380.
…s one file). The two other files changed with PR pybind#2340 are not affected by this partial rollback. This partial rollback enables cherry-picking a commit from PR pybind#2380.
* Rolling back PR #2340 change to tests/test_pytypes.py (only this one file). The two other files changed with PR #2340 are not affected by this partial rollback. This partial rollback enables cherry-picking a commit from PR #2380. * test_constructors() fix for Python 2. Preparation for changing `pybind11::str` to only hold `PyUnicodeObject` (NOT also `bytes`). Currently test_constructors passes with Python 2 only because `pybind11::str` can also hold a Python 2 `PyStringObject` (or the equivalent `PyBytesObject` in Python 3). Changing the test to exercise conversions for `PyUnicodeObject` makes it consistent between Python 2 and 3, and removes this small obstacle to the planned `pybind11::str` change. Tests for `bytes` conversions will be added separately. * Adding test_constructors test for bytes, on top of cherry-picked commit from PR #2380.
* Rolling back PR #2340 change to tests/test_pytypes.py (only this one file). The two other files changed with PR #2340 are not affected by this partial rollback. This partial rollback enables cherry-picking a commit from PR #2380. * test_constructors() fix for Python 2. Preparation for changing `pybind11::str` to only hold `PyUnicodeObject` (NOT also `bytes`). Currently test_constructors passes with Python 2 only because `pybind11::str` can also hold a Python 2 `PyStringObject` (or the equivalent `PyBytesObject` in Python 3). Changing the test to exercise conversions for `PyUnicodeObject` makes it consistent between Python 2 and 3, and removes this small obstacle to the planned `pybind11::str` change. Tests for `bytes` conversions will be added separately. * Adding test_constructors test for bytes, on top of cherry-picked commit from PR #2380.
3c0f566
to
f38fafc
Compare
* Rolling back PR #2340 change to tests/test_pytypes.py (only this one file). The two other files changed with PR #2340 are not affected by this partial rollback. This partial rollback enables cherry-picking a commit from PR #2380. * test_constructors() fix for Python 2. Preparation for changing `pybind11::str` to only hold `PyUnicodeObject` (NOT also `bytes`). Currently test_constructors passes with Python 2 only because `pybind11::str` can also hold a Python 2 `PyStringObject` (or the equivalent `PyBytesObject` in Python 3). Changing the test to exercise conversions for `PyUnicodeObject` makes it consistent between Python 2 and 3, and removes this small obstacle to the planned `pybind11::str` change. Tests for `bytes` conversions will be added separately. * Adding test_constructors test for bytes, on top of cherry-picked commit from PR #2380.
f38fafc
to
6cd4552
Compare
…old `PyUnicodeObject` (NOT also `bytes`). NO change in behavior. These additional tests are designed to clearly bring out behavior changes related to planned `pybind11::str` changes.
…tes`). The corresponding behavior changes are captured by changes in the tests. A significant effort was made to keep the test diffs minimal but also comprehensive and easy to read. Note: Unlike PR pybind#2256 (dropped), this PR only changes exactly one behavior. The two other behavior changes discussed under PR pybind#2256 are avoided here (1. disabling implicit decoding from bytes to unicode; 2. list_caster behavior change). Based on this PR, those can be easily implemented if and when desired.
* Rolling back PR #2340 change to tests/test_pytypes.py (only this one file). The two other files changed with PR #2340 are not affected by this partial rollback. This partial rollback enables cherry-picking a commit from PR #2380. * test_constructors() fix for Python 2. Preparation for changing `pybind11::str` to only hold `PyUnicodeObject` (NOT also `bytes`). Currently test_constructors passes with Python 2 only because `pybind11::str` can also hold a Python 2 `PyStringObject` (or the equivalent `PyBytesObject` in Python 3). Changing the test to exercise conversions for `PyUnicodeObject` makes it consistent between Python 2 and 3, and removes this small obstacle to the planned `pybind11::str` change. Tests for `bytes` conversions will be added separately. * Adding test_constructors test for bytes, on top of cherry-picked commit from PR #2380.
…2390) * Rolling back PR pybind#2340 change to tests/test_pytypes.py (only this one file). The two other files changed with PR pybind#2340 are not affected by this partial rollback. This partial rollback enables cherry-picking a commit from PR pybind#2380. * test_constructors() fix for Python 2. Preparation for changing `pybind11::str` to only hold `PyUnicodeObject` (NOT also `bytes`). Currently test_constructors passes with Python 2 only because `pybind11::str` can also hold a Python 2 `PyStringObject` (or the equivalent `PyBytesObject` in Python 3). Changing the test to exercise conversions for `PyUnicodeObject` makes it consistent between Python 2 and 3, and removes this small obstacle to the planned `pybind11::str` change. Tests for `bytes` conversions will be added separately. * Adding test_constructors test for bytes, on top of cherry-picked commit from PR pybind#2380.
* Rolling back PR #2340 change to tests/test_pytypes.py (only this one file). The two other files changed with PR #2340 are not affected by this partial rollback. This partial rollback enables cherry-picking a commit from PR #2380. * test_constructors() fix for Python 2. Preparation for changing `pybind11::str` to only hold `PyUnicodeObject` (NOT also `bytes`). Currently test_constructors passes with Python 2 only because `pybind11::str` can also hold a Python 2 `PyStringObject` (or the equivalent `PyBytesObject` in Python 3). Changing the test to exercise conversions for `PyUnicodeObject` makes it consistent between Python 2 and 3, and removes this small obstacle to the planned `pybind11::str` change. Tests for `bytes` conversions will be added separately. * Adding test_constructors test for bytes, on top of cherry-picked commit from PR #2380.
This PR obsoletes PR #2256. The older PR convoluted three behavior changes. This new PR changes exactly one behavior.
The corresponding behavior changes are captured by changes in the tests. A significant effort was made to keep the test diffs minimal but also comprehensive and easy to read.
The two other behavior changes discussed under PR #2256 are avoided here (1. disabling implicit decoding from bytes to unicode; 2. list_caster behavior change). Based on this PR, those can be easily implemented if and when desired.