Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing pybind11::str to only hold PyUnicodeObject (NOT also bytes). #2380

Merged
merged 2 commits into from
Aug 15, 2020

Conversation

rwgk
Copy link
Collaborator

@rwgk rwgk commented Aug 11, 2020

This PR obsoletes PR #2256. The older PR convoluted three behavior changes. This new PR changes exactly one behavior.

The corresponding behavior changes are captured by changes in the tests. A significant effort was made to keep the test diffs minimal but also comprehensive and easy to read.

The two other behavior changes discussed under PR #2256 are avoided here (1. disabling implicit decoding from bytes to unicode; 2. list_caster behavior change). Based on this PR, those can be easily implemented if and when desired.

Copy link
Collaborator

@bstaletic bstaletic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like this one. Should it be merged into str_bytes_cleanup instead of master?

@rwgk
Copy link
Collaborator Author

rwgk commented Aug 12, 2020

I do like this one. Should it be merged into str_bytes_cleanup instead of master?

Thanks Boris!
I'll start a thread about the branch on slack.

Copy link
Collaborator

@YannickJadoul YannickJadoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found a shorter solution (and more elegant (?); I like it better, at least).

The only other suggestion I have is to leave out PYBIND11_DISABLE_IMPLICIT_STR_FROM_BYTES from this PR, and give it its own discussion. But maybe that's too nitpicky?

Comment on lines 1634 to 1649
#else
template <typename T = type, enable_if_t<std::is_same<T, str>::value, int> = 0>
bool load(handle src, bool /* convert */) {
if (isinstance<bytes>(src)) {
PyObject *str_from_bytes = PyUnicode_FromEncodedObject(src.ptr(), "utf-8", nullptr);
if (!str_from_bytes) throw error_already_set();
value = reinterpret_steal<type>(str_from_bytes);
return true;
}
if (!isinstance<type>(src))
return false;
value = reinterpret_borrow<type>(src);
return true;
}

template <typename T = type, enable_if_t<std::is_base_of<object, T>::value && !std::is_same<T, str>::value, int> = 0>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a while to figure out what's going on here. While technically correct, it's confusing that "template header" on line 1633 belongs to the template function on line 1651 rather than 1636.

It feels like there has to be a better way? Either by writing to separate #ifdefs or by reversing the order and first putting the #ifndef (or #if !defined(...)) branch?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next, this is also duplicated code with raw_str. Couldn't this be simplified to this?

    template <typename T = type, enable_if_t<std::is_same<T, str>::value, int> = 0>
    bool load(handle src, bool /* convert */) {
        if (isinstance<bytes>(src)) {
            value = str(reinterpret_borrow<bytes>(src));
            return true;
        }
        if (!isinstance<type>(src))
            return false;
        value = reinterpret_borrow<type>(src);
        return true;
    }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even smaller, combining my two remarks:

    template <typename T = type, enable_if_t<std::is_base_of<object, T>::value, int> = 0>
    bool load(handle src, bool /* convert */) {
#ifndef PYBIND11_DISABLE_IMPLICIT_STR_FROM_BYTES
        if (std::is_same<T, str>::value && isinstance<bytes>(src)) {
            value = str(reinterpret_borrow<bytes>(src));
            return true;
        }
#endif
        if (!isinstance<type>(src))
            return false;
        value = reinterpret_borrow<type>(src);
        return true;
    }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, wow, apparently py::str(handle) doesn't do the same as py::str(bytes)!??

Anyway, like THIS, then:

    template <typename T = type, enable_if_t<std::is_base_of<object, T>::value, int> = 0>
    bool load(handle src, bool /* convert */) {
#ifndef PYBIND11_DISABLE_IMPLICIT_STR_FROM_BYTES
        if (std::is_same<T, str>::value && isinstance<bytes>(src)) {
            value = str(src);
            return true;
        }
#endif
        if (!isinstance<type>(src))
            return false;
        value = reinterpret_borrow<type>(src);
        return true;
    }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, thanks!! The std::is_same inside the load function is much better! (I was trapped in not considering combining the compile-time condition with the runtime condition.)

value = str(src) doesn't work with Python 2 or Python 3, but putting my original code there does the trick.

For completeness, with value = str(src) the errors are:

Python2:

>           m.pass_to_pybind11_str(malformed_utf8)
E           RuntimeError: Unable to compute length of object

Python 3:

>       assert m.pass_to_pybind11_str(b"Bytes") == 5  # implicit decode
E       AssertionError: assert 8 == 5
E        +  where 8 = <built-in method pass_to_pybind11_str of PyCapsule object at 0x7f925c32d090>(b'Bytes')
E        +    where <built-in method pass_to_pybind11_str of PyCapsule object at 0x7f925c32d090> = m.pass_to_pybind11_str

Copy link
Collaborator

@YannickJadoul YannickJadoul Aug 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I already get where the Python 3 error originates! In Python 3, str(b"Bytes") results in 'b"Bytes"'!
Trying to understand the Python 2 error, but then there's maybe no way to avoid the C API call code duplication :-(

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha. That's because the unexpected Python 2 error is a bug showing itself. I'll create a PR ;-)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There we go: #2392

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Can't believe I'm still fixing Python 2 bugs ... :-| )

include/pybind11/pytypes.h Show resolved Hide resolved
@@ -1629,7 +1629,25 @@ struct pyobject_caster {
template <typename T = type, enable_if_t<std::is_same<T, handle>::value, int> = 0>
bool load(handle src, bool /* convert */) { value = src; return static_cast<bool>(value); }

#ifdef PYBIND11_DISABLE_IMPLICIT_STR_FROM_BYTES
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a horribly long name.

But actually, it's not related to this PR, right? Introducing it seems like it could be another discussion/PR, since maybe we ought to deprecate this str-from-bytes behavior so it can be removed in a pybind11 3.0.0?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not meant to be a public or permanent macro. I could also just put a couple comments marking the begin and end of the part that implements the implicit decoding. The macro has the advantage (IMO) that it stands out more and that we can slightly more easily experiment in the run-up for version 3.0.0 (you can just put a #define right before). I don't feel very strongly about macro versus comment; only that we should have some demarkation.

rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Aug 12, 2020
…s one file).

The two other files changed with PR pybind#2340 are not affected by this partial rollback.

This partial rollback enables cherry-picking a commit from PR pybind#2380.
rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Aug 12, 2020
rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Aug 13, 2020
…s one file).

The two other files changed with PR pybind#2340 are not affected by this partial rollback.

This partial rollback enables cherry-picking a commit from PR pybind#2380.
rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Aug 13, 2020
YannickJadoul pushed a commit that referenced this pull request Aug 13, 2020
* Rolling back PR #2340 change to tests/test_pytypes.py (only this one file).

The two other files changed with PR #2340 are not affected by this partial rollback.

This partial rollback enables cherry-picking a commit from PR #2380.

* test_constructors() fix for Python 2.

Preparation for changing `pybind11::str` to only hold `PyUnicodeObject` (NOT also `bytes`).

Currently test_constructors passes with Python 2 only because `pybind11::str` can also hold a Python 2 `PyStringObject` (or the equivalent `PyBytesObject` in Python 3). Changing the test to exercise conversions for `PyUnicodeObject` makes it consistent between Python 2 and 3, and removes this small obstacle to the planned `pybind11::str` change.

Tests for `bytes` conversions will be added separately.

* Adding test_constructors test for bytes, on top of cherry-picked commit from PR #2380.
@rwgk rwgk changed the base branch from master to str-bytes-cleanup August 13, 2020 23:30
rwgk pushed a commit that referenced this pull request Aug 14, 2020
* Rolling back PR #2340 change to tests/test_pytypes.py (only this one file).

The two other files changed with PR #2340 are not affected by this partial rollback.

This partial rollback enables cherry-picking a commit from PR #2380.

* test_constructors() fix for Python 2.

Preparation for changing `pybind11::str` to only hold `PyUnicodeObject` (NOT also `bytes`).

Currently test_constructors passes with Python 2 only because `pybind11::str` can also hold a Python 2 `PyStringObject` (or the equivalent `PyBytesObject` in Python 3). Changing the test to exercise conversions for `PyUnicodeObject` makes it consistent between Python 2 and 3, and removes this small obstacle to the planned `pybind11::str` change.

Tests for `bytes` conversions will be added separately.

* Adding test_constructors test for bytes, on top of cherry-picked commit from PR #2380.
@rwgk rwgk force-pushed the str-bytes-cleanup branch from 3c0f566 to f38fafc Compare August 14, 2020 20:08
rwgk pushed a commit that referenced this pull request Aug 14, 2020
* Rolling back PR #2340 change to tests/test_pytypes.py (only this one file).

The two other files changed with PR #2340 are not affected by this partial rollback.

This partial rollback enables cherry-picking a commit from PR #2380.

* test_constructors() fix for Python 2.

Preparation for changing `pybind11::str` to only hold `PyUnicodeObject` (NOT also `bytes`).

Currently test_constructors passes with Python 2 only because `pybind11::str` can also hold a Python 2 `PyStringObject` (or the equivalent `PyBytesObject` in Python 3). Changing the test to exercise conversions for `PyUnicodeObject` makes it consistent between Python 2 and 3, and removes this small obstacle to the planned `pybind11::str` change.

Tests for `bytes` conversions will be added separately.

* Adding test_constructors test for bytes, on top of cherry-picked commit from PR #2380.
@rwgk rwgk force-pushed the str-bytes-cleanup branch from f38fafc to 6cd4552 Compare August 14, 2020 22:03
…old `PyUnicodeObject` (NOT also `bytes`).

NO change in behavior. These additional tests are designed to clearly bring out behavior changes related to planned `pybind11::str` changes.
…tes`).

The corresponding behavior changes are captured by changes in the tests. A significant effort was made to keep the test diffs minimal but also comprehensive and easy to read.

Note: Unlike PR pybind#2256 (dropped), this PR only changes exactly one behavior. The two other behavior changes discussed under PR pybind#2256 are avoided here (1. disabling implicit decoding from bytes to unicode; 2. list_caster behavior change). Based on this PR, those can be easily implemented if and when desired.
@rwgk rwgk merged commit 1d553e4 into pybind:str-bytes-cleanup Aug 15, 2020
@rwgk rwgk deleted the true_str branch August 15, 2020 05:33
rwgk pushed a commit that referenced this pull request Aug 16, 2020
* Rolling back PR #2340 change to tests/test_pytypes.py (only this one file).

The two other files changed with PR #2340 are not affected by this partial rollback.

This partial rollback enables cherry-picking a commit from PR #2380.

* test_constructors() fix for Python 2.

Preparation for changing `pybind11::str` to only hold `PyUnicodeObject` (NOT also `bytes`).

Currently test_constructors passes with Python 2 only because `pybind11::str` can also hold a Python 2 `PyStringObject` (or the equivalent `PyBytesObject` in Python 3). Changing the test to exercise conversions for `PyUnicodeObject` makes it consistent between Python 2 and 3, and removes this small obstacle to the planned `pybind11::str` change.

Tests for `bytes` conversions will be added separately.

* Adding test_constructors test for bytes, on top of cherry-picked commit from PR #2380.
YannickJadoul pushed a commit to YannickJadoul/pybind11 that referenced this pull request Aug 16, 2020
…2390)

* Rolling back PR pybind#2340 change to tests/test_pytypes.py (only this one file).

The two other files changed with PR pybind#2340 are not affected by this partial rollback.

This partial rollback enables cherry-picking a commit from PR pybind#2380.

* test_constructors() fix for Python 2.

Preparation for changing `pybind11::str` to only hold `PyUnicodeObject` (NOT also `bytes`).

Currently test_constructors passes with Python 2 only because `pybind11::str` can also hold a Python 2 `PyStringObject` (or the equivalent `PyBytesObject` in Python 3). Changing the test to exercise conversions for `PyUnicodeObject` makes it consistent between Python 2 and 3, and removes this small obstacle to the planned `pybind11::str` change.

Tests for `bytes` conversions will be added separately.

* Adding test_constructors test for bytes, on top of cherry-picked commit from PR pybind#2380.
rwgk pushed a commit that referenced this pull request Aug 17, 2020
* Rolling back PR #2340 change to tests/test_pytypes.py (only this one file).

The two other files changed with PR #2340 are not affected by this partial rollback.

This partial rollback enables cherry-picking a commit from PR #2380.

* test_constructors() fix for Python 2.

Preparation for changing `pybind11::str` to only hold `PyUnicodeObject` (NOT also `bytes`).

Currently test_constructors passes with Python 2 only because `pybind11::str` can also hold a Python 2 `PyStringObject` (or the equivalent `PyBytesObject` in Python 3). Changing the test to exercise conversions for `PyUnicodeObject` makes it consistent between Python 2 and 3, and removes this small obstacle to the planned `pybind11::str` change.

Tests for `bytes` conversions will be added separately.

* Adding test_constructors test for bytes, on top of cherry-picked commit from PR #2380.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants