Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding tests specifically to exercise pybind11::str::raw_str. #2366

Merged
merged 1 commit into from
Aug 11, 2020

Conversation

rwgk
Copy link
Collaborator

@rwgk rwgk commented Aug 5, 2020

Initially this was meant to be a PR preparing for follow-up PR #2367. The discussion on the second PR exposed behavior changes I was unaware of initially. I will drop #2367 for now because it's non-essential at the moment, to not distract further from the far more important pybind11::str change.

Copy link
Collaborator

@YannickJadoul YannickJadoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good. A minor remark, but otherwise, yes, this should stay the same :-)

# specifically to exercise pybind11::str::raw_str
cvt = m.convert_to_pybind11_str
assert cvt(u"Str") == u"Str"
assert cvt(b"Bytes") == u"Bytes" if str is bytes else "b'Bytes'"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe use sys.version_info.major < 3 instead of str is bytes?

Or start using some kind of compatibility library (cfr. https://python-future.org/compatible_idioms.html#strings-and-bytes, or six)? Another thing that could make things simpler/more Python 3: from __future__ import unicode_literals would allow us to remove all u prefixes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were other parts of test code that explicitly check for 2 vs. 3, I believe. Let me find it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Examples:

byte = bytes if sys.version_info[0] < 3 else str

PY3 = sys.version_info[0] >= 3

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I've filed #2376

def test_pybind11_str_raw_str():
# specifically to exercise pybind11::str::raw_str
cvt = m.convert_to_pybind11_str
assert cvt(u"Str") == u"Str"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be:

pybind11_str = unicode if sys.version_info.major < 3 else str
for test in [u"Str",
             None,
             False,
             True,
             ...]:
    assert cvt(test) == pybind11_str(test)

Or do we need to be explicit here, line by line?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually prefer it being explicit, just to make it as clear as possible, and permit specific branching.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fine. Let's leave it up to @rwgk, then :-)

assert cvt({}) == u"{}"
assert cvt({3: 4}) == u"{3: 4}"
assert cvt(set()) == u"set([])" if str is bytes else "set()"
assert cvt({3, 3}) == u"set([3])" if str is bytes else "{3}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another one, sorry. There're no funky unicode characters/string/weirdly encoded strings as bytes here!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And while you're add it, I don't know how much work it is, but some checks on types with custom __str__/__unicode__ methods might be interesting?

@YannickJadoul
Copy link
Collaborator

rwgk wants to merge 1 commit into pybind:master from rwgk:test_pybind11_str_raw_str

Is this correct, or do you want to merge into str-bytes-cleanup?

@rwgk
Copy link
Collaborator Author

rwgk commented Aug 5, 2020

rwgk wants to merge 1 commit into pybind:master from rwgk:test_pybind11_str_raw_str

Is this correct, or do you want to merge into str-bytes-cleanup?

It's intentional: no behavior change, just hardening tests, and a very-small-scale internal cleanup in the companion PR. This will help keep the branch simpler for Wenzel to review when he's back. I actually hope to extract more of the new pure test code into master ahead of time. That way Wenzel can easily focus on what's actually changing.

The whole point of bringing in a team of maintainer is to take load off Wenzel's shoulders.

Thanks for all the suggestions. I think I will take them all (hopefully later today), except in the context of str-vs-bytes, the str is bytes condition makes things super obvious, as opposed to looking at sys.version_info; that latter takes an extra mental leap to realize "ah, str is bytes".

@YannickJadoul
Copy link
Collaborator

It's intentional: no behavior change,

Ah, OK; my thoughts behind this were that it would be nice to keep an overview of all the things that changed to the str-bytes handling. Just as well for @wjakob (OK, yes, this is perfectly fine to not pass his sight, agreed) as for us to check "are all the small tests and changes added now consistent, altogether?".

except in the context of str-vs-bytes, the str is bytes condition makes things super obvious, as opposed to looking at sys.version_info; that latter takes an extra mental leap to realize "ah, str is bytes".

Ah, OK. Then we have a different mental model :-D For me it's exactly the opposite way around. It's the str is bytes that always take me an extra mental leap: "Ah, yes, this is ... Python 2, so str and unicode rather than bytes and str". But either way works, I suppose. Not going to bikeshed this, then ;-)

These tests will also alert us to any behavior changes across Python and PyPy versions.

Hardening tests in preparation for changing `pybind11::str` to only hold `PyUnicodeObject` (NOT also `bytes`). Note that this test exposes that `pybind11::str` can also hold `bytes`.
@rwgk rwgk force-pushed the test_pybind11_str_raw_str branch from b10d309 to a5d1aaa Compare August 10, 2020 22:59
@rwgk rwgk changed the title adding tests specifically to exercise pybind11::str::raw_str Consolidated test changes for pybind11::str changes and adding pybind11::bytes overload. Aug 10, 2020
@rwgk rwgk force-pushed the test_pybind11_str_raw_str branch from a5d1aaa to 4f28229 Compare August 11, 2020 00:14
@rwgk rwgk changed the title Consolidated test changes for pybind11::str changes and adding pybind11::bytes overload. Adding tests specifically to exercise pybind11::str::raw_str. Aug 11, 2020
@rwgk
Copy link
Collaborator Author

rwgk commented Aug 11, 2020

Thanks for the reviews!

I made one very minor change compared to the reviewed code, pulled up from downstream changes in my fork, so that the follow-up reviews for Wenzel are simpler.
I will merge this now.

Note in the meantime I refactored or decided to drop all the other PRs I pushed up, based on all the feedback I received. I will clean that up now with explanations (I have a final end state for reviews already in my fork). The main driving force is to prepare a PR changing pybind::str that is as easy as possible to review by Wenzel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants