-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding test_isinstance_string_types, with asserts simply matching cur… #2256
Conversation
e20b54d
to
c78cc6a
Compare
@wjakob @YannickJadoul Request for Review This PR has two commits:
Commit 2. has a side-effect, that could be seen as desirable, or undesirable: it disables implicit Note that Commit 2. also fixes #2198 (which is what got me started with this PR). About naming: the meaning of the term For Python 2:
For Python 3:
For pybind11, irrespective of the Python version:
The essential part of Commit 2. is a one-line insertion copied from @jagerman : 77059f0 The desired effect is that it makes But three other tests are affected.
Is the behavior change good or bad? My opinion: Good. "Explicit is better than implicit." for this case. It's easy to add
Detail: if accepted, I'd want to merge the commits separately, for the same reasons that I have them separately in this PR. |
Are those affected on python3 as well as python2?
Can you provide a link to
What is "native str" in this context? Do you mean |
https://pypi.org/project/six/
The new
Note: |
Regarding "native str", is the proposed behaviour of |
Good question again: I figured that was the original intention, to prevent accidental conversion of "regular" str literals to a list of integers. Thinking: with Python 3 it makes sense to allow |
include/pybind11/stl.h
Outdated
@@ -144,7 +144,7 @@ template <typename Type, typename Value> struct list_caster { | |||
using value_conv = make_caster<Value>; | |||
|
|||
bool load(handle src, bool convert) { | |||
if (!isinstance<sequence>(src) || isinstance<str>(src)) | |||
if (!isinstance<sequence>(src) || isinstance<str>(src) || isinstance<native_str>(src)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant check on Python 3.x?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly.
I did it this way mostly because I thought it's the clearest way of putting the condition, for the purpose of this code review. ("optimize later")
To also answer @bstaletic question here:
I still don't understand. Why does it make sense to allow
bytes
on python3, but not on python2? I would understand, were pybind11 dealing inpy::native_bytes
andpy::native_str
everywhere. It isn't, so havingpy::native_str
in this one place seems very surprising.
TL;DR: What would be the behavior that you prefer to see here?
With Python 2 the interpreter (!) will tell you bytes is str
is True
, therefore you cannot enable conversion of bytes
to list
without also enabling conversion of [Python 2 native str
] to list
. That's a feature you can only have with Python 3. — Another way to put it: do you NOT want to use the feature in Python 3 because you cannot have it in Python 2?
We can do anything else we want here, what you see is what I assumed was the original intent.
(At some point long time ago, when there was only Python 2, I had a similar problem and disabled str
to list
conversions to prevent confusing accidents.)
Once we are certain what we want, we could optimize, e.g. by putting the #ifdef here and not even having the native_str
type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TL;DR: What would be the behavior that you prefer to see here?
Either:
- Ban everything that's string-like in the
list_caster
- Or allow
bytes
in both py2 and py3
With Python 2 the interpreter (!) will tell you
bytes is str
isTrue
, therefore you cannot enable conversion ofbytes
tolist
without also enabling conversion of [Python 2 nativestr
] tolist
.
Why is py2 str
(or py2 bytes
, but those are the exact same thing) to list
problematic if py3 bytes
to list
isn't?
Another way to put it: do you NOT want to use the feature in Python 3 because you cannot have it in Python 2?
Rarely, but if it helps me reason about what is going on in the code, I see nothing wrong with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ban everything that's string-like in the list_caster
That's current behavior. Two consequences:
- Effectively you do not want to use a feature that's feasible with Python 3 because it's infeasible in Python 2.
- Effectively a rejection of Fixes #1807: 2.3.0 regression: <class 'bytes'> -> std::vector<uint8_t> #2198.
Or allow bytes in both py2 and py3
That breaks the existing test in test_stl.py: assert m.func_with_string_or_vector_string_arg_overload('A') == 3
We'd have to change that to u'A', list('A')
, or iter('A')
to pass again.
The behavior currently implemented in this PR means:
- If you care about Python 3 (and the future) only: you're fine!
- If you still care about Python 2 even past EOL, you'll have to use explicit
list(s)
oriter(s)
(which is also Python 3 compatible).
I feel that's the best compromise, but I'll go with the votes of the team.
@YannickJadoul and Wenzel, what are your votes on this specific list_caster
question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That breaks the existing test in test_stl.py: assert
m.func_with_string_or_vector_string_arg_overload('A') == 3
Is this an actual concern, considering that this pull request already contains a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That breaks the existing test in test_stl.py: assert
m.func_with_string_or_vector_string_arg_overload('A') == 3
Is this an actual concern, considering that this pull request already contains a breaking change.
I'd say so, I think it's worth getting each of the 3 questions settled in the best way possible.
The only fully intentional change is for isinstance<str>
. We could preserve the behaviors for the other 2, although I've not looked into what it would take to preserve the implicit bytes
to actual unicode
conversions. I'm waiting for Yannick's and Wenzel's takes before investing time there.
include/pybind11/pytypes.h
Outdated
@@ -950,6 +950,8 @@ inline namespace literals { | |||
inline str operator"" _s(const char *s, size_t size) { return {s, size}; } | |||
} | |||
|
|||
template <> inline bool isinstance<str>(handle obj) { return PyUnicode_Check(obj.ptr()); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will change the behavior of pybind11, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I wrote the long Request for Review comment for exactly this reason. I'm pointing this out in at the beginning ("desirable/undesirable") and at the end ("good or bad?"), hoping to provide a start for collectively reasoning out if, how, and when we want to change the behaviors.
There are actually "3 and a half" behaviors that are (unfortunately) convoluted:
- isinstance (reminder:
str
is alwaysactual unicode
in pybind11), keep or match native? - implicit decode when passing
bytes
topybind11::str
, keep or disable? - what do we want to do about
bytes
tolist
conversions in A. Python 2, B. Python 3?
What would be your preferred answers for these 3 cases independently?
I guess we can always engineer a way to achieve any combination of preferred answers, even making the changes independently in multiple pybind11 releases.
I'm generally okay with these changes, but compatibility is going to be problem. Disabling an implicit bytes -> unicode conversion will break packages that depend on this behavior. Given that we have chosen to adopt SemVer, it seems to me that this kind of change will necessarily fall in the category "postpone to next major version bump". |
I still don't understand. Why does it make sense to allow |
Answer under #2256 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, this PR makes sense from the Python 3 perspective. I had no idea bytes
would convert to py::str
. And py::isinstance<py:str>
of a bytes
object seems plain wrong?
Then again, why does the current implementation not do that?
Before continuing, let's put this straight: it's 2020 and Python 3 fixed string types. So I think Python 3 should be the standard and what guides development, while keeping Python 2 as sane as possible. (I think that's also what pybind11 has already been doing, actually; py::str
being "actual unicode" in Ralf's terminology and py::bytes
being "actual bytes" match Python 3's names).
If anything, we should maybe clarify this by adding something to the docs. Some kind of "String types" section here: https://pybind11.readthedocs.io/en/stable/advanced/pycpp/object.html?
Another option would be to still allow for Python 2 str
(so "actual bytes") to be converted to py::str
, but ónly on Python 2? But then not in isinstance<py::str>
, but with convert=True
in the string caster.
Regarding "native str", is the proposed behaviour of
list_caster
correct? You're banningbytes
andunicode
on python2, but onlyunicode
on python3.Good question again: I figured that was the original intention, to prevent accidental conversion of "regular" str literals to a list of integers. Thinking: with Python 3 it makes sense to allow
bytes
, with Python 2 not so much.
I think I agree here. str
is a very ambiguous type in Python 2 (being both a bytes-container as well as the default string type), so you want to be careful with converting that to a vector of bytes. On the other hand, Python 3 fixed this, doesn't confuse strings with plain bytes anymore, so we can allow Python 3's bytes
to be nicely converted into a vector
.
There's something to be said for consistency as well, but ... again, maybe this should just be clarified in some docs.
include/pybind11/pytypes.h
Outdated
#if PY_MAJOR_VERSION < 3 | ||
using native_str = bytes; | ||
#else | ||
using native_str = str; | ||
#endif | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have PYBIND11_STR_TYPE
for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, very good, thanks for pointing that out! I would never have guessed from the name what it does, but now I know :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you/we want to provide this as API to pybind11 users, we should provide a typedef, though, rather than have them use this macro.
If so, I'm just not entirely convinced about native_str
as a name (though I don't now, immediately have a better suggestion).
Let me answer my own question: pybind11/include/pybind11/pytypes.h Lines 888 to 890 in 7e0a4fb
So why not change |
Jumping to my most important question first: what are my options for re-enabling the implicit conversion from Motivation for asking: I want to backtrack from making 3 behavior changes to only one. Concretely:
Secondary observations & questions: I tried @YannickJadoul 's suggestion (as I understood it), by removing the
to
Python 3: All tests pass, with the rest of this PR as-is. (To clarify: NO implicit bytes-str conversion, as before.) Python 2: Only one test failed: While trying ideas for fixes I stumbled over another problem, for which I created a mock PR to easily show: #2340 |
Wouldn't this require a major version bump either way, because the first bullet point is a breaking change? I agree that it is the wrong behaviour, but I remember @jagerman talking about compatibility concerns regarding the first bullet point.
I might be repeating myself, but one more thing. Pybind11 is currently consistent when talking about string-like types. |
That's some sort of consistent yes, but also highly misleading and outright useless. It's consistent only in some narrowly defined scope. If you look at the bigger picture: one major achievement of the very expensive Python 2 -> 3 transition is a clear delineation between If anyone out there currently relies on |
I definitely agree with this. In fact, the above is what my own code is already doing. I was just pointing out that if we want to fix this, we will have to make a breaking change no matter what. |
Cool, thanks! |
Great. Just shout when it's time to discuss consequences and dig up/judge old decisions ;-) |
60cba1c
to
3c8dbc2
Compare
…bytes type to test_constructors(). (pybind#2340)
…th that pybind11::str is guaranteed to be a PyUnicodeObject.
…load in pyobject_caster
…tes`). The corresponding behavior changes are captured by changes in the tests. A significant effort was made to keep the test diffs minimal but also comprehensive and easy to read. Note: Unlike PR pybind#2256 (dropped), this PR only changes exactly one behavior. The two other behavior changes discussed under PR pybind#2256 are avoided here (1. disabling implicit decoding from bytes to unicode; 2. list_caster behavior change). Based on this PR, those can be easily implemented if and when desired.
…tes`). The corresponding behavior changes are captured by changes in the tests. A significant effort was made to keep the test diffs minimal but also comprehensive and easy to read. Note: Unlike PR pybind#2256 (dropped), this PR only changes exactly one behavior. The two other behavior changes discussed under PR pybind#2256 are avoided here (1. disabling implicit decoding from bytes to unicode; 2. list_caster behavior change). Based on this PR, those can be easily implemented if and when desired.
…tes`). The corresponding behavior changes are captured by changes in the tests. A significant effort was made to keep the test diffs minimal but also comprehensive and easy to read. Note: Unlike PR pybind#2256 (dropped), this PR only changes exactly one behavior. The two other behavior changes discussed under PR pybind#2256 are avoided here (1. disabling implicit decoding from bytes to unicode; 2. list_caster behavior change). Based on this PR, those can be easily implemented if and when desired.
…tes`). The corresponding behavior changes are captured by changes in the tests. A significant effort was made to keep the test diffs minimal but also comprehensive and easy to read. Note: Unlike PR pybind#2256 (dropped), this PR only changes exactly one behavior. The two other behavior changes discussed under PR pybind#2256 are avoided here (1. disabling implicit decoding from bytes to unicode; 2. list_caster behavior change). Based on this PR, those can be easily implemented if and when desired.
…tes`). The corresponding behavior changes are captured by changes in the tests. A significant effort was made to keep the test diffs minimal but also comprehensive and easy to read. Note: Unlike PR pybind#2256 (dropped), this PR only changes exactly one behavior. The two other behavior changes discussed under PR pybind#2256 are avoided here (1. disabling implicit decoding from bytes to unicode; 2. list_caster behavior change). Based on this PR, those can be easily implemented if and when desired.
…tes`). The corresponding behavior changes are captured by changes in the tests. A significant effort was made to keep the test diffs minimal but also comprehensive and easy to read. Note: Unlike PR pybind#2256 (dropped), this PR only changes exactly one behavior. The two other behavior changes discussed under PR pybind#2256 are avoided here (1. disabling implicit decoding from bytes to unicode; 2. list_caster behavior change). Based on this PR, those can be easily implemented if and when desired.
…tes`). The corresponding behavior changes are captured by changes in the tests. A significant effort was made to keep the test diffs minimal but also comprehensive and easy to read. Note: Unlike PR #2256 (dropped), this PR only changes exactly one behavior. The two other behavior changes discussed under PR #2256 are avoided here (1. disabling implicit decoding from bytes to unicode; 2. list_caster behavior change). Based on this PR, those can be easily implemented if and when desired.
…tes`). The corresponding behavior changes are captured by changes in the tests. A significant effort was made to keep the test diffs minimal but also comprehensive and easy to read. Note: Unlike PR #2256 (dropped), this PR only changes exactly one behavior. The two other behavior changes discussed under PR #2256 are avoided here (1. disabling implicit decoding from bytes to unicode; 2. list_caster behavior change). Based on this PR, those can be easily implemented if and when desired.
…tes`). The corresponding behavior changes are captured by changes in the tests. A significant effort was made to keep the test diffs minimal but also comprehensive and easy to read. Note: Unlike PR pybind#2256 (dropped), this PR only changes exactly one behavior. The two other behavior changes discussed under PR pybind#2256 are avoided here (1. disabling implicit decoding from bytes to unicode; 2. list_caster behavior change). Based on this PR, those can be easily implemented if and when desired.
…tes`). The corresponding behavior changes are captured by changes in the tests. A significant effort was made to keep the test diffs minimal but also comprehensive and easy to read. Note: Unlike PR #2256 (dropped), this PR only changes exactly one behavior. The two other behavior changes discussed under PR #2256 are avoided here (1. disabling implicit decoding from bytes to unicode; 2. list_caster behavior change). Based on this PR, those can be easily implemented if and when desired.
…rent behavior.
Basis for discussing actual desired behavior.
Background: