Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #1807: 2.3.0 regression: <class 'bytes'> -> std::vector<uint8_t> #2198

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

cdump
Copy link

@cdump cdump commented May 1, 2020

Fixes 2.2.4 -> 2.3.0+ regression #1807

Introduced by e76dff7

@@ -144,7 +144,7 @@ template <typename Type, typename Value> struct list_caster {
using value_conv = make_caster<Value>;

bool load(handle src, bool convert) {
if (!isinstance<sequence>(src) || isinstance<str>(src))
if (!isinstance<sequence>(src) || (!isinstance<bytes>(src) && isinstance<str>(src)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this only cures a symptom of a bigger problem, pointed out to me by @bstaletic :
77059f0
With that change, the code here can stay as-is.
@wjakob Did you attempt work based on @jagerman 's code, or do you know if anyone else tried?
@cdump While I think it's best to not change this line here, your test looks valuable to me. I'm in favor of merging it after the more fundamental issue is fixed.

rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Jun 18, 2020
…rent behavior.

Basis for discussing actual desired behavior.

Background:
* pybind@77059f0
* pybind#2198
@wjakob
Copy link
Member

wjakob commented Jun 30, 2020

I agree with @rwgk's comment above that picking up #1463 seems like the way to go (and tracking down the Python 2.7 issue mentioned in that PR).

rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Jul 21, 2020
…rent behavior.

Basis for discussing actual desired behavior.

Background:
* pybind@77059f0
* pybind#2198
rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Jul 22, 2020
…rent behavior.

Basis for discussing actual desired behavior.

Background:
* pybind@77059f0
* pybind#2198
@rwgk
Copy link
Collaborator

rwgk commented Dec 1, 2020

In connection with precursors of #2409, we had discussions about the behavior change in this PR, with ~3 different ideas for what behavior is desired. My thinking:

  • The implicit conversion from bytes should only be supported for Python 3, but not Python 2.

Rationale: In Python 2, plain string literals are bytes (because str is bytes in Python 2). Converting plain string literals is more likely an accident, rather than intentional. In Python 3, plain string literals are unicode and will therefore not get converted. If the conversion is actually desired, it can easily be achieved by using iter(...).

I believe this thinking is in line with #1298, which added the || isinstance<str>(src) condition, referencing issue #1258 with title "pybind11/stl.h converts string to vector". Note, for Python 2, this PR as-is, will undo #1258.

@YannickJadoul
Copy link
Collaborator

I'm not sure I'm happy with introducing differences between Python 2 and Python 3, though. Especially now. As you say, in Python 2, str is bytes, and users were depending on converting bytes to vector<uint8_t> (cfr. this PR, as well as #1807). So what about keeping consistency: either we allow the conversion, and Python 2 is ... well, almost dead, but at least consistent. Or we provide a different kind of type (e.g., say that we tell users to accept py::bytes and then provide a conversion to vector<uint8_t>). Half, half is going to be funky, I think :-/

The thing is:

  1. Inconsistencies are going to be just as unexpected for users, and horrible to provide support for. At least if we keep consistency, we can say "py::bytes converts to vector<uint8_t>". (I think pybind11 is already pretty clear it focuses on Pytthon 3 concepts?)
  2. It's currently not possible to provide this bytes conversion to vector<uint8_t> interface with pybind11.
  3. pybind11/stl.h converts string to vector<string> #1258 complains about the order of overloads. The docs clearly specify order of overloads matter.

@YannickJadoul
Copy link
Collaborator

Oh, what about a different solution that would satisfy both?

What if we allow the conversion from bytes only to vector<uint8_t>/vector<char>/vector<unsigned char>? That should work, and avoid the weird vector<string> issue from #1258.

@rwgk
Copy link
Collaborator

rwgk commented Dec 4, 2020

Oh, what about a different solution that would satisfy both?

What if we allow the conversion from bytes only to vector<uint8_t>/vector<char>/vector<unsigned char>? That should work, and avoid the weird vector<string> issue from #1258.

I'm guessing implementing that will be significantly more complex than the currently very simple change, that complexity will still be there even after Python 2 is truly dead, and we will get complaints thatvector<int>, vector<long>, vector<unsigned> etc. don't work.

EDIT - WARNING: In the meantime I discovered that something is very fishy in the environment I'm testing in. I'm debugging. Please ignore the below for the minute.

In the interest of getting the story right for Python 3 as soon as possible, I'm OK to accept the potential pitfall for Python 2. But when I just tried this PR locally with Python 2, I'm getting failures of existing tests in test_stl.py and test_stl_binders.py, and the new test also doesn't work. To my surprise, even with Python 3 I'm getting failures of existing tests in test_stl_binders.py. (For compleness, I was testing with -std=c++17.)

@cdump, could you please rebase this PR on current master and push to your fork? That will trigger our fancy new CI system, which I expect to surface the same failures.

@rwgk
Copy link
Collaborator

rwgk commented Dec 4, 2020

It turns out just adding this one line in test_stl.cpp breaks all of our CI Windows jobs, even without actually calling that function from Python:

m.def("func_with_vector_uint8_t_arg", [](std::vector<uint8_t> v) { return v.size(); });

More precisely, if you look here: https://github.com/pybind/pybind11/pull/2711/commits:

  • The most recent commit 04859f2 there is plain master (all changes commented out). There are two CI failures but they are certainly flakes (I saw them before in other contexts).

  • The commit 97b3463 before has just that one line above on top of current master (all other changes commented out). There are many failures. It looks like every single Windows build is failing.

It seems pretty certain that we have a bug on master, unrelated to the change in this PR, except that the one new line in test_stl.py very interestingly triggers a crash in test_stl_binders.py (!).

Unfortunately that existing bug makes it difficult to work on this PR.

@rwgk
Copy link
Collaborator

rwgk commented Dec 4, 2020

After some quick trial-and-error, it looks like some sort of crosstalk between the use of std::vector<unsigned char> in test_stl.cpp and pybind11_tests.stl_binders.VectorUChar in test_stl_binders.cpp. A simple workaround to unblock work on this PR is to use std::vector<signed char> (or std::vector<int8_t>). — I'll stop working on this issue for now.

@cdump
Copy link
Author

cdump commented Dec 4, 2020

or may be even deep more to "original" bug which cause this.

I wonder if it has to do with this: https://pybind11.readthedocs.io/en/stable/advanced/cast/stl.html#making-opaque-types

Maybe there needs to be a PYBIND11_MAKE_OPAQUE for std::vector<uint8_t> in test_stl_binders.cpp? I don't see any in that cpp file, but my understanding of how it works may be incorrect.

@YannickJadoul
Copy link
Collaborator

Sorry about the delay; busy end of November!

I just tried this PR locally with Python 2, I'm getting failures of existing tests in test_stl.py and test_stl_binders.py, and the new test also doesn't work.

Oh right, I thought we were just talking about "what behavior do we want?", but I didn't realize tests were failing :-(

@YannickJadoul
Copy link
Collaborator

After some quick trial-and-error, it looks like some sort of crosstalk between the use of std::vector<unsigned char> in test_stl.cpp and pybind11_tests.stl_binders.VectorUChar in test_stl_binders.cpp. A simple workaround to unblock work on this PR is to use std::vector<signed char> (or std::vector<int8_t>). — I'll stop working on this issue for now.

I was looking into this, getting a "warning: type ‘struct type_caster’ violates the C++ One Definition Rule [-Wodr]" warning, and @bstaletic rightly suggested that we have conflicting caster instantiations. test_stl_binders.cpp works with std::vector<unsigned char> as opaque type, test_stl.cpp tries to use stl.h for std::vector<uint8_t>. C++ doesn't like definitions like that.

@rwgk
Copy link
Collaborator

rwgk commented Dec 11, 2020

getting a "warning: type ‘struct type_caster’ violates the C++ One Definition Rule [-Wodr]" warning

That's interesting, what exactly did you do to get that warning?
I added -Wodr to all clang++ commands (compilation, linking) but don't see that warning.

@YannickJadoul
Copy link
Collaborator

I added -Wodr to all clang++ commands (compilation, linking) but don't see that warning.

I really just (re)compiled it, almost ignored it, then noticed something fishy.

Making progress, btw, but it's kind of annoying that we can't check std::vector<uint8_t> both with stl.h as well as with stl_bind.h :-(

@rwgk
Copy link
Collaborator

rwgk commented Dec 11, 2020

I really just (re)compiled it, almost ignored it, then noticed something fishy.

Only if super easy, so I don't have to guess platform, compiler, exact command: could you email me the command and build log?

Making progress, btw, but it's kind of annoying that we can't check std::vector<uint8_t> both with stl.h as well as with stl_bind.h :-(

I was under the impression it's possible, but maybe/apparently only if the bindings live in different extension modules (.so)?
We're lumping most tests into the same extension module. (I always wondered why, because it goes a bit counter to the idea of testing units.)

@YannickJadoul
Copy link
Collaborator

Only if super easy, so I don't have to guess platform, compiler, exact command: could you email me the command and build log?

Sure, but I'm not sure it's going to be an obvious help for you. Let me try to reproduce :-)

I was under the impression it's possible, but maybe/apparently only if the bindings live in different extension modules (.so)?
We're lumping most tests into the same extension module. (I always wondered why, because it goes a bit counter to the idea of testing units.)

Yes, you're right. I meant in the current version of the tests in a single extension module. This doesn't feel important enough to start throwing around the tests' design for, but yes it's peculiar. I would assume it's maybe slightly less of a hassle and more efficient to create 1 file than to create a couple of dozen?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants