Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail on passing py::object with wrong Python type to py::object subclass using PYBIND11_OBJECT macro #2349

Conversation

YannickJadoul
Copy link
Collaborator

@YannickJadoul YannickJadoul commented Jul 31, 2020

Throwing this out there and seeing how it reacts to existing code, and opening this up for discussion.

As discussed in #2340, this would throw an error on e.g., py::bytes(o) with o any py::object that isn't a bytes or subclass instance.

Notes:

  • I'm not happy about the error message myself. I think it throwing a type_error is OK, but I'm hoping someone has a better formulations of the error message.

include/pybind11/pytypes.h Outdated Show resolved Hide resolved
include/pybind11/pytypes.h Outdated Show resolved Hide resolved
@YannickJadoul YannickJadoul force-pushed the throw-on-PYBIND11_OBJECT-check_-failure branch from 4d9ae63 to 2e66b40 Compare August 3, 2020 16:22
@YannickJadoul YannickJadoul force-pushed the throw-on-PYBIND11_OBJECT-check_-failure branch 3 times, most recently from 8db37ff to 837710f Compare August 3, 2020 17:42
@YannickJadoul
Copy link
Collaborator Author

Python 2 tests fail, but #2360 needs to be merged (or we need to disable the Python 2 tests for py::ellipsis, if #2360 doesn't get merged)

@rwgk rwgk force-pushed the str-bytes-cleanup branch from 3c8dbc2 to 65130b7 Compare August 4, 2020 19:14
@YannickJadoul YannickJadoul force-pushed the throw-on-PYBIND11_OBJECT-check_-failure branch from 837710f to 47af504 Compare August 4, 2020 19:20
@@ -796,11 +796,16 @@ PYBIND11_NAMESPACE_END(detail)
: Parent(check_(o) ? o.release().ptr() : ConvertFun(o.ptr()), stolen_t{}) \
{ if (!m_ptr) throw error_already_set(); }

#define PYBIND11_OBJECT_CHECK_FAILED(Name, o) \
type_error("Object of type '" + std::string(Py_TYPE(o.ptr())->tp_name) + "' is not an instance of '" #Name "'")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this works correctly for nested classes. Questions:

  • Is there a higher-level function to get the fully-qualified class name?
  • Could we exercise with a native nested class, and a nested pybind11 extension class?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it looks horrible, but Py_TYPE(o.ptr())->tp_name is used in other places as well.
About nested types: good point. I should try. Probably something with __qualname__.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it looks horrible, but Py_TYPE(o.ptr())->tp_name is used in other places as well.
About nested types: good point. I should try. Probably something with __qualname__.

This thought randomly crossed my mind:
The "proper" way of doing this is:

  • Implement a helper function that returns the fully-qualified type name (module + qualname + tp_name) as a string.
  • Unit test that separately.
  • Hook it in here, the existing simple test is fine as-is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I just need to find time to check how the full name is normally retrieved :-)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, a quick lookup confirms that tp_name is the way to go?

Pointer to a NUL-terminated string containing the name of the type. For types that are accessible as module globals, the string should be the full module name, followed by a dot, followed by the type name; for built-in types, it should be just the type name. If the module is a submodule of a package, the full package name is part of the full module name. For example, a type named T defined in module M in subpackage Q in package P should have the tp_name initializer "P.Q.M.T".

https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_name

Let me confirm with a few tests ;-)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix needed for PyPy, though, it seems. It looks like PyPy has tp_name the same as __name__. Need to check and confirm if I got that correct with half-sleepy mind.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've merged the tp_name PR, should this be rebased?

tests/test_pytypes.py Outdated Show resolved Hide resolved
@rwgk rwgk force-pushed the str-bytes-cleanup branch 2 times, most recently from dcc41ef to 0e4e64a Compare August 13, 2020 00:00
@YannickJadoul YannickJadoul force-pushed the throw-on-PYBIND11_OBJECT-check_-failure branch from 47af504 to 41f2e39 Compare August 13, 2020 23:10
@rwgk rwgk force-pushed the str-bytes-cleanup branch 2 times, most recently from f38fafc to 6cd4552 Compare August 14, 2020 22:03
@YannickJadoul YannickJadoul force-pushed the throw-on-PYBIND11_OBJECT-check_-failure branch from 41f2e39 to 6c0f03b Compare August 15, 2020 21:33
@rwgk rwgk force-pushed the str-bytes-cleanup branch from 1d553e4 to 0b49467 Compare August 16, 2020 17:24
@YannickJadoul YannickJadoul force-pushed the throw-on-PYBIND11_OBJECT-check_-failure branch from 6c0f03b to f32aa52 Compare August 16, 2020 23:29
@rwgk rwgk force-pushed the str-bytes-cleanup branch from 7ae6bb2 to 4fc3b1b Compare August 17, 2020 17:55
@YannickJadoul YannickJadoul force-pushed the throw-on-PYBIND11_OBJECT-check_-failure branch from 16a9fac to e43114f Compare September 21, 2020 00:35
@YannickJadoul YannickJadoul changed the base branch from str-bytes-cleanup to master September 21, 2020 00:35
@YannickJadoul YannickJadoul force-pushed the throw-on-PYBIND11_OBJECT-check_-failure branch from e43114f to f36af9e Compare September 21, 2020 00:37
@YannickJadoul YannickJadoul added this to the v2.6.0 milestone Sep 22, 2020
@YannickJadoul
Copy link
Collaborator Author

Submitted an issue to PyPy: https://foss.heptapod.net/pypy/pypy/-/issues/3318

I guess I'll just try to see how we can best sweep this thing under the rug?

@YannickJadoul YannickJadoul force-pushed the throw-on-PYBIND11_OBJECT-check_-failure branch from 4e8ef13 to d8664ad Compare October 4, 2020 22:03
@YannickJadoul
Copy link
Collaborator Author

This (i.e., f2424c5) should do it, I think. Is this too heavy?

The other option (while waiting for PyPy for a response and/or solution) would be to change the tests and have them accept not just ellipsis but also __builtin__.ellipsis or builtins.ellipsis.

@rwgk
Copy link
Collaborator

rwgk commented Oct 5, 2020

Is this too heavy?

Don't think so. Looks good to me.

@henryiii henryiii assigned wjakob and unassigned YannickJadoul Oct 5, 2020
Copy link
Member

@wjakob wjakob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks fine to me. Regarding:

This (i.e., f2424c5) should do it, I think. Is this too heavy?

Your change is very minor, and we have some very heavy workarounds for a few PyPy-specific problems.

include/pybind11/detail/class.h Outdated Show resolved Hide resolved
@YannickJadoul
Copy link
Collaborator Author

Green! Thanks, all, for the thorough reviews!

@nyibbang
Copy link

nyibbang commented Jan 11, 2022

This is a breaking change. It breaks some of my code, such as:

  const py::bytes uid = py::getattr(obj, uidAttrName, py::none());
  if (!uid.is_none())
    return deserialize(static_cast<std::string>(uid));

If the attribute was not found before, it would work fine. Now it raises an exception saying that Object of type 'NoneType' is not an instance of 'bytes'.

I know that the fix is easy to implement, but I was hoping that upgrading from pybind 2.5.0 to 2.9.0 would not break my code, according to SemVer.
I believe this patch should have been merged in a major upgrade of the library version.

@henryiii
Copy link
Collaborator

henryiii commented Jan 11, 2022

SemVer is not a solution. You can't get a consensus as to what a breaking change is. Every change could break someone's code, and so the only perfect SemVer is to make every single release a major version, which makes it useless. Anything else is a practical approximation.

In this case, you were relying on undefined behavior - we didn't support converting to an object subclass if the types were invalid before either, it just failed to throw an error - it just happened to work for you since the only thing you used was is_none(). But that was not tested or defined behavior. The "fix" will also work just fine in 2.5, so I consider this a strict improvement.

Now, one could argue that "None" is a special case, especially due to the existence of is_none on all types. However, this would likely complicate code, because it would force you to check for None more often, unless there is some valid way to get None for py::bytes without a conversion like this. If there is some other way to get a None currently valid, then this is possibly worth a special case.

I believe this is the fix and is valid on both 2.5 and 2.9:

  const py::object uid = py::getattr(obj, uidAttrName, py::none());
  if (!uid.is_none())
    return deserialize(static_cast<std::string>(uid));

@nyibbang
Copy link

SemVer is not a solution. You can't get a consensus as to what a breaking change is. Every change could break someone's code, and so the only perfect SemVer is to make every single release a major version, which makes it useless. Anything else is a practical approximation.

This is true, so then the decision to release a major version is decided by the risk that a change might break client code.
As stated on the SemVer website:

Use your best judgment. If you have a huge audience that will be drastically impacted by changing the behavior back to what the public API intended, then it may be best to perform a major version release, even though the fix could strictly be considered a patch release.

I could argue that this change had the potential to affect a lot of client code, especially since it concerns implicit conversions. Although I guess I'm wrong since it seems I'm the only one that got bothered by it.

In this case, you were relying on undefined behavior - we didn't support converting to an object subclass if the types were invalid before either, it just failed to throw an error - it just happened to work for you since the only thing you used was is_none(). But that was not tested or defined behavior. The "fix" will also work just fine in 2.5, so I consider this a strict improvement.

The documentation was scarce in 2.5.0 about py::object inherited types constructors, and I had to read the code to understand what they meant. I especially got confused about the conversion from py::none to py::str. It has since been improved.

I didn't realize that my code was undefined behavior when I wrote it.

I believe this is the fix and is valid on both 2.5 and 2.9:

  const py::object uid = py::getattr(obj, uidAttrName, py::none());
  if (!uid.is_none())
    return deserialize(static_cast<std::string>(uid));

That's pretty much what I've done. It's actually more sensible to write it that way compared to before.

@henryiii
Copy link
Collaborator

henryiii commented Jan 11, 2022

SemVer is best considered "an abbreviated changelog". A patch release has only fixes - but could break your code. A minor release has features, but mostly should be compatible with past releases - but could break your code. A major release should have large new things a user should review, and is more likely, but not guarantied, to break some older code. For a stable library, in fact, unless you are a heavy user, major versions are not likely to break you - any more likely than patch/minor releases, actually.

And it differs based on the language - in JavaScript, you have nested dependencies, so having frequent major releases is encouraged - one library can use version 2 and another can use version 3 and both work together (though there turn out to be a lot of caveats, and I'm not sure it's really "better"). You can read more about this (and my opinions on capping) here: https://iscinumpy.dev/post/bound-version-constraints/

In Python, major releases are highly discouraged, and instead deprecation periods are encouraged. If possible, warnings should be produced for 1-3 releases, and then finally removed. You should upgrade frequently in order to benefit from this. Python itself follows this (Guido recently promised at least version 3.33) - things will slowly be removed and changed, but no more major versions, to avoid a transition like 2 to 3. This is a social construct of Python packaging.

Anyway, we've done quite a bit in pybind11 to improve the checking for undefined behavior. This had reduced bugs in client code. We do try to avoid breaking changes if possible - we test quite a bit of downstream code, and changes are required that are strictly fixes, we allow that, but if other types of changes are required, we try to avoid that. Major versions will likely be based on incrementing the ABI, not on the public API, though we will likely remove a few deprecated things on major version bumps. (PS: This is probably true for Python as well - ABI3 cannot be changed without a bump to Python 4, so this will likely be a/the driving factor).

nyibbang pushed a commit to aldebaran/libqi-python that referenced this pull request May 6, 2022
The public constructor of `pybind11::module` is now deprecated. This
patch replaces its only use by a call to `PYBIND11_EMBEDDED_MODULE`
followed by a `pybind11::import(...)`.

The constructor of a `pybind11::object` subclass from a
`pybind11::object` from the wrong type now throws an exception. See
pybind/pybind11#2349.

We had one case of this happening where we tried constructing a `bytes`
from a `None`. We now first construct an `object`, then test if it's
`None` before trying to convert it to `bytes`.

Change-Id: I4aa2a1b4a49d3024f203bdd5c6f9c4ee86bfc9c4
Reviewed-on: http://gerrit2.aldebaran.lan/1760
Reviewed-by: philippe.martin <philippe.martin@softbankrobotics.com>
Reviewed-by: jmonnon <jmonnon@aldebaran.com>
Tested-by: vincent.palancher <vincent.palancher@softbankrobotics.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants