[smart_holder] Add a new return value policy return_as_bytes
#3838
This isn't necessary though as we already have bindings for 'py::bytes |
// test return_value_policy::return_as_bytes
m.def(
    "invalid_utf8_string_as_bytes",
    []() { return std::string("\xba\xd0\xba\xd0"); },
Suggested change:
- []() { return std::string("\xba\xd0\xba\xd0"); },
+ []() { return py::bytes(std::string("\xba\xd0\xba\xd0")); },
is all that was ever needed.
// test return_value_policy::return_as_bytes
m.def(
    "invalid_utf8_string_array_as_bytes",
    []() { return std::array<std::string, 1>{{"\xba\xd0\xba\xd0"}}; },
Suggested change:
- []() { return std::array<std::string, 1>{{"\xba\xd0\xba\xd0"}}; },
+ []() { return std::array<py::bytes, 1>{{"\xba\xd0\xba\xd0"}}; },
would solve this use case, no?
Better yet, you could return the py::array directly or use py::memoryview::from_buffer
If you are ever unhappy with the behavior of the casters, you can always return the Python objects themselves to the same effect.
This is for the PyCLIF pybind11 code generator. You're right, for a simple
That is a transformation we'd have to make automatically and recursively. In contrast, the core of this PR is super simple:
It got us from ~97.5% success to ~98.5%, i.e. it was the top-most-important fix by far that we needed. (We're in the "long tail" phase of the project.) |
We really need this; it is essentially a 6-line change.
My biggest concern is that this is orthogonal to the other return_value_policy values. I.e., we may want to combine this flag with reference_internal etc. at some point in the future. Also, this is only valid for a single underlying type (the std::string caster), which seems wrong. I agree we probably need a better way to handle this, but hacking the caster like this doesn't seem right. Maybe we need a proxy wrapper that automatically triggers this modified caster behavior through a templated function?
What we really need is a way to override the value_conv and key_conv behavior of these templated types. Abusing return_value_policy for this has a terrible code smell.
Hm ... could you explain more? This PR is taking a very simple path to achieve the desired behavior. Could it be even simpler? |
Taking a simple path is not always the correct path. If you have a clear meaning for return value policy (which is not to control the conversion of types), then abusing it to do what you want here can spell disaster down the road. For example, it might make refactoring impossible, and it is hard to document and confusing to newcomers. I think it's worth investigating whether there's a way to do this without making it a return_value_policy.
> This PR is taking a very simple path to achieve the desired behavior. Could it be even simpler?
[]() { return std::array<py::bytes, 1>{{"\xba\xd0\xba\xd0"}}; }, and specify to have it return by reference, by value, or by copy. Since return value policy is an enum and not a flag, we cannot easily combine this return value policy with the other ones, which are mutually exclusive with one another. The real issue here is that we have no way to disambiguate casters except by doing the cast ourselves in the lambda. This is normally trivial, but becomes non-trivial for container or other variant types. We could of course fix that with another caster which changes the behavior of the current caster.
This exposes three issues with our current :
We could just add a special wrapper that triggers an alternative version of the stl_casters, but I don't think that would solve your problem, since presumably your issue is actually ABSL or other container types? The easiest solution is probably to call the caster directly in the lambda with some modified optional args. We could also abstract the list_caster, array_caster, map_caster, and set_caster further to allow for templates which modify the key_conv and value_conv. Another solution would probably be to make this another extra arg, specified in the def() block, that specializes the casters or allows the user to specify their own. @henryiii @wjakob I would love to hear your thoughts on how to best solve this use case / ambiguity.
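The idea of abstracting the container casters so that the element conversion is a template parameter can be sketched in plain C++. This is a minimal illustration under stated assumptions, not pybind11 code: `fake_str`, `fake_bytes`, `to_str`, `to_bytes`, and `sequence_converter` are hypothetical names invented here, and the real list_caster works on Python handles rather than on these stand-in structs.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for Python str and bytes objects (not pybind11 types).
struct fake_str   { std::string text; };
struct fake_bytes { std::string data; };

// Element converters playing the role that value_conv plays in the discussion.
struct to_str {
    using result = fake_str;
    static result cast(const std::string &s) { return {s}; }
};
struct to_bytes {
    using result = fake_bytes;
    static result cast(const std::string &s) { return {s}; }
};

// A sequence converter whose element conversion is chosen by a template
// parameter instead of by a runtime return value policy.
template <typename ValueConv>
struct sequence_converter {
    template <typename Seq>
    static std::vector<typename ValueConv::result> cast(const Seq &src) {
        std::vector<typename ValueConv::result> out;
        out.reserve(src.size());
        for (const auto &item : src) {
            out.push_back(ValueConv::cast(item));
        }
        return out;
    }
};

// The output element type follows the converter that was selected.
static_assert(std::is_same<decltype(sequence_converter<to_bytes>::cast(
                               std::vector<std::string>{})),
                           std::vector<fake_bytes>>::value,
              "to_bytes yields a sequence of fake_bytes");

// Small runtime check: the invalid-UTF-8 payload is carried over unchanged.
inline void sequence_converter_demo() {
    const std::vector<std::string> src{"\xba\xd0\xba\xd0"};
    const auto as_bytes = sequence_converter<to_bytes>::cast(src);
    assert(as_bytes.size() == 1 && as_bytes[0].data == src[0]);
}
```

With this shape the choice between str and bytes is made at compile time per binding, so it does not compete with the ownership policies for the single return_value_policy slot.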
include/pybind11/cast.h
Outdated
handle s = decode_utfN(buffer, nbytes);
handle s;
if (policy == return_value_policy::return_as_bytes) {
    s = PYBIND11_BYTES_FROM_STRING_AND_SIZE(buffer, nbytes);
PyBytes_FromStringAndSize
The old macro is unfinished Python 2 cleanup. For new code like this it's best to use the Python 3 C API directly.
include/pybind11/detail/common.h
Outdated
reference_internal
reference_internal,

/** Use this policy to make C++ functions return bytes to Python instead of
A bit of word-smithing:
With this policy, C++ string types are converted to Python bytes, instead of str. This is most useful when a C++ function returns a container-like type with nested C++ string types, and py::bytes cannot be applied easily. Note that this return_value_policy is not concerned with lifetime/ownership semantics, like the other policies, but the purpose of return_as_bytes is certain to be orthogonal, because C++ strings are always copied to Python bytes or str.
> C++ strings are always copied to Python bytes or str.
What about a dictionary that has byte strings as keys and references to a C++ Object as values? This still would not work even with the wordsmithing.
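The dictionary case can be made concrete with a small plain-C++ sketch. Everything here is hypothetical and simplified: `single_policy` is a cut-down stand-in for return_value_policy as extended by this PR, while `ownership`, `string_conversion`, and `return_options` are invented names showing one way the two concerns could be kept apart. It is not a proposal for the actual pybind11 API.

```cpp
#include <cstdint>

// Simplified stand-in for the enum as extended by this PR: a scoped enum
// holds exactly one enumerator, so choosing return_as_bytes gives up
// reference_internal and vice versa.
enum class single_policy : std::uint8_t { copy, reference_internal, return_as_bytes };

// Hypothetical alternative: track ownership and string conversion separately.
enum class ownership : std::uint8_t { copy, reference_internal };
enum class string_conversion : std::uint8_t { as_str, as_bytes };

struct return_options {
    ownership values;           // how non-string values are returned
    string_conversion strings;  // how C++ strings are converted
};

constexpr bool keeps_parent_alive(return_options o) {
    return o.values == ownership::reference_internal;
}
constexpr bool wants_bytes(return_options o) {
    return o.strings == string_conversion::as_bytes;
}

// A dict with byte-string keys whose values reference C++ objects needs both.
constexpr return_options dict_of_refs{ownership::reference_internal,
                                      string_conversion::as_bytes};
static_assert(keeps_parent_alive(dict_of_refs) && wants_bytes(dict_of_refs),
              "both choices are expressible when they are separate fields");
```

With `single_policy` alone there is no value that means "reference_internal for the values and bytes for the keys", which is the gap this comment points at.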
Yes, agreed.
But see my longer comment from a couple minutes ago.
This is a very good analysis, thanks Aaron! For strategic reasons, we cannot afford to make this a huge general project at the moment; we have to approach this with a long-term view: we still have to prove that pybind11 actually works for Google, by successfully integrating it into PyCLIF, i.e. we have to get from 98.5% to 100%. Once we've cleared that hurdle, we can devote more time to bigger projects for pybind11 itself, like generalizing the return_value_policy concept.
Will anyone ever need something more general? Will that just be over-engineering? I don't know. With one enum and |
Maybe it's best for this to live in another branch and merge in once it's cleaned up and up to 100%? |
I would also be okay if it's hidden behind some #ifdef flag. I just really don't want this becoming a part of the public API that we have to support later, like the Python 2 compatibility macros.
I could easily maintain this in the smart_holder branch if you prefer. pybind11 will not work for PyCLIF without this, just like pybind11 won't work for PyCLIF without
My totally personal bet: nobody will ever have enough motivation (time/money) to generalize |
Actually @wangxf123456, for internal usages, does this return_value_policy work with dicts? Or does it only work with sequences and sets?
include/pybind11/detail/common.h
Outdated
be applied easily. Note that this return_value_policy is not concerned
with lifetime/ownership semantics, like the other policies, but the
purpose of return_as_bytes is certain to be orthogonal, because C++
strings are always copied to Python bytes or str. */
Can we mark this as experimental and likely to change in the future?
@wangxf123456 Please also note here that this is for sequences only: #3838 (comment)
This should only work with sequences and sets. For example, return types like |
@wangxf123456 @rwgk I also found some C++ template magic to rebind the type of (STL) container types: https://stackoverflow.com/a/32214640/2444240 We might be able to use a trick like this to recursively change the Return type parameter to disambiguate how we should return std::string. The TLDR of this is that we need a variant of the std::string caster that prefers to output bytes, and we need a way to signal that it should be called. One idea is to change the behavior of out_cast with a special tag like we do for is_operator. This is actually pretty easy, but would involve setting a static variable, which I would like to avoid. We need some way to pass that info into the type caster or query that extra arg from inside the std::string caster. There may also be a way to abuse the polymorphic_type_hook to do this. I am by no means an expert on C++ templating idioms, but I feel like there has to be a better way to do this than abusing the return type.
I'd much rather @Skylion007's suggestion be attempted. This is adding a misuse of the return policies that doesn't even cover all cases, like dicts or complex types. It does not combine with other return types (since they are disjoint conceptual features). If we add it, it will be impossible to pull out (just see our "private" compatibility macros!). Just because it's easy doesn't make it right. If it is really, really needed and attempts to do it properly have failed, then the name should start with an underscore, and there should probably be a warning in the code that it is not guaranteed to be kept in the future.
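A rough sketch of the rebinding trick, written against the standard library only: it recursively replaces std::string with a tag type inside nested containers. The names `bytes_like`, `rebind_strings`, and `rebind_strings_t` are hypothetical and `bytes_like` merely stands in for py::bytes; this shows the template technique, not how pybind11 casters would actually consume the result.

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical tag type standing in for py::bytes.
struct bytes_like { std::string data; };

// Primary template: types that are neither strings nor containers are kept.
template <typename T>
struct rebind_strings { using type = T; };

// The one substitution we want: std::string becomes bytes_like.
template <>
struct rebind_strings<std::string> { using type = bytes_like; };

// Preserve constness, e.g. for the key in std::pair<const Key, Value>.
template <typename T>
struct rebind_strings<const T> {
    using type = const typename rebind_strings<T>::type;
};

// Any class template with type-only parameters (vector, map, pair,
// allocator, less, ...): rebind every argument recursively.
template <template <typename...> class C, typename... Args>
struct rebind_strings<C<Args...>> {
    using type = C<typename rebind_strings<Args>::type...>;
};

// std::array has a non-type parameter, so it needs its own specialization.
template <typename T, std::size_t N>
struct rebind_strings<std::array<T, N>> {
    using type = std::array<typename rebind_strings<T>::type, N>;
};

template <typename T>
using rebind_strings_t = typename rebind_strings<T>::type;

static_assert(std::is_same<rebind_strings_t<std::array<std::string, 1>>,
                           std::array<bytes_like, 1>>::value,
              "array of strings becomes array of bytes_like");
static_assert(std::is_same<rebind_strings_t<std::vector<std::vector<std::string>>>,
                           std::vector<std::vector<bytes_like>>>::value,
              "nesting is handled recursively");
```

Because the allocator and comparator arguments are rebound along with the element type, the result for the standard containers is the same type one would have written by hand, e.g. std::vector<bytes_like>.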
FYI: I'm systematically combing through include/pybind11 to see how the one new enum + if could disturb things, or set up traps. I'll report here when I'm done. Could take a few more days. (Our focus is on larger scale issues that we need to sort out to get to 100% success rate for PyCLIF + pybind11.) |
I’m not worried about the current code. I’m worried about breaking the mental and programmatic model of return value policies and casters. |
That time might be better spent trying @Skylion007’s ideas above. |
What c++ standard are you targeting in PyCLIF? |
Data first. Also priorities. I don't want the big project to die the death of a thousand cuts, getting sidetracked by too many side issues.
C++17 required (already). But I want to keep the smart_holder branch compatible with master. (I spent many hours keeping it compatible with all the old compilers.) |
…cy is not available on master.
… and eigen.h Based on systematic review under pybind#3838 (comment)
return_as_bytes
…ely to pre-empt repeat trips through the CI).
Thanks, I added the underscore. I'll merge this now on the smart_holder branch, to keep the PyCLIF-pybind11 integration work on track. If @wjakob supports having this on master, I'll back-port. |
pybind#3838)" This reverts commit 7064d43. Conflicts resolved in: include/pybind11/eigen.h tests/test_builtin_casters.cpp
…major and/or influential contributors to smart_holder branch

* pybind#2904 by @rhaschke was merged on Mar 16, 2021
* pybind#3012 by @rhaschke was merged on May 28, 2021
* pybind#3039 by @jakobandersen was merged on Jun 29, 2021
* pybind#3048 by @Skylion007 was merged on Jun 18, 2021
* pybind#3588 by @virtuald was merged on Jan 3, 2022
* pybind#3633 by @wangxf123456 was merged on Jan 25, 2022
* pybind#3635 by @virtuald was merged on Jan 26, 2022
* pybind#3645 by @wangxf123456 was merged on Jan 25, 2022
* pybind#3796 by @wangxf123456 was merged on Mar 10, 2022
* pybind#3807 by @wangxf123456 was merged on Mar 18, 2022
* pybind#3838 by @wangxf123456 was merged on Apr 15, 2022
* pybind#3929 by @tomba was merged on May 7, 2022
* pybind#4031 by @wangxf123456 was merged on Jun 27, 2022
* pybind#4343 by @wangxf123456 was merged on Nov 18, 2022
* pybind#4381 by @wangxf123456 was merged on Dec 5, 2022
* pybind#4539 by @wangxf123456 was merged on Feb 28, 2023
* pybind#4609 by @wangxf123456 was merged on Apr 6, 2023
* pybind#4775 by @wangxf123456 was merged on Aug 3, 2023
* pybind#4921 by @iwanders was merged on Nov 7, 2023
* pybind#4924 by @iwanders was merged on Nov 6, 2023
* pybind#5401 by @msimacek was merged on Oct 8, 2024

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Co-authored-by: Dustin Spicuzza <dustin@virtualroadside.com>
Co-authored-by: Ivor Wanders <iwanders@users.noreply.github.com>
Co-authored-by: Jakob Lykke Andersen <Jakob@caput.dk>
Co-authored-by: Michael Šimáček <michael.simacek@oracle.com>
Co-authored-by: Robert Haschke <rhaschke@users.noreply.github.com>
Co-authored-by: Tomi Valkeinen <tomi.valkeinen@iki.fi>
Co-authored-by: Xiaofei Wang <6218006+wangxf123456@users.noreply.github.com>
* Pure `git merge --squash smart_holder` (no manual interventions). * Remove ubench/ directory. * Remove include/pybind11/smart_holder.h * [ci skip] smart_ptrs.rst updates [WIP/unfinished] * [ci skip] smart_ptrs.rst updates continued; also updating classes.rst, advanced/classes.rst * Remove README_smart_holder.rst * Restore original README.rst from master * [ci skip] Minimal change to README.rst, to leave a hint that this is pybind11v3 * [ci skip] Work in ChatGPT suggestions. * Change macro name to PYBIND11_RUN_TESTING_WITH_SMART_HOLDER_AS_DEFAULT_BUT_NEVER_USE_IN_PRODUCTION_PLEASE * Add a note pointing to the holder reinterpret_cast. * Incorporate suggestion by @virtuald: #5542 (comment) * Systematically change most py::class_ to py::classh under docs/ * Remove references to README_smart_holder.rst This should have been part of commit eb550d0. * [ci skip] Fix minor oversight (``class_`` -> ``py::class_``) noticed by chance. * [ci skip] Resolve suggestion by @virtuald #5542 (comment) * [ci skip] Apply suggestions by @timohl (thanks!) * #5542 (comment) * #5542 (comment) * #5542 (comment) * Replace `classh : class_` inhertance with `using`, as suggested by @henryiii #5542 (comment) * Revert "Systematically change most py::class_ to py::classh under docs/" This reverts commit ac9d31e. * docs: focus on py::smart_holder instead of py::classh Signed-off-by: Henry Schreiner <henryschreineriii@gmail.com> * Restore minor general fixes that got lost when ac9d31e was reverted. * Remove `- smart_holder` from list of branches in all .github/workflows * Extend classh note to explain whitespace noise motivation. 
* Suggest `py::smart_holder` for "most situations for safety" * Add back PYBIND11_HAS_INTERNALS_WITH_SMART_HOLDER_SUPPORT This define was * introduced with #5286 * removed with #5531 It is has been in use here: * https://github.com/pybind/pybind11_protobuf/blob/f02a2b7653bc50eb5119d125842a3870db95d251/pybind11_protobuf/native_proto_caster.h#L89-L101 Currently pybind11 unit tests for the two holder caster backwards compatibility traits * `copyable_holder_caster_shared_ptr_with_smart_holder_support_enabled` * `move_only_holder_caster_unique_ptr_with_smart_holder_support_enabled` are missing. * Add py::trampoline_self_life_support to all trampoline examples under docs/. Address suggestion by @timohl: * #5542 (comment) Add to the "please think twice" note: the overhead for safety is likely in the noise. Also fix a two-fold inconsistency introduced by revert-commit 1e646c9: 1. py::trampoline_self_life_support is mentioned in a note, but is missing in the example right before. 2. The section starting with To enable safely passing a ``std::unique_ptr`` to a trampoline object between is obsolete. * Fix whitespace accident (indentation) introduced with 1e646c9 Apparently the mis-indentation was introduced when resolving merge conflicts for what became 1e646c9 * WHITESPACE CHANGES ONLY in README.rst (list of people that made significant contributions) * Add Ethan Steinberg to list of people that made significant contributions (for completeness, unrelated to smart_holder work). 
* [ci skip] Add to list of people that made significant contributions: major and/or influential contributors to smart_holder branch * #2904 by @rhaschke was merged on Mar 16, 2021 * #3012 by @rhaschke was merged on May 28, 2021 * #3039 by @jakobandersen was merged on Jun 29, 2021 * #3048 by @Skylion007 was merged on Jun 18, 2021 * #3588 by @virtuald was merged on Jan 3, 2022 * #3633 by @wangxf123456 was merged on Jan 25, 2022 * #3635 by @virtuald was merged on Jan 26, 2022 * #3645 by @wangxf123456 was merged on Jan 25, 2022 * #3796 by @wangxf123456 was merged on Mar 10, 2022 * #3807 by @wangxf123456 was merged on Mar 18, 2022 * #3838 by @wangxf123456 was merged on Apr 15, 2022 * #3929 by @tomba was merged on May 7, 2022 * #4031 by @wangxf123456 was merged on Jun 27, 2022 * #4343 by @wangxf123456 was merged on Nov 18, 2022 * #4381 by @wangxf123456 was merged on Dec 5, 2022 * #4539 by @wangxf123456 was merged on Feb 28, 2023 * #4609 by @wangxf123456 was merged on Apr 6, 2023 * #4775 by @wangxf123456 was merged on Aug 3, 2023 * #4921 by @iwanders was merged on Nov 7, 2023 * #4924 by @iwanders was merged on Nov 6, 2023 * #5401 by @msimacek was merged on Oct 8, 2024 Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com> Co-authored-by: Dustin Spicuzza <dustin@virtualroadside.com> Co-authored-by: Ivor Wanders <iwanders@users.noreply.github.com> Co-authored-by: Jakob Lykke Andersen <Jakob@caput.dk> Co-authored-by: Michael Šimáček <michael.simacek@oracle.com> Co-authored-by: Robert Haschke <rhaschke@users.noreply.github.com> Co-authored-by: Tomi Valkeinen <tomi.valkeinen@iki.fi> Co-authored-by: Xiaofei Wang <6218006+wangxf123456@users.noreply.github.com> --------- Signed-off-by: Henry Schreiner <henryschreineriii@gmail.com> Co-authored-by: Henry Schreiner <henryschreineriii@gmail.com> Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com> Co-authored-by: Dustin Spicuzza <dustin@virtualroadside.com> Co-authored-by: Ivor Wanders <iwanders@users.noreply.github.com> 
Co-authored-by: Jakob Lykke Andersen <Jakob@caput.dk> Co-authored-by: Michael Šimáček <michael.simacek@oracle.com> Co-authored-by: Robert Haschke <rhaschke@users.noreply.github.com> Co-authored-by: Tomi Valkeinen <tomi.valkeinen@iki.fi> Co-authored-by: Xiaofei Wang <6218006+wangxf123456@users.noreply.github.com>
Description
Add a new return value policy return_as_bytes to make C++ functions return bytes to Python instead of str. We can convert return values to bytes by applying py::bytes, but this might be hard when dealing with nested types.