Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stash pybind11 data structures in interpreter state dictionary #4293

Closed
wants to merge 10 commits into from

Conversation

wjakob
Copy link
Member

@wjakob wjakob commented Oct 28, 2022

Following the discussion in wjakob/nanobind#96, pybind11 should similarly stop stashing its internal data structures in the global builtins dictionary.

It would be nice to include this patch with an ABI bump for an upcoming release.

Suggested changelog entry:

pybind11 used to store an internal ``capsule`` in the Python ``builtins``
dictionary. This enabled complex binding projects to be split up across
multiple extension modules that used the capsule to exchange information about
types for interoperability. As was pointed out by Python core developers, using
the ``builtins`` dictionary for this purpose is not acceptable. This release of
pybind11 therefore moves the capsule into a the Python interpreter state
dictionary that was designed specifically for such usage.

@wjakob
Copy link
Member Author

wjakob commented Oct 28, 2022

Looks like support for older Python versions is problematic here. The CPython API used in nanobind starts being available in 3.8..

@henryiii
Copy link
Collaborator

I'd be happy to drop 3.6 the second it's a problem, but 3.7 is not EoL yet. But could we just enable it only on 3.8+? ABI compatibility between Python versions doesn't matter.

Pretty sure we plan to do an ABI bump, maybe as soon as the next minor (major?) release.

include/pybind11/detail/internals.h Outdated Show resolved Hide resolved
include/pybind11/detail/internals.h Outdated Show resolved Hide resolved
include/pybind11/detail/internals.h Outdated Show resolved Hide resolved
include/pybind11/detail/internals.h Outdated Show resolved Hide resolved
include/pybind11/detail/internals.h Outdated Show resolved Hide resolved
@rwgk
Copy link
Collaborator

rwgk commented Oct 31, 2022

After turning this over in my mind for a while:

Background:

  • Just stating: We don't need an internals version bump at the moment.
  • Nothing is won by changing the behavior for Python <= 3.11. What pybind11 is doing is not "proper" but it is not actually a problem.
  • But it will be disruptive if we change the behavior for Python >= 3.8, unless we write & read to/from both builtins and PyInterpreterState_Get().
  • Important discontinuities to take into consideration: 1. we want to drop Python 3.7, 2. we want to bump the internals version for other reasons.

Suggested migration plan, minimizing disruptions:

  • At this time, no change for Python <= 3.11. It works.
  • For Python >= 3.12, only use PyInterpreterState_Get().
  • If we want to bump the internals version before dropping Python 3.7: still no change, because it would still be a disruption for Python 3.7 users.
  • After we drop Python 3.7, and when we want to bump the internals version: make the Python 3.12 code path the only one. No need to write or read to/from builtins anymore.

@wjakob
Copy link
Member Author

wjakob commented Oct 31, 2022

A bit of bad news: It seems that PyEval_GetBuiltins and PyInterpreterState_GetDict have different semantics when it comes to more advances use cases like interpreter shutdown and subinterpreters.

It causes two related tests in the C++/Catch-based test suite to fail.

This is a bit more than I wanted to bite off :-/. I had I opened this PR hoping to quickly port a change from nanobind (which doesn't support either of these more advanced use cases.)

@wjakob
Copy link
Member Author

wjakob commented Nov 2, 2022

So, it turns out I just did not look carefully enough. The failing C++ tests were checking something rather trivial -- that the internals capsule can be found in PyEval_GetBuiltins() (which is obviously not the case anymore after this commit). So all that was needed is to adapt those tests to the new conventions. The PR now passes the tests 🎉.

In response to Ralf's comments above, I think that this PR will have no negative consequences on other projects, if it is merged along with the next internals version bump. Effectively, switching the storage location is just like an ABI version change itself. Whenever is a good time to do so due to something else requiring ABI-incompatible changes, it could be combined with this PR to fix this undesirable behavior (messing with builtins) at least on all Python versions that allow for this (meaning CPython 3.8+). It's fine if this PR lingers a bit until those conditions arrive. Would you agree with this summary @rwgk?

Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In response to Ralf's comments above, I think that this PR will have no negative consequences on other projects, if it is merged along with the next internals version bump.

Yes, I agree, it's a choice, connected to "do we still support Python 3.7 or not?" I was on the conservative side in my previous comment, but today maybe not :-) IOW either way is fine with me. The only thing I wouldn't want to do is bump the internals version just because of this PR, but it sounds like we agree on that, too.

include/pybind11/detail/internals.h Outdated Show resolved Hide resolved
tests/test_embed/test_interpreter.cpp Outdated Show resolved Hide resolved

dict state_dict;
#if PY_VERSION_HEX < 0x03080000 || defined(PYPY_VERSION)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rwgk Should we ping PyPy to see if we can this fixed in the next release?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. Might be a bit early right now. After this PR is merged might be a better timing for asking. Which brings me to another thought:

It would actually be great to merge this PR asap, but with PY_VERSION_HEX < 0x030C0000, so that we get Python 3.12 on the right track even before day 1, never will have to worry about related backwards compatibility questions for anything >= 3.12, and pre-release testing with 3.12 will never run into issues around using builtins.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does pybind11 actually compile with Python 3.12? My guess would be some internal details will need to change just like with 3.11.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: it compiles with cpython/main, though there are apparently some new failures in the test suite.

RROR tests/test_custom_type_setup.py::test_self_cycle - pytest.PytestUnraisableExceptionWarning: Exception ignor...
ERROR tests/test_custom_type_setup.py::test_indirect_cycle - pytest.PytestUnraisableExceptionWarning: Exception i...
FAILED tests/test_call_policies.py::test_alive_gc - pytest.PytestUnraisableExceptionWarning: Exception ignored in...
FAILED tests/test_methods_and_attributes.py::test_dynamic_attributes - pytest.PytestUnraisableExceptionWarning: E...
FAILED tests/test_methods_and_attributes.py::test_cyclic_gc - pytest.PytestUnraisableExceptionWarning: Exception ...
FAILED tests/test_multiple_inheritance.py::test_mi_dynamic_attributes - pytest.PytestUnraisableExceptionWarning: ...
FAILED tests/test_pickling.py::test_roundtrip_with_dict[PickleableWithDict] - pytest.PytestUnraisableExceptionWar...
FAILED tests/test_pickling.py::test_roundtrip_with_dict[PickleableWithDictNew] - pytest.PytestUnraisableException...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that wide-scale deployment of 3.12 is likely so far off that an ABI bump in pybind11 is perhaps the closer thing to aim for with regards to this PR.

@wjakob
Copy link
Member Author

wjakob commented Nov 3, 2022

That's a quite strange error for the PyPy 3.7 CI run. Did anybody see something like this before? Not sure how it could be related to this PR (which in principle does not change the behavior of PyPy builds).

@wjakob
Copy link
Member Author

wjakob commented Nov 3, 2022

Yes, I agree, it's a choice, connected to "do we still support Python 3.7 or not?"

To follow up on @rwgk's comment: the implementation of this PR leaves Python 3.7 with the previous approach to stash the capsule in the builtins dict. In this way, it is no longer necessary to drop the older Python version. Using different implementations in Python 3.7 vs 3.8+ should also be fine -- they don't need to share a compatible ABI because modules linked with each flavor won't talk to each other.

@rwgk
Copy link
Collaborator

rwgk commented Nov 3, 2022

That's a quite strange error for the PyPy 3.7 CI run. Did anybody see something like this before? Not sure how it could be related to this PR (which in principle does not change the behavior of PyPy builds).

I've never seen that error before. Very very strange.
I don't have a clue what changed in the environment. PyPy has been using the "simple" gil_scoped code forever.
It's 3.7, IIUC that means this PR also doesn't change much for that job. (I didn't get a chance yet to look at the latest changes here.)

@rwgk
Copy link
Collaborator

rwgk commented Nov 3, 2022

I clicked the rerun button for Win PyPy 3.7 after downloading the current log archive.

Two goals:

  1. does the failure reproduce?
  2. I want to learn: are both logs (1st & 2nd attempt) in the new log archive?

@@ -401,9 +401,30 @@ inline void translate_local_exception(std::exception_ptr p) {
}
#endif

inline object get_internals_state_dict() {
object state_dict;
#if PY_VERSION_HEX < 0x03080000 || defined(PYPY_VERSION)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out it is extremely easy to invite the Python 3.12 core developers to test with pybind11, with a 1 or 2 line change here, depending on how you count. Diff below.

I tested that diff with a local installation of Python 3.12.0a1. Log also below.

diff --git a/include/pybind11/detail/internals.h b/include/pybind11/detail/internals.h
index 03c10506..c8ac22e7 100644
--- a/include/pybind11/detail/internals.h
+++ b/include/pybind11/detail/internals.h
@@ -403,7 +403,8 @@ inline void translate_local_exception(std::exception_ptr p) {

 inline object get_internals_state_dict() {
     object state_dict;
-#if PY_VERSION_HEX < 0x03080000 || defined(PYPY_VERSION)
+#if (PYBIND11_INTERNALS_VERSION <= 4 && PY_VERSION_HEX < 0x030C0000)                              \
+    || PY_VERSION_HEX < 0x03080000 || defined(PYPY_VERSION)
     state_dict = reinterpret_borrow<object>(PyEval_GetBuiltins());
 #else
 #    if PY_VERSION_HEX < 0x03090000
+ /usr/local/google/home/rwgk/usr_local_like/Python-3.12.0a1/bin/python3 /usr/local/google/home/rwgk/clone/pybind11_scons/run_tests.py ../pybind11
Running tests in directory "/usr/local/google/home/rwgk/forked/pybind11/tests/test_embed":
===============================================================================
All tests passed (1554 assertions in 12 test cases)

Running tests in directory "/usr/local/google/home/rwgk/forked/pybind11/tests":
=========================================================== test session starts ============================================================
platform linux -- Python 3.12.0a1, pytest-7.2.0, pluggy-1.0.0
C++ Info: Debian Clang 14.0.6 C++17 __pybind11_internals_v4_clang_libstdcpp_cxxabi1002__
rootdir: /usr/local/google/home/rwgk/forked/pybind11/tests, configfile: pytest.ini
collected 698 items

test_async.py ..                                                                                                                     [  0%]
test_buffers.py .........                                                                                                            [  1%]
test_builtin_casters.py ....................                                                                                         [  4%]
test_call_policies.py ........                                                                                                       [  5%]
test_callbacks.py ............                                                                                                       [  7%]
test_chrono.py ...........................................                                                                           [ 13%]
test_class.py ...............................                                                                                        [ 17%]
test_const_name.py ......................                                                                                            [ 21%]
test_constants_and_functions.py .....                                                                                                [ 21%]
test_copy_move.py ........                                                                                                           [ 22%]
test_custom_type_casters.py ...                                                                                                      [ 23%]
test_custom_type_setup.py ..                                                                                                         [ 23%]
test_docstring_options.py .                                                                                                          [ 23%]
test_eigen_matrix.py .....................ss.......                                                                                  [ 28%]
test_eigen_tensor.py .............................................................................................................   [ 43%]
test_enum.py .........                                                                                                               [ 44%]
test_eval.py ....                                                                                                                    [ 45%]
test_exceptions.py ......................                                                                                            [ 48%]
test_factory_constructors.py ...............                                                                                         [ 50%]
test_gil_scoped.py .....                                                                                                             [ 51%]
test_iostream.py ......................                                                                                              [ 54%]
test_kwargs_and_defaults.py ........                                                                                                 [ 55%]
test_local_bindings.py ..........                                                                                                    [ 57%]
test_methods_and_attributes.py ......................                                                                                [ 60%]
test_modules.py .......                                                                                                              [ 61%]
test_multiple_inheritance.py ..................                                                                                      [ 64%]
test_numpy_array.py .........................................................                                                        [ 72%]
test_numpy_dtypes.py ...............                                                                                                 [ 74%]
test_numpy_vectorize.py ........                                                                                                     [ 75%]
test_opaque_types.py ...                                                                                                             [ 75%]
test_operator_overloading.py .....                                                                                                   [ 76%]
test_pickling.py ........                                                                                                            [ 77%]
test_pytypes.py ..................................................................................                                   [ 89%]
test_sequences_and_iterators.py ..............                                                                                       [ 91%]
test_smart_ptr.py .............                                                                                                      [ 93%]
test_stl.py .........s.............                                                                                                  [ 96%]
test_stl_binders.py .........                                                                                                        [ 97%]
test_tagbased_polymorphic.py .                                                                                                       [ 98%]
test_thread.py ..                                                                                                                    [ 98%]
test_union.py .                                                                                                                      [ 98%]
test_virtual_functions.py ..........                                                                                                 [100%]

========================================================= short test summary info ==========================================================
SKIPPED [1] test_eigen_matrix.py:718: could not import 'scipy': No module named 'scipy'
SKIPPED [1] test_eigen_matrix.py:728: could not import 'scipy': No module named 'scipy'
SKIPPED [1] test_stl.py:143: no <experimental/optional>
===================================================== 695 passed, 3 skipped in 10.04s ======================================================

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also triggered CI testing under #4307, after git rebase master.

@rwgk
Copy link
Collaborator

rwgk commented Nov 3, 2022

I clicked the rerun button for Win PyPy 3.7 after downloading the current log archive.

Two goals:

  1. does the failure reproduce?

No, that error did NOT reproduce, but another flake slipped in, that I believe I've seen several times before, although it is not the most common one.

Trying a 3rd time.

  1. I want to learn: are both logs (1st & 2nd attempt) in the new log archive?

It turns out it works differently:

3.8M Nov  3 04:42 logs_24701.zip
 80K Nov  3 05:51 logs_24721.zip

The new log archive only has the new log, not any of the old ones, although the directory structure is still there, but with lots of empty subdirectories. Certain files for jobs other than the one that was rerun have new timestamps. Not easy to work with, unfortunately.

I just see: the 3rd attempt was successful!

rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Nov 17, 2022
Modifications are:

* Backward compatibility (no ABI break), as originally under PR pybind#4307.
* Naming: `get_python_state_dict()`, `has_pybind11_internals_capsule()`
* Report error retrieving `internals**` from capsule instead of clearing it.

Locally tested with ASAN, MSAN, TSAN, UBSAN (Google-internal toolchain).
// name. We clear the error status below in that case
internals_pp = static_cast<internals **>(PyCapsule_GetPointer(o.ptr(), id_cstr));
if (!internals_pp) {
PyErr_Clear();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR (with the #4307 tweak) is now baked into PR #4329:

c2e6b38

Copying the commit message here (I think the link will stop working in case I have to rebase):


Modified version of PR #4293 by @wjakob

Modifications are:

* Backward compatibility (no ABI break), as originally under PR #4307.
* Naming: `get_python_state_dict()`, `has_pybind11_internals_capsule()`
* Report error retrieving `internals**` from capsule instead of clearing it.

Locally tested with ASAN, MSAN, TSAN, UBSAN (Google-internal toolchain).

My commit changes the code here, to report the error rather than suppressing it.

What is the rationale for suppressing it?

object o = state_dict[id];
// May fail if 'capsule_obj' is not a capsule, or if it has a different
// name. We clear the error status below in that case
internals_pp = static_cast<internals **>(PyCapsule_GetPointer(o.ptr(), id_cstr));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the raw CAPI here? Any particular reason we are changing it from pytype.h API it was before?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my current version:

        void *raw_ptr = PyCapsule_GetPointer(state_dict[id].ptr(), id_cstr);
        if (raw_ptr == nullptr) {
            raise_from(
                PyExc_SystemError,
                "pybind11::detail::get_internals(): Retrieve internals** from capsule FAILED");
        }
        internals_pp = static_cast<internals **>(raw_ptr);

The nice thing is that PyCapsule_GetPointer() does everything "just right" in one simple line. Additionally, in this particular situation I'd definitely want to avoid the throw PYBIND11_OBJECT_CHECK_FAILED() that comes with the capsule constructor.

I don't have a good idea for using capsule in an elegant way here TBH.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While looking around more (work under PR #4329) I noticed this code in embed.h (finalize_interpreter()):

    handle builtins(PyEval_GetBuiltins());
    ...
    if (builtins.contains(id) && isinstance<capsule>(builtins[id])) {
        internals_ptr_ptr = capsule(builtins[id]);
    }

Two things learned:

  • That needs to be changed, too, to inspect the correct dict.
  • The implementation is a bit on the high-level overkill side, scraping by an if (...) throw, but it is elegant!

@wjakob
Copy link
Member Author

wjakob commented Mar 28, 2023

Closed in favor of #4570

@wjakob wjakob closed this Mar 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants