-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pybind11::handle
inc_ref()
& dec_ref()
PyGILState_Check()
**excluding** nullptr
#4246
Conversation
Is there a way to boil the test down to a minimal reproducer and open a PyPy issue? |
Ooo, that's hard, I'd have to reduce via the GitHub CI. I don't have access to a suitable Windows machine. And not really any clue where the problem could be, beyond observing that pytest didn't even start up. |
FYI — This PR was deployed to Google's production systems today (after a relatively small internal cleanup). I'll keep this in draft mode for a week or so, JIC there is some pushback. |
#endif | ||
#if defined(PYBIND11_ASSERT_GIL_HELD_INCREF_DECREF) | ||
if (m_ptr != nullptr && !PyGILState_Check()) { | ||
throw std::runtime_error("pybind11::handle::inc_ref() PyGILState_Check() failure."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason no pybind11_fail? Because of the corresponding !PyErrOccurred()
assertion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
But even if that was taken out in pybind11_fail()
, I think a "raw" C++ exception is best here, we want the process to terminate as directly as possible. I was thinking of using std::terminate()
directly, but then it is troublesome to also produce the message, so I decided simpler is better. In our environment at least, std::runtime_error()
works out very nicely (useful traceback with message).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, functions that can raise exceptions do come with some subtle costs in C++. The compiler needs to keep track of the many ways in which might have to destroy objects if an exception arises at various different points of the program and generate code for each one. I don't think it's a performance issue, but this will likely make the debug binaries bigger (compared to pybind11_fail
) given that such a central operation is affected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Narrow focus:
Currently pybind11_fail()
includes this:
assert(!PyErr_Occurred());
That needs the GIL, but here (handle
) we just determined that we don't have it.
Bigger picture:
There is a general tradeoff: a faster or leaner binary today vs a future with more features.
Boundary condition: human time is ~constant.
So ultimately we always trade one for the other.
If you hire person A who likes to optimize, you may get something that runs in 1.0 time units with 1.0 resources, with 10 features.
If you hire person B, who's more like me, you get something that runs in 1.1 time units with 1.1 resources, with 20 features.
What's "better"?
It depends.
Oh, thanks for looking and pointing this out! I was misled by what I saw in the web view of the logs yesterday before I cancelled the jobs. There was just a spinning wheel for a long time without any output. I'll try to get back here as soon as I get a chance. |
b770824
to
37c9cb7
Compare
This reverts commit 23ac16e.
This reverts commit 23ac16e.
* Add option to force the use of the PYPY GIL scoped acquire/release logic to support nested gil access, see #1276 and pytorch/pytorch#83101 * Apply suggestions from code review * Update CMakeLists.txt * docs: update upgrade guide * Update docs/upgrade.rst * All bells & whistles. * Add Reminder to common.h, so that we will not forget to purge `!WITH_THREAD` branches when dropping Python 3.6 * New sentence instead of semicolon. * Temporarily pull in snapshot of PR #4246 * Add `test_release_acquire` * Add more unit tests for nested gil locking * Add test_report_builtins_internals_keys * Very minor enhancement: sort list only after filtering. * Revert change in docs/upgrade.rst * Add test_multi_acquire_release_cross_module, while also forcing unique PYBIND11_INTERNALS_VERSION for cross_module_gil_utils.cpp * Hopefully fix apparently new ICC error. ``` 2022-10-28T07:57:54.5187728Z -- The CXX compiler identification is Intel 2021.7.0.20220726 ... 2022-10-28T07:58:53.6758994Z icpc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message. 2022-10-28T07:58:54.5801597Z In file included from /home/runner/work/pybind11/pybind11/include/pybind11/detail/../detail/type_caster_base.h(15), 2022-10-28T07:58:54.5803794Z from /home/runner/work/pybind11/pybind11/include/pybind11/detail/../cast.h(15), 2022-10-28T07:58:54.5805740Z from /home/runner/work/pybind11/pybind11/include/pybind11/detail/../attr.h(14), 2022-10-28T07:58:54.5809556Z from /home/runner/work/pybind11/pybind11/include/pybind11/detail/class.h(12), 2022-10-28T07:58:54.5812154Z from /home/runner/work/pybind11/pybind11/include/pybind11/pybind11.h(13), 2022-10-28T07:58:54.5948523Z from /home/runner/work/pybind11/pybind11/tests/cross_module_gil_utils.cpp(13): 2022-10-28T07:58:54.5949009Z /home/runner/work/pybind11/pybind11/include/pybind11/detail/../detail/internals.h(177): error #2282: unrecognized GCC pragma 2022-10-28T07:58:54.5949374Z PYBIND11_TLS_KEY_INIT(tstate) 2022-10-28T07:58:54.5949579Z ^ 2022-10-28T07:58:54.5949695Z ``` * clang-tidy fixes * Workaround for PYPY WIN exitcode None * Revert "Temporarily pull in snapshot of PR #4246" This reverts commit 23ac16e. * Another workaround for PYPY WIN exitcode None * Clean up how the tests are run "run in process" Part 1: uniformity * Clean up how the tests are run "run in process" Part 2: use `@pytest.mark.parametrize` and clean up the naming. * Skip some tests `#if defined(THREAD_SANITIZER)` (tested with TSAN using the Google-internal toolchain). * Run all tests again but ignore ThreadSanitizer exitcode 66 (this is less likely to mask unrelated ThreadSanitizer issues in the future). * bug fix: missing common.h include before using `PYBIND11_SIMPLE_GIL_MANAGEMENT` For the tests in the github CI this does not matter, because `PYBIND11_SIMPLE_GIL_MANAGEMENT` is always defined from the command line, but when monkey-patching common.h locally, it matters. * if process.exitcode is None: assert t_delta > 9.9 * More sophisiticated `_run_in_process()` implementation, clearly reporting `DEADLOCK`, additionally exercised via added `intentional_deadlock()` * Wrap m.intentional_deadlock in a Python function, for `ForkingPickler` compatibility. ``` > ForkingPickler(file, protocol).dump(obj) E TypeError: cannot pickle 'PyCapsule' object ``` Observed with all Windows builds including mingw but not PyPy, and macos-latest with Python 3.9, 3.10, 3.11 but not 3.6. * Add link to potential solution for WOULD-BE-NICE-TO-HAVE feature. * Add `SKIP_IF_DEADLOCK = True` option, to not pollute the CI results with expected `DEADLOCK` failures while we figure out what to do about them. * Add COPY-PASTE-THIS: gdb ... command (to be used for debugging the detected deadlock) * style: pre-commit fixes * Do better than automatic pre-commit fixes. * Add `PYBIND11_SIMPLE_GIL_MANAGEMENT` to `pytest_report_header()` (so that we can easily know when harvesting deadlock information from the CI logs). Co-authored-by: Arnim Balzer <arnim@seechange.ai> Co-authored-by: Henry Schreiner <HenrySchreinerIII@gmail.com> Co-authored-by: Ralf W. Grosse-Kunstleve <rwgk@google.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Add option to force the use of the PYPY GIL scoped acquire/release logic to support nested gil access, see #1276 and pytorch/pytorch#83101 * Apply suggestions from code review * Update CMakeLists.txt * docs: update upgrade guide * Update docs/upgrade.rst * All bells & whistles. * Add Reminder to common.h, so that we will not forget to purge `!WITH_THREAD` branches when dropping Python 3.6 * New sentence instead of semicolon. * Temporarily pull in snapshot of PR #4246 * Add `test_release_acquire` * Add more unit tests for nested gil locking * Add test_report_builtins_internals_keys * Very minor enhancement: sort list only after filtering. * Revert change in docs/upgrade.rst * Add test_multi_acquire_release_cross_module, while also forcing unique PYBIND11_INTERNALS_VERSION for cross_module_gil_utils.cpp * Hopefully fix apparently new ICC error. ``` 2022-10-28T07:57:54.5187728Z -- The CXX compiler identification is Intel 2021.7.0.20220726 ... 2022-10-28T07:58:53.6758994Z icpc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message. 2022-10-28T07:58:54.5801597Z In file included from /home/runner/work/pybind11/pybind11/include/pybind11/detail/../detail/type_caster_base.h(15), 2022-10-28T07:58:54.5803794Z from /home/runner/work/pybind11/pybind11/include/pybind11/detail/../cast.h(15), 2022-10-28T07:58:54.5805740Z from /home/runner/work/pybind11/pybind11/include/pybind11/detail/../attr.h(14), 2022-10-28T07:58:54.5809556Z from /home/runner/work/pybind11/pybind11/include/pybind11/detail/class.h(12), 2022-10-28T07:58:54.5812154Z from /home/runner/work/pybind11/pybind11/include/pybind11/pybind11.h(13), 2022-10-28T07:58:54.5948523Z from /home/runner/work/pybind11/pybind11/tests/cross_module_gil_utils.cpp(13): 2022-10-28T07:58:54.5949009Z /home/runner/work/pybind11/pybind11/include/pybind11/detail/../detail/internals.h(177): error #2282: unrecognized GCC pragma 2022-10-28T07:58:54.5949374Z PYBIND11_TLS_KEY_INIT(tstate) 2022-10-28T07:58:54.5949579Z ^ 2022-10-28T07:58:54.5949695Z ``` * clang-tidy fixes * Workaround for PYPY WIN exitcode None * Revert "Temporarily pull in snapshot of PR #4246" This reverts commit 23ac16e. * Another workaround for PYPY WIN exitcode None * Clean up how the tests are run "run in process" Part 1: uniformity * Clean up how the tests are run "run in process" Part 2: use `@pytest.mark.parametrize` and clean up the naming. * Skip some tests `#if defined(THREAD_SANITIZER)` (tested with TSAN using the Google-internal toolchain). * Run all tests again but ignore ThreadSanitizer exitcode 66 (this is less likely to mask unrelated ThreadSanitizer issues in the future). * bug fix: missing common.h include before using `PYBIND11_SIMPLE_GIL_MANAGEMENT` For the tests in the github CI this does not matter, because `PYBIND11_SIMPLE_GIL_MANAGEMENT` is always defined from the command line, but when monkey-patching common.h locally, it matters. * if process.exitcode is None: assert t_delta > 9.9 * More sophisiticated `_run_in_process()` implementation, clearly reporting `DEADLOCK`, additionally exercised via added `intentional_deadlock()` * Wrap m.intentional_deadlock in a Python function, for `ForkingPickler` compatibility. ``` > ForkingPickler(file, protocol).dump(obj) E TypeError: cannot pickle 'PyCapsule' object ``` Observed with all Windows builds including mingw but not PyPy, and macos-latest with Python 3.9, 3.10, 3.11 but not 3.6. * Add link to potential solution for WOULD-BE-NICE-TO-HAVE feature. * Add `SKIP_IF_DEADLOCK = True` option, to not pollute the CI results with expected `DEADLOCK` failures while we figure out what to do about them. * Add COPY-PASTE-THIS: gdb ... command (to be used for debugging the detected deadlock) * style: pre-commit fixes * Do better than automatic pre-commit fixes. * Add `PYBIND11_SIMPLE_GIL_MANAGEMENT` to `pytest_report_header()` (so that we can easily know when harvesting deadlock information from the CI logs). Co-authored-by: Arnim Balzer <arnim@seechange.ai> Co-authored-by: Henry Schreiner <HenrySchreinerIII@gmail.com> Co-authored-by: Ralf W. Grosse-Kunstleve <rwgk@google.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…cluding** `nullptr` Guarded by `PYBIND11_ASSERT_GIL_HELD_INCREF_DECREF`
37c9cb7
to
3115440
Compare
15 days later: I only got two questions re problems that turned out to be completely unrelated, and one report of a bug uncovered by this PR and then fixed. Also note that this PR would have immediately flagged the root cause of #4137, quite likely that issue would not even have been filed if we had made this change earlier. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given you've checked runtime, and this helps with safety, I'm for it.
Summary: pybind11 recently added a feature to assert that the GIL is held during reference counting operations pybind/pybind11#4246 This feature has already uncovered several instances of undefined behavior in several Python extensions. However, this code does not seem to pass these checks, which is preventing us from upgrading PyTorch to the latest pybind11. See pytorch/pytorch#105245 This PR disables those checks, which will in turn allow PyTorch to upgrade. Note that this code already has known potential GIL issues: https://github.com/pytorch/multipy/blob/bd1c76f294335695db8bfa66250781c1627f4eb7/multipy/runtime/deploy.cpp#L59 Pull Request resolved: #325 Reviewed By: PaliC Differential Revision: D47518187 Pulled By: albanD fbshipit-source-id: 5bf14d3633afe67fe0aa8fd74610403e0c494ea5
Description
Background: https://docs.python.org/3/glossary.html#term-global-interpreter-lock
The GIL must be held when calling any Python C API functions. In multithreaded applications that use callbacks this requirement can easily be violated by accident. A general tool to ensure GIL health is not available, but the this PR integrates a very simple basic health check into
pybind11::handle
.A comparison of test runtimes with and without this PR showed that the runtime overhead for the added checks is practically undetectable for non-optimized builds, but Google global testing with this PR uncovered a number (10+) of often very subtle GIL-not-held issues, including bugs in a few 3rd party projects (e.g. pikepdf).
The added GIL checks are guarded by
PYBIND11_ASSERT_GIL_HELD_INCREF_DECREF
, which is the default only ifNDEBUG
is not defined:The test for the
PY_ASSERT_GIL_HELD_INCREF_DECREF
macro is for cooperation with a more general patch in the Python sources, inserting the GIL checks directly inPy_INCREF
&Py_DECREF
, which makes the checks in pybind11 redundant.Note that the GIL checks are skipped if
py::handle
ptr() == nullptr
. This could mask bugs, in particular for classes withpy::object
members for whichptr() == nullptr
in all unit tests, but not in production. Unfortunately Google global testing revealed that such members are much more common than one might expect. It is currently unclear if the cleanup cost for enforcing that the GIL is held even fornullptr
is commensurate with the benefits of preventing a potentially only very small number of production bugs.Suggested changelog entry: