[v2.10] Revert the addition of the GIL check feature #4432

EthanSteinberg · 2022-12-28T18:02:24Z

Description

We added some additional GIL status assertion in 2.10.2 (#4246).

This change seems fine, but is probably too big for a bugfix release and causing a lot of problems for our users.

This disables those assertions by default. Users can always manually opt-in with PYBIND11_ASSERT_GIL_HELD_INCREF_DECREF.

Helps with #4420

Suggested changelog entry:

Temporarily disabled our GIL status assertions to be re-enabled later.

EthanSteinberg · 2022-12-28T18:03:43Z

@henryiii If we want to quickly revert the GIL error check stuff, this is how I would do it.

rwgk · 2022-12-28T23:49:58Z

We went from not knowing I have a bug,

to knowing I have a bug,

to masking that I know I have a bug.

Because: there are N people that complained

because they have

a debug build
an unhealthy (IMO) toolchain that segfaults (MingW, what else?) instead of showing a useful stack trace with the C++ exception message, like some other toolchains do. — Side note: we already have a mitigation, PR Improve the error reporting for inc_ref GIL failures #4427, for the toolchains that don't, complete with a tutorial-kind-of addition to the documentation.

In the meantime, and going forward, M people discover/ed "oh I have a bug", fix/ed it, and save/d D hours debugging in other ways, especially D_f hours debugging flaky multi-threaded applications. I know from first-hand experience that D_f can easily go to being measured in weeks for things like long-running RPC services.

To be socially responsible, we need to do the math:

Inconveniencing N people, basically prodding them to fix their bugs sooner rather than later, or to pin the pybind11 version until they get to fixing their bugs.
Vs helping M people saving D hours of finding out in much more costly ways that they have a bug.

Now assume this PR is merged. The clock starts ticking:

A dubious (IMO) benefit for N people ("yes it's UB, but it works in practice") vs accumulating lost time for M people: D steadily increasing.

Note the history of #4246, which had been sitting there since Oct 17, getting little to no attention from other pybind11 maintainers, despite several attempts to get attention, and being in Google production use since Oct 17, which means a large number of 3rd party packages effectively "saw" #4246 already. — That was already accumulating lost time (D increasing), but that's water under the bridge.

When will the 2.11 release be made?

As a pybind11 maintainer I feel a strong responsibility for letting D increase — again.

We cannot possibly do the math exactly, but I'd say:

If the 2.11 release is only a week out: Fine.
If the 2.11 release is a couple months out: No, that's irresponsible. — And why? Is it more than a short moment of extra thinking to get the version pinning right one way or another?

henryiii · 2022-12-29T03:11:22Z

A couple of major packages (PyTorch, SciPy) see this change as breaking. I don't approve of the pinning SciPy is doing and am not fond of disabling the check - but I think this is the most practical path that will cause the least effort and hours waisted. Modern existing versions of SciPy no longer build with their <2.11 upper cap (I'm not fond of the practice but I'm not going to actively punish it). We can revert this for a quick 2.10.3 release (but we don't need to target master, we can target v2.10), improve the error reporting for 2.11, and move on. This bug has been there for a long time, and the UB behavior wasn't that bad (SciPy and PyTorch work for the most part ;) ). It'll be in 2.11, and by then hopefully most of the fixes needed will be in place.

I don't think we are releasing 2.11 next week - as far as I know, there's pretty much nothing in it yet. There's no need to force improvements into 2.10.x just because 2.11 isn't out yet. Eventually 2.11 will be out and this will be in place.

We don't need to fix every bug we know about in every release - and this isn't even a bug fix, just turning (working for someone¹) UB into an error. We just need to fix everything we know about in master, and make practical patch releases when we can that try not to break anyone.

I'm guessing the py::none obtained before Python was initialized worked. If it doesn't, then I'd say this was a real bug and we are fine elevating it to a compile time error in a patch release. But I'm pretty sure that would have been in SciPy's / PyTorch's test suite. ↩

henryiii · 2022-12-29T03:13:10Z

Ahh, and what's even nicer, with this PR, you can still opt in to the new check manually in 2.10.3. :)

include/pybind11/detail/common.h

EthanSteinberg · 2022-12-30T14:54:16Z

@henryiii I think in the long term, it might be a good idea to do at least a scipy and pytorch build before every release as an extra unit test.

rwgk · 2022-12-30T16:16:54Z

@henryiii I think in the long term, it might be a good idea to do at least a scipy and pytorch build before every release as an extra unit test.

Google-global testing does that, but only with the Google toolchain.

The question is then: who has the free bandwidth to - ultimately - do scipy and pytorch a favor, testing with all toolchains. It's pretty thankless effort I'm afraid. In particular in this case: we'd just be telling them a little earlier that they have long-standing bugs. I've done some of that in preparation for the internal deployment of PR #4246 (I was forced to fix all failures first). Sometimes people are thankful. Sometimes it's more hey why are you bugging me, it works in practice. So I tend to not volunteer myself for such projects.

rwgk

henryiii changed the base branch from master to v2.10 yesterday

Thanks for doing that!

I'm not really happy TBH, but it's a reasonable compromise!

Skylion007 · 2022-12-30T19:48:10Z

@henryiii I think in the long term, it might be a good idea to do at least a scipy and pytorch build before every release as an extra unit test.

Google-global testing does that, but only with the Google toolchain.

The question is then: who has the free bandwidth to - ultimately - do scipy and pytorch a favor, testing with all toolchains. It's pretty thankless effort I'm afraid. In particular in this case: we'd just be telling them a little earlier that they have long-standing bugs. I've done some of that in preparation for the internal deployment of PR #4246 (I was forced to fix all failures first). Sometimes people are thankful. Sometimes it's more hey why are you bugging me, it works in practice. So I tend to not volunteer myself for such projects.

I am PyTorch contributor, I brought up the breaking change as an issue, but I don't have time to fix all the related bugs. Happy to test RCs on PyTorch to look for obvious errors / edge cases though.

Revert the GIL check

3509fc0

Skylion007 approved these changes Dec 28, 2022

View reviewed changes

rwgk mentioned this pull request Dec 28, 2022

[BUG]: PyCapsule_SetName segfault #4420

Closed

3 tasks

henryiii changed the base branch from master to v2.10 December 29, 2022 03:12

henryiii reviewed Dec 30, 2022

View reviewed changes

include/pybind11/detail/common.h Outdated Show resolved Hide resolved

henryiii approved these changes Dec 30, 2022

View reviewed changes

Update include/pybind11/detail/common.h

8cbae86

rwgk approved these changes Dec 30, 2022

View reviewed changes

henryiii changed the title ~~Revert the addition of the GIL check feature~~ [v2.10] Revert the addition of the GIL check feature Dec 30, 2022

henryiii merged commit 2de6e39 into pybind:v2.10 Dec 30, 2022

github-actions bot added the needs changelog Possibly needs a changelog entry label Dec 30, 2022

EthanSteinberg deleted the revert_gil_check branch December 30, 2022 18:47

henryiii removed the needs changelog Possibly needs a changelog entry label Jan 3, 2023

rwgk mentioned this pull request Feb 11, 2023

FWD pybind11 google/pybind11clif#4432

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[v2.10] Revert the addition of the GIL check feature #4432

[v2.10] Revert the addition of the GIL check feature #4432

Uh oh!

EthanSteinberg commented Dec 28, 2022

Uh oh!

EthanSteinberg commented Dec 28, 2022

Uh oh!

rwgk commented Dec 28, 2022 •

edited

Loading

Uh oh!

henryiii commented Dec 29, 2022 •

edited

Loading

Uh oh!

henryiii commented Dec 29, 2022

Uh oh!

Uh oh!

EthanSteinberg commented Dec 30, 2022

Uh oh!

rwgk commented Dec 30, 2022

Uh oh!

rwgk left a comment

Uh oh!

Skylion007 commented Dec 30, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

[v2.10] Revert the addition of the GIL check feature #4432

[v2.10] Revert the addition of the GIL check feature #4432

Uh oh!

Conversation

EthanSteinberg commented Dec 28, 2022

Description

Suggested changelog entry:

Uh oh!

EthanSteinberg commented Dec 28, 2022

Uh oh!

rwgk commented Dec 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

henryiii commented Dec 29, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Footnotes

Uh oh!

henryiii commented Dec 29, 2022

Uh oh!

Uh oh!

EthanSteinberg commented Dec 30, 2022

Uh oh!

rwgk commented Dec 30, 2022

Uh oh!

rwgk left a comment

Choose a reason for hiding this comment

Uh oh!

Skylion007 commented Dec 30, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

rwgk commented Dec 28, 2022 •

edited

Loading

henryiii commented Dec 29, 2022 •

edited

Loading