[v2.10] Revert the addition of the GIL check feature #4432
@henryiii If we want to quickly revert the GIL error check stuff, this is how I would do it.
We went from not knowing I have a bug, to knowing I have a bug, to masking that I know I have a bug. Because: there are N people that complained because they have
In the meantime, and going forward, M people discover/ed "oh I have a bug", fix/ed it, and save/d D hours debugging in other ways, especially D_f hours debugging flaky multi-threaded applications. I know from first-hand experience that D_f can easily go to being measured in weeks for things like long-running RPC services. To be socially responsible, we need to do the math:
Now assume this PR is merged. The clock starts ticking:
Note the history of #4246, which had been sitting there since Oct 17, getting little to no attention from other pybind11 maintainers, despite several attempts to get attention, and being in Google production use since Oct 17, which means a large number of 3rd party packages effectively "saw" #4246 already. — That was already accumulating lost time (D increasing), but that's water under the bridge. When will the 2.11 release be made? As a pybind11 maintainer I feel a strong responsibility for letting D increase — again. We cannot possibly do the math exactly, but I'd say:
A couple of major packages (PyTorch, SciPy) see this change as breaking. I don't approve of the pinning SciPy is doing and am not fond of disabling the check - but I think this is the most practical path, the one that will cause the least effort and the fewest hours wasted. Modern existing versions of SciPy no longer build with their <2.11 upper cap (I'm not fond of the practice but I'm not going to actively punish it). We can revert this for a quick 2.10.3 release (but we don't need to target master, we can target v2.10), improve the error reporting for 2.11, and move on. This bug has been there for a long time, and the UB wasn't that bad (SciPy and PyTorch work for the most part ;) ). It'll be in 2.11, and by then hopefully most of the fixes needed will be in place. I don't think we are releasing 2.11 next week - as far as I know, there's pretty much nothing in it yet. There's no need to force improvements into 2.10.x just because 2.11 isn't out yet. Eventually 2.11 will be out and this will be in place. We don't need to fix every bug we know about in every release - and this isn't even a bug fix, just turning UB (that happens to work for someone) into an error. We just need to fix everything we know about in master, and make practical patch releases when we can that try not to break anyone.
Ahh, and what's even nicer, with this PR, you can still opt in to the new check manually in 2.10.3. :)
@henryiii I think in the long term, it might be a good idea to do at least a SciPy and PyTorch build before every release as an extra unit test.
Google-global testing does that, but only with the Google toolchain. The question is then: who has the free bandwidth to - ultimately - do SciPy and PyTorch a favor, testing with all toolchains. It's a pretty thankless effort, I'm afraid. In particular in this case: we'd just be telling them a little earlier that they have long-standing bugs. I've done some of that in preparation for the internal deployment of PR #4246 (I was forced to fix all failures first). Sometimes people are thankful. Sometimes it's more "hey, why are you bugging me, it works in practice". So I tend not to volunteer myself for such projects.
I am a PyTorch contributor. I brought up the breaking change as an issue, but I don't have time to fix all the related bugs. Happy to test RCs on PyTorch to look for obvious errors / edge cases though.
Description
We added some additional GIL status assertions in 2.10.2 (#4246).
This change seems fine, but it is probably too big for a bugfix release and is causing a lot of problems for our users.
This disables those assertions by default. Users can always manually opt in with PYBIND11_ASSERT_GIL_HELD_INCREF_DECREF.
Helps with #4420
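To illustrate what "disabled by default, opt in with a macro" means in practice, here is a minimal, self-contained sketch. It is a toy model, not pybind11's actual code: the `toy` namespace, `toy::gil_held`, `toy::handle`, and the helper functions are all hypothetical names made up for this example. Only the macro name `PYBIND11_ASSERT_GIL_HELD_INCREF_DECREF` comes from this PR. In a real extension you would presumably define the macro before including any pybind11 header, or pass it on the compiler command line as `-DPYBIND11_ASSERT_GIL_HELD_INCREF_DECREF`.

```cpp
// Toy model of a compile-time opt-in check. This is NOT pybind11's code.
#include <cassert>
#include <stdexcept>

// Opting in: the macro must be defined before the code that tests it is seen.
// Remove this line and the check below compiles away entirely.
#define PYBIND11_ASSERT_GIL_HELD_INCREF_DECREF

namespace toy {

// Hypothetical stand-in for "does the current thread hold the GIL?".
static bool gil_held = false;

// Hypothetical stand-in for a reference-counted handle.
struct handle {
    int refcount = 1;

    void inc_ref() {
#ifdef PYBIND11_ASSERT_GIL_HELD_INCREF_DECREF
        // With the opt-in macro, touching the refcount without the GIL
        // becomes a reported error instead of silent undefined behavior.
        if (!gil_held) {
            throw std::runtime_error("inc_ref() called without the GIL held");
        }
#endif
        ++refcount;
    }
};

// Returns true if inc_ref() rejects a call made without the (toy) GIL
// and leaves the refcount untouched.
inline bool check_fires_without_gil() {
    handle h;
    gil_held = false;
    try {
        h.inc_ref();
    } catch (const std::runtime_error &) {
        return h.refcount == 1;
    }
    return false;
}

// Returns true if inc_ref() succeeds when the (toy) GIL is held.
inline bool succeeds_with_gil() {
    handle h;
    gil_held = true;
    h.inc_ref();
    gil_held = false;
    return h.refcount == 2;
}

// Small self-check tying the two cases together.
inline void self_check() {
    assert(check_fires_without_gil());
    assert(succeeds_with_gil());
}

}  // namespace toy
```

The design point is that the default build pays nothing and changes no behavior, while a project that wants to find its GIL bugs can turn the check on for its own builds without waiting for 2.11.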
Suggested changelog entry: