bpo-46210: Fix deadlock in print. #30310

Open
wants to merge 15 commits into main

Conversation

@notarealdeveloper (Contributor) commented Dec 30, 2021

Add _at_fork_reinit handler to buffer->lock of stdout and stderr, to fix a deadlock in print.

https://bugs.python.org/issue46210
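
For context on the mechanism (a hedged sketch of the idea, not this PR's exact code): fork() duplicates the whole process, including the lock inside sys.stdout.buffer and sys.stderr.buffer, but only the forking thread exists in the child. If another thread held one of those locks at fork time, the child's copy stays acquired forever and the child's first print() blocks on it. CPython's internal _PyThread_at_fork_reinit() exists for exactly this situation; the helper name below and its placement are illustrative assumptions.

/* Hedged sketch: give a possibly-held buffer lock a fresh, unlocked state in
 * the child right after fork().  Only one thread exists in the child at this
 * point, so swapping the lock out is safe.  Requires building as part of
 * CPython itself (internal API). */
static int
reinit_buffer_lock_in_child(PyThread_type_lock *lock)
{
    if (*lock == NULL) {
        return 0;                            /* no lock was ever allocated */
    }
    return _PyThread_at_fork_reinit(lock);   /* 0 on success, -1 on failure */
}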

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@notarealdeveloper

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day for our records to be updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@sweeneyde (Member) left a comment

Just some superficial things; I'm not that knowledgeable about the whole threading/forking situation.

Python/pystate.c Outdated
/* error */
ret = -1;
end:
return ret;
Member

I think there are a few more leaks here. Perhaps it would be simpler to initialize the variables to NULL, and then use Py_XDECREF in the end:/error: label.
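
For readers less familiar with CPython's C API, the pattern being suggested looks roughly like this (a hedged sketch with made-up names, not the PR's code): every owned reference starts as NULL, so a single end: label can unconditionally release whatever happened to be acquired, since Py_XDECREF() is a no-op on NULL.

/* Hedged sketch of the NULL-init + Py_XDECREF cleanup pattern. */
static int
example_cleanup_pattern(PyObject *stream)
{
    int ret = -1;
    PyObject *buffer = NULL;
    PyObject *result = NULL;

    buffer = PyObject_GetAttrString(stream, "buffer");
    if (buffer == NULL) {
        goto end;                 /* error: fall through to shared cleanup */
    }
    result = PyObject_CallMethod(buffer, "flush", NULL);
    if (result == NULL) {
        goto end;
    }
    ret = 0;                      /* success */
end:
    Py_XDECREF(result);           /* safe even if result is still NULL */
    Py_XDECREF(buffer);
    return ret;
}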

Python/pystate.c Outdated
Comment on lines 197 to 205
if (_PyObject_GetAttrId(stdio, &PyId_isatty) == NULL) {
PyErr_Clear();
goto end;
}

isatty = _PyObject_CallMethodIdNoArgs(stdio, &PyId_isatty);
if (Py_IsFalse(isatty)) {
goto end;
}
Member

I think this can be just one call to _PyObject_CallMethodIdNoArgs with a check for a NULL return value.
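
Concretely, the suggestion seems to be collapsing the attribute lookup and the call into one step; a hedged sketch (the helper name stream_isatty is an assumption, not code from the PR):

/* Hedged sketch: returns 1 if `stream` reports being a tty, 0 if it does not
 * or if the check fails for any reason (the error state is cleared). */
static int
stream_isatty(PyObject *stream)
{
    _Py_IDENTIFIER(isatty);
    /* One call instead of GetAttr followed by Call: a NULL result covers
       both a missing isatty attribute and a failing isatty() call. */
    PyObject *res = _PyObject_CallMethodIdNoArgs(stream, &PyId_isatty);
    if (res == NULL) {
        PyErr_Clear();
        return 0;
    }
    int ret = PyObject_IsTrue(res);
    Py_DECREF(res);
    return ret == 1;
}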

@sweeneyde requested a review from vstinner December 31, 2021 00:39
@sweeneyde (Member)

It would be nice to see a test case added for these methods. If a threading/forking case can be made to pass consistently, that would be ideal; but even if not, a simple exercise of the _at_fork_reinit method would still help. Then we can make sure there are no reference leaks (with ./python -m test test_whatever -R3:3).

@notarealdeveloper (Contributor Author)

Thanks for the tips! I know exactly what to learn about next. Will push some updates after I learn a bit more. I think the general "unlock the thing" idea probably makes sense, but I'm sure I tripped over myself in 17 different ways in the initial patch. Curious to hear from Victor about whether the overall goal is a sane one. Either way, more coming soon. Happy New Year. :)

@notarealdeveloper (Contributor Author) commented Dec 31, 2021

Ok, re: ./python -m test some_test -R3:3.

Just updated the pull request with what (I think?) is correct reference counting.

All the tests pass, but I wanted to check for reference leaks like you suggested.

So I ran several of the relevant tests with -R3:3 to check, but then I realized something.

Supposing there's a reference leak, would it be noticed by -R? From the docs, it seems -R uses sys.gettotalrefcount() to check for leaks, but this code is only being executed in the child process created by a fork, so if the parent is checking for updates to a variable in its address space I don't think it'll see them.

Is there a standard way to check for leaks in situations like that, or is it better to just use a brute-force strategy like I did in the comment above (i.e., calling every function twice and checking ob_refcnt manually with printf like a caveman)?

@notarealdeveloper (Contributor Author)

Hi there. :)

After spending some time thinking about how to exercise the lock re-initialization function in a test, I realized that the best way to overcome my noobish-ness here is to follow the example of _PyImport_ReInitLock, seeing as @vstinner has done a lot of the work on that, and his past fixes of various deadlocks are basically identical to what I'm trying to do here.

The "meat" of the implementation we've already discussed doesn't change (thanks again for the tips @sweeneyde). I've just moved things to different files, in order to maximize the similarity with what _PyImport_ReInitLock does.

Here's what a grep of the codebase for the two functions looks like:

$ grep -Ir _PyImport_ReInitLock
Modules/posixmodule.c:#include "pycore_import.h"        // _PyImport_ReInitLock()
Modules/posixmodule.c:    status = _PyImport_ReInitLock();
Include/internal/pycore_import.h:extern PyStatus _PyImport_ReInitLock(void);
Python/import.c:_PyImport_ReInitLock(void)

$ grep -Ir _PySys_ReInitStdio
Modules/posixmodule.c:#include "pycore_sysmodule.h"     // _PySys_ReInitStdio()
Modules/posixmodule.c:    status = _PySys_ReInitStdio();
Include/internal/pycore_sysmodule.h:extern PyStatus _PySys_ReInitStdio(void);
Python/sysmodule.c:_PySys_ReInitStdio(void)

And here's the git blame output showing the only location where the new function is called.

26881c8fae3 (Victor Stinner  2020-06-02 15:51:37  592)     status = _PyImport_ReInitLock();
26881c8fae3 (Victor Stinner  2020-06-02 15:51:37  593)     if (_PyStatus_EXCEPTION(status)) {
26881c8fae3 (Victor Stinner  2020-06-02 15:51:37  594)         goto fatal_error;
26881c8fae3 (Victor Stinner  2020-06-02 15:51:37  595)     }
26881c8fae3 (Victor Stinner  2020-06-02 15:51:37  596) 
8c8d9c99971 (Jason Wilkes    2022-01-05 11:45:01  597)     status = _PySys_ReInitStdio();
8c8d9c99971 (Jason Wilkes    2022-01-05 11:45:01  598)     if (_PyStatus_EXCEPTION(status)) {
8c8d9c99971 (Jason Wilkes    2022-01-05 11:45:01  599)         goto fatal_error;
8c8d9c99971 (Jason Wilkes    2022-01-05 11:45:01  600)     }

The goal was to get line-by-line correspondence with what @vstinner does with the import lock, and we seem to have that now.

Now, the only difference arises inside the implementation, because the import lock is a global variable inside Python/import.c, while the stdout lock lives on the stdout.buffer object (similarly with stderr). The addition of an _at_fork_reinit method to the buffered object in Modules/_io/bufferedio.c isn't a necessary part of the implementation; it's just the simplest way I knew of (as a noob) to get a pointer to those locks from outside that module. I'd be more than happy to change that detail if anyone thinks there's a better approach.

As for everything else: (1) the PEP 7 formatting should be good now, (2) @the-knights-who-say-ni seems to have given me the "CLA signed" badge just now, (3) in playing with all this, I stumbled on another small unrelated bug in the test suite and fixed it, so I'll submit that as a separate bpo & pull request.

Let me know if there's anything else you'd like me to do. Thanks again for all the help. :)

@JelleZijlstra (Member) left a comment

Agree that this needs tests. It also adds a new Python-accessible method. What happens if Python code calls it?

@notarealdeveloper (Contributor Author) commented Feb 1, 2022

Re: Helpful comments from @JelleZijlstra

Agree that this needs tests. It also adds a new Python-accessible method. What happens if Python code calls it?

100% agreed: the Python-accessible method wasn't necessary. I needed to learn a bit more before I could do this correctly.

Just pushed some changes to address this. The updated approach:

  1. Removes the Python-accessible method.
  2. Tests that the lock reinitialization is working as expected (See TestStdioAtForkReInit in Lib/test/test_thread.py).

The test's behavior is to cause the deadlock described in bpo-46210 if and only if the lock is not being reinitialized correctly.

Just like the other C-level locks in the codebase, this is something that should be reinitialized on every fork, so you (@JelleZijlstra) were absolutely right to point out that a Python-accessible interface was an awkward fit.

Further, I want to make it clear to other devs precisely what situation the bugfix is targeting, so the test contains a description of what the bug looks like without the fix, and which specific C function was added to fix it.

Summary: the test can be run with ./python -m test test_thread. Removing the single line status = _PySys_ReInitStdio(); from Modules/posixmodule.c should be enough to make that test fail with a timeout error due to the deadlock, which illustrates how this PR addresses bpo-46210. Let me know if there's anything else you'd like me to do. Thanks again for all the help. :)

@JelleZijlstra self-requested a review February 1, 2022 18:33
@notarealdeveloper (Contributor Author)

In addition to the above:

In the course of adding the test, I found a small bug in the wait_process function in Lib/test/support/__init__.py (the function ignores its timeout argument and uses the module-level global SHORT_TIMEOUT even when timeout is specified).

That won't be relevant to anyone who runs the test as is, since the PR fixes the deadlock.

However, if anyone wants to verify that the test fails without the bugfix (remove status = _PySys_ReInitStdio(); from Modules/posixmodule.c and re-run ./python -m test test_thread), you may notice an odd behavior: the line support.wait_process(main_pid, exitcode=0, timeout=5) in the new test takes 30 seconds to time out, not 5.

That's due to the above mentioned bug in Lib/test/support/__init__.py:wait_process.

Should I submit another bpo for this small bug, or just send a PR directly?

Should be a one-line fix. Just need to change if dt > SHORT_TIMEOUT in that function to if dt > timeout.

@JelleZijlstra (Member)

Thanks, can you open another issue for the test.support fix? (I don't have time right now to look at this PR in detail, but I'll review it again.)

@notarealdeveloper (Contributor Author)

Thanks, can you open another issue for the test.support fix? (I don't have time right now to look at this PR in detail, but I'll review it again.)

Will do! :)

Thanks for the help.

int
_PyIO_buffered_at_fork_reinit(PyObject *self)
{
Py_INCREF(self);
Member

Why not just do the INCREF after the if statement?

@notarealdeveloper (Contributor Author), Feb 7, 2022

Ah, of course. Still getting the hang of reference counting. Thanks for the tips! (This one should have been obvious, now that I look at it again.)

PyObject *buffer = NULL;
PyObject *result = NULL;

PyObject *stdio = _PySys_GetObjectId(key);
Member

Looks like this returns a borrowed reference, so there's no need to DECREF it: the code is correct.

Contributor Author

Cool, thanks. :)

Member

nitpick: I suggest the name stream, but stdio works too ;-)


PyObject *isatty = NULL;
PyObject *buffer = NULL;
PyObject *result = NULL;
Member

This is never used

Contributor Author

Removed. Thanks again.

@notarealdeveloper (Contributor Author)

Ok, all the comments above from @JelleZijlstra should be addressed (thanks again for the help!).

Also submitted PRs for two minor bugs I found in the test suite while working on this PR (those are #31204 and #31205).

Neither of those is a requirement for this PR to work. They're just trivial fixes to a couple edge cases in the test suite.

However, in a hypothetical scenario where:

  1. We merge the test that this PR added, but
  2. We remove or break the C code in this PR that fixes the deadlock.

then #31205 ensures that the timeout in our new test is respected when waiting for a deadlocked child process.

That's just a minor bit of polish, but I'm trying to think about possible future scenarios, not just the code as it exists now.

Let me know if there's anything else I can improve. :)

PyObject *stdio = _PySys_GetObjectId(key);

_Py_IDENTIFIER(isatty);
isatty = _PyObject_CallMethodIdNoArgs(stdio, &PyId_isatty);
Member

I don't see the purpose of testing isatty().

Contributor Author

Fixed. :)

PyObject *buffer = NULL;

PyObject *stdio = _PySys_GetObjectId(key);

Member

stdio can be NULL or Py_None. You should do nothing in this case.

Contributor Author

Thanks, done.

if (_PyObject_LookupAttrId(stdio, &PyId_buffer, &buffer) < 0) {
/* stdout.buffer and stderr.buffer are not part of the
* TextIOBase API and may not exist in some implementations.
* If not present, no need to reinitialize their locks. */
Member

The function can raise an AttributeError, no? If yes, you must clear it: PyErr_Clear().

Contributor Author

Fixed! (The function is MUCH simpler now. Thanks for the tips.)

int ret = 0;

PyObject *isatty = NULL;
PyObject *buffer = NULL;
Member

I prefer to move each variable declaration to where the variable is first set, rather than declaring most of them at the top.

Contributor Author

Changed to this style throughout.

PyObject *buffer = NULL;
PyObject *result = NULL;

PyObject *stdio = _PySys_GetObjectId(key);
Member

nitpick: I suggest the name stream, but stdio works too ;-)

if (!Py_IS_TYPE(self, &PyBufferedWriter_Type)) {
return 0;
}
Py_INCREF(self);
Member

Why do you hold a strong reference? _PyThread_at_fork_reinit() doesn't release the GIL; it only allocates memory on the heap. You can remove the INCREF/DECREF dance.

Contributor Author

  1. Re: stream over stdio. Changed.
  2. Re: INCREF/DECREF dance. Done. (Still getting a feel for refcounting, thanks for the tips.)
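
With the type check kept and the INCREF/DECREF pair dropped, the _io-side helper presumably ends up shaped roughly like this. This is a hedged sketch, not the PR's exact code: the lock and owner fields follow the buffered struct in Modules/_io/bufferedio.c, and covering PyBufferedReader_Type as well reflects the stdin discussion later in the review.

/* Hedged sketch of the exported _io helper that posixmodule/sysmodule can
 * call after fork(). */
int
_PyIO_buffered_at_fork_reinit(PyObject *self)
{
    /* Only buffered reader/writer objects carry the lock we care about. */
    if (!Py_IS_TYPE(self, &PyBufferedWriter_Type)
        && !Py_IS_TYPE(self, &PyBufferedReader_Type))
    {
        return 0;
    }
    buffered *b = (buffered *)self;
    if (b->lock == NULL) {
        return 0;
    }
    /* No INCREF/DECREF needed: _PyThread_at_fork_reinit() does not release
       the GIL, so `self` cannot go away while we use it. */
    if (_PyThread_at_fork_reinit(&b->lock) < 0) {
        return -1;
    }
    b->owner = 0;   /* the fresh lock has no owning thread */
    return 0;
}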

@@ -265,5 +267,107 @@ def tearDown(self):
pass


class TestStdioAtForkReInit(unittest.TestCase):
Member

I dislike this kind of functional test; it's fragile and may break tomorrow. I would prefer to remove it.

If you want to keep it, you should:

  • create a new script and run it from the test, so that you have better control over the Python state when the test runs: which modules are imported, signal handling, etc.
  • move it to test_io, since the tested code lives in the _io module

Member

Can you clarify why you think it's fragile? It seems useful as a regression test. I do agree that it would be more stable in a separate script.

@notarealdeveloper (Contributor Author), Feb 14, 2022

This is the only part I didn't change yet, because I wanted to check with you guys on the details.

@vstinner All the at_fork_reinit handlers in the codebase have your name on them, so I'll trust your judgment here. (Though I'm also curious for the same reasons @JelleZijlstra mentioned).

On the topic of the current test, you said "If you want to keep it, you should..."

What's the "else" condition for that sentence? I would prefer to do whatever you think makes the most sense. I don't have any strong opinions here, aside from a desire to have that one line in there that reinitializes the lock.

Test-wise, would you prefer (1) just moving the current test to a separate script? (2) removing the current test entirely? (3) a different test with certain properties that this one doesn't have?

I'm equally happy to do any of the above. :)

created child processes do not share locks with the parent. */
PyStatus
_PySys_ReInitStdio(void)
{
Member

sys.stdin also has a buffer with a lock that must be reinitialized at fork.

Contributor Author

Cool, now PyBufferedWriter_Type and PyBufferedReader_Type both get their locks reinitialized.

int reinit_stderr = stdio_at_fork_reinit(&PyId_stderr);

if (reinit_stdout < 0 || reinit_stderr < 0) {
return _PyStatus_ERR("Failed to reinitialize stdout and stderr");
Member

Suggested change:
-    return _PyStatus_ERR("Failed to reinitialize stdout and stderr");
+    return _PyStatus_ERR("Failed to reinitialize standard streams");

Contributor Author

standard streams it is! :)
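
Taken together, the feedback above (borrowed reference from _PySys_GetObjectId(), skipping NULL/None streams, tolerating a missing .buffer, covering stdin as well, and the "standard streams" wording) points toward something like the following overall shape. This is a hedged reconstruction, not the code as merged; _PyIO_buffered_at_fork_reinit() is the _io-side helper discussed earlier.

static int
stream_at_fork_reinit(_Py_Identifier *key)
{
    /* Borrowed reference: no DECREF needed.  The stream may have been
       replaced with None or removed entirely; nothing to do then. */
    PyObject *stream = _PySys_GetObjectId(key);
    if (stream == NULL || stream == Py_None) {
        return 0;
    }

    /* .buffer is not part of the TextIOBase API and may not exist in some
       implementations; a missing attribute is not an error here. */
    _Py_IDENTIFIER(buffer);
    PyObject *buffer;
    if (_PyObject_LookupAttrId(stream, &PyId_buffer, &buffer) < 0) {
        return -1;
    }
    if (buffer == NULL) {
        return 0;
    }

    int ret = _PyIO_buffered_at_fork_reinit(buffer);
    Py_DECREF(buffer);
    return ret;
}

PyStatus
_PySys_ReInitStdio(void)
{
    _Py_IDENTIFIER(stdin);
    _Py_IDENTIFIER(stdout);
    _Py_IDENTIFIER(stderr);

    if (stream_at_fork_reinit(&PyId_stdin) < 0
        || stream_at_fork_reinit(&PyId_stdout) < 0
        || stream_at_fork_reinit(&PyId_stderr) < 0)
    {
        return _PyStatus_ERR("Failed to reinitialize standard streams");
    }
    return _PyStatus_OK();
}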

@JelleZijlstra removed their request for review February 9, 2022 03:47
This commit:
* Properly identifies whether we need to reinit a stream's lock.
* Changes 'stdio' to 'stream' in several functions for clarity.
* Also reinitializes stdin's buffer->lock at fork.
@notarealdeveloper (Contributor Author) commented Feb 17, 2022

Mind if I rebase this on main and force push a small update? I fixed the PR to be consistent with #30928 (bpo-46541).

i.e., stream_at_fork_reinit now uses _Py_ID instead of _Py_IDENTIFIER.

(P.S. Am I imagining things, or do recent changes in main feel very Sam Gross-y? Trying not to get too excited, but I hope that's what's going on. 🎉)

@JelleZijlstra (Member)

Generally better to merge main, so that the review history gets preserved.

@notarealdeveloper (Contributor Author)

Cool, so this pattern?

  1. Merge main (Now "ahead of 'origin/print' by 523 commits.")
  2. Cherry pick (Now "ahead of 'origin/print' by 524 commits.")
  3. Ok to push, despite the other commits? (i.e., those won't pollute the review history here?)

Apologies for the dumb question. I've used git socially and GitHub personally, but I'm still fairly new to using GitHub socially. :)

@JelleZijlstra (Member)

Something like this:

  • git fetch upstream (assuming upstream is python/cpython)
  • git merge upstream/main
  • Fix conflicts
  • git commit
  • git push

GitHub also lets you do this in the web interface if there's a conflict (the "Resolve conflicts" button at the bottom of the thread).

@notarealdeveloper (Contributor Author)

Thanks!

@kumaraditya303 requested a review from gpshead April 16, 2022 13:01
@kumaraditya303 added the type-bug, needs backport to 3.10, interpreter-core (Objects, Python, Grammar, and Parser dirs), and needs backport to 3.11 labels Sep 4, 2022
@kumaraditya303 reopened this Sep 4, 2022
@hugovk removed the needs backport to 3.10 label Apr 7, 2023
@serhiy-storchaka added the needs backport to 3.12 and needs backport to 3.13 labels and removed the needs backport to 3.11 label May 9, 2024
@Yhg1s removed the needs backport to 3.12 label Apr 8, 2025
Labels
awaiting core review, interpreter-core (Objects, Python, Grammar, and Parser dirs), needs backport to 3.13, type-bug