FreeIPA + Python 3.9.8 causes deadlock #730
For a start, don't use Python sub interpreters. I see the call to That said, what is the mod_wsgi configuration for Apache being used. Ideally you would be using mod_wsgi daemon mode, where that WSGI application is the only application running in that daemon process group, and use of the main interpreter context in that daemon process group is forced by setting the application group to Also, does the WSGI application create its own background threads or use the Note that in future because of changes in how Python supports the use of sub interpreters, there is a chance that use of sub interpreters by mod_wsgi may have to be dropped, thus ensuring you are using a separate daemon process group and forcing the use of the main interpreter context is preferred as that is all that may be available in the future. Anyway, will look in more detail when get a chance. |
To make one thing clear which may be relevant. All request handler threads in mod_wsgi are external threads, they are not Python threads, thus the Python interpreter would never be in a position of having to stop them. That the code is therefore ending up in My assumption therefore is that the problem is that the WSGI application is creating a background thread (possibly even forcibly marking it as non daemon, although changes to sub interpreters may mean all Python threads in sub interpreters are now created as non daemon regardless) but not doing anything to try and stop it when the process is being signalled to shutdown, or it tries to use If you are able to somehow instrument things I would be dumping out a list of Python native threads still running, with stack traces if possible, so can work out where they originate. Dump out whether they are tagged as daemon or non daemon threads. To stop the threads you can try and use Rather than try and rely on
Notes about There is a chance that this has worked previously because all background threads would have been created as daemon threads and the interpreter shutdown wouldn't wait on them. I recollect some discussion about sub interpreters which meant that any background threads in sub interpreters would be forced to be non daemon threads even if explicitly set to be daemon. If this proposed change as I recollect it was made in Python 3.9 that would explain the problem as the sub interpreter would now wait on all background threads created in Python and block sub interpreter deletion. Thus it is now imperative that a WSGI application if it creates background threads, to subscribe to process shutdown and do whatever it can to ensure the background thread stops. Part of the reason for expecting that sub interpreters in Python would become useless to mod_wsgi in the future was bound up in this change to make background threads in sub interpreters always non daemon and force sub interpreter to wait on them when deleted. I didn't realise this change ended up getting made if it did. One expected impact of the change was process hanging on process shutdown when sub interpreters were used and background threads were created as WSGI applications don't usually stop daemon threads, but now they would have to be changed to do so. |
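The advice above (a WSGI application that creates background threads should make sure they stop at process shutdown) can be sketched as follows. This is only a minimal illustration, not FreeIPA or mod_wsgi code: the `StoppableWorker` class and `demo()` helper are hypothetical names, and how `stop()` gets wired to the server's shutdown notification is left out because the thread does not show that API.

```python
import threading
import time


class StoppableWorker:
    """Hypothetical background worker that exits promptly when asked to stop."""

    def __init__(self, interval=0.05):
        self._stop_event = threading.Event()
        self._interval = interval
        self.iterations = 0
        # Non-daemon on purpose: interpreter shutdown waits for such threads,
        # so the application has to stop this one explicitly.
        self._thread = threading.Thread(
            target=self._run, name="worker", daemon=False
        )

    def start(self):
        self._thread.start()

    def _run(self):
        # Event.wait() doubles as an interruptible sleep: it returns True
        # as soon as stop() sets the event, which ends the loop.
        while not self._stop_event.wait(self._interval):
            self.iterations += 1

    def stop(self, timeout=5.0):
        """Ask the thread to exit and wait for it; return True if it stopped."""
        self._stop_event.set()
        self._thread.join(timeout)
        return not self._thread.is_alive()


def demo():
    worker = StoppableWorker()
    worker.start()
    time.sleep(0.3)          # let the worker run a few iterations
    stopped = worker.stop()  # would be called from a shutdown hook in practice
    return stopped, worker.iterations
```

The point of the design is that the worker never blocks for longer than `interval` without checking the stop flag, so a shutdown callback can call `stop()` and be reasonably sure no Python thread is left running when the interpreter is torn down.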
Thanks a lot for the quick response. I have to analyze it because I haven't worked with sub interpreters yet. In the meantime, here is the source code of FreeIPA and its wsgi configuration: https://pagure.io/freeipa/blob/master/f/install/share/ipa.conf.template |
That configuration looks buggy to me. You have set:
under:
This means that for that one URL path you are running a separate instance of IPA. That is, in the one daemon process group process you have two IPA instances running, one in the main interpreter context for the bulk of requests, and one just for handling requests sent to "/ipa/session/login_x509". Not knowing if that was intended I would say you should not have:
under that
What doesn't make sense is that since daemon is otherwise being used for both instances, then the daemon process group processes shouldn't hang around for anything more than 5 seconds since the way Apache manages them (not under mod_wsgi control), it should forcibly kill off the processes if they don't exit themselves within 5 seconds. Where the embedded mode instances are coming from may be because you are using:
In the Apache logs there is:
Thus So summarising issues as I see it.
My recommendations therefore are:
|
The relevant Python issue about changes to daemon threads in sub interpreters is: The outcome of that suggests that default behaviour would be the same and daemon threads are allowed. If that is the case, then could only end up with blocking being seen if the Python thread was explicitly created as a non daemon thread, unless something is trying to shutdown daemon threads anyway, which is quite possible if |
Py_EndInterpreter() calls atexit callbacks since Python 3.8. The atexit state is now per-interpreter since Python 3.10. |
In the logs, I see 3 Python threads:
It's strange to have two Python threads with no frame. What are these threads? Are they still running?
|
atexit callbacks are called after threading._shutdown() in Py_Finalize() and Py_EndInterpreter(). |
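The ordering described above can be observed with a small experiment in the main interpreter. This is a sketch only: it runs a child Python process (the `CHILD` script and `run_child()` helper are made up for illustration) and does not exercise sub-interpreters or mod_wsgi.

```python
import subprocess
import sys
import textwrap

# Script executed in a fresh interpreter so that its finalization can be observed.
CHILD = textwrap.dedent(
    """
    import atexit
    import threading
    import time

    def worker():
        time.sleep(0.3)
        print("thread done", flush=True)

    atexit.register(lambda: print("atexit callback", flush=True))
    threading.Thread(target=worker).start()  # non-daemon thread
    print("main done", flush=True)
    """
)


def run_child():
    """Run the child script and return its output lines in order."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        timeout=30,
        check=True,
    )
    return result.stdout.splitlines()
```

The atexit callback only runs after the non-daemon thread has been joined, which is why an atexit hook is too late to unblock a thread that finalization is already waiting on.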
Oh, this is a fun bug. I'll apply Graham's suggestions and see if they help to address the issue. |
Great, thanks. If you send me a PR or something I'll happily test it for you. |
I had a very interesting debug session with @frenzymadness who is able to reproduce the FreeIPA hang at exit in Apache mod_wsgi. I used gdb on Python 3.9. Random notes:
|
…read (pythonGH-28549) (pythonGH-28589)" This reverts commit 94d19f6. It introduced regression causing FreeIPA's tests to fail. For more info see: https://bodhi.fedoraproject.org/updates/FEDORA-2021-e152ce5f31 GrahamDumpleton/mod_wsgi#730
I added debug traces in Python itself in functions like Simplified code:
In a Python sub-interpreter, it's really important that all tstate (Python thread state) are deleted by Py_EndInterpreter(), otherwise the function fails with a fatal error:
Apache httpd mod_wsgi (simplified) logs of the sub-interpreter:
|
If I revert the threading._shutdown() change (python/cpython@38c6773), threading._shutdown() no longer hangs, and I see that tstate 3 and tstate 5 are deleted as expected by Py_EndInterpreter():
|
Logs with mod_wsgi debug traces:
|
The This change was made in 2019. Recently (Python 3.9.8), Maybe FreeIPA still relies on the old Python threading behavior: consider that a thread completes when it completed at the Python level. When using subinterpreters, it's really important that a thread completed at the C level: the new Python threading behavior. Otherwise, Py_EndInterpreter() fails with a Python fatal error, as I explained previously. It's important to delete most threads before It seems like mod_wsgi now has a mechanism to delete all Python tstate by registering itself in atexit: the C function ShutdownInterpreter_call() deletes all tstate with:
Maybe it's too late: atexit callbacks are only called after |
For sub interpreters, mod_wsgi triggers atexit() callbacks itself earlier before interpreter deletion. But then, how atexit callbacks work has changed many times over the years between Python versions, so how I do that may not be working anymore and a new hack is required to workaround Python C APIs not providing a way for embedded systems to do it when they need to. |
So looks like the hack was to replace https://github.com/GrahamDumpleton/mod_wsgi/blob/develop/src/server/wsgi_interp.c#L519 But the order was that I would only call atexit() after calling original function I was wrapping. That seems to now be the source of the problems if Probably take me a few hours to get my head around this new mess. :-( |
I think the only solution to all of this is for mod_wsgi to not bother trying to delete sub interpreters or shut down the main Python interpreter. Instead mod_wsgi would just call C exit() and bypass all that. For cleanup actions on process shutdown people would have to use the mod_wsgi mechanism for getting notification of process shutdown instead of atexit(). FWIW, from memory uWSGI never tried to stop the Python interpreter properly and always just called C exit(), thus you were always forced in uWSGI to use its own process shutdown notification, whereas mod_wsgi tried to rely on atexit(), even though that meant in older Python versions that for sub interpreters mod_wsgi had to trigger atexit() callbacks itself since sub interpreters didn't do that. As things progress it just appears that it is going to be too hard to combine external C threads with sub interpreters and sub interpreter deletion because of how the use case for sub interpreters in CPython is changing. |
Latest develop version of mod_wsgi in GitHub no longer attempts to destroy Python sub interpreters or main Python interpreter on shutdown and will instead just let the process exit without attempting any cleanup.
Details in: I can't see any way of being able to safely delete Python thread state objects before destroying the Python sub interpreters if that is what is now required, as I can't absolutely guarantee the threads using those thread state objects aren't still running. This is the case whether they be externally created C threads used by mod_wsgi to handle requests, or background daemon threads created by a WSGI application. Yes this meant that technically if such a thread became running when the interpreters were being destroyed that the process could crash, but this was deemed acceptable compared to having the process hang as could now occur. |
Hi @GrahamDumpleton, thanks for looking into the matter! Is it safe to omit the interpreter shutdown? As far as I understand the system, Python also runs finalizers (try/finally, By the way, there is an unofficial API to run all atexit hooks manually:
|
I don't know whether has always existed as an issue with That said, I have been thinking whether in new scheme whereby skip destroying interpreter whether should still call this. Even when destroying sub interpreters I have been calling it since prior to 3.7 it wasn't called for sub interpreters by Python core. I think from 3.7 at least it would deregister the callbacks once called, so it being called more than once since 3.7, since I didn't stop calling it, hasn't caused any issues that know of. I am breaking the notional contract though that supposedly Python applies to it, which is only called after non daemon threads are stopped. As to flushing buffers, C exit() will at least flush output buffers for C libraries, so would only be an issue where Python is buffering itself. For any other Python level constructs, nothing I can do. Anyway, can't see any other viable options since I am not convinced I can safely cleanup the thread states for request handler threads before destroying the sub interpreter. Plus there could be daemon threads still running which I can't cleanup anyway and that will still result in things hanging if destroying the sub interpreter is going to hang if any thread states still exist as it seems is occurring. Maybe that restriction should only have been getting applied if the sub interpreter was created with that option which prohibits daemon threads. |
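The claim above that callbacks are deregistered once they have been called can be checked with a small experiment. This sketch uses `atexit._run_exitfuncs()`, a private and undocumented CPython function whose behaviour may differ between versions; the `CHILD` script and `run_child()` helper are illustrative names, and a child process is used so that the handlers of the calling process are not triggered.

```python
import subprocess
import sys
import textwrap

# Child script: register one hook, then trigger the atexit machinery twice by hand.
CHILD = textwrap.dedent(
    """
    import atexit

    def hook():
        print("hook ran", flush=True)

    atexit.register(hook)
    atexit._run_exitfuncs()  # private API: runs and clears registered callbacks
    atexit._run_exitfuncs()  # nothing left to run
    print("end of main", flush=True)
    """
)


def run_child():
    """Run the child script and return its output lines in order."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        timeout=30,
        check=True,
    )
    return result.stdout.splitlines()
```

On the CPython 3 versions where the callback list is cleared after a run, the hook is printed once only: neither the second manual call nor the normal interpreter exit runs it again.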
Some more issues to complicate this. From mod_wsgi code:
I think the version in the comment here is wrong, but then the conditional is also confusing as it suggests prior to Python 3.4, so shouldn't have been 3.5.1 unless the conditional was wrong and was supposed to be <= 4. Either way, this seems to indicate that Python 2.7 relies on atexit to wait on non daemon threads. So for Python 2.7 invoking atexit functions would cause waiting on the non daemon threads where it doesn't for supported Python 3.X versions. And then:
This confirms an earlier comment that after atexit registered callbacks were called they weren't deregistered for some Python 3.X versions, means that one could trigger them twice if called atexit prior to destroying the interpreter. So would need to know which versions that occurs for and see whether is only now for unsupported Python 3.X versions. Have also found other things in mod_wsgi code which is confusing and can't remember the exact history of. Namely there are actually two places where thread states for sub interpreters are deleted. One is immediately before destroying the sub interpreter.
but this is only done for Python 2.7, something changed during Python 3.X versions. The comment does warn though about concerns about process crashing as a result of doing this. The other place where delete thread states is after had called threading._shutdown() and atexit.run_exitfuncs() when started doing that explicitly for Python sub interpreters. Anyway, you can hopefully appreciate how messy things are with atexit and thread shutdown over time with different Python versions and why this has never given any joy in trying to handle it all. I would have to go through and look really closely at what all supported Python versions do and see if there is a sane way to handle it all, perhaps deleting some code which goes back to very early Python 2.X versions so can better see what is going on. Also, way back in time there was an attempt to actually delete Python sub interpreters and recreate them as a way of reloading an application without stopping the process. Sub interpreters proved to leak too many resources and many C extensions would break if this was done so disabled that feature, but the code was never refactored to eliminate the mess around modelling sub interpreters as reference counted resources to manage deletion. Since don't need that, things could be even further simplified, but is still a lot of rewriting. |
FWIW, these are all the Python versions mod_wsgi possibly still works with:
I don't know that want to drop Python 2.7 just yet, but could drop Python 2.6 and Python 3.3 through 3.5, presuming some of those old versions do actually still work. Not tested them, but code in mod_wsgi hasn't really changed much last few years where it would matter so they could actually still work. |
Reviewing code. Python 2.7 atexit calling is pure Python. It pops handlers off the list when calling so they are thus removed. Python 3.6 atexit calling is C code. In head of 3.6, it does clear atexit callbacks list once called. Going back further even Python 3.0 did it, so comment above about list not being cleared is possibly related to early dev or release candidate versions of Python 3.0. Maybe I raised the issue at the time and implementation was changed. As to atexit being used to wait for threads to shutdown, that looks to have changed at some point in the lifetime of Python 2.7. https://github.com/python/cpython/blob/v2.7.18/Python/pythonrun.c#L419-L430 This code says:
The comment is actually wrong and threading.py does not use atexit in head of Python 2.7, instead wait_for_thread_shutdown() does that just beforehand. In fact the use of wait_for_thread_shutdown() was added in Python 2.6.5. https://github.com/python/cpython/blob/v2.6.5/Python/pythonrun.c#L392 I can't remember the overlap between Python 2.6 and 3.X versions but possibly when this change was done in 3.X it was back ported to 2.6 for consistency. Means that is safe to assume that with versions of Python that will be used, that atexit isn't going to wait on threads to shutdown even if support Python 2.7. That makes things easier. The problem is still that threading._shutdown() expects thread states to be deleted which still don't really understand how is enforced. It is strange that with this behaviour that the function is even public within threading module and not some hidden C function. Anyway, still can't attempt to call it. Best could possibly do would be as follows.
|
In contrast, summarising what was being done since have better understanding now, ignoring funny variations for old Python versions, we have:
Which as we are seeing potentially results in the following.
That sys._run_exitfuncs() was only called for sub interpreters meant there was an inconsistency with behaviour compared to the Python main interpreter. For historical reasons (atexit callbacks were not originally deregistered when called in very early Python 3.0 alpha/beta versions) it wasn't done for that case, but was never revisited when Python 3.0 fixed issue. What I may do at this point is try and throw out all the old crufty code which has accrued due to supporting so many variations of things in very old Python versions, thus simplifying the mod_wsgi code. I'll then try and verify again the expectation that problem is caused by daemon threads which aren't stopped. I'll have to look more closely at what is happening with the thread state for the main thread as well, since one would be created for that (in addition to request handler threads), when WSGI script preloading occurs. |
The Python finalization is a complex task. I took some notes about recent changes: https://pythondev.readthedocs.io/finalization.html You're free to call os._exit() to skip any kind of cleanup: simple and fast. The risk is if an application relies on Python finalizer methods ( About threads which can hang at exit, there are different cases:
For case (A), you can use a Python debugger, faulthandler, gdb, whatever you want to see what's going on. mod_wsgi seems to be affected by case (B) where Python thread states are still attached to Apache threads at Python exit, even if these threads are idle (don't run any Python code). See my debug notes in this comment: #730 (comment) To debug that, you can iterate on For case (C), daemon threads, they are ignored by (*) About "safe": daemon threads still running at Python exit call The Python finalization problem is not specific to sub-interpreters. If a Python thread never stops, case (A), a "regular Python" also hangs at exit. It's the expected behavior. By the way, there is no way to kill a thread in a portable way. There is no silver bullet. What can help is to have better documentation and explain how to debug these issues. |
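For case (A), one low-tech way to see which Python threads are still alive, whether they are daemon threads, and where they are blocked is to combine `threading.enumerate()` with `sys._current_frames()`. This is a minimal sketch with hypothetical helper names (`dump_threads`, `demo`); it only sees threads that have a Python thread state, and `sys._current_frames()` is a CPython implementation detail.

```python
import sys
import threading
import traceback


def dump_threads():
    """Return a text report of live Python threads and their current stacks."""
    frames = sys._current_frames()  # maps thread ident -> topmost frame
    lines = []
    known = set()
    for thread in threading.enumerate():
        known.add(thread.ident)
        kind = "daemon" if thread.daemon else "non-daemon"
        lines.append(f"Thread {thread.name!r} ident={thread.ident} ({kind})")
        frame = frames.get(thread.ident)
        if frame is None:
            lines.append("  <no Python frame>")
        else:
            lines.extend(
                "  " + entry.rstrip() for entry in traceback.format_stack(frame)
            )
    # Threads running Python code that the threading module does not know
    # about, for example threads created by the embedding C application.
    for ident, frame in frames.items():
        if ident not in known:
            lines.append(f"Unregistered thread ident={ident}")
            lines.extend(
                "  " + entry.rstrip() for entry in traceback.format_stack(frame)
            )
    return "\n".join(lines)


def demo():
    """Start a blocked daemon thread and return the resulting report."""
    stop = threading.Event()
    thread = threading.Thread(target=stop.wait, name="bg", daemon=True)
    thread.start()
    try:
        return dump_threads()
    finally:
        stop.set()
        thread.join()
```

Calling something like `dump_threads()` from a shutdown hook and writing the result to the server log would show whether a non-daemon thread created by the application is the one keeping finalization waiting.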
To make clear one point. Am not calling For mod_wsgi daemon mode I would end up calling C |
…read (pythonGH-28549) (pythonGH-28589)" This reverts commit 38c6773. It introduced regression causing FreeIPA's tests to fail. For more info see: https://bodhi.fedoraproject.org/updates/FEDORA-2021-e152ce5f31 GrahamDumpleton/mod_wsgi#730
Hello there.
This issue probably does not belong here and I'm sorry for that but it's interesting and I'm running out of my debugging skills and you might be able to help me with that - and - I think that you experts love challenges.
I was describing my progress in https://bodhi.fedoraproject.org/updates/FEDORA-2021-e152ce5f31 but I'll try to summarize it here.
Short description: Combination of FreeIPA, httpd, mod_wsgi, and Python causes a deadlock in the threading module when systemd tries to stop httpd service - the process of stopping the httpd processes takes more than 90 seconds and all the processes are then killed.
The problem has been found by OpenQA tests in Fedora 34 when we tried to update Python from 3.9.7 to 3.9.8. I did a bisection of commits between these releases of Python and found out that the problem is caused/uncovered by this change: python/cpython@94d19f6 I've discussed it with @vstinner and we think that the code in Python is correct - still, without this commit, the problem is not there but the code itself does look correct. There has to be something else.
Components:
How to reproduce it:
Result is:
The log is available here: https://lbalhar.fedorapeople.org/error_log and it contains all the messages produced by httpd from start to end as described in the reproducer + some debug prints containing the file name where they come from. Note that there is no jump in the timestamps because httpd does not provide any more messages before it's killed. But the jump is in the systemd logs:
From the error_log, you can see that there are multiple similar processes (5659, 5660, …) but only one of them (5660) served the requests from the web UI and the same process caused the deadlock. Only this one process reached the new branch (line 1457 https://github.com/python/cpython/blob/94d19f606fa18a1c4d2faca1caf2f470a8ce6d46/Lib/threading.py#L1457) and never reached the end of the _shutdown function.
I've also added faulthandler.dump_traceback_later(20, file=tempfile.mkstemp(dir="/var/log/httpd/")[0], repeat=True) to the _shutdown function and it produces only one file with some content and the content is:
So the one process deadlocks on lock.acquire(), line: https://github.com/python/cpython/blob/94d19f606fa18a1c4d2faca1caf2f470a8ce6d46/Lib/threading.py#L1470
From what I can see in the code of the threading module and the _shutdown function it seems that the process serving the requests from web UI is the only one where the _shutdown function is not called by the main thread. We have tried a blind fix - import threading as soon as possible (in site.py) and something similar is implemented here but it did not help.
I've tried to overload _thread.start_new_thread in threading.py but it produces no results so I think that threads are created in a different way here.
I've tried to use gdb to see what is happening in the locked process and there were two threads but one of them was caused by the faulthandler - when removed, there is a single thread with this stack:
Any help is much appreciated. Cc @tiran @encukou