Crash during shutdown with patch #8
Comments
Original comment by RMTEW FULL NAME (Bitbucket: rmtew, GitHub: rmtew): @akruis on 2013-01-28 08:32:19 said: Hi Kristján, many thanks for the patch. A single remaining question: was it possible to trigger this bug by creating new threads? If so, this could explain some crashes that I have seen every now and then. Unfortunately I was never able to reproduce or even debug them. |
Original comment by RMTEW FULL NAME (Bitbucket: rmtew, GitHub: rmtew): @krisvale on 2013-01-25 06:46:41 said: This is the kind of problem that is very dependent on circumstance. For it to happen, a gc collection has to be triggered at this particular point and it has to clear out objects with weak references. However, I don't actually see any recursion in the above callstack. initialize_main_and_current hasn't done anything at this point. In fact, it has most likely been already recursively entered and left because slp_eva_task is on the stack too. I think there is nothing wrong with recursion happening at this point. It can happen pretty much anywhere when there are allocations and this point is not special in any way. I think we need to have a look at the assembly code at the place of the crash to determine its exact nature. |
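The precondition described above (a collection that clears objects with weak references and therefore runs callbacks) can be reproduced in plain CPython. The following is only a minimal sketch: `Node`, `on_collect` and `make_cycle_with_weakref` are illustrative names, and the explicit `gc.collect()` stands in for a collection that an allocation would normally trigger.

```python
import gc
import weakref


class Node:
    """Plain object that can take part in a reference cycle."""

    def __init__(self):
        self.partner = None


calls = []


def on_collect(ref):
    # Arbitrary Python code runs here, in the middle of a collection.
    calls.append("weakref callback ran")


def make_cycle_with_weakref():
    a, b = Node(), Node()
    a.partner, b.partner = b, a  # reference cycle: only the GC can free it
    return weakref.ref(a, on_collect)


gc.disable()  # keep automatic collections out of the demonstration
ref = make_cycle_with_weakref()
assert calls == []  # the cycle is unreachable but not collected yet
gc.collect()  # stands in for a collection triggered by an allocation
gc.enable()
```

After the collection, `calls` contains one entry and `ref()` returns `None`, so Python code did run at whatever point the collector happened to be invoked.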
Original comment by RMTEW FULL NAME (Bitbucket: rmtew, GitHub: rmtew): @krisvale on 2013-01-28 09:00:03 said: Yes, that is a definite possibility. |
Original comment by RMTEW FULL NAME (Bitbucket: rmtew, GitHub: rmtew): @akruis on 2013-01-25 09:03:06 said: I totally agree that the bug depends on a garbage collection happening at this very particular point, and that this is indeed very unlikely. But if I understand the code correctly, there is an important effect: if the garbage collection happens at this particular place, tstate->st.main is still NULL when Python code executes during the collection. I wonder if that is OK? |
Original comment by RMTEW FULL NAME (Bitbucket: rmtew, GitHub: rmtew): @akruis on 2013-01-25 14:27:00 said: I was able to write a simple test case:
Usage: f:\fg2\stackless\stackless\PCbuild\python.exe -B -s -S -u crash.py Expected output for NUMBER = 0, -1, -2 GC collection runs prior to sys.exitfunc
GC collection triggered by tasklet_new in initialize_main_and_current
''Here you get a crash'' GC collection runs after sys.exitfunc
|
Original comment by RMTEW FULL NAME (Bitbucket: rmtew, GitHub: rmtew): @ctismer on 2013-01-25 12:13:11 said: This is a bit hairy, indeed. By the way, """Make sure that the tasklet's "atomic" flag inhibits thread switching, so that 74135:ac70790fa499 I don't get this, yet. Why do we need such a feature? Synchronizing should |
Original comment by RMTEW FULL NAME (Bitbucket: rmtew, GitHub: rmtew): @krisvale on 2013-01-26 11:36:46 said: Christian, the use case is, for example, using "atomic" to create short critical sections, such as when implementing a lock. If atomic does not inhibit voluntary thread switching, then a multi-threaded program would need a global "GIL" in addition to atomic, so that code like this could be safe:
Before this change, you would have to write something like:
Notice how we must also release the lock before blocking, which is inconvenient. With this change, you can use channel-based locks to synchronize tasklets, no matter which thread they are running on. If a tasklet's atomic flag is set, it will be interrupted neither by Stackless' scheduler nor by voluntary thread switching in ceval. |
Original comment by RMTEW FULL NAME (Bitbucket: rmtew, GitHub: rmtew): @krisvale on 2013-01-26 11:41:54 said: Anselm, I think tstate->st.main is not NULL, because initialize_main_and_current has been called again as a result of the callbacks executing, this time without triggering a GC, and has successfully set up st.main before returning. But the original invocation of initialize_main_and_current is still on the stack. Something like this:
and then
I'll look at the repro cases later tonight. |
Original comment by RMTEW FULL NAME (Bitbucket: rmtew, GitHub: rmtew): @ctismer on 2013-01-26 13:36:49 said: Replying to [comment:5 krisvale]: Hi Kristján, Ah I see. I was first afraid that preemption was used, but you are very right, the atomic flag
Yes, great. I was thinking the wrong way. This optimises and simplifies, perfect! Local Interpreter Lock, Tasklet Interpreter Lock -- TIL, LIL, :-) LOL Btw, I reduced the bored-ness a bit by disabling the captcha for authenticated users if that's ok. But many thanks to Felix Schwarz who wrote the plugin! He is actually my colleague and I didn't know about it... |
Original comment by RMTEW FULL NAME (Bitbucket: rmtew, GitHub: rmtew): @krisvale on 2013-01-26 16:40:46 said: So, I have identified the issue. In fact, initialize_main_and_current is not entered recursively, as would have been expected, because the original frame is still in ts->frame, and therefore a quicker path is taken which skips the stackless tests. |
Original comment by RMTEW FULL NAME (Bitbucket: rmtew, GitHub: rmtew): @krisvale on 2013-01-26 17:00:23 said: Here is a suggested patch:
The change delays putting the frame in tstate->frame until the stackless state has been successfully installed. This will cause such memory allocation callbacks that happen during this initial state to work correctly should there be recursion. |
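The ordering argument can be illustrated with a toy model in plain Python. This is not Stackless code: `ToyThreadState`, `enter` and the `publish_frame_first` switch are hypothetical names that only mimic the two fields discussed here (`tstate->frame` and `tstate->st.main`) and the quicker path that is taken when a frame is already installed.

```python
class ToyThreadState:
    """Hypothetical stand-in for the two thread-state fields discussed."""

    def __init__(self):
        self.frame = None  # models tstate->frame
        self.main = None   # models tstate->st.main


def enter(ts, frame, publish_frame_first, log, gc_during_setup=False):
    """Toy model of entering the interpreter for `frame`."""
    if ts.frame is not None:
        # Quicker path: a frame is already installed, so setup is skipped.
        log.append(("fast path", ts.main is not None))
        return
    if publish_frame_first:
        ts.frame = frame  # old order: frame is visible before setup is done
    if gc_during_setup:
        # Models a collection inside the allocation that calls back into Python.
        enter(ts, "callback", publish_frame_first, log)
    if ts.main is None:
        ts.main = "main tasklet"  # models installing the stackless state
    ts.frame = frame  # new order: frame is published only after setup
    log.append(("setup", ts.main is not None))


old_log, new_log = [], []
enter(ToyThreadState(), "exitfunc", True, old_log, gc_during_setup=True)
enter(ToyThreadState(), "exitfunc", False, new_log, gc_during_setup=True)
```

In this model, the old order lets the nested call take the fast path while `main` is still unset, whereas the delayed order sends the nested call through the full setup first.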
Originally reported by: RMTEW FULL NAME (Bitbucket: rmtew, GitHub: rmtew)
(originally reported in Trac by @akruis on 2013-01-24 16:13:06)
Setup
I'm working with Stackless Python version 2.7, compiled from a Mercurial sandbox. Current changeset: f34947c81d3e+ (2.7-slp)
OS: Windows 7, Compiler VS 2008 professional, build target is x86 release with optimisation turned off.
Testcase
My test code is huge, confidential and the crash disappears if I make small modifications. The crash happens in about 1 of 5 test runs.
Details
I'm fairly confident that I can explain and fix the problem. Look at the call stack:
Call Stack (innermost frame first)
This line varies between test runs. The arguments on the stack usually don't match the code location.
func: co_filename "f:\fg2\eclipsews\fg2py\arch\win32\libexec\lib\atexit.py", co_name "_run_exitfuncs"
IMHO the crash is caused by the interpreter recursion
slp_run_tasklet() -> initialize_main_and_current() -> tasklet_new() -> PyTasklet_New() -> PyType_GenericAlloc() -> _PyObject_GC_Malloc() -> collect_generations() -> collect() -> handle_weakrefs() -> PyObject_CallFunctionObjArgs() -> ...
If I disable the garbage collector in initialize_main_and_current() during the execution of tasklet_new(), the crash does not occur (see attached patch).
Open questions:
I can't answer these questions, because my understanding of the internal workings of ceval.c is limited.
Could anybody please review the patch? Is there a better way to disable the GC? Unfortunately there is no C-API for gc.isenabled(), gc.disable() and gc.enable().
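For comparison, the same save/disable/restore pattern is easy to express at the Python level with the `gc` module. The sketch below is only an analogue of the idea, not the attached C patch; `gc_paused` and `allocate_tracked_object` are hypothetical names, the latter standing in for `tasklet_new()`.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Disable automatic collection for a block, then restore the old state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


def allocate_tracked_object():
    # Hypothetical stand-in for tasklet_new(): allocates a GC-tracked object.
    return []


before = gc.isenabled()
with gc_paused():
    inside = gc.isenabled()  # False: no automatic collection can start here
    obj = allocate_tracked_object()
after = gc.isenabled()       # same value as `before`
```

Restoring the previous state, rather than enabling unconditionally, keeps the guard from switching the collector on for a program that had disabled it on purpose.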