This repository was archived by the owner on Feb 13, 2025. It is now read-only.

Crash during shutdown with patch #8

Closed
ghost opened this issue Nov 29, 2013 · 12 comments

Comments

ghost commented Nov 29, 2013

Originally reported by: RMTEW FULL NAME (Bitbucket: rmtew, GitHub: rmtew)


(originally reported in Trac by @akruis on 2013-01-24 16:13:06)

Setup

I'm working with Stackless Python version 2.7, compiled from a Mercurial sandbox. Current changeset: f34947c81d3e+ (2.7-slp)
OS: Windows 7, compiler VS 2008 Professional, build target is x86 release with optimisation turned off.

Testcase

My test code is huge, confidential and the crash disappears if I make small modifications. The crash happens in about 1 of 5 test runs.

Details

  • The windows error message is always: "Unhandled exception at 0x77df15de (ntdll.dll) in _fg2python.exe: 0xC0000005: Access violation reading location 0x00000018."
  • The crash does not occur with versions before changeset 74135:ac70790fa499
  • The crash occurs with every version that includes 74135:ac70790fa499
  • The code does not involve tasklet switching.
  • The crash does not occur if I make one of the following modifications:
    • disable atexit processing
    • call gc.collect() within atexit
    • call stackless.enable_softswitch(False) during program startup
  • I can't reproduce the crash on Linux 64-bit
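The second workaround in the list above (running gc.collect() within atexit) can be applied from pure Python. The report doesn't include the author's code, so this is just a minimal sketch of one way to set it up; nothing Stackless-specific is assumed:

```python
import atexit
import gc

# Force a full collection when the interpreter shuts down, so that no
# collection is still pending while exit processing tears things down.
# atexit runs handlers in reverse registration order, so a handler
# registered last runs before handlers registered earlier.
atexit.register(gc.collect)
```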

I'm fairly confident that I can explain and fix the problem. Look at the call stack:

Call Stack (innermost frame first)

ntdll.dll!_ZwRaiseException@12()  + 0x12 bytes	
ntdll.dll!_ZwRaiseException@12()  + 0x12 bytes	

python27.dll!_string_tailmatch()  Line 2900 + 0x1c bytes	C

This line varies between test runs. The arguments on the stack usually don't match the code location.

python27.dll!PyEval_EvalFrameEx_slp(_frame * f=0x0318d8c0, int throwflag=0, _object * retval=0x1e320030) Line 964 + 0x11 bytes C
frame: co_filename "f:\fg2\eclipsews\fg2py\arch\win32\bin..\libexec\lib\linecache.py", co_name "updatecache" f_lasti=72

python27.dll!PyEval_EvalFrame_value(_frame * f=0x0328a040, int throwflag=0, _object * retval=0x1e320030)  Line 3271 + 0x1a bytes	C
python27.dll!PyEval_EvalFrameEx_slp(_frame * f=0x0328a040, int throwflag=0, _object * retval=0x1e320030)  Line 964 + 0x11 bytes	C
python27.dll!PyEval_EvalFrame_value(_frame * f=0x03289ec8, int throwflag=0, _object * retval=0x1e320030)  Line 3271 + 0x1a bytes	C
python27.dll!PyEval_EvalFrameEx_slp(_frame * f=0x03289ec8, int throwflag=0, _object * retval=0x1e320030)  Line 964 + 0x11 bytes	C
python27.dll!PyEval_EvalFrame_value(_frame * f=0x031c74e8, int throwflag=0, _object * retval=0x1e320030)  Line 3271 + 0x1a bytes	C
python27.dll!PyEval_EvalFrameEx_slp(_frame * f=0x031c74e8, int throwflag=0, _object * retval=0x1e320030)  Line 964 + 0x11 bytes	C
python27.dll!PyEval_EvalFrame_value(_frame * f=0x03289d50, int throwflag=0, _object * retval=0x1e320030)  Line 3271 + 0x1a bytes	C
python27.dll!PyEval_EvalFrameEx_slp(_frame * f=0x03289d50, int throwflag=0, _object * retval=0x1e320030)  Line 964 + 0x11 bytes	C
python27.dll!PyEval_EvalFrame_value(_frame * f=0x03289bd8, int throwflag=0, _object * retval=0x1e320030)  Line 3271 + 0x1a bytes	C
python27.dll!PyEval_EvalFrameEx_slp(_frame * f=0x03289bd8, int throwflag=0, _object * retval=0x1e320030)  Line 964 + 0x11 bytes	C
python27.dll!slp_eval_frame_newstack(_frame * f=0x03289bd8, int exc=0, _object * retval=0x1e320030)  Line 470 + 0x11 bytes	C
python27.dll!PyEval_EvalFrameEx_slp(_frame * f=0x03289bd8, int throwflag=0, _object * retval=0x1e320030)  Line 910 + 0x11 bytes	C
python27.dll!slp_frame_dispatch(_frame * f=0x03289bd8, _frame * stopframe=0x031d7f48, int exc=0, _object * retval=0x1e320030)  Line 737 + 0x16 bytes	C
python27.dll!PyEval_EvalCodeEx(PyCodeObject * co=0x023d23c8, _object * globals=0x02435a50, _object * locals=0x00000000, _object * * args=0x03249adc, int argcount=1, _object * * kws=0x00000000, int kwcount=0, _object * * defs=0x00000000, int defcount=0, _object * closure=0x00000000)  Line 3561 + 0x16 bytes	C
python27.dll!function_call(_object * func=0x024429b0, _object * arg=0x03249ad0, _object * kw=0x00000000)  Line 542 + 0x3a bytes	C
python27.dll!PyObject_Call(_object * func=0x024429b0, _object * arg=0x03249ad0, _object * kw=0x00000000)  Line 2539 + 0x3e bytes	C
python27.dll!PyObject_CallFunctionObjArgs(_object * callable=0x024429b0, ...)  Line 2786 + 0xf bytes	C
python27.dll!handle_weakrefs(_gc_head * unreachable=0x1e34c9f0, _gc_head * old=0x1e2f0d38)  Line 752 + 0xf bytes	C
python27.dll!collect(int generation=0)  Line 1025 + 0xe bytes	C
python27.dll!collect_generations()  Line 1097 + 0x9 bytes	C
python27.dll!_PyObject_GC_Malloc(unsigned int basicsize=44)  Line 1559	C
python27.dll!PyType_GenericAlloc(_typeobject * type=0x00397f70, int nitems=0)  Line 754 + 0x9 bytes	C
python27.dll!PyTasklet_New(_typeobject * type=0x00397f70, _object * func=0x00000000)  Line 218 + 0x13 bytes	C
python27.dll!tasklet_new(_typeobject * type=0x00397f70, _object * args=0x01d58030, _object * kwds=0x00000000)  Line 282 + 0xd bytes	C
python27.dll!initialize_main_and_current()  Line 1028 + 0x1c bytes	C
python27.dll!slp_run_tasklet()  Line 1231 + 0xe bytes	C
python27.dll!slp_eval_frame(_frame * f=0x031d7f48)  Line 313 + 0x5 bytes	C
python27.dll!climb_stack_and_eval_frame(_frame * f=0x031d7f48)  Line 274 + 0x9 bytes	C
python27.dll!slp_eval_frame(_frame * f=0x031d7f48)  Line 303 + 0x9 bytes	C
python27.dll!PyEval_EvalCodeEx(PyCodeObject * co=0x023a4848, _object * globals=0x01fddae0, _object * locals=0x00000000, _object * * args=0x01d5803c, int argcount=0, _object * * kws=0x00000000, int kwcount=0, _object * * defs=0x00000000, int defcount=0, _object * closure=0x00000000)  Line 3564 + 0x9 bytes	C
python27.dll!function_call(_object * func=0x023b20b0, _object * arg=0x01d58030, _object * kw=0x00000000)  Line 542 + 0x3a bytes	C
python27.dll!PyObject_Call(_object * func=0x023b20b0, _object * arg=0x01d58030, _object * kw=0x00000000)  Line 2539 + 0x3e bytes	C
python27.dll!PyEval_CallObjectWithKeywords(_object * func=0x023b20b0, _object * arg=0x01d58030, _object * kw=0x00000000)  Line 4219 + 0x11 bytes	C

func: co_filename "f:\fg2\eclipsews\fg2py\arch\win32\libexec\lib\atexit.py", co_name "_run_exitfuncs"

python27.dll!call_sys_exitfunc()  Line 1778 + 0xd bytes	C
python27.dll!Py_Finalize()  Line 433	C
python27.dll!Py_Main(int argc=3, char * * argv=0x00391b10)  Line 683	C
_fg2python.exe!__tmainCRTStartup()  Line 586 + 0x17 bytes	C
kernel32.dll!76b233aa() 	
[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]	
ntdll.dll!___RtlUserThreadStart@8()  + 0x27 bytes	
ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes	

IMHO the crash is caused by the interpreter recursion:

slp_run_tasklet() -> initialize_main_and_current() -> tasklet_new() -> PyTasklet_New() -> PyType_GenericAlloc() -> _PyObject_GC_Malloc() -> collect_generations() -> collect() -> handle_weakrefs() -> PyObject_CallFunctionObjArgs() -> ...

If I disable the garbage collector in initialize_main_and_current() during the execution of tasklet_new(), the crash does not occur (see attached patch).

Open questions:

  • Why did the bug not occur with builds prior to changeset 74135:ac70790fa499?
  • What is the exact mechanism of the access violation? Where is the location 0x00000018 coming from?
  • Why didn't I observe similar issues on Linux 64-bit?

I can't answer these questions, because my understanding of the internal workings of ceval.c is limited.

Could anybody please review the patch? Is there a better way to disable the GC? Unfortunately there is no C API for gc.isenabled(), gc.disable() and gc.enable().
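In the absence of a C-level API, the guard the patch needs can at least be expressed in Python terms. The following is a minimal sketch of the save/disable/restore pattern; the helper name `run_with_gc_disabled` is hypothetical, not part of the attached patch:

```python
import gc

def run_with_gc_disabled(fn, *args):
    """Run fn with the cyclic GC temporarily disabled, restoring the
    previous state afterwards. This mirrors the isenabled/disable/enable
    dance the C patch would have to perform around tasklet_new()."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        return fn(*args)
    finally:
        if was_enabled:
            gc.enable()
```

The C side would have to do the equivalent by hand, e.g. by calling into the gc module via its Python-level functions.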



ghost commented Nov 29, 2013

Original comment by RMTEW FULL NAME (Bitbucket: rmtew, GitHub: rmtew):


@akruis on 2013-01-28 08:32:19 said:

Hi Kristján, many thanks for the patch.

A single remaining question: was it possible to trigger this bug by creating new threads? If so, this could explain some crashes that I have seen every now and then. Unfortunately I was never able to reproduce or even debug them.

ghost commented Nov 29, 2013


@krisvale on 2013-01-25 06:46:41 said:

This is the kind of problem that is very dependent on circumstance. For it to happen, a GC collection has to be triggered at this particular point, and it has to clear out objects with weak references.
My guess is that there is no particular reason why this didn't happen prior to the changeset in question, other than pure chance.

However, I don't actually see any recursion in the above call stack. initialize_main_and_current hasn't done anything at this point. In fact, it has most likely already been recursively entered and left, because slp_eval_frame is on the stack too.

I think there is nothing wrong with recursion happening at this point. It can happen pretty much anywhere when there are allocations and this point is not special in any way.

I think we need to have a look at the assembly code at the place of the crash to determine its exact nature.

ghost commented Nov 29, 2013


@krisvale on 2013-01-28 09:00:03 said:

Yes, that is a definite possibility.
This problem would occur whenever you call into Python from "outside" and GC or other activity occurs in the critical window between setting tstate->frame and actually initializing the main tasklet. This is a good point, and it makes it all the more important that this fix be applied, for example to the stackless build that we are using, since I use threads with stackless as well.

ghost commented Nov 29, 2013


@akruis on 2013-01-25 09:03:06 said:

I totally agree that the bug depends on the garbage collection at this very special point and that it is indeed very unlikely to happen at this point.

But if I understand the code, then there is an important effect: if the garbage collection happens at this special place, tstate->st.main is still NULL when Python code executes during garbage collection. I wonder if that is OK?

ghost commented Nov 29, 2013


@akruis on 2013-04-24 16:37:34 said:

Fix is in changeset [2a5f59973ea4].

ghost commented Nov 29, 2013


@akruis on 2013-01-25 14:27:00 said:

I was able to write a simple test case:

  • It fills generation 0 of the garbage collector
  • It uses a weak reference callback and a sys.exitfunc to run code past the end of the script

Usage: f:\fg2\stackless\stackless\PCbuild\python.exe -B -s -S -u crash.py NUMBER
where NUMBER is an integer adjustment to the GC threshold for generation 0

Expected output for NUMBER = 0, -1, -2

GC collection runs prior to sys.exitfunc

$ python.exe -B -s -S -u crash.py 0
using adjustment  0
main tasklet:  <stackless.tasklet object at 0x01C678B0>
End of module: Count0 701, threshold0 700

exitfunc: start
  main tasklet:  <stackless.tasklet object at 0x01C676B0>
  GC count (700, 0, 0) , threshold0 700
  reduce the GC adjustment
exitfunc: end

wrCallback: start
  main tasklet:  <stackless.tasklet object at 0x01C676B0>
wrCallback: end

GC collection triggered by tasklet_new in initialize_main_and_current

$ ../../stackless/stackless/PCbuild/python.exe -B -s -S -u crash.py -1
using adjustment  -1
main tasklet:  <stackless.tasklet object at 0x004878B0>
End of module: Count0 701, threshold0 699

wrCallback: start

Here you get a crash.

GC collection runs after sys.exitfunc

$ python.exe -B -s -S -u crash.py -2
using adjustment  -2
main tasklet:  <stackless.tasklet object at 0x01D878B0>
End of module: Count0 701, threshold0 698

wrCallback: start
  main tasklet:  <stackless.tasklet object at 0x01D876B0>
wrCallback: end

exitfunc: start
  main tasklet:  <stackless.tasklet object at 0x01D876B0>
  GC count (1, 1, 0) , threshold0 698
  increase the GC adjustment
exitfunc: end
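The mechanism the reproducer leans on, a garbage collection invoking a weakref callback and thereby running arbitrary Python code at an unexpected point, can be shown in isolation with a few lines of plain Python. This is a minimal sketch, independent of Stackless:

```python
import gc
import weakref

events = []

class Node:
    pass

a, b = Node(), Node()
a.partner, b.partner = b, a        # reference cycle: only the cyclic GC can free it
wr = weakref.ref(a, lambda ref: events.append("callback ran"))

del a, b                           # refcounts stay nonzero because of the cycle
assert events == []                # the callback has not fired yet
gc.collect()                       # the collector clears the cycle and, in
                                   # handle_weakrefs(), calls back into Python
assert events == ["callback ran"]
```

This is exactly the path visible in the crash stack above: _PyObject_GC_Malloc -> collect -> handle_weakrefs -> PyObject_CallFunctionObjArgs.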

ghost commented Nov 29, 2013


@ctismer on 2013-01-25 12:13:11 said:

This is a bit hairy, indeed.

By the way,

"""Make sure that the tasklet's "atomic" flag inhibits thread switching, so that
it can be used when synchronizing tasklets from different threads."""

74135:ac70790fa499

I don't get this yet. Why do we need such a feature? Synchronizing should happen during a channel action, nothing else. Doing more than that is not something I think should be supported.
What is the use case?

ghost commented Nov 29, 2013


@krisvale on 2013-01-26 11:36:46 said:

Christian, the use case for atomic is, for example, using "atomic" to create short critical sections, such as a lock. If atomic does not inhibit voluntary thread switching, then a multi-threaded program would need a global "GIL" in addition to atomic, so that code like this could be safe:

def acquire_lock(self):
  old = stackless.setatomic(1)
  if self.free:
    self.free = False
  else:
    self.channel.receive()
  stackless.setatomic(old)

Before this change, you would have to write something like:

tlock = threading.Lock()
def acquire_lock(self):
  old = stackless.setatomic(1)
  tlock.acquire()
  if self.free:
    self.free = False
    tlock.release()
  else:
    tlock.release()  # must not hold the lock while blocked
    self.channel.receive()
  stackless.setatomic(old)

Notice how we must also release the lock before blocking, which is inconvenient.
So, in fact, the atomic flag becomes useless, we might as well have a GIL called tlock.

With this change, you can use channel-based locks to synchronize tasklets, no matter on what thread they are running. If a tasklet's atomic flag is set, the stackless scheduler won't interrupt it, nor will it be interrupted by voluntary thread switching in ceval.

ghost commented Nov 29, 2013


@krisvale on 2013-01-26 11:41:54 said:

Anselm, I think tstate->st.main is not NULL, because initialize_main_and_current has been called again as a result of the callbacks executing, this time without triggering a GC, and has successfully set up st.main before returning. But the original invocation of initialize_main_and_current is still on the stack. Something like this:

initialize_main_and_current
gc.collect
gc.clear_weakrefs
call_callback
initialize_main_and_current  #--- this invocation actually succeeds

and then

initialize_main_and_current
gc.collect
gc.clear_weakrefs
call_callback
our_python_code_executes

I'll look at the repro cases later tonight.

ghost commented Nov 29, 2013


@ctismer on 2013-01-26 13:36:49 said:

Replying to [comment:5 krisvale]:

Hi Kristján,

Ah, I see. I was first afraid that preemption was used, but you are very right: the atomic flag makes a lot of sense because it acts as our tasklet-local GIL.


Yes, great. I was thinking the wrong way. This optimises and simplifies, perfect!

Local Interpreter Lock, Tasklet Interpreter Lock -- TIL, LIL, :-) LOL

By the way, I reduced the boredom a bit by disabling the captcha for authenticated users, if that's OK. But many thanks to Felix Schwarz, who wrote the plugin! He is actually my colleague and I didn't know about it...

ghost commented Nov 29, 2013


@krisvale on 2013-01-26 16:40:46 said:

So, I have identified the issue. In fact, init_main_and_current is not entered recursively, as should have been expected, because the original frame is still in ts->frame, and therefore a quicker path is taken which skips the stackless tests.
This is a general problem and I'm looking at making a proper fix for it.
Thanks, Anselm, for the detailed bug report and repro case.

ghost commented Nov 29, 2013


@krisvale on 2013-01-26 17:00:23 said:

Here is a suggested patch:

diff --git a/Stackless/core/stackless_impl.h b/Stackless/core/stackless_impl.h
--- a/Stackless/core/stackless_impl.h
+++ b/Stackless/core/stackless_impl.h
@@ -441,7 +441,7 @@
 
 /* tasklet startup */
 
-PyAPI_FUNC(PyObject *) slp_run_tasklet(void);
+PyAPI_FUNC(PyObject *) slp_run_tasklet(PyFrameObject *f);
 
 /* handy abbrevations */
 
diff --git a/Stackless/core/stacklesseval.c b/Stackless/core/stacklesseval.c
--- a/Stackless/core/stacklesseval.c
+++ b/Stackless/core/stacklesseval.c
@@ -302,7 +302,6 @@
         if (stackref > ts->st.cstack_base)
             return climb_stack_and_eval_frame(f);
 
-        ts->frame = f;
         returning = make_initial_stub();
         if (returning < 0)
             return NULL;
@@ -310,7 +309,7 @@
          * if this is the original call that just created the stub.
          * This can be useful for debugging
          */
-        return slp_run_tasklet();
+        return slp_run_tasklet(f);
     }
     Py_INCREF(Py_None);
     return slp_frame_dispatch(f, fprev, 0, Py_None);
diff --git a/Stackless/module/scheduling.c b/Stackless/module/scheduling.c
--- a/Stackless/module/scheduling.c
+++ b/Stackless/module/scheduling.c
@@ -1223,7 +1223,7 @@
 */
 
 PyObject *
-slp_run_tasklet(void)
+slp_run_tasklet(PyFrameObject *f)
 {
     PyThreadState *ts = PyThreadState_GET();
     PyObject *retval;
@@ -1232,6 +1232,7 @@
         ts->frame = NULL;
         return NULL;
     }
+    ts->frame = f;
 
     TASKLET_CLAIMVAL(ts->st.current, &retval);

The change delays putting the frame into tstate->frame until the stackless state has been successfully installed. As a result, memory-allocation callbacks that happen during this initial state work correctly even if they recurse into the interpreter.

This issue was closed.