Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_audit_subinterpreter crashes with tracerefs enabled #125286

Open
nascheme opened this issue Oct 10, 2024 · 2 comments
Open

test_audit_subinterpreter crashes with tracerefs enabled #125286

nascheme opened this issue Oct 10, 2024 · 2 comments
Labels
topic-subinterpreters type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@nascheme
Copy link
Member

nascheme commented Oct 10, 2024

Crash report

What happened?

To run test: Programs/_testembed test_audit_subinterpreter.

The program crashes with an assert failure:

_testembed: ../Objects/object.c:196: _PyRefchain_Remove: Assertion `value == REFCHAIN_VALUE' failed.

Stack trace:

#5  0x00007fbd32045eb2 in __GI___assert_fail (
    assertion=assertion@entry=0x55f8c78ecded "value == REFCHAIN_VALUE", 
    file=file@entry=0x55f8c78ecd93 "../Objects/object.c", line=line@entry=196, 
    function=function@entry=0x55f8c78edba0 <__PRETTY_FUNCTION__.13> "_PyRefchain_Remove")
    at ./assert/assert.c:101
#6  0x000055f8c76c9999 in _PyRefchain_Remove (interp=<optimized out>, obj=obj@entry=0x7fbd3183d710)
    at ../Objects/object.c:196
#7  0x000055f8c76cb6c3 in _Py_ForgetReference (op=op@entry=0x7fbd3183d710) at ../Objects/object.c:2548
#8  0x000055f8c76ca91d in _Py_Dealloc (op=op@entry=0x7fbd3183d710) at ../Objects/object.c:2923
#9  0x000055f8c76b2ad4 in Py_DECREF (filename=filename@entry=0x55f8c78b4f72 "../Include/refcount.h", 
    lineno=lineno@entry=476, op=0x7fbd3183d710) at ../Include/refcount.h:367
#10 0x000055f8c76b2af3 in Py_XDECREF (op=<optimized out>) at ../Include/refcount.h:476
#11 0x000055f8c76b2c0f in dictkeys_decref (interp=interp@entry=0x7fbd31f6e020, 
    dk=dk@entry=0x55f8c9b29030, use_qsbr=use_qsbr@entry=false) at ../Objects/dictobject.c:459
#12 0x000055f8c76bca24 in dict_dealloc (self=0x7fbd31765070) at ../Objects/dictobject.c:3187
#13 0x000055f8c76ca923 in _Py_Dealloc (op=op@entry=0x7fbd31765070) at ../Objects/object.c:2925
#14 0x000055f8c76c67ab in Py_DECREF (
    filename=filename@entry=0x55f8c78ec613 "../Objects/moduleobject.c", lineno=lineno@entry=1118, 
    op=0x7fbd31765070) at ../Include/refcount.h:367
#15 0x000055f8c76c6bdb in module_clear (self=0x7fbd31764110) at ../Objects/moduleobject.c:1118
#16 0x000055f8c77e604a in delete_garbage (tstate=tstate@entry=0x7fbd31fa4d18, 
    gcstate=gcstate@entry=0x7fbd31f6fd08, collectable=collectable@entry=0x7ffd93cac500, 
    old=old@entry=0x7fbd31f6fd50) at ../Python/gc.c:1126
#17 0x000055f8c77e62f7 in gc_collect_region (tstate=tstate@entry=0x7fbd31fa4d18, 
    from=from@entry=0x7fbd31f6fd50, to=to@entry=0x7fbd31f6fd50, untrack=untrack@entry=3, 
    stats=stats@entry=0x7ffd93cac5e0) at ../Python/gc.c:1585
#18 0x000055f8c77e6790 in gc_collect_full (tstate=tstate@entry=0x7fbd31fa4d18, 
    stats=stats@entry=0x7ffd93cac5e0) at ../Python/gc.c:1498
#19 0x000055f8c77e7070 in _PyGC_Collect (tstate=0x7fbd31fa4d18, generation=generation@entry=2, 
    reason=reason@entry=_Py_GC_REASON_SHUTDOWN) at ../Python/gc.c:1862
#20 0x000055f8c77e70ce in _PyGC_CollectNoFail (tstate=tstate@entry=0x7fbd31fa4d18)
    at ../Python/gc.c:1903
#21 0x000055f8c7818408 in finalize_modules (tstate=tstate@entry=0x7fbd31fa4d18)
    at ../Python/pylifecycle.c:1716
#22 0x000055f8c781e9f4 in Py_EndInterpreter (tstate=tstate@entry=0x7fbd31fa4d18)
    at ../Python/pylifecycle.c:2409
#23 0x000055f8c781ec3a in finalize_subinterpreters () at ../Python/pylifecycle.c:2484
#24 0x000055f8c781edc3 in _Py_Finalize (runtime=runtime@entry=0x55f8c7b1e000 <_PyRuntime>)
    at ../Python/pylifecycle.c:2088
#25 0x000055f8c781ef34 in Py_Finalize () at ../Python/pylifecycle.c:2216
#26 0x000055f8c75e90ec in test_audit_subinterpreter () at ../Programs/_testembed.c:1431
#27 0x000055f8c75ece41 in main (argc=<optimized out>, argv=0x7ffd93cac888)
    at ../Programs/_testembed.c:2520

The object is a string that has been interned but not immortalized. Likely it is shared between sub-interpreters, using basic single-phase init and is therefore shared between multiple sub-interpreters.

(rr) p *op
$3 = {{ob_refcnt = 0, ob_refcnt_split = {0, 0}}, ob_type = 0x55f8c7b00c40 <PyUnicode_Type>}
(rr) p op
$4 = 'time_ns'

The value of _PyRefchain_IsTraced() is false since it is ref-traced in a different interpreter.

Quick and dirty fix for the crash:

--- a/Objects/object.c
+++ b/Objects/object.c
@@ -2537,6 +2537,13 @@ _Py_ForgetReference(PyObject *op)
 
     PyInterpreterState *interp = _PyInterpreterState_GET();
 
+    if (!_PyRefchain_IsTraced(interp, op) &&
+        PyUnicode_Check(op) &&
+        PyUnicode_CHECK_INTERNED(op) &&
+        interp != _PyInterpreterState_Main()) {
+        return;
+    }
+
 #ifdef SLOW_UNREF_CHECK
     if (!_PyRefchain_Get(interp, op)) {
         /* Not found */

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

No response

Linked PRs

@nascheme nascheme added the type-crash A hard crash of the interpreter, possibly with a core dump label Oct 10, 2024
@nascheme
Copy link
Member Author

This bug was introduced by gh-124865.

My analysis of the issue is that the tracerefs debugging logic gets confused when objects are shared between sub-interpreters. E.g. _PyRefchain_Trace() and _Py_ForgetReference() kind of implicitly assume that objects belong to a single interpreter (interp->object_state.refchain is what stores the trace and it's per interpreter). Objects can be shared when the reload_singlephase_extension() m_size == -1 case occurs. That makes the objects used by the extension become shared between all sub-interpreters importing that extension.

I'm not sure why interned strings are the only thing causing trace-refs failures. I would guess that other objects should also cause an issue since _Py_ForgetReference() will fail if they are traced in a different interpreter. It could be worth constructing a test case that triggers that problem (if possible).

@nascheme
Copy link
Member Author

Indeed, it does seem possible to crash with shared objects other than strings. You just have to initialize in the right way and then finalize in the right order.

https://github.com/nascheme/cpython/tree/test_single_phase_shared

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-subinterpreters type-crash A hard crash of the interpreter, possibly with a core dump
Projects
Status: Todo
Development

No branches or pull requests

2 participants