Skip to content

Data race in compile_template in sre.c #129983

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
colesbury opened this issue Feb 10, 2025 · 9 comments
Closed

Data race in compile_template in sre.c #129983

colesbury opened this issue Feb 10, 2025 · 9 comments
Labels
extension-modules C modules in the Modules dir topic-free-threading topic-regex type-bug An unexpected behavior, bug, or error

Comments

@colesbury
Copy link
Contributor

colesbury commented Feb 10, 2025

Bug report

Concurrent accesses to module_state->compile_template in the free threaded build

cpython/Modules/_sre/sre.c

Lines 1169 to 1177 in 1feaecc

/* delegate to Python code */
PyObject *func = module_state->compile_template;
if (func == NULL) {
func = PyImport_ImportModuleAttrString("re", "_compile_template");
if (func == NULL) {
return NULL;
}
Py_XSETREF(module_state->compile_template, func);
}

Linked PRs

@colesbury colesbury added topic-free-threading type-bug An unexpected behavior, bug, or error labels Feb 10, 2025
@picnixz picnixz added extension-modules C modules in the Modules dir topic-regex labels Feb 10, 2025
@tom-pytel
Copy link
Contributor

This fixes the issue. Let me know if you want me to send up. Or if you prefer it can be done with mutex (but uglier).

$ git diff main
diff --git a/Modules/_sre/sre.c b/Modules/_sre/sre.c
index 0d8d4843d33..64ef6b202e5 100644
--- a/Modules/_sre/sre.c
+++ b/Modules/_sre/sre.c
@@ -1167,13 +1167,21 @@ compile_template(_sremodulestate *module_state,
                  PatternObject *pattern, PyObject *template)
 {
     /* delegate to Python code */
-    PyObject *func = module_state->compile_template;
+    PyObject *func = FT_ATOMIC_LOAD_PTR_RELAXED(module_state->compile_template);
     if (func == NULL) {
         func = PyImport_ImportModuleAttrString("re", "_compile_template");
         if (func == NULL) {
             return NULL;
         }
+#ifdef Py_GIL_DISABLED
+        PyObject *other_func = NULL;
+        if (!_Py_atomic_compare_exchange_ptr(&module_state->compile_template, &other_func, func))  {
+            Py_DECREF(func);
+            func = other_func;
+        }
+#else
         Py_XSETREF(module_state->compile_template, func);
+#endif
     }

     PyObject *args[] = {(PyObject *)pattern, template};

Reproducer:

import re
import threading


def test(b):
    b.wait()
    re.sub(r"(\d+)", r"\1kg", "Weight: 50")


def check(funcs, *args):
    barrier = threading.Barrier(len(funcs))
    thrds = []

    for func in funcs:
        thrd = threading.Thread(target=func, args=(barrier, *args))

        thrds.append(thrd)
        thrd.start()

    for thrd in thrds:
        thrd.join()


if __name__ == "__main__":
    while True:
        check([test] * 40)

Also, while testing this I ran into this error exactly once and no more, though I don't think it has anything to do with this particular issue or fix:

ThreadSanitizer:DEADLYSIGNAL
==470981==ERROR: ThreadSanitizer: SEGV on unknown address 0x7f5cda2d0320 (pc 0x7f5d1447ed0f bp 0x00000000000d sp 0x7f5c221bf980 T471055)
==470981==The signal is caused by a WRITE memory access.
    #0 unsigned long long func_cas<unsigned long long>(unsigned long long volatile*, unsigned long long, unsigned long long) ../../../../src/libsanitizer/tsan/tsan_interface_atomic.cpp:97 (libtsan.so.0+0x7ed0f)
    #1 AtomicCAS<long long unsigned int> ../../../../src/libsanitizer/tsan/tsan_interface_atomic.cpp:422 (libtsan.so.0+0x7ed0f)
    #2 __tsan_atomic64_compare_exchange_strong ../../../../src/libsanitizer/tsan/tsan_interface_atomic.cpp:786 (libtsan.so.0+0x86ac2)
    #3 _Py_atomic_compare_exchange_ssize Include/cpython/pyatomic_gcc.h:130 (python+0x2235f2)
    #4 _Py_TryIncRefShared Include/internal/pycore_object.h:582 (python+0x2235f2)
    #5 _Py_TryIncrefCompare Include/internal/pycore_object.h:602 (python+0x2235f2)
    #6 _Py_TryXGetRef Include/internal/pycore_object.h:652 (python+0x22d0df)
    #7 _Py_dict_lookup_threadsafe Objects/dictobject.c:1543 (python+0x22d0df)
    #8 dict_subscript Objects/dictobject.c:3392 (python+0x22d65b)
    #9 PyObject_GetItem Objects/abstract.c:158 (python+0x14ca14)
    #10 _PyEval_EvalFrameDefault Python/generated_cases.c.h:62 (python+0x41b7c2)
    #11 _PyEval_EvalFrame Include/internal/pycore_ceval.h:116 (python+0x46f61d)
    #12 _PyEval_Vector Python/ceval.c:1725 (python+0x46f61d)
    #13 _PyFunction_Vectorcall Objects/call.c:413 (python+0x1838dc)
    #14 _PyObject_VectorcallTstate Include/internal/pycore_call.h:167 (python+0x18b5e3)
    #15 method_vectorcall Objects/classobject.c:72 (python+0x18b5e3)
    #16 _PyVectorcall_Call Objects/call.c:273 (python+0x187428)
    #17 _PyObject_Call Objects/call.c:348 (python+0x187956)
    #18 PyObject_Call Objects/call.c:373 (python+0x1879db)
    #19 thread_run Modules/_threadmodule.c:354 (python+0x6669ab)
    #20 pythread_wrapper Python/thread_pthread.h:242 (python+0x57bf0d)
    #21 __tsan_thread_start_func ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:959 (libtsan.so.0+0x2e5ff)
    #22 start_thread nptl/pthread_create.c:442 (libc.so.6+0x94ac2)
    #23 <null> <null> (libc.so.6+0x12684f)

ThreadSanitizer can not provide additional info.
SUMMARY: ThreadSanitizer: SEGV ../../../../src/libsanitizer/tsan/tsan_interface_atomic.cpp:97 in unsigned long long func_cas<unsigned long long>(unsigned long long volatile*, unsigned long long, unsigned long long)
==470981==ABORTING

@colesbury
Copy link
Contributor Author

Thanks - pease send a PR!

@kumaraditya303
Copy link
Contributor

I think the much simpler solution would be to initialize compile_template in module exec in this case sre_exec, then there will not be any thread safety issues as module exec is exclusively executed first before any code.

@tom-pytel
Copy link
Contributor

I think the much simpler solution would be to initialize compile_template in module exec in this case sre_exec

The code in compile_template() is a lazy import because re imports _sre, which is why you can't initialize compile_template in sre_exec. You can however import re in sre_exec and access it in compile_template(), do you want this one?

$ git diff main
diff --git a/Modules/_sre/sre.c b/Modules/_sre/sre.c
index 0d8d4843d33..c2b1be957f5 100644
--- a/Modules/_sre/sre.c
+++ b/Modules/_sre/sre.c
@@ -376,7 +376,7 @@ typedef struct {
     PyTypeObject *Match_Type;
     PyTypeObject *Scanner_Type;
     PyTypeObject *Template_Type;
-    PyObject *compile_template;  // reference to re._compile_template
+    PyObject *re_module;
 } _sremodulestate;
 
 static _sremodulestate *
@@ -1167,17 +1167,13 @@ compile_template(_sremodulestate *module_state,
                  PatternObject *pattern, PyObject *template)
 {
     /* delegate to Python code */
-    PyObject *func = module_state->compile_template;
+    PyObject *func = PyObject_GetAttrString(module_state->re_module, "_compile_template");
     if (func == NULL) {
-        func = PyImport_ImportModuleAttrString("re", "_compile_template");
-        if (func == NULL) {
-            return NULL;
-        }
-        Py_XSETREF(module_state->compile_template, func);
+        return NULL;
     }
-
     PyObject *args[] = {(PyObject *)pattern, template};
     PyObject *result = PyObject_Vectorcall(func, args, 2, NULL);
+    Py_DECREF(func);
 
     if (result == NULL && PyErr_ExceptionMatches(PyExc_TypeError)) {
         /* If the replacement string is unhashable (e.g. bytearray),
@@ -3342,7 +3338,7 @@ sre_traverse(PyObject *module, visitproc visit, void *arg)
     Py_VISIT(state->Match_Type);
     Py_VISIT(state->Scanner_Type);
     Py_VISIT(state->Template_Type);
-    Py_VISIT(state->compile_template);
+    Py_VISIT(state->re_module);
 
     return 0;
 }
@@ -3356,7 +3352,7 @@ sre_clear(PyObject *module)
     Py_CLEAR(state->Match_Type);
     Py_CLEAR(state->Scanner_Type);
     Py_CLEAR(state->Template_Type);
-    Py_CLEAR(state->compile_template);
+    Py_CLEAR(state->re_module);
 
     return 0;
 }
@@ -3409,6 +3405,12 @@ sre_exec(PyObject *m)
         goto error;
     }
 
+    PyObject *re = PyImport_ImportModule("re");
+    if (re == NULL) {
+        goto error;

@kumaraditya303
Copy link
Contributor

kumaraditya303 commented Feb 11, 2025

The code in compile_template() is a lazy import because re imports _sre, which is why you can't initialize compile_template in sre_exec. You can however import re in sre_exec and access it in compile_template(), do you want this one?

Ah, we had similar issue in asyncio as well with circular import, it was solved by delaying the import of C impl later, probably we can do something similar here like

// Must be done after types are added to avoid a circular dependency

@colesbury
Copy link
Contributor Author

Although I generally prefer initializing module state in the exec function, I'm hesitant to delay or reorder the imports here and would rather just use the atomics.

@kumaraditya303
Copy link
Contributor

The following seems to fix it:

From e1ef31d9417fe9247603fdc66f2acc5cecfbb074 Mon Sep 17 00:00:00 2001
From: Kumar Aditya <kumaraditya@python.org>
Date: Tue, 11 Feb 2025 15:55:55 +0000
Subject: [PATCH] fix re

---
 Lib/re/__init__.py | 16 +++++++++-------
 Modules/_sre/sre.c | 14 +++++++-------
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/Lib/re/__init__.py b/Lib/re/__init__.py
index 7e8abbf6ffe..9773c643065 100644
--- a/Lib/re/__init__.py
+++ b/Lib/re/__init__.py
@@ -123,8 +123,16 @@
 """
 
 import enum
-from . import _compiler, _parser
 import functools
+
+_MAXCACHE = 512
+@functools.lru_cache(_MAXCACHE)
+def _compile_template(pattern, repl):
+    # internal: compile replacement pattern
+    return _sre.template(pattern, _parser.parse_template(repl, pattern))
+
+
+from . import _compiler, _parser
 import _sre
 
 
@@ -323,7 +331,6 @@ def escape(pattern):
 # _cache uses the LRU policy which has better hit rate.
 _cache = {}  # LRU
 _cache2 = {}  # FIFO
-_MAXCACHE = 512
 _MAXCACHE2 = 256
 assert _MAXCACHE2 < _MAXCACHE
 
@@ -371,11 +378,6 @@ def _compile(pattern, flags):
     _cache2[key] = p
     return p
 
-@functools.lru_cache(_MAXCACHE)
-def _compile_template(pattern, repl):
-    # internal: compile replacement pattern
-    return _sre.template(pattern, _parser.parse_template(repl, pattern))
-
 # register myself for pickling
 
 import copyreg
diff --git a/Modules/_sre/sre.c b/Modules/_sre/sre.c
index 0d8d4843d33..a2b5f82d8b9 100644
--- a/Modules/_sre/sre.c
+++ b/Modules/_sre/sre.c
@@ -1168,13 +1168,7 @@ compile_template(_sremodulestate *module_state,
 {
     /* delegate to Python code */
     PyObject *func = module_state->compile_template;
-    if (func == NULL) {
-        func = PyImport_ImportModuleAttrString("re", "_compile_template");
-        if (func == NULL) {
-            return NULL;
-        }
-        Py_XSETREF(module_state->compile_template, func);
-    }
+    assert(func != NULL);
 
     PyObject *args[] = {(PyObject *)pattern, template};
     PyObject *result = PyObject_Vectorcall(func, args, 2, NULL);
@@ -3409,6 +3403,12 @@ sre_exec(PyObject *m)
         goto error;
     }
 
+    state->compile_template = PyImport_ImportModuleAttrString("re", "_compile_template");
+
+    if (state->compile_template == NULL) {
+        goto error;
+    }
+
     return 0;
 

It seems that we need to just import _sre after defining _compile_template to avoid circular import.

@tom-pytel
Copy link
Contributor

It seems that we need to just import _sre after defining _compile_template to avoid circular import.

A bit fragile?

@colesbury
Copy link
Contributor Author

I'd be a lot more confident backporting the atomic change to 3.13 than reordering functions and imports in re.

tom-pytel added a commit to tom-pytel/cpython that referenced this issue Feb 11, 2025
kumaraditya303 pushed a commit to kumaraditya303/cpython that referenced this issue Feb 12, 2025
kumaraditya303 added a commit that referenced this issue Feb 12, 2025
gh-129983: fix data race in compile_template in sre.c (#130015)

(cherry picked from commit 3cf68cd)

Co-authored-by: Tomasz Pytel <tompytel@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension-modules C modules in the Modules dir topic-free-threading topic-regex type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants