random crashes after upgrade to 3.9.12 #452

Closed
wilson3q opened this issue Jan 23, 2024 · 16 comments

Comments

@wilson3q

This is from system dmesg output:

[Fri Jan 19 10:41:06 2024] python3[3421008]: segfault at 7fe28bd24000 ip 00007fe296824bde sp 00007ffdd5db46f8 error 4 in orjson.cpython-312-x86_64-linux-gnu.so[7fe2967fe000+2f000]
[Fri Jan 19 10:41:06 2024] Code: 66 66 66 2e 0f 1f 84 00 00 00 00 00 4c 01 c0 4c 01 c6 49 f7 d0 4c 01 c2 4c 89 10 4c 01 c8 48 ff c6 48 85 d2 0f 84 dd 02 00 00 <c5> fe 6f 1e c5 fe 7f 18 c5 e5 74 e0 c5 e5 74 e9 c5 d5 eb e4 c5 e5

Not sure whether other people are seeing similar issues.
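For anyone decoding the dmesg line: on x86-64 Linux, the number after "error" is the hardware page-fault error code. Per Intel's documentation, bit 0 is set for a protection violation on a present page (clear means the page was not present), bit 1 is set for a write (clear for a read), and bit 2 is set when the access came from user mode. A minimal sketch that decodes only those three bits:

```python
# Decode the low bits of an x86 page-fault error code, as printed after
# "error" in the dmesg segfault lines (e.g. "error 4"). Only bits 0-2 are decoded.
PF_PRESENT = 1 << 0  # set: protection violation on a present page; clear: page not present
PF_WRITE = 1 << 1    # set: the faulting access was a write; clear: a read
PF_USER = 1 << 2     # set: the access was made in user mode


def decode_pf_error(code: int) -> dict:
    """Return which of the three basic page-fault conditions the code encodes."""
    return {
        "page_present": bool(code & PF_PRESENT),
        "write": bool(code & PF_WRITE),
        "user_mode": bool(code & PF_USER),
    }


print(decode_pf_error(4))
# {'page_present': False, 'write': False, 'user_mode': True}
```

Read this way, "error 4" is a user-mode read from a page that is not mapped, which is what a load running off the end of a buffer into an unmapped page would produce. The instruction marked <c5> in the Code: dump begins c5 fe 6f 1e, which decodes as vmovdqu ymm3, [rsi], an unaligned 32-byte AVX load.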

@edouardpoitras

I'm also running into this issue with 3.9.12. Reverting to 3.9.10 fixed it.

It segfaults randomly, at different points in my test suite. It seems to be related to the numpy option (OPT_SERIALIZE_NUMPY).

tests/... Fatal Python error: Segmentation fault

Current thread 0x00007ebb93c28b80 (most recent call first):
  ...
  File "/opt/venv/lib/python3.12/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/opt/venv/lib/python3.12/site-packages/_pytest/python.py", line 1792 in runtest
  File "/opt/venv/lib/python3.12/site-packages/_pytest/runner.py", line 169 in pytest_runtest_call
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/opt/venv/lib/python3.12/site-packages/_pytest/runner.py", line 262 in <lambda>
  File "/opt/venv/lib/python3.12/site-packages/_pytest/runner.py", line 341 in from_call
  File "/opt/venv/lib/python3.12/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
  File "/opt/venv/lib/python3.12/site-packages/_pytest/runner.py", line 222 in call_and_report
  File "/opt/venv/lib/python3.12/site-packages/_pytest/runner.py", line 133 in runtestprotocol
  File "/opt/venv/lib/python3.12/site-packages/_pytest/runner.py", line 114 in pytest_runtest_protocol
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/opt/venv/lib/python3.12/site-packages/_pytest/main.py", line 350 in pytest_runtestloop
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/opt/venv/lib/python3.12/site-packages/_pytest/main.py", line 325 in _main
  File "/opt/venv/lib/python3.12/site-packages/_pytest/main.py", line 271 in wrap_session
  File "/opt/venv/lib/python3.12/site-packages/_pytest/main.py", line 318 in pytest_cmdline_main
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/opt/venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/opt/venv/lib/python3.12/site-packages/_pytest/config/__init__.py", line 169 in main
  File "/opt/venv/lib/python3.12/site-packages/_pytest/config/__init__.py", line 192 in console_main
  File "/opt/venv/lib/python3.12/site-packages/pytest/__main__.py", line 5 in <module>
  File "<frozen runpy>", line 88 in _run_code
  File "<frozen runpy>", line 198 in _run_module_as_main

Extension modules: markupsafe._speedups, confluent_kafka.cimpl, recordclass._dataobject, recordclass._litelist, recordclass._litetuple, charset_normalizer.md, guppy.sets.setsc, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, guppy.heapy.heapyc (total: 21)

@Zaczero

Zaczero commented Jan 23, 2024

I have also noticed the crashes, and reverting to 3.9.10 fixed the issue. I tested 3.9.11 and 3.9.12, and both behave similarly. The crash happens randomly when running this kind of code:

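import pathlib
import orjson
import yaml
for path in sorted(pathlib.Path("data").glob("*.yaml")):  # ~200 files; "data" is a placeholder path
    # each file contains a highly nested dict of just dicts and strings (no numpy, no numbers, no dates)
    data = yaml.safe_load(path.read_text())
    buffer = orjson.dumps(data, option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS)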

Sometimes it crashes, sometimes not.

@ahankinson

Just chiming in here to say that I'm also seeing these issues, on both Mac and Linux. My Mac produced a crash report, and this seems to be the relevant section.

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   Python [64584]

VM Region Info: 0x106c00000 is not in any region.  Bytes after previous region: 1  Bytes before following region: 114688
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      MALLOC_TINY                 106b00000-106c00000    [ 1024K] rw-/rwx SM=PRV  
--->  GAP OF 0x1c000 BYTES
      __TEXT                      106c1c000-106d2c000    [ 1088K] r-x/rwx SM=COW  ...311-darwin.so

Kernel Triage:
VM - (arg = 0x3) mach_vm_allocate_kernel failed within call to vm_map_enter


Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	       0x18da4b11c __pthread_kill + 8
1   libsystem_pthread.dylib       	       0x18da82cc0 pthread_kill + 288
2   libsystem_c.dylib             	       0x18d95b57c raise + 32
3   Python                        	       0x1052c9a00 faulthandler_fatal_error + 448
4   libsystem_platform.dylib      	       0x18dab1a24 _sigtramp + 56
5   orjson.cpython-311-darwin.so  	       0x106499bb8 0x106490000 + 39864
6   orjson.cpython-311-darwin.so  	       0x1064990cc 0x106490000 + 37068
7   orjson.cpython-311-darwin.so  	       0x10649ce40 0x106490000 + 52800
8   orjson.cpython-311-darwin.so  	       0x10649eb40 0x106490000 + 60224
9   orjson.cpython-311-darwin.so  	       0x10649fe60 dumps + 612
10  Python                        	       0x10525eb84 _PyEval_EvalFrameDefault + 46716
11  Python                        	       0x105262070 _PyEval_Vector + 116
12  Python                        	       0x10525f7c4 _PyEval_EvalFrameDefault + 49852
13  Python                        	       0x105262070 _PyEval_Vector + 116
14  Python                        	       0x10525f7c4 _PyEval_EvalFrameDefault + 49852
15  Python                        	       0x1052528c4 PyEval_EvalCode + 168
16  Python                        	       0x1052a93f0 run_eval_code_obj + 84
17  Python                        	       0x1052a9354 run_mod + 112
18  Python                        	       0x1052ab790 PyRun_StringFlags + 112
19  Python                        	       0x1052ab6d8 PyRun_SimpleStringFlags + 64
20  Python                        	       0x1052c46d4 pymain_run_command + 144
21  Python                        	       0x1052c41a8 Py_RunMain + 228
22  Python                        	       0x1052c54c0 Py_BytesMain + 40
23  dyld                          	       0x18d709058 start + 2224

@davidmanzanares

Hi, I'm also experiencing random segfaults. I've tried to find a minimal reproducible example, but the behavior looks non-deterministic.

I've managed to detect it with Valgrind though:

==1180649== Invalid read of size 32
==1180649==    at 0x6AFFA8B: ??? (in /home/david/code/.venv311/lib/python3.11/site-packages/orjson/orjson.cpython-311-x86_64-linux-gnu.so)
==1180649==    by 0x6AFDFB2: ??? (in /home/david/code/.venv311/lib/python3.11/site-packages/orjson/orjson.cpython-311-x86_64-linux-gnu.so)
==1180649==    by 0x1FFEFFEFAF: ???
==1180649==  Address 0x84d6f3ff0 is in a rw- anonymous segment
==1180649== 

@CJSmith-0141

I'm also experiencing this with an orjson.dumps call inside a FastAPI application: https://github.com/tiangolo/fastapi/blob/92feb735317996ef81763da370efa92c61a6d925/fastapi/responses.py#L46

@CJSmith-0141

As best I can tell, a40f58b is the commit that introduced the issue: it enabled code that used to be disabled by default.

@PasaOpasen

PasaOpasen commented Jan 28, 2024

I ran into a similar issue and tracked it down after several days of searching:

[ 7452.698006] python[63069]: segfault at 7f51b335f000 ip 00007f524b6218ae sp 00007fffd26adfb8 error 4 in orjson.cpython-38-x86_64-linux-gnu.so[7f524b5fa000+30000] likely on CPU 4 (core 0, socket 0)
[ 7452.720881] Code: 66 66 66 2e 0f 1f 84 00 00 00 00 00 4c 01 c0 4c 01 c6 49 f7 d0 4c 01 c2 4c 89 10 4c 01 c8 48 ff c6 48 85 d2 0f 84 dd 02 00 00 <c5> fe 6f 1e c5 fe 7f 18 c5 e5 74 e0 c5 e5 74 e9 c5 d5 eb e4 c5 e5

The same happens with 3.9.11.

@ZentixUA

Same here:

uvicorn[3257]: segfault at 7f8063274000 ip 00007f80a45e3abe sp 00007ffd23351e78 error 4 in orjson.cpython-310-x86_64-linux-gnu.so[7f80a45bd000+2f000]

@harmant

harmant commented Jan 31, 2024

We see the same random segfaults after upgrading orjson from 3.9.10 to 3.9.12.

python -VV:
Python 3.9.2 (default, Feb 28 2021, 17:03:44) [GCC 10.2.1 20210110]

alexmv added a commit to alexmv/zulip that referenced this issue Feb 5, 2024
Versions 3.9.11 and 3.9.12 are susceptible to random segfaults:
- ijl/orjson#452
timabbott pushed a commit to zulip/zulip that referenced this issue Feb 5, 2024
Versions 3.9.11 and 3.9.12 are susceptible to random segfaults:
- ijl/orjson#452
@andersk
Contributor

andersk commented Feb 5, 2024

This is a bit of a stab in the dark, but from commit 5205258:

while nb > 0 {
    let v = StrVector::from_slice(core::slice::from_raw_parts(sptr, STRIDE));

We know from the termination of the previous while loop that nb < STRIDE, and this load is not aligned, so what’s to stop it from overreading the end of the source allocation?

This theory is consistent with all of the reported segfault addresses being at the beginning of a page.
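The page-boundary observation can be checked against the faulting addresses quoted earlier in the thread (three dmesg reports and the macOS crash report). A small sketch, assuming 4 KiB pages:

```python
# Faulting addresses quoted in the reports in this thread.
PAGE_SIZE = 4096  # assumption: 4 KiB pages

fault_addrs = {
    "wilson3q": 0x7FE28BD24000,    # dmesg, orjson 3.9.12
    "PasaOpasen": 0x7F51B335F000,  # dmesg
    "ZentixUA": 0x7F8063274000,    # dmesg
    "ahankinson": 0x106C00000,     # macOS crash report
}

for reporter, addr in fault_addrs.items():
    # addr % PAGE_SIZE is the offset within the page; 0 means the first byte of a page.
    print(f"{reporter:>12}: {addr:#x} -> page offset {addr % PAGE_SIZE}")
```

Every offset printed is 0. In the macOS report, 0x106c00000 is also exactly the end of the MALLOC_TINY region 106b00000-106c00000, with an unmapped gap after it, which is where a read running past the last allocation in that region would land.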

@andersk
Contributor

andersk commented Feb 5, 2024

A test case that doesn’t segfault but makes Valgrind angry:

$ valgrind python -c 'import orjson; orjson.dumps((b"\n" + b"x" * 4046).decode())'
==50092== Memcheck, a memory error detector
==50092== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==50092== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==50092== Command: python -c import\ orjson;\ orjson.dumps((b"\\n"\ +\ b"x"\ *\ 4046).decode())
==50092== 
==50092== Invalid read of size 16
==50092==    at 0x12DAA988: orjson::serialize::writer::simd::format_escaped_str_impl_128 (simd.rs:0)
==50092==    by 0x12DA85C9: format_escaped_str<&mut orjson::serialize::writer::byteswriter::BytesWriter> (json.rs:578)
==50092==    by 0x12DA85C9: serialize_str<&mut orjson::serialize::writer::byteswriter::BytesWriter, orjson::serialize::writer::formatter::CompactFormatter> (json.rs:165)
==50092==    by 0x12DA85C9: <orjson::serialize::per_type::unicode::StrSerializer as serde::ser::Serialize>::serialize (unicode.rs:29)
==50092==    by 0x12DACB7A: to_writer<&mut orjson::serialize::writer::byteswriter::BytesWriter, orjson::serialize::serializer::PyObjectSerializer> (json.rs:605)
==50092==    by 0x12DACB7A: serialize (serializer.rs:25)
==50092==    by 0x12DACB7A: dumps (lib.rs:354)
==50092==    by 0x49BC251: cfunction_vectorcall_FASTCALL_KEYWORDS (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4AB22C2: PyObject_Vectorcall (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x494D8DC: _PyEval_EvalFrameDefault (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4B8256B: _PyEval_Vector.constprop.0 (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4B82709: PyEval_EvalCode (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4BAD42F: run_mod (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4BCE6FC: PyRun_SimpleStringFlags (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4BCE9B4: Py_RunMain (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4FB50CD: (below main) (in /nix/store/7jiqcrg061xi5clniy7z5pvkc4jiaqav-glibc-2.38-27/lib/libc.so.6)
==50092==  Address 0x13e203a1 is 4,081 bytes inside a block of size 4,096 alloc'd
==50092==    at 0x484276B: malloc (in /nix/store/1iai1iry6zw0fn4b2rnb93yx4vgpd9bi-valgrind-3.22.0/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==50092==    by 0x4981DBF: _PyObject_Malloc (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x49EF4DB: PyUnicode_New.part.0 (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x49B0DDF: unicode_decode_utf8 (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4A981F1: method_vectorcall_FASTCALL_KEYWORDS (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4AB22C2: PyObject_Vectorcall (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x494D8DC: _PyEval_EvalFrameDefault (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4B8256B: _PyEval_Vector.constprop.0 (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4B82709: PyEval_EvalCode (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4BAD42F: run_mod (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4BCE6FC: PyRun_SimpleStringFlags (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4BCE9B4: Py_RunMain (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092== 
==50092== 
==50092== HEAP SUMMARY:
==50092==     in use at exit: 620,813 bytes in 215 blocks
==50092==   total heap usage: 6,016 allocs, 5,801 frees, 10,140,991 bytes allocated
==50092== 
==50092== LEAK SUMMARY:
==50092==    definitely lost: 0 bytes in 0 blocks
==50092==    indirectly lost: 0 bytes in 0 blocks
==50092==      possibly lost: 0 bytes in 0 blocks
==50092==    still reachable: 620,813 bytes in 215 blocks
==50092==         suppressed: 0 bytes in 0 blocks
==50092== Rerun with --leak-check=full to see details of leaked memory
==50092== 
==50092== For lists of detected and suppressed errors, rerun with: -s
==50092== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
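The numbers in this trace line up with a full-width load on the tail of the string. A worked sketch of the arithmetic, under assumptions not stated in the thread: a 64-bit CPython 3.11 build, where a compact ASCII str is a 48-byte header followed by the characters and a NUL terminator (sys.getsizeof("") is 49 on such a build); a 16-byte stride, matching "Invalid read of size 16"; and a scan that resumes at index 1 after the leading "\n" is escaped:

```python
STRIDE = 16          # size of the invalid read reported by Valgrind
HEADER = 48          # assumed sizeof(PyASCIIObject) on 64-bit CPython 3.11
n_chars = 1 + 4046   # b"\n" + b"x" * 4046

# The allocation holds the header, the characters, and a NUL terminator.
block_size = HEADER + n_chars + 1
print(block_size)  # 4096, the "block of size 4,096" in the trace

# Assumed: scanning restarts at index 1, just after the escaped "\n".
start = 1
full_chunks, tail = divmod(n_chars - start, STRIDE)
tail_offset = HEADER + start + full_chunks * STRIDE
print(full_chunks, tail, tail_offset)  # 252 14 4081, matching "4,081 bytes inside"

# The tail load still reads a full STRIDE bytes.
end_of_read = tail_offset + STRIDE
print(end_of_read - block_size)  # 1 byte past the end of the allocation
```

Of the 16 bytes loaded, 14 are string data, 1 is the NUL terminator, and 1 lies outside the allocation. That is why the overread here is harmless in practice unless the block happens to end at an unmapped page.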

andersk added a commit to andersk/orjson that referenced this issue Feb 5, 2024
Fixes ijl#452, probably.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
@wilson3q
Author

wilson3q commented Feb 7, 2024

Just an update: the latest version seems to have fixed the segfault issue; at least, I haven't observed any segfaults since upgrading to 3.9.13 two days ago.

@andersk
Contributor

andersk commented Feb 7, 2024

I suspect 3.9.13 reduced the probability of the issue since 58a8bd3 decreased the maximum overread from 31 bytes to 15 bytes, but it’s not eliminated. The Valgrind trace I posted above is from 3.9.13.
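Why a smaller stride only shrinks the problem: if the tail is handled with one more full STRIDE-byte load once fewer than STRIDE bytes remain, the bytes read past the end are STRIDE - nb for 1 <= nb < STRIDE, so the worst case is STRIDE - 1. A simplified model of that pattern (not orjson's actual code):

```python
def overread_bytes(length: int, stride: int) -> int:
    """Bytes read past the end of a `length`-byte buffer by a scanner that
    consumes full `stride`-byte chunks and then issues one more full-width
    load for any remaining tail."""
    tail = length % stride
    return stride - tail if tail else 0


for stride in (32, 16):
    worst = max(overread_bytes(n, stride) for n in range(1, 10_000))
    print(f"stride {stride}: worst-case overread {worst} bytes")
# stride 32: worst-case overread 31 bytes
# stride 16: worst-case overread 15 bytes
```

Halving the stride halves the worst case, but every length that is not a multiple of the stride still reads past the buffer.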

@wilson3q
Author

wilson3q commented Feb 7, 2024

Yep, I agree. I hope your pull request gets merged soon so we no longer have the buffer overread.

ijl pushed a commit that referenced this issue Feb 14, 2024
Fixes #452, probably.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
ijl closed this as completed in 29884e6 on Feb 14, 2024
@andersk
Contributor

andersk commented Feb 14, 2024

I see that in 528220f you’ve added a check for whether the pointer crosses a page boundary and reinstated the buffer overread if it doesn’t. But a buffer overread is undefined behavior whether or not a page boundary is crossed. Valgrind still flags the same error with my above test case in 3.9.14.

Undefined behavior will cause problems eventually, even if the symptom isn’t as obvious as a segfault, and it might seem like it’s working until there’s a clever new compiler optimization relying on an incorrect invariant inferred from the contract that the program has broken. We need to avoid all UB, not just paper over its observed symptoms.

Moreover, it can’t possibly be saving significant time here, given this code is only there for handling the end of the buffer.

Please fully remove the buffer overread.
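To make the distinction concrete: a guard of the form "the load does not cross into the next page" only rules out touching an unmapped page; it still lets the load read bytes the allocation does not own, which is what Valgrind reports. A common UB-free alternative is to copy only the remaining valid bytes into a zero-padded STRIDE-byte scratch chunk and ignore hits in the padding lanes. A Python model of both ideas, assuming 4 KiB pages; the function names are hypothetical and this is not the code in 528220f:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages
STRIDE = 16       # width of the vector load in the Valgrind report


def load_stays_in_page(addr: int, stride: int = STRIDE) -> bool:
    """Hypothetical guard: True if [addr, addr + stride) lies within one page."""
    return addr % PAGE_SIZE + stride <= PAGE_SIZE


def bytes_past_allocation(remaining: int, stride: int = STRIDE) -> int:
    """Bytes a full-width load touches beyond the `remaining` bytes the allocation owns."""
    return max(0, stride - remaining)


# Valgrind: "Address 0x13e203a1 is 4,081 bytes inside a block of size 4,096".
addr = 0x13E203A1
print(load_stays_in_page(addr))            # True: the page guard permits the load
print(bytes_past_allocation(4096 - 4081))  # 1: yet it still reads outside the block


def escape_hits(chunk: bytes) -> list:
    """Scalar model of the per-lane test: quote, backslash, or control byte."""
    return [i for i, b in enumerate(chunk) if b in (0x22, 0x5C) or b < 0x20]


def scan_tail(buf: bytes, pos: int, stride: int = STRIDE) -> list:
    """UB-free tail: copy only the valid bytes into a zero-padded scratch chunk,
    then drop hits in the padding lanes (zero bytes look like control characters)."""
    valid = buf[pos:pos + stride]
    scratch = valid + bytes(stride - len(valid))
    return [pos + i for i in escape_hits(scratch) if i < len(valid)]


print(scan_tail(b"\n" + b"x" * 4046, 4033))  # []: no escapes in the 14-byte tail
print(scan_tail(b'say "hi"', 0))             # [4, 7]
```

The copy happens at most once per string and covers fewer than STRIDE bytes, which fits the point above that this path only handles the end of the buffer.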

@raineth

raineth commented Feb 16, 2024

I am still seeing crashes with 3.9.14, although less frequently. It might not be exactly the same issue mentioned upthread (my stack trace didn't make sense and I can't repro on demand yet).

andersk pushed a commit to andersk/zulip that referenced this issue Feb 16, 2024
Versions 3.9.11 and 3.9.12 are susceptible to random segfaults:
- ijl/orjson#452

(cherry picked from commit 437361d)
timabbott pushed a commit to zulip/zulip that referenced this issue Feb 16, 2024
Versions 3.9.11 and 3.9.12 are susceptible to random segfaults:
- ijl/orjson#452

(cherry picked from commit 437361d)
amaranand360 pushed a commit to amaranand360/zulip that referenced this issue Feb 17, 2024
Versions 3.9.11 and 3.9.12 are susceptible to random segfaults:
- ijl/orjson#452
mananbordia pushed a commit to mananbordia/zulip that referenced this issue Feb 27, 2024
Versions 3.9.11 and 3.9.12 are susceptible to random segfaults:
- ijl/orjson#452