random crashes after upgrade to 3.9.12 #452
I'm also running into this issue with 3.9.12. Fix was to revert to 3.9.10. Randomly seg faults at different times in my test suite. Seems to be related to the NUMPY opt.
|
I have also noticed the crashes, and reverting to 3.9.10 fixed the issue. I tested 3.9.11 and 3.9.12, and both showed similar behavior. I hit the issue randomly when running this kind of code:
for file in files:  # ~200 files
    # each file contains a highly nested dict of just dicts and strings (no numpy, no numbers, no dates)
    data = yaml.load(file)
    buffer = orjson.dumps(data, option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS)
Sometimes it crashes, sometimes not. |
Just chiming in here to say that I'm also seeing these issues, on both Mac and Linux. My Mac produced a crash report, and this seems to be the relevant section.
|
Hi, I'm also experiencing segfaults randomly. I've tried to find a minimal reproducible example, but the crash looks non-deterministic. I've managed to detect it with Valgrind though:
|
Also experiencing this on an |
As best I can tell this is the commit that introduced issues? Code that used to be disabled by default was enabled. a40f58b |
Got similar issue after several days of searching
also true for 3.9.11 |
Same guys
|
We have the same random segfaults after upgrading orjson from 3.9.10 to 3.9.12. python -VV: |
Version 3.9.11 and 3.9.12 are susceptible to random segfaults: - ijl/orjson#452
This is a bit of a stab in the dark, but from commit 5205258: orjson/src/serialize/writer/simd.rs Lines 97 to 98 in 4eb4f00
We know from the termination of the previous This theory is consistent with all of the reported segfault addresses being at the beginning of a page. |
A test case that doesn’t segfault but makes Valgrind angry:
$ valgrind python -c 'import orjson; orjson.dumps((b"\n" + b"x" * 4046).decode())'
==50092== Memcheck, a memory error detector
==50092== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==50092== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==50092== Command: python -c import\ orjson;\ orjson.dumps((b"\\n"\ +\ b"x"\ *\ 4046).decode())
==50092==
==50092== Invalid read of size 16
==50092== at 0x12DAA988: orjson::serialize::writer::simd::format_escaped_str_impl_128 (simd.rs:0)
==50092== by 0x12DA85C9: format_escaped_str<&mut orjson::serialize::writer::byteswriter::BytesWriter> (json.rs:578)
==50092== by 0x12DA85C9: serialize_str<&mut orjson::serialize::writer::byteswriter::BytesWriter, orjson::serialize::writer::formatter::CompactFormatter> (json.rs:165)
==50092== by 0x12DA85C9: <orjson::serialize::per_type::unicode::StrSerializer as serde::ser::Serialize>::serialize (unicode.rs:29)
==50092== by 0x12DACB7A: to_writer<&mut orjson::serialize::writer::byteswriter::BytesWriter, orjson::serialize::serializer::PyObjectSerializer> (json.rs:605)
==50092== by 0x12DACB7A: serialize (serializer.rs:25)
==50092== by 0x12DACB7A: dumps (lib.rs:354)
==50092== by 0x49BC251: cfunction_vectorcall_FASTCALL_KEYWORDS (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092== by 0x4AB22C2: PyObject_Vectorcall (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092== by 0x494D8DC: _PyEval_EvalFrameDefault (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092== by 0x4B8256B: _PyEval_Vector.constprop.0 (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092== by 0x4B82709: PyEval_EvalCode (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092== by 0x4BAD42F: run_mod (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092== by 0x4BCE6FC: PyRun_SimpleStringFlags (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092== by 0x4BCE9B4: Py_RunMain (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092== by 0x4FB50CD: (below main) (in /nix/store/7jiqcrg061xi5clniy7z5pvkc4jiaqav-glibc-2.38-27/lib/libc.so.6)
==50092== Address 0x13e203a1 is 4,081 bytes inside a block of size 4,096 alloc'd
==50092== at 0x484276B: malloc (in /nix/store/1iai1iry6zw0fn4b2rnb93yx4vgpd9bi-valgrind-3.22.0/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==50092== by 0x4981DBF: _PyObject_Malloc (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092== by 0x49EF4DB: PyUnicode_New.part.0 (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092== by 0x49B0DDF: unicode_decode_utf8 (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092== by 0x4A981F1: method_vectorcall_FASTCALL_KEYWORDS (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092== by 0x4AB22C2: PyObject_Vectorcall (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092== by 0x494D8DC: _PyEval_EvalFrameDefault (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092== by 0x4B8256B: _PyEval_Vector.constprop.0 (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092== by 0x4B82709: PyEval_EvalCode (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092== by 0x4BAD42F: run_mod (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092== by 0x4BCE6FC: PyRun_SimpleStringFlags (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092== by 0x4BCE9B4: Py_RunMain (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==
==50092==
==50092== HEAP SUMMARY:
==50092== in use at exit: 620,813 bytes in 215 blocks
==50092== total heap usage: 6,016 allocs, 5,801 frees, 10,140,991 bytes allocated
==50092==
==50092== LEAK SUMMARY:
==50092== definitely lost: 0 bytes in 0 blocks
==50092== indirectly lost: 0 bytes in 0 blocks
==50092== possibly lost: 0 bytes in 0 blocks
==50092== still reachable: 620,813 bytes in 215 blocks
==50092== suppressed: 0 bytes in 0 blocks
==50092== Rerun with --leak-check=full to see details of leaked memory
==50092==
==50092== For lists of detected and suppressed errors, rerun with: -s
==50092== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) |
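The numbers in the trace above can be checked by hand: a 16-byte read that starts 4,081 bytes into a 4,096-byte block ends one byte past the allocation. A minimal sketch of that arithmetic follows; the function name `overread_bytes` is hypothetical and used only for illustration.

```python
def overread_bytes(offset: int, read_size: int, block_size: int) -> int:
    """Return how many bytes of a read fall past the end of the block."""
    end = offset + read_size
    return max(0, end - block_size)


# Values taken from the Valgrind report: "Invalid read of size 16" at an
# address "4,081 bytes inside a block of size 4,096".
print(overread_bytes(4081, 16, 4096))  # 1
```

Even a single byte past the end is enough for Memcheck to flag the read, which matches the one error in the summary.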
Fixes ijl#452, probably. Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Just an update: the latest version seems to have fixed the segfault issue, or at least I have observed no segfaults since upgrading to 3.9.13 two days ago. |
I suspect 3.9.13 reduced the probability of the issue since 58a8bd3 decreased the maximum overread from 31 bytes to 15 bytes, but it’s not eliminated. The Valgrind trace I posted above is from 3.9.13. |
Yep, I agree with you. I hope your pull request gets merged soon so we no longer have the buffer overread issue. |
Fixes #452, probably. Signed-off-by: Anders Kaseorg <andersk@mit.edu>
I see that in 528220f you’ve added a check for whether the pointer crosses a page boundary and reinstated the buffer overread if it doesn’t. But a buffer overread is undefined behavior whether or not a page boundary is crossed. Valgrind still flags the same error with my above test case in 3.9.14. Undefined behavior will cause problems eventually, even if the symptom isn’t as obvious as a segfault, and it might seem like it’s working until there’s a clever new compiler optimization relying on an incorrect invariant inferred from the contract that the program has broken. We need to avoid all UB, not just paper over its observed symptoms. Moreover, it can’t possibly be saving significant time here, given this code is only there for handling the end of the buffer. Please fully remove the buffer overread. |
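One common way to handle the end of a buffer without reading past it is to copy the remaining bytes into a fixed-size scratch buffer and scan only the valid part. The sketch below models that control flow in Python. It is not orjson's code, Python cannot actually read out of bounds, and the names `CHUNK`, `needs_escape`, and `first_escape_index` are hypothetical.

```python
CHUNK = 16  # stands in for the width of a 128-bit SIMD register


def needs_escape(byte: int) -> bool:
    # JSON requires escaping control characters, the double quote, and the backslash.
    return byte < 0x20 or byte in (0x22, 0x5C)


def first_escape_index(data: bytes) -> int:
    """Return the index of the first byte that needs escaping, or len(data)."""
    pos = 0
    # Full chunks: every read stays inside `data`.
    while pos + CHUNK <= len(data):
        chunk = data[pos:pos + CHUNK]
        for i, b in enumerate(chunk):
            if needs_escape(b):
                return pos + i
        pos += CHUNK
    # Tail: copy the remaining bytes into a zero-padded scratch buffer instead
    # of reading CHUNK bytes from the source, then ignore the padding.
    tail_len = len(data) - pos
    scratch = bytearray(CHUNK)
    scratch[:tail_len] = data[pos:]
    for i in range(tail_len):
        if needs_escape(scratch[i]):
            return pos + i
    return len(data)


print(first_escape_index(b"x" * 20 + b'"'))  # 20
```

The tail copy touches at most `CHUNK - 1` bytes per string, so under this sketch the extra cost is paid once per call rather than per chunk.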
I am still seeing crashes with 3.9.14, although less frequently. It might not be exactly the same issue mentioned upthread (my stack trace didn't make sense and I can't repro on demand yet). |
Version 3.9.11 and 3.9.12 are susceptible to random segfaults: - ijl/orjson#452 (cherry picked from commit 437361d)
This is from system dmesg output:
[Fri Jan 19 10:41:06 2024] python3[3421008]: segfault at 7fe28bd24000 ip 00007fe296824bde sp 00007ffdd5db46f8 error 4 in orjson.cpython-312-x86_64-linux-gnu.so[7fe2967fe000+2f000]
[Fri Jan 19 10:41:06 2024] Code: 66 66 66 2e 0f 1f 84 00 00 00 00 00 4c 01 c0 4c 01 c6 49 f7 d0 4c 01 c2 4c 89 10 4c 01 c8 48 ff c6 48 85 d2 0f 84 dd 02 00 00 <c5> fe 6f 1e c5 fe 7f 18 c5 e5 74 e0 c5 e5 74 e9 c5 d5 eb e4 c5 e5
Not sure if other people encounter similar issues.
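The faulting address in the dmesg line above appears consistent with the page-boundary theory mentioned earlier in the thread: it is a multiple of 4096, and on x86 an `error 4` page fault decodes to a user-mode read of a page that is not present. A quick check, assuming the common 4 KiB page size (the report does not state it):

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, not stated in the report

fault_addr = 0x7FE28BD24000   # "segfault at 7fe28bd24000"
ip = 0x00007FE296824BDE       # "ip 00007fe296824bde"
module_base = 0x7FE2967FE000  # "orjson...so[7fe2967fe000+2f000]"
module_size = 0x2F000

# True when the faulting address is the first byte of a page.
page_aligned = fault_addr % PAGE_SIZE == 0

# Offset of the faulting instruction inside the mapped region of the orjson module.
ip_offset = ip - module_base

print(page_aligned)
print(hex(ip_offset), 0 <= ip_offset < module_size)
```

The instruction offset is relative to the mapped region reported by the kernel, so it may need adjusting before it can be looked up in the shared object with a symbolizer.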