segfault at exit when an exception exited stage_version context #250

Open
quarl opened this issue Jun 19, 2023 · 3 comments
Labels
question Further information is requested

@quarl
Member

quarl commented Jun 19, 2023

$ cat repro.py
import datetime
import h5py
import numpy
from   versioned_hdf5           import VersionedHDF5File

import some_deshaw_module

def f():
    path_to_write = '/tmp/foo.h5'
    timestamp = numpy.datetime64(datetime.datetime.utcnow())
    f = h5py.File(path_to_write, 'w')
    vf = VersionedHDF5File(f)
    with vf.stage_version('v1', timestamp=timestamp) as sv:
        sv.create_dataset('foo', data=numpy.arange(10))
    with vf.stage_version('v2', timestamp=timestamp) as sv:
        1/0

f()

$ ipython repro.py
ZeroDivisionError: division by zero
zsh: segmentation fault (core dumped)

If I remove 'import some_deshaw_module', or run the script with python instead of ipython, the segfault does not reproduce.
I suspect that whether the segfault happens depends on the specifics of the memory layout, so even a reproducer without some_deshaw_module might not segfault on a different computer. I've seen similar cases where even the length of $PATH affected whether code segfaulted.

Before we put more work into a more self-contained reproducer, can you see what the problem might be from inspecting the code?
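Because whether the crash happens seems to depend on memory layout, a single run says little either way. A minimal sketch for making the comparisons above less anecdotal: rerun a command many times and count how many runs die from SIGSEGV. It assumes a POSIX system, where `subprocess` reports a child killed by signal N as return code -N; `count_segfaults` is a hypothetical helper, not part of versioned_hdf5.

```python
import signal
import subprocess


def count_segfaults(cmd, runs=20):
    """Run `cmd` `runs` times and return how many runs died from SIGSEGV.

    Hypothetical helper (not part of versioned_hdf5). On POSIX systems,
    subprocess.run() reports a child killed by signal N as returncode -N.
    """
    crashes = 0
    for _ in range(runs):
        # Discard output; only the exit status matters here.
        proc = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if proc.returncode == -signal.SIGSEGV:
            crashes += 1
    return crashes
```

For example, comparing `count_segfaults(["ipython", "repro.py"])` with `count_segfaults(["python", "repro.py"])`, with and without the extra import, would show whether each change only lowers the crash rate or removes the crash entirely. A plain `ZeroDivisionError` exit (return code 1) is not counted.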

@quarl
Member Author

quarl commented Jun 19, 2023

(internal ticket: https://desflow.nyc.deshaw.com/q/PyInf/10233)

@peytondmurray peytondmurray self-assigned this Jun 19, 2023
@ArvidJB
Collaborator

ArvidJB commented Jun 20, 2023

Here's the stack trace:

(gdb) bt
#0  0x00007ffff7e900e4 in ?? ()
#1  0x00007fffee5974dc in H5P_close () from /usr/local/python/python-3.10/std/lib/libhdf5.so.310
#2  0x00007fffee5978b9 in H5P__close_list_cb () from /usr/local/python/python-3.10/std/lib/libhdf5.so.310
#3  0x00007fffee51f7a2 in H5I_dec_ref () from /usr/local/python/python-3.10/std/lib/libhdf5.so.310
#4  0x00007fffee473ed2 in H5D__virtual_reset_layout () from /usr/local/python/python-3.10/std/lib/libhdf5.so.310
#5  0x00007fffee5605bd in H5O__layout_reset () from /usr/local/python/python-3.10/std/lib/libhdf5.so.310
#6  0x00007fffee563659 in H5O_msg_reset () from /usr/local/python/python-3.10/std/lib/libhdf5.so.310
#7  0x00007fffee572fd1 in H5P__dcrt_layout_close () from /usr/local/python/python-3.10/std/lib/libhdf5.so.310
#8  0x00007fffee597566 in H5P_close () from /usr/local/python/python-3.10/std/lib/libhdf5.so.310
#9  0x00007fffee5978b9 in H5P__close_list_cb () from /usr/local/python/python-3.10/std/lib/libhdf5.so.310
#10 0x00007fffee51d959 in H5I_clear_type () from /usr/local/python/python-3.10/std/lib/libhdf5.so.310
#11 0x00007fffee59435e in H5P_term_package () from /usr/local/python/python-3.10/std/lib/libhdf5.so.310
#12 0x00007fffee3e2dcc in H5_term_library.part.0 () from /usr/local/python/python-3.10/std/lib/libhdf5.so.310
#13 0x00007ffff69341ec in __run_exit_handlers () from /lib64/libc.so.6
#14 0x00007ffff6934320 in exit () from /lib64/libc.so.6
#15 0x00007ffff7a4157b in Py_Exit (sts=1) at /usr/src/debug/deshaw-python310-3.10.4-1.el8.3.deshaw.x86_64/Python/pylifecycle.c:2862
#16 0x00007ffff7a3e67b in handle_system_exit () at /usr/src/debug/deshaw-python310-3.10.4-1.el8.3.deshaw.x86_64/Python/pythonrun.c:775
#17 0x00007ffff7a3e5fd in _PyErr_PrintEx (set_sys_last_vars=1, tstate=0x620920) at /usr/src/debug/deshaw-python310-3.10.4-1.el8.3.deshaw.x86_64/Python/pythonrun.c:785
#18 PyErr_PrintEx (set_sys_last_vars=1) at /usr/src/debug/deshaw-python310-3.10.4-1.el8.3.deshaw.x86_64/Python/pythonrun.c:880
#19 0x00007ffff7a3665a in PyErr_Print () at /usr/src/debug/deshaw-python310-3.10.4-1.el8.3.deshaw.x86_64/Python/pythonrun.c:886
#20 _PyRun_SimpleFileObject (fp=<optimized out>, filename=<optimized out>, closeit=<optimized out>, flags=0x7fffffffd418)
    at /usr/src/debug/deshaw-python310-3.10.4-1.el8.3.deshaw.x86_64/Python/pythonrun.c:462
#21 0x00007ffff7a36187 in _PyRun_AnyFileObject (fp=0x7539a0, filename='/usr/local/bin/ipython', closeit=1, flags=0x7fffffffd418)
    at /usr/src/debug/deshaw-python310-3.10.4-1.el8.3.deshaw.x86_64/Python/pythonrun.c:90
#22 0x00007ffff7a3350d in pymain_run_file_obj (skip_source_first_line=0, filename='/usr/local/bin/ipython', program_name='/usr/local/python/python-3.10/std/bin/python')
    at /usr/src/debug/deshaw-python310-3.10.4-1.el8.3.deshaw.x86_64/Modules/main.c:366
#23 pymain_run_file (config=0x604d80) at /usr/src/debug/deshaw-python310-3.10.4-1.el8.3.deshaw.x86_64/Modules/main.c:385
#24 pymain_run_python (exitcode=0x7fffffffd410) at /usr/src/debug/deshaw-python310-3.10.4-1.el8.3.deshaw.x86_64/Modules/main.c:600
#25 Py_RunMain () at /usr/src/debug/deshaw-python310-3.10.4-1.el8.3.deshaw.x86_64/Modules/main.c:679
#26 0x00007ffff7a000cd in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/deshaw-python310-3.10.4-1.el8.3.deshaw.x86_64/Modules/main.c:733
#27 0x00007ffff691dca3 in __libc_start_main () from /lib64/libc.so.6
#28 0x000000000040076e in _start ()
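In this backtrace the crash happens inside libc's `exit()`, after `Py_Exit` has run Python's own shutdown, when HDF5's termination routine (`H5_term_library`) closes property lists that are still open, one of them with a virtual layout (`H5D__virtual_reset_layout`). One unconfirmed possibility worth ruling out is that the `ZeroDivisionError` traceback keeps the locals of `f()` (`f`, `vf`, `sv`) alive, so the corresponding HDF5 objects are never released before that cleanup. The sketch below only demonstrates the general CPython behaviour using a hypothetical stand-in class `Resource`; it does not touch h5py or versioned_hdf5, and on its own it does not explain the python vs ipython difference.

```python
import gc
import weakref


class Resource:
    """Hypothetical stand-in for an object that owns a native handle."""


def stage_and_fail(refs):
    res = Resource()                # a local of this frame, like `sv` in repro.py
    refs.append(weakref.ref(res))   # track its lifetime without keeping it alive
    1 / 0                           # raise while `res` is still a live local


def demo():
    refs = []
    saved = None
    try:
        stage_and_fail(refs)
    except ZeroDivisionError as exc:
        # Holding the exception also holds exc.__traceback__, which references
        # the failing frame and therefore its locals.
        saved = exc
    gc.collect()
    alive_while_saved = refs[0]() is not None
    saved = None                    # drop the last reference to the exception
    gc.collect()
    alive_after_release = refs[0]() is not None
    return alive_while_saved, alive_after_release
```

`demo()` returns `(True, False)`: the object survives as long as the exception is held and is freed once the exception is dropped.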

@peytondmurray
Collaborator

I'm working on this now, but can't seem to reproduce the issue. Do you know of a substitute module for some_deshaw_module?

I'm also curious whether the timestamp kwarg matters. For now I'll try some other imports and see if I can get this to segfault; so far I've only gotten the ZeroDivisionError.

@peytondmurray peytondmurray added the question Further information is requested label Dec 4, 2023