Segmentation Fault #1090

Open
axsk opened this issue Jan 24, 2025 · 5 comments

axsk commented Jan 24, 2025

I keep running into segmentation faults.

They most likely occur during calls to the OpenMM Python API.
Here is the error output:

Stacktrace
[13736] signal 11 (1): Segmentation fault                                                                                                                                                                                                                  
in expression starting at none:0                                                                                                                                                                                                                           
_PyInterpreterState_GET at /usr/local/src/conda/python-3.12.8/Include/internal/pycore_pystate.h:133 [inlined]                                                                                                                                              
get_state at /usr/local/src/conda/python-3.12.8/Objects/obmalloc.c:866 [inlined]                                                                                                                                                                           
_PyObject_Free at /usr/local/src/conda/python-3.12.8/Objects/obmalloc.c:1850 [inlined]
PyObject_Free at /usr/local/src/conda/python-3.12.8/Objects/obmalloc.c:830
_buffer_info_free at /data/numerik/people/bzfsikor/conda/envs/conda_jl/lib/python3.12/site-packages/numpy/core/_multiarray_umath.cpython-312-x86_64-linux-gnu.so (unknown line)
array_dealloc at /data/numerik/people/bzfsikor/conda/envs/conda_jl/lib/python3.12/site-packages/numpy/core/_multiarray_umath.cpython-312-x86_64-linux-gnu.so (unknown line)
pydecref_ at /data/numerik/people/bzfsikor/software/julia_depot/packages/PyCall/1gn3u/src/PyCall.jl:118
pydecref at /data/numerik/people/bzfsikor/software/julia_depot/packages/PyCall/1gn3u/src/PyCall.jl:123
jfptr_pydecref_4550 at /data/numerik/people/bzfsikor/software/julia_depot/compiled/v1.11/PyCall/GkzkC_ddiUX.so (unknown line) 
run_finalizer at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/gc.c:299
jl_gc_run_finalizers_in_list at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/gc.c:389
run_finalizers at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/gc.c:435
enable_finalizers at ./gcutils.jl:161 [inlined]
unlock at ./lock.jl:178 [inlined]
macro expansion at ./lock.jl:275 [inlined]
#282 at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:2851
jfptr_YY.282_9263 at /data/numerik/people/bzfsikor/software/julia_depot/compiled/v1.11/REPL/u0gqU_dovaC.so (unknown line)
jl_apply at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
start_task at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/task.c:1202
Allocations: 775171170 (Pool: 775164164; Big: 7006); GC: 5349
fish: Job 1, 'env JULIA_HISTORY=./.history.jl…' terminated by signal SIGSEGV (Address boundary error)

I have no idea how to investigate this further.

axsk changed the title from "SegmentationFault" to "Segmentation Fault" on Jan 24, 2025

axsk commented Jan 24, 2025

And here is another one, which I run into more often. It puzzles me especially because it somehow involves CUDA as well.

Stacktrace
[89762] signal 11 (1): Segmentation fault
in expression starting at REPL[98]:1
_PyInterpreterState_GET at /usr/local/src/conda/python-3.12.8/Include/internal/pycore_pystate.h:133 [inlined]
get_gc_state at /usr/local/src/conda/python-3.12.8/Modules/gcmodule.c:134 [inlined]
PyObject_GC_Del at /usr/local/src/conda/python-3.12.8/Modules/gcmodule.c:2421
pydecref_ at /data/numerik/people/bzfsikor/software/julia_depot/packages/PyCall/1gn3u/src/PyCall.jl:118
pydecref at /data/numerik/people/bzfsikor/software/julia_depot/packages/PyCall/1gn3u/src/PyCall.jl:123
jfptr_pydecref_4550 at /data/numerik/people/bzfsikor/software/julia_depot/compiled/v1.11/PyCall/GkzkC_ddiUX.so (unknown line)
run_finalizer at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/gc.c:299
jl_gc_run_finalizers_in_list at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/gc.c:389
run_finalizers at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/gc.c:435
enable_finalizers at ./gcutils.jl:161 [inlined]
unlock at ./locks-mt.jl:68 [inlined]
popfirst! at ./task.jl:751
trypoptask at ./task.jl:992
jfptr_trypoptask_66779.1 at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
get_next_task at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/scheduler.c:377 [inlined]
ijl_task_get_next at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/scheduler.c:438
poptask at ./task.jl:1012
wait at ./task.jl:1021
#wait#731 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /data/numerik/people/bzfsikor/software/julia_depot/packages/CUDA/1kIOw/lib/cudadrv/synchronization.jl:53
synchronization_worker at /data/numerik/people/bzfsikor/software/julia_depot/packages/CUDA/1kIOw/lib/cudadrv/synchronization.jl:119
unknown function (ip: 0x7f8cd97c33b5)
jlcapi_synchronization_worker_13623 at /data/numerik/people/bzfsikor/software/julia_depot/compiled/v1.11/CUDA/oWw5k_ddiUX.so (unknown line)
unknown function (ip: 0x7f8f3d3961c3)
unknown function (ip: 0x7f8f3d41685b)
Allocations: 1455439833 (Pool: 1455431030; Big: 8803); GC: 4054
fish: Job 1, 'env JULIA_HISTORY=./.history.jl…' terminated by signal SIGSEGV (Address boundary error)

axsk commented Jan 24, 2025

I switched to a single-threaded instance and have not observed the issue since.
However, I don't make any (explicit) use of multi-threading anywhere in my code.
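
For anyone comparing setups, here is a small sketch for confirming the thread configuration of a running session. It assumes Julia 1.9 or later, where `Threads.nthreads` accepts a thread pool name. Starting Julia with `--threads=1` should report a single default thread.

```julia
# Report how many threads this Julia session was started with.
# Assumes Julia >= 1.9 (thread pools were introduced in that release).
default_threads = Threads.nthreads(:default)
interactive_threads = Threads.nthreads(:interactive)

println("default threads     = ", default_threads)
println("interactive threads = ", interactive_threads)
```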

axsk commented Mar 4, 2025

Whereas the examples above happened randomly in my training loop, I can now reproduce the problem by tab-completing a PyObject's fields:

Stacktrace
julia> a[1]                                                                                                                                                                                                                                                             
PyObject <Atom 0 (H1) of chain 0 residue 0 (ACE)>                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                        
julia> a[1].                                                                                                                                                                                                                                                            
[12951] signal 11 (1): Segmentation fault                                                                                                                                                                                                                               
in expression starting at none:0                                                                                                                                                                                                                                        
_PyInterpreterState_GET at /usr/local/src/conda/python-3.12.8/Include/internal/pycore_pystate.h:133 [inlined]
_PyType_Lookup at /usr/local/src/conda/python-3.12.8/Objects/typeobject.c:4729 [inlined]                                                                                                                                                                                
_PyObject_LookupSpecial at /usr/local/src/conda/python-3.12.8/Objects/typeobject.c:2167                                                                                                                                                                                 
_dir_object at /usr/local/src/conda/python-3.12.8/Objects/object.c:1758 [inlined]                                                                                                                                                                                       
PyObject_Dir at /usr/local/src/conda/python-3.12.8/Objects/object.c:1790                                                                                                                                                                                                
macro expansion at /data/numerik/people/bzfsikor/software/julia_depot/packages/PyCall/1gn3u/src/exception.jl:108 [inlined]                                                                                                                                              
propertynames at /data/numerik/people/bzfsikor/software/julia_depot/packages/PyCall/1gn3u/src/PyCall.jl:327                                                                                                                                                             
propertynames at ./reflection.jl:2612                                                                                                                                                                                                                                   
unknown function (ip: 0x7f671ed24e4b)                                                                                                                                                                                                                                   
complete_symbol at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/REPLCompletions.jl:208                                                                                                                              
#complete_identifiers!#57 at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/REPLCompletions.jl:1179                                                                                                                   
complete_identifiers! at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/REPLCompletions.jl:1079 [inlined]                                                                                                             
completions at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/REPLCompletions.jl:1436                                                                                                                                 
#complete_line#85 at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/REPL.jl:637                                                                                                                                       
complete_line at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/REPL.jl:634                                                                                                                                           
unknown function (ip: 0x7f68d8d8d97d)                                                                                                                                                                                                                                   
check_for_hint at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:387                                                                                                                                      
#143 at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:2527                                                                                                                                               
jl_apply at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]                                                                                                                                                                 
jl_f__call_latest at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/builtins.c:875                                                                                                                                                                
#invokelatest#2 at ./essentials.jl:1055 [inlined]                                                                                                                                                                                                                       
invokelatest at ./essentials.jl:1052 [inlined]                                                                                                                                                                                                                          
#30 at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:1711                                                                                                                                                
jfptr_YY.30_8684 at /data/numerik/people/bzfsikor/software/julia_depot/compiled/v1.11/REPL/u0gqU_dovaC.so (unknown line)
#254 at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:2614
jl_apply at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
#30 at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:1711
jfptr_YY.30_8724 at /data/numerik/people/bzfsikor/software/julia_depot/compiled/v1.11/REPL/u0gqU_dovaC.so (unknown line)
macro expansion at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:2861 [inlined]
macro expansion at ./lock.jl:273 [inlined]
#282 at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:2851
jfptr_YY.282_9263 at /data/numerik/people/bzfsikor/software/julia_depot/compiled/v1.11/REPL/u0gqU_dovaC.so (unknown line)
jl_apply at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
start_task at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/task.c:1202
Allocations: 66606445 (Pool: 66604355; Big: 2090); GC: 104
fish: Job 1, 'env JULIA_HISTORY=./.history.jl…' terminated by signal SIGSEGV (Address boundary error)

Again, this only happens with multiple threads; when starting Julia with --threads=1, it works fine.

bhawkins commented Apr 8, 2025

Here's a simple script that reliably triggers a similar segfault for me:

using PyCall
const math = pyimport("math")

println("nthreads = ", Threads.nthreads())

function foo(n, niter=1)
    x = zeros(n)
    for iter = 1:niter
        Threads.@threads for i = 1:n
            x[i] += rand()
        end
    end
    return x
end

foo(50_000, 25_000)
Output on Linux with Julia 1.11.4 and Python 3.12.9
$ julia -t 8 pycall_segfault.jl 
nthreads = 8

[4600] signal 11 (1): Segmentation fault
in expression starting at /data/bhawkins/PRH-4/NISAR_L0_PR_RRSD_055_071_D_137S_20241015T075909_20241015T075935_P00406_F_J_001/pycall_segfault.jl:16
_PyInterpreterState_GET at /usr/local/src/conda/python-3.12.9/Include/internal/pycore_pystate.h:133 [inlined]
notify_code_watchers at /usr/local/src/conda/python-3.12.9/Objects/codeobject.c:32 [inlined]
code_dealloc at /usr/local/src/conda/python-3.12.9/Objects/codeobject.c:1705
pydecref_ at /home/jovyan/.julia/packages/PyCall/1gn3u/src/PyCall.jl:118
pydecref at /home/jovyan/.julia/packages/PyCall/1gn3u/src/PyCall.jl:123
jfptr_pydecref_4441 at /home/jovyan/.julia/compiled/v1.11/PyCall/GkzkC_FmVRe.so (unknown line)
run_finalizer at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/gc.c:299
jl_gc_run_finalizers_in_list at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/gc.c:389
run_finalizers at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/gc.c:435
jl_mutex_unlock at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/julia_locks.h:80 [inlined]
ijl_process_events at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/jl_uv.c:398
ijl_task_get_next at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/scheduler.c:610
poptask at ./task.jl:1012
wait at ./task.jl:1021
task_done_hook at ./task.jl:694
jfptr_task_done_hook_66658.1 at /home/jovyan/.julia/juliaup/julia-1.11.4+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_finish_task at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/task.c:319
start_task at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/task.c:1213
Allocations: 664847 (Pool: 664810; Big: 37); GC: 1
Segmentation fault (core dumped)
Output on macOS with Julia 1.11.3 and Python 3.13.2
$ julia -t 8 pycall_segfault.jl
nthreads = 8

[60363] signal 11 (2): Segmentation fault: 11
in expression starting at /Users/bhawkins/Downloads/pycall_segfault.jl:16
code_dealloc at /opt/homebrew/Cellar/python@3.13/3.13.2/Frameworks/Python.framework/Versions/3.13/Python (unknown line)
pydecref_ at /Users/bhawkins/.julia/packages/PyCall/1gn3u/src/PyCall.jl:118
pydecref at /Users/bhawkins/.julia/packages/PyCall/1gn3u/src/PyCall.jl:123
jfptr_pydecref_4460 at /Users/bhawkins/.julia/compiled/v1.11/PyCall/GkzkC_UERWi.dylib (unknown line)
run_finalizer at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XC9YQX9HH2.0/build/default-honeycrisp-XC9YQX9HH2-0/julialang/julia-release-1-dot-11/src/gc.c:299
jl_gc_run_finalizers_in_list at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XC9YQX9HH2.0/build/default-honeycrisp-XC9YQX9HH2-0/julialang/julia-release-1-dot-11/src/gc.c:389
run_finalizers at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XC9YQX9HH2.0/build/default-honeycrisp-XC9YQX9HH2-0/julialang/julia-release-1-dot-11/src/gc.c:435
jl_mutex_unlock at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XC9YQX9HH2.0/build/default-honeycrisp-XC9YQX9HH2-0/julialang/julia-release-1-dot-11/src/./julia_locks.h:80 [inlined]
ijl_task_get_next at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XC9YQX9HH2.0/build/default-honeycrisp-XC9YQX9HH2-0/julialang/julia-release-1-dot-11/src/scheduler.c:526
poptask at ./task.jl:1012
wait at ./task.jl:1021
task_done_hook at ./task.jl:694
jfptr_task_done_hook_66909.1 at /Users/bhawkins/.julia/juliaup/julia-1.11.3+0.aarch64.apple.darwin14/lib/julia/sys.dylib (unknown line)
jl_apply at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XC9YQX9HH2.0/build/default-honeycrisp-XC9YQX9HH2-0/julialang/julia-release-1-dot-11/src/./julia.h:2157 [inlined]
jl_finish_task at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XC9YQX9HH2.0/build/default-honeycrisp-XC9YQX9HH2-0/julialang/julia-release-1-dot-11/src/task.c:319
start_task at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XC9YQX9HH2.0/build/default-honeycrisp-XC9YQX9HH2-0/julialang/julia-release-1-dot-11/src/task.c:1213
Allocations: 673857 (Pool: 673823; Big: 34); GC: 1

Segmentation fault

On my Mac this crashes about 90% of the time. If I comment out the pyimport, the probability of crashing goes down somewhat, to maybe 70%. If I comment out both of the first two lines, it doesn't crash. I'm not sure how important the function body is, but it doesn't seem to crash if I have a parallel loop with just sleep(1) in it.

bhawkins commented Apr 8, 2025

Using the pylock() suggestion here makes the crash go away. My example isn't actually calling any Python, but I guess the GC can run anyway.
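
The linked suggestion is not reproduced in this thread, so the following is only a minimal sketch of what such a helper might look like. It assumes `pylock` is a user-defined function (not part of the PyCall API) that serializes a region behind a `ReentrantLock` and pauses Julia's GC while the region runs, with the intent of reducing the chance that a `PyObject` finalizer calls `pydecref` from another thread in the meantime. The names `PYLOCK` and `result` are hypothetical.

```julia
# Hypothetical helper: the name `pylock` comes from the comment above,
# but the body is an assumption, not PyCall's API.
const PYLOCK = ReentrantLock()

function pylock(f::Function)
    lock(PYLOCK) do
        prev_gc = GC.enable(false)   # pause GC; returns the previous GC state
        try
            return f()
        finally
            GC.enable(prev_gc)       # restore the previous GC state
        end
    end
end

# Usage: wrap the region that should not be interleaved with finalizers.
result = pylock() do
    sum(rand(10))                    # stand-in for work that calls into Python
end
println("result = ", result)
```

Pausing the GC for long regions lets memory grow, so a helper like this is probably best kept around short sections. Whether it is sufficient for the training-loop crashes reported above is not established in this thread.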
