Skip to content
This repository has been archived by the owner on Jan 26, 2024. It is now read-only.

Segmentation fault in hipChildGraphNode::UpdateEventWaitLists #31

Open
FreddieWitherden opened this issue Jul 1, 2022 · 1 comment
Open

Comments

@FreddieWitherden
Copy link

FreddieWitherden commented Jul 1, 2022

Consider the following example (which is written in Python but translates 1:1 to HIP C and should be easy enough to follow) and uses hipGraphAddChildGraphNode to embed a captured graph inside of another graph:

from ctypes import c_int, c_double, c_void_p

import numpy as np

from pyfr.backends.hip.driver import HIP, HIPGraph
from pyfr.backends.hip.compiler import HIPRTC

N = 1024*4096

hip = HIP()
hip.set_device(0)

stream = hip.create_stream()

a_hp = hip.pagelocked_empty((N,), float)
b_hp = hip.pagelocked_empty((N,), float)
c_hp = hip.pagelocked_empty((N,), float)

a_hp[:] = np.random.randn(N)
b_hp[:] = np.random.randn(N)

a_cu = hip.mem_alloc(a_hp.nbytes)
b_cu = hip.mem_alloc(b_hp.nbytes)
c_cu = hip.mem_alloc(c_hp.nbytes)

hiprtc = HIPRTC()

src = '''extern "C" __global__
void add(int n, double x, double *a, double *b, double *c)
{
int i = blockIdx.x*blockDim.x + threadIdx.x;
if (i < n) c[i] = a[i] + x*b[i];
}
'''
mod = hip.load_module(hiprtc.compile('kern', src))

fun = mod.get_function('add', [c_int, c_double] + 3*[c_void_p])
params = fun.make_params((N // 128, 1, 1), (128, 1, 1))
params.set_args(N, 1.1, a_cu, b_cu, c_cu)

g = HIPGraph(hip)
n1 = g.add_memcpy(a_cu, a_hp, a_hp.nbytes)
n2 = g.add_memcpy(b_cu, b_hp, b_hp.nbytes)

# Flip me!
if True:
    stream.begin_capture()
    fun.exec_async(stream, params)
    gg = stream.end_capture()
    n3 = g.add_graph(gg, deps=[n1, n2])
else:
    n3 = g.add_kernel(params, deps=[n1, n2])

g.add_memcpy(c_hp, c_cu, c_hp.nbytes, deps=[n3])

gi = g.instantiate()
gi.launch(stream)
stream.synchronize()

print(np.allclose(c_hp, a_hp + 1.1*b_hp))

Running this under ROCm 5.2 I get a segmentation fault in:

(gdb) bt
#0  0x00007fff4b2e259c in hipChildGraphNode::UpdateEventWaitLists(std::vector<amd::Event*, std::allocator<amd::Event*> >) ()
from /opt/rocm-5.2.0/lib/libamdhip64.so
#1  0x00007fff4b2abbad in FillCommands(std::vector<std::vector<hipGraphNode*, std::allocator<hipGraphNode*> >, std::allocator<std::vector<hipGraphNode*, std::allocator<hipGraphNode*> > > >&, std::unordered_map<hipGraphNode*, std::vector<hipGraphNode*, std::allocator<hipGraphNode*> >, std::hash<hipGraphNode*>, std::equal_to<hipGraphNode*>, std::allocator<std::pair<hipGraphNode* const, std::vector<hipGraphNode*, std::allocator<hipGraphNode*> > > > >&, std::vector<hipGraphNode*, std::allocator<hipGraphNode*> >&, amd::Command*&, amd::Command*&, amd::HostQueue*) () from /opt/rocm-5.2.0/lib/libamdhip64.so
#2  0x00007fff4b2abfdf in hipGraphExec::Run(ihipStream_t*) () from /opt/rocm-5.2.0/lib/libamdhip64.so
#3  0x00007fff4b2ae87d in ihipGraphLaunch(hipGraphExec*, ihipStream_t*) () from /opt/rocm-5.2.0/lib/libamdhip64.so
#4  0x00007fff4b2ba3a7 in hipGraphLaunch () from /opt/rocm-5.2.0/lib/libamdhip64.so

Switching the if block (so we add the kernel directly to the graph) everything works as expected. As does just running the captured graph bare. Given all API commands execute successfully this indicates a bug.

Disassembling:

Dump of assembler code for function _ZN17hipChildGraphNode20UpdateEventWaitListsESt6vectorIPN3amd5EventESaIS3_EE:
0x00007fff4b2e2580 <+0>:     push   %r13
0x00007fff4b2e2582 <+2>:     push   %r12
0x00007fff4b2e2584 <+4>:     mov    %rsi,%r12
0x00007fff4b2e2587 <+7>:     push   %rbp
0x00007fff4b2e2588 <+8>:     push   %rbx
0x00007fff4b2e2589 <+9>:     sub    $0x28,%rsp
0x00007fff4b2e258d <+13>:    mov    0xa8(%rdi),%rax
0x00007fff4b2e2594 <+20>:    mov    0x8(%r12),%rdx
0x00007fff4b2e2599 <+25>:    mov    (%rsi),%rsi
=> 0x00007fff4b2e259c <+28>:    mov    (%rax),%rax

with the associated C++ being:

https://github.com/ROCm-Developer-Tools/hipamd/blob/06f64e1a53ddc9b0ca02993ff5bff95bea1f8f7f/src/hip_graph_internal.hpp#L413

@ansurya
Copy link
Contributor

ansurya commented Aug 22, 2022

Internal ticket created to track the issue .

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants