Segfault when running on Legion master #11

@manopapad

Building pagerank against current Legion master and running it on hollywood.lux results in a segfault at this location:

Thread 12 (Thread 0x7f09487f1ac0 (LWP 59394)):
#0  0x00007f0962443722 in __GI___waitpid (pid=59395, stat_loc=stat_loc@entry=0x7f09487ec988, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x00007f09623ae107 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:149
#2  0x0000000002e60108 in gasneti_bt_gdb ()
#3  0x0000000002e63a6f in gasneti_print_backtrace ()
#4  0x00000000014b025f in gasneti_defaultSignalHandler ()
#5  <signal handler called>
#6  0x00000000014cab3e in pull_init_task_impl (task=0x7f010c209f10, regions=..., ctx=0x7f010c050280, runtime=0xa2fd9e0) at pagerank_gpu.cu:231
#7  0x00000000014bbcf0 in Legion::LegionTaskWrapper::legion_task_wrapper<GraphPiece, &(pull_init_task_impl(Legion::Task const*, std::vector<Legion::PhysicalRegion, std::allocator<Legion::PhysicalRegion> > const&, Legion::Internal::TaskContext*, Legion::Runtime*))> (args=0x7f010c02e6d8, arglen=8, userdata=0x0, userlen=0, p=...) at ../legion/runtime/legion/legion.inl:20435
#8  0x0000000002834a4d in Realm::LocalTaskProcessor::execute_task (this=0xa6ac590, func_id=102, task_args=...) at ../legion/runtime/realm/proc_impl.cc:1090
#9  0x0000000002e2a556 in Realm::Task::execute_on_processor (this=0x7f010c02e560, p=...) at ../legion/runtime/realm/tasks.cc:306
#10 0x0000000002e2e22e in Realm::KernelThreadTaskScheduler::execute_task (this=0xa6a79b0, task=0x7f010c02e560) at ../legion/runtime/realm/tasks.cc:1380
#11 0x000000000292f956 in Realm::Cuda::GPUTaskScheduler<Realm::KernelThreadTaskScheduler>::execute_task (this=0xa6a79b0, task=0x7f010c02e560) at ../legion/runtime/realm/cuda/cuda_module.cc:1657
#12 0x0000000002e2d246 in Realm::ThreadedTaskScheduler::scheduler_loop (this=0xa6a79b0) at ../legion/runtime/realm/tasks.cc:1127
#13 0x0000000002e2d700 in Realm::ThreadedTaskScheduler::scheduler_loop_wlock (this=0xa6a79b0) at ../legion/runtime/realm/tasks.cc:1231
#14 0x0000000002e33c9c in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop_wlock> (obj=0xa6a79b0) at ../legion/runtime/realm/threads.inl:97
#15 0x00000000026cfbcb in Realm::KernelThread::pthread_entry (data=0x9f7ea10) at ../legion/runtime/realm/threads.cc:774
#16 0x00007f0964a826db in start_thread (arg=0x7f09487f1ac0) at pthread_create.c:463
#17 0x00007f0962480a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

It looks like the failing statement https://github.com/LuxGraph/Lux/blob/master/pagerank/pagerank_gpu.cu#L231 dereferences a GPU memory pointer directly inside the body of a GPU task variant, i.e. in code that runs on the host. The same result could probably be achieved with a cudaMemcpy.
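
For illustration, here is a minimal standalone sketch of the cudaMemcpy approach. It is not the Lux code: the helper names (read_device_element, read_back_demo), the int element type, and the sample values are all made up for this example. It only shows the general pattern of copying one element to the host instead of dereferencing a device pointer in host code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of Lux or Legion): read element `index` of a
// device array from host code. Writing d_data[index] here would dereference a
// device pointer on the host, which is invalid for memory from cudaMalloc.
static cudaError_t read_device_element(const int *d_data, size_t index, int *out) {
  return cudaMemcpy(out, d_data + index, sizeof(int), cudaMemcpyDeviceToHost);
}

// Fills a small device buffer, reads its last element back through the helper,
// and returns that value, or -1 if any CUDA call fails.
int read_back_demo() {
  const int h_src[4] = {10, 20, 30, 42};  // sample data, assumed for the demo
  int *d_data = nullptr;
  if (cudaMalloc((void **)&d_data, sizeof(h_src)) != cudaSuccess) return -1;

  int value = -1;
  cudaError_t err = cudaMemcpy(d_data, h_src, sizeof(h_src), cudaMemcpyHostToDevice);
  if (err == cudaSuccess) err = read_device_element(d_data, 3, &value);

  cudaFree(d_data);
  return (err == cudaSuccess) ? value : -1;
}

int main() {
  int v = read_back_demo();
  if (v < 0) {
    std::fprintf(stderr, "CUDA call failed\n");
    return 1;
  }
  std::printf("last element read via cudaMemcpy: %d\n", v);
  return 0;
}
```

A single-element cudaMemcpy is synchronous and slow compared to a kernel, so this only makes sense for occasional host-side reads such as setup or sanity checks, which appears to be the situation in the init task.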

Note that this crash does not happen on the Legion stable branch.

Also note that alloc_bytes needs to be changed to alloc_bytes_local for Lux to compile properly against Legion master (at https://github.com/LuxGraph/Lux/blob/master/pagerank/pagerank_gpu.cu#L272 and https://github.com/LuxGraph/Lux/blob/master/pagerank/pagerank_gpu.cu#L275).
