Hi, I wrote a communication framework for our company's self-developed GPGPU, using the IB interface of Gloo. When I use torch.utils.data.DataLoader, which forks many worker processes, I get the following error:
gloo/transport/ibverbs/pair.cc:438] wc->status == IBV_WC_SUCCESS. 5 vs 0. Send for slot 0: Work Request Flushed Error
After debugging, I found that the problem is caused by libibverbs' incomplete support for fork():
https://www.rdmamojo.com/2012/05/24/ibv_fork_init/
I think we should either prompt users of the InfiniBand interface to set the environment variable RDMA_FORK_SAFE or IBV_FORK_SAFE, or call ibv_fork_init() when initializing IB (gloo/ibverbs/device.cc), as NCCL does.
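For anyone hitting the same error, here is a minimal sketch of the environment-variable workaround on the Python side. The helper name enable_ibverbs_fork_safety is hypothetical (it is not part of Gloo or PyTorch). As far as I understand, libibverbs only checks for the variable's presence when it initializes, so this has to run before the IB transport is created and before any DataLoader workers are forked.

```python
import os

# libibverbs accepts either name; RDMA_FORK_SAFE is the one this sketch sets.
FORK_SAFE_VARS = ("RDMA_FORK_SAFE", "IBV_FORK_SAFE")


def enable_ibverbs_fork_safety(env=os.environ):
    """Hypothetical helper: ask libibverbs for fork-safe mode via the environment.

    Returns True if a variable was set here, False if the user had already
    set one of them. Assumption: this is called before libibverbs is
    initialized in this process; setting it later would have no effect.
    """
    if any(name in env for name in FORK_SAFE_VARS):
        return False  # respect whatever the user already configured
    env["RDMA_FORK_SAFE"] = "1"
    return True


if __name__ == "__main__":
    # Call this first, then initialize the Gloo/IB backend and build DataLoaders.
    changed = enable_ibverbs_fork_safety()
    print("set RDMA_FORK_SAFE" if changed else "fork-safe variable already present")
```

Exporting the variable in the shell before launching the job should be equivalent. Calling ibv_fork_init() inside Gloo itself would remove the need for users to know about it at all.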