ib_write_bw -d mlx5_0 -F -R -q 2 --use_cuda=0 <IP> - Couldn't allocate MR #126
Hi,
We are trying to test GPUDirect RDMA.
The test pods were deployed from https://github.com/Mellanox/k8s-images.
We deployed two pods:
Server pod:
root@rdma-cuda-test-pod-1:~# ib_write_bw -d mlx5_0 -F -R -q 2 --use_cuda=0
Client pod:
root@rdma-cuda-test-pod-1:~# ib_write_bw -d mlx5_0 -F -R -q 2 --use_cuda=0 192.168.111.1
initializing CUDA
Listing all CUDA devices in system:
CUDA device 0: PCIe address is 02:00
Picking device No. 0
[pid = 56, dev = 0] device name = [NVIDIA A30-8C]
creating CUDA Ctx
making it the current CUDA Ctx
cuMemAlloc() of a 262144 bytes GPU buffer
allocated GPU buffer address at 0000010013000000 pointer=0x10013000000
Couldn't allocate MR
failed to create mr
Failed to create MR
Failed to initialize RDMA contexts.
ERRNO: Bad address.
Failed to handle RDMA CM event.
ERRNO: Bad address.
Failed to connect RDMA CM events.
ERRNO: Bad address.
Segmentation fault (core dumped)
What does "Couldn't allocate MR" mean?
Thanks in advance.
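For context, "Couldn't allocate MR" is printed when ib_write_bw fails to register the CUDA buffer as an RDMA memory region (MR), and the accompanying "Bad address" (EFAULT) errno typically means the kernel could not pin the GPU memory. A common cause is that the peer-memory kernel module required for GPUDirect RDMA is not loaded on the node. The following is a minimal diagnostic sketch, assuming the module is named nvidia_peermem (newer drivers) or nv_peer_mem (older out-of-tree module):

```shell
# Check whether a GPUDirect peer-memory module is loaded on the node.
# If neither appears, ibv_reg_mr() on a cuMemAlloc'd pointer will fail
# with EFAULT ("Bad address"), which ib_write_bw reports as
# "Couldn't allocate MR".
grep -E 'nvidia_peermem|nv_peer_mem' /proc/modules \
  || echo "peer-memory module not loaded"
```

Note that inside a pod this check reflects the host kernel, since /proc/modules is not namespaced; the module must be loaded on the node itself, not in the container.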