
ib_write_bw -d mlx5_0 -F -R -q 2 --use_cuda=0 <IP> - Couldn't allocate MR #126

Open
francisguillier opened this issue Oct 4, 2021 · 3 comments

Comments

@francisguillier

Hi,

We tried to test GPUDirect RDMA.

The test pods were deployed from the images at https://github.com/Mellanox/k8s-images

We deployed two pods:

Server pod:

root@rdma-cuda-test-pod-1:~# ib_write_bw -d mlx5_0 -F -R -q 2 --use_cuda=0


* Waiting for client to connect... *

Client pod:

root@rdma-cuda-test-pod-1:~# ib_write_bw -d mlx5_0 -F -R -q 2 --use_cuda=0 192.168.111.1
initializing CUDA
Listing all CUDA devices in system:
CUDA device 0: PCIe address is 02:00

Picking device No. 0
[pid = 56, dev = 0] device name = [NVIDIA A30-8C]
creating CUDA Ctx
making it the current CUDA Ctx
cuMemAlloc() of a 262144 bytes GPU buffer
allocated GPU buffer address at 0000010013000000 pointer=0x10013000000
Couldn't allocate MR
failed to create mr
Failed to create MR
Failed to initialize RDMA contexts.
ERRNO: Bad address.
Failed to handle RDMA CM event.
ERRNO: Bad address.
Failed to connect RDMA CM events.
ERRNO: Bad address.
Segmentation fault (core dumped)

What does "Couldn't allocate MR" mean?

Thanks in advance.

@francisguillier
Author

Sorry, to provide some more context:
I am testing the GPU Operator together with the Network Operator.
nv-peermem was enabled in the GPU Operator deployment.
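
Registering a GPU buffer as an MR generally depends on a peer-memory kernel module being loaded on the node, and a "Bad address" error at that step is often seen when it is not. The following is a minimal sketch, not a definitive diagnostic, for checking this. It assumes the module appears as `nvidia_peermem` (shipped with the NVIDIA driver) or `nv_peer_mem` (the older standalone module), and that `/proc/modules` is readable from inside the pod. The function name `loaded_peer_memory_modules` is hypothetical and introduced only for illustration.

```python
# Assumption: GPUDirect RDMA peer-memory support shows up under one of
# these kernel module names in /proc/modules.
PEER_MEMORY_MODULES = ("nvidia_peermem", "nv_peer_mem")


def loaded_peer_memory_modules(proc_modules_text):
    """Return the peer-memory modules found in /proc/modules-style text.

    Each non-empty line of /proc/modules starts with the module name,
    so only the first whitespace-separated field is inspected.
    """
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in PEER_MEMORY_MODULES if name in loaded]


def read_proc_modules(path="/proc/modules"):
    """Read the kernel module list; the path is the standard Linux location."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

Calling `loaded_peer_memory_modules(read_proc_modules())` in both pods returns an empty list when neither module is loaded. In that case the MR failure is more likely a node setup problem than an `ib_write_bw` problem.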

@wangku0

wangku0 commented May 24, 2023

Hi!
Have you solved this problem yet? I have run into the same problem and would like to ask how you solved it.
Thanks in advance.

@zpkhor

zpkhor commented Jan 9, 2024


3 participants