EINVAL error on PreparedQueuePair build #24

Open
mmasque opened this issue Sep 1, 2022 · 4 comments

@mmasque

mmasque commented Sep 1, 2022

Hello,

I'm trying out the library for a project, and for now I'm running some tests using Soft-RoCE on a pair of Ubuntu 20.10 VMs.

I cloned the repo and ran the example, and I intermittently encounter the following error:
panicked at 'called Result::unwrap() on an Err value: Os { code: 22, kind: InvalidInput, message: "Invalid argument" }', ibverbs/examples/loopback.rs:16:10

The code that triggers the issue is
let qp_builder = pd.create_qp(&cq, &cq, ibverbs::ibv_qp_type::IBV_QPT_RC).build().unwrap();
in particular the call to ibv_create_qp inside .build(). I've run it through the debugger and have not been able to spot any differences between successful and unsuccessful runs.
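For anyone reproducing this, it may help to report the failure instead of unwrapping it, so that successful and failing runs can be logged side by side. The sketch below is only an illustration: describe_verbs_error is a hypothetical helper, not part of the ibverbs crate, and the numeric codes 22 (EINVAL) and 110 (ETIMEDOUT) are the Linux errno values that appear in the panic messages in this thread. It uses only std::io::Error, so it runs without an RDMA device.

```rust
use std::io;

/// Hypothetical helper (not part of the ibverbs crate): map the OS error
/// code carried by an io::Error to a short, loggable description.
/// The numeric values are Linux errno numbers (an assumption of this sketch).
fn describe_verbs_error(err: &io::Error) -> String {
    match err.raw_os_error() {
        // Code seen in this issue when ibv_create_qp fails inside .build().
        Some(22) => String::from("EINVAL (22): the call rejected one of its arguments"),
        // Code reported later in this thread during the handshake step.
        Some(110) => String::from("ETIMEDOUT (110): the operation timed out"),
        // Any other OS-level error: keep the code and the system message.
        Some(code) => format!("OS error {}: {}", code, err),
        // Errors that did not come from an OS error code.
        None => format!("non-OS error: {}", err),
    }
}

fn main() {
    // Build the same kinds of errors from raw codes, with no RDMA device needed.
    let einval = io::Error::from_raw_os_error(22);
    let timed_out = io::Error::from_raw_os_error(110);

    println!("{}", describe_verbs_error(&einval));
    println!("{}", describe_verbs_error(&timed_out));
}
```

In the example, the same helper could be applied to the Err value returned by .build() in a match or if let Err(e) branch, in place of the .unwrap() that panics.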

I would say it happens about 50% of the time, and for the rest the code runs fine. Even more strangely, I tried moving the Protection Domain memory allocation code above the queue pair builder and found that it fixed the issue, but I have no idea why:

let mut mr = pd.allocate::<u64>(2).unwrap();
mr[1] = 0x42;

let qp_builder = pd.create_qp(&cq, &cq, ibverbs::ibv_qp_type::IBV_QPT_RC).build().unwrap();

System details:

I'm using this Vagrant VM image and the rdma_rxe driver for Soft-RoCE, which I load using sudo modprobe rdma_rxe before adding a Soft-RoCE device: sudo rdma link add rxe_0 type rxe netdev eth1. I've tested the setup using ib_send_bw, which works.

Any idea what could be causing the problem?

@jonhoo
Owner

jonhoo commented Sep 3, 2022

cc @rdelfin and @daniel-noland who have had more real-world experience with this crate than I have :)

@rdelfin
Contributor

rdelfin commented Sep 4, 2022

I do remember the loopback example not quite working as written, but I'd have to test why. This might be a good opportunity to invest into fixing up the branch we're using properly to upstream.
cc @dmweis

@daniel-noland
Contributor

Hello @mmasque, sorry for the delay. I have been working on this crate again this weekend, so I may be able to hunt this bug down.

I have noticed the same behavior, especially with Soft-RoCE. I suspect that the GID index is the source of the problem, but I'm not 100% sure yet. I'll let you know what I find.

@kyrie06

kyrie06 commented Apr 15, 2024

Could you show me your ifconfig -a output, please?
I have a question: when I run the examples on a local Soft-RoCE setup, I always get the following error during the handshake:
thread 'main' panicked at ibverbs/examples/loopback.rs:19:49:
called Result::unwrap() on an Err value: Os { code: 110, kind: TimedOut, message: "Connection timed out" }
I'm confused about this.
