This repository has been archived by the owner on Nov 16, 2019. It is now read-only.

RDMA over ethernet #74

Open
markyeh opened this issue Jun 8, 2016 · 5 comments

Comments

@markyeh

markyeh commented Jun 8, 2016

Hi, folks,

Do you have any plans to support RoCE (RDMA over Converged Ethernet) devices?

I am trying to set up CaffeOnSpark in my GPU + RoCE environment, and I added a GID index to the source code for the RoCE connection.

However, ibv_reg_mr() now fails when I pass it the "data_" address.
If I replace the "data_" address with a malloc'ed address of the same size, it works.

It would be great if you could give me any suggestions about this issue.
Thanks.

@mriduljain
Contributor

We don't have any plans for RoCE, but it would be great if you can make it work. Like the InfiniBand RDMA support, RoCE support would be an awesome addition to CaffeOnSpark. I don't know much about RoCE, but I can provide whatever help you need. Let me check out the InfiniBand code tomorrow and help with what you are asking.

@mriduljain
Contributor

Since we use libibverbs, we are almost ready for RoCE too. I guess we just need to add a couple of lines for the RoCE handshake or equivalent.

@mriduljain
Contributor

@markyeh I see, you apparently already added that in your code. In the next couple of weeks I may be able to test it by switching to RoCE, if you share the code.

@shenjingGitHub

shenjingGitHub commented Sep 5, 2017

Hi markyeh! Were you able to run it on InfiniBand? When I use InfiniBand cards, no data is transferred between the sending and receiving sides. Do you know how to resolve this problem? Thank you.

@markyeh
Author

markyeh commented Sep 6, 2017

Hi Shenjing,

Sorry, I do not use CaffeOnSpark anymore.
Just FYI, the RoCE issue that I mentioned was most likely a driver setup issue.
Good luck.
