Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enable brpc use rdma #1836

Merged
merged 7 commits into from
Oct 25, 2022
Merged

enable brpc use rdma #1836

merged 7 commits into from
Oct 25, 2022

Conversation

Tuvie
Copy link
Contributor

@Tuvie Tuvie commented Jul 13, 2022

Trying to merge rdma implementation into master.
This implementation is little different from previous rdma branch.
I remove some proved useless code and try to improve the handshake process.

@Tuvie Tuvie force-pushed the master branch 3 times, most recently from 4549318 to e377e07 Compare July 14, 2022 05:10
CMakeLists.txt Outdated
set(WITH_RDMA_VAL "0")
if(WITH_RDMA)
set(WITH_RDMA_VAL "1")
set(BRPC_WITH_RDMA 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个有用到吗

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

应该是没用的,我删掉

@@ -149,6 +169,16 @@ int Channel::InitChannelOptions(const ChannelOptions* options) {
LOG(ERROR) << "Channel does not support the protocol";
return -1;
}

#if BRPC_WITH_RDMA
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

没有BRPC_WITH_RDMA宏的时候,如果用户设置options里的use_rdma,是不是应该报错

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已调整为报错

@@ -85,6 +86,15 @@ ParseResult InputMessenger::CutInputMessage(
<< " bytes, the connection will be closed."
" Set max_body_size to allow bigger messages";
return result;
} else {
if (m->_read_buf.size() >= 4) {
char data[4];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

缩进有问题

if (m->_read_buf.size() >= 4) {
char data[4];
m->_read_buf.copy_to_cstr(data, 4);
if (strncmp(data, "RDMA", 4) == 0 && m->_rdma_state == Socket::RDMA_OFF) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个 4 能否定义成常量

}
m->_rdma_state = RDMA_UNKNOWN;
} else {
delete m->_rdma_ep;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同厂内评论,确保这里一定是NULL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修改

@zyearn
Copy link
Member

zyearn commented Aug 3, 2022

@Tuvie 可以简单介绍一下代码结构和改动思路吗?文件有点多,有个整体的overview的话review起来会方便一些,谢谢。

LOG(WARNING) << "SSL is not supported by RDMA";
return false;
}
if (opt->nshead_service) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http_master_service是不是也不支持

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可以支持,之前的版本也允许。但是事实上不建议这么做,因为http头本身也很重。所以这个版本给取消了

static butil::Mutex* g_rdma_resource_mutex = NULL;
static RdmaResource* g_rdma_resource_list = NULL;

struct HelloMessage {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个叫HelloMessage可能会让人误以为是测试代码

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个是握手协议,发送的第一个信息命名为Hello


// This callback is used when extending a new region
// Generally, it is a memory region register call
typedef uint32_t (*Callback)(void*, size_t);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个是不是定义在头文件中比较合适?
另外,这个命名不是很容易让人理解这个callback的作用

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是。这个改了

node = g_info->idle_list[block_type][index];
if (!node) {
// There is no block left, extend a new region
if (!ExtendBlockPool(FLAGS_rdma_memory_pool_increase_size_mb,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

在当前bucket上找不到,为什么不换个bucket再找找,而是直接Extend?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

设立多个bucket还是希望尽量减少相互的竞争。考虑到本身各个bucket随机使用,相对比较平均,一个bucket没有空间了,则大概率其他bucket也快没有了,这时不如直接extend出来。从实际使用建议上看,不建议依赖中途extend的机制,这个会引发Lkey的动态查找。所以暂时也没有太多优化这里。

// Move more blocks from global list to tls list
if (block_type == 0) {
node = g_info->idle_list[0][index];
tls_idle_list = node;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iobuf内部就有一个tls的block缓存,这里的tls_idle_list有什么不一样吗?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个block pool是为了做block分配前的thread local。iobuf里面的tls是分配出来以后的thread local。实际上iobuf里面不少操作并没有用到里面的tls cache。


void HelloMessage::Serialize(void* data) {
// Note serialization does include magic str
memcpy(data, MAGIC_STR, 4);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deserialize是不含这个MAGIC_STR的,所以serialize也不包含是不是更合理?

@tsyw1987
Copy link

tsyw1987 commented Aug 9, 2022

后续是否支持同一个进程中启动多个brpc server,并分别绑定到不同的rdma网卡?

}
}

bool HelloNegotiationValid(HelloMessage msg) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

参数是否应该使用引用(const HelloMessage&) 避免拷贝?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个函数为何单独放在这里,不再HelloMessage结构体呢?

int ReadFromFd(void* data, size_t len);


// Wrute at most len bytes from data to fd in _socket
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrute 拼写错误

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@Tuvie
Copy link
Contributor Author

Tuvie commented Aug 11, 2022

后续是否支持同一个进程中启动多个brpc server,并分别绑定到不同的rdma网卡?

这个可以在后续支持

static RdmaResource* g_rdma_resource_list = NULL;

struct HelloMessage {
void Serialize(void* data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Serialize 是否可以设为常量成员函数呢?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@Tuvie
Copy link
Contributor Author

Tuvie commented Aug 16, 2022

@Tuvie 可以简单介绍一下代码结构和改动思路吗?文件有点多,有个整体的overview的话review起来会方便一些,谢谢。

已在docs中添加相应描述

@wwbmmm
Copy link
Contributor

wwbmmm commented Aug 16, 2022

LGTM

@zyearn
Copy link
Member

zyearn commented Aug 17, 2022

@Tuvie 可以简单介绍一下代码结构和改动思路吗?文件有点多,有个整体的overview的话review起来会方便一些,谢谢。

已在docs中添加相应描述

感谢。目前正在review,文件有点多,看完需要些时间。

@zyearn
Copy link
Member

zyearn commented Aug 31, 2022

LGTM.

@Tuvie 从上次提交到现在有什么bugfix吗?没有的话等 @wwbmmm 看看线上运行的稳定程度,没什么问题的话可以考虑合入了。

@zyearn
Copy link
Member

zyearn commented Aug 31, 2022

@Tuvie 有conflicts,需要merge master重新提交一下

@Tuvie Tuvie closed this Sep 3, 2022
@Tuvie Tuvie reopened this Sep 3, 2022
@Tuvie
Copy link
Contributor Author

Tuvie commented Sep 3, 2022

LGTM.

@Tuvie 从上次提交到现在有什么bugfix吗?没有的话等 @wwbmmm 看看线上运行的稳定程度,没什么问题的话可以考虑合入了。

目前暂时没有,也可以先等等

@Tuvie
Copy link
Contributor Author

Tuvie commented Sep 5, 2022

@Tuvie 有conflicts,需要merge master重新提交一下

done

@serverglen serverglen added the feature new feature label Sep 14, 2022
${OPENSSL_CRYPTO_LIBRARY}
${OPENSSL_SSL_LIBRARY}
${THRIFT_LIB}
${THRIFTNB_LIB}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider glog library. If brpc was compiled with WITH_GLOG set to ON, rdma_performace won't link successfully.

brpc::rdma::GlobalRdmaInitializeOrDie();
}

brpc::StartDummyServerAt(8001);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can make dummy port configurable by adding a gflag say dummy_port.

return wc.byte_len;
}
default:
CHECK(false) << "This should not happen";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
CHECK(false) << "This should not happen";
// Do not CHECK abort since wc.opcode could be IBV_WC_DRIVER2(136) which is
// a bug already fixed by rdma-core.
// FYI: https://github.com/linux-rdma/rdma-core/commit/4c905646de3e75bdccada4abe9f0d273d76eaf50
LOG(WARNING) << "This should not happen, wc.opcode = " << wc.opcode;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个mlx驱动的bug会在什么情况下触发呢?如果确实是一个严重问题,仅仅抛出WARNING应该也是不行的。

@Tuvie Tuvie force-pushed the master branch 2 times, most recently from 4159b9c to 1499b6a Compare September 23, 2022 06:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants