Relationship between channels and network connections #131

Closed
gydong opened this issue Nov 26, 2017 · 18 comments

@gydong
Contributor

gydong commented Nov 26, 2017

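According to the docs, to get long-lived connections with brpc's HTTP client you should keep the channel instance and reuse it. That suggests a channel is tied to a network connection. But the docs also say that once an RPC request has been sent, the channel instance can be destructed immediately, without waiting for the result to come back. That suggests a channel is not tied to a network connection. The two inferences contradict each other. What is actually going on?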

@jamesge
Contributor

jamesge commented Nov 27, 2017

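Channels and sockets have a many-to-many relationship: multiple channels can share one socket (via reference counting), and one channel can point to multiple sockets (when it connects to a cluster).
Long-lived connections do not require you to "keep the channel instance and reuse it". The main reason to reuse a channel is that a channel connecting to a cluster has to contact the naming service once during Init, which is generally not fast.
There is one edge case: when the last channel is destructed, the corresponding socket is destroyed. If a socket is referenced by only one channel and that channel is repeatedly created and destructed, the socket is repeatedly created and destroyed as well, which behaves like short connections. To avoid this, you can generally set -defer_close_second or reuse the channel.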
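The edge case above can be illustrated with a toy model. This is only a sketch built on `std::shared_ptr` and `std::weak_ptr`; `Socket`, `SocketTable` and `Channel` here are hypothetical stand-ins, not brpc classes, and the address is arbitrary. It counts how many "connections" get opened when a channel is created per request versus when some other channel keeps the socket referenced.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy model only: these types are hypothetical and are NOT brpc classes.
struct Socket {
    explicit Socket(int* open_count) { ++*open_count; }  // counts "connects"
};

// Maps a server address to a weakly held socket so channels can share it.
class SocketTable {
public:
    std::shared_ptr<Socket> Acquire(const std::string& addr) {
        std::shared_ptr<Socket> s = table_[addr].lock();
        if (!s) {  // no channel references it any more: open a new one
            s = std::make_shared<Socket>(&opened_);
            table_[addr] = s;
        }
        return s;
    }
    int opened() const { return opened_; }

private:
    std::map<std::string, std::weak_ptr<Socket>> table_;
    int opened_ = 0;
};

// A channel holds one reference to the socket of the server it talks to.
struct Channel {
    Channel(SocketTable& t, const std::string& addr) : sock(t.Acquire(addr)) {
        assert(sock != nullptr);
    }
    std::shared_ptr<Socket> sock;
};

// Create and destroy a channel per request; returns connections opened.
int ConnectsWithChannelPerRequest(int requests) {
    SocketTable table;
    for (int i = 0; i < requests; ++i) {
        Channel c(table, "127.0.0.1:8000");  // last owner dies each iteration
    }
    return table.opened();
}

// Keep one channel alive while short-lived channels come and go.
int ConnectsWithLongLivedChannel(int requests) {
    SocketTable table;
    Channel keeper(table, "127.0.0.1:8000");
    for (int i = 0; i < requests; ++i) {
        Channel c(table, "127.0.0.1:8000");  // shares keeper's socket
    }
    return table.opened();
}
```

In this model, `ConnectsWithChannelPerRequest(3)` opens a new socket on every iteration, while `ConnectsWithLongLivedChannel(3)` opens only one, because the socket never loses its last reference.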

@gydong
Contributor Author

gydong commented Nov 27, 2017

@jamesge Thanks for clearing that up.

@cool-colo

cool-colo commented Nov 27, 2017

@jamesge A side question: if the HTTP connection type is set to short, a new connection is created for every request even when the channel is reused, right?

@gydong
Contributor Author

gydong commented Nov 27, 2017

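@jamesge I read the source and the docs and tried modifying the http example to observe connection reuse (code below). Since idle_timeout_second defaults to 10 seconds, I added a loop with a 9-second sleep, expecting to see only one TCP connection to the server. But a tcpdump capture shows one more TCP connection in TIME_WAIT every 9 seconds, which is not what I expected. Can you explain?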

```cpp
while (true) {
  // A Channel represents a communication line to a Server. Notice that
  // Channel is thread-safe and can be shared by all threads in your program.
  brpc::Channel channel;
  brpc::ChannelOptions options;
  options.protocol = FLAGS_protocol;
  options.timeout_ms = FLAGS_timeout_ms/*milliseconds*/;
  options.max_retry = FLAGS_max_retry;

  // Initialize the channel, NULL means using default options.
  // options, see `brpc/channel.h'.
  if (channel.Init(url, FLAGS_load_balancer.c_str(), &options) != 0) {
    LOG(ERROR) << "Fail to initialize channel";
    return -1;
  }

  // We will receive response synchronously, safe to put variables
  // on stack.
  brpc::Controller cntl;

  cntl.http_request().uri() = url;
  if (!FLAGS_d.empty()) {
    cntl.http_request().set_method(brpc::HTTP_METHOD_POST);
    cntl.request_attachment().append(FLAGS_d);
  }

  // Because `done'(last parameter) is NULL, this function waits until
  // the response comes back or error occurs(including timedout).
  channel.CallMethod(NULL, &cntl, NULL, NULL, NULL);
  if (cntl.Failed()) {
    std::cerr << cntl.ErrorText() << std::endl;
    return -1;
  }
  // If -http_verbose is on, brpc already prints the response to stderr.
  if (!brpc::FLAGS_http_verbose) {
    std::cout << cntl.response_attachment() << std::endl;
  }
  std::this_thread::sleep_for(std::chrono::seconds(9));
}
```

@chenzhangyi
Member

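idle timeout is a server-side setting. On the client side, a socket that is no longer referenced by any channel or RPC is released immediately by default. If you want the release to be delayed, you need to set --defer_close_second.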
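To make the effect of a deferred close concrete, here is a toy model under stated assumptions. `DeferredCloseTable` and `ConnectsOverLoop` are hypothetical names, time is a plain integer number of seconds supplied by the caller, and stale entries are swept lazily on the next acquire. None of this is brpc's actual implementation; it only mimics the behavior described in this thread, where an unreferenced socket is kept around for a grace period so that a later channel can pick it up.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model only: hypothetical names, not brpc's implementation.
class DeferredCloseTable {
public:
    explicit DeferredCloseTable(int defer_close_second)
        : defer_(defer_close_second) {}

    // A channel starts using the connection to `addr` at time `now`.
    void Acquire(const std::string& addr, int now) {
        Sweep(now);
        Entry& e = entries_[addr];
        if (!e.connected) {  // nothing to reuse: open a new connection
            e.connected = true;
            ++opened_;
        }
        ++e.refs;
    }

    // A channel stops using the connection, e.g. because it is destructed.
    void Release(const std::string& addr, int now) {
        Entry& e = entries_.at(addr);
        assert(e.refs > 0);
        if (--e.refs == 0) {
            e.unused_since = now;  // start the grace period
        }
    }

    int opened() const { return opened_; }

private:
    struct Entry {
        bool connected = false;
        int refs = 0;
        int unused_since = 0;
    };

    // Close connections that have had no owner for at least defer_ seconds.
    void Sweep(int now) {
        for (auto it = entries_.begin(); it != entries_.end();) {
            const Entry& e = it->second;
            if (e.refs == 0 && now - e.unused_since >= defer_) {
                it = entries_.erase(it);
            } else {
                ++it;
            }
        }
    }

    int defer_;
    int opened_ = 0;
    std::map<std::string, Entry> entries_;
};

// One request per iteration through a fresh channel, `gap` seconds apart.
// Returns how many connections were opened in total.
int ConnectsOverLoop(int defer_close_second, int gap, int iterations) {
    DeferredCloseTable table(defer_close_second);
    int now = 0;
    for (int i = 0; i < iterations; ++i) {
        table.Acquire("127.0.0.1:8000", now);
        table.Release("127.0.0.1:8000", now);
        now += gap;
    }
    return table.opened();
}
```

With a 9-second gap between requests, a grace period of 0 or 5 seconds opens a new connection on every iteration in this model, whereas a grace period of 15 seconds lets every iteration reuse the first connection.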

@gydong
Contributor Author

gydong commented Nov 27, 2017

@chenzhangyi The docs do not match what you said; everything they describe is client-side behavior. Doc link: https://github.com/brpc/brpc/blob/master/docs/cn/client.md

@cool-colo

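@gydong I am also testing the brpc HTTP client, and closing idle connections after 10 seconds works fine. In your example above, the channel is destructed every 9 seconds, and since no other channel references the corresponding socket, the socket gets closed.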

@gydong
Contributor Author

gydong commented Nov 28, 2017

@cool-colo Got it. So if I want to reuse connections, I had better set defer_close_second.

@jamesge
Contributor

jamesge commented Nov 28, 2017

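@cool-colo With the type set to short, a new connection is created for every request even if the channel is reused.
@gydong On the client side it is a gflag named -idle_timeout_second, which closes connections in the connection pool that have been unused for a long time. On the server side it is idle_timeout_sec in ServerOptions, which closes server-side connections that have been unused for a long time. The similar names are a historical accident; they are not the same thing.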

@cool-colo

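@gydong @jamesge I also ran into a coredump while testing. Using bthreads to send requests concurrently to several downstream HTTP servers, without reusing channels and with connection type pooled, the process coredumps after running for a few hours or a day. With the connection type changed to short and everything else unchanged, it runs for a week without problems.
So, to get long-lived connections without the coredump, I cached the channels using doubly buffered data. It has been running for over a day with no problems so far.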
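A channel cache of the kind mentioned above might look roughly like the following sketch. It deliberately uses a plain `std::mutex` around an `std::unordered_map` rather than a doubly buffered structure, and `FakeChannel` and `ChannelCache` are hypothetical stand-ins for illustration, not brpc types. The point is only that each URL gets one long-lived channel that every request shares.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a real channel type; counts constructions.
struct FakeChannel {
    explicit FakeChannel(const std::string& u) : url(u) { ++created; }
    std::string url;
    static int created;
};
int FakeChannel::created = 0;

// Returns one long-lived channel per URL, creating it on first use.
// A mutex-guarded map is a simplification chosen for this sketch.
class ChannelCache {
public:
    std::shared_ptr<FakeChannel> GetOrCreate(const std::string& url) {
        assert(!url.empty());
        std::lock_guard<std::mutex> lock(mu_);
        std::shared_ptr<FakeChannel>& slot = cache_[url];
        if (!slot) {
            slot = std::make_shared<FakeChannel>(url);
        }
        return slot;
    }

private:
    std::mutex mu_;
    std::unordered_map<std::string, std::shared_ptr<FakeChannel>> cache_;
};
```

Two calls to `GetOrCreate` with the same URL return the same object, so the cached channel is never destructed between requests. A read-mostly structure would avoid taking the lock on every lookup, which is presumably why doubly buffered data was chosen in the setup described above.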

@jamesge
Contributor

jamesge commented Nov 28, 2017

For the coredump, look at exactly where it crashed.

@cool-colo

It is probably something like an out-of-bounds memory access. Memory usage is very stable while it runs, so it should not be a memory leak.
#0 0x0000000000b89bcf in std::string::_Rep::_M_dispose ()
#1 0x0000000000b8bfb2 in butil::FlatMap<std::string, std::string, butil::DefaultHasher<std::string>, butil::DefaultEqualTo<std::string>, false>::~FlatMap() ()
#2 0x0000000000f460e7 in brpc::URI::~URI() () at src/brpc/uri.cpp:31
#3 0x0000000000ef9a0c in brpc::Controller::DeleteStuff() () at ./src/brpc/http_header.h:37
#4 0x0000000000efaccf in brpc::Controller::~Controller() () at src/brpc/controller.cpp:126
#5 0x0000000000b99a09 in HttpHandler::Prepare(parallel::cpp2::ComboHttpRequest&)::{lambda()#1}::operator()() const ()
#6 0x0000000000b99cec in std::_Function_handler<parallel::cpp2::HttpResponse (), std::_Bind_result<parallel::cpp2::HttpResponse, HttpHandler::Prepare(parallel::cpp2::ComboHttpRequest&)::{lambda()#1} ()> >::_M_invoke(std::_Any_data const&) ()
#7 0x0000000000b8d224 in std::enable_if<!(std::is_same<std::result_of<poi::ParallelExecutorparallel::cpp2::HttpResponse::AsyncExecute(std::function<parallel::cpp2::HttpResponse ()>&&)::{lambda()#1}::operator()() const::{lambda()#1} ()>::type, void>::value), folly::Try<poi::ParallelExecutorparallel::cpp2::HttpResponse::AsyncExecute(std::function<parallel::cpp2::HttpResponse ()>&&)::{lambda()#1}::operator()() const::{lambda()#1} ()> >::type folly::makeTryWith<poi::ParallelExecutorparallel::cpp2::HttpResponse::AsyncExecute(std::function<parallel::cpp2::HttpResponse ()>&&)::{lambda()#1}::operator()() const::{lambda()#1}>(std::is_same&&) ()
#8 0x0000000000b93834 in std::_Function_handler<void (), std::_Bind_result<void, poi::ParallelExecutorparallel::cpp2::HttpResponse::AsyncExecute(std::function<parallel::cpp2::HttpResponse ()>&&)::{lambda()#1} ()> >::_M_invoke(std::_Any_data const&) ()
#9 0x0000000000b8b4fc in poi::FunctionWrapper::ThreadFunc(void*) ()
#10 0x0000000000eeba7d in bthread::TaskGroup::task_runner(long) () at src/bthread/task_group.cpp:291
#11 0x000000000100f2f1 in bthread_make_fcontext ()

@jamesge
Contributor

jamesge commented Nov 28, 2017

Nothing can be concluded from this stack. You can try to reproduce it with example/http_c++:

# Start three servers on ports 8000-8002; after testing, kill them with pkill http_server.
for ((i=0;i<3;++i)); do ( ./http_server -port $((8000+i)) & ) ; done 
# Apply load, sending from bthreads, with connection type pooled
./benchmark_http -url=list://localhost:8000,localhost:8001,localhost:8002 -load_balancer=rr -use_bthread -connection_type pooled

@gydong
Contributor Author

gydong commented Nov 28, 2017

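@jamesge @chenzhangyi I tested this scenario with the code below (run as: http_client -defer_close_second=15 "www.baidu.com"), and it still does not match my expectations. The close is indeed deferred, but after the previous channel is destructed, the newly constructed channel does not reuse the connection that the previous channel freed. I still see many connections in TIME_WAIT on the machine. Does this mean reuse can only happen before the previous channel is destructed?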

```cpp
while (true) {
  // A Channel represents a communication line to a Server. Notice that
  // Channel is thread-safe and can be shared by all threads in your program.
  std::unique_ptr<brpc::Channel> channel(new brpc::Channel());
  brpc::ChannelOptions options;
  options.protocol = FLAGS_protocol;
  options.timeout_ms = FLAGS_timeout_ms/*milliseconds*/;
  options.max_retry = FLAGS_max_retry;

  // Initialize the channel, NULL means using default options.
  // options, see `brpc/channel.h'.
  if (channel->Init(url, FLAGS_load_balancer.c_str(), &options) != 0) {
    LOG(ERROR) << "Fail to initialize channel";
    return -1;
  }

  // We will receive response synchronously, safe to put variables
  // on stack.
  brpc::Controller cntl;

  cntl.http_request().uri() = url;
  if (!FLAGS_d.empty()) {
    cntl.http_request().set_method(brpc::HTTP_METHOD_POST);
    cntl.request_attachment().append(FLAGS_d);
  }

  // Because `done'(last parameter) is NULL, this function waits until
  // the response comes back or error occurs(including timedout).
  channel->CallMethod(NULL, &cntl, NULL, NULL, NULL);
  if (cntl.Failed()) {
    std::cerr << cntl.ErrorText() << std::endl;
    return -1;
  }
  // If -http_verbose is on, brpc already prints the response to stderr.
  if (!brpc::FLAGS_http_verbose) {
    std::cout << cntl.response_attachment() << std::endl;
  }
  std::this_thread::sleep_for(std::chrono::seconds(9));
}
```

@jamesge
Contributor

jamesge commented Nov 28, 2017

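You had better try a local server; requests to the public internet may time out and cause the connection to be closed. defer_close works for all protocols and for both single and pooled connections.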

@gydong
Contributor Author

gydong commented Nov 28, 2017

OK, I will try again.

@gydong
Contributor Author

gydong commented Nov 28, 2017

Tested: -defer_close_second does work. Thanks for the answers!

@gydong gydong closed this as completed Dec 4, 2017
@beyond-wyc

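If the channel is never destructed while the program is running, why does netstat -a on the server show only the listening port and no ESTABLISHED socket connections?
Is the default a short connection?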


5 participants