Client without SSL enabled dumps core; the call stack is in SSL-related functions. #658

Closed
GardianT opened this issue Feb 18, 2019 · 12 comments · Fixed by #1814

Comments

@GardianT

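Describe the bug
The client does not have SSL enabled, yet it dumped core, and the call stack shows that the SSL send path was used.

To Reproduce
There is no reliable way to reproduce it yet; the core occurs intermittently. What is known is that _ssl_state was found to be SSL_UNKNOWN.

Expected behavior
With SSL disabled, the SSL code path should not be executed.

Additional context/screenshots
The full core backtrace is below.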

(gdb) bt
#0 0x00007f2efb805304 in SSL_write () from /opt/compiler/gcc-4.8.2/lib/libssl.so.1.0.0
#1 0x00000000007ed0b7 in base::IOBuf::cut_into_SSL_channel (this=0x7f2cdc9ca090, ssl=ssl@entry=0x0, ssl_error=ssl_error@entry=0x7f2e282a36ac) at baidu/base/iobuf/base/iobuf.cpp:1046
#2 0x00000000007ed173 in base::IOBuf::cut_multiple_into_SSL_channel (ssl=0x0, pieces=pieces@entry=0x7f2e282a36d0, count=count@entry=256, ssl_error=ssl_error@entry=0x7f2e282a36ac) at baidu/base/iobuf/base/iobuf.cpp:1064
#3 0x000000000061794e in baidu::rpc::Socket::DoWrite (this=this@entry=0x7f2c9b9ae130, req=) at baidu/base/baidu-rpc/src/baidu/rpc/socket.cpp:1835
#4 0x000000000061dcc7 in baidu::rpc::Socket::KeepWrite (void_arg=) at baidu/base/baidu-rpc/src/baidu/rpc/socket.cpp:1734
#5 0x000000000077075a in bthread::TaskGroup::task_runner (skip_remained=) at baidu/base/bthread/bthread/task_group.cpp:293
#6 0x0000000000767e01 in bthread_make_fcontext ()
#7 0x0000000000000000 in ?? ()
(gdb) fr 3
#3 0x000000000061794e in baidu::rpc::Socket::DoWrite (this=this@entry=0x7f2c9b9ae130, req=) at baidu/base/baidu-rpc/src/baidu/rpc/socket.cpp:1835
1835 baidu/base/baidu-rpc/src/baidu/rpc/socket.cpp: No such file or directory.
(gdb) p _ssl_state
$1 = baidu::rpc::SSL_UNKNOWN

@GardianT
Author

This roughly corresponds to the following code:

    CHECK_EQ(SSL_CONNECTED, ssl_state());
    if (_conn) {
        // TODO: Separate SSL stuff from SocketConnection
        return _conn->CutMessageIntoSSLChannel(_ssl_session, data_list, ndata);
    }
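A simplified model of the write-path selection helps show how a client without SSL can end up in `SSL_write`. It assumes (this is not visible in the snippet above) that the plain-fd path is taken only when the state is exactly `SSL_OFF`, and that the `CHECK_EQ` logs without aborting, which appears consistent with the crash landing in `SSL_write` rather than at the CHECK. `WritePath` and `choose_write_path` are hypothetical names; `SSL_UNKNOWN = 0` and `SSL_CONNECTED = 3` follow the `(3 vs 0)` CHECK message reported later in this thread, while the other two values are assumed.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical mirror of the socket's SSL state. SSL_UNKNOWN = 0 and
// SSL_CONNECTED = 3 match the "(3 vs 0)" CHECK message; the others are assumed.
enum SSLState { SSL_UNKNOWN = 0, SSL_OFF = 1, SSL_CONNECTING = 2, SSL_CONNECTED = 3 };

enum class WritePath { kPlainFd, kSSLChannel };

// Models the branch in DoWrite(): only SSL_OFF (assumed) uses the plain fd path.
// Every other state falls through to the SSL path, guarded only by a check
// that logs but does not stop execution.
WritePath choose_write_path(SSLState state, const void* ssl_session) {
    if (state == SSL_OFF) {
        return WritePath::kPlainFd;
    }
    if (state != SSL_CONNECTED) {
        // Non-fatal, like the "Check failed: SSL_CONNECTED == ssl_state()" log line.
        std::fprintf(stderr, "Check failed: SSL_CONNECTED == ssl_state() (%d vs %d)\n",
                     static_cast<int>(SSL_CONNECTED), static_cast<int>(state));
    }
    // In the real code this is where SSL_write() would receive the session,
    // which is NULL in the core dump (ssl=0x0).
    (void)ssl_session;
    return WritePath::kSSLChannel;
}
```

With `SSL_UNKNOWN` and a null session (the values in the dump), the model picks the SSL path, which corresponds to the `SSL_write` call with `ssl=0x0` in the backtrace.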

@GardianT
Author

Any progress on this?

@old-bear
Copy link
Contributor

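Sorry, I've been a bit busy the past couple of days.
Could you go into frame 3 and print all the data in the Socket object?
Also, does the client use persistent (long) connections or short connections? Before the core, were there any log entries about disconnecting from the server?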

@GardianT
Author

$1 = {static STREAM_FAKE_FD = 2147483647, static PROGRESS_INIT = 1, _versioned_ref = {<boost::atomics::atomic> = {<boost::atomics::detail::base_atomic<unsigned long, int>> = {m_storage = 21474836482}, }, },
_shared_part = {<boost::atomics::atomicbaidu::rpc::Socket::SharedPart*> = {<boost::atomics::detail::base_atomic<baidu::rpc::Socket::SharedPart*, void*>> = {m_storage = 139838969303664}, }, },
_nevent = {<boost::atomics::atomic> = {<boost::atomics::detail::base_atomic<int, int>> = {m_storage = 0}, }, }, _keytable_pool = 0x0, _fd = {<boost::atomics::atomic> = {<boost::atomics::detail::base_atomic<int, int>> = {
m_storage = -1}, }, }, _tos = 0, _reset_fd_real_us = 1550196902116673, _remote_side = {ip = {s_addr = 355817738}, port = 9992}, _local_side = {ip = {s_addr = 0}, port = 0}, _on_edge_triggered_events = 0x5f6dc0
baidu::rpc::InputMessenger::OnNewMessages(baidu::rpc::Socket*), _options = {fd = -1, remote_side = {ip = {s_addr = 355817738}, port = 9992}, user = 0x2ac7ae0, on_edge_triggered_events = 0x5f6dc0 baidu::rpc::InputMessenger::OnNewMessages(baidu::rpc::Socket*),
health_check_interval_s = 3, owns_ssl_ctx = true, ssl_ctx = 0x0, sni_name = {static npos = , _M_dataplus = {<std::allocator> = {<__gnu_cxx::new_allocator> = {}, },
_M_p = 0x17e1fb8 std::string::_Rep::_S_empty_rep_storage@@GLIBCXX_3.4+24 ""}}, keytable_pool = 0x0, conn = 0x0, app_connect = 0x0, initial_parsing_context = 0x0}, _user = 0x2ac7ae0, _conn = 0x0, _app_connect = 0x0, _this_id = 17179872856,
_preferred_index = 1, _hc_count = 0, _last_msg_size = 0, _avg_msg_size = 63, _read_buf = {base::IOBuf = {static DEFAULT_BLOCK_SIZE = 8192, static INITIAL_CAP = 32, static BLOCK_SIZE = 8192, static DEFAULT_PAYLOAD = 8160, static MAX_BLOCK_SIZE = 65536,
static MAX_PAYLOAD = 65504, static INVALID_AREA = 0, {_bv = {magic = 0, start = 0, refs = 0x0, nref = 0, cap_mask = 0, nbytes = 0}, _sv = {refs = {{offset = 0, length = 0, block = 0x0}, {offset = 0, length = 0, block = 0x0}}}}}, _block = 0x0},
_read_control_buf = {base::IOBuf = {static DEFAULT_BLOCK_SIZE = 8192, static INITIAL_CAP = 32, static BLOCK_SIZE = 8192, static DEFAULT_PAYLOAD = 8160, static MAX_BLOCK_SIZE = 65536, static MAX_PAYLOAD = 65504, static INVALID_AREA = 0, {_bv = {magic = 0,
start = 0, refs = 0x0, nref = 0, cap_mask = 0, nbytes = 0}, _sv = {refs = {{offset = 0, length = 0, block = 0x0}, {offset = 0, length = 0, block = 0x0}}}}}, _block = 0x0},
_last_readtime_us = {<boost::atomics::atomic> = {<boost::atomics::detail::base_atomic<long, int>> = {m_storage = 24016361002002}, }, },
_parsing_context = {<boost::atomics::atomicbaidu::rpc::Destroyable*> = {<boost::atomics::detail::base_atomic<baidu::rpc::Destroyable*, void*>> = {m_storage = 0}, }, }, _correlation_id = 0, _health_check_interval_s = 3,
_ninprocess = {<boost::atomics::atomic> = {<boost::atomics::detail::base_atomic<unsigned int, int>> = {m_storage = 1}, }, },
_auth_flag_error = {<boost::atomics::atomic> = {<boost::atomics::detail::base_atomic<unsigned long, int>> = {m_storage = 0}, }, }, _auth_id = {value = 63595583885171}, _auth_context = 0x0,
_ssl_state = baidu::rpc::SSL_UNKNOWN, _ssl_session = 0x0, _connection_type_for_progressive_read = baidu::rpc::CONNECTION_TYPE_UNKNOWN, _controller_released_socket = {<boost::atomics::atomic> = {<boost::atomics::detail::base_atomic<bool, int>> = {
m_storage = 0 '\000'}, }, }, _overcrowded = true, _fail_me_at_server_stop = false, _logoff_flag = {<boost::atomics::atomic> = {<boost::atomics::detail::base_atomic<bool, int>> = {
m_storage = 0 '\000'}, }, }, _recycle_flag = {<boost::atomics::atomic> = {<boost::atomics::detail::base_atomic<bool, int>> = {m_storage = 1 '\001'}, }, }, _error_code = 104, _error_text = {
static npos = , _M_dataplus = {<std::allocator> = {<__gnu_cxx::new_allocator> = {}, },
_M_p = 0x7f2ed162f4d8 "Fail to read from fd=2797 SocketId=17179872856@10.89.53.21:9992@27623: Connection reset by peer"}}, _pipeline_mutex = {_native_handle = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
__list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, _pipeline_q = 0x0, _id_wait_list_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0,
__next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, _id_wait_list = {impl = 0x0, head = 0, size = 0, conflict_head = 0, conflict_size = 0},
_last_writetime_us = {<boost::atomics::atomic> = {<boost::atomics::detail::base_atomic<long, int>> = {m_storage = 24016361002002}, }, },
_unwritten_bytes = {<boost::atomics::atomic> = {<boost::atomics::detail::base_atomic<long, int>> = {m_storage = 67270261}, }, }, _epollout_butex = 0x7f2c3517c550,
_write_head = {<boost::atomics::atomicbaidu::rpc::Socket::WriteRequest*> = {<boost::atomics::detail::base_atomic<baidu::rpc::Socket::WriteRequest*, void*>> = {m_storage = 139837280143696}, }, }, _stream_mutex = {_native_handle = {
__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, _stream_set = 0x0, _rdma_ep = 0x0, _enable_rdma = false,
_can_set_enable_rdma = {<boost::atomics::atomic> = {<boost::atomics::detail::base_atomic<bool, int>> = {m_storage = 1 '\001'}, }, }, _enable_rdma_lock = {_native_handle = {__data = {__lock = 0, __count = 0, __owner = 0,
__nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}}
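Some raw numbers in this dump can be cross-checked against the error text. The sketch below decodes `_remote_side.ip.s_addr` assuming a little-endian host (x86-64), where the network-order bytes of `s_addr` appear least-significant first: 355817738 is 0x1535590A, which decodes to 10.89.53.21, the peer named in `_error_text`. It also splits 64-bit fields into 32-bit halves; reading the high half of `_versioned_ref` and `_this_id` as a version (and the low halves as a reference count and a slot index, respectively) is an assumption about brpc's packing that this thread does not confirm. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Decode an in_addr.s_addr value as printed by gdb on a little-endian host.
// s_addr holds the address in network byte order, so the first octet sits in
// the least significant byte of the printed integer.
std::string ipv4_from_printed_s_addr(std::uint32_t s_addr) {
    std::string out;
    for (int i = 0; i < 4; ++i) {
        if (i > 0) out += '.';
        out += std::to_string((s_addr >> (8 * i)) & 0xFFu);
    }
    return out;
}

struct Halves {
    std::uint32_t high;  // assumed: version
    std::uint32_t low;   // assumed: refcount (_versioned_ref) or slot (_this_id)
};

// Split a 64-bit field into its high and low 32-bit halves.
Halves split_u64(std::uint64_t v) {
    return {static_cast<std::uint32_t>(v >> 32),
            static_cast<std::uint32_t>(v & 0xFFFFFFFFu)};
}
```

Under that assumption, `_versioned_ref` (21474836482) splits into 5 and 2, and `_this_id` (17179872856) splits into 4 and 3672.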

@GardianT
Author

The connection type is the default. The logs have already been cleaned up, so they are no longer available.

@old-bear
Contributor

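fd=-1, and the error text is "Fail to read from fd=2797 SocketId=17179872856@10.89.53.21:9992@27623: Connection reset by peer".
It looks like a write was somehow triggered again after the connection was closed. I'll look into it internally.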
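The hypothesis that a write fires after the connection was closed can be modeled as an ordering problem. This sketch is single-threaded and deterministic: it queues a write, then simulates the connection being reset (fd set to -1, SSL state back to `SSL_UNKNOWN`, session cleared, as seen in the Socket dump), and only afterwards lets a deferred writer step run. `ModelSocket`, `Reset`, and `KeepWriteOnce` are hypothetical; that the reset leaves pending writes in the queue and sets the state to `SSL_UNKNOWN` is an assumption consistent with the dump, not something confirmed from the brpc source.

```cpp
#include <cassert>
#include <deque>
#include <string>

enum SSLState { SSL_UNKNOWN = 0, SSL_OFF = 1, SSL_CONNECTING = 2, SSL_CONNECTED = 3 };

// Hypothetical, heavily simplified socket: no threads, no real I/O.
struct ModelSocket {
    int fd = 42;
    SSLState ssl_state = SSL_OFF;       // client without SSL
    const void* ssl_session = nullptr;  // never set for a non-SSL client
    std::deque<std::string> pending;    // stands in for the write queue

    // Simulates the connection being closed and the socket reset.
    void Reset() {
        fd = -1;
        ssl_state = SSL_UNKNOWN;  // assumed; matches the state in the dump
        ssl_session = nullptr;
        // Pending writes are deliberately NOT drained here.
    }

    // One step of a deferred writer. Returns true if it would have called
    // into the SSL path with a null session (the crash in SSL_write).
    bool KeepWriteOnce() {
        if (pending.empty()) return false;
        pending.pop_front();
        if (ssl_state == SSL_OFF) return false;  // plain write path
        return ssl_session == nullptr;           // SSL path with ssl=0x0
    }
};
```

A queued write processed before `Reset()` takes the plain path; the same write processed afterwards takes the SSL branch with a null session, matching `ssl=0x0` in frames #1 and #2.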

@leonwu327

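@old-bear, has the cause of this issue been found yet? We hit the same problem in our environment, with an identical stack. When it cored there was no other error output, only these two CHECK messages: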
Check failed: NULL == _write_head.load(butil::memory_order_relaxed).
#0 0x7f660a5b47ee brpc::Socket::WaitAndReset()
#1 0x7f660a5b8f5a brpc::Socket::HealthCheckThread()
#2 0x7f660a474e3a bthread::TaskGroup::task_runner()
#3 0x7f660a45dc81 bthread_make_fcontext

Check failed: SSL_CONNECTED == ssl_state() (3 vs 0).
#0 0x7f660a5b3d4c brpc::Socket::DoWrite()
#1 0x7f660a5ba267 brpc::Socket::KeepWrite()
#2 0x7f660a474e3a bthread::TaskGroup::task_runner()
#3 0x7f660a45dc81 bthread_make_fcontext
Thanks.

@leonwu327

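fd=-1, and the error text is "Fail to read from fd=2797 SocketId=17179872856@10.89.53.21:9992@27623: Connection reset by peer".
It looks like a write was somehow triggered again after the connection was closed. I'll look into it internally.

Is this similar to the following issue?
#643
That fix only removes the CHECK, though. Could this core still happen? @jamesge
Many thanks.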
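On whether removing the CHECK (as in #643) is enough: in a simplified model, deleting a non-fatal check does not change which branch runs, so `SSL_write` would still receive a null session. A guard that only allows writes in `SSL_OFF`, or in `SSL_CONNECTED` with a session, would fail the write instead of crashing. This is a hypothetical illustration, not the change made in #1814; `WriteResult`, `do_write_without_check`, and `do_write_guarded` are made-up names, and the branch on `SSL_OFF` is assumed.

```cpp
#include <cassert>

enum SSLState { SSL_UNKNOWN = 0, SSL_OFF = 1, SSL_CONNECTING = 2, SSL_CONNECTED = 3 };

enum class WriteResult {
    kPlainWrite,    // wrote through the plain fd path
    kSSLWrite,      // wrote through the SSL path with a valid session
    kNullSSLWrite,  // would call SSL_write with a NULL session and crash
    kRejected,      // refused to write; the caller fails the request instead
};

// The CHECK simply removed: the branch structure is unchanged.
WriteResult do_write_without_check(SSLState state, const void* session) {
    if (state == SSL_OFF) return WriteResult::kPlainWrite;
    return session != nullptr ? WriteResult::kSSLWrite : WriteResult::kNullSSLWrite;
}

// Hypothetical defensive variant: only two well-defined states may write.
WriteResult do_write_guarded(SSLState state, const void* session) {
    if (state == SSL_OFF) return WriteResult::kPlainWrite;
    if (state == SSL_CONNECTED && session != nullptr) return WriteResult::kSSLWrite;
    return WriteResult::kRejected;
}
```

In this model the unguarded version still reaches the null-session write for `SSL_UNKNOWN`, while the guarded version turns the same input into an ordinary write failure.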

@GardianT
Author

GardianT commented May 9, 2019

Reminder: is there any progress on this issue? We've seen it four or five times recently.

@yichenluan
Contributor

@GardianT How are you working around this at the moment?

@WoodsCumming

Is there any progress on this issue?

@chenBright
Contributor

Submitted a fix PR: #1814

6 participants