Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Double delete in wspp_callback_client destructor? #32

Closed
glukacsy opened this issue Dec 14, 2015 · 10 comments · Fixed by #1080
Closed

Double delete in wspp_callback_client destructor? #32

glukacsy opened this issue Dec 14, 2015 · 10 comments · Fixed by #1080
Labels

Comments

@glukacsy
Copy link

Hi,

We have this weird issue that users report crashes, but so far all my tries to reproduce it failed. The crash seems to be happening in the destructor of the websocket callback client and the stacktraces are always the same, here is one for example:

Thread 64 Crashed:: Dispatch queue: com.apple.root.default-qos
0 libsystem_kernel.dylib 0x00007fff960630ae __pthread_kill + 10
1 libsystem_pthread.dylib 0x00007fff96cc6500 pthread_kill + 90
2 libsystem_c.dylib 0x00007fff9a11c41b __abort + 145
3 libsystem_c.dylib 0x00007fff9a11c38a abort + 144
4 libc++abi.dylib 0x00007fff8bc52f81 abort_message + 257
5 libc++abi.dylib 0x00007fff8bc7896a default_terminate_handler() + 46
6 com.cisco.SparkMacDesktop 0x000000010261ad09 CLSTerminateHandler() + 270
7 libc++abi.dylib 0x00007fff8bc7619e std::__terminate(void (_)()) + 8
8 libc++abi.dylib 0x00007fff8bc7622d std::terminate() + 77
9 libcpprest.2.7.dylib 0x00000001033b1ba9 web::websockets::client::details::wspp_callback_client::~wspp_callback_client() + 889
10 libc++.1.dylib 0x00007fff90411cb8 std::__1::__shared_weak_count::__release_shared() + 44
11 com.cisco.SparkMacDesktop 0x000000010275ba4f MercuryManager::_websocketConnect(int) + 1705
12 com.cisco.SparkMacDesktop 0x0000000102675e75 std::__1::__function::__func<pplx::details::_MakeVoidToUnitFunc(std::__1::function<void ()> const&)::'lambda'(), std::__1::allocator<pplx::details::_MakeVoidToUnitFunc(std::__1::function<void ()> const&)::'lambda'()>, unsigned char ()>::operator()() + 13
13 com.cisco.SparkMacDesktop 0x00000001027619f9 pplx::details::_PPLTaskHandle<unsigned char, pplx::task::_InitialTaskHandle<void, MercuryManager::websocketConnect(int)::$_0, pplx::details::_TypeSelectorNoAsync>, pplx::details::_TaskProcHandle>::invoke() const + 265
14 com.cisco.SparkMacDesktop 0x00000001026508f3 pplx::details::_TaskProcHandle::RunChoreBridge(void) + 19
15 libdispatch.dylib 0x00007fff95f123c3 _dispatch_client_callout + 8
16 libdispatch.dylib 0x00007fff95f16253 _dispatch_root_queue_drain + 1890
17 libdispatch.dylib 0x00007fff95f15ab8 _dispatch_worker_thread3 + 91
18 libsystem_pthread.dylib 0x00007fff96cc34f2 _pthread_wqthread + 1129
19 libsystem_pthread.dylib 0x00007fff96cc1375 start_wqthread + 13

Now, the interesting bit is that i see from the Casablanca code that some engineer might have run into this issue before because there seems to be a specific attempt to "handle" this:

...
switch (m_state) {
case DESTROYED:
// This should be impossible
std::abort();
...

(I am not fully sure this actually works cause if the object is indeed dead then accessing m_state might be a bad idea already, but anyway, that's not the point I am trying to make here).

Based on the stacktrace and the code in ws_client_wspp.cpp it looks like a smoking gun: someone tries to double delete the object and the hard coded abort() is getting fired.

Now, as much as we can ever be sure, I am pretty-pretty sure that we, the consuming app, is not doing this double delete. We create a unique pointer to a Web socket callback client and when we get the close handler invoked (set up via set_close_handler), for example if I lose network, then we create a new PPL task, wait a bit and attempt to restore the websocket connection. The way we do that is that we create a brand new callback client (with make_unique) and replace the existing member field with this new pointer. At this point the old callback client - now in CLOSED state - is released and its destructor is called.

I simply see no C++ way how can the destructor can be invoked twice. Do you guys have any idea what's going on here? I presume there were some issues in this area and that's why the codepath is there in the destructor with abort?

Regards,
Gergely

@ras0219-msft
Copy link
Contributor

I'm responsible for the code snippet above. It was not placed there because I encountered that issue -- I simply added it as standard practice invariant checking. If the code manages to reach that line, we're deep into undefined behavior territory so I felt like fast-fail was the appropriate response.

Reading closely what you've posted: do you have a unique_ptr to the wspp_callback_client itself? It inherits from std::enable_shared_from_this and we grab shared_ptr's internally. If you're calling std::make_unique<wspp_callback_client>() that's almost guaranteed to result in a double-delete. You must use std::make_shared<wspp_callback_client>() because it keeps itself alive until all async activity has stopped.

@glukacsy
Copy link
Author

Hi, thanks for the reply.

So no, I have a unique_ptr to the websocket_callback_client (which internally has a shared_ptr member to websocket_client_callback_impl) not to the websocket_callback_client_impl instance, so we should be fine there.

websocket_callback_client is declared in ws_client.h and does not inherit from anything - it uses PIMPL to have a private shared_ptr to a websocket_client_callback_impl instance. As a consumer I only have visibility to ws_client.h and so I assume I am free to store the websocket_callback_client instance on stack or via a unique_ptr or ... etc. Am I wrong here?

In any case, before I used a unique_ptr I had a simple stack based member to store the websocket callback client instance and encountered the crash as well. That's when I switched to using unique_ptr - but same results.

I am at the stage when I am about just create a new websocket_callback_client on the heap with a raw "new" and let it leak - thart way I will be sure the destructor is not called...

(btw, i would be happy not to create a new instance, but once an instance is closed I cannot seem to be re-using it to reconnect, I am forced to create a brand new instance - is this on purpose?)

regards,
Gergely

@kavyako kavyako added the bug label Jan 5, 2016
@bartek256
Copy link

bartek256 commented May 27, 2016

Hi @glukacsy,
Did you manage to somehow workaround or fix it?
I just received similar report and I too keep websocket_client_callback_impl on stack.

Thanks,
Bartek

@glukacsy
Copy link
Author

No, we simply moved the Websocket logic out of process, so if it crashes it does not affect the main app....

@ksinica
Copy link

ksinica commented Jul 15, 2016

Any workarounds for this issue?

@dashesy
Copy link

dashesy commented Oct 14, 2016

Any workaround / new information about this issue?

@JAEspigares
Copy link

I have the same issue in windows. It seems to happen when an invalid proxy uri (i.e. the uri does not include the scheme) is specified in websocket_client_config when creating the websocket_callback_client object.
I don't know if there is any other condition that cause the failure, but for now I'm being careful when setting proxy config and it seems to work.

@piotr-walczak
Copy link

Hi, this issue is a real killer on Win. Any updates?

@bin0101
Copy link

bin0101 commented Oct 17, 2018

Hi @ras0219-msft , Do we have any updates on this issue? We have the same issue.

@zzzhr1990
Copy link

zzzhr1990 commented Nov 6, 2018

same problem on OSX

__pthread_kill 0x00007fff7ab3db86
pthread_kill 0x00007fff7abf3c50
raise 0x00007fff7aa5a096
pplx::details::_ExceptionHolder::~_ExceptionHolder() 0x0000000101064c1e
std::__1::shared_ptr<Concurrency::streams::details::basic_ostream_helper<unsigned char> >::~shared_ptr() 0x0000000101062e51
pplx::details::_Task_completion_event_impl<web::http::http_response>::~_Task_completion_event_impl() 0x000000010108c9b3
std::__1::shared_ptr<Concurrency::streams::details::basic_ostream_helper<unsigned char> >::~shared_ptr() 0x0000000101062e51
web::http::client::details::request_context::~request_context() 0x0000000101186fda
boost::asio::detail::reactive_socket_recv_op<boost::asio::mutable_buffers_1, boost::asio::ssl::detail::io_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp>, boost::asio::ssl::detail::read_op<boost::asio::mutable_buffers_1>, boost::asio::detail::read_until_delim_string_op<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp>&>, boost::asio::basic_streambuf_ref<std::__1::allocator<char> >, boost::_bi::bind_t<void, boost::_mfi::mf1<void, web::http::client::details::asio_context, boost::system::error_code const&>, boost::_bi::list2<boost::_bi::value<std::__1::shared_ptr<web::http::client::details::asio_context> >, boost::arg<1> (*)()> > > > >::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) 0x00000001011781a9
boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) 0x0000000101102e14
boost::asio::detail::scheduler::run(boost::system::error_code&) 0x0000000101102b03
boost::asio::detail::posix_thread::func<(anonymous namespace)::threadpool_impl::add_thread()::'lambda'()>::run() 0x0000000101164233
boost_asio_detail_posix_thread_function 0x000000010116a108
_pthread_body 0x00007fff7abf133d
_pthread_start 0x00007fff7abf42a7
thread_start 0x00007fff7abf0425

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants