API node crash #1287

Closed
oxarbitrage opened this issue Aug 25, 2018 · 4 comments

@oxarbitrage

oxarbitrage commented Aug 25, 2018

My API node segfaulted today. I am running it inside gdb to get a backtrace.

Here is the information I was able to get from the crash:


2722286ms th_a       application.cpp:506           handle_block         ] Got block: #29927763 01c8a95308b6162d64ff555c6dda208908d30cf9 time: 2018-08-25T17:45:21 latency: 1286 ms from: clockwork  irreversible: 29927745 (-18)

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff2b94700 (LWP 3662)]
0x00000000014c1304 in std::_Hashtable<std::shared_ptr<graphene::net::peer_connection>, std::shared_ptr<graphene::net::peer_connection>, std::allocator<std::shared_ptr<graphene::net::peer_connection> >, std::__detail::_Identity, std::equal_to<std::shared_ptr<graphene::net::peer_connection> >, std::hash<std::shared_ptr<graphene::net::peer_connection> >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::find (this=this@entry=0xa3f5b590, __k=<error reading variable: Cannot access memory at address 0x574460e779b4b>)
    at /usr/include/c++/4.9/bits/hashtable.h:1298
1298        _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
(gdb) 

(gdb) bt
#0  0x00000000014c1304 in std::_Hashtable<std::shared_ptr<graphene::net::peer_connection>, std::shared_ptr<graphene::net::peer_connection>, std::allocator<std::shared_ptr<graphene::net::peer_connection> >, std::__detail::_Identity, std::equal_to<std::shared_ptr<graphene::net::peer_connection> >, std::hash<std::shared_ptr<graphene::net::peer_connection> >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::find (this=this@entry=0xa3f5b590, __k=<error reading variable: Cannot access memory at address 0x574460e779b4b>)
    at /usr/include/c++/4.9/bits/hashtable.h:1298
#1  0x0000000001498e37 in find (__x=<error reading variable: Cannot access memory at address 0x574460e779b4b>, this=0xa3f5b590)
    at /usr/include/c++/4.9/bits/unordered_set.h:547
#2  graphene::net::detail::node_impl::terminate_inactive_connections_loop (this=0xa3f5b260) at /root/repo/pull1103/bitshares-core/libraries/net/node.cpp:985
#3  0x000000000149a91c in operator() (__closure=<optimized out>) at /root/repo/pull1103/bitshares-core/libraries/net/node.cpp:999
#4  fc::detail::void_functor_run<graphene::net::detail::node_impl::terminate_inactive_connections_loop()::<lambda()> >::run(void *, void *) (functor=<optimized out>, 
    prom=0x7fff52062e80) at /root/repo/pull1103/bitshares-core/libraries/fc/include/fc/thread/task.hpp:83
#5  0x000000000134f923 in fc::task_base::run_impl (this=0x7fff52062e90) at /root/repo/pull1103/bitshares-core/libraries/fc/src/thread/task.cpp:43
#6  0x000000000134db7f in run_next_task (this=<optimized out>) at /root/repo/pull1103/bitshares-core/libraries/fc/src/thread/thread_d.hpp:513
#7  fc::thread_d::process_tasks (this=0x7fffec0008c0) at /root/repo/pull1103/bitshares-core/libraries/fc/src/thread/thread_d.hpp:562
#8  0x000000000134dde1 in fc::thread_d::start_process_tasks (my=140737152813248) at /root/repo/pull1103/bitshares-core/libraries/fc/src/thread/thread_d.hpp:493
#9  0x000000000159e9b1 in make_fcontext ()
#10 0x0000000000000000 in ?? ()
(gdb) 
@jmjatlanta

I am sorry that happened to you, but it is great you caught the trace. I hope that takes us a long way into figuring out the cause. At first glance, it looks like the connection cleanup code is trying to do its job when another thread got there first. I'll dig in and report back what I find.

@jmjatlanta

The code runs in a separate thread and attempts to clean up old connections. It creates an exception to pass as a parameter to the disconnect function, and is in the process of adding details to the exception message. While doing so it calls find on _active_connections, a member variable of type std::unordered_set (frames #0 and #1 of the backtrace are inside unordered_set.h and hashtable.h). Another thread is probably modifying that set at the same time, which would explain the crash. std::unordered_set is not thread safe: concurrent reads are fine, but a read that overlaps with a write is not.

Here is the section of code that I believe is segfaulting

for( const peer_connection_ptr& peer : peers_to_disconnect_gently )
{
   fc::exception detailed_error( FC_LOG_MESSAGE( warn, "Disconnecting due to inactivity",
      ( "last_message_received_seconds_ago", ( peer->get_last_message_received_time() - fc::time_point::now() ).count() / fc::seconds( 1 ).count() )
      ( "last_message_sent_seconds_ago", ( peer->get_last_message_sent_time() - fc::time_point::now() ).count() / fc::seconds( 1 ).count() )
      ( "inactivity_timeout", _active_connections.find( peer ) != _active_connections.end() ? _peer_inactivity_timeout * 10 : _peer_inactivity_timeout ) ) );
   disconnect_from_peer( peer.get(), "Disconnecting due to inactivity", false, detailed_error );
}

I need to review all processes that write to _active_connections to see if there is an instance of more than 1 thread writing to this collection.
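
For illustration only, here is a minimal sketch of one common way to make lookups on a shared set safe: route every read and write through the same mutex. This is not the fix adopted in bitshares-core. The names peer_connection (a stand-in struct), guarded_peer_set and run_concurrent_demo are hypothetical, and whether a plain std::mutex is appropriate inside fc's task scheduler is an open question that this thread does not settle.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for graphene::net::peer_connection.
struct peer_connection { int id; };
using peer_connection_ptr = std::shared_ptr<peer_connection>;

// Hypothetical wrapper: every access to the underlying set holds the same mutex,
// so a lookup can never overlap with an insert or erase.
class guarded_peer_set
{
public:
   void insert( const peer_connection_ptr& peer )
   {
      std::lock_guard<std::mutex> lock( _mutex );
      _peers.insert( peer );
   }

   void erase( const peer_connection_ptr& peer )
   {
      std::lock_guard<std::mutex> lock( _mutex );
      _peers.erase( peer );
   }

   bool contains( const peer_connection_ptr& peer ) const
   {
      std::lock_guard<std::mutex> lock( _mutex );
      return _peers.find( peer ) != _peers.end();
   }

   std::size_t size() const
   {
      std::lock_guard<std::mutex> lock( _mutex );
      return _peers.size();
   }

private:
   mutable std::mutex _mutex;
   std::unordered_set<peer_connection_ptr> _peers;
};

// One thread inserts peers while another performs lookups on the same set.
// Returns the number of peers stored once both threads have finished.
inline std::size_t run_concurrent_demo( int peer_count )
{
   guarded_peer_set active;
   std::vector<peer_connection_ptr> peers;
   for( int i = 0; i < peer_count; ++i )
      peers.push_back( std::make_shared<peer_connection>( peer_connection{ i } ) );

   std::thread writer( [&] { for( const auto& p : peers ) active.insert( p ); } );
   std::thread reader( [&] { for( const auto& p : peers ) (void)active.contains( p ); } );
   writer.join();
   reader.join();
   return active.size();
}
```

With a wrapper like this, the "inactivity_timeout" lookup in the loop above would call contains( peer ) instead of touching the set directly. The trade-off is that each lookup pays for a lock, and any code that iterates over the set would need to hold the lock for the whole iteration.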

@jmjatlanta

What happens when you use a std::unordered_set in a threaded environment: https://gist.github.com/jmjatlanta/63cf7a13b9c88d47c9eb63deee5e5118

@jmjatlanta

I strongly believe that this issue is a duplicate of #1256. If you would like to comment, please do so in issue #1256.

If you have evidence that this issue is unique, please comment here and we will re-open.
