Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use shared_from_this to avoid crash on shutdown #179

Merged
merged 5 commits into from
Apr 7, 2020

Conversation

dorezyuk
Copy link
Contributor

@dorezyuk dorezyuk commented Apr 1, 2020

Hi guys,

we had a crash on shutdown with following stack trace

Stack trace of thread 3423:
#0 0x00007f2f89d7ee97 __GI_raise (libc.so.6)
#1 0x00007f2f89d80801 __GI_abort (libc.so.6)
#2 0x00007f2f8a3d5957 n/a (libstdc++.so.6)
#3 0x00007f2f8a3dbae6 n/a (libstdc++.so.6)
#4 0x00007f2f8a3dab49 n/a (libstdc++.so.6)
#5 0x00007f2f8a3db4b8 __gxx_personality_v0 (libstdc++.so.6)
#6 0x00007f2f8a141573 n/a (libgcc_s.so.1)
#7 0x00007f2f8a141df5 _Unwind_Resume (libgcc_s.so.1)
#8 0x00007f2f8b12d4c9 _ZN5boost11unique_lockINS_5mutexEE4lockEv (libtf2_ros.so)
#9 0x00007f2f8adf2e5b _ZN3ros12TimerManagerINS_4TimeENS_8DurationENS_10TimerEventEE6removeEi (libroscpp.so)
#10 0x00007f2f8adf1c3b _ZN3ros5Timer4Impl4stopEv (libroscpp.so)
#11 0x00007f2f8adf1cfb _ZN3ros5Timer4ImplD2Ev (libroscpp.so)
#12 0x00007f2f8adf3e82 _ZN5boost6detail17sp_counted_impl_pIN3ros5Timer4ImplEE7disposeEv (libroscpp.so)
#13 0x00007f2f8adf1bd1 _ZN3ros5TimerD2Ev (libroscpp.so)
#14 0x00007f2f8b426999 _ZN15mbf_costmap_nav14CostmapWrapperD1Ev (libmbf_costmap_server.so)
#15 0x00007f2f8b4269c9 _ZN15mbf_costmap_nav14CostmapWrapperD0Ev (libmbf_costmap_server.so)
#16 0x00005560e651c28a _ZN5boost6detail15sp_counted_base7releaseEv (mbf_costmap_nav)
#17 0x00007f2f8b3cbe47 _ZN15mbf_costmap_nav23CostmapNavigationServerD1Ev (libmbf_costmap_server.so)
#18 0x00005560e651b4ce _ZN5boost6detail18sp_counted_impl_pdIPN15mbf_costmap_nav23CostmapNavigationServerENS0_13sp_ms_deleterIS3_EEE7disposeEv (mbf_costmap_nav)
#19 0x00005560e651bf71 _ZN5boost10shared_ptrIN15mbf_costmap_nav23CostmapNavigationServerEED2Ev (mbf_costmap_nav)
#20 0x00007f2f89d83041 __run_exit_handlers (libc.so.6)
#21 0x00007f2f89d8313a __GI_exit (libc.so.6)
#22 0x00007f2f89d61b9e __libc_start_main (libc.so.6)
#23 0x00005560e651ac5a _start (mbf_costmap_nav)

I think that the reason might be a race condition such that the ros::Timer callback is still called, after the costmap_wrapper instance is destructed. ros::Timer offers for that the track_object API, which might solve the problem.

@MichaelGrupp
Copy link
Contributor

My take on this problem: #180 (high five for the synchronized opening of 2 PRs despite being out of office 😄)

For completeness, the corresponding rosout:

[/navigation/move_base_flex INFO 1585321294.883993]: terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::lock_error> >'
[/navigation/move_base_flex INFO 1585321294.884960]:   what():  boost: mutex lock failed in pthread_mutex_lock: Invalid argument

@@ -120,7 +120,7 @@ void CostmapWrapper::checkDeactivate()
// navigation sequence, what is terribly inefficient; the timer is stopped on costmap re-activation and
// reset after every new call to deactivate
shutdown_costmap_timer_ =
private_nh_.createTimer(shutdown_costmap_delay_, &CostmapWrapper::deactivate, this, true);
private_nh_.createTimer(shutdown_costmap_delay_, &CostmapWrapper::deactivate, shared_from_this(), true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still would not help if the mutex used in deactivate() is gone (see my PR), or does shared_from_this prevent this? (I never used shared_from_this, so I don't know too much about it yet)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I argue that we shouldn't need this if the destruction order is clean.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quote from roscpp api:

Since this is a shared pointer, the object will automatically be tracked with a weak_ptr so that if it is deleted before the Timer goes out of scope the callback will no longer be called (and therefore will not crash).

The object will still be deleted, but the timer will not be called after deletion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also call shutdown_costmap_timer_.stop(); from the destructor to get the same effect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem is not so much in the lifetime of the members of the costmap_wrapper instance but more in the lifetime of the instance itself.

consider the code below, demonstrating the current costmap_wrapper:

#include <boost/enable_shared_from_this.hpp>
#include <chrono>
#include <ros/ros.h>
#include <thread>

struct costmap_wrapper {
  ~costmap_wrapper() { ROS_INFO_STREAM(__PRETTY_FUNCTION__); }

  void connect()
  {
    ros::NodeHandle nh;
    timer_ = nh.createTimer(ros::Duration(0.5), &costmap_wrapper::callback,
                            this, true);
    ROS_INFO_STREAM("timer started in " << __PRETTY_FUNCTION__);
  }

  void callback(const ros::TimerEvent& _event)
  {
    ROS_INFO_STREAM("enter " << __PRETTY_FUNCTION__);
    // now do some heavy work
    std::this_thread::sleep_for(std::chrono::seconds(1));
    ROS_INFO_STREAM("ups! " << __PRETTY_FUNCTION__);
  }

private:
  ros::Timer timer_;
};

void call_costmap_wrapper()
{
  boost::shared_ptr<costmap_wrapper> wrapper(new costmap_wrapper());
  wrapper->connect();

  std::this_thread::sleep_for(std::chrono::seconds(1));
}

int main(int argc, char** argv)
{
  ros::init(argc, argv, "my_node");
  ros::NodeHandle nh;

  std::thread t1(call_costmap_wrapper);

  ros::spin();
  t1.join();

  return 0;
}

the output of this code is

[/my_node  INFO 1585760423.779882585]: timer started in void costmap_wrapper::connect()
[/my_node  INFO 1585760424.280075292]: enter void costmap_wrapper::callback(const ros::TimerEvent&)
[/my_node  INFO 1585760424.781423688]: costmap_wrapper::~costmap_wrapper()
[/my_node  INFO 1585760425.280427291]: ups! void costmap_wrapper::callback(const ros::TimerEvent&)

meaning if the callback takes some time (in our case we have a mutex in it...) the instance might be destroyed while the callback is active.

now lets see how this play out with the shared_from_this approach

struct better_costmap_wrapper
    : public boost::enable_shared_from_this<better_costmap_wrapper> {
  ~better_costmap_wrapper() { ROS_INFO_STREAM(__PRETTY_FUNCTION__); }

  void connect()
  {
    ros::NodeHandle nh;
    timer_ =
        nh.createTimer(ros::Duration(0.5), &better_costmap_wrapper::callback,
                       shared_from_this(), true);
    ROS_INFO_STREAM("timer started in " << __PRETTY_FUNCTION__);
  }

  void callback(const ros::TimerEvent& _event)
  {
    ROS_INFO_STREAM("enter " << __PRETTY_FUNCTION__);
    // now do some heavy work
    std::this_thread::sleep_for(std::chrono::seconds(1));
    ROS_INFO_STREAM("ups! " << __PRETTY_FUNCTION__);
  }

private:
  ros::Timer timer_;
};

void call_better_costmap_wrapper()
{
  boost::shared_ptr<better_costmap_wrapper> good(new better_costmap_wrapper());
  good->connect();

  std::this_thread::sleep_for(std::chrono::seconds(1));
}

int main(int argc, char** argv)
{
  ros::init(argc, argv, "my_node");
  ros::NodeHandle nh;

  std::thread t2(call_better_costmap_wrapper);

  ros::spin();
  t2.join();

  return 0;
}

the output in this case is

[/my_node  INFO 1585760587.080750296]: timer started in void better_costmap_wrapper::connect()
[/my_node  INFO 1585760587.580732346]: enter void better_costmap_wrapper::callback(const ros::TimerEvent&)
[/my_node  INFO 1585760588.581101970]: ups! void better_costmap_wrapper::callback(const ros::TimerEvent&)
[/my_node  INFO 1585760588.581327469]: better_costmap_wrapper::~better_costmap_wrapper()

meaning we wait until we exit the callback before calling the destructor

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot reproduce, so I suspect must have something to do with extra plugins you load. Can you give a fast try with the default mbf configuration?
@Rayman, can u reproduce using your https://github.com/Rayman/turtlebot3_mbf?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've reproduced the issue with the launch files from https://github.com/Rayman/turtlebot3_mbf. If you run the commands from the readme and run rosnode kill /move_base_flex it will crash with the following backtrace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff6513801 in __GI_abort () at abort.c:79
#2  0x00007ffff6b68957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff6b6eae6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff6b6db49 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff6b6e4b8 in __gxx_personality_v0 () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff68d4573 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7  0x00007ffff68d4df5 in _Unwind_Resume () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#8  0x00007ffff53d28d0 in class_loader::MultiLibraryClassLoader::shutdownAllClassLoaders() ()
   from /opt/ros/melodic/lib/libclass_loader.so
#9  0x00007ffff53d2900 in class_loader::MultiLibraryClassLoader::~MultiLibraryClassLoader() ()
   from /opt/ros/melodic/lib/libclass_loader.so
#10 0x00007ffff5854a8f in pluginlib::ClassLoader<costmap_2d::Layer>::~ClassLoader() () from /opt/ros/melodic/lib/libcostmap_2d.so
#11 0x00007ffff5833e41 in costmap_2d::Costmap2DROS::~Costmap2DROS() () from /opt/ros/melodic/lib/libcostmap_2d.so
#12 0x00007ffff7baf9c9 in mbf_costmap_nav::CostmapWrapper::~CostmapWrapper() () from /opt/ros/melodic/lib/libmbf_costmap_server.so
#13 0x000055555555e28a in boost::detail::sp_counted_base::release() ()
#14 0x00007ffff7b54e47 in mbf_costmap_nav::CostmapNavigationServer::~CostmapNavigationServer() ()
   from /opt/ros/melodic/lib/libmbf_costmap_server.so
#15 0x000055555555d4ce in boost::detail::sp_counted_impl_pd<mbf_costmap_nav::CostmapNavigationServer*, boost::detail::sp_ms_deleter<mbf_costmap_nav::CostmapNavigationServer> >::dispose() ()
#16 0x000055555555df71 in boost::shared_ptr<mbf_costmap_nav::CostmapNavigationServer>::~shared_ptr() ()
#17 0x00007ffff6516041 in __run_exit_handlers (status=0, listp=0x7ffff68be718 <__exit_funcs>, 
    run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#18 0x00007ffff651613a in __GI_exit (status=<optimized out>) at exit.c:139
#19 0x00007ffff64f4b9e in __libc_start_main (main=0x55555555c660 <main>, argc=3, argv=0x7fffffffd628, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd618) at ../csu/libc-start.c:344
#20 0x000055555555cc5a in _start ()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um... 🤔 not with master. Are u running sources or .debs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was running with the debs. I now tested with master. Same issue, but clearer backtrace due to build_type=debug:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff63e6801 in __GI_abort () at abort.c:79
#2  0x00007ffff6a3b957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff6a41ae6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff6a40b49 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff6a414b8 in __gxx_personality_v0 () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff67a7573 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7  0x00007ffff67a7df5 in _Unwind_Resume () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#8  0x00007ffff4f7a8d0 in class_loader::MultiLibraryClassLoader::shutdownAllClassLoaders() ()
   from /opt/ros/melodic/lib/libclass_loader.so
#9  0x00007ffff4f7a900 in class_loader::MultiLibraryClassLoader::~MultiLibraryClassLoader() ()
   from /opt/ros/melodic/lib/libclass_loader.so
#10 0x00007ffff53fca8f in pluginlib::ClassLoader<costmap_2d::Layer>::~ClassLoader() () from /opt/ros/melodic/lib/libcostmap_2d.so
#11 0x00007ffff53dbe41 in costmap_2d::Costmap2DROS::~Costmap2DROS() () from /opt/ros/melodic/lib/libcostmap_2d.so
#12 0x00007ffff7b7fdac in mbf_costmap_nav::CostmapWrapper::~CostmapWrapper (this=0x555555d45d60, __in_chrg=<optimized out>)
    at /home/ramon/ros/turtlebot3_mbf/src/move_base_flex/mbf_costmap_nav/src/mbf_costmap_nav/costmap_wrapper.cpp:65
#13 0x00007ffff7b7fdc8 in mbf_costmap_nav::CostmapWrapper::~CostmapWrapper (this=0x555555d45d60, __in_chrg=<optimized out>)
    at /home/ramon/ros/turtlebot3_mbf/src/move_base_flex/mbf_costmap_nav/src/mbf_costmap_nav/costmap_wrapper.cpp:67
#14 0x00007ffff7b3b7d9 in boost::checked_delete<mbf_costmap_nav::CostmapWrapper> (x=0x555555d45d60)
    at /usr/include/boost/core/checked_delete.hpp:34
#15 0x00007ffff7b69576 in boost::detail::sp_counted_impl_p<mbf_costmap_nav::CostmapWrapper>::dispose (this=0x555555d7c910)
    at /usr/include/boost/smart_ptr/detail/sp_counted_impl.hpp:92
#16 0x0000555555562595 in boost::detail::sp_counted_base::release (this=0x555555d7c910)
    at /usr/include/boost/smart_ptr/detail/sp_counted_base_std_atomic.hpp:110
#17 0x0000555555562621 in boost::detail::shared_count::~shared_count (this=0x555555799180, __in_chrg=<optimized out>)
    at /usr/include/boost/smart_ptr/detail/shared_count.hpp:426
#18 0x00007ffff7b070ce in boost::shared_ptr<mbf_costmap_nav::CostmapWrapper>::~shared_ptr (this=0x555555799178, 
    __in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/shared_ptr.hpp:341
#19 0x00007ffff7af35a8 in mbf_costmap_nav::CostmapNavigationServer::~CostmapNavigationServer (this=0x555555796c00, 
    __in_chrg=<optimized out>)
    at /home/ramon/ros/turtlebot3_mbf/src/move_base_flex/mbf_costmap_nav/src/mbf_costmap_nav/costmap_navigation_server.cpp:90
#20 0x000055555556523e in boost::detail::sp_ms_deleter<mbf_costmap_nav::CostmapNavigationServer>::destroy (this=0x555555796bf8)
    at /usr/include/boost/smart_ptr/make_shared_object.hpp:59
#21 0x00005555555657f6 in boost::detail::sp_ms_deleter<mbf_costmap_nav::CostmapNavigationServer>::operator() (this=0x555555796bf8)
    at /usr/include/boost/smart_ptr/make_shared_object.hpp:93
#22 0x00005555555654dd in boost::detail::sp_counted_impl_pd<mbf_costmap_nav::CostmapNavigationServer*, boost::detail::sp_ms_deleter<mbf_costmap_nav::CostmapNavigationServer> >::dispose (this=0x555555796be0) at /usr/include/boost/smart_ptr/detail/sp_counted_impl.hpp:172
#23 0x0000555555562595 in boost::detail::sp_counted_base::release (this=0x555555796be0)
    at /usr/include/boost/smart_ptr/detail/sp_counted_base_std_atomic.hpp:110
#24 0x0000555555562621 in boost::detail::shared_count::~shared_count (this=0x55555576a068 <costmap_nav_srv_ptr+8>, 
    __in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/detail/shared_count.hpp:426
#25 0x0000555555563438 in boost::shared_ptr<mbf_costmap_nav::CostmapNavigationServer>::~shared_ptr (
    this=0x55555576a060 <costmap_nav_srv_ptr>, __in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/shared_ptr.hpp:341
#26 0x00007ffff63e9041 in __run_exit_handlers (status=0, listp=0x7ffff6791718 <__exit_funcs>, 
    run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#27 0x00007ffff63e913a in __GI_exit (status=<optimized out>) at exit.c:139
#28 0x00007ffff63c7b9e in __libc_start_main (main=0x55555556178e <main(int, char**)>, argc=3, argv=0x7fffffffd3b8, 
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd3a8) at ../csu/libc-start.c:344
#29 0x000055555556148a in _start ()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me the dorezyuk/fix-timer branch fixes the problem. michael/costmap-wrapper-mutex did not fix this issue.

@spuetz spuetz self-requested a review April 3, 2020 07:22
Copy link
Collaborator

@corot corot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, cannot reproduce, but the changes make sense and 2 people already confirmed that it works (@MichaelGrupp, do u confirm too?), so ready to merge after requested change is done.

mbf_costmap_nav/src/move_base_server_node.cpp Outdated Show resolved Hide resolved
@corot corot merged commit 0453b60 into naturerobots:master Apr 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants