-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use shared_from_this to avoid crash on shutdown #179
Conversation
My take on this problem: #180 (high five for the synchronized opening of 2 PRs despite being out of office 😄) For completeness, the corresponding rosout:
|
@@ -120,7 +120,7 @@ void CostmapWrapper::checkDeactivate() | |||
// navigation sequence, what is terribly inefficient; the timer is stopped on costmap re-activation and | |||
// reset after every new call to deactivate | |||
shutdown_costmap_timer_ = | |||
private_nh_.createTimer(shutdown_costmap_delay_, &CostmapWrapper::deactivate, this, true); | |||
private_nh_.createTimer(shutdown_costmap_delay_, &CostmapWrapper::deactivate, shared_from_this(), true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still would not help if the mutex used in deactivate()
is gone (see my PR), or does shared_from_this
prevent this? (I never used shared_from_this
, so I don't know too much about it yet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I argue that we shouldn't need this if the destruction order is clean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quote from roscpp api:
Since this is a shared pointer, the object will automatically be tracked with a weak_ptr so that if it is deleted before the Timer goes out of scope the callback will no longer be called (and therefore will not crash).
The object will still be deleted, but the timer will not be called after deletion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also call shutdown_costmap_timer_.stop();
from the destructor to get the same effect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the problem is not so much in the lifetime of the members of the costmap_wrapper instance but more in the lifetime of the instance itself.
consider the code below, demonstrating the current costmap_wrapper:
#include <boost/enable_shared_from_this.hpp>
#include <chrono>
#include <ros/ros.h>
#include <thread>
struct costmap_wrapper {
~costmap_wrapper() { ROS_INFO_STREAM(__PRETTY_FUNCTION__); }
void connect()
{
ros::NodeHandle nh;
timer_ = nh.createTimer(ros::Duration(0.5), &costmap_wrapper::callback,
this, true);
ROS_INFO_STREAM("timer started in " << __PRETTY_FUNCTION__);
}
void callback(const ros::TimerEvent& _event)
{
ROS_INFO_STREAM("enter " << __PRETTY_FUNCTION__);
// now do some heavy work
std::this_thread::sleep_for(std::chrono::seconds(1));
ROS_INFO_STREAM("ups! " << __PRETTY_FUNCTION__);
}
private:
ros::Timer timer_;
};
void call_costmap_wrapper()
{
boost::shared_ptr<costmap_wrapper> wrapper(new costmap_wrapper());
wrapper->connect();
std::this_thread::sleep_for(std::chrono::seconds(1));
}
int main(int argc, char** argv)
{
ros::init(argc, argv, "my_node");
ros::NodeHandle nh;
std::thread t1(call_costmap_wrapper);
ros::spin();
t1.join();
return 0;
}
the output of this code is
[/my_node INFO 1585760423.779882585]: timer started in void costmap_wrapper::connect()
[/my_node INFO 1585760424.280075292]: enter void costmap_wrapper::callback(const ros::TimerEvent&)
[/my_node INFO 1585760424.781423688]: costmap_wrapper::~costmap_wrapper()
[/my_node INFO 1585760425.280427291]: ups! void costmap_wrapper::callback(const ros::TimerEvent&)
meaning if the callback takes some time (in our case we have a mutex in it...) the instance might be destroyed while the callback is active.
now lets see how this play out with the shared_from_this approach
struct better_costmap_wrapper
: public boost::enable_shared_from_this<better_costmap_wrapper> {
~better_costmap_wrapper() { ROS_INFO_STREAM(__PRETTY_FUNCTION__); }
void connect()
{
ros::NodeHandle nh;
timer_ =
nh.createTimer(ros::Duration(0.5), &better_costmap_wrapper::callback,
shared_from_this(), true);
ROS_INFO_STREAM("timer started in " << __PRETTY_FUNCTION__);
}
void callback(const ros::TimerEvent& _event)
{
ROS_INFO_STREAM("enter " << __PRETTY_FUNCTION__);
// now do some heavy work
std::this_thread::sleep_for(std::chrono::seconds(1));
ROS_INFO_STREAM("ups! " << __PRETTY_FUNCTION__);
}
private:
ros::Timer timer_;
};
void call_better_costmap_wrapper()
{
boost::shared_ptr<better_costmap_wrapper> good(new better_costmap_wrapper());
good->connect();
std::this_thread::sleep_for(std::chrono::seconds(1));
}
int main(int argc, char** argv)
{
ros::init(argc, argv, "my_node");
ros::NodeHandle nh;
std::thread t2(call_better_costmap_wrapper);
ros::spin();
t2.join();
return 0;
}
the output in this case is
[/my_node INFO 1585760587.080750296]: timer started in void better_costmap_wrapper::connect()
[/my_node INFO 1585760587.580732346]: enter void better_costmap_wrapper::callback(const ros::TimerEvent&)
[/my_node INFO 1585760588.581101970]: ups! void better_costmap_wrapper::callback(const ros::TimerEvent&)
[/my_node INFO 1585760588.581327469]: better_costmap_wrapper::~better_costmap_wrapper()
meaning we wait until we exit the callback before calling the destructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot reproduce, so I suspect must have something to do with extra plugins you load. Can you give a fast try with the default mbf configuration?
@Rayman, can u reproduce using your https://github.com/Rayman/turtlebot3_mbf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I've reproduced the issue with the launch files from https://github.com/Rayman/turtlebot3_mbf. If you run the commands from the readme and run rosnode kill /move_base_flex
it will crash with the following backtrace:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff6513801 in __GI_abort () at abort.c:79
#2 0x00007ffff6b68957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007ffff6b6eae6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff6b6db49 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff6b6e4b8 in __gxx_personality_v0 () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007ffff68d4573 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7 0x00007ffff68d4df5 in _Unwind_Resume () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#8 0x00007ffff53d28d0 in class_loader::MultiLibraryClassLoader::shutdownAllClassLoaders() ()
from /opt/ros/melodic/lib/libclass_loader.so
#9 0x00007ffff53d2900 in class_loader::MultiLibraryClassLoader::~MultiLibraryClassLoader() ()
from /opt/ros/melodic/lib/libclass_loader.so
#10 0x00007ffff5854a8f in pluginlib::ClassLoader<costmap_2d::Layer>::~ClassLoader() () from /opt/ros/melodic/lib/libcostmap_2d.so
#11 0x00007ffff5833e41 in costmap_2d::Costmap2DROS::~Costmap2DROS() () from /opt/ros/melodic/lib/libcostmap_2d.so
#12 0x00007ffff7baf9c9 in mbf_costmap_nav::CostmapWrapper::~CostmapWrapper() () from /opt/ros/melodic/lib/libmbf_costmap_server.so
#13 0x000055555555e28a in boost::detail::sp_counted_base::release() ()
#14 0x00007ffff7b54e47 in mbf_costmap_nav::CostmapNavigationServer::~CostmapNavigationServer() ()
from /opt/ros/melodic/lib/libmbf_costmap_server.so
#15 0x000055555555d4ce in boost::detail::sp_counted_impl_pd<mbf_costmap_nav::CostmapNavigationServer*, boost::detail::sp_ms_deleter<mbf_costmap_nav::CostmapNavigationServer> >::dispose() ()
#16 0x000055555555df71 in boost::shared_ptr<mbf_costmap_nav::CostmapNavigationServer>::~shared_ptr() ()
#17 0x00007ffff6516041 in __run_exit_handlers (status=0, listp=0x7ffff68be718 <__exit_funcs>,
run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#18 0x00007ffff651613a in __GI_exit (status=<optimized out>) at exit.c:139
#19 0x00007ffff64f4b9e in __libc_start_main (main=0x55555555c660 <main>, argc=3, argv=0x7fffffffd628, init=<optimized out>,
fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd618) at ../csu/libc-start.c:344
#20 0x000055555555cc5a in _start ()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Um... 🤔 not with master. Are u running sources or .debs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was running with the debs. I now tested with master. Same issue, but clearer backtrace due to build_type=debug:
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff63e6801 in __GI_abort () at abort.c:79
#2 0x00007ffff6a3b957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007ffff6a41ae6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff6a40b49 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff6a414b8 in __gxx_personality_v0 () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007ffff67a7573 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7 0x00007ffff67a7df5 in _Unwind_Resume () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#8 0x00007ffff4f7a8d0 in class_loader::MultiLibraryClassLoader::shutdownAllClassLoaders() ()
from /opt/ros/melodic/lib/libclass_loader.so
#9 0x00007ffff4f7a900 in class_loader::MultiLibraryClassLoader::~MultiLibraryClassLoader() ()
from /opt/ros/melodic/lib/libclass_loader.so
#10 0x00007ffff53fca8f in pluginlib::ClassLoader<costmap_2d::Layer>::~ClassLoader() () from /opt/ros/melodic/lib/libcostmap_2d.so
#11 0x00007ffff53dbe41 in costmap_2d::Costmap2DROS::~Costmap2DROS() () from /opt/ros/melodic/lib/libcostmap_2d.so
#12 0x00007ffff7b7fdac in mbf_costmap_nav::CostmapWrapper::~CostmapWrapper (this=0x555555d45d60, __in_chrg=<optimized out>)
at /home/ramon/ros/turtlebot3_mbf/src/move_base_flex/mbf_costmap_nav/src/mbf_costmap_nav/costmap_wrapper.cpp:65
#13 0x00007ffff7b7fdc8 in mbf_costmap_nav::CostmapWrapper::~CostmapWrapper (this=0x555555d45d60, __in_chrg=<optimized out>)
at /home/ramon/ros/turtlebot3_mbf/src/move_base_flex/mbf_costmap_nav/src/mbf_costmap_nav/costmap_wrapper.cpp:67
#14 0x00007ffff7b3b7d9 in boost::checked_delete<mbf_costmap_nav::CostmapWrapper> (x=0x555555d45d60)
at /usr/include/boost/core/checked_delete.hpp:34
#15 0x00007ffff7b69576 in boost::detail::sp_counted_impl_p<mbf_costmap_nav::CostmapWrapper>::dispose (this=0x555555d7c910)
at /usr/include/boost/smart_ptr/detail/sp_counted_impl.hpp:92
#16 0x0000555555562595 in boost::detail::sp_counted_base::release (this=0x555555d7c910)
at /usr/include/boost/smart_ptr/detail/sp_counted_base_std_atomic.hpp:110
#17 0x0000555555562621 in boost::detail::shared_count::~shared_count (this=0x555555799180, __in_chrg=<optimized out>)
at /usr/include/boost/smart_ptr/detail/shared_count.hpp:426
#18 0x00007ffff7b070ce in boost::shared_ptr<mbf_costmap_nav::CostmapWrapper>::~shared_ptr (this=0x555555799178,
__in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/shared_ptr.hpp:341
#19 0x00007ffff7af35a8 in mbf_costmap_nav::CostmapNavigationServer::~CostmapNavigationServer (this=0x555555796c00,
__in_chrg=<optimized out>)
at /home/ramon/ros/turtlebot3_mbf/src/move_base_flex/mbf_costmap_nav/src/mbf_costmap_nav/costmap_navigation_server.cpp:90
#20 0x000055555556523e in boost::detail::sp_ms_deleter<mbf_costmap_nav::CostmapNavigationServer>::destroy (this=0x555555796bf8)
at /usr/include/boost/smart_ptr/make_shared_object.hpp:59
#21 0x00005555555657f6 in boost::detail::sp_ms_deleter<mbf_costmap_nav::CostmapNavigationServer>::operator() (this=0x555555796bf8)
at /usr/include/boost/smart_ptr/make_shared_object.hpp:93
#22 0x00005555555654dd in boost::detail::sp_counted_impl_pd<mbf_costmap_nav::CostmapNavigationServer*, boost::detail::sp_ms_deleter<mbf_costmap_nav::CostmapNavigationServer> >::dispose (this=0x555555796be0) at /usr/include/boost/smart_ptr/detail/sp_counted_impl.hpp:172
#23 0x0000555555562595 in boost::detail::sp_counted_base::release (this=0x555555796be0)
at /usr/include/boost/smart_ptr/detail/sp_counted_base_std_atomic.hpp:110
#24 0x0000555555562621 in boost::detail::shared_count::~shared_count (this=0x55555576a068 <costmap_nav_srv_ptr+8>,
__in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/detail/shared_count.hpp:426
#25 0x0000555555563438 in boost::shared_ptr<mbf_costmap_nav::CostmapNavigationServer>::~shared_ptr (
this=0x55555576a060 <costmap_nav_srv_ptr>, __in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/shared_ptr.hpp:341
#26 0x00007ffff63e9041 in __run_exit_handlers (status=0, listp=0x7ffff6791718 <__exit_funcs>,
run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#27 0x00007ffff63e913a in __GI_exit (status=<optimized out>) at exit.c:139
#28 0x00007ffff63c7b9e in __libc_start_main (main=0x55555556178e <main(int, char**)>, argc=3, argv=0x7fffffffd3b8,
init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd3a8) at ../csu/libc-start.c:344
#29 0x000055555556148a in _start ()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me the dorezyuk/fix-timer
branch fixes the problem. michael/costmap-wrapper-mutex
did not fix this issue.
73fceed
to
c6f04dd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, cannot reproduce, but the changes make sense and 2 people already confirmed that it works (@MichaelGrupp, do u confirm too?), so ready to merge after requested change is done.
Hi guys,
we had a crash on shutdown with following stack trace
I think that the reason might be a race condition such that the ros::Timer callback is still called, after the costmap_wrapper instance is destructed. ros::Timer offers for that the track_object API, which might solve the problem.