Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Costmap_2d crash if deactivate and cleanup too quick #2573

Closed
BriceRenaudeau opened this issue Sep 27, 2021 · 4 comments
Closed

Costmap_2d crash if deactivate and cleanup too quick #2573

BriceRenaudeau opened this issue Sep 27, 2021 · 4 comments

Comments

@BriceRenaudeau
Copy link
Contributor

BriceRenaudeau commented Sep 27, 2021

Bug report

  • Operating System:
    • Ubuntu 20.04
  • ROS2 Version:
    • Galactic
  • Version or commit hash:
    • from source: galactic branch

Steps to reproduce issue

Create a costmap_2D with some plugins.
Then repeatedly configure - activate - deactivate - cleanup.

costmap_ros_->deactivate();
costmap_ros_->cleanup();
costmap_ros_->configure();
costmap_ros_->activate();

Expected behavior

The costmap is repeatedly activated and deactivated without a crash.

Actual behavior

After some time, the node crash due to a segfault.

Additional information

Using backward_ros I got this detailed failure.

[my_process-30] [INFO] [$1632495493.170435700] [my_process.my_process_node]: Costmap param has changed : Reset costmap
[my_process-30] [INFO] [$1632495493.170635747] [my_process.my_process_node]: Costmap stop
[my_process-30] [INFO] [$1632495493.170665985] [my_process.local_costmap]: Deactivating
[my_process-30] [INFO] [$1632495493.193337897] [my_process.local_costmap]: Cleaning up
[my_process-30] [INFO] [$1632495493.199129804] [my_process.local_costmap]: Configuring
[my_process-30] [INFO] [$1632495493.204976571] [my_process.local_costmap]: Using plugin "static_layer"
[my_process-30] [INFO] [$1632495493.205099693] [my_process.local_costmap]: Subscribing to the map topic (/map_forbidden) with transient local durability
[my_process-30] [INFO] [$1632495493.205587776] [my_process.local_costmap]: Initialized plugin "static_layer"
[my_process-30] [INFO] [$1632495493.205612920] [my_process.local_costmap]: Using plugin "obstacle_layer"
[my_process-30] [INFO] [$1632495493.205659863] [my_process.local_costmap]: Subscribed to Topics: scan_filtered
[my_process-30] [INFO] [$1632495493.208056320] [my_process.local_costmap]: Initialized plugin "obstacle_layer"
[my_process-30] [INFO] [$1632495493.208093655] [my_process.local_costmap]: Using plugin "inflation_layer"
[my_process-30] [INFO] [$1632495493.208935372] [my_process.local_costmap]: Initialized plugin "inflation_layer"
[my_process-30] [INFO] [$1632495493.220125523] [my_process.local_costmap]: Activating
[my_process-30] [INFO] [$1632495493.220909263] [my_process.local_costmap]: Checking transform
[my_process-30] [INFO] [$1632495493.220943918] [my_process.local_costmap]: Timed out waiting for transform from base_footprint to odom to become available, tf error: Could not find a connection between 'odom' and 'base_footprint' because they are not part of the same tree.Tf has two or more unconnected trees.
[my_process-30] [INFO] [$1632495493.224342891] [my_process.local_costmap]: StaticLayer: Resizing static layer to 426 X 352 at 0.050000 m/pix
[my_process-30] Stack trace (most recent call last) in thread 393537:
[my_process-30] #11 Object "", at 0xffffffffffffffff, in
[my_process-30] #10 Source "../sysdeps/unix/sysv/linux/x86_64/clone.S", line 95, in __clone [0x7fa9098f7292]
[my_process-30] #9 Source "/build/glibc-YbNSs7/glibc-2.31/nptl/pthread_create.c", line 477, in start_thread [0x7fa9097b5608]
[my_process-30] #8 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28", at 0x7fa909c09de3, in
[my_process-30] #7 Object "/home/brice/elodie2_ws/build/nav2_util/src/libnav2_util_core.so", at 0x7fa90a0991fc, in std::thread::_State_impl<std::thread::Invoker<std::tuple<nav2_util::NodeThread::NodeThread(std::shared_ptrrclcpp::node_interfaces::NodeBaseInterface)::{lambda()#1}> > >::M_run()
[my_process-30] #6 Object "/home/brice/elodie2_ws/install/rclcpp/lib/librclcpp.so", at 0x7fa909eab60b, in rclcpp::executors::SingleThreadedExecutor::spin()
[my_process-30] #5 Object "/home/brice/elodie2_ws/install/rclcpp/lib/librclcpp.so", at 0x7fa909ea3adc, in rclcpp::Executor::execute_any_executable(rclcpp::AnyExecutable&)
[my_process-30] #4 Object "/opt/ros/galactic/lib/libtf2_ros.so", at 0x7fa9095c9c39, in rclcpp::GenericTimer<std::function<void ()>, (void*)0>::execute_callback()
[my_process-30] #3 Object "/opt/ros/galactic/lib/libtf2_ros.so", at 0x7fa9095ca3e7, in std::Function_handler<void (), std::Bind<void (tf2_ros::CreateTimerROS::(tf2_ros::CreateTimerROS, unsigned long, std::function<void (unsigned long const&)>))(unsigned long const&, std::function<void (unsigned long const&)>)> >::M_invoke(std::Any_data const&)
[my_process-30] #2 Object "/opt/ros/galactic/lib/libtf2_ros.so", at 0x7fa9095c52a7, in std::Function_handler<void (unsigned long const&), std::Bind<void (tf2_ros::Buffer::(tf2_ros::Buffer, std::Placeholder<1>, std::shared_ptr<std::promise<geometry_msgs::msg::TransformStamped<std::allocator > > >, std::shared_future<geometry_msgs::msg::TransformStamped<std::allocator > >, std::function<void (std::shared_future<geometry_msgs::msg::TransformStamped<std::allocator > > const&)>))(unsigned long const&, std::shared_ptr<std::promise<geometry_msgs::msg::TransformStamped<std::allocator > > >, std::shared_future<geometry_msgs::msg::TransformStamped<std::allocator > >, std::function<void (std::shared_future<geometry_msgs::msg::TransformStamped<std::allocator > > const&)>)> >::M_invoke(std::Any_data const&, unsigned long const&)
[my_process-30] #1 Object "/opt/ros/galactic/lib/libtf2_ros.so", at 0x7fa9095bfcdf, in tf2_ros::Buffer::timerCallback(unsigned long const&, std::shared_ptr<std::promise<geometry_msgs::msg::TransformStamped
<std::allocator > > >, std::shared_future<geometry_msgs::msg::TransformStamped
<std::allocator > >, std::function<void (std::shared_future<geometry_msgs::msg::TransformStamped<std::allocator > > const&)>)
[my_process-30] #0 Object "/home/brice/elodie2_ws/build/nav2_costmap_2d/liblayers.so", at 0x7fa900988d99, in tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan<std::allocator >, tf2_ros::Buffer>::transformReadyCallback(std::shared_future<geometry_msgs::msg::TransformStamped<std::allocator > > const&, unsigned long)
[my_process-30] Segmentation fault (Address not mapped to object [0x78])
[ERROR] [my_process-30]: process has died [pid 392956, exit code -11, cmd

This crash is due to the tf2_ros waitForTransform function that creates a timer to call a callback:
https://github.com/ros2/geometry2/blob/8e508d01789803334dfcd59ba277cb49c0ea4219/tf2_ros/src/buffer.cpp#L258-L261

When cleaning the costmap, the callback function doesn't exist anymore hence the timer creates a segfault.

The crash happens after the call of tf_buffer_->canTransform:
https://github.com/ros-planning/navigation2/blob/e5a4c844c748233dc4d3039b1ea5809c81dac45f/nav2_costmap_2d/src/costmap_2d_ros.cpp#L219-L221

My false fix

To avoid this crash I added a delay(transform_tolerance_) between deactivate() and cleanup().

The best way might be to secure tf2_ros but I may not have the skills to fix it.

@SteveMacenski
Copy link
Member

SteveMacenski commented Sep 27, 2021

[my_process-30] #3 Object "/opt/ros/galactic/lib/libtf2_ros.so", at 0x7fa9095ca3e7, in std::Function_handler<void (), std::Bind<void (tf2_ros::CreateTimerROS::(tf2_ros::CreateTimerROS, unsigned long, std::function<void (unsigned long const&)>))(unsigned long const&, std::function<void (unsigned long const&)>)> >::M_invoke(std::Any_data const&)

It looks to me that the timer created in the TF buffer didn't get a chance to fully shut down or something before it was destructed. So when its called, its crashing because the object doesn't exist. The can transform call you mention specifically confirms that I believe, since that timer is used in the can transform function almost exclusively.

I think the fix would be to kill the timers that are active in the destructor of the buffer. All of the destructurs of the Buffer (e.g. buffer and the interfaces it derives from) use the default destructor. I think the Buffer needs to reset / clear the timers on destruction should solve the issue. Though they're in an unodered map of timers so that should clear on destruction, so maybe its just an order thing? The interface need to be destroyed before the timers perhaps? Try inverting these 2 line's order https://github.com/ros2/geometry2/blob/8e508d01789803334dfcd59ba277cb49c0ea4219/tf2_ros/include/tf2_ros/buffer.h#L291-L294.

This is an issue for TF2 though, can you file it there and delete this? I don't think you need all the Nav2 specific code, you can reproduce this using the buffer alone. Is certainly a problem, but its a problem for TF2.

@SteveMacenski
Copy link
Member

Any update here?

@BriceRenaudeau
Copy link
Contributor Author

I created an Issue for tf2_ros : ros2/geometry2#460 (comment)
But they are still trying to fix it.

@SteveMacenski
Copy link
Member

Awesome! Closing here since it has motion elsewhere

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants