Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protect DepthCameraPlugin globals with a mutex #2949

Conversation

ivanpauno
Copy link
Contributor

I'm hunting reasons why gazebo server hangs when shutting down ....

One of the failures mode I found is the following one:

#0  0x00007f960dcc87a0 in std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) () at /lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007f9524382777 in gazebo::DepthCameraPlugin::~DepthCameraPlugin() () at /usr/lib/x86_64-linux-gnu/gazebo-11/plugins/libDepthCameraPlugin.so
#2  0x00007f95138cd7e6 in gazebo_plugins::GazeboRosCamera::~GazeboRosCamera() () at /home/ivanpauno/ros2_ws/other_ws/pc_pipe_ws/install/gazebo_plugins/lib/libgazebo_ros_camera.so
#3  0x00007f95138cd84a in gazebo_plugins::GazeboRosCamera::~GazeboRosCamera() () at /home/ivanpauno/ros2_ws/other_ws/pc_pipe_ws/install/gazebo_plugins/lib/libgazebo_ros_camera.so
#4  0x00007f960d98760c in gazebo::sensors::Sensor::Fini() () at /lib/x86_64-linux-gnu/libgazebo_sensors.so.11
#5  0x00007f960d97d4d0 in gazebo::sensors::MultiCameraSensor::Fini() () at /lib/x86_64-linux-gnu/libgazebo_sensors.so.11
#6  0x00007f960d98fa52 in gazebo::sensors::SensorManager::SensorContainer::Fini() () at /lib/x86_64-linux-gnu/libgazebo_sensors.so.11
#7  0x00007f960d991079 in gazebo::sensors::SensorManager::Fini() () at /lib/x86_64-linux-gnu/libgazebo_sensors.so.11
#8  0x00007f960d9865ed in gazebo::sensors::fini() () at /lib/x86_64-linux-gnu/libgazebo_sensors.so.11
#9  0x00007f960e8217b6 in gazebo::shutdown() () at /lib/x86_64-linux-gnu/libgazebo.so.11
#10 0x00007f960e7ebdd5 in gazebo::Server::Run() () at /lib/x86_64-linux-gnu/libgazebo.so.11

the main thread was taking forever in a std::map::erase, which based on the traceback must be one of these two:

https://github.com/osrf/gazebo/blob/67c43463c87685b7c9d485a9f8d2e6ac760815b9/plugins/DepthCameraPlugin.cc#L46-L49

there are a few reasons why a std::map::erase might take for ever:

  • The map is huge, which doesn't seem to be the case.
  • The iterator passed is invalid, which has undefined behavior.
    Also doesn't seem to be the case, this should be a valid key at that point.
    From cppreference:

The iterator pos must be valid and dereferenceable. Thus the end() iterator (which is valid, but is not dereferenceable) cannot be used as a value for pos.

  • The map is corrupted. In this case it can happen if there's more than one DepthCameraPlugin active accessing the maps from different threads.

For the second and third case I would actually expect a segfault, but I think that if you're unlucky enough it can take long for that to happen (reading from an invalid pointer is not a segfault).


When I tried to reproduce from source I always hit a different failure mode (a deadlock) and not this one, so I cannot confirm the exact reason of the bug.
Just in case, this PR fixes both the second and third reasons above: don't use an iterator for erasing, protect the globals with a mutex.

Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
@traversaro
Copy link
Collaborator

I'm hunting reasons why gazebo server hangs when shutting down ....

Thanks a lot for working on this! It would simplify drastically running Gazebo-based tests.

@ivanpauno
Copy link
Contributor Author

@jacobperron I don't have write access here, so please merge when you think this is ready.

The PR checkers failures doesn't seem related.

@jacobperron
Copy link

I don't consider myself a maintainer, so I'll leave it up to @scpeters (or someone else).

Copy link
Member

@scpeters scpeters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to reproduce the failure using pc_pipe on 18.04 and this fixes it. Thanks!

@scpeters scpeters merged commit b36188c into gazebosim:gazebo11 Apr 7, 2021
scpeters pushed a commit to scpeters/gazebo that referenced this pull request Apr 14, 2021
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
scpeters pushed a commit to scpeters/gazebo that referenced this pull request Apr 16, 2021
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
scpeters pushed a commit that referenced this pull request Apr 21, 2021
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants