Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed the possibility of deadlock in detachAll method in wholebodydynamics and baseEstimatorV1 #146

Merged
merged 2 commits into from
Apr 8, 2022

Conversation

traversaro
Copy link
Member

While debugging robotology/gazebo-yarp-plugins#619 (comment), I noticed that the gazebo_yarp_robotinterface while closing was hanging during this phase:

[INFO] interrupt1 phase starting...
[INFO] interrupt1 phase finished.
[INFO] shutdown phase starting...
[INFO] Entering action level 2 of phase shutdown
[INFO] Executing detach action, level 2 on device wholebodydynamics with parameters []

It turns out that indeed there is the possibility of a deadlock. The case is when detachAll is called by thread2 and is able to lock the this->deviceMutex when yarp::os::PeriodicThread (thread1) already started its loop (i.e. step was called) but the run method was not called, i.e. if the execution of thread1 is on a line that goes from https://github.com/robotology/yarp/blob/v3.6.0/src/libYARP_os/src/yarp/os/PeriodicThread.cpp#L226 to https://github.com/robotology/yarp/blob/v3.6.0/src/libYARP_os/src/yarp/os/PeriodicThread.cpp#L246 .

In that case, thread2 is blocked in PeriodicThread::stop (while locking this->deviceMutex) as that call contains a call to join of thread1, but thread1 was blocked at the beginning of the run method (https://github.com/robotology/whole-body-estimators/blob/v0.6.1/devices/wholeBodyDynamics/WholeBodyDynamicsDevice.cpp#L2704) waiting for this->deviceMutex to be available, a classical example of deadlock.

A simple way to get rid of the deadlock is just call the stop of a thread before the this->deviceMutex is locked.

@traversaro traversaro requested review from HosameldinMohamed and prashanthr05 and removed request for HosameldinMohamed April 3, 2022 14:18
@traversaro
Copy link
Member Author

Interstingly, when using wholebodydynamics with gazebo_yarp_robotinterface this was happening all the time. I am wondering if this was ever happening with yarprobotinterface. A question for people that used the robot recently (I guess @S-Dafarra @GiulioRomualdi @isorrentino @gabrielenava @Giulero but I am just guessing, feel free to tag other people): where you able to close with a simple Ctrl+C the yarprobotinterface that contained the wholebodydynamics device, or it was a necessary multiple Ctrl+C ?

@S-Dafarra
Copy link
Collaborator

Interstingly, when using wholebodydynamics with gazebo_yarp_robotinterface this was happening all the time. I am wondering if this was ever happening with yarprobotinterface. A question for people that used the robot recently (I guess @S-Dafarra @GiulioRomualdi @isorrentino @gabrielenava @Giulero but I am just guessing, feel free to tag other people): where you able to close with a simple Ctrl+C the yarprobotinterface that contained the wholebodydynamics device, or it was a necessary multiple Ctrl+C ?

Indeed sometimes we had to kill the yarprobotinterface. Most of the times we were blaming some joints not reaching the parking position

@traversaro
Copy link
Member Author

traversaro commented Apr 4, 2022

Indeed sometimes we had to kill the yarprobotinterface. Most of the times we were blaming some joints not reaching the parking position

Interesting! Let's see if this PR fixes that behaviour.

@gabrielenava
Copy link

gabrielenava commented Apr 4, 2022

A question for people that used the robot recently (I guess @S-Dafarra @GiulioRomualdi @isorrentino @gabrielenava @Giulero but I am just guessing, feel free to tag other people): where you able to close with a simple Ctrl+C the yarprobotinterface that contained the wholebodydynamics device, or it was a necessary multiple Ctrl+C ?

we almost always run the yarprobotinterface through yarpmanager, sometimes it happened that we had to kill it but usually it stopped with the stop button. I don't know if in this case the stopping procedure is more elaborate than ctrl+c and resolves the problem without further intervention.

tagging @HosameldinMohamed to confirm. (EDIT: which is already reviewer of the PR so I think he received the notification already :-))

@traversaro
Copy link
Member Author

A question for people that used the robot recently (I guess @S-Dafarra @GiulioRomualdi @isorrentino @gabrielenava @Giulero but I am just guessing, feel free to tag other people): where you able to close with a simple Ctrl+C the yarprobotinterface that contained the wholebodydynamics device, or it was a necessary multiple Ctrl+C ?

we almost always run the yarprobotinterface through yarpmanager, sometimes it happened that we had to kill it but usually it stopped with the stop button. I don't know if in this case the stopping procedure is more elaborate than ctrl+c and resolves the problem without further intervention.

If I am not wrong, the stop command should correspond to the single Ctrl+C.

@traversaro
Copy link
Member Author

According to @Nicogene he is not experiencing any deadlock even without this PR. So probably there was some particular bug fixed in robotology/gazebo-yarp-plugins#618 that highlighted this deadlock. In any case, I think it is something worth fixing.

@HosameldinMohamed
Copy link
Collaborator

tagging @HosameldinMohamed to confirm. (EDIT: which is already reviewer of the PR so I think he received the notification already :-))

I don't think I faced this issue with the wholeBodyDynamics instances we run, either with Gazebo or with the real robot.

Copy link
Collaborator

@HosameldinMohamed HosameldinMohamed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you said, it's a good fix nonetheless.

Thanks!

@HosameldinMohamed HosameldinMohamed merged commit e564855 into robotology:master Apr 8, 2022
@traversaro
Copy link
Member Author

Thanks!

@traversaro traversaro deleted the fixdeadlock branch April 8, 2022 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants