Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: Support timed locking in PlanningSceneMonitor #2794

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

henningkayser
Copy link
Member

@henningkayser henningkayser commented Jul 29, 2021

This replaces the shared_mutex of the PlanningSceneMonitor with a shared_timed_mutex that allows specifying a timeout for lock attempts. The scoped PS wrappers LockedPlanningSceneRO and LockedPlanningSceneRW now support passing a timeout duration and may fail with a TryLockTimeoutException in case of a failed attempt. This is very useful for making sure the calling thread is not blocked for too long in case of concurrency issues.
Additionally, the new scoped lock wrapper TimedLock allows tracking lock sources (=caller names), start time, waiting time and the duration that the lock persists. An optional DiagnosticStatus publisher can be enabled for debugging these lock events at runtime.

The scope-less locking functions (lockSceneRead(), etc...) are deprecated since they are error-prone and don't work well with the diagnostics feature.

My motivation for this was a lengthy debugging session for getting rid of spurious deadlocks and performance issues.
Especially, in scenarios where the planning scene monitor is being used in a time-constrained loop (servo collision checking, online sampling for controller commands) it's critical to understand where time is being spent and what threads are being blocked.
I'm not convinced that the DiagnosticStatus publisher should be merged as is (moveit_profiler would do a similar job), but the timed locking feature seems really useful and safer to me.

Left TODOs:

  • Handle locking of octomap_monitor
  • Keep track of lock owners and print names with warning/error/exception
  • Move boilerplate code into separate header

@henningkayser henningkayser force-pushed the pr/psm_support_timed_locking branch from 5386bf6 to 76f8891 Compare July 29, 2021 22:06
@@ -1062,6 +1075,55 @@ bool PlanningSceneMonitor::getShapeTransformCache(const std::string& target_fram
return true;
}

UniqueTimedLock PlanningSceneMonitor::acquireUniqueLock(const std::string& source, std::chrono::milliseconds timeout)
{
// TODO(henningkayser): lock/unlock octomap
Copy link
Member Author

@henningkayser henningkayser Jul 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The octomap could be locked here and then unlocked with the destructor. However, this would increase the time the octree is being locked since some callbacks haven't done this before. Any alternative suggestions?

This replaces the shared_mutex of the PSM with a shared_timed_mutex
that allows specifying a timeout for attempting to lock the mutex
for read and write using the LockedPlanningSceneRO and
LockedPlanningSceneRW classes. In case of a failed attempt, a
TryLockTimeoutException is thrown so that the calling thread is
unblocked in case of concurrency issues.
Additionally, a new scoped lock wrapper class TimedLock allows
tracking lock sources (=caller names), start time, waiting time and
the duration that the lock persists. An optional DiagnosticStatus
publisher can be enabled for debugging these lock events at runtime.

The scope-less locking functions (lockSceneRead(), etc...) are deprecated
since they are error-prone and don't work well with the diagnostics feature.
@henningkayser henningkayser force-pushed the pr/psm_support_timed_locking branch from 76f8891 to 560188c Compare July 29, 2021 22:26
@@ -717,5 +813,8 @@ class LockedPlanningSceneRW : public LockedPlanningSceneRO
{
return planning_scene_monitor_->getPlanningScene();
}

private:
std::shared_ptr<UniqueTimedLock> ulock_;
Copy link
Member Author

@henningkayser henningkayser Jul 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open for debate: The shared pointers here copy the old behavior. I'm thinking of not allowing this since it doesn't make a lot of sense to share write access between different entities. I'd suggest making LockedPlanningSceneRW non-copyable. LockedPlanningSceneRO should probably also not be using a shared pointer, acquiring another shared lock seems way more reasonable to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who are we sharing the lock with? Since the class constructs the lock itself and doesn't seem to hand it out at any point, it doesn't have be a shared pointer or even unique pointer in my opinion. It's just normal class member, right?

@AndyZe
Copy link
Member

AndyZe commented Jul 30, 2021

Just a high-level comment. I wonder what the overhead of a shared_timed_mutex is compared to a regular shared_mutex. Answers here suggest it can be significant. Should benchmark.

@henningkayser
Copy link
Member Author

Just a high-level comment. I wonder what the overhead of a shared_timed_mutex is compared to a regular shared_mutex. Answers here suggest it can be significant. Should benchmark.

Good point! I'd be surprised if there was a measurable difference between shared_timed_mutex (without timeouts) and shared_mutex since they both implement the same reader/writer behavior. The wrapper class TimedLock however could actually have performance impacts, we should make sure that this is not the case.

@felixvd
Copy link
Contributor

felixvd commented Jul 31, 2021

Just a high-level comment. I wonder what the overhead of a shared_timed_mutex is compared to a regular shared_mutex. Answers here suggest it can be significant. Should benchmark.

Do you have an idea how to benchmark this? If yes, we could integrate it in #2717

@AndyZe
Copy link
Member

AndyZe commented Jul 31, 2021

@captain-yoshi and @felixvd here's a script that is good for benchmarking repeated, high-frequency reads of the planning scene. (#2698 (comment))

To properly test this PR, some writes to the planning scene should also be included (to mimic normal operation). For example, updating joint angles at 100-200 Hz or so.

@AndyZe
Copy link
Member

AndyZe commented Jul 31, 2021

I'm also curious how std::mutex would perform, compared to shared_mutex.

@v4hn
Copy link
Contributor

v4hn commented Aug 13, 2021

From your descriptions it looks like there are some things left to do to cleanup this patch.

I only skimmed through the changes for now. My main concern is backward compatibility. Is this patch supposed to be API compatible except for deprecating the non-RAII locking API (to which I definitely agree)?

Copy link
Contributor

@jliukkonen jliukkonen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of having more insight into what's going on is really good. However, the diagnostics come with extra cost and I'm not sure having them included should be the default in this case.

class TimedLock
{
public:
TimedLock(const std::string& source, std::shared_timed_mutex& mtx,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this implementation has much higher cost than just the switch from std::shared_mutex to std::shared_timed_mutex (the cost of which remains to be shown). My problem is how large this object is and how much data it stores even if one doesn't want to use the diagnostic features.

The following are now copied:

  • String (if long, SSO will not help us)
  • std::chrono::milliseconds object

In addition, construct_time_ and lock_time_ are default constructed every time together with a std::function. I think we should aim for making this class as cheap as the underlying STL lock if no diagnostics are requested and as cheap as possible if one needs the diagnostics. Unfortunately I don't see a way to do that without making the diagnostics flag a compile time constant and rewriting large parts of the code here. Also, using diagnostics would then require recompiling large chunks of code, but even then I think that might be the better way to do this.

}

private:
const std::string source_;
Copy link
Contributor

@jliukkonen jliukkonen Aug 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having const member variable(s) disables moves for the class.

Hmm, also, locks can only be moved, so now I'm confused about how this class works in the first place. Anyway, it is pretty much always a good idea to not use constant member variables.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else if (timeout_ == std::chrono::milliseconds::zero())
lock_.lock();
else if (!lock_.try_lock_for(timeout_))
throw TryLockTimeoutException("Timeout trying to lock PlanningSceneMonitor from '" + source_ + "'");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who will catch this exception? If nobody catches it, then the program just terminates if we exceed timeout.

* @param timeout An optional duration that locking the shared mutex is allowed to take
* @throws planning_scene_monitor::TryLockTimeoutException in case of a lock timeout
*/
SharedTimedLock acquireSharedLock(const std::string& source,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be even more obvious what is happening if this was called acquireReadLock.

* @param timeout An optional duration that locking the unique mutex is allowed to take
* @throws planning_scene_monitor::TryLockTimeoutException in case of a lock timeout
*/
UniqueTimedLock acquireUniqueLock(const std::string& source,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be even more obvious what is happening if this was called acquireWriteLock.

@@ -717,5 +813,8 @@ class LockedPlanningSceneRW : public LockedPlanningSceneRO
{
return planning_scene_monitor_->getPlanningScene();
}

private:
std::shared_ptr<UniqueTimedLock> ulock_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who are we sharing the lock with? Since the class constructs the lock itself and doesn't seem to hand it out at any point, it doesn't have be a shared pointer or even unique pointer in my opinion. It's just normal class member, right?

@@ -959,6 +968,7 @@ bool PlanningSceneMonitor::waitForCurrentRobotState(const ros::Time& t, double w
void PlanningSceneMonitor::lockSceneRead()
{
scene_update_mutex_.lock_shared();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened here? Is this a clang-format switching the type of whitespace or something like that?-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants