Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only one timer + MultiThreadedExecutor will block executor #877

Closed
liqinghua opened this issue Sep 27, 2019 · 4 comments
Closed

Only one timer + MultiThreadedExecutor will block executor #877

liqinghua opened this issue Sep 27, 2019 · 4 comments
Labels
more-information-needed Further information is required

Comments

@liqinghua
Copy link

Bug report

Required Info:

  • Operating System: 16.04
  • Installation type: source
  • Version or commit hash: master or dashing
  • DDS implementation: Fast-RTPS
  • Client library (if applicable): rclcpp

Issue

On Dashing, will block executor, and cannot be recover
On Master, will block until next_exec_timeout_

MultiThreadedExecutor::run(size_t)
{
  while (rclcpp::ok(this->context_) && spinning.load()) {
    executor::AnyExecutable any_exec;
    {
      std::lock_guard<std::mutex> wait_lock(wait_mutex_);
      if (!rclcpp::ok(this->context_) || !spinning.load()) {
        return;
      }

     //BUG TODO:  this will own wait_mutex_, will block scheduled_timers_ remove  
      if (!get_next_executable(any_exec, next_exec_timeout_)) {
        continue;
      }
      if (any_exec.timer) {
        // Guard against multiple threads getting the same timer.
        if (scheduled_timers_.count(any_exec.timer) != 0) {
          // Make sure that any_exec's callback group is reset before
          // the lock is released.
          if (any_exec.callback_group) {
            any_exec.callback_group->can_be_taken_from().store(true);
          }
          continue;
        }
        scheduled_timers_.insert(any_exec.timer);
      }
    }
    if (yield_before_execute_) {
      std::this_thread::yield();
    }

    execute_any_executable(any_exec);

    if (any_exec.timer) {
      //BUG TODO:  scheduled_timers_.erase need wait_mutex_ release.
      std::lock_guard<std::mutex> wait_lock(wait_mutex_);
      auto it = scheduled_timers_.find(any_exec.timer);
      if (it != scheduled_timers_.end()) {
        scheduled_timers_.erase(it);
      }
    }
    // Clear the callback_group to prevent the AnyExecutable destructor from
    // resetting the callback group `can_be_taken_from`
    any_exec.callback_group.reset();
  }
}
@liqinghua
Copy link
Author

On Dashing version, fixing suggestion

diff --git a/src/ros2/rclcpp/rclcpp/include/rclcpp/executors/multi_threaded_executor.hpp b/src/ros2/rclcpp/rclcpp/include/rclcpp/executors/multi_threaded_executor.hpp
index 72df5dd..0fd60ff 100644
--- a/src/ros2/rclcpp/rclcpp/include/rclcpp/executors/multi_threaded_executor.hpp
+++ b/src/ros2/rclcpp/rclcpp/include/rclcpp/executors/multi_threaded_executor.hpp
@@ -78,6 +78,8 @@ private:
   size_t number_of_threads_;
   bool yield_before_execute_;
 
+  std::mutex timer_wait_mutex_;
+
   std::set<TimerBase::SharedPtr> scheduled_timers_;
 };
 
diff --git a/src/ros2/rclcpp/rclcpp/src/rclcpp/executors/multi_threaded_executor.cpp b/src/ros2/rclcpp/rclcpp/src/rclcpp/executors/multi_threaded_executor.cpp
index 24c5c79..01436c7 100644
--- a/src/ros2/rclcpp/rclcpp/src/rclcpp/executors/multi_threaded_executor.cpp
+++ b/src/ros2/rclcpp/rclcpp/src/rclcpp/executors/multi_threaded_executor.cpp
@@ -80,22 +80,24 @@ MultiThreadedExecutor::run(size_t)
       if (!get_next_executable(any_exec)) {
         continue;
       }
-      if (any_exec.timer) {
-        // Guard against multiple threads getting the same timer.
-        if (scheduled_timers_.count(any_exec.timer) != 0) {
-          continue;
-        }
-        scheduled_timers_.insert(any_exec.timer);
+    }
+
+    if (any_exec.timer) {
+      std::lock_guard<std::mutex> wait_lock(timer_wait_mutex_);
+      // Guard against multiple threads getting the same timer.
+      if (scheduled_timers_.count(any_exec.timer) != 0) {
+        continue;
       }
+      scheduled_timers_.insert(any_exec.timer);
     }
+
     if (yield_before_execute_) {
       std::this_thread::yield();
     }
-
+    
     execute_any_executable(any_exec);
-
     if (any_exec.timer) {
-      std::lock_guard<std::mutex> wait_lock(wait_mutex_);
+      std::lock_guard<std::mutex> wait_lock(timer_wait_mutex_);
       auto it = scheduled_timers_.find(any_exec.timer);
       if (it != scheduled_timers_.end()) {
         scheduled_timers_.erase(it);


@ivanpauno
Copy link
Member

Could you provide an example of the problem?

@ivanpauno ivanpauno added the more-information-needed Further information is required label Sep 27, 2019
@peterpena
Copy link
Contributor

This interrupt in Executor::execute_any_executable:

   if (rcl_trigger_guard_condition(&interrupt_guard_condition_) != RCL_RET_OK) {
     throw std::runtime_error(rcl_get_error_string().str);
   }

exist to avoid a thread, in the case the executor is multithreaded and the threads are not mutually exclusive, waiting indefinitely in rcl_wait in wait.c in rcl. When thread 1 goes into wait_for_work and adds the timer handle to wait_set and is woken by the timer, it starts executing the timer. At the same time, the lock is released and the other thread waiting, thread 2, goes into 'wait_for_work'. Since thread 1 is executing the timer callback, the memory strategy does not add the timer handle to wait_set and thread 2 goes into rcl_wait with a wait_set that has no timer handle, and when thread 1 finishes executing the timer callback it is waiting for thread 2 to unlock;
effectively, having thread 1 and 2 waiting indefinitely. With the interrupt_guard_condition triggered, any thread waiting in rcl_wait will wake up and look for work to be executed. You can run the code given to see this in effect:

#include <chrono>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <string>
#include <thread>


#include "rclcpp/rclcpp.hpp"

using namespace std::chrono_literals;

class MultiTimer final : public rclcpp::Node
{
public:
  MultiTimer() : Node("multi_timer")
  {
    timer_ = this->create_wall_timer(
      2s, std::bind(&MultiTimer::timer_callback, this));
  }

private:
  void timer_callback()
  {
    RCLCPP_WARN(this->get_logger(), "Timer executed!");
  }

  rclcpp::TimerBase::SharedPtr timer_;
};

int main(int argc, char * argv[])
{
  rclcpp::init(argc, argv);
  rclcpp::executors::MultiThreadedExecutor executor;
  auto multi_timer = std::make_shared<MultiTimer>();
  executor.add_node(multi_timer);
  executor.spin();
  rclcpp::shutdown();
  return 0;
}

If the interrupt guard condition did not exist, your solution will still not work because the thread will be waiting in rcl_wait indefinitely, and even if the thread can erase the timer from the scheduled_timers_ set while the other thread is asleep, the thread in rcl_wait will not wake up. scheduled_timers_ is solely used to keep track of the timers used. I tested the multithreaded executor with one timer to verify. It does what is expected.

@ivanpauno
Copy link
Member

@peterpena I don't see the connection between your comment and @liqinghua bug report.
@liqinghua I'm closing this due to long time inactivity, we can reopen the issue if you provide an example of the problem.

PS: There was a thread safety problem that made MultiThreadedExecutor hung when using timers. That was fixed and backported to Dashing #869 at about the same time that @liqinghua opened this issue. This is likely a duplicate of that (already fixed) error.

nnmm pushed a commit to ApexAI/rclcpp that referenced this issue Jul 9, 2022
Otherwise, the callback could attempt to take a reference
to a guard condition that has already been deleted.

Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
DensoADAS pushed a commit to DensoADAS/rclcpp that referenced this issue Aug 5, 2022
In particular, mark consumer_buffer() as ACQUIRE and
release_consumer_buffer() as RELEASE.

Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
more-information-needed Further information is required
Projects
None yet
Development

No branches or pull requests

3 participants