Skip to content
This repository was archived by the owner on May 31, 2025. It is now read-only.

Fix bookkeeping of ongoing calls for recursive calls to CallbackQueue::callAvailable() #2378

Closed

Conversation

meyerj
Copy link
Contributor

@meyerj meyerj commented Oct 8, 2024

While investigating a segmentation fault in one of our applications, and while working on #2377 yesterday, I found another small possible bug in the implementation of ros::CallbackQueue::callAvailable() in the way it keeps track of how many callbacks are currently being executed in any spinner thread, namely the calling_ variable.

When copying over pending calls to the thread-local callback queue held in tls->callbacks, the calling_ member was not updated correctly with the number of added callbacks if tls->callbacks was not empty before. This may be the case if callAvailable() is called recursively, e.g. from within another callback.

The issue should only affect the calling_ member and empty() and isEmpty() member functions, which apparently are unused within roscpp itself, but may be used by external code.

…::callAvailable()

When copying over pending calls to the thread-local callback queue held in tls->callbacks, the calling_
member was not updated correctly with the number of added callbacks if tls->callbacks was not empty before.
This may be the case if callAvailable() is called recursively, e.g. from within another callback.

The issue should only affect the empty() and isEmpty() member functions, which apparently are unused
within the CallbackQueue class or within roscpp itself, but may be used by external code.

Signed-off-by: Johannes Meyer <johannes@intermodalics.eu>
@sloretz
Copy link
Contributor

sloretz commented Apr 25, 2025

Thank you for the PR!

ROS Noetic will reach end-of-life on May 31st, 2025. Every change comes with a risk of introducing regressions, and there isn't much time left to fix them. To make sure this PR doesn't introduce any regressions please:

  • Describe how you tested this change
  • Recruit at least one more person to review this PR and try it out on their system

@meyerj
Copy link
Contributor Author

meyerj commented May 2, 2025

Thank you for the PR!

ROS Noetic will reach end-of-life on May 31st, 2025. Every change comes with a risk of introducing regressions, and there isn't much time left to fix them. To make sure this PR doesn't introduce any regressions please:

  • Describe how you tested this change
  • Recruit at least one more person to review this PR and try it out on their system

The code where the segmentation fault was triggered was something like

void callbackOnTopicA(msg) {
  a_queue_.push_back(msg);
}

void callbackOnTopicB(msg) {
  // ...

  ros::Time t = ros::Time::now() + ros::Duration(3.0);
  while (ros::Time::now() < t && ros::ok()) {
    if (correspondingMessageWasReceivedOnTopicA(msg.header.stamp)) {
       doSomething();
       break;
    }

    // Same CallbackQueue that is handling this callbackOnTopicB(), running in a single-threaded spinner.
    callback_queue_->callAvailable(ros::WallDuration(1.0));
  }
}

so a similar purpose as message_filters::TimeSynchronizer. The inner call to callAvailable() then segfaults with

#0  0x00007ffff7e14a14 in std::_Deque_iterator<ros::CallbackQueue::CallbackInfo, ros::CallbackQueue::CallbackInfo&, ros::CallbackQueue::CallbackInfo*> std::__copy_move_dit<true, ros::CallbackQueue::CallbackInfo, ros::CallbackQueue::CallbackInfo&, ros::CallbackQueue::CallbackInfo*, std::_Deque_iterator<ros::CallbackQueue::CallbackInfo, ros::CallbackQueue::CallbackInfo&, ros::CallbackQueue::CallbackInfo*> >(std::_Deque_iterator<ros::CallbackQueue::CallbackInfo, ros::CallbackQueue::CallbackInfo&, ros::CallbackQueue::CallbackInfo*>, std::_Deque_iterator<ros::CallbackQueue::CallbackInfo, ros::CallbackQueue::CallbackInfo&, ros::CallbackQueue::CallbackInfo*>, std::_Deque_iterator<ros::CallbackQueue::CallbackInfo, ros::CallbackQueue::CallbackInfo&, ros::CallbackQueue::CallbackInfo*>) ()
   from /opt/ros/intermodalics/lib/libroscpp.so
#1  0x00007ffff7e142a1 in std::deque<ros::CallbackQueue::CallbackInfo, std::allocator<ros::CallbackQueue::CallbackInfo> >::_M_erase(std::_Deque_iterator<ros::CallbackQueue::CallbackInfo, ros::CallbackQueue::CallbackInfo&, ros::CallbackQueue::CallbackInfo*>) ()
   from /opt/ros/intermodalics/lib/libroscpp.so
#2  0x00007ffff7e1291b in ros::CallbackQueue::callOneCB(ros::CallbackQueue::TLS*) () from /opt/ros/intermodalics/lib/libroscpp.so
#3  0x00007ffff7e12d7b in ros::CallbackQueue::callAvailable(ros::WallDuration) () from /opt/ros/intermodalics/lib/libroscpp.so
[...]

Here it is mentioned explicitly that recursing into callAvailable() is allowed and that

These requirements add a lot of complexity to CallbackQueue, and make it not as efficient as it could be.

But I think the issue is quite obvious from just the changed code snippet, like that the the number of callbacks added to the calling_ member variable does not match the number of elements in the callbacks_ list that are added to this thread's tls->callbacks list and then removed from callbacks_, without the need to even check where those variables are used for what purpose or what side effects the wrong counting may have.

So probably just leave the pull request for future reference if nobody else volunteers to test or review it...

@sloretz sloretz closed this May 31, 2025
@sloretz
Copy link
Contributor

sloretz commented May 31, 2025

ROS 1 is end-of-life (EOL) as of today, May 31st 2025. I am archiving this repository because:

  • it only supports ROS 1
  • it isn't needed anymore in ROS 2

If you still rely on ROS 1, read this page to learn about your options.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants