Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timer class methods aren't thread safe #1025

Closed
ivanpauno opened this issue Mar 13, 2020 · 7 comments
Closed

Timer class methods aren't thread safe #1025

ivanpauno opened this issue Mar 13, 2020 · 7 comments
Assignees
Labels
bug Something isn't working

Comments

@ivanpauno
Copy link
Member

ivanpauno commented Mar 13, 2020

Feature request

Feature description

Methods are using timer_handle_ member variable without adding any kind of thread-safe protection, e.g.:

if (rcl_timer_cancel(timer_handle_.get()) != RCL_RET_OK) {

rcl functions aren't thread safe, so a std::mutex (or maybe std::recursive_mutex) should be added to protect concurrent access of timer_handle_.

See #1013 (comment).

Implementation considerations

@ivanpauno ivanpauno added the bug Something isn't working label Mar 13, 2020
@rotu
Copy link
Contributor

rotu commented Mar 19, 2020

Could you please add the description of what the bug is to this issue description?

@rotu
Copy link
Contributor

rotu commented Mar 19, 2020

I mean in the top post - the top comment right now is basically just a link, and well, it should describe the issue

What functions aren't threadsafe that should be? All instance methods of rclcpp::TimerBase?

Are there any observed behavioral problems or test failures that have been linked with this issue?

@ivanpauno
Copy link
Member Author

I mean in the top post - the top comment right now is basically just a link, and well, it should describe the issue

Done. If you read the old comment the post description was linking, it says almost the same thing.

What functions aren't threadsafe that should be? All instance methods of rclcpp::TimerBase?

All the ones that access this member variable:

std::shared_ptr<rcl_timer_t> timer_handle_;

Are there any observed behavioral problems or test failures that have been linked with this issue?

No, my complain is just theoretical now.
Several methods are accessing and modifying a data structure (making using of rcl functions), without protecting from concurrent access.
It might not have been detected as an issue now. It can segfault, hang or behave strangely in the future.

@wjwwood
Copy link
Member

wjwwood commented Mar 27, 2020

I doubt I get to this before the API freeze, What's the expectation here for triage? I don't think we need to investigate anything right? I'd either put it on the foxy board or backlog. If on the foxy board we can randomly assign it to me, but again, it will be after API freeze before I have any time.

@rotu
Copy link
Contributor

rotu commented Mar 27, 2020

FYI: not directly related to timers, but I did catch a case where the rclcpp API (rclcpp::Node constructor)'s lack of thread-safety caused a hard crash. I got lucky in that it was readily reproducible and visible.

I think we need a broader effort to instrument and report unsafe usage of functions that are not threadsafe.

ros2/rosbag2#329

@audrow
Copy link
Member

audrow commented Jun 18, 2020

@ivanpauno

I believe that the timer class is thread safe because all of the functions that use timer_handle_ say that they are thread safe in their documentation, except for rcl_timer_init which is in the constructor and is locked with a mutex from the clock. Below are the functions and the documentation that states they are thread safe.

I'm not sure why some of the code-blocks below aren't importing correctly.

timer_handle_ usages in timer.cpp

rcl_timer_init

This function is in the constructor and has a lock from the clock.

{
std::lock_guard<std::mutex> clock_guard(clock_->get_clock_mutex());
if (
rcl_timer_init(
timer_handle_.get(), clock_handle, rcl_context.get(), period.count(), nullptr,
rcl_get_default_allocator()) != RCL_RET_OK)
{
RCUTILS_LOG_ERROR_NAMED(
"rclcpp",
"Couldn't initialize rcl timer handle: %s\n", rcl_get_error_string().str);
rcl_reset_error();
}
}

rcl_timer_cancel

if (rcl_timer_cancel(timer_handle_.get()) != RCL_RET_OK) {

Documentation:

https://github.com/ros2/rcl/blob/ffcfda18db7aa9ad7c01ebdd26ae7bdd73c8206a/rcl/include/rcl/timer.h#L465-L491

rcl_timer_is_canceled

rcl_ret_t ret = rcl_timer_is_canceled(timer_handle_.get(), &is_canceled);

Documentation:

https://github.com/ros2/rcl/blob/ffcfda18db7aa9ad7c01ebdd26ae7bdd73c8206a/rcl/include/rcl/timer.h#L493-L519

rcl_timer_is_ready

if (rcl_timer_is_ready(timer_handle_.get(), &ready) != RCL_RET_OK) {

Documentation:

https://github.com/ros2/rcl/blob/ffcfda18db7aa9ad7c01ebdd26ae7bdd73c8206a/rcl/include/rcl/timer.h#L262-L290

rcl_timer_get_time_until_next_call

https://github.com/ros2/rclcpp/blob/cc65905efa5e2b7f2ba3d3c5ce0a69a716b95630/rclcpp/src/rclcpp/timer.cpp#L122-124

Documentation:

https://github.com/ros2/rcl/blob/ffcfda18db7aa9ad7c01ebdd26ae7bdd73c8206a/rcl/include/rcl/timer.h#L292-L325

Returning timer_handle_

I don't believe anything should be done here.

return timer_handle_;

timer_handle_ usages in timer.hpp

rcl_ret_t ret = rcl_timer_call(timer_handle_.get());

Documentation:

https://github.com/ros2/rcl/blob/ffcfda18db7aa9ad7c01ebdd26ae7bdd73c8206a/rcl/include/rcl/timer.h#L192-L234

@ivanpauno
Copy link
Member Author

I believe that the timer class is thread safe because all of the functions that use timer_handle_ say that they are thread safe in their documentation, except for rcl_timer_init which is in the constructor and is locked with a mutex from the clock. Below are the functions and the documentation that states they are thread safe.

Thanks for your analysis, I believe it's correct.
I've also checked some of the methods marked as thread safe in rcl, and they seem to be so.

rcl_timer_init which is in the constructor and is locked with a mutex from the clock

I don't like much using the mutex of another object, though reading the implementation this is unfortunately needed (a refactor in rcl would be needed to change that).

I'm not sure why some of the code-blocks below aren't importing correctly.

Code blocks from other repositories aren't rendered.
An extra space/new-line might affect if a code block is rendered or not.


It seems that this was only a false alarm from my side, so I will close the issue.
Thanks @audrow for the detailed analysis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants