-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Isolate timer interface #447
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some style issues and some tests are failing https://build.ros2.org/job/Rpr__geometry2__ubuntu_focal_amd64/313/
|
I have fixed these issues, please review again. |
tf2_ros/src/create_timer_ros.cpp
Outdated
rclcpp::node_interfaces::NodeTimersInterface::SharedPtr node_timers) | ||
: node_base_(node_base), node_timers_(node_timers), next_timer_handle_index_(0) | ||
rclcpp::node_interfaces::NodeTimersInterface::SharedPtr node_timers, | ||
bool spin_thread) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spin_thread
doesn't seem quite right here. Since this has to do specifically with the timer processing, I think a different name here would be more appropriate so its more clear to users what this actually is for (e.g. allows the timer interface to be processed independently from the base_node
application processing).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you give me some suggestions for naming ? how about is_independent
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure actually. I'd be interested what the maintainers here think (or if they feel its even worth parameterizing or should just always be enabled)
tf2_ros/src/create_timer_ros.cpp
Outdated
rclcpp::CallbackGroupType::MutuallyExclusive, false); | ||
executor_ = std::make_shared<rclcpp::executors::SingleThreadedExecutor>(); | ||
executor_->add_callback_group(callback_group_, node_base_); | ||
dedicated_thread_ = std::make_unique<std::thread>([&]() {executor_->spin();}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question for maintainers: Would it make more sense to you to only have this thread running when in use (e.g. when we create_timer
create this thread as well and when we reset / cancel it we destroy it then?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we we shouldn't start a thread if it's going to be unused. That will cause overhead and make debugging generally harder. Especially if there ends up potentially being several instances.
please review this PR, i need your suggestion, thank you! @clalancette @tfoote. |
@ahcorde can you give this a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about the threading, if you cancel the callback the thread should also stop. And while looking at that. I see that you've added cleanup for the thread on destruction. However it doesn't appear that the timer gets cancelled on destruction which I think is a potential lifecycle issue as the callback refers to this
but if this
destructs that will be a use after free bug.
At first I was thinking that this was just going to be for internal usage and then the internal thread used for subscribing should work. But this is for a user callback. And as such we should not create inaccessible callback groups and extra threads. We should expose to the user the ability to associate the timer callback with a callback group of their choice. If they do not associate the callback with a group, they will get the current behavior of using the default callback group. If the user cares about this not blocking they have the option then to use a different default callback group setting or to create a custom callback group. This leaves the user in charge of the threading and provides them with all the standard threading tools that are consistent with the rest of the system instead of a single dimensional boolean value.
tf2_ros/src/create_timer_ros.cpp
Outdated
rclcpp::CallbackGroupType::MutuallyExclusive, false); | ||
executor_ = std::make_shared<rclcpp::executors::SingleThreadedExecutor>(); | ||
executor_->add_callback_group(callback_group_, node_base_); | ||
dedicated_thread_ = std::make_unique<std::thread>([&]() {executor_->spin();}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we we shouldn't start a thread if it's going to be unused. That will cause overhead and make debugging generally harder. Especially if there ends up potentially being several instances.
@tfoote thanks for your suggestions, i have updated this PR, please review again. |
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2021-9-16/22372/1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a test ?
sorry, i couldn't find a test file about |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks that's much cleaner passing through the callback groups.
So @gezp, what would the implementation of this look like if we wanted to isolate this in Nav2? Would we need to make a new callback group to pass to it? That works, but isn't very clean, since I think this is the behavior that should be default, otherwise its a little buried to users that TF2's timer is being run on their node's executor.
I believe the opposite is better, but agree that exposing it to the user is necessary. I think by default we should construct a callback group for use internally, but if a user does not want isolated behavior, then they can pass in their own default / other callback group. That would still leave the user in charge of the threading if they wanted to but then not trip up TF2 with user application code execution models. |
yes, if we want to isolate |
It's not buried. Every callback that you register uses the nodes default executor unless you specify otherwise.
For consistency we shouldn't create an extra thread (not just a callback queue is required for isolating) just for TF timer callbacks. That's not how we work with any other callbacks. And I don't see why this should appear by default in a different callback queue than any other time callback that you register. There's no inherently blocking call to within the library or anything else that differentiates this callback for a tf2 timer from any other callback or timer on the system all of which will be blocked by the long running callback. The case that this is solving is that the user/developer has both registered a long running callback and then wants a quick firing callback to be run in parallel. To do that the standard answer is to change the executor to be multi threaded, or put the callback in different queues so they can be serviced simultaneously. The tf2 library is designed to be a library that can be accessed from any thread and doesn't change or add any complexity to the users' threading model. Adding a TF2 timer callback shouldn't automatically add a thread to your system. Adding a thread by default will solve this case, but will make all the other cases more complex. In this case it's very clear, if you run a long running callback your timers won't fire until you return, unless you enable multi-threading in one way or another. If these timer callback were to fire by default in an isolated thread, for consistency I would want all the timer callbacks to fire on isolated threads. At which point we're now significantly increasing the complexity of the default simplest system. And we would have the option to overide the complex system to return it to a simple single threadable system. But that's counter to our design goals of keeping the default situation simple. |
I think its a matter of preference. I respect that viewpoint. My thoughts on it is that utilities like TF should fully encapsulate, by default, its execution (but be override-able) such that users don't have to think about it in their application code. Keeping the application code using utility libraries as clear as possible is my usual aim.
I intellectually don't disagree with that, but I'm not sure on average how many users really understand what TF is doing behind the scenes to have that appreciation when they're simply copy-pasting code around to get a TF Buffer / Listener to work in their project. They're not trying to use CreateTimerROS (or the actual API of For example, below is a typical snippet of code for using TF in a ROS 2 application. It's not immediately clear to anyone what the timer interface is doing, or how it might impact their system's execution. I would argue it is pretty hidden from a user that calling
No disagreement intellectually to that, but it requires a user of TF to be able to get to a point that they're even aware that this is the issue at hand. Some design advice / exposure / documentation to that respect would go a long way (or by default hiding those details from a user). I don't believe the thread needs to be running at all times, just created "in-time" when a timer is created and destroyed when the timer is canceled. But I respect your position -- just wanting to give you a full accounting. This can be solved by documentation (and internal warnings within TF if it detects that this is what is happening) as well. If we don't enable default behavior that hides this concern from them, I would recommend having a warning if we detect lock-up on that timer/future so that people can immediately know what's happening and what might be a recommended course of action to resolve. |
So what's the word here, can we merge this? |
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2021-10-28/22947/1 |
@ahcorde @clalancette what's the good word? |
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2021-11-18/23209/1 |
@ahcorde the changes are passing CI and significantly cleaned up. I've rerequested a review from you. |
What's the good word? |
@ahcorde could you have a time to review this PR? |
Signed-off-by: zhenpeng ge <zhenpeng.ge@qq.com> fix some issues Signed-off-by: zhenpeng ge <zhenpeng.ge@qq.com> update Signed-off-by: zhenpeng ge <zhenpeng.ge@qq.com>
Signed-off-by: zhenpeng ge zhenpeng.ge@qq.com
As described in #446,
CreateTimerROS
create timers by using application's Node, there are some potential issue when application has some long running callback of services, etc.In this PR, a dedicated callback group and executor with thread are used to isolate TF timer so that timers won't be blocked by application's callback.