Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Publishing right before shutdown due to SIGINT #1706

Closed
erikschuitema opened this issue Jul 1, 2021 · 7 comments
Closed

Publishing right before shutdown due to SIGINT #1706

erikschuitema opened this issue Jul 1, 2021 · 7 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@erikschuitema
Copy link

Feature request

Feature description

Sometimes it is useful to publish messages right before shutting down, for example, a (lifecycle) state change message.

If a node shuts down due to a SIGINT signal and uses the default signal handler, the global context is shutdown as well. If rclcpp::spin(node) is used inside main() and SIGINT ends it, any code following it cannot publish because of this.

The callbacks installed with rclcpp::on_shutdown() are called after the shutdown, so this is also not the place to publish.

Currently, the only possible way I could find is to install a signal handler before rclcpp::init(), which will be stored as the 'old' signal handler inside the rclcpp::SignalHandler singleton when it installs the default handler. This 'old' handler does get called before actually shutting down.

Since signal handlers are fairly limited and typically require global variables to do anything useful, it would be a nice feature if callbacks could be installed that are called before a shutdown.

Implementation considerations

One option would be to have rclcpp::on_shutdown() accept an option to specify that the callback be called either before or after shutting down.
This implies that the callback is executed before/after any call to rclcpp::shutdown(), instead of only after SIGINT, but this could actually be an advantage.

@wjwwood
Copy link
Member

wjwwood commented Jul 1, 2021

So, having a callback before shutdown is pretty straightforward, we would just need to add a second loop through the callbacks before shutdown here:

// rcl shutdown
rcl_ret_t ret = rcl_shutdown(rcl_context_.get());
if (RCL_RET_OK != ret) {
rclcpp::exceptions::throw_from_rcl_error(ret);
}
// set shutdown reason
shutdown_reason_ = reason;
// call each shutdown callback
{
std::lock_guard<std::mutex> lock(on_shutdown_callbacks_mutex_);
for (const auto & callback : on_shutdown_callbacks_) {
(*callback)();
}
}

As you suggested, with either a flag to on_shutdown() or something similar we could separate pre- and post- shutdown callbacks.


However, I'm not sure a pre-shutdown callback will be sufficient. If you publish and then shutdown and then (presumably) continue by shutting the process down immediately, you may never send the published message, because publishing may be done asynchronously or not depending on the middleware implementation. This is why people are working on this issue: ros2/rmw#295 ("Add ability to confirm reader receive all sent messages"). You could use something like this in an "on shutdown" callback, but you wouldn't want to block for too long as you'd be holding up all the other callbacks and the shutdown of other contexts.

@wjwwood wjwwood added enhancement New feature or request help wanted Extra attention is needed labels Jul 1, 2021
@wjwwood
Copy link
Member

wjwwood commented Jul 1, 2021

But either way a feature to do callbacks pre-shutdown might be generally useful.

@erikschuitema
Copy link
Author

It would be intuitive if messages from reliable publishers are always delivered, even if the publication is immediately followed by rcl_shutdown() plus process exit. Is it possible to have rcl_shutdown() perform wait_for_acknowledgments (from ros2/rmw#295) for all reliable publishers? Then all pre-shutdown callbacks can be performed before starting the wait. The timeout on wait_for_acknowledgments is debatable of course.

@Barry-Xu-2018
Copy link
Collaborator

It is good to add callback of pre-shutdown.
While writing an example codes, I meet this problem. It is inconvenient to call wait_for_acknowledgments before shutdown (such as press ctrl + c). I have to create custom context for overriding shutdown().
https://github.com/Barry-Xu-2018/examples/blob/topic-example-for-wait_for_all_acked/rclcpp/topics/minimal_publisher/member_function_with_wait_for_all_acked.cpp.
I think we can provide the interface for pre-shutdown and user decides what he does want to do.

@wjwwood
Copy link
Member

wjwwood commented Jul 12, 2021

It would be intuitive if messages from reliable publishers are always delivered, even if the publication is immediately followed by rcl_shutdown() plus process exit. Is it possible to have rcl_shutdown() perform wait_for_acknowledgments (from ros2/rmw#295) for all reliable publishers?

It would not be intuitive if publishers prevent processes from exiting though. What happens in the case that the process with the subscription crashes and doesn't respond to ack/nak from the publisher? How long does it stay open trying to deliver it?

I think it's better if this is documented and then we provide some way to keep the process alive, IF that is what the user wants.

I'm still not convinced that delaying shutdown by blocking on wait_for_acknowledgments is a good idea.

@erikschuitema
Copy link
Author

I tried to create a minimal example that illustrates the current behavior (Foxy) of a reliable publisher and subscriber, in case the publisher shuts down immediately after publishing: https://gist.github.com/erikschuitema/249677e36f233fc421d1e0087d301668
The goal of the publisher is simple: wait for one subscriber, publish a number of messages with qos.reliable() and then quit.

Run the publisher with reliable_pubsub -p 10 to let it publish 10 messages right after the first subscriber connects, and then quit. Run the subscriber with reliable_pubsub -s 10 to receive 10 messages and then quit.
Most of the time, messages are never received. This is against my expectation. To me, it looks the same as executing int main() {std::cout << "Hello";} and not seeing any output, unless I would explicitly wait for cout to finish flushing before exiting.

The example above can execute an arbitrary sleep after publication, which changes the behavior. For example, run the publisher as reliable_pubsub -p 10 100 to add a sleep of 100ms after publishing.
With this sleep, some, all or no messages are delivered, depending on the length of the sleep. Such timing dependent behavior can be difficult to understand/debug for programmers not familiar with the issue at hand.

@wjwwood to answer your question: my personal view is that if I configure a publisher to be reliable, it's perfectly fine if it hangs a limited amount of time upon exiting due to a crashed subscriber. I don't see a 'liveliness' parameter for subscribers, unfortunately. Crashed subscribers are not the common case for me. If I really need the publisher to terminate, I can send SIGKILL.

@wjwwood
Copy link
Member

wjwwood commented Nov 29, 2021

#1714 is merged, and once ros2/examples#316 and friends are merged, you can choose to do this yourself. But I still do not think we should do this automatically. I'm going to close this issue, but we can continue to discuss it and if other maintainers jump in and disagree we can reopen it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants