Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't shutdown contained entities. #313

Merged
merged 1 commit into from
Nov 20, 2024

Conversation

clalancette
Copy link
Collaborator

It is tempting to think of the entities in the RMW graph as a hierarchy, where an rmw_context_t contains zero or more rmw_node_t, and rmw_node_t contains zero or more publishers, subscriptions, clients, and services.

However, the reality is that the upper layers (particularly rclcpp and rclpy) don't exactly view the entities like that. More specifically, each entity is considered standalone, that happens to have linkage to other entities in the graph. For example, a publisher is considered to be a standalone entity that happens to be linked to a particular node.

Because of this, it is not proper to shutdown all nodes within a context when the context is shutdown. The node should continue to live on until it is shutdown. And a similar argument goes for the node shutdown; it should not shutdown the publishers, subscriptions, clients, and services that are contained within it.

This manifested itself as a exception that was being thrown in some of the tests in test_communication. Because it is using a loop with rclcpp::ok(), followed by a publisher->publish(), the test would sometimes throw an exception if Ctrl-C was hit between the rclcpp::ok() and the publish() call. And that's because even though the context has been shut down, the publisher is an independent entity that should continue to exist until the next rclcpp::ok().

The fix here is simple; don't shut "contained" entities down during a context or node shutdown.

It is tempting to think of the entities in the RMW
graph as a hierarchy, where an rmw_context_t contains
zero or more rmw_node_t, and rmw_node_t contains zero
or more publishers, subscriptions, clients, and services.

However, the reality is that the upper layers (particularly
rclcpp and rclpy) don't exactly view the entities like that.
More specifically, each entity is considered standalone,
that happens to have linkage to other entities in the graph.
For example, a publisher is considered to be a standalone entity
that happens to be linked to a particular node.

Because of this, it is not proper to shutdown all nodes within
a context when the context is shutdown.  The node should continue
to live on until it is shutdown.  And a similar argument goes
for the node shutdown; it should not shutdown the publishers,
subscriptions, clients, and services that are contained within it.

This manifested itself as a exception that was being thrown in
some of the tests in test_communication.  Because it is using
a loop with rclcpp::ok(), followed by a publisher->publish(),
the test would sometimes throw an exception if Ctrl-C was hit between
the rclcpp::ok() and the publish() call.  And that's because even
though the context has been shut down, the publisher is an
independent entity that should continue to exist until
the next rclcpp::ok().

The fix here is simple; don't shut "contained" entities down during
a context or node shutdown.

Signed-off-by: Chris Lalancette <clalancette@gmail.com>
@clalancette clalancette requested a review from Yadunund November 14, 2024 13:19
@Yadunund
Copy link
Member

Hmm this contradicts the documentation for rcl_node_fini seen here. Could it be that the test expecation needs to be fixed instead?

@clalancette
Copy link
Collaborator Author

Could it be that the test expecation needs to be fixed instead?

I don't think so; the pattern of while (rclcpp::ok()) { pub->publish() } is pretty well established.

Hmm this contradicts the documentation for rcl_node_fini seen here.

I read the documentation for rcl_node_fini as saying "it is the users responsibility to shut down all entities before shutting down the node".

Further, I'll point out that none of the rest of the current RMWs clean up entities when calling rmw_destroy_node:

  • rmw_cyclonedds_cpp - removes this node from the graph, frees memory associated with the node, then decrements the references to the context
  • rmw_fastrtps_cpp - calls __rmw_destroy_node, which removes this node from the graph and frees memory associated with the node, then decrements the references to the context
  • rmw_connextdds - removes this node from the graph and frees the memory associated with the node, then decrements the references to the context

So I think that we should do the same as all of the rest, and not actually destroy all of the associated entities here.

Copy link
Member

@Yadunund Yadunund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying!

@clalancette clalancette merged commit 5366957 into rolling Nov 20, 2024
8 checks passed
@clalancette clalancette deleted the clalancette/dont-shutdown-contained branch November 20, 2024 21:51
YuanYuYuan added a commit to ZettaScaleLabs/rmw_zenoh that referenced this pull request Nov 21, 2024
YuanYuYuan added a commit to ZettaScaleLabs/rmw_zenoh that referenced this pull request Nov 22, 2024
YuanYuYuan added a commit to ZettaScaleLabs/rmw_zenoh that referenced this pull request Nov 25, 2024
imstevenpmwork pushed a commit to ZettaScaleLabs/rmw_zenoh that referenced this pull request Dec 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants