
Conversation

@huchijwk

@huchijwk huchijwk commented Dec 6, 2021

Implement the service request filtering function
based on managed node's requirements in
https://design.ros2.org/articles/node_lifecycle.html

Signed-off-by: Wonguk Jeong huchijwk@gmail.com

@huchijwk
Author

huchijwk commented Dec 6, 2021

This PR is about unimplemented service request filtering in the managed node design.

Primary State: Inactive

...
While in this state, the node will not receive any execution time to read topics, perform processing of data, respond to functional service requests, etc.

This includes an API change: the return type of LifecycleNode's create_service has changed.

  • rclcpp::Service -> rclcpp_lifecycle::LifecycleService

@alsora
Collaborator

alsora commented Dec 8, 2021

Thank you for the PR.
Indeed there is currently a gap between the design and the implementation of lifecycle nodes.
This applies also to subscriptions.

However, I think that this is a complex situation where we need to evaluate various possible approaches.

What happens with your PR is that if an inactive server receives a request, it will still try to execute it (i.e. calling the new handle_request), but then it will return immediately before doing any work or producing a response.
The consequence is that on the client side, there is no way to know that the request has been dropped.
When the server node becomes active again, the dropped request is now lost and it will not be processed.

In my opinion, a better approach would be to act at the executor layer.
If a server is inactive, the executor shouldn't take new requests, leaving them in the middleware buffers instead.
This has the advantage that as soon as the server becomes active again, it will be able to process those requests (assuming it didn't receive so many that its queue was overrun).
It also has the advantage of avoiding the performance cost of extracting a request that will just be dropped.

@huchijwk
Author

huchijwk commented Dec 8, 2021

@alsora thank you. I have the same concerns as you about my PR. Actually, I was thinking about making take_type_erased_request of ServiceBase a virtual function, overriding it, and returning false, which would prevent the executor from taking the request. What do you think?

@alsora
Collaborator

alsora commented Dec 8, 2021

Mmm, I need to double check, but I'm afraid that this approach may result in the executor continuously trying to take the request and so, never going back to sleep.

I think that it's necessary to propagate the information about the service being inactive to the executor itself, such that this server is not used.
For example, considering the existing waitset-based executor, the executor could check if the entity is active before adding it to the waitset.

@huchijwk
Author

huchijwk commented Dec 8, 2021

@alsora I think you're right (the executor recognizes the state and controls the waitset). If so, wouldn't it be better to generalize the rclcpp service to handle the active/inactive state, instead of adding a Lifecycle-specific service (LifecycleService)? This seems to be a common feature.

Ah, and I think that once this part is finished, subscription can be handled similarly as you said.

@alsora
Collaborator

alsora commented Dec 8, 2021

In general, I think it would be better to keep the concept of active/inactive entities confined to a lifecycle wrapper.

However, in rclcpp/include/rclcpp/strategies/allocator_memory_strategy.hpp we currently add services to a waitset via rcl_wait_set_add_service(wait_set, service.get(), NULL), while for waitables we use waitable->add_to_wait_set(wait_set);.

We could add add_to_wait_set to subscriptions, services, etc. as well.
Then, the implementation of this function for a regular server would just call rcl_wait_set_add_service(wait_set, service.get(), NULL) as before, but for a lifecycle server it would first check the boolean denoting if the entity is active or not.

A call to service->deactivate() would change this boolean to false, thus making the add_to_wait_set() function do nothing.
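The gating idea described above can be sketched in a few lines of standalone C++. This is an illustrative simplification, not the actual rclcpp code: WaitSet, ServiceBase, and LifecycleService here are minimal stand-ins for the real types.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for an rcl wait set, for illustration only.
struct WaitSet { std::vector<const void *> services; };

struct ServiceBase {
  virtual ~ServiceBase() = default;
  // Each entity decides how (and whether) to add itself to the wait set.
  virtual void add_to_wait_set(WaitSet & ws) { ws.services.push_back(this); }
};

// A lifecycle-aware service: deactivate() flips a flag, and
// add_to_wait_set() becomes a no-op while the entity is inactive.
struct LifecycleService : ServiceBase {
  void activate() { activated_ = true; }
  void deactivate() { activated_ = false; }
  void add_to_wait_set(WaitSet & ws) override {
    if (activated_) { ServiceBase::add_to_wait_set(ws); }
  }

private:
  bool activated_ = false;
};
```

While inactive, the service is simply never added to the wait set, so pending requests stay in the middleware queue until activation.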

@huchijwk huchijwk force-pushed the master branch 2 times, most recently from 4c0186a to 28d3199 Compare December 9, 2021 01:50
@huchijwk huchijwk changed the title Filter service requests for inactive managed nodes [WIP] Filter service requests for inactive managed nodes Dec 9, 2021
@huchijwk huchijwk changed the title [WIP] Filter service requests for inactive managed nodes Filter service requests for inactive managed nodes Dec 9, 2021
@huchijwk
Author

huchijwk commented Dec 9, 2021

@alsora I re-implemented,

  • declare add_to_wait_set() (pure virtual) in ServiceBase
  • implement add_to_wait_set() in Service
  • override add_to_wait_set() in LifecycleService

Also, since add_handles_to_wait_set() of rclcpp/include/rclcpp/strategies/allocator_memory_strategy.hpp traverses rcl_service_t shared pointers rather than ServiceBase, I maintain a map to look up the corresponding ServiceBase weak pointer.
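That handle-to-owner lookup can be sketched in isolation. The types below are simplified assumptions for illustration, not the actual rclcpp code: the point is mapping the raw rcl handle back to a weak reference on its owning ServiceBase.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Simplified stand-ins: an rcl service handle and its owning ServiceBase.
struct rcl_service_t {};
struct ServiceBase {
  std::shared_ptr<rcl_service_t> handle = std::make_shared<rcl_service_t>();
};

// Map from the raw rcl handle back to a weak reference on the owner,
// so code iterating over rcl_service_t can still reach ServiceBase methods.
using ServiceMap = std::map<const rcl_service_t *, std::weak_ptr<ServiceBase>>;

std::shared_ptr<ServiceBase> lookup(const ServiceMap & m, const rcl_service_t * h)
{
  auto it = m.find(h);
  // lock() returns nullptr if the ServiceBase has already been destroyed.
  return it != m.end() ? it->second.lock() : nullptr;
}
```

Using weak pointers keeps the map from extending the service's lifetime; a destroyed service simply fails the lookup.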

@huchijwk
Author

huchijwk commented Dec 9, 2021

@alsora I've done what you mentioned.

Collaborator

@alsora alsora left a comment


Great!
The PR looks good to me, I just left some last minor comments.

@huchijwk
Author

huchijwk commented Dec 9, 2021

@alsora Thanks for the review. Added a commit that applies the last comment. :)

@bpwilcox
Contributor

This PR looks great! I agree that we should apply similar changes to subscriptions and perhaps action servers as well to comply with the intended lifecycle design.

@huchijwk Do you intend to add subsequent changes for subscriptions after this PR? Also, perhaps a demo that mirrors the one in https://github.com/ros2/demos/tree/master/lifecycle but with services/clients would be a helpful aid.

@huchijwk
Author

@bpwilcox Yes, I will work on subscription and demo as well after this PR is merged.

Collaborator

@fujitatomoya fujitatomoya left a comment


overall looks good to me, but a couple of questions.

Probably this is off topic for this PR. With the current implementation, the application is responsible for activating/deactivating endpoints such as rclcpp_lifecycle::LifecyclePublisher and rclcpp_lifecycle::LifecycleService, but I think they should be managed by LifecycleNode based on its current state?

@huchijwk
Author

overall looks good to me, but a couple of questions.

Probably this is off topic for this PR. With the current implementation, the application is responsible for activating/deactivating endpoints such as rclcpp_lifecycle::LifecyclePublisher and rclcpp_lifecycle::LifecycleService, but I think they should be managed by LifecycleNode based on its current state?

@fujitatomoya According to the design concept, you are right. I think it makes sense to manage it in the lifecycle node, unless there is a specific rationale for letting the application change the publisher's state explicitly through the API.

Implement the service request filtering function
based on managed node's requirements in
https://design.ros2.org/articles/node_lifecycle.html

Signed-off-by: Wonguk Jeong <huchijwk@gmail.com>
There is no need to display a warning
if the user explicitly inactivates the service

Signed-off-by: Wonguk Jeong <huchijwk@gmail.com>
Signed-off-by: Wonguk Jeong <huchijwk@gmail.com>
- remove unnecessary include <map>
- use cleaner expression
- add comment regarding behavior of add_to_wait_set()

Signed-off-by: Wonguk Jeong <huchijwk@gmail.com>
Signed-off-by: Wonguk Jeong <huchijwk@gmail.com>
@huchijwk
Author

Rebased

@alsora
Collaborator

alsora commented Dec 21, 2021

Hi @ivanpauno @fujitatomoya , it looks like all comments have been addressed.
Any reason for not moving forward with this PR?

@ivanpauno
Member

Hi @ivanpauno @fujitatomoya , it looks like all comments have been addressed.

Hi @alsora, I haven't done a thorough review yet.
I will try to take a look in the first week of January.

Collaborator

@fujitatomoya fujitatomoya left a comment


@fujitatomoya
Collaborator

@alsora @bpwilcox @huchijwk could you check whether your comments can be resolved?

@alsora
Collaborator

alsora commented Dec 21, 2021

For some reason I don't see the "Resolve conversation" button on my comments.
Anyhow, I confirm that for me and Brian (who is on vacation) this is good to go.

@huchijwk
Author

Comments resolved

Member

@ivanpauno ivanpauno left a comment


I have some minor API feedback, but otherwise this LGTM.

I think we can ignore my feedback for the moment and double check with other maintainers what they think.

  */
  template<typename ServiceT, typename CallbackT>
- typename rclcpp::Service<ServiceT>::SharedPtr
+ typename rclcpp_lifecycle::LifecycleService<ServiceT>::SharedPtr
Member


I know this is the same as we do for publishers, but IMO it would be better to have create_service()/create_publisher() alongside create_lifecycle_service()/create_lifecycle_publisher().

You can currently still create a non-managed entity by using the full method path, but having non-overriding methods would be clearer IMO.


This would be a breaking change, so we can ignore it for the moment.

Member


I'm not sure. Consider some code that operates on a node-like object. It's nice to be able to pass in an rclcpp::Node or rclcpp_lifecycle::LifecycleNode to the same API (provided both classes implement the required interface). In this scenario, if we had lifecycle-specific methods, we'd end up not using the managed entities for lifecycle.

However, I'm not sure if this idea of passing Node and LifecycleNode interchangeably holds in practice.

If a user chooses LifecycleNode, then I would think they want managed entities in most cases. So, I lean towards "overriding" create_service(), create_publisher(), etc. If the user wants to create a non-managed entity, they still have that option.

I guess either way, we'll be breaking API (currently this PR is changing the return type).


Coming back to this a bit later, as I'm interested in the full suite of managed entity implementations.

I would agree with @ivanpauno, thinking from a potential user-bug perspective.

I could imagine a user who wants a mixture of entity types with some being active even in Inactive state (e.g., a GetXParam.srv used to grab a loaded variable while the node is Inactive as you don't want to spin other entities).

This may* be poor user design in theory, but I could easily see someone trying to do this. If create_service were overridden to default to a managed entity, it would be difficult to see explicitly why the service is turned off/not accepting requests, as it was automatically turned into a ManagedEntity type under the hood.

@jacobperron I think you bring up a great point; I'm not sure about the feature parity between the two at the moment.

Thinking out loud: I can see the other side, however, where you likely want to use ManagedEntity types and discourage any non-managed entity types within lifecycle nodes. Possibly there could be a compiler warning or equivalent when calling create_service instead of create_lifecycle_service from within a lifecycle node.

Comment on lines +130 to +140
rclcpp::AnyServiceCallback<ServiceT> any_service_callback;
any_service_callback.set(std::forward<CallbackT>(callback));

rcl_service_options_t service_options = rcl_service_get_default_options();
service_options.qos = qos_profile;

auto service = LifecycleService<ServiceT>::make_shared(
  node_base_->get_shared_rcl_node_handle(), service_name, any_service_callback, service_options);
auto serv_base_ptr = std::dynamic_pointer_cast<rclcpp::ServiceBase>(service);
node_services_->add_service(serv_base_ptr, group);
return service;
Member


Is it possible to replace this code with something similar to

using PublisherT = rclcpp_lifecycle::LifecyclePublisher<MessageT, AllocatorT>;
return rclcpp::create_publisher<MessageT, AllocatorT, PublisherT>(
  *this,
  topic_name,
  qos,
  options);
?

Author


I considered it, but I did not do it because it would require changing the create_service template of rclcpp, which is somewhat beyond the purpose of this PR.

Unlike create_publisher, the return type of create_service is a specific type rclcpp::Service.

std::shared_ptr<PublisherT>
create_publisher(

typename rclcpp::Service<ServiceT>::SharedPtr
create_service(

In the next PR, I plan to change the create_service part to a factory design like the publisher. https://github.com/ros2/rclcpp/pull/1836/files#r767374194
At that time, I will also consider changing the template parameter from rclcpp::Service to ServiceT as well.

* It is more a convenient interface class
* than a necessary base class.
*/
class LifecycleServiceInterface
Member


Is there any reason why we need a new interface and this cannot be the same as LifecyclePublisherInterface?
I would rather have a single ManagedEntityInterface than multiple ones.

Author

@huchijwk huchijwk Dec 24, 2021


IMO, using LifecyclePublisherInterface in a service seemed odd.

I thought of a unified interface (ManagedEntityInterface) too, but since that would change APIs unrelated to the purpose of this PR, I planned to make a separate PR that consolidates the interface and deprecates the existing API after this PR is merged.

By the way, if #1846 is accepted, there may be no compatibility issues as that API will become an internal API.

Comment on lines +74 to +78
void on_activate() override {enabled_ = true;}

void on_deactivate() override {enabled_ = false;}

bool is_activated() override {return enabled_;}
Member


I would also maybe add a SimpleManagedEntity implementation of the interface that just sets a flag like this, as it seems we're repeating the same implementation in many places.
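A minimal sketch of what such a SimpleManagedEntity could look like. The class name follows the suggestion above; the interface shape is an assumption based on the methods quoted in this thread, not the actual rclcpp_lifecycle code.

```cpp
#include <cassert>

// Assumed shape of the shared interface discussed in this thread.
struct ManagedEntityInterface {
  virtual ~ManagedEntityInterface() = default;
  virtual void on_activate() = 0;
  virtual void on_deactivate() = 0;
  virtual bool is_activated() = 0;
};

// One reusable flag-based implementation, so LifecyclePublisher,
// LifecycleService, etc. don't each repeat the same three methods.
class SimpleManagedEntity : public ManagedEntityInterface {
public:
  void on_activate() override {enabled_ = true;}
  void on_deactivate() override {enabled_ = false;}
  bool is_activated() override {return enabled_;}

private:
  bool enabled_ = false;
};
```

Entities would then inherit the flag behavior instead of reimplementing it.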

Author


I will implement it together with the additional PR related to the ManagedEntityInterface mentioned above. How about that?

My PR plan is as follows.

  1. Lifecycle Service implementation (this PR)
  2. Change rclcpp::Service to factory design
  3. Integrated managed node interface
  4. Implement Lifecycle Subscription
  5. Automatic transition of lifecycle (if #1846, "Allow LifecycleNode to automatically transition Lifecycle-enabled ROS2 entities", is accepted)

Member

@jacobperron jacobperron Jan 6, 2022


Correct me if I'm mistaken, but can we just rename this interface to ManagedEntityInterface (or perhaps LifecycleEntityInterface, which seems more consistent) now, without changing anything else? Then we can avoid a deprecation cycle.

@ivanpauno
Member

I think we can ignore my feedback for the moment and double check with other maintainers what they think.

@jacobperron @wjwwood could you share your thoughts about #1836 (comment), #1836 (comment) and #1846 (I followed #1846 ideas in the rclpy managed node implementation).

RCLCPP_PUBLIC
virtual
bool
add_to_wait_set(rcl_wait_set_t * wait_set) = 0;
Member


My one concern with this new API is how it will interact with the existing WaitSet classes. I suppose this will not work as expected if someone chooses to use a WaitSet object. Maybe we should update this statement to use the new API:

rcl_ret_t ret = rcl_wait_set_add_service(
  &rcl_wait_set_,
  service_ptr_pair.second->get_service_handle().get(),
  nullptr);

I'm not sure though; I'd like to hear from @ivanpauno and @wjwwood

Member


Yeah, this new method seems like a poor way to control whether or not a service is handled in the managed node, and it won't work with the WaitSet classes.

Member


I guess we don't have lifecycle subscriptions yet either, but I would say we need a new way to indicate whether a subscription should be waited on or not.

Initially I thought the callback groups could already be used for this, but you could have a callback group with one normal service and one lifecycle service. So I don't think that will be useful.

The other question is: should it just not "get execution time", as was quoted in the first comment of this PR, or should requests received while not active be discarded? I.e., imagine: "service is inactive" -> "request is received" -> "service becomes active"; should the request received before being active be handled? My intuition is no, but I'm not 100% sure.

If we said that requests received during an inactive state are ignored, we should implement the lifecycle service using a callback that wraps the user's callback and only calls it if the lifecycle state is active.
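The wrapper-callback alternative can be sketched standalone. The Request/Response types and the factory name below are simplified stand-ins for illustration, not rclcpp APIs: the user's callback runs only while the entity is active, so a request taken during the inactive state is consumed but never acted on.

```cpp
#include <cassert>
#include <functional>

struct Request { int value; };
struct Response { int value; bool handled = false; };

// Wrap the user's callback so it only runs while the entity is active;
// an inactive entity still takes the request but drops it unhandled.
std::function<Response(const Request &)> make_lifecycle_callback(
  std::function<Response(const Request &)> user_cb, const bool & active)
{
  return [user_cb, &active](const Request & req) {
      if (!active) {
        return Response{};  // request consumed, but not acted on
      }
      Response res = user_cb(req);
      res.handled = true;
      return res;
    };
}
```

From the client's perspective an unhandled request would look like any other timed-out service call, which is the shortcoming discussed just below.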

Author


@wjwwood Hmm... ignoring requests in the inactive state seems fine too. If we decide to ignore requests, what do you think about filtering by overriding handle_request instead of wrapping the callback? (The initial implementation did that.)

@alsora what do you think about ignoring requests in inactive state?

Collaborator


According to the design document https://design.ros2.org/articles/node_lifecycle.html

While in this state, the node will not receive any execution time to read topics, perform processing of data, respond to functional service requests, etc.
In the inactive state, any data that arrives on managed topics will not be read and or processed. Data retention will be subject to the configured QoS policy for the topic.

This seems to only refer to execution time, i.e. the executor will ignore the entity for as long as it is inactive.

From a performance point of view, the current approach is also much more efficient than having a "wrapper callback".
The "wrapper callback" approach requires deserialization, going through all the ROS layers, and doing work in the executor.
A server may be inactive for a variety of reasons, but I wouldn't want it to perform any work at all.

Lastly, although a wrapper callback may work with subscriptions, how would that be implemented with servers where the callback usually needs to send a response back to the client?
Will this require for the response to have a "success/failure" bool field ? How does the client interpret if the server invocation failed due to the server being inactive or due to problems in the request?

For what concerns the fact that this won't work with the new WaitSet class, is this really a problem? The point of the WaitSet class is to decouple waiting and scheduling. The user is responsible for taking the data and executing it, so it should be easy to handle this.
Anyhow, can you elaborate why this wouldn't work? The WaitSet class is currently calling rcl_wait_set_add_service, can't we have it call server->add_to_wait_set() API which is overridden here?

For what it's worth, I also looked into how to implement lifecycle entities with the RMW listener APIs and the events executor. Here the approach would be slightly different, since the add_to_wait_set function is not invoked.
The approach would be for the on_activate and on_deactivate functions to restore/remove the listener callback if present.
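That listener-based variant can be sketched in the same simplified spirit. The registration API below is an assumption for illustration, not the actual RMW interface: instead of being skipped by a wait set, an inactive entity simply has no listener callback installed with the middleware.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Simplified stand-in for an entity using the RMW listener APIs.
class ListenerManagedService {
public:
  void set_listener(std::function<void()> cb) {user_cb_ = std::move(cb);}
  void on_activate() {installed_cb_ = user_cb_;}    // restore the callback
  void on_deactivate() {installed_cb_ = nullptr;}   // remove the callback
  // Middleware-side delivery: fires only if a callback is installed.
  bool deliver() const
  {
    if (!installed_cb_) {return false;}
    installed_cb_();
    return true;
  }

private:
  std::function<void()> user_cb_;
  std::function<void()> installed_cb_;
};
```

While deactivated, the middleware has nothing to invoke, so no work happens on the entity's behalf.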

Member

@wjwwood wjwwood Feb 15, 2022


If we decide to ignore requests, what do you think about filtering by overriding the handle request instead of wrapping the callback? (The initial implementation did that.)

That could work, but maybe it produces overhead you don't want, as @alsora said.

When a server is inactive, it may be for a variety of reason, but I wouldn't want it to perform any work at all.

I think to achieve this we would need to have a new active state in each entity and the executor/wait set in rclcpp would need to know to ignore it when it is inactive, as well we'd need a way to notify the executor/wait set when that state changes (so it can build a new wait set that includes it).

Once we have that, we can either let the messages/requests accumulate or clear them as we become active, whichever seems appropriate (we could even have a configuration for that).

Lastly, although a wrapper callback may work with subscriptions, how would that be implemented with servers where the callback usually needs to send a response back to the client?
Will this require for the response to have a "success/failure" bool field ? How does the client interpret if the server invocation failed due to the server being inactive or due to problems in the request?

I would say the request is taken (cleared out), but never acted on. This would be no different than any other timed out service call from the client's perspective. This is already the case and a shortcoming of services, so it's something worth improving, but not specific to this issue (lifecycle behavior), as there are other reasons a service may appear available to the client but fail to respond (service callback throws or service server is destroyed before request is received, etc).

Member


For what concerns the fact that this won't work with the new WaitSet class, is this really a problem? The point of the WaitSet class is to decouple waiting and scheduling. The user is responsible for taking the data and executing it, so it should be easy to handle this.
Anyhow, can you elaborate why this wouldn't work? The WaitSet class is currently calling rcl_wait_set_add_service, can't we have it call server->add_to_wait_set() API which is overridden here?

It can work, the WaitSet class just needs to be updated. However, I would not do it by overriding the add_to_wait_set() method. I would instead add a state to all waitable entities like "active" and let the wait set introspect that when deciding whether or not to add it to the rcl wait set.

Plus as I said before, you need a feedback mechanism so that when that state changes the wait set can wake up and rebuild the wait set to include the newly active items.
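The feedback mechanism described above can be sketched with standard C++ primitives. A condition variable here stands in for what would likely be a guard condition in rclcpp; the names are assumptions for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// When an entity's active state changes, wake the waiting thread so it
// can rebuild its wait set to include the newly (in)active entities.
struct StateChangeNotifier {
  std::mutex m;
  std::condition_variable cv;
  bool rebuild_needed = false;

  void notify_state_change()
  {
    {
      std::lock_guard<std::mutex> lk(m);
      rebuild_needed = true;
    }
    cv.notify_one();  // analogous to triggering a guard condition
  }

  void wait_for_change()
  {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [this] {return rebuild_needed;});
    rebuild_needed = false;  // the wait set would be rebuilt here
  }
};
```

An activate/deactivate call on an entity would invoke notify_state_change(), and the waiting executor/wait set would wake, rebuild, and wait again.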

@audrow audrow changed the base branch from master to rolling June 28, 2022 14:21