
Conversation

@huchijwk

@huchijwk huchijwk commented Dec 6, 2021

Implement the service request filtering function
based on managed node's requirements in
https://design.ros2.org/articles/node_lifecycle.html

Signed-off-by: Wonguk Jeong huchijwk@gmail.com

@huchijwk
Author

huchijwk commented Dec 6, 2021

This PR is about unimplemented service request filtering in the managed node design.

Primary State: Inactive

...
While in this state, the node will not receive any execution time to read topics, perform processing of data, respond to functional service requests, etc.

This includes an API change: the return type of LifecycleNode's create_service has changed.

  • rclcpp::Service -> rclcpp_lifecycle::LifecycleService

@alsora
Collaborator

alsora commented Dec 8, 2021

Thank you for the PR.
Indeed there is currently a gap between the design and the implementation of lifecycle nodes.
This applies also to subscriptions.

However, I think that this is a complex situation where we need to evaluate various possible approaches.

What happens with your PR is that if an inactive server receives a request, it will still try to execute it (i.e. calling the new handle_request), but then it will return immediately before doing any work or producing a response.
The consequence is that on the client side, there is no way to know that the request has been dropped.
When the server node becomes active again, the dropped request is now lost and it will not be processed.

In my opinion, a better approach would be to act at the executor layer.
If a server is inactive, the executor shouldn't take new requests, leaving them in the middleware buffers instead.
This has the advantage that as soon as the server becomes active again, it will be able to process those requests (assuming it didn't receive so many that its queue was overrun).
It also has the advantage of avoiding the performance cost of extracting a request that will just be dropped.

@huchijwk
Author

huchijwk commented Dec 8, 2021

@alsora thank you. I have the same concerns as you about my PR. Actually, I was thinking about making take_type_erased_request of ServiceBase a virtual function, overriding it, and returning false, which would prevent the executor from taking the request. What do you think?

@alsora
Collaborator

alsora commented Dec 8, 2021

Mmm, I need to double check, but I'm afraid that this approach may result in the executor continuously trying to take the request and so, never going back to sleep.

I think that it's necessary to propagate the information about the service being inactive to the executor itself, such that this server is not used.
For example, considering the existing waitset-based executor, the executor could check if the entity is active before adding it to the waitset.

@huchijwk
Author

huchijwk commented Dec 8, 2021

@alsora I think you're right (the executor recognizes the state and controls the waitset). If so, wouldn't it be better to generalize the rclcpp service to handle the active/inactive state, instead of adding a Lifecycle-specific service (LifecycleService)? This seems to be a common feature.

Ah, and I think that once this part is finished, subscription can be handled similarly as you said.

@alsora
Collaborator

alsora commented Dec 8, 2021

In general, I think it would be better to keep the concept of active/inactive entities confined to a lifecycle wrapper.

However, in rclcpp/include/rclcpp/strategies/allocator_memory_strategy.hpp we currently add services to a waitset via rcl_wait_set_add_service(wait_set, service.get(), NULL), while for waitables we use waitable->add_to_wait_set(wait_set);.

We could add add_to_wait_set to subscriptions, services, etc. as well.
Then, the implementation of this function for a regular server would just call rcl_wait_set_add_service(wait_set, service.get(), NULL) as before, but for a lifecycle server it would first check the boolean denoting if the entity is active or not.

A call to service->deactivate() would change this boolean to false, thus making the add_to_wait_set() function do nothing.
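The gating idea described above can be sketched in a few lines of standalone C++. This is an illustrative simplification, not the actual rclcpp code: WaitSet, ServiceBase, and LifecycleService here are minimal stand-ins for the real types.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for an rcl wait set, for illustration only.
struct WaitSet { std::vector<const void *> services; };

struct ServiceBase {
  virtual ~ServiceBase() = default;
  // Each entity decides how (and whether) to add itself to the wait set.
  virtual void add_to_wait_set(WaitSet & ws) { ws.services.push_back(this); }
};

// A lifecycle-aware service: deactivate() flips a flag, and
// add_to_wait_set() becomes a no-op while the entity is inactive.
struct LifecycleService : ServiceBase {
  void activate() { activated_ = true; }
  void deactivate() { activated_ = false; }
  void add_to_wait_set(WaitSet & ws) override {
    if (activated_) { ServiceBase::add_to_wait_set(ws); }
  }

private:
  bool activated_ = false;
};
```

While inactive, the service is simply never added to the wait set, so pending requests stay in the middleware queue until activation.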

@huchijwk huchijwk force-pushed the master branch 2 times, most recently from 4c0186a to 28d3199 Compare December 9, 2021 01:50
@huchijwk huchijwk changed the title Filter service requests for inactive managed nodes [WIP] Filter service requests for inactive managed nodes Dec 9, 2021
@huchijwk huchijwk changed the title [WIP] Filter service requests for inactive managed nodes Filter service requests for inactive managed nodes Dec 9, 2021
@huchijwk
Author

huchijwk commented Dec 9, 2021

@alsora I re-implemented,

  • declare add_to_wait_set() (pure virtual) in ServiceBase
  • implement add_to_wait_set() in Service
  • override add_to_wait_set() in LifecycleService

Also, since add_handles_to_wait_set() of rclcpp/include/rclcpp/strategies/allocator_memory_strategy.hpp traverses rcl_service_t shared pointers rather than ServiceBase, I maintain a map to look up the corresponding ServiceBase weak pointer.
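That handle-to-owner lookup can be sketched in isolation. The types below are simplified assumptions for illustration, not the actual rclcpp code: the point is mapping the raw rcl handle back to a weak reference on its owning ServiceBase.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Simplified stand-ins: an rcl service handle and its owning ServiceBase.
struct rcl_service_t {};
struct ServiceBase {
  std::shared_ptr<rcl_service_t> handle = std::make_shared<rcl_service_t>();
};

// Map from the raw rcl handle back to a weak reference on the owner,
// so code iterating over rcl_service_t can still reach ServiceBase methods.
using ServiceMap = std::map<const rcl_service_t *, std::weak_ptr<ServiceBase>>;

std::shared_ptr<ServiceBase> lookup(const ServiceMap & m, const rcl_service_t * h)
{
  auto it = m.find(h);
  // lock() returns nullptr if the ServiceBase has already been destroyed.
  return it != m.end() ? it->second.lock() : nullptr;
}
```

Using weak pointers keeps the map from extending the service's lifetime; a destroyed service simply fails the lookup.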

@huchijwk
Author

huchijwk commented Dec 9, 2021

@alsora I've done what you mentioned.

Collaborator

@alsora alsora left a comment


Great!
The PR looks good to me, I just left some last minor comments.

@huchijwk
Author

huchijwk commented Dec 9, 2021

@alsora Thanks for the review. Added a commit that applies the last comment. :)

@bpwilcox
Contributor

This PR looks great! I agree that we should apply similar changes to subscriptions and perhaps action servers as well to comply with the intended lifecycle design.

@huchijwk Do you intend to add subsequent changes for subscriptions after this PR? Also, perhaps a demo that mirrors the one in https://github.com/ros2/demos/tree/master/lifecycle but with services/clients would be a helpful aid.

@huchijwk
Author

@bpwilcox Yes, I will work on subscription and demo as well after this PR is merged.

Collaborator

@fujitatomoya fujitatomoya left a comment


overall looks good to me, but a couple of questions.

Probably this is off topic for this PR. With the current implementation, the application is responsible for activating/deactivating endpoints such as rclcpp_lifecycle::LifecyclePublisher and rclcpp_lifecycle::LifecycleService, but I think they should be managed by LifecycleNode based on its current state?

@huchijwk
Author

overall looks good to me, but a couple of questions.

Probably this is off topic for this PR. With the current implementation, the application is responsible for activating/deactivating endpoints such as rclcpp_lifecycle::LifecyclePublisher and rclcpp_lifecycle::LifecycleService, but I think they should be managed by LifecycleNode based on its current state?

@fujitatomoya According to the design concept, you are right. I think it makes sense to manage it in the lifecycle node, unless there is a specific rationale for letting the application change the publisher's state explicitly through the API.

Implement the service request filtering function
based on managed node's requirements in
https://design.ros2.org/articles/node_lifecycle.html

Signed-off-by: Wonguk Jeong <huchijwk@gmail.com>
There is no need to display a warning
if the user explicitly inactivates the service

Signed-off-by: Wonguk Jeong <huchijwk@gmail.com>
Signed-off-by: Wonguk Jeong <huchijwk@gmail.com>
- remove unnecessary include <map>
- use cleaner expression
- add comment regarding behavior of add_to_wait_set()

Signed-off-by: Wonguk Jeong <huchijwk@gmail.com>
Signed-off-by: Wonguk Jeong <huchijwk@gmail.com>
@huchijwk
Author

Rebased

@alsora
Collaborator

alsora commented Dec 21, 2021

Hi @ivanpauno @fujitatomoya , it looks like all comments have been addressed.
Any reason for not moving forward with this PR?

@ivanpauno
Member

Hi @ivanpauno @fujitatomoya , it looks like all comments have been addressed.

Hi @alsora, I haven't done a thorough review yet.
I will try to take a look in the first week of January.

Collaborator

@fujitatomoya fujitatomoya left a comment


@fujitatomoya
Collaborator

@alsora @bpwilcox @huchijwk could you check whether your comments can be resolved?

@alsora
Collaborator

alsora commented Dec 21, 2021

For some reason I don't see the "Resolve conversation" button on my comments.
Anyhow, I confirm that for me and Brian (who is on vacation) this is good to go.

@huchijwk
Author

Comments resolved

Member

@ivanpauno ivanpauno left a comment


I have some minor API feedback, but otherwise this LGTM.

I think we can ignore my feedback for the moment and double check with other maintainers what they think.

  */
  template<typename ServiceT, typename CallbackT>
- typename rclcpp::Service<ServiceT>::SharedPtr
+ typename rclcpp_lifecycle::LifecycleService<ServiceT>::SharedPtr
Member


I know this is the same as we do for publishers, but IMO it would be better to have create_service()/create_publisher() alongside create_lifecycle_service()/create_lifecycle_publisher().

You can currently still create a non-managed entity by using the full method path, but having non-overriding methods would be clearer IMO.


This would be a breaking change, so we can ignore it for the moment.

Member


I'm not sure. Consider some code that operates on a node-like object. It's nice to be able to pass in an rclcpp::Node or rclcpp_lifecycle::LifecycleNode to the same API (provided both classes implement the required interface). In this scenario, if we had lifecycle-specific methods, we'd end up not using the managed entities for lifecycle.

However, I'm not sure if this idea of passing Node and LifecycleNode interchangeably holds in practice.

If a user chooses LifecycleNode, then I would think they want managed entities in most cases. So, I lean towards "overriding" create_service(), create_publisher(), etc. If the user wants to create a non-managed entity, they still have that option.

I guess either way, we'll be breaking API (currently this PR is changing the return type).


Coming back to this a bit later, as I'm interested in the full suite of managed entity implementations.

I would agree with @ivanpauno, thinking from a potential user-bug perspective.

I could imagine a user who wants a mixture of entity types with some being active even in Inactive state (e.g., a GetXParam.srv used to grab a loaded variable while the node is Inactive as you don't want to spin other entities).

This may* be poor user design in theory, but I could easily see someone trying to do this. If create_service were overridden to default to a managed entity, it would be difficult to see explicitly why the service is turned off/not accepting requests, as it was automatically turned into a ManagedEntity type under the hood.

@jacobperron I think you bring up a great point; I'm not sure about the feature parity between the two at the moment.

Thinking out loud: I can see the other side, however, where you likely want to use ManagedEntity types and discourage any non-managed entity types within lifecycle nodes. Possibly there could be a compiler warning or equivalent when calling create_service instead of create_lifecycle_service from within a lifecycle node.

Comment on lines +130 to +140
rclcpp::AnyServiceCallback<ServiceT> any_service_callback;
any_service_callback.set(std::forward<CallbackT>(callback));

rcl_service_options_t service_options = rcl_service_get_default_options();
service_options.qos = qos_profile;

auto service = LifecycleService<ServiceT>::make_shared(
  node_base_->get_shared_rcl_node_handle(), service_name, any_service_callback, service_options);
auto serv_base_ptr = std::dynamic_pointer_cast<rclcpp::ServiceBase>(service);
node_services_->add_service(serv_base_ptr, group);
return service;
Member


Is it possible to replace this code with something similar to

using PublisherT = rclcpp_lifecycle::LifecyclePublisher<MessageT, AllocatorT>;
return rclcpp::create_publisher<MessageT, AllocatorT, PublisherT>(
  *this,
  topic_name,
  qos,
  options);
?

Author


I considered it, but I did not do it because it would require changing the create_service template of rclcpp, which is somewhat beyond the purpose of this PR.

Unlike create_publisher, the return type of create_service is a specific type rclcpp::Service.

std::shared_ptr<PublisherT>
create_publisher(

typename rclcpp::Service<ServiceT>::SharedPtr
create_service(

In the next PR, I plan to change the create_service part to a factory design like the publisher. https://github.com/ros2/rclcpp/pull/1836/files#r767374194
At that time, I will also consider changing the template parameter from rclcpp::Service to ServiceT as well.

* It is more a convenient interface class
* than a necessary base class.
*/
class LifecycleServiceInterface
Member


Is there any reason why we need a new interface and this cannot be the same as LifecyclePublisherInterface?
I would rather have a single ManagedEntityInterface than multiple ones.

Author

@huchijwk huchijwk Dec 24, 2021


IMO, using LifecyclePublisherInterface in a service seemed odd.

I thought of a unified interface (ManagedEntityInterface) too, but since that would change APIs unrelated to the purpose of this PR, I planned to make a separate PR that consolidates the interface and deprecates the existing API after this PR is merged.

By the way, if #1846 is accepted, there may be no compatibility issues as that API will become an internal API.

Comment on lines +74 to +78
void on_activate() override {enabled_ = true;}

void on_deactivate() override {enabled_ = false;}

bool is_activated() override {return enabled_;}
Member


I would also maybe add a SimpleManagedEntity implementation of the interface that just sets a flag like this, as it seems we're repeating the same implementation in many places.
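A minimal sketch of what such a SimpleManagedEntity could look like. The class name follows the suggestion above; the interface shape is an assumption based on the methods quoted in this thread, not the actual rclcpp_lifecycle code.

```cpp
#include <cassert>

// Assumed shape of the shared interface discussed in this thread.
struct ManagedEntityInterface {
  virtual ~ManagedEntityInterface() = default;
  virtual void on_activate() = 0;
  virtual void on_deactivate() = 0;
  virtual bool is_activated() = 0;
};

// One reusable flag-based implementation, so LifecyclePublisher,
// LifecycleService, etc. don't each repeat the same three methods.
class SimpleManagedEntity : public ManagedEntityInterface {
public:
  void on_activate() override {enabled_ = true;}
  void on_deactivate() override {enabled_ = false;}
  bool is_activated() override {return enabled_;}

private:
  bool enabled_ = false;
};
```

Entities would then inherit the flag behavior instead of reimplementing it.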

Author


I will implement it together with the additional PR related to the ManagedEntityInterface mentioned above. How about that?

My PR plan is as follows.

  1. Lifecycle Service implementation (this PR)
  2. Change rclcpp::Service to factory design
  3. Integrated managed node interface
  4. Implement Lifecycle Subscription
  5. Automatic transition of lifecycle (if #1846, "Allow LifecycleNode to automatically transition Lifecycle-enabled ROS2 entities", is accepted)

Member

@jacobperron jacobperron Jan 6, 2022


Correct me if I'm mistaken, but can we just rename this interface to ManagedEntityInterface (or perhaps LifecycleEntityInterface, which seems more consistent) now, without changing anything else? Then we can avoid a deprecation cycle.

@ivanpauno
Member

I think we can ignore my feedback for the moment and double check with other maintainers what they think.

@jacobperron @wjwwood could you share your thoughts about #1836 (comment), #1836 (comment) and #1846 (I followed #1846 ideas in the rclpy managed node implementation).

RCLCPP_PUBLIC
virtual
bool
add_to_wait_set(rcl_wait_set_t * wait_set) = 0;
Member


My one concern with this new API is how it will interact with the existing WaitSet classes. I suppose this will not work as expected if someone chooses to use a WaitSet object. Maybe we should update this statement to use the new API:

rcl_ret_t ret = rcl_wait_set_add_service(
  &rcl_wait_set_,
  service_ptr_pair.second->get_service_handle().get(),
  nullptr);

I'm not sure though; I'd like to hear from @ivanpauno and @wjwwood

Member


Yeah, this new method seems like a poor way to control whether or not a service is handled in the managed node, and it won't work with the WaitSet classes.

Member


I guess we don't have lifecycle subscriptions yet either, but I would say we need a new way to indicate whether a subscription should be waited on or not.

Initially I thought the callback groups could already be used for this, but you could have a callback group with one normal service and one lifecycle service. So I don't think that will be useful.

The other question is: should it just not "get execution time", as was quoted in the first comment of this PR, or should requests received while not active be discarded? I.e., imagine: "service is inactive" -> "request is received" -> "service becomes active"; should the request received before being active be handled? My intuition is no, but I'm not 100% sure.

If we said that requests received during an inactive state are ignored, we should implement the lifecycle service using a callback that wraps the user's callback and only calls it if the lifecycle state is active.
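The wrapper-callback alternative can be sketched standalone. The Request/Response types and the factory name below are simplified stand-ins for illustration, not rclcpp APIs: the user's callback runs only while the entity is active, so a request taken during the inactive state is consumed but never acted on.

```cpp
#include <cassert>
#include <functional>

struct Request { int value; };
struct Response { int value; bool handled = false; };

// Wrap the user's callback so it only runs while the entity is active;
// an inactive entity still takes the request but drops it unhandled.
std::function<Response(const Request &)> make_lifecycle_callback(
  std::function<Response(const Request &)> user_cb, const bool & active)
{
  return [user_cb, &active](const Request & req) {
      if (!active) {
        return Response{};  // request consumed, but not acted on
      }
      Response res = user_cb(req);
      res.handled = true;
      return res;
    };
}
```

From the client's perspective an unhandled request would look like any other timed-out service call, which is the shortcoming discussed just below.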

Author


@wjwwood Hmm... ignoring requests in the inactive state seems fine too. If we decide to ignore requests, what do you think about filtering by overriding handle_request instead of wrapping the callback? (The initial implementation did that.)

@alsora what do you think about ignoring requests in inactive state?

Collaborator


According to the design document https://design.ros2.org/articles/node_lifecycle.html

While in this state, the node will not receive any execution time to read topics, perform processing of data, respond to functional service requests, etc.
In the inactive state, any data that arrives on managed topics will not be read and or processed. Data retention will be subject to the configured QoS policy for the topic.

This seems to only refer to execution time, i.e. the executor will ignore the entity for as long as it is inactive.

From a performance point of view, the current approach is also much more efficient than having a "wrapper callback".
The "wrapper callback" approach requires deserialization, going through all the ROS layers, and doing work in the executor.
A server may be inactive for a variety of reasons, but I wouldn't want it to perform any work at all.

Lastly, although a wrapper callback may work with subscriptions, how would that be implemented with servers where the callback usually needs to send a response back to the client?
Will this require for the response to have a "success/failure" bool field ? How does the client interpret if the server invocation failed due to the server being inactive or due to problems in the request?

For what concerns the fact that this won't work with the new WaitSet class, is this really a problem? The point of the WaitSet class is to decouple waiting and scheduling. The user is responsible for taking the data and executing it, so it should be easy to handle this.
Anyhow, can you elaborate why this wouldn't work? The WaitSet class is currently calling rcl_wait_set_add_service, can't we have it call server->add_to_wait_set() API which is overridden here?

For what it's worth, I also looked into how to implement lifecycle entities with the RMW listener APIs and the events executor. Here the approach would be slightly different, since the add_to_wait_set function is not invoked.
The approach would be for the on_activate and on_deactivate functions to restore/remove the listener callback if present.
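That listener-based variant can be sketched in the same simplified spirit. The registration API below is an assumption for illustration, not the actual RMW interface: instead of being skipped by a wait set, an inactive entity simply has no listener callback installed with the middleware.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Simplified stand-in for an entity using the RMW listener APIs.
class ListenerManagedService {
public:
  void set_listener(std::function<void()> cb) {user_cb_ = std::move(cb);}
  void on_activate() {installed_cb_ = user_cb_;}    // restore the callback
  void on_deactivate() {installed_cb_ = nullptr;}   // remove the callback
  // Middleware-side delivery: fires only if a callback is installed.
  bool deliver() const
  {
    if (!installed_cb_) {return false;}
    installed_cb_();
    return true;
  }

private:
  std::function<void()> user_cb_;
  std::function<void()> installed_cb_;
};
```

While deactivated, the middleware has nothing to invoke, so no work happens on the entity's behalf.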

Member

@wjwwood wjwwood Feb 15, 2022


If we decide to ignore requests, what do you think about filtering by overriding the handle request instead of wrapping the callback? (The initial implementation did that.)

That could work, but maybe it produces overhead you don't want, as @alsora said.

When a server is inactive, it may be for a variety of reason, but I wouldn't want it to perform any work at all.

I think to achieve this we would need to have a new active state in each entity and the executor/wait set in rclcpp would need to know to ignore it when it is inactive, as well we'd need a way to notify the executor/wait set when that state changes (so it can build a new wait set that includes it).

Once we have that, we can either let the messages/requests accumulate or clear them as we become active, whichever seems appropriate (we could even have a configuration for that).

Lastly, although a wrapper callback may work with subscriptions, how would that be implemented with servers where the callback usually needs to send a response back to the client?
Will this require for the response to have a "success/failure" bool field ? How does the client interpret if the server invocation failed due to the server being inactive or due to problems in the request?

I would say the request is taken (cleared out), but never acted on. This would be no different than any other timed out service call from the client's perspective. This is already the case and a shortcoming of services, so it's something worth improving, but not specific to this issue (lifecycle behavior), as there are other reasons a service may appear available to the client but fail to respond (service callback throws or service server is destroyed before request is received, etc).

Member


For what concerns the fact that this won't work with the new WaitSet class, is this really a problem? The point of the WaitSet class is to decouple waiting and scheduling. The user is responsible for taking the data and executing it, so it should be easy to handle this.
Anyhow, can you elaborate why this wouldn't work? The WaitSet class is currently calling rcl_wait_set_add_service, can't we have it call server->add_to_wait_set() API which is overridden here?

It can work, the WaitSet class just needs to be updated. However, I would not do it by overriding the add_to_wait_set() method. I would instead add a state to all waitable entities like "active" and let the wait set introspect that when deciding whether or not to add it to the rcl wait set.

Plus as I said before, you need a feedback mechanism so that when that state changes the wait set can wake up and rebuild the wait set to include the newly active items.
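The feedback mechanism described above can be sketched with standard C++ primitives. A condition variable here stands in for what would likely be a guard condition in rclcpp; the names are assumptions for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// When an entity's active state changes, wake the waiting thread so it
// can rebuild its wait set to include the newly (in)active entities.
struct StateChangeNotifier {
  std::mutex m;
  std::condition_variable cv;
  bool rebuild_needed = false;

  void notify_state_change()
  {
    {
      std::lock_guard<std::mutex> lk(m);
      rebuild_needed = true;
    }
    cv.notify_one();  // analogous to triggering a guard condition
  }

  void wait_for_change()
  {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [this] {return rebuild_needed;});
    rebuild_needed = false;  // the wait set would be rebuilt here
  }
};
```

An activate/deactivate call on an entity would invoke notify_state_change(), and the waiting executor/wait set would wake, rebuild, and wait again.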

@audrow audrow changed the base branch from master to rolling June 28, 2022 14:21