Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds design document for Deadline, Liveliness, and Lifespan. #212

Merged
merged 12 commits into from
Sep 24, 2019

Conversation

nburek
Copy link
Contributor

@nburek nburek commented Feb 13, 2019

This commit adds a new design document that explores how support should
be added to ROS for Deadline, Liveliness, and Lifespan QoS policies.

Design doc for task ros2/rclcpp#572

This commit adds a new design document that explores how support should
be added to ROS for Deadline, Liveliness, and Lifespan QoS policies.

Relates to issue: ros2/rclcpp#572
@tfoote tfoote added the in review Waiting for review (Kanban column) label Feb 13, 2019
@dirk-thomas
Copy link
Member

Please see the developer guide regarding the style of markdown files.

@dirk-thomas
Copy link
Member

I think there should be a section considering how this proposal affects RMW implementations which are not DDS based.

@dirk-thomas
Copy link
Member

@ros2/rmw_implementations FYI

Copy link
Member

@gbiggs gbiggs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like the direction this is going, but the technical details need to be worked out more.

articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
@nburek
Copy link
Contributor Author

nburek commented Feb 14, 2019

We had an offline review about this as well with a few people. The following are the notes from that meeting. Responses and updates to the open questions and comments above coming in the next day.

- Main question is how does this impact rmw implementations that don’t support these QoS natively (non-DDS).

- We should not mandate behavior of an rmw when it doesn't support a policy, but we should have recommendations on behavior and a means for them to make this information available to the application.  

- We need to explore an explicit api for checking what capabilities the rmw layer provides as well as implicit behavior and make a decision if we just want to do one or the other or both.  
 
- Would be useful to review the flushed out API changes a little down the line when they’re ready.
 
- If we expect Lifespan to be used in a lot of cases then we do want to add native support for it in ROS for the cases when the RMW implementation is not DDS.  
 
- Not providing full stack support for the deadline feels odd. If we aren’t getting the full impact of the system then we need to be clear about what the deadline is actually a measurement of.
 
- Look at using multiple QOS structs for services or a unique struct for services. Compare and make a recommendation. 
 
- Actions and how they handle these new QoS settings can be separated out into its own design doc. 

- Similarly, if we decide an explicit API for checking rmw capabilities would be useful then we can separate that out into its own design review. 

@gbiggs
Copy link
Member

gbiggs commented Feb 15, 2019

I don't agree with treating actions separately for QoS. While it may be like that now, I think the design should be unified. Either that or treat services in the same way.

Having an API for checking RMW capabilities would be useful, and I agree that it's a separate discussion.

@spiderkeys
Copy link

spiderkeys commented Feb 15, 2019

A few minor notes:

All three of the policies mentioned apply on a per-instance basis in DDS. In the not too distant future, ROS2 will support IDL 4.2, which will allow users to annotate fields as keys. Even though there are not currently any plans (that I know of) to support a concept of keyed fields in RMW, this may still result in the underlying DDS implementation using the keys to create instances. This could lead to some unintended consequences if the implementation in RMW does not take instances into account.

One example is that in Connext, there is no guaranteed way to be notified of all instances within a datareader's (or writer's) cache that have violated a deadline:

  • The manual query of the Datareader's deadline status only returns the instance handle of the last violation
  • If multiple instances violate the deadline before you handle the status condition on the waitset, you will have missed all but the latest
  • If multiple instances violate the deadline in succession before the middleware threads execute listener callbacks, there is no provision for maintaining a queue of instance handles for which to invoke an installed Deadline listener.

I've spoken with some folks at RTI, and they agree that there should probably be a way to query or be alerted of all outstanding violations. I think it amounted to this aspect of the deadline API not being explicitly specified in the standard. I think you may also end up running into this issue with Liveliness unless you use a ReadCondition to handle the detection, as LivelinessChangedStatus similarly only contains a handle to the last instance handle that triggered it.

All of this to say that there should probably be some footnotes somewhere in the docs if these features are implemented with only single instances in mind! I often wonder if for features like this it might just be better to have a DDS-specific interface if they may not be generally applicable across the entire ROS2 ecosystem. As someone who is working on multiple projects which use native DDS in production and have some hopes of moving completely to ROS2, I would find this very useful and be willing to help out. I think this discussion around RMW "capabilities" could be a good starting point.

@nburek
Copy link
Contributor Author

nburek commented Feb 18, 2019

I don't agree with treating actions separately for QoS. While it may be like that now, I think the design should be unified. Either that or treat services in the same way.

I believe that there was a design discussion that occurred around this topic during the design of Actions. I'm basing my design of not handling Actions explicitly like we do Topics and Services based on the interface that Actions currently implement that explicitly exposes the QoS policy for the underlying topics and also the lack of direct support for Actions in the rmw layer.

I've spoken with some folks at RTI, and they agree that there should probably be a way to query or be alerted of all outstanding violations. I think it amounted to this aspect of the deadline API not being explicitly specified in the standard. I think you may also end up running into this issue with Liveliness unless you use a ReadCondition to handle the detection, as LivelinessChangedStatus similarly only contains a handle to the last instance handle that triggered it.

I've updated the interface slightly to allow for multiple violations instead of only the latest, but noted that it is dependent on the RMW layers implementation.

Even though there are not currently any plans (that I know of) to support a concept of keyed fields in RMW, this may still result in the underlying DDS implementation using the keys to create instances.

This seems like it would be a bigger problem outside even this design. Even if the RMW layer doesn't expose keyed fields it shouldn't break anything if the underlying DDS implementation needs to start using them. That seems like something that will need to be solved in the migration to IDL 4.2

I often wonder if for features like this it might just be better to have a DDS-specific interface...
I can see the usefulness of having that, but I'm not sure if it's something that should be encoruaged. Adding a DDS specific interface would tie any Node using it to a DDS implementation of the RMW. That reduces the portability of ROS nodes that end up relying on it. That was part of the reason why these QoS settings are being explicitly defined in ROS terms instead of just entirely leaning on passing them through to the rmw dds implementations.

articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
@nburek
Copy link
Contributor Author

nburek commented Feb 28, 2019

Are there any other outstanding concerns with this design?
Do I need to schedule another virtual meeting to go over it with interested parties or are we good to merge it in?

@gbiggs
Copy link
Member

gbiggs commented Mar 1, 2019

I didn't know there was a first virtual meeting. I'd certainly like to have another to talk over any latent concerns and clarify points from the comments if necessary.

Copy link
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thanks for taking the time to write it down. I do think it would be good to have a live conversation about it just to make sure we're all on the same page before merging this, but it's mostly to clarify details. I don't see any major blockers.

articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
For Topic Subscribers it establishes the maximum amount of time allowed to pass between receiving messages.
For Topic Publishers it establishes the maximum amount of time allowed to pass between sending messages.
For Service Owners it establishes the maximum amount of time allowed to pass between receiving a request and when a response for that request is sent.
For Service Clients it establishes the maximum amount of time allowed to pass between sending a request and when a response for that request is received.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a behavior that DDS ensures (like in RTI's request/reply API)? or is it something that we would ensure in rcl and above? (presumably by emulating the behavior of deadline QoS for topics)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I looked before I did not see anything in the DDS RPC spec about deadline behavior for the request/reply pattern. This is something that would need to be tracked in the ROS layers.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wjwwood, @nburek: There is LatencyBudged which is unfortunately just a hint to the DDS implementation and not enforced.

Maybe it makes sense to use separate two semantically different things by calling them differently in ROS 2:

  • Deadline: An instance must be updated within a specified duration.
  • Latency budget: Maximum acceptable delay from the time the data is written to the time it is received by the subscribing application.

The notation of a latency budget could then implemented enforceable on the ROS 2 level for pub/sub, RPC and actions as an additional QOS. Which could also be made even more useful by allowing the user to select between sender timestamp and data timestamp (stored in ROS 2 stamped types) for discerning if the latency budget was exceeded.

articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
- How does the Deadline policy take into account the additional overhead of ROS (such as deserialization) when determining if a deadline was missed?
- As a simplification it is not going to attempt to take into account any ROS overhead. A deadline will be considered missed if the rmw layer does not receive a message by the deadline and not if the user application on top of ROS does not receive it by the deadline. A new deadline policy could be added later that takes this into account.
- What happens if multiple policy violations occur between the invocation of the callback?
- This is dependent on the rmw implementation. If the rmw layer supports tracking multiple status events then it can return them in subsequent calls to rmw_take_status. If it does not support tracking multiple events at once then it may just return one of the events, such as the first or the last to occur in the sequence.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be one way or the other and guaranteed by rmw, because I don't think it will acceptable to say "sometimes you'll get all the events, but in other situations you may not". The easy solution is to only ever have the latest event available. Again we can discuss this in more detail in a live meeting.

articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved
@nburek
Copy link
Contributor Author

nburek commented Mar 1, 2019

Let's plan to have another virtual meeting to discuss the remaining concerns. Please let me know if you can't attend and I can find another time. I'll post on discourse as well, but I want to make sure everyone already involved in this review can attend if they want.

Currently I'm thinking Wednesday March 6th at 9:00am PST (UTC-08:00)

You have been invited to an online meeting, powered by Amazon Chime.

  1. Click to join the meeting:

https://chime.aws/8643562590

Meeting ID: 8643 56 2590

  1. You can use your computer’s microphone and speakers, however, a headset is recommended. Or, call in using your phone:

United States Toll-Free: +1 855-552-4463
Meeting PIN: 8643 56 2590

One-click Mobile Dial-in (United States (1)): +1 206-462-5569,,8643562590#

United States (1): +1 206-462-5569
Spain Toll-Free (1): +34 900 813 473
United Kingdom Toll-Free (1): +44 800 085 5175
International: https://chime.aws/dialinnumbers/

@gbiggs
Copy link
Member

gbiggs commented Mar 3, 2019

That time doesn't work for me, unfortunately. Due to family commitments, realistically the latest I can start is midnight and the earliest I can start is 7AM (Tokyo time). If you can find a time outside those, I'll be grateful. If not, then please record the meeting so I can listen to it later.

@dejanpan
Copy link

dejanpan commented Mar 4, 2019

That time doesn't work for me, unfortunately. Due to family commitments, realistically the latest I can start is midnight and the earliest I can start is 7AM (Tokyo time). If you can find a time outside those, I'll be grateful. If not, then please record the meeting so I can listen to it later.

@nburek since all participants on this thread are either from PST or JST time zone, a time between 3PM-9PM JST would work well: https://savvytime.com/converter/jst-to-pst.

@nburek
Copy link
Contributor Author

nburek commented Mar 27, 2019

I've made updates to the design document that we discussed at the virtual meeting a couple weeks back. The biggest update was to move Services to future work. I think the last big open question was if we needed to support buffering events of the same type so that the application wouldn't miss an instance of an event. After a little more research we decided not to add a buffer but to coalesce multiple events for the same instance as that appears to be what the DDS api is tailored towards at the moment.

@prajakta-gokhale
Copy link

@wjwwood @gbiggs @tfoote

Could you please take a look at this again? Thanks!

@nburek
Copy link
Contributor Author

nburek commented Jul 19, 2019

I've updated the design doc to match the details of the implementation. It is ready for a final review before being merged.

@adam-dunc
Copy link

@tfoote Can we please get this merged? This has been released in Dashing and per @nburek the design has been updated to reflect what was implemented.


The liveliness policy establishes a contract for how entities report that they are still alive.
For Subscriptions it establishes the level of reporting that they require from the Publishers to which they are subscribed.
For Publishers it establishes the level of reporting that they will provide to Subscribers that they are alive.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we merge this as-is, then we need to have an issue which describes the work to be done in order to properly support liveliness with instances.

Right now, we assume the data model is one to one with publishers and subscriptions. If someone were to use a keyed data type with instances, then I'm not sure how this would break. It is already possible for users to create keyed types using DDS style IDL files in ROS.

It may need to be mentioned as a limitation in the API docs as well until we decide what to do differently.

To be clear, I support merging this as-is, but I would like to see one of the authors open up the issue that describes the necessary follow up work and/or the current limitations w.r.t. instances and keyed data.

articles/qos_deadline_liveliness_lifespan.md Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Outdated Show resolved Hide resolved

Both the Deadline and Liveliness policies generate events from the rmw layer of which the application will need to be informed.
For Deadlines, the Subscriber receives event notifications if it doesn't receive anything within the deadline and the Publisher receives event notifications if it doesn't publish anything within the deadline.
For Liveliness, Subscribers receive events when the Publisher they're listening to is no longer alive.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also be covered in a follow up issue (the same one as for deadline or in a separate one).

Also, lifecycle impacts data buffering

I not sure what @deeplearningrobotics is speaking about here.

articles/qos_deadline_liveliness_lifespan.md Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Show resolved Hide resolved
articles/qos_deadline_liveliness_lifespan.md Show resolved Hide resolved
@wjwwood
Copy link
Member

wjwwood commented Aug 20, 2019

Sorry for letting it linger, but after looking once more I see a few small things we should tie up before merging.

@nburek
Copy link
Contributor Author

nburek commented Aug 26, 2019

I've opened an issue to track enhancing the design of liveliness and deadline to support instanced topics with keys. See: #252

Copy link
Contributor Author

@nburek nburek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for letting it linger, but after looking once more I see a few small things we should tie up before merging.

Thanks for taking a look again, @wjwwood. I've made the requested updates and opened one new issue to track keyed instances. See my comment above about opening the other issue to support data composition.

articles/qos_deadline_liveliness_lifespan.md Show resolved Hide resolved

Both the Deadline and Liveliness policies generate events from the rmw layer of which the application will need to be informed.
For Deadlines, the Subscriber receives event notifications if it doesn't receive anything within the deadline and the Publisher receives event notifications if it doesn't publish anything within the deadline.
For Liveliness, Subscribers receive events when the Publisher they're listening to is no longer alive.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can create a follow up issue here as well, but this sounds like a message composition feature to me. I know that DDS allows you to compose data from multiple publishers into a single subscription, but does ROS have plans to allow that as well? If not, then I don't think it makes sense to open an issue to update the design for it.

@prajakta-gokhale
Copy link

@wjwwood could you please review the latest changes?

@wjwwood
Copy link
Member

wjwwood commented Aug 29, 2019

Yeah, I'll try to get to them soon. Sorry for the delay.

Copy link
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, I think we just have the one point to clear up which I commented on yesterday. Once we decide what to do there I think this will be ready for merge.

@wjwwood
Copy link
Member

wjwwood commented Sep 12, 2019

Link to the outstanding thread, since I couldn't find it easily before:

#212 (comment)

when a subscriber will receive a liveliness event when there are multiple publishers for a topic.
@nburek
Copy link
Contributor Author

nburek commented Sep 24, 2019

lgtm, I think we just have the one point to clear up which I commented on yesterday. Once we decide what to do there I think this will be ready for merge.

@wjwwood Thanks for giving it another look. Sorry for the delay in my response as I've been on vacation for the past few weeks. I've made an update for the particular section in question. Take a look and let me know if you think that addresses the concern.

Signed-off-by: William Woodall <william@osrfoundation.org>
@wjwwood
Copy link
Member

wjwwood commented Sep 24, 2019

No worries @nburek, thanks for touching it up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in review Waiting for review (Kanban column)
Projects
None yet
Development

Successfully merging this pull request may close these issues.