feat: add qos_reliablity param #329

MrBlenny · 2024-12-05T01:09:01Z

Changelog

Adds a qos_reliability param to the ROS 2 bridge that allows the default subscription reliability to be set. Defaults to "automatic" to retain existing behavior.

Docs

N/A

Description

The WebSocket bridge runs over TCP, which causes issues on flaky internet connections. There is some discussion here:
https://github.com/orgs/foxglove/discussions/15

This PR adds the ability to set "qos_reliability" to "best_effort", which is useful when running the bridge on a different computer to the robot, for example:

ROBOT COMPUTER -> DDS/UDP -> REMOTE COMPUTER -> Websocket/TCP -> Foxglove / Browser

jtbandes · 2024-12-05T01:11:26Z

> when running a bridge on a different computer to the robot

Just curious, why are you doing this rather than running the bridge on the robot?

MrBlenny · 2024-12-05T03:06:11Z

Running the bridge on the robot means the connection from the robot to Foxglove is TCP.
When comms are poor (for example, satellite internet on a field robot), the TCP connection can go down for a few seconds, which leads to a build-up of messages, potentially up to the send_buffer_limit. When the connection is re-established, there is then a large flood of messages, some of which may no longer be relevant.
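A rough sketch of the build-up described above, not the bridge's actual code (the bridge is C++). It assumes messages arrive in time order, that messages arriving while the link is down are queued until a byte limit is hit, and that messages beyond the limit are dropped. The function name `simulate_outage` and its arguments are hypothetical.

```python
from collections import deque


def simulate_outage(messages, send_buffer_limit, outage_start, outage_end):
    """Simulate a bounded send buffer across one link outage.

    messages: time-ordered list of (stamp_s, size_bytes).
    Returns (delivered, dropped), where delivered is a list of
    (send_time_s, stamp_s) pairs and dropped is a list of stamps.
    """
    buffered = deque()
    buffered_bytes = 0
    delivered = []
    dropped = []
    for stamp, size in messages:
        if outage_start <= stamp < outage_end:
            # Link is down: queue the message, or drop it if the buffer is full.
            if buffered_bytes + size > send_buffer_limit:
                dropped.append(stamp)
            else:
                buffered.append((stamp, size))
                buffered_bytes += size
        else:
            # Link is up: the backlog is flushed first (the "flood"),
            # then the current message is sent.
            while buffered:
                old_stamp, old_size = buffered.popleft()
                buffered_bytes -= old_size
                delivered.append((stamp, old_stamp))
            delivered.append((stamp, stamp))
    return delivered, dropped


# Ten 100-byte messages, one per second, with the link down from t=2 to t=7.
delivered, dropped = simulate_outage([(t, 100) for t in range(10)], 300, 2, 7)
stale = [stamp for sent, stamp in delivered if sent > stamp]
print("dropped:", dropped)
print("delivered late:", stale)
```

With a "best_effort" DDS subscription feeding a bridge on the remote side, the loss would instead happen on the DDS/UDP hop, so no backlog of stale messages accumulates in front of the TCP link.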

In my use-case, it is fine if messages are dropped via the DDS middleware. Adding WebRTC or WebTransport would be a nicer solution but this param provides a quick fix.

achim-k · 2024-12-05T13:25:14Z

README.md

@@ -84,6 +84,7 @@ Parameters are provided to configure the behavior of the bridge. These parameter
 * (ROS 2) __num_threads__: The number of threads to use for the ROS node executor. This controls the number of subscriptions that can be processed in parallel. 0 means one thread per CPU core. Defaults to `0`.
 * (ROS 2) __min_qos_depth__: Minimum depth used for the QoS profile of subscriptions. Defaults to `1`. This is to set a lower limit for a subscriber's QoS depth which is computed by summing up depths of all publishers. See also [#208](https://github.com/foxglove/ros-foxglove-bridge/issues/208).
 * (ROS 2) __max_qos_depth__: Maximum depth used for the QoS profile of subscriptions. Defaults to `25`.
+ * (ROS 2) __qos_reliability__: The default QoS reliability setting for subscriptions the bridge creates. Can be 'reliable', 'best_effort', or 'automatic'. Defaults to `automatic`.
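As a rough illustration of how the two depth parameters above interact, here is a sketch that assumes the subscription depth is the sum of the publishers' depths, clamped to the range [min_qos_depth, max_qos_depth]. The function name `subscription_depth` is hypothetical and the bridge itself is written in C++, so this only mirrors the README wording.

```python
def subscription_depth(publisher_depths, min_qos_depth=1, max_qos_depth=25):
    """Sum the publishers' QoS depths and clamp the result.

    The defaults mirror the README defaults quoted above.
    """
    total = sum(publisher_depths)
    return max(min_qos_depth, min(total, max_qos_depth))


print(subscription_depth([]))            # no publishers yet
print(subscription_depth([5, 5]))        # within the limits
print(subscription_depth([10, 10, 10]))  # above max_qos_depth
```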


Would be good to explain what the automatic setting does under the hood.

I've added a description of all options and another best_effort_if_volatile.
I know this is borderline, but this is a very useful default when running the bridge remotely, as transient_local topics are typically not things that should be thrown away.

A better option would probably be to have a regex matcher on topics that specifies which QoS the bridge should use when it subscribes.

defunctzombie · 2024-12-10T15:05:28Z

README.md

+ * (ROS 2) __qos_reliability__: The default QoS reliability setting for subscriptions the bridge creates. Defaults to `automatic`.
+   * `reliable`: ALWAYS subscribe with a "reliable" QoS profile.
+   * `best_effort`: ALWAYS subscribe with a "best effort" QoS profile.
+   * `best_effort_if_volatile`: subscribe as "best effort" if all the participants are "volatile". If any are "transient_local", subscribe as "reliable".
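The options listed above can be sketched as a small decision function. This is an illustration only, not the code in param_utils.cpp or the bridge's subscription logic. The function name `choose_reliability` and the tuple representation of publishers are hypothetical, and the "automatic" branch assumes the rule stated elsewhere in this thread (reliable only if all publishers are reliable). The handling of a topic with no publishers is also an assumption.

```python
def choose_reliability(mode, publishers):
    """Pick a subscription reliability.

    mode: value of the qos_reliability param.
    publishers: list of (reliability, durability) string pairs,
                e.g. ("reliable", "transient_local").
    """
    if mode == "reliable":
        return "reliable"
    if mode == "best_effort":
        return "best_effort"
    if mode == "best_effort_if_volatile":
        # Any transient_local participant forces a reliable subscription.
        if any(durability == "transient_local" for _, durability in publishers):
            return "reliable"
        return "best_effort"
    if mode == "automatic":
        # Assumption: reliable only if there are publishers and all are reliable.
        all_reliable = all(reliability == "reliable" for reliability, _ in publishers)
        return "reliable" if publishers and all_reliable else "best_effort"
    raise ValueError(f"unknown qos_reliability mode: {mode}")


latched = [("reliable", "transient_local")]
streaming = [("reliable", "volatile"), ("reliable", "volatile")]
print(choose_reliability("best_effort_if_volatile", latched))
print(choose_reliability("best_effort_if_volatile", streaming))
```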


Why is it wrong to subscribe with "best effort" if any of the participants are "transient_local"?

I'm not sure I can fully answer this question but the following info may be helpful: https://docs.ros.org/en/rolling/Concepts/Intermediate/About-Quality-of-Service-Settings.html#qos-compatibilities

It's a common pattern to publish a topic with "transient_local" durability whenever that value changes. For example a /robot/healthy Bool topic may be published only when the value changes from true<->false.

If the bridge subscribes to a transient_local topic with best_effort over a flaky connection, the historic transient_local message may be lost. These sorts of transient_local topics are typically not re-published, so the bridge will never get the value.

best_effort_if_volatile is a useful default in this case but, as mentioned, it's probably better to add a regex matcher param instead of this qos_reliability param.

i.e.
best_effort_qos_topic_whitelist: List of regular expressions (ECMAScript) for topics that should be forced to use 'best_effort' QoS. Unmatched topics will use 'reliable' QoS if ALL publishers are 'reliable', and 'best_effort' if any publishers are 'best_effort'. Defaults to ["(?!)"] (match nothing).
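A minimal sketch of the proposed matching rule, under a few assumptions: the bridge is C++ and uses ECMAScript regexes, while this uses Python's `re` module, whose syntax overlaps for the simple patterns shown. Patterns are assumed to be matched against the whole topic name, and the function name `reliability_for_topic` is hypothetical.

```python
import re


def reliability_for_topic(topic, whitelist_patterns, publisher_reliabilities):
    """Return the reliability to subscribe with for one topic.

    whitelist_patterns: value of best_effort_qos_topic_whitelist.
    publisher_reliabilities: list of "reliable" / "best_effort" strings.
    """
    # Whitelisted topics are forced to best_effort regardless of publishers.
    if any(re.fullmatch(pattern, topic) for pattern in whitelist_patterns):
        return "best_effort"
    # Unmatched topics: reliable only if every publisher is reliable.
    if publisher_reliabilities and all(
        reliability == "reliable" for reliability in publisher_reliabilities
    ):
        return "reliable"
    return "best_effort"


default_whitelist = ["(?!)"]        # empty negative lookahead, matches nothing
camera_whitelist = [r"/camera/.*"]  # hypothetical high-bandwidth topics

print(reliability_for_topic("/camera/image", default_whitelist, ["reliable"]))
print(reliability_for_topic("/camera/image", camera_whitelist, ["reliable"]))
print(reliability_for_topic("/robot/healthy", camera_whitelist, ["reliable"]))
```

With this shape, a latched topic such as /robot/healthy stays reliable unless it is explicitly listed, which addresses the transient_local concern raised above.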

I might have a look at adding this today.

I have made said change. Ready for review again.

👍 we'll take a look

> It's a common pattern to publish a topic with "transient_local" durability whenever that value changes. For example a /robot/healthy Bool topic may be published only when the value changes from true<->false.

FWIW my general advice on this is to pick a cadence at which you publish things like this even if they have not changed. It helps with data recording and debugging afterwards, because the state information is present without having to go back minutes or hours in history to look up the topic value (which is hard in some workflows).

Co-authored-by: Hans-Joachim Krauch <achim-k@users.noreply.github.com>

### Changelog

* add best_effort_qos_topic_whitelist param (#329)
* Add missing functional include in message_definition_cache.cpp (#334)

tonynajjar · 2025-02-17T13:20:15Z

> Running the bridge on the robot means the connection from the Robot to Foxglove is TCP.
> When comms is poor (for example satellite internet on a field robot) the TCP connection can go down for a few seconds which will lead to a build-up of messages, potentially up to the send_buffer_limit. When connection is established there will then be a large flood of messages, some of which may no longer be relevant.

Hey @MrBlenny, out of curiosity, did you try setting max_qos_depth to a small number? Shouldn't that theoretically achieve what you want i.e. only keep the last few messages?

MrBlenny · 2025-02-17T20:37:08Z

The ros2 message handler is called and the messages are pushed onto the websockets queue for transmission in a non-blocking way as far as I can tell. That queue will then start to build up if the websockets connection is poor.

I've actually just been looking at this some more this week. The websocketpp lib would need a blocking send, or some way of knowing which messages have been sent and which have not (by tracking message IDs, for example).

MrBlenny · 2025-02-20T05:11:50Z

@tonynajjar https://github.com/foxglove/ros-foxglove-bridge/pull/339/files may be of interest

* Fix "no matching function" error on yocto kirkstone (foxglove#331)

  ### Changelog
  None

  ### Docs
  None

  ### Description
  foxglove#330 Fixes for Yocto issue

  Fix for compilation error on yocto Kirkstone.
  > error: no matching function for call to 'max(long unsigned int, size_t)

* bump to 0.8.2 (foxglove#332)

  Bumping to 0.8.2 for foxglove#331

* Add missing functional include in message_definition_cache.cpp (foxglove#334)

  ### Changelog
  message_definition_cache.cpp uses std::function, so it should include the functional STL header

  ### Docs
  None

  ### Description
  message_definition_cache.cpp uses std::function, so it should include the functional STL header

* add best_effort_qos_topic_whitelist param (foxglove#329)

  ### Changelog
  Adds a `qos_reliablity` param to the ros2 bridge that allows the default reliability to be set via param. Defaults to "automatic" to retain existing behavior.

  ### Docs
  N/A

  ### Description
  The web-socket bridge runs on TCP which causes issues on flakey internet connections. There is some discussion here:
  https://github.com/orgs/foxglove/discussions/15

  This PR adds the ability to specify "qos_reliability" of "best_effort" which is useful when running a bridge on a different computer to the robot for example:

  **ROBOT COMPUTER** -> DDS/UDP -> **REMOTE COMPUTER** -> Websocket/TCP -> Foxglove / Browser

  ---------

  Co-authored-by: Hans-Joachim Krauch <achim-k@users.noreply.github.com>

* v0.8.3 (foxglove#336)

  ### Changelog
  * add best_effort_qos_topic_whitelist param (foxglove#329)
  * Add missing functional include in message_definition_cache.cpp (foxglove#334)

* feat: add send_buffer_queue param

* chore: lint fixes

---------

Co-authored-by: Graham Harison <gupyfish@gmail.com>
Co-authored-by: Jacob Bandes-Storch <jacob@foxglove.dev>
Co-authored-by: Silvio Traversaro <silvio@traversaro.it>
Co-authored-by: Hans-Joachim Krauch <achim-k@users.noreply.github.com>

feat: add qos_reliablity param

d789f66

jtbandes requested review from achim-k and defunctzombie December 5, 2024 01:12

defunctzombie requested a review from jtbandes December 5, 2024 02:39

achim-k approved these changes Dec 5, 2024

feat: add a "best_effort_if_volatile" option

7e3aa4b

defunctzombie reviewed Dec 10, 2024

MrBlenny added 3 commits December 11, 2024 11:07

feat: add bestEffortQosTopicWhiteList

57b8418

chore: update comment

b29b0d2

chore: remove nested if

a061d63

defunctzombie requested a review from achim-k January 3, 2025 23:34

achim-k approved these changes Jan 7, 2025

MrBlenny and others added 2 commits January 28, 2025 21:03

Update ros2_foxglove_bridge/src/param_utils.cpp

ad3d8ca

Co-authored-by: Hans-Joachim Krauch <achim-k@users.noreply.github.com>

Update README.md

392c2b8

Co-authored-by: Hans-Joachim Krauch <achim-k@users.noreply.github.com>

achim-k merged commit 2fba8a3 into foxglove:main Jan 28, 2025
9 checks passed

achim-k mentioned this pull request Feb 3, 2025

v0.8.3 #336

Merged

achim-k added a commit that referenced this pull request Feb 3, 2025

v0.8.3 (#336)

e4af8e1

