Fixed test_events_executors in zenoh #2643

Merged: 3 commits merged into rolling on Oct 17, 2024

Conversation

@ahcorde (Contributor) commented Oct 4, 2024

Fixed test_events_executors in zenoh

Signed-off-by: Alejandro Hernández Cordero <ahcorde@gmail.com>
@@ -555,7 +555,7 @@ if(TARGET test_executor_notify_waitable)
   target_link_libraries(test_executor_notify_waitable ${PROJECT_NAME} mimick rcpputils::rcpputils)
 endif()
 
-ament_add_gtest(test_events_executor executors/test_events_executor.cpp TIMEOUT 5)
+ament_add_gtest(test_events_executor executors/test_events_executor.cpp TIMEOUT 60)
@fujitatomoya (Collaborator) commented:
I am not sure about this... each test in executors/test_events_executor.cpp is a basic pub/sub or client/server test with the EventsExecutor. I would not expect any of them to take more than 5 seconds, maybe 10. If one takes longer than that, I would suspect something wrong in the libraries rather than in the test. By the way, do you happen to know which test takes a long time to process?

A Member commented:

From my testing, it seems the cancel_while_timers_running test is the one timing out.
I agree with @fujitatomoya that we should find out why this times out with rmw_zenoh and try to address it.

105: [==========] Running 11 tests from 1 test suite.
105: [----------] Global test environment set-up.
105: [----------] 11 tests from TestEventsExecutor
105: [ RUN      ] TestEventsExecutor.run_pub_sub
105: [INFO] [1728076414.221154335] [rmw_zenoh_cpp]: Successfully connected to a Zenoh router with id 976ea020597cd8d73eaf8ddf5222a054.
105: [       OK ] TestEventsExecutor.run_pub_sub (694 ms)
105: [ RUN      ] TestEventsExecutor.run_clients_servers
105: [INFO] [1728076414.902934390] [rmw_zenoh_cpp]: Successfully connected to a Zenoh router with id 976ea020597cd8d73eaf8ddf5222a054.
105: [       OK ] TestEventsExecutor.run_clients_servers (671 ms)
105: [ RUN      ] TestEventsExecutor.spin_once_max_duration_timeout
105: [INFO] [1728076415.575311398] [rmw_zenoh_cpp]: Successfully connected to a Zenoh router with id 976ea020597cd8d73eaf8ddf5222a054.
105: [       OK ] TestEventsExecutor.spin_once_max_duration_timeout (655 ms)
105: [ RUN      ] TestEventsExecutor.spin_once_max_duration_timer
105: [INFO] [1728076416.229936940] [rmw_zenoh_cpp]: Successfully connected to a Zenoh router with id 976ea020597cd8d73eaf8ddf5222a054.
105: [       OK ] TestEventsExecutor.spin_once_max_duration_timer (660 ms)
105: [ RUN      ] TestEventsExecutor.spin_some_max_duration
105: [INFO] [1728076416.890282041] [rmw_zenoh_cpp]: Successfully connected to a Zenoh router with id 976ea020597cd8d73eaf8ddf5222a054.
105: [       OK ] TestEventsExecutor.spin_some_max_duration (691 ms)
105: [ RUN      ] TestEventsExecutor.spin_some_zero_duration
105: [INFO] [1728076417.581796625] [rmw_zenoh_cpp]: Successfully connected to a Zenoh router with id 976ea020597cd8d73eaf8ddf5222a054.
105: [       OK ] TestEventsExecutor.spin_some_zero_duration (659 ms)
105: [ RUN      ] TestEventsExecutor.spin_all_max_duration
105: [INFO] [1728076418.240770747] [rmw_zenoh_cpp]: Successfully connected to a Zenoh router with id 976ea020597cd8d73eaf8ddf5222a054.
1/1 Test #105: test_events_executor .............***Timeout   5.09 sec
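
For reference, a rough sketch (hypothetical, not the actual rclcpp test, which lives in executors/test_events_executor.cpp) of the pattern that cancel_while_timers_running exercises: spin an EventsExecutor while timers fire, then cancel it from another thread. The node and timer names here are illustrative only.

```cpp
#include <chrono>
#include <memory>
#include <thread>

#include <rclcpp/rclcpp.hpp>
#include <rclcpp/experimental/executors/events_executor/events_executor.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("cancel_demo");
  auto timer = node->create_wall_timer(
    std::chrono::milliseconds(10), []() { /* periodic work */ });

  rclcpp::experimental::executors::EventsExecutor executor;
  executor.add_node(node);
  std::thread spinner([&executor]() { executor.spin(); });

  // Let some timer callbacks run, then cancel; spin() must return promptly,
  // otherwise the enclosing test hits its TIMEOUT.
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
  executor.cancel();
  spinner.join();

  rclcpp::shutdown();
  return 0;
}
```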

@clalancette (Contributor) commented:

So the problem here is the time it takes for rmw_zenoh_cpp to create a Context. As it stands, it takes ~600 ms for rmw_zenoh_cpp to create a Context. That time is dominated by the time to create a "session" in Zenoh, which takes ~500 ms. The other ~100 ms is spent checking that our connection to the router is valid.

There isn't a whole lot we can do about this, at least not without some tradeoffs. Switching to Zenoh 1.0 will allow us to configure this value down, I believe. However, if we tune it to a smaller number, doing a ros2 topic list with rmw_zenoh_cpp will not always return all of the data on the first try.

Thus, my suggestion is that we do actually increase the timeout here. In my local testing, 20 seconds was enough to avoid the timeout, but in order to make this not flaky (particularly on Windows), I think setting it to 60 seconds seems reasonable.

(However, I do have a comment about the use of the rmw_zenoh_cpp identifier, which I'll post separately.)
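
To illustrate where that ~600 ms shows up, a minimal sketch (hypothetical, not part of this PR) that times context creation by measuring rclcpp::init() under whichever RMW is active:

```cpp
#include <chrono>
#include <iostream>

#include <rclcpp/rclcpp.hpp>

int main(int argc, char ** argv)
{
  const auto start = std::chrono::steady_clock::now();
  // rclcpp::init() creates the default context, which creates the rmw
  // context underneath; with rmw_zenoh_cpp that opens the Zenoh session.
  rclcpp::init(argc, argv);
  const auto elapsed = std::chrono::steady_clock::now() - start;
  std::cout << "rclcpp::init took "
            << std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count()
            << " ms" << std::endl;
  rclcpp::shutdown();
  return 0;
}
```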

@ahcorde marked this pull request as draft October 9, 2024 21:14
Comment on lines 483 to 496
  std::string rmw_implementation_str = std::string(rmw_get_implementation_identifier());
  if (rmw_implementation_str == "rmw_zenoh_cpp") {
    EXPECT_EQ("", pub_log_msg);
    EXPECT_EQ("", sub_log_msg);
  } else {
    EXPECT_EQ(
      "New subscription discovered on topic '/test_topic', requesting incompatible QoS. "
      "No messages will be sent to it. Last incompatible policy: DURABILITY_QOS_POLICY",
      pub_log_msg);
    EXPECT_EQ(
      "New publisher discovered on topic '/test_topic', offering incompatible QoS. "
      "No messages will be sent to it. Last incompatible policy: DURABILITY_QOS_POLICY",
      sub_log_msg);
  }
@clalancette (Contributor) commented:

I'm not a big fan of hard-coding rmw_zenoh_cpp in here, particularly since it is not yet in the core.

I think we can do this better by using the QoS compatibility checker, qos_check_compatible(const QoS & publisher_qos, const QoS & subscription_qos). This calls down into the RMW to see if the QoS are compatible.

Thus, I think something like the following would work:

  rclcpp::QoSCheckCompatibleResult qos_compatible = rclcpp::qos_check_compatible(
    publisher->get_actual_qos(), subscription->get_actual_qos());
  if (qos_compatible.compatibility == rclcpp::QoSCompatibility::Error) {
    EXPECT_EQ(
      "New subscription discovered on topic '/test_topic', requesting incompatible QoS. "
      "No messages will be sent to it. Last incompatible policy: DURABILITY_QOS_POLICY",
      pub_log_msg);
    EXPECT_EQ(
      "New publisher discovered on topic '/test_topic', offering incompatible QoS. "
      "No messages will be sent to it. Last incompatible policy: DURABILITY_QOS_POLICY",
      sub_log_msg);
  } else {
    EXPECT_EQ("", pub_log_msg);
    EXPECT_EQ("", sub_log_msg);
  }

(This is similar to what I did in #2653.)
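
As a standalone illustration of the suggested approach (hypothetical, not part of this PR): create a publisher and a subscription with deliberately mismatched durability QoS, then branch on what rclcpp::qos_check_compatible reports instead of on the RMW identifier. The node name and printed strings are illustrative only.

```cpp
#include <iostream>
#include <memory>

#include <rclcpp/rclcpp.hpp>
#include <test_msgs/msg/empty.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("qos_check_demo");

  // Deliberately mismatched DURABILITY: volatile publisher,
  // transient-local subscription.
  auto pub = node->create_publisher<test_msgs::msg::Empty>(
    "/test_topic", rclcpp::QoS(10).durability_volatile());
  auto sub = node->create_subscription<test_msgs::msg::Empty>(
    "/test_topic", rclcpp::QoS(10).transient_local(),
    [](test_msgs::msg::Empty::ConstSharedPtr) {});

  // Ask the RMW whether it considers this pairing compatible, instead of
  // hard-coding per-RMW expectations in the test.
  const auto result = rclcpp::qos_check_compatible(
    pub->get_actual_qos(), sub->get_actual_qos());
  if (result.compatibility == rclcpp::QoSCompatibility::Error) {
    std::cout << "incompatible: " << result.reason << std::endl;
  } else {
    std::cout << "compatible; no incompatibility warnings expected" << std::endl;
  }

  rclcpp::shutdown();
  return 0;
}
```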

Feedback
Signed-off-by: Alejandro Hernández Cordero <ahcorde@gmail.com>
@ahcorde requested a review from clalancette October 17, 2024 10:10
@ahcorde marked this pull request as ready for review October 17, 2024 10:10
@clalancette (Contributor) left a review:

This looks good to me with green CI.

@clalancette (Contributor) commented:

Pulls: #2643
Gist: https://gist.githubusercontent.com/clalancette/b735e7ba4ac807ae0aeecde0d2db5899/raw/a41ab32ab6f3f03e4dae8b1bc774ec9829aab76a/ros2.repos
BUILD args: --packages-up-to rclcpp
TEST args: --packages-select rclcpp
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/14717

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@ahcorde merged commit 4567b6e into rolling on Oct 17, 2024, with 3 checks passed.
@clalancette deleted the ahcorde/rolling/zenoh_fix_events_executor branch October 17, 2024 16:35.