wait_for_service not being woken by graph events #280

dhood · 2018-03-13T20:27:24Z

While investigating the appropriate timeout for ros2/system_tests#259, I noticed a correlation between the timeout used in wait_for_service calls (20s) and the time taken for tests to run successfully.

Tests that make two wait_for_service calls, each with a 20s timeout, take 6, 26, or 46 seconds to run. With each timeout changed to 30s, the tests take 6, 36, or 66 seconds. With each call replaced by multiple wait_for_service calls of 1s each, the tests never take longer than 9s.
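The observed run times fit a simple model: a baseline of about 6s, plus one full timeout for every wait_for_service call that is not woken early. The sketch below only illustrates that arithmetic; `possible_durations` and its parameters are hypothetical names, and the 6s baseline is inferred from the numbers above rather than measured separately.

```python
def possible_durations(base_s, timeout_s, num_waits):
    """Run times if 0..num_waits of the waits each burn a full timeout."""
    return [base_s + stalled * timeout_s for stalled in range(num_waits + 1)]


print(possible_durations(6, 20, 2))  # [6, 26, 46]
print(possible_durations(6, 30, 2))  # [6, 36, 66]
```

Under this model, each stalled call costs exactly one timeout, which is why shortening the individual waits to 1s bounds the total overhead.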

Note that the tests still pass; they just spend an unnecessary amount of time in the wait_for_service calls, presumably because the waitset is not triggered by any graph event when the service comes up.
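The 1s experiment described above amounts to re-checking availability between short waits, so a missed wakeup costs at most one slice instead of the whole timeout. A minimal sketch of that pattern, assuming a hypothetical `is_ready` predicate and a hypothetical `blocking_wait` callable standing in for the real wait (this is not the rclcpp API):

```python
import time


def wait_in_slices(is_ready, blocking_wait, total_timeout_s, slice_s=1.0):
    """Wait up to total_timeout_s, re-checking is_ready() after every slice.

    blocking_wait(seconds) stands in for a wait that may never be woken early.
    """
    deadline = time.monotonic() + total_timeout_s
    while True:
        if is_ready():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never block longer than one slice, so a lost wakeup is bounded.
        blocking_wait(min(slice_s, remaining))
```

For example, `wait_in_slices(service_is_up, time.sleep, 20.0)` would notice a service that is already available within about one second, even if no wakeup ever arrives; `service_is_up` is a placeholder for whatever availability check is used.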

Given that wait_for_service passes in the end, my money is on the graph event triggering before we wait on the waitset. Therefore we are waiting for something that has already occurred.
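The suspected failure mode is a lost wakeup: the event fires first, and the waiter then blocks on a notification that will not be repeated. The sketch below reproduces that pattern with Python's `threading.Condition` as an analogy only; it does not use the rcl or rmw wait set, and `wait_without_recheck` is a hypothetical name.

```python
import threading
import time


def wait_without_recheck(timeout_s):
    """Show a waiter sleeping for the full timeout after the event already fired."""
    cond = threading.Condition()
    state = {"service_ready": False}

    # The "graph event" happens before anyone is waiting, so the
    # notification reaches no waiter and is lost.
    with cond:
        state["service_ready"] = True
        cond.notify_all()

    start = time.monotonic()
    with cond:
        # Bug: wait unconditionally instead of checking the state first.
        cond.wait(timeout=timeout_s)
    elapsed = time.monotonic() - start

    # The state is correct in the end, but only after the timeout expired.
    return state["service_ready"], elapsed


ready, elapsed = wait_without_recheck(0.2)
print(ready, elapsed >= 0.15)
```

This matches the symptom in the tests: the call succeeds, but only after the whole timeout has elapsed.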

We have come across this in rmw_fastrtps_cpp before: what we need is an equivalent to ros2/rmw_fastrtps#147, which prevents guard conditions from being triggered between the time we check them to decide if we should wait, and the time we actually wait.

This seems related to #201 but distinct in that this is a race condition in services showing up as opposed to #201 being a race condition in services going away.


dhood · 2018-05-22T03:09:40Z

As a correction, while #201 is primarily about services not being reported correctly as not available, it also references the issue of services not being reported as available, which prompted this workaround: ros2/rclcpp#262

I've modified that workaround in ros2/rclcpp#476, which fixes this specific issue but not the underlying cause.

dhood added the bug label Mar 13, 2018

mikaelarguedas added this to the bouncy milestone Mar 15, 2018

dhood mentioned this issue Mar 28, 2018

Flaky: multithreaded comms tests hanging with connext on osx ros2/build_farmer#103

Closed

dhood added the ready label Mar 29, 2018

dhood mentioned this issue Apr 4, 2018

Build farmer handoff 2018-04-04 ros2/build_farmer#107

Closed

dhood self-assigned this May 8, 2018

nuclearsandwich mentioned this issue May 9, 2018

Build Farmer Handoff 2018-05-09 ros2/build_farmer#117

Closed

dhood mentioned this issue May 22, 2018

Workaround for wait_for_service lasting the full timeout with connext ros2/rclcpp#476

Merged

dhood added the in progress label and removed the ready label May 22, 2018

dhood mentioned this issue May 22, 2018

race condition in graph changes and service is available #201

Closed

dhood added the in review label and removed the in progress label May 22, 2018

dhood closed this as completed in ros2/rclcpp#476 May 23, 2018

dhood removed the in review label May 23, 2018

mjcarroll mentioned this issue May 23, 2018

Build Farmer Handoff 2018-05-23 ros2/build_farmer#119

Closed
