-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve the reliability of test_get_type_description_service. #1107
Improve the reliability of test_get_type_description_service. #1107
Conversation
We were seeing occasional failures of test_get_type_description_service on all platforms. It turns out that the removal of a service from the graph cache is an asynchronous operation. In particular, all of our current RMW implementations publish a graph update to the graph topic, and only remove things from the graph once that data has been delivered. This can potentially happen in another thread. That means that immediately after service_fini(), a call to get_service_names_and_types() may actually still have the old service name in it. Since our get_type_description tests were relying on this to go away, this was causing it to be flakey. We fix this here by adding in a new helper function, service_not_exists(). This helper is not just the inverse of service_exists() (which returns immediately when it finds the service in the graph cache). Instead, service_not_exists() waits until the service has *left* the cache before returning (or times out). In my testing, this fixes the flakiness locally. Signed-off-by: Chris Lalancette <clalancette@gmail.com>
This would close #1108 |
std::this_thread::sleep_for(std::chrono::milliseconds(100)); | ||
std::this_thread::sleep_for(std::chrono::milliseconds(10)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nice to have) adding argument std::chrono::milliseconds period
would be more flexible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nice to have) adding argument
std::chrono::milliseconds period
would be more flexible.
Yeah, that's true. Given that this is only used in this test, it would be easy enough to add later if we need the flexibility. So I'm going to not do that for now, but thanks for the idea.
In particular, make sure to match all service_description_init with the appropriate finis. While we are in here, also add in a "service_not_exists" function that will quit much earlier, speeding up the test. This is essentially a backport of #1107 and parts of #1112, but adapted for Iron. Signed-off-by: Chris Lalancette <clalancette@gmail.com>
In particular, make sure to match all service_description_init with the appropriate finis. While we are in here, also add in a "service_not_exists" function that will quit much earlier, speeding up the test. This is essentially a backport of #1107 and parts of #1112, but adapted for Iron. Signed-off-by: Chris Lalancette <clalancette@gmail.com>
We were seeing occasional failures of test_get_type_description_service on all platforms.
It turns out that the removal of a service from the graph cache is an asynchronous operation. In particular, all of our current RMW implementations publish a graph update to the graph topic, and only remove things from the graph once that data has been delivered. This can potentially happen in another thread.
That means that immediately after service_fini(), a call to get_service_names_and_types() may actually still have the old service name in it. Since our get_type_description tests were relying on this to go away, this was causing it to be flakey.
We fix this here by adding in a new helper function, service_not_exists(). This helper is not just the inverse of service_exists() (which returns immediately when it finds the service in the graph cache). Instead, service_not_exists() waits until the service has left the cache before returning (or times out).
In my testing, this fixes the flakiness locally.