-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add sleep before getting publisher info. #646
Conversation
0fdf84a
to
ad44e39
Compare
@jacobperron I paged you since I know you'd been looking at timing-sensitive tests in other parts of the stack. I don't know whether it's expected that there's a time after the creation of the publisher and subscriber where they may not be added to the count. |
I wouldn't expect a delay, but I suppose it can vary based on what's happening in the RMW implementation. Does the failure only occur with |
rcl/test/rcl/test_info_by_topic.cpp
Outdated
@@ -357,6 +359,7 @@ TEST_F( | |||
&subscription_options); | |||
ASSERT_EQ(ret, RCL_RET_OK) << rcl_get_error_string().str; | |||
const std::string fqdn = std::string("/") + this->topic_name; | |||
std::this_thread::sleep_for(std::chrono::milliseconds(250)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this does help, then I'd recommend polling in a loop, because this only requires a minor hiccup in the machine to still flake.
I have to go back and check my test logs. I think I've seen this failure with Fast-RTPS as well but I also get occasional segfaults with Fast-RTPS (as does the build farm). |
In all my test runs I only ever saw this exact failure with cyclonedds.
|
i would use rcl_count_publishers & rcl_count_subscribers to make sure if publication and subscription are ready instead of sleeping. if it can count writer/reader from rmw graph cache, which means that it can get writer/reader info for publication and subscription. |
Thanks for the suggestion. If I understand correctly this would essentially be creating the publisher and subscriber and polling to determine that they were actually created before trying to use them? |
No, rcl_count_publishers & rcl_count_subscribers does not include polling, it is one shot to count the endpoint. so as @wjwwood mentioned, polling would be required in the test code to confirm the publication and subscription are ready. i would do the following,
BTW, helper functions are getting organized #631, so this might get along with it. |
104654e
to
2d1c16b
Compare
I appreciate the guidance @fujitatomoya. I'm stepping out of my usual wheelhouse for this one. I took a look at #631 and rebased this PR onto it. Which means that I'll rebase this once more when #631 actually merges. Using the wait for entity helpers does show improvement. |
could you rebase? #631 is merged. |
This test appears to be timing/load sensitive. I can reproduce intermittent failures locally when stressing my host and the more stress the faster the reproduction. Since adding the sleep I have not seen any failures to find the expected publisher info locally. Signed-off-by: Steven! Ragnarök <steven@nuclearsandwich.com>
Signed-off-by: Steven! Ragnarök <steven@nuclearsandwich.com>
2d1c16b
to
50a5530
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, but one minor comment, I would like to hear from others.
@@ -357,6 +361,8 @@ TEST_F( | |||
&subscription_options); | |||
ASSERT_EQ(ret, RCL_RET_OK) << rcl_get_error_string().str; | |||
const std::string fqdn = std::string("/") + this->topic_name; | |||
ASSERT_TRUE(wait_for_established_subscription(&publisher, 10, 100)); | |||
ASSERT_TRUE(wait_for_subscription_to_be_ready(&subscription, &this->context, 10, 100)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this needs to be called here. I believe that we can confirm that graph cache is constructed vi wait_for_established_subscription.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case tests both rcl_get_publishers_info_by_topic
and rcl_get_subscriptions_info_by_topic
, so I think we need to add a new wait_helper(such as wait_for_established_publisher
) which is similar to wait_for_established_subscription
. (make sure two directions are established.)
NOTE: wait_for_subscription_to_be_ready
is used for a subscriber to wait to get message data from a publisher.
it means there should be a rcl_publish
calling before (or at a new thread) and then rcl_take
to get message data.
So we don't use it here.
This looks to be superseded by #859, so I'm going to close it out. Feel free to reopen if you think otherwise. |
This test appears to be timing/load sensitive. I can reproduce
intermittent failures locally when stressing my host and the more stress
the faster the reproduction. Since adding the sleep I have not seen any
failures to find the expected publisher info locally.
This seems to resolve unreliable test failures such as this one. There still appear to be occasional segfaults when using rmw_fastrtps_cpp or rmw_fastrtps_dynamic_cpp which I don't think are related.