Improve transport::Publisher reliability #2725

Merged
2 commits merged on Sep 30, 2020

Conversation

nlamprian

It has been observed that OnPublishComplete can run before result is checked in SendMessage. In that case, OnPublishComplete deletes the pubIds entry, and SendMessage then recreates it. Nothing clears the recreated entry afterwards, so it blocks publication of any future messages and renders the publisher unusable.

The fix guards access to pubIds in SendMessage, and it processes the pubIds entry only if it still exists.
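
A minimal sketch of the race and of the guard, assuming a much-simplified model of the publisher. ToyPublisher, its members, and the synchronous Publish stub are hypothetical stand-ins for illustration, not the actual Gazebo code; only the names pubIds, SendMessage, and OnPublishComplete come from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical, simplified model of the publisher (not the Gazebo class).
class ToyPublisher
{
public:
  explicit ToyPublisher(bool guarded) : guarded_(guarded) {}

  // Returns true if a message was handed off, false if a pending
  // pubIds entry blocked the send.
  bool SendMessage()
  {
    uint32_t id = 0;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (!pubIds_.empty())
        return false;
      id = ++nextId_;
      pubIds_[id] = 0;  // placeholder until the publish result is known
    }

    // The completion callback fires inside Publish, before result is checked.
    const int result = Publish(id);

    std::lock_guard<std::mutex> lock(mutex_);
    if (guarded_)
    {
      // Guarded variant: only touch the entry if it still exists.
      auto iter = pubIds_.find(id);
      if (iter != pubIds_.end())
      {
        if (result > 0)
          iter->second = result;
        else
          pubIds_.erase(iter);
      }
    }
    else
    {
      // Unguarded variant: operator[] recreates the erased entry.
      if (result > 0)
        pubIds_[id] = result;
      else
        pubIds_.erase(id);
    }
    return true;
  }

private:
  // Assumed stub: one subscriber callback that completes immediately.
  int Publish(uint32_t id)
  {
    OnPublishComplete(id);
    return 1;
  }

  void OnPublishComplete(uint32_t id)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    auto iter = pubIds_.find(id);
    if (iter != pubIds_.end() && (--iter->second) <= 0)
      pubIds_.erase(iter);
  }

  bool guarded_;
  std::mutex mutex_;
  std::map<uint32_t, int> pubIds_;
  uint32_t nextId_ = 0;
};
```

In this model, the unguarded variant sends one message and then refuses every later one, while the guarded variant keeps sending. The early completion is forced deterministically here; in the real publisher it depends on timing.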

@chapulina
Contributor

Thanks for the PR, @nlamprian. Do you think it would be possible to write a test that exercises the fix?

@nlamprian
Author

Oof, this will require some effort. I'll need to familiarize myself with the full implementation to understand how to trigger the conditions. I'll give it a try.

@nlamprian
Author

The unit test is added. It successfully reproduces the error. Basically, for the error to show up, there have to be callbacks in the Publication. A callback is added by a subscription to the Publication, which, in this case, happens in the ConnectionManager with a SubscriptionTransport. It's still not clear to me how everything around all of this works, but nonetheless, I replicated the conditions in the test. There, it can be seen that the first message is sent and received, but the second one is not.
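
To illustrate why callbacks are needed for the error to show up, here is a rough sketch under simplifying assumptions. ToyPublication and RunUnguarded are hypothetical names, and the bookkeeping is a reduced, single-threaded model of the unguarded behaviour, not the actual test or the Gazebo classes.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a publication that holds subscriber callbacks.
struct ToyPublication
{
  std::vector<std::function<void(int)>> callbacks;

  // Delivers msg to each callback, reports each completion through done,
  // and returns how many completions the caller should expect.
  int Publish(int msg, const std::function<void(uint32_t)> &done, uint32_t id)
  {
    for (auto &cb : callbacks)
    {
      cb(msg);
      done(id);  // completion arrives before Publish returns
    }
    return static_cast<int>(callbacks.size());
  }
};

// Runs the unguarded bookkeeping for numMessages send attempts.
// Returns {messages sent, messages received by the subscriber}.
std::pair<int, int> RunUnguarded(bool subscribed, int numMessages)
{
  ToyPublication pub;
  int sent = 0;
  int received = 0;
  if (subscribed)
    pub.callbacks.push_back([&received](int) { ++received; });

  std::map<uint32_t, int> pubIds;
  auto onComplete = [&pubIds](uint32_t id)
  {
    auto iter = pubIds.find(id);
    if (iter != pubIds.end() && (--iter->second) <= 0)
      pubIds.erase(iter);
  };

  for (int i = 0; i < numMessages; ++i)
  {
    if (!pubIds.empty())
      continue;  // a pending entry blocks the send

    const uint32_t id = static_cast<uint32_t>(i + 1);
    pubIds[id] = 0;
    const int result = pub.Publish(i, onComplete, id);
    ++sent;

    if (result > 0)
      pubIds[id] = result;  // recreates the entry the callback erased
    else
      pubIds.erase(id);
  }
  return {sent, received};
}
```

With no callbacks, Publish returns 0 and the entry is erased either way, so every message goes out. With one callback, the first message is sent and received, and the second is blocked by the recreated entry.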

@scpeters scpeters changed the title from "#2724" to "Improve transport::Publisher reliability" Jul 6, 2020
@scpeters scpeters added the 9 Gazebo 9 label Jul 15, 2020
@nlamprian nlamprian force-pushed the nlamprian/failing-publisher branch from 2a2c94f to 1a96bbc Compare August 10, 2020 03:37
@nlamprian
Author

May I please ask for an update on this? Is there something holding back the review? The publisher continues to be a problem for me, and I would hope to see it fixed soon.

@chapulina
Contributor

@osrf-jenkins run tests

Thank you for the contribution and the test; it looks good to me. Let's just run a round of CI to make sure everything is OK.

@chapulina chapulina self-requested a review September 9, 2020 00:52
@chapulina chapulina self-assigned this Sep 21, 2020
@chapulina
Contributor

@osrf-jenkins run tests again?

@nlamprian nlamprian force-pushed the nlamprian/failing-publisher branch from 1a96bbc to 99b8d87 Compare September 27, 2020 02:04
@chapulina
Contributor

@osrf-jenkins run tests

Contributor

@chapulina chapulina left a comment

The fix and test look good and CI is happy 👍

@chapulina chapulina merged commit c8bf695 into gazebosim:gazebo9 Sep 30, 2020
@nlamprian nlamprian deleted the nlamprian/failing-publisher branch October 1, 2020 11:23
Labels
9 Gazebo 9
3 participants