Gazebo9 - transport failure during ConnectPubToSub call #2875
Comments
Looking at what that method does, it seems that it's meant to work more like a "reset" than an "init". Maybe the original intent was to be able to reinitialize the topic manager during runtime? I'm not sure.
I believe that's meant to be called once per process that will use Internally, Gazebo already calls it once per process during setup, so I think that no plugins should ever call it. If you're writing a standalone program that uses
Agreed that it should prevent your crash. But you may still end up with a disconnected pub and sub? Or maybe it will be attempted again? I think that's worth investigating.
We have continued investigating. Interestingly, the crashes that we have collected so far all occur for these two built-in topics:
The nullptr crashes occur for "sub" requests. How could a subscription request to a built-in topic fail because TopicManager is missing the topic (the lookup does not find it and returns nullptr)? Maybe there is a race condition between the TopicManager singleton and Advertise and Subscribe... We will continue investigating; sharing this finding for now.
I have a reliable reproduction case for this. Source follows (builds using same settings as
In summary, the above example does:
In this case we Everything above is perfectly legal usage of the Gazebo APIs, however what happens is as follows (paraphrasing to skip irrelevant bits): Server process
master
client process
Server process again now
finally! In summary, the communication takes the following path
By the time that message has made its networked round-trip, the server process has long-since called
This causes a null pointer dereference crash in the server process. I would expect the correct behavior to be a "failed subscription" of some sort on the Client side: it saw that a topic existed, but by the time its subscription request went through, there was no such topic.
This case can, of course, happen in ways other than this particular pathological usage. However, this gives us a good reproducible example.
@chapulina do you know, or have an idea who would know, what the correct behavior would really be here? I notice that all uses of
But I'm not sure if that will break the subscription in the Client somehow.
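The ordering described in this comment can be replayed in a minimal, self-contained sketch. Everything below is a hypothetical stand-in (`MockTopicManager`, `Clear`, `ReplayStaleSubscription`, and the topic name `~/example` are assumptions for illustration, not Gazebo's real classes or its actual networking path). It only shows how a late subscription request can hit an empty topic table, and how returning a failure instead of dereferencing the null pointer would look.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration; not Gazebo's real types.
struct Publication {
  std::string topic;
};
using PublicationPtr = std::shared_ptr<Publication>;

class MockTopicManager {
 public:
  // Registers a publication for the topic.
  void Advertise(const std::string &topic) {
    auto pub = std::make_shared<Publication>();
    pub->topic = topic;
    advertised_[topic] = pub;
  }

  // Models the effect of a Fini/Init call wiping the advertised topics.
  void Clear() { advertised_.clear(); }

  // Returns an empty pointer when the topic is unknown.
  PublicationPtr FindPublication(const std::string &topic) const {
    auto it = advertised_.find(topic);
    if (it == advertised_.end()) {
      return nullptr;
    }
    return it->second;
  }

  // Reports a failed subscription instead of dereferencing a null pointer.
  bool ConnectPubToSub(const std::string &topic) {
    PublicationPtr pub = FindPublication(topic);
    if (!pub) {
      return false;
    }
    ++connections_;
    return true;
  }

  int Connections() const { return connections_; }

 private:
  std::map<std::string, PublicationPtr> advertised_;
  int connections_ = 0;
};

// Replays the ordering: the client sees the topic, the server clears its
// topics, and only then does the subscription request arrive.
bool ReplayStaleSubscription() {
  MockTopicManager server;
  server.Advertise("~/example");
  const bool clientSawTopic = server.FindPublication("~/example") != nullptr;
  server.Clear();  // happens while the request is still in flight
  const bool connected = server.ConnectPubToSub("~/example");
  return clientSawTopic && !connected && server.Connections() == 0;
}
```

In this model the stale request is rejected cleanly. Whether the client should then retry or surface an error is the open question raised above.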
Exception:
Stack Trace:
Description
Gazebo transport fails while setting up a new connection between a new subscriber and an existing publisher on a certain topic.
This occurs within `ConnectPubToSub` because `PublicationPtr publication = this->FindPublication(_topic)` yields an uninitialized `shared_ptr<Publication>`.
A possible reason is that `TopicManager::FindPublication` cannot find an existing `Publication` within `TopicManager::advertisedTopics` for the given topic. This might be caused by an async call to `TopicManager::Init` or `TopicManager::Fini` via `gazebo::transport::init` or `gazebo::transport::fini` within the same process (while loading a plugin, maybe?), which seems to clear `TopicManager::advertisedTopics`.
If `TopicManager::advertisedTopics` is cleared between a call to `ConnectSubToPub` and `ConnectPubToSub`, the exception will occur. I have made an attempt to repro the crash with a bit of a hack here: https://github.com/ojasjoshi/gzserver_transport_failure/blob/main/README.md
Checking the existence of `PublicationPtr`, such as in `ConnectSubToPub` via `TopicManager::UpdatePublications(_pub.topic(), _pub.msg_type())`, might help fix this.
I am not sure what the purpose of `TopicManager::Init()` is. What are the best practices for calling `gazebo::transport::init` explicitly?
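As a rough illustration of the difference between a find-only lookup and the find-or-create approach suggested above, here is a minimal sketch. It assumes that `UpdatePublications` has find-or-create semantics. `SketchTopicManager`, its `Clear` method, and the topic and message type strings are hypothetical names used only for this example, and the real Gazebo implementation may differ.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration; not Gazebo's real types.
struct Publication {
  std::string topic;
  std::string msgType;
};
using PublicationPtr = std::shared_ptr<Publication>;

class SketchTopicManager {
 public:
  // Find-only lookup: returns an empty pointer when the topic is unknown.
  PublicationPtr FindPublication(const std::string &topic) const {
    auto it = advertised_.find(topic);
    if (it == advertised_.end()) {
      return nullptr;
    }
    return it->second;
  }

  // Assumed find-or-create behavior: never returns an empty pointer.
  PublicationPtr UpdatePublications(const std::string &topic,
                                    const std::string &msgType) {
    PublicationPtr pub = FindPublication(topic);
    if (!pub) {
      pub = std::make_shared<Publication>();
      pub->topic = topic;
      pub->msgType = msgType;
      advertised_[topic] = pub;
    }
    return pub;
  }

  // Models the advertised topics being cleared mid-connection.
  void Clear() { advertised_.clear(); }

 private:
  std::map<std::string, PublicationPtr> advertised_;
};
```

With this shape, a connection attempt that arrives after the table was cleared gets a fresh publication rather than a null pointer. Whether silently re-creating the publication is the right behavior, as opposed to rejecting the request, is a design choice this sketch does not settle.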