roscpp crash in ros::PublisherLink::setHeader() #2032

Open
caijimin opened this issue Aug 27, 2020 · 2 comments

Comments

@caijimin

We use ROS Kinetic and have seen occasional crashes in roscpp. They are hard to reproduce, and I don't have a reproducible test case. Backtrace from one of the crashes:

#0  0x00007fc0be68f428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007fc0be69102a in __GI_abort () at abort.c:89
#2  0x00007fc0befd284d in __gnu_cxx::__verbose_terminate_handler() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007fc0befd06b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007fc0befd0701 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007fc0befd0919 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007fc0c318e1a9 in void boost::throw_exception<boost::bad_weak_ptr>(boost::bad_weak_ptr const&) ()
   from /opt/raccoon/lib/libroscpp.so
#7  0x00007fc0c318c9f2 in ros::PublisherLink::setHeader(ros::Header const&) ()
   from /opt/raccoon/lib/libroscpp.so
#8  0x00007fc0c3219185 in ros::TransportPublisherLink::onHeaderReceived(boost::shared_ptr<ros::Connection> const&, ros::Header const&) () from /opt/raccoon/lib/libroscpp.so
#9  0x00007fc0c319846b in ros::Connection::onHeaderRead(boost::shared_ptr<ros::Connection> const&, boost::shared_array<unsigned char> const&, unsigned int, bool) () from /opt/raccoon/lib/libroscpp.so
#10 0x00007fc0c3194c13 in ros::Connection::readTransport() () from /opt/raccoon/lib/libroscpp.so
#11 0x00007fc0c3213c3a in ros::TransportTCP::socketUpdate(int) () from /opt/raccoon/lib/libroscpp.so
#12 0x00007fc0c3251c60 in ros::PollSet::update(int) () from /opt/raccoon/lib/libroscpp.so
#13 0x00007fc0c31d2625 in ros::PollManager::threadFunc() () from /opt/raccoon/lib/libroscpp.so
#14 0x00007fc0c22265d5 in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.58.0
#15 0x00007fc0c1dee6ba in start_thread (arg=0x7fc0b70bc700) at pthread_create.c:333
#16 0x00007fc0be76141d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

From the disassembly around frame #7:

(gdb) disass 0x00007fc0c318c9f2
   0x00007fc0c318c9c6 <+950>:	lock cmpxchg %ecx,0x8(%rdx)
   0x00007fc0c318c9cb <+955>:	jne    0x7fc0c318c9be <_ZN3ros13PublisherLink9setHeaderERKNS_6HeaderE+942>
   0x00007fc0c318c9cd <+957>:	test   %eax,%eax
   0x00007fc0c318c9cf <+959>:	jne    0x7fc0c318c9f8 <_ZN3ros13PublisherLink9setHeaderERKNS_6HeaderE+1000>
   0x00007fc0c318c9d1 <+961>:	mov    0x30c360(%rip),%rax        # 0x7fc0c3498d38
   0x00007fc0c318c9d8 <+968>:	lea    -0xe0(%rbp),%r14
   0x00007fc0c318c9df <+975>:	mov    %r14,%rdi
   0x00007fc0c318c9e2 <+978>:	lea    0x10(%rax),%r15
   0x00007fc0c318c9e6 <+982>:	mov    %r15,-0xe0(%rbp)
   0x00007fc0c318c9ed <+989>:	callq  0x7fc0c317b7a0 <_ZN5boost15throw_exceptionINS_12bad_weak_ptrEEEvRKT_@plt>
=> 0x00007fc0c318c9f2 <+994>:	nopw   0x0(%rax,%rax,1)
   0x00007fc0c318c9f8 <+1000>:	mov    0x8(%r15),%rax

It seems to crash in the shared_from_this() call at publisher_link.cpp:96:

 63 bool PublisherLink::setHeader(const Header& header)
 64 {
 65   header.getValue("callerid", caller_id_);
 66 
 67   std::string md5sum, type, latched_str;
 68   if (!header.getValue("md5sum", md5sum))
 69   {
 70     ROS_ERROR("Publisher header did not have required element: md5sum");
 71     return false;
 72   }
 73 
 74   md5sum_ = md5sum;
 75 
 76   if (!header.getValue("type", type))
 77   {
 78     ROS_ERROR("Publisher header did not have required element: type");
 79     return false;
 80   }
 81 
 82   latched_ = false;
 83   if (header.getValue("latching", latched_str))
 84   {
 85     if (latched_str == "1")
 86     {
 87       latched_ = true;
 88     }
 89   }
 90 
 91   connection_id_ = ConnectionManager::instance()->getNewConnectionID();
 92   header_ = header;
 93 
 94   if (SubscriptionPtr parent = parent_.lock())
 95   {
 96     parent->headerReceived(shared_from_this(), header);
 97   }
 98 
 99   return true;
100 }
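
For readers unfamiliar with the failure mode: shared_from_this() throws bad_weak_ptr when no live shared_ptr currently owns the object, and an exception that nobody catches on the polling thread ends in std::terminate() and abort(), which matches frames #1 to #6 above. The following is a minimal sketch of that behavior only. It uses std::enable_shared_from_this (C++17) as a stand-in for the Boost class used by roscpp, and DemoLink, tryShareSelf and the helper functions are hypothetical names, not roscpp code.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a link class; only the ownership behavior is modeled.
struct DemoLink : std::enable_shared_from_this<DemoLink> {
  // Returns true if a shared_ptr to *this could be obtained.
  bool tryShareSelf() {
    try {
      std::shared_ptr<DemoLink> self = shared_from_this();
      return self.get() == this;
    } catch (const std::bad_weak_ptr&) {
      // No live shared_ptr owns this object.
      return false;
    }
  }
};

// The object is owned by a shared_ptr, so shared_from_this() succeeds.
bool ownedLinkCanShare() {
  auto link = std::make_shared<DemoLink>();
  return link->tryShareSelf();
}

// The object lives on the stack, so shared_from_this() throws.
bool unownedLinkCanShare() {
  DemoLink link;
  return link.tryShareSelf();
}

// Small self-check that exercises both cases.
void demo() {
  assert(ownedLinkCanShare());
  assert(!unownedLinkCanShare());
}
```

In the sketch the exception is caught so the two cases can be compared. In the backtrace above nothing catches it, which is why the process aborts.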

Has anybody seen something like this before?
Thanks in advance.

@caijimin
Author

It seems that TransportPublisherLink's parent_ is now a bad weak pointer:

(gdb) x/16w $r15
0x7fc0c34938c8 <_ZTVN5boost12bad_weak_ptrE+16>:	0xc318d260	0x00007fc0	0xc318d280	0x00007fc0
0x7fc0c34938d8 <_ZTVN5boost12bad_weak_ptrE+32>:	0xc318d250	0x00007fc0	0x00000000	0x00000000 <--- parent_->px
0x7fc0c34938e8 <_ZTVN3ros13PublisherLinkE+8>:	0xc3493820	0x00007fc0	0x00000000	0x00000000
0x7fc0c34938f8 <_ZTVN3ros13PublisherLinkE+24>:	0x00000000	0x00000000	0x00e94cb0	0x00000000

@dshwtc

dshwtc commented Sep 6, 2021

We hit the same problem occasionally. Has this been solved?
The node that crashed was publishing tf while another node that subscribes to tf was shutting down.
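
One way a shutdown could produce this exception, offered as an assumption and not as a confirmed diagnosis of roscpp: if a callback reaches an object after its last shared_ptr has been released (for example while its destructor is running), shared_from_this() can no longer succeed. The sketch below models only that situation with std::enable_shared_from_this (C++17) in place of Boost. TeardownLink, onHeader and callbackDuringTeardownCanShare are hypothetical names, and the destructor calling onHeader() merely simulates a late callback.

```cpp
#include <memory>

// Hypothetical class, not part of roscpp.
class TeardownLink : public std::enable_shared_from_this<TeardownLink> {
 public:
  explicit TeardownLink(bool* shared_ok) : shared_ok_(shared_ok) {}

  // Simulates a callback that arrives while the object is being destroyed.
  ~TeardownLink() { onHeader(); }

  // Records whether a shared_ptr to *this could still be obtained.
  void onHeader() {
    try {
      std::shared_ptr<TeardownLink> self = shared_from_this();
      *shared_ok_ = true;
    } catch (const std::bad_weak_ptr&) {
      // The last owner is gone, so the object can no longer be shared.
      *shared_ok_ = false;
    }
  }

 private:
  bool* shared_ok_;
};

// Returns the result recorded by the last onHeader() call, which is the one
// made from the destructor.
bool callbackDuringTeardownCanShare() {
  bool ok = false;
  {
    auto link = std::make_shared<TeardownLink>(&ok);
    link->onHeader();  // succeeds: `link` still owns the object
  }                    // last owner released, destructor runs onHeader() again
  return ok;
}
```

Here callbackDuringTeardownCanShare() returns false because the call made during destruction fails. Whether the crash reported in this issue follows the same path would need to be confirmed against the roscpp connection and subscription teardown code.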
