Static ROS 2 subscription causes segmentation fault #1766

Open
alsora opened this issue Sep 3, 2021 · 6 comments
Labels
more-information-needed Further information is required

Comments

@alsora
Collaborator

alsora commented Sep 3, 2021

Bug report

Required Info:

  • Operating System:
    • Ubuntu 20.04
  • Installation type:
    • binaries
  • Version or commit hash:
    • Galactic
  • DDS implementation:
    • Fast-DDS and CycloneDDS
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

The following program terminates abruptly with a segmentation fault:

#include <chrono>
#include <memory>

#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/string.hpp"

static rclcpp::Subscription<std_msgs::msg::String>::SharedPtr s_test_sub;

int main(int argc, char **argv)
{
    rclcpp::init(argc, argv);

    auto node = std::make_shared<rclcpp::Node>("my_node");

    s_test_sub = node->create_subscription<std_msgs::msg::String>(
        "my_topic",
        rclcpp::SensorDataQoS(),
        [](std_msgs::msg::String::ConstSharedPtr msg) { (void)msg; });

    rclcpp::shutdown();
    return 0;
}

With Fast-DDS

[rcl|context.c:157] failed to finalize rmw context while cleaning up context, memory may be leaked: Finalizing a context with active nodes, at /tmp/binarydeb/ros-galactic-rmw-fastrtps-cpp-5.0.0/src/rmw_init.cpp:159
[ERROR] [1630667049.700022883] [rclcpp]: failed to finalize context: error not set
cannot publish data, at /tmp/binarydeb/ros-galactic-rmw-fastrtps-shared-cpp-5.0.0/src/rmw_publish.cpp:59 during '__function__'
[ERROR] [1630667049.700135214] [my_node.rclcpp]: Error in destruction of rcl subscription handle: Failed to delete datareader, at /tmp/binarydeb/ros-galactic-rmw-fastrtps-shared-cpp-5.0.0/src/subscription.cpp:54, at /tmp/binarydeb/ros-galactic-rcl-3.1.2/src/rcl/subscription.c:174

>>> [rcutils|error_handling.c:108] rcutils_set_error_state()
This error state is being overwritten:

  'cannot publish data, at /tmp/binarydeb/ros-galactic-rmw-fastrtps-shared-cpp-5.0.0/src/rmw_publish.cpp:59'

with this new error message:

  'Failed to delete datareader, at /tmp/binarydeb/ros-galactic-rmw-fastrtps-shared-cpp-5.0.0/src/subscription.cpp:54'

rcutils_reset_error() should be called after error handling to avoid this.
<<<
__function__:79: 'destroy_subscription' failed
Segmentation fault (core dumped)

With CycloneDDS

Not all nodes were finished before finishing the context
.Ensure `rcl_node_fini` is called for all nodes before `rcl_context_fini`,to avoid leaking.
terminate called without an active exception
Aborted (core dumped)

Additional information

NOTE: the problem can be "fixed" by adding the line s_test_sub.reset() before returning from the program.
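The idea behind that workaround can be sketched without ROS at all. The snippet below is only an illustration under stated assumptions: `FakeSubscription`, `s_handle`, `ResetOnScopeExit`, and `run_once` are hypothetical names that do not exist in rclcpp, and `run_once()` stands in for `main()`. It shows a small scope guard that releases a static `std::shared_ptr` when the enclosing scope ends, so the handle is already empty by the time static destructors run.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a ROS 2 entity such as rclcpp::Subscription.
struct FakeSubscription
{
  std::string topic;
};

// Static handle, analogous to s_test_sub in the report.
static std::shared_ptr<FakeSubscription> s_handle;

// Scope guard (hypothetical helper): resets the referenced shared_ptr
// when the guard itself goes out of scope.
template<typename T>
class ResetOnScopeExit
{
public:
  explicit ResetOnScopeExit(std::shared_ptr<T> & target)
  : target_(target) {}
  ~ResetOnScopeExit() {target_.reset();}

  ResetOnScopeExit(const ResetOnScopeExit &) = delete;
  ResetOnScopeExit & operator=(const ResetOnScopeExit &) = delete;

private:
  std::shared_ptr<T> & target_;
};

// Plays the role of main(): the guard is declared first, so it is destroyed
// last among the locals, right before the function returns.
long run_once()
{
  ResetOnScopeExit<FakeSubscription> guard(s_handle);
  s_handle = std::make_shared<FakeSubscription>(FakeSubscription{"my_topic"});
  assert(s_handle != nullptr);
  // The return value is computed before the guard's destructor runs.
  return s_handle.use_count();
}
```

After `run_once()` returns, `s_handle` is empty, so its static destructor has nothing left to tear down. Whether this ordering is sufficient in a real rclcpp program depends on what else still holds the node handle.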

The rclcpp::Node public APIs allow creating ROS 2 subscriptions outside of a node class.
However, it looks like the lifetime of such a subscription is still tied to that of the node, which makes the aforementioned API not very useful.

The fix above is not really a solution, as it can only work in an example program and not in a more complex scenario where there may be a variety of components creating entities from a node.

I would expect calling rclcpp::shutdown() to "disable" ROS 2 entities, so I don't expect that static instance to still be usable afterwards; at the same time, I would like the program to be able to terminate gracefully.

The same problem also applies to other entities, e.g. publishers.

fujitatomoya added a commit to fujitatomoya/ros2_test_prover that referenced this issue Sep 14, 2021
  ros2/rclcpp#1766

Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
@fujitatomoya
Collaborator

@alsora #1754 can fix this problem.

@alsora
Collaborator Author

alsora commented Nov 16, 2021

I just cherry-picked this on top of my galactic branch but it does not fix the issue.

@fujitatomoya
Collaborator

I only confirmed that CycloneDDS avoids the core dump; rmw_fastrtps still hits the segmentation fault.

@fujitatomoya
Collaborator

@MiguelCompany

It seems listener_ cannot be accessed in the DomainParticipantImpl object (this=0x0 in frame 0 below). Could you take a look?

(gdb) frame 0
#0  0x00007faace026d63 in eprosima::fastdds::dds::DomainParticipantImpl::set_listener (this=0x0, listener=0x0)
    at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/fastdds/domain/DomainParticipantImpl.hpp:101
101	        listener_ = listener;


│   fastdds::dds::DomainParticipantListener*)>        endbr64                                                                    │
│   fastdds::dds::DomainParticipantListener*)+4>      push   %rbp                                                                │
│   fastdds::dds::DomainParticipantListener*)+5>      mov    %rsp,%rbp                                                           │
│   fastdds::dds::DomainParticipantListener*)+8>      sub    $0x20,%rsp                                                          │
│   fastdds::dds::DomainParticipantListener*)+12>     mov    %rdi,-0x18(%rbp)                                                    │
│   fastdds::dds::DomainParticipantListener*)+16>     mov    %rsi,-0x20(%rbp)                                                    │
│   fastdds::dds::DomainParticipantListener*)+20>     mov    %fs:0x28,%rax                                                       │
│   fastdds::dds::DomainParticipantListener*)+29>     mov    %rax,-0x8(%rbp)                                                     │
│   fastdds::dds::DomainParticipantListener*)+33>     xor    %eax,%eax                                                           │
│   fastdds::dds::DomainParticipantListener*)+35>     mov    -0x18(%rbp),%rax                                                    │
│   fastdds::dds::DomainParticipantListener*)+39>     mov    -0x20(%rbp),%rdx                                                    │
│  >fastdds::dds::DomainParticipantListener*)+43>     mov    %rdx,0x418(%rax)                                                    │
│   fastdds::dds::DomainParticipantListener*)+50>     lea    -0xc(%rbp),%rax                                                     │
│   fastdds::dds::DomainParticipantListener*)+54>     mov    $0x0,%esi                                                           │
│   fastdds::dds::DomainParticipantListener*)+59>     mov    %rax,%rdi                                                           │
│   fastdds::dds::DomainParticipantListener*)+62>     callq  0x7faacdd6a630 <_ZN8eprosima8fastrtps5types12ReturnCode_tC1Ej@plt>  │


(gdb) print listener
$1 = (eprosima::fastdds::dds::DomainParticipantListener *) 0x0
(gdb) print listener_
Cannot access memory at address 0x418

(gdb) info registers
rax            0x0                 0
rbx            0x7faacf0c8087      140371594870919
rcx            0x0                 0
rdx            0x0                 0
rsi            0x0                 0
rdi            0x0                 0
rbp            0x7ffef5a8d8e0      0x7ffef5a8d8e0
rsp            0x7ffef5a8d8c0      0x7ffef5a8d8c0
r8             0x7faace9a18a0      140371587373216
r9             0x7faace9a18a0      140371587373216
r10            0xfffffffffffff0bf  -3905
r11            0x7faace023ec0      140371577421504
r12            0x0                 0
r13            0x28b6              10422
r14            0x7faaceeaefc8      140371592671176
r15            0x5614958b3a00      94646408264192
rip            0x7faace026d63      0x7faace026d63 <eprosima::fastdds::dds::DomainParticipantImpl::set_listener(eprosima::fastdds::dds::DomainParticipantListener*)+43>
eflags         0x10246             [ PF ZF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
k0             0x0                 0
k1             0x0                 0
k2             0x0                 0
k3             0x0                 0
k4             0x0                 0
k5             0x0                 0
k6             0x0                 0
k7             0x0                 0

gdb full stack trace
(gdb) info threads
  Id   Target Id                          Frame 
* 1    Thread 0x7faace9a0f40 (LWP 392958) 0x00007faace026d63 in eprosima::fastdds::dds::DomainParticipantImpl::set_listener (
    this=0x0, listener=0x0) at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/fastdds/domain/DomainParticipantImpl.hpp:101
(gdb) bt
#0  0x00007faace026d63 in eprosima::fastdds::dds::DomainParticipantImpl::set_listener (this=0x0, listener=0x0)
    at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/fastdds/domain/DomainParticipantImpl.hpp:101
#1  0x00007faace023f58 in eprosima::fastdds::dds::DomainParticipant::set_listener (this=0x5614958b66f0, listener=0x0, mask=...)
    at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/fastdds/domain/DomainParticipant.cpp:78
#2  0x00007faace023f03 in eprosima::fastdds::dds::DomainParticipant::set_listener (this=0x5614958b66f0, listener=0x0)
    at /root/ros2_ws/colcon_ws/src/eProsima/Fast-DDS/src/cpp/fastdds/domain/DomainParticipant.cpp:71
#3  0x00007faace85f8fd in rmw_fastrtps_shared_cpp::destroy_participant (participant_info=0x5614958b4c60)
    at /root/ros2_ws/colcon_ws/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/participant.cpp:301
#4  0x00007faace855f14 in rmw_fastrtps_shared_cpp::decrement_context_impl_ref_count (context=0x5614958b4260)
    at /root/ros2_ws/colcon_ws/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/init_rmw_context_impl.cpp:86
#5  0x00007faace94e2a4 in rmw_destroy_node (node=0x561495919620)
    at /root/ros2_ws/colcon_ws/src/ros2/rmw_fastrtps/rmw_fastrtps_cpp/src/rmw_node.cpp:99
#6  0x00007faacec92bad in rmw_destroy_node (v1=0x561495919620)
    at /root/ros2_ws/colcon_ws/src/ros2/rmw_implementation/rmw_implementation/src/functions.cpp:268
#7  0x00007faacf1113cd in rcl_node_fini (node=0x5614958b58f0) at /root/ros2_ws/colcon_ws/src/ros2/rcl/rcl/src/rcl/node.c:381
#8  0x00007faacfba8f01 in (anonymous namespace)::NodeHandleWithContext::~NodeHandleWithContext (this=0x56149598e200, 
    __in_chrg=<optimized out>) at /root/ros2_ws/colcon_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/node_interfaces/node_base.cpp:95
#9  0x00007faacfbab318 in __gnu_cxx::new_allocator<(anonymous namespace)::NodeHandleWithContext>::destroy<(anonymous namespace)::NodeHandleWithContext> (this=0x56149598e200, __p=0x56149598e200) at /usr/include/c++/9/ext/new_allocator.h:153
#10 0x00007faacfbab2eb in std::allocator_traits<std::allocator<(anonymous namespace)::NodeHandleWithContext> >::destroy<(anonymous namespace)::NodeHandleWithContext> (__a=..., __p=0x56149598e200) at /usr/include/c++/9/bits/alloc_traits.h:497
#11 0x00007faacfbab1ab in std::_Sp_counted_ptr_inplace<(anonymous namespace)::NodeHandleWithContext, std::allocator<(anonymous namespace)::NodeHandleWithContext>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x56149598e1f0)
    at /usr/include/c++/9/bits/shared_ptr_base.h:557
#12 0x00005614942a3c28 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x56149598e1f0)
    at /usr/include/c++/9/bits/shared_ptr_base.h:155
#13 0x00005614942a1717 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x561495a8c478, 
    __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:730
#14 0x00007faacfb2feae in std::__shared_ptr<rcl_node_s, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x561495a8c470, 
    __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:1169
#15 0x00007faacfb2fece in std::shared_ptr<rcl_node_s>::~shared_ptr (this=0x561495a8c470, __in_chrg=<optimized out>)
    at /usr/include/c++/9/bits/shared_ptr.h:103
#16 0x00007faacfcade36 in rclcpp::SubscriptionBase::~SubscriptionBase (this=0x561495a8c450, __in_chrg=<optimized out>)
    at /root/ros2_ws/colcon_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/subscription_base.cpp:84
#17 0x00005614942ea942 in rclcpp::Subscription<std_msgs::msg::String_<std::allocator<void> >, std::allocator<void>, std_msgs::msg::String_<std::allocator<void> >, std_msgs::msg::String_<std::allocator<void> >, rclcpp::message_memory_strategy::MessageMemoryStrategy<std_msgs::msg::String_<std::allocator<void> >, std::allocator<void> > >::~Subscription (this=0x561495a8c450, 
    __in_chrg=<optimized out>) at /root/ros2_ws/colcon_ws/install/rclcpp/include/rclcpp/subscription.hpp:75
#18 0x00005614942fcb63 in __gnu_cxx::new_allocator<rclcpp::Subscription<std_msgs::msg::String_<std::allocator<void> >, std::allocator<void>, std_msgs::msg::String_<std::allocator<void> >, std_msgs::msg::String_<std::allocator<void> >, rclcpp::message_memory_strategy::MessageMemoryStrategy<std_msgs::msg::String_<std::allocator<void> >, std::allocator<void> > > >::destroy<rclcpp::Subscription<std_msgs::msg::String_<std::allocator<void> >, std::allocator<void>, std_msgs::msg::String_<std::allocator<void> >, std_msgs::msg::String_<std::allocator<void> >, rclcpp::message_memory_strategy::MessageMemoryStrategy<std_msgs::msg::String_<std::allocator<void> >, std::allocator<void> > > > (this=0x561495a8c450, __p=0x561495a8c450) at /usr/include/c++/9/ext/new_allocator.h:153
#19 0x00005614942f3131 in std::allocator_traits<std::allocator<rclcpp::Subscription<std_msgs::msg::String_<std::allocator<void> >, std::allocator<void>, std_msgs::msg::String_<std::allocator<void> >, std_msgs::msg::String_<std::allocator<void> >, rclcpp::message_memory_strategy::MessageMemoryStrategy<std_msgs::msg::String_<std::allocator<void> >, std::allocator<void> > > > >::destroy<rclcpp::Subscription<std_msgs::msg::String_<std::allocator<void> >, std::allocator<void>, std_msgs::msg::String_<std::allocator<void> >, std_msgs::msg::String_<std::allocator<void> >, rclcpp::message_memory_strategy::MessageMemoryStrategy<std_msgs::msg::String_<std::allocator<void> >, std::allocator<void> > > > (__a=..., __p=0x561495a8c450) at /usr/include/c++/9/bits/alloc_traits.h:497
#20 0x00005614942efcf3 in std::_Sp_counted_ptr_inplace<rclcpp::Subscription<std_msgs::msg::String_<std::allocator<void> >, std::allocator<void>, std_msgs::msg::String_<std::allocator<void> >, std_msgs::msg::String_<std::allocator<void> >, rclcpp::message_memory_strategy::MessageMemoryStrategy<std_msgs::msg::String_<std::allocator<void> >, std::allocator<void> > >, std::allocator<rclcpp::Subscription<std_msgs::msg::String_<std::allocator<void> >, std::allocator<void>, std_msgs::msg::String_<std::allocator<void> >, std_msgs::msg::String_<std::allocator<void> >, rclcpp::message_memory_strategy::MessageMemoryStrategy<std_msgs::msg::String_<std::allocator<void> >, std::allocator<void> > > >, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x561495a8c440)
    at /usr/include/c++/9/bits/shared_ptr_base.h:557
#21 0x00005614942a3c28 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x561495a8c440)
    at /usr/include/c++/9/bits/shared_ptr_base.h:155
#22 0x00005614942a1717 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x561494358228 <s_test_sub+8>, 
    __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:730
#23 0x00005614942a0f3e in std::__shared_ptr<rclcpp::Subscription<std_msgs::msg::String_<std::allocator<void> >, std::allocator<void>, std_msgs::msg::String_<std::allocator<void> >, std_msgs::msg::String_<std::allocator<void> >, rclcpp::message_memory_strategy::MessageMemoryStrategy<std_msgs::msg::String_<std::allocator<void> >, std::allocator<void> > >, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x561494358220 <s_test_sub>, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:1169
#24 0x00005614942a0fc6 in std::shared_ptr<rclcpp::Subscription<std_msgs::msg::String_<std::allocator<void> >, std::allocator<void>, std_msgs::msg::String_<std::allocator<void> >, std_msgs::msg::String_<std::allocator<void> >, rclcpp::message_memory_strategy::MessageMemoryStrategy<std_msgs::msg::String_<std::allocator<void> >, std::allocator<void> > > >::~shared_ptr (
    this=0x561494358220 <s_test_sub>, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr.h:103
#25 0x00007faaced07a27 in __run_exit_handlers (status=0, listp=0x7faaceea9718 <__exit_funcs>, 
    run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#26 0x00007faaced07be0 in __GI_exit (status=<optimized out>) at exit.c:139
#27 0x00007faacece50ba in __libc_start_main (main=0x56149429ab09 <main(int, char**)>, argc=1, argv=0x7ffef5a91b88, 
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffef5a91b78) at ../csu/libc-start.c:342
#28 0x000056149429a8be in _start ()

(gdb) frame 8
#8  0x00007faacfba8f01 in (anonymous namespace)::NodeHandleWithContext::~NodeHandleWithContext (this=0x56149598e200, 
    __in_chrg=<optimized out>) at /root/ros2_ws/colcon_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/node_interfaces/node_base.cpp:95
95	    if (rcl_node_fini(node_handle_) != RCL_RET_OK) {
(gdb) print node_handle_
$2 = (rcl_node_t *) 0x5614958b58f0
(gdb) print *node_handle_
$4 = {context = 0x561495853f40, impl = 0x5614958b5c70}

(gdb) frame 5
#5  0x00007faace94e2a4 in rmw_destroy_node (node=0x561495919620)
    at /root/ros2_ws/colcon_ws/src/ros2/rmw_fastrtps/rmw_fastrtps_cpp/src/rmw_node.cpp:99
99	  inner_ret = rmw_fastrtps_shared_cpp::decrement_context_impl_ref_count(context);
(gdb) print *context
$12 = {instance_id = 1, implementation_identifier = 0x7faace968de0 "rmw_fastrtps_cpp", options = {instance_id = 1, 
    implementation_identifier = 0x7faace968de0 "rmw_fastrtps_cpp", domain_id = 18446744073709551615, security_options = {
      enforce_security = RMW_SECURITY_ENFORCEMENT_PERMISSIVE, security_root_path = 0x0}, 
    localhost_only = RMW_LOCALHOST_ONLY_DISABLED, enclave = 0x5614958b59a0 "/", allocator = {
      allocate = 0x7faacf0c7fb9 <__default_allocate>, deallocate = 0x7faacf0c8009 <__default_deallocate>, 
      reallocate = 0x7faacf0c802c <__default_reallocate>, zero_allocate = 0x7faacf0c8087 <__default_zero_allocate>, 
      state = 0x0}, impl = 0x0}, actual_domain_id = 0, impl = 0x5614958b4510}


(gdb) frame 4
#4  0x00007faace855f14 in rmw_fastrtps_shared_cpp::decrement_context_impl_ref_count (context=0x5614958b4260)
    at /root/ros2_ws/colcon_ws/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/init_rmw_context_impl.cpp:86
86	  err = rmw_fastrtps_shared_cpp::destroy_participant(participant_info);
(gdb) print *participant_info
$13 = {participant_ = 0x5614958b66f0, listener_ = 0x5614958b5fe0, publisher_ = 0x56149590fa60, subscriber_ = 0x561495910100, 
  entity_creation_mutex_ = {<std::__mutex_base> = {_M_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, 
          __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, 
        __align = 0}}, <No data fields>}, leave_middleware_default_qos = false, 
  publishing_mode = publishing_mode_t::ASYNCHRONOUS}

@MiguelCompany
Contributor

@fujitatomoya Thank you for the detailed report. I have taken a quick look, and it seems to be an issue with the order of destruction.

  • When the application is loaded, the destructor of s_test_sub is registered
  • When the context is initialized, it ends up calling this method to create a DomainParticipant.
  • This calls DomainParticipantFactory::get_instance(), which in turn registers the destructor of this object.
  • When the application exits, the following happens:
    • The DomainParticipantFactory is destroyed, which in turn destroys all the participants created.
    • The destructor of s_test_sub is invoked, which gets all the way down and calls set_listener on an already deleted participant.

I will try to think of a way to work around this, but I'm not sure there's an easy way out.
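The sequence above can be modeled in a few lines of plain C++. This is a simplified sketch, not Fast DDS or rclcpp code: `ExitHandlers` and `simulate_exit_order` are hypothetical names, and the last-in-first-out handler list only imitates how the runtime runs destructors of static objects in reverse order of registration (the `__run_exit_handlers` frame in the backtrace).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the runtime's exit handlers: handlers run in
// reverse order of registration (last registered, first run).
class ExitHandlers
{
public:
  void register_handler(std::function<void()> handler)
  {
    handlers_.push_back(std::move(handler));
  }

  void run()
  {
    while (!handlers_.empty()) {
      auto handler = std::move(handlers_.back());
      handlers_.pop_back();
      handler();
    }
  }

private:
  std::vector<std::function<void()>> handlers_;
};

// Replays the steps from the list above and returns what each "destructor"
// observed, in the order in which they ran.
std::vector<std::string> simulate_exit_order()
{
  std::vector<std::string> log;
  bool participant_alive = false;
  ExitHandlers at_exit;

  // Application load: the destructor of s_test_sub is registered first.
  at_exit.register_handler([&]() {
      log.push_back(
        participant_alive ?
        "s_test_sub: participant alive" :
        "s_test_sub: participant already destroyed");
    });

  // Context initialization: the participant is created and the destructor
  // of the factory singleton is registered afterwards.
  participant_alive = true;
  at_exit.register_handler([&]() {
      participant_alive = false;
      log.push_back("factory: participants destroyed");
    });

  // Application exit: the factory handler runs before the s_test_sub one.
  at_exit.run();
  assert(!participant_alive);
  return log;
}
```

In this model the factory handler runs first, so the `s_test_sub` handler always finds the participant already gone, which matches the `this=0x0` seen in frame 0 of the backtrace.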

@clalancette
Contributor

Since this seems to be a problem down in Fast-RTPS, I would suggest that we close this issue and instead move it to https://github.com/eProsima/Fast-DDS . I'm still not sure there is much we can do about it, but that at least seems like the appropriate place. @MiguelCompany @alsora does that sound reasonable?


4 participants