
Add support for user-specified content filters #68

Merged (30 commits) on Mar 22, 2022

asorbini (Collaborator)

This PR is a follow-up to #12. It takes a different approach to introducing support for content-filtered topics, aiming to simplify the implementation and to overcome some issues that arose during the development of that PR, particularly with respect to concurrency.

The main problem introduced by #12 is the need to delete and re-create the DDS DataReader whenever the content-filter expression of the associated ROS2 Subscription is modified. Specifically, if the reader was associated with a content-filtered topic and the expression becomes empty, the reader must be recreated on the base, unfiltered topic, and vice versa (if the reader didn't have a filter and must now use one, it is deleted and recreated on a new content-filtered topic).

Besides introducing a whole lot of concurrency problems, a big issue with this strategy is that it makes "enabling/disabling filtering" a very expensive operation, both locally (because the existing reader must be finalized and a new one created, potentially causing a large amount of memory deallocation and allocation) and remotely on the network (because the deletion of the reader must be announced to other participants in order to unmatch it from any remote writer that was communicating with it, and the new reader must then be announced and matched again).

This is an unfortunate byproduct of the existing DDS API, which differentiates between "regular topics" and "content-filtered topics" instead of making the "filter" a property of the DataReader. If it were possible to create a content-filtered topic with an empty expression (which would behave like the base, unfiltered topic), rmw_connextdds could always create a content-filtered topic for every new subscription, even when the user specifies no filter expression.

To enable this use case, we take advantage of RTI Connext DDS' ability to register custom content-filter implementations, and create a new custom content filter that extends the built-in SQL-like filter included in Connext with the ability to handle empty filter expressions.
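A minimal C++ sketch of this wrapping idea follows. Everything in it is hypothetical: `ContentFilter`, `EmptyAwareFilter`, and the toy `ToySqlFilter` (which only understands `value > N`) stand in for the Connext content-filter plugin hooks and the built-in SQL filter, whose real C API looks different. The wrapper handles the empty expression itself and forwards every other expression to the wrapped filter.

```cpp
#include <cassert>
#include <exception>
#include <memory>
#include <string>
#include <utility>

// Hypothetical sample type standing in for a deserialized DDS sample.
struct Sample {
  std::string name;
  int value;
};

// Hypothetical minimal content-filter interface (NOT the Connext plugin API).
class ContentFilter {
public:
  virtual ~ContentFilter() = default;
  // Prepare the filter for `expression`; return false if it cannot be compiled.
  virtual bool compile(const std::string & expression) = 0;
  // Return true if `sample` passes the compiled filter.
  virtual bool evaluate(const Sample & sample) const = 0;
};

// Toy stand-in for the built-in SQL-like filter: only understands "value > N".
class ToySqlFilter : public ContentFilter {
public:
  bool compile(const std::string & expression) override {
    const std::string prefix = "value > ";
    if (expression.rfind(prefix, 0) != 0) {
      return false;  // unsupported expression
    }
    try {
      threshold_ = std::stoi(expression.substr(prefix.size()));
    } catch (const std::exception &) {
      return false;  // not a number, or out of range
    }
    return true;
  }
  bool evaluate(const Sample & sample) const override {
    return sample.value > threshold_;
  }

private:
  int threshold_ = 0;
};

// Wrapper that accepts an empty expression (every sample passes, like the
// unfiltered base topic) and delegates any other expression to the wrapped filter.
class EmptyAwareFilter : public ContentFilter {
public:
  explicit EmptyAwareFilter(std::unique_ptr<ContentFilter> delegate)
  : delegate_(std::move(delegate)) {}

  bool compile(const std::string & expression) override {
    if (expression.empty()) {
      pass_all_ = true;
      return true;
    }
    if (!delegate_->compile(expression)) {
      return false;  // pass_all_ is left unchanged on error
    }
    pass_all_ = false;
    return true;
  }

  bool evaluate(const Sample & sample) const override {
    return pass_all_ || delegate_->evaluate(sample);
  }

private:
  std::unique_ptr<ContentFilter> delegate_;
  bool pass_all_ = true;  // nothing compiled yet: behave as unfiltered
};
```

In this sketch, switching between "filtered" and "unfiltered" only means recompiling the expression on the existing filter object, rather than deleting and recreating the reader.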

All DataReaders created by rmw_connextdds now use a content-filtered topic, and the ROS2 layer can easily manipulate the filtering expression.

The custom content filter is encapsulated in a new package, rti_connext_dds_custom_sql_filter.

The filter supports writer-side filtering, but it might introduce some minor overhead in the case of an empty expression. This overhead should be characterized through performance testing to make sure any degradation is within an acceptable range.

The branch for this PR was created off of the current master (944da9d) with the addition of the implementation for new RMW APIs and some changes to existing internal functions to create content-filtered topics that were introduced by @iuhilnehc-ynos in #12.

Signed-off-by: Andrea Sorbini <asorbini@rti.com>
@iuhilnehc-ynos (Collaborator) left a comment

@asorbini

Thanks for creating this PR.

Review comment on rti_connext_dds_custom_sql_filter/package.xml (outdated, resolved)
@iuhilnehc-ynos (Collaborator)

@asorbini

After verifying this PR with my cft_demo, I went on to test it against the rcl test cases, but they failed.

  • test_node__rmw_connextdds Failed

For example, it failed on ASSERT_EQ(RCL_RET_OK, rcl_context_fini(&context)); when running the following test command:

$ colcon test --packages-select rcl --ctest-args -R test_node
...
[ERROR] [1635152005.805349072] [rmw_connextdds]: failed to finalize domain participant factory
[ERROR] [1635152005.805398997] [rmw_connextdds]: failed to finalize DDS participant factory
[rcl|context.c:157] failed to finalize rmw context while cleaning up context, memory may be leaked: failed to finalize domain participant factory, at /home/chenlh/Projects/ROS2/ros2-master/src/ros2/rmw_connextdds/rmw_connextdds_common/src/common/rmw_context.cpp:437
...
/home/chenlh/Projects/ROS2/ros2-master/src/ros2/rcl/rcl/test/rcl/test_node.cpp:451: Failure
Expected equality of these values:
  0
  rcl_context_fini(&context)
    Which is: 1
error not set
[  FAILED  ] TestNodeFixture__rmw_connextdds.test_rcl_node_init_with_internal_errors (5344 ms)
  • memory leak
==398917== 88 bytes in 1 blocks are definitely lost in loss record 9 of 24
==398917==    at 0x483E0F3: operator new(unsigned long, std::nothrow_t const&) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==398917==    by 0x5B5EC78: rti_connext_dds_custom_sql_filter::register_content_filter(DDS_DomainParticipantImpl*) (custom_sql_filter.cpp:686)
==398917==    by 0x576EDEB: rmw_connextdds_configure_participant(rmw_context_impl_s*, DDS_DomainParticipantImpl*) (dds_api_ndds.cpp:241)
==398917==    by 0x56EB196: rmw_context_impl_s::initialize_participant(bool) (rmw_context.cpp:221)
==398917==    by 0x56EA5EC: rmw_context_impl_s::initialize_node(char const*, char const*, bool) (rmw_context.cpp:140)
==398917==    by 0x5743662: rmw_api_connextdds_create_node(rmw_context_s*, char const*, char const*) (rmw_node.cpp:86)
==398917==    by 0x509DFF4: rmw_create_node (rmw_api_impl_ndds.cpp:292)
==398917==    by 0x486BCBE: rcl_node_init (node.c:256)
==398917==    by 0x161F88: TestNodeFixture__rmw_connextdds_test_rcl_node_namespace_restrictions_Test::TestBody() (test_node.cpp:640)
==398917==    by 0x1DB15D: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2433)
==398917==    by 0x1D4152: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2469)
==398917==    by 0x1AF74B: testing::Test::Run() (gtest.cc:2508)

@asorbini (Collaborator, Author)

@iuhilnehc-ynos thank you for reviewing! I'll take a look at the failure and report back

- Add missing package dependencies for rti_connext_dds_custom_sql_filter
- Clean up all participants upon factory finalization
- Reset context state upon finalization (rmw_connextddsmicro)
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
@asorbini (Collaborator, Author)

Memory leaks and unit-test failures fixed in d4788f9

Signed-off-by: Andrea Sorbini <asorbini@rti.com>
…t doesn't have one.

- Rename internal functions related to content-filters
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
- Make sure participant is enabled before deleting contained entities when using Connext debug libraries.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
@iuhilnehc-ynos (Collaborator) commented Oct 26, 2021

@asorbini

The failure of rcl_context_fini(&context) was fixed. Thank you.
There is still a memory leak.

$ valgrind --leak-check=full ./test_node__rmw_connextdds

// rti_connext_dds_custom_sql_filter::register_content_filter

==455533== 88 bytes in 1 blocks are definitely lost in loss record 23 of 46
==455533==    at 0x483E0F3: operator new(unsigned long, std::nothrow_t const&) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==455533==    by 0x5B5EC78: rti_connext_dds_custom_sql_filter::register_content_filter(DDS_DomainParticipantImpl*) (custom_sql_filter.cpp:686)
==455533==    by 0x576EDEB: rmw_connextdds_configure_participant(rmw_context_impl_s*, DDS_DomainParticipantImpl*) (dds_api_ndds.cpp:241)
==455533==    by 0x56EB196: rmw_context_impl_s::initialize_participant(bool) (rmw_context.cpp:221)
==455533==    by 0x56EA5EC: rmw_context_impl_s::initialize_node(char const*, char const*, bool) (rmw_context.cpp:140)
==455533==    by 0x5743662: rmw_api_connextdds_create_node(rmw_context_s*, char const*, char const*) (rmw_node.cpp:86)
==455533==    by 0x509DFF4: rmw_create_node (rmw_api_impl_ndds.cpp:292)
==455533==    by 0x486BCBE: rcl_node_init (node.c:256)
==455533==    by 0x163807: TestNodeFixture__rmw_connextdds_test_rcl_node_namespace_restrictions_Test::TestBody() (test_node.cpp:699)
==455533==    by 0x1DB903: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2433)
==455533==    by 0x1D48F8: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2469)
==455533==    by 0x1AFEF1: testing::Test::Run() (gtest.cc:2508)

// RTI_CustomSqlFilter_writer_attach
==455533== 
==455533== 136,344 (3,328 direct, 133,016 indirect) bytes in 13 blocks are definitely lost in loss record 46 of 46
==455533==    at 0x483FD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==455533==    by 0x6B5C995: RTIOsapiHeap_reallocateMemoryInternal (heap.c:748)
==455533==    by 0x6B2E2B9: REDASkiplist_newDefaultAllocator (SkiplistDefaultAllocator.c:289)
==455533==    by 0x5B5D4B4: RTI_CustomSqlFilter_writer_attach(void*, void**, void*) (custom_sql_filter.cpp:240)
==455533==    by 0x5E16A01: DDS_ContentFilter_writer_attach_wrapperI (ContentFilteredTopic.c:1420)
==455533==    by 0x67701C3: PRESPsService_assertFilteredTypeWriterRecord (PsServiceImpl.c:1048)
==455533==    by 0x67CBB40: PRESPsService_linkToRemoteReader (PsServiceLink.c:1828)
==455533==    by 0x6767099: PRESPsService_onLinkToRemoteEndpointEvent (PsServiceEvent.c:124)
==455533==    by 0x69FD705: RTIEventActiveGeneratorThread_loop (ActiveGenerator.c:226)
==455533==    by 0x6B685E5: RTIOsapiThreadChild_onSpawned (Thread.c:1388)
==455533==    by 0x51B7608: start_thread (pthread_create.c:477)
==455533==    by 0x54FD292: clone (clone.S:95)
==455533== 

Is the custom plugin ever unregistered at the end?

Updated:

Sorry, after cleaning my environment and rebuilding, I think there is no memory leak related to the current PR.

Signed-off-by: Andrea Sorbini <asorbini@rti.com>
@asorbini (Collaborator, Author)

@iuhilnehc-ynos I did find some memory leaks (and they should be resolved now), but I wasn't able to reproduce the one you are seeing. I tried both Release and Debug, since I introduced some Debug-only guarded code to make sure that the participant is enabled before trying to delete a content-filtered topic (I added comments in the code about it).

Please give this latest version a try and let me know.
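As an illustration of the kind of leak discussed above (not the actual change made in this PR), one way to guarantee that a per-participant filter object is released is to let the participant context own it, so that every finalization path, including error paths, frees whatever registration allocated. `CustomFilter`, `ParticipantContext`, `register_custom_filter`, and the instance counter are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical per-participant filter object; the counter only exists to make
// leaks observable in this sketch.
struct CustomFilter {
  static int live_instances;
  CustomFilter() { ++live_instances; }
  ~CustomFilter() { --live_instances; }
};
int CustomFilter::live_instances = 0;

// Hypothetical participant context that owns every filter it registers, so
// that finalize() (or destruction) releases everything registration allocated.
class ParticipantContext {
public:
  // Returns false if a filter with this name is already registered.
  bool register_custom_filter(const std::string & name) {
    return filters_.emplace(name, std::make_unique<CustomFilter>()).second;
  }

  // Returns false if no filter with this name was registered.
  bool unregister_custom_filter(const std::string & name) {
    return filters_.erase(name) > 0;
  }

  // Call on every finalization path, including after initialization errors.
  void finalize() { filters_.clear(); }

  ~ParticipantContext() { finalize(); }

private:
  std::unordered_map<std::string, std::unique_ptr<CustomFilter>> filters_;
};
```

Because ownership sits in `std::unique_ptr`, a forgotten explicit unregistration still cannot leak the object once the context is finalized or destroyed.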

@iuhilnehc-ynos (Collaborator) commented Oct 26, 2021

@asorbini

May I ask a question not related to this PR?
https://issues.omg.org/issues/DDS15-319

I don't understand why RTI Connext requires single quotes (') around string values, which seems inconvenient for users.
Maybe it would be better for the RTI Connext library to add the quotes implicitly when a placeholder is used in filter_expression.
For example:

1. The user must add ' only when the string value is written directly in `filter_expression`:
         filter_expression :    const char * filter = "name='a space b'"
2. When using a placeholder:
         filter_expression :    const char * filter = "name=%0"
         expression_param:  
                   good: const char * value = "a space b"
                   not good:  const char * value = "'a space b'"       <----    `need to add ' inside "`

What do you think about this code snippet?

    auto options = rclcpp::SubscriptionOptions();
    options.content_filter_options.filter_expression = "node = %0";
    std::ostringstream expression_parameter;
    expression_parameter << "'" << this->get_fully_qualified_name() << "'";
    options.content_filter_options.expression_parameters = {expression_parameter.str()};

@asorbini (Collaborator, Author)

I agree that the single quotes are a bit quirky, but I'm not sure the spec needs clarification like that issue claims. From Annex B.2 ("SQL grammar in BNF"):

Parameter ::= INTEGERVALUE
          | CHARVALUE
          | FLOATVALUE
          | STRING
          | ENUMERATEDVALUE
          | PARAMETER

Token expression
The syntax and meaning of the tokens used in the SQL grammar is described as follows:
[..]
STRING - Any series of characters encapsulated in single quotes, except a new-line character or a right quote. A string starts with a left or right quote, but ends with a right quote
PARAMETER - A parameter is of the form %n, where n represents a natural number (zero included) smaller than 100. It refers to the n + 1 th argument in the given context.
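Following the STRING rule quoted above, a helper that turns a plain value into a string parameter could wrap it in single quotes and reject values the rule cannot represent (a right quote or a new-line; the quoted excerpt defines no escape sequence). This is a hypothetical C++17 helper, `to_string_parameter`, not part of rclcpp or Connext.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: wrap `value` in single quotes so it forms a STRING token.
// Values containing a right quote or a new-line cannot be represented under the
// quoted rule (which defines no escape sequence), so they are rejected.
std::optional<std::string> to_string_parameter(const std::string & value) {
  if (value.find('\'') != std::string::npos || value.find('\n') != std::string::npos) {
    return std::nullopt;
  }
  return "'" + value + "'";
}
```

With such a helper, the rclcpp snippet earlier in this thread could build its parameter from `to_string_parameter(this->get_fully_qualified_name())` after checking that the returned optional is non-empty, instead of concatenating the quotes by hand.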

The one possible source of confusion is that the spec does not explicitly say that values passed as parameters (a.k.a. token PARAMETER) must be instantiations of the other terminal tokens (e.g. STRING), but I feel that this should be the interpretation in the absence of further text.

If I had to guess, string literals must be encapsulated in quotes because parameters are not typed; they are strings themselves. If literal strings didn't require explicit quotes, there would be no way to determine whether a parameter like "2" should be interpreted as an integer (2) or as a string ("2") just by looking at the (expanded) expression.
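To make the ambiguity concrete, the sketch below assumes parameters are substituted as plain text into the expression (an illustrative assumption, not a description of Connext internals). With the hypothetical helper `expand_parameters`, the parameter `2` produces an integer literal in the expanded expression while `'2'` produces a string literal, so the quotes are the only thing carrying the type.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: textually replace %n (n in 0..99) with params[n].
// Placeholders without a matching parameter are copied through unchanged.
std::string expand_parameters(
  const std::string & expression, const std::vector<std::string> & params)
{
  auto is_digit = [](char c) {return std::isdigit(static_cast<unsigned char>(c)) != 0;};
  std::string out;
  for (std::size_t i = 0; i < expression.size(); ++i) {
    if (expression[i] == '%' && i + 1 < expression.size() && is_digit(expression[i + 1])) {
      std::size_t n = static_cast<std::size_t>(expression[i + 1] - '0');
      std::size_t digits = 1;
      if (i + 2 < expression.size() && is_digit(expression[i + 2])) {
        n = n * 10 + static_cast<std::size_t>(expression[i + 2] - '0');
        digits = 2;
      }
      if (n < params.size()) {
        out += params[n];
        i += digits;  // land on the last digit; the loop increment moves past it
        continue;
      }
    }
    out += expression[i];
  }
  return out;
}
```

For example, expanding `"x = %0"` with `{"2"}` gives `x = 2`, whereas `{"'2'"}` gives `x = '2'`.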

Nonetheless, the middleware could use type information about the filtered samples to infer the type of a parameter. I have checked internally, and this is a known interoperability issue in general. We plan to address it in the next revision of the DDS spec (1.5), but unfortunately that will take a while to be approved and then implemented, and I don't have any other solutions to offer for now.

Thankfully, it's not really a problem since Content-Filtered Topics are only supported by Connext :)

@iuhilnehc-ynos (Collaborator)

@asorbini

I have updated the interface names in ros2/rmw#302. Besides that, I also renamed some structures, mainly from *content_filtered_topic_options* to *content_filter_options*.

rename_patch_based_on_13087f106acd106babf0719d1539a293233ab458.patch.txt

Could you update this PR using the new interface names that you suggested?

@clalancette (Contributor)

Thankfully, it's not really a problem since Content-Filtered Topics are only supported by Connext :)

While that may be true now, my understanding is that both Cyclone DDS and Fast DDS are either working on support, or are very close to having support for this. So I'd be careful about making any decisions based on that.

@asorbini (Collaborator, Author) commented Oct 26, 2021

Both Cyclone DDS and Fast DDS are either working on support, or are very close to having support for this.

I'm glad to hear other implementations are also adding support. Hopefully they are doing so in a standard-compliant and interoperable way, making sure to test that their implementation works with other existing and compliant solutions. Unfortunately that hasn't always been the case so far, at least not for all vendors, so I won't hold my breath too long.

Signed-off-by: Andrea Sorbini <asorbini@rti.com>
@asorbini (Collaborator, Author)

@iuhilnehc-ynos I applied the patch (thank you for providing it), pulled the other repos, and am now doing a build to verify. I will push the changes as soon as that's finished. FYI, I noticed that your rmw_fastrtps clone didn't yet include the renaming of the options structure. I made the changes locally, so no problem, but you may have forgotten to push that commit.

@asorbini (Collaborator, Author)

Patch applied in 883e9cf

@iuhilnehc-ynos (Collaborator)

@asorbini

FYI I noticed that your rmw_fastrtps clone didn't yet

Thank you.
I updated the branch 'topic-content_filtered_topic' on ros2/rmw_fastrtps#513.
I won't use the 'topic-content_filtered_topic-updated' branch, which is just for a demo.
I should also update the rmw_fastrtps branch name in my test ros2.repos at https://github.com/iuhilnehc-ynos/ros2/tree/topic-debug-connextdds-cft

@ivanpauno (Member) left a comment

I don't have the bandwidth to do a detailed review here, but once we polish the comments in ros2/rmw_fastrtps#513 and ros2/rmw#302, I think it's fine to go ahead with this one (with @iuhilnehc-ynos's or @fujitatomoya's approval).

@fujitatomoya (Collaborator)

@ivanpauno I will review tomorrow.

@iuhilnehc-ynos, can you also do a final check of the implementation, just in case?

CC: @asorbini

@fujitatomoya (Collaborator) left a comment

It looks good to me, but ros2/rmw#302 is still under review, so we might need to make some adjustments.

@fujitatomoya (Collaborator)

I started another Windows build, since I should now have resolved the build issues: Build Status

@fujitatomoya (Collaborator)

@asorbini Windows is still unstable.

@fujitatomoya (Collaborator)

@asorbini I will go ahead and merge the stub version (#77) to push the interface for CFT.

@fujitatomoya (Collaborator)

@iuhilnehc-ynos if you have time, could you take a look at this?

@fujitatomoya (Collaborator) commented Mar 21, 2022

Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
@fujitatomoya (Collaborator)

@asorbini I rebased this branch on master, removing the stub functions for CFT, and also addressed the cpplint errors. I will rerun the whole CI for this branch.

@fujitatomoya (Collaborator) commented Mar 21, 2022

CI requires a default RMW implementation: https://ci.ros2.org/job/ci_linux-aarch64/10995/console.

Retrying with rmw_fastrtps and rmw_connextdds:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@asorbini (Collaborator, Author)

@fujitatomoya thank you for addressing the linter errors! It looks like the arm64 build failed because no RMW was enabled.

@ivanpauno (Member)

CI requires default rmw implementation

We don't support Connext on aarch64, so there is no need to run that one.
On Linux/Windows we could cancel one of the two running jobs and keep only one.

@fujitatomoya (Collaborator) commented Mar 21, 2022

@asorbini there are still unstable errors, https://ci.ros2.org/job/ci_linux/16388/

Sometimes I see a crash with a core dump and the following backtrace:

test_init__rmw_connextdds
./build/rcl/test/test_init__rmw_connextdds
...
DDS_DomainParticipantFactory_get_participants:ERROR: Bad parameter: self
Segmentation fault (core dumped)
root@tomoyafujita:~/ros2_ws/ros2-cft# gdb ./build/rcl/test/test_init__rmw_connextdds core
Reading symbols from ./build/rcl/test/test_init__rmw_connextdds...
(No debugging symbols found in ./build/rcl/test/test_init__rmw_connextdds)
[New LWP 955990]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./build/rcl/test/test_init__rmw_connextdds'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007faabe68a224 in DDS_DomainParticipantFactory_unlockI ()
   from /opt/rti.com/rti_connext_dds-5.3.1/lib/x64Linux3gcc5.4.0/libnddsc.so
(gdb) bt
#0  0x00007faabe68a224 in DDS_DomainParticipantFactory_unlockI ()
   from /opt/rti.com/rti_connext_dds-5.3.1/lib/x64Linux3gcc5.4.0/libnddsc.so
#1  0x00007faabe68cdbe in DDS_DomainParticipantFactory_get_participants ()
   from /opt/rti.com/rti_connext_dds-5.3.1/lib/x64Linux3gcc5.4.0/libnddsc.so
#2  0x00007faabef1e3e6 in rmw_connextdds_finalize_participant_factory_context(rmw_context_impl_s*) ()
   from /root/ros2_ws/ros2-cft/install/rmw_connextdds_common/lib/librmw_connextdds_common_pro.so
#3  0x00007faabeebe606 in rmw_context_impl_s::finalize() [clone .part.0] ()
   from /root/ros2_ws/ros2-cft/install/rmw_connextdds_common/lib/librmw_connextdds_common_pro.so
#4  0x00007faabeebfc85 in rmw_api_connextdds_init(rmw_init_options_s const*, rmw_context_s*)::{lambda()#3}::operator()() const [clone .isra.0] () from /root/ros2_ws/ros2-cft/install/rmw_connextdds_common/lib/librmw_connextdds_common_pro.so
#5  0x00007faabeec4c7e in rmw_api_connextdds_init(rmw_init_options_s const*, rmw_context_s*) ()
   from /root/ros2_ws/ros2-cft/install/rmw_connextdds_common/lib/librmw_connextdds_common_pro.so
#6  0x00007faabf4fd63f in rcl_init () from /root/ros2_ws/ros2-cft/install/rcl/lib/librcl.so
#7  0x0000565109771f17 in TestRCLFixture__rmw_connextdds_test_rcl_init_internal_error_Test::TestBody() ()
#8  0x00005651097bb2a1 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
#9  0x00005651097af546 in testing::Test::Run() ()
#10 0x00005651097af6a5 in testing::TestInfo::Run() ()
#11 0x00005651097af7cd in testing::TestSuite::Run() ()
#12 0x00005651097afd73 in testing::internal::UnitTestImpl::RunAllTests() ()
#13 0x00005651097affd8 in testing::UnitTest::Run() ()
#14 0x000056510976bb34 in main ()

When the core dump does not occur, I see a bunch of the following errors in the console instead:

./build/rcl/test/test_init__rmw_connextdds
...
[ERROR] [1647897965.816402612] [rmw_connextdds]: failed to lookup from environment: var=RMW_CONNEXT_USE_DEFAULT_PUBLISH_MODE, rc=some string error
DDS_DomainParticipantFactory_get_participants:ERROR: Bad parameter: self
DS_DomainParticipantFactory_unlockI:!precondition: self == ((void *)0)
DDS_DomainParticipantFactory_get_participants:!unlock factory
[ERROR] [1647897965.816418234] [rmw_connextdds]: failed to list existing participants
[ERROR] [1647897965.816420902] [rmw_connextdds]: failed to finalize participant factory
[ERROR] [1647897965.816422478] [rmw_connextdds]: failed to finalize RMW context

I think this is really hard to debug on our side, since it seems related to the DDS implementation.

Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
@asorbini (Collaborator, Author)

I pushed fixes for the warnings and started a new Windows job to confirm it: Build Status

@asorbini (Collaborator, Author)

@asorbini there are still unstable errors, https://ci.ros2.org/job/ci_linux/16388/

Thank you for pointing this out; I will investigate and try to fix it. I believe it has to do with error-handling code in the RMW rather than internal Connext code: it seems that the RMW is trying to list the local participants before the DDS factory has been initialized. It might be as easy as adding a check for factory != NULL in that code path.
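A minimal sketch of the proposed guard, using hypothetical stand-ins (`ParticipantFactory`, `Participant`, `RetCode`, `finalize_participant_factory`) rather than the Connext or rmw_connextdds types: when finalization runs on an error path before the factory exists, it returns early instead of trying to list participants; otherwise it deletes the remaining participants before releasing the factory.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins; these are not the Connext or rmw_connextdds types.
struct Participant {};

struct ParticipantFactory {
  std::vector<std::unique_ptr<Participant>> participants;
};

enum class RetCode { Ok, Error };

// Error paths may call this before the factory was ever created, so a null
// factory means "nothing to finalize" instead of an invalid call.
RetCode finalize_participant_factory(std::unique_ptr<ParticipantFactory> & factory)
{
  if (!factory) {
    return RetCode::Ok;
  }
  // Delete any participants still owned by the factory before releasing it.
  factory->participants.clear();
  factory.reset();
  return RetCode::Ok;
}
```

Because the factory pointer is reset after finalization, calling the routine a second time is also harmless in this sketch.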

Signed-off-by: Andrea Sorbini <asorbini@rti.com>
@asorbini (Collaborator, Author)

Testing fix for test errors: Build Status

@nuclearsandwich (Member)

One solution would be to update ros2/ci to also include package `rti_connext_dds_custom_sql_filter` in the ignored packages. I don't think that change would require recreating the plans on the Jenkins server, but I'm not 100% sure. @clalancette and @nuclearsandwich, any thoughts on how feasible this would be with the code freeze coming up?

I'm getting to this quite late, so the code freeze deadline is something you'll want to take up with @clalancette. In terms of general feasibility, though, the logic in https://github.com/ros2/ci/blob/9815af79f8edca8a89e3ad913bb6bc5b7a5fb906/ros2_batch_job/__main__.py#L699-L721 just needs to be updated to include the additional packages related to rmw_connextdds, and that does not require a re-deploy as long as the new package does not add any additional system-level dependencies. If it does add new dependencies, then that is definitely an rmw freeze problem, and at this stage I would say it would be best not to do that for Humble.

... we currently don't provide a package with libraries for that architecture (although we should probably reconsider this going forward, since Connext does support arm64 of course). The black listing causes all packages in ros2/rmw_connextdds to be ignored.

There are several steps to making this happen, and I'd love to complete our RHEL support for Connext as well. When you've got cycles for this, @asorbini, I'd suggest opening an "extending platform support" (or similarly phrased) issue on rmw_connextdds. The current way we're distributing Pro artifacts is pretty much at its breaking point, so I'd rather not add more (such as for arm64) without refactoring it.

@fujitatomoya (Collaborator)

@asorbini I confirmed that #68 (comment) is also solved in my local environment. Thanks!

@asorbini (Collaborator, Author)

@fujitatomoya that's great to hear. I think the PR should be ready for being merged 🚀

@nuclearsandwich Thank you for replying and confirming the changes. I ended up getting rid of that additional module, since it didn't add much other than several (albeit minor) headaches. Probably not right away, but I'll follow up on your suggestion to track changes to platform support in an issue, and I'm happy to discuss existing issues and possible solutions either there or in a separate email thread whenever you want.

@fujitatomoya fujitatomoya merged commit 2c97c79 into master Mar 22, 2022
@delete-merged-branch delete-merged-branch bot deleted the asorbini/cft branch March 22, 2022 00:38
@fujitatomoya (Collaborator)

@asorbini thanks for the fix, this has been merged!

cwecht pushed a commit to cwecht/rmw_connextdds that referenced this pull request Mar 16, 2023
* Add support for user-specified content filters.

Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* - Resolve memory leak of custom content-filter resources
- Add missing package dependencies for rti_connext_dds_custom_sql_filter
- Clean up all participants upon factory finalization
- Reset context state upon finalization (rmw_connextddsmicro)
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Assume non-null options argument
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* - Return error when retrieving content-filter from a subscription that doesn't have one.
- Rename internal functions related to content-filters
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Fix compilation error, oops.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* - Define RMW_CONNEXT_DEBUG when building Debug libraries.
- Make sure participant is enabled before deleting contained entities when using Connext debug libraries.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Resolve memory leak for finalization on error.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Rename content filter public API.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Add client/service QoS getters (ros2#67)

Signed-off-by: Mauro Passerino <mpasserino@irobot.com>

* Changelogs

Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>

* 0.8.1

* Fix cpplint errors (ros2#69)

* Use static_cast instead of C-style cast

Fixes cpplint error.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Update NOLINT category

Relates to ament/ament_lint#324

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* 0.8.2

Signed-off-by: Audrow Nash <audrow@hey.com>

* Update rti-connext-dds dependency to 6.0.1. (ros2#71)

Now that this package is available in the ROS bootstrap repository for Ubuntu Focal and Jammy we can bump the expected dependency version.

* 0.8.3

* Add rmw listener apis (ros2#44)

* Add stubs for setting listener callbacks

Signed-off-by: Mauro Passerino <mpasserino@irobot.com>

* Address PR suggestions

Signed-off-by: Mauro Passerino <mpasserino@irobot.com>

* Fix linter issues

Signed-off-by: Mauro Passerino <mpasserino@irobot.com>

Co-authored-by: Mauro Passerino <mpasserino@irobot.com>
Co-authored-by: Alberto Soragna <alberto.soragna@gmail.com>

* Changelog. (ros2#73)

Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>

* 0.9.0

* add stub for content filtered topic

Signed-off-by: Chen Lihui <lihui.chen@sony.com>

* * Rebased branch asorbini/cft on top of 0.9.0.
* Resolved CFT finalization issues on error.
* Verified and cleaned up build for rmw_connextddsmicro.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Move custom SQL filter to rmw_connextdds_common
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Try to resolve linking error on Windows.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Optionally disable writer-side CFT optimizations to support Windows.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* No need to declare private CFT function on Windows.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* remove stub implementation for ContentFilteredTopic.

Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>

* address cpplint error.

Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>

* Avoid conversion warnings on Windows.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Use strtol instead of sscanf to avoid warnings on Windows.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Avoid finalizing participants if factory is not available.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

Co-authored-by: mauropasse <mauropasse@hotmail.com>
Co-authored-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Co-authored-by: Jacob Perron <jacob@openrobotics.org>
Co-authored-by: Audrow Nash <audrow@hey.com>
Co-authored-by: Steven! Ragnarök <nuclearsandwich@users.noreply.github.com>
Co-authored-by: Steven! Ragnarök <steven@nuclearsandwich.com>
Co-authored-by: iRobot ROS <49500531+irobot-ros@users.noreply.github.com>
Co-authored-by: Mauro Passerino <mpasserino@irobot.com>
Co-authored-by: Alberto Soragna <alberto.soragna@gmail.com>
Co-authored-by: Chris Lalancette <clalancette@openrobotics.org>
Co-authored-by: Chen Lihui <lihui.chen@sony.com>
Co-authored-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
asorbini added a commit that referenced this pull request Mar 30, 2023
* Add sequence numbers to message info structure (#74)

* Fill reception_sequence_number/publication_sequence_number in all rmw_take_*_with_info() functions

Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>

* Add rmw_feature_supported()

Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>

* add stub for content filtered topic (#77)

* add stub for content filtered topic

Signed-off-by: Chen Lihui <lihui.chen@sony.com>

* Add support for user-specified content filters (#68)

* Add support for user-specified content filters.

Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* - Resolve memory leak of custom content-filter resources
- Add missing package dependencies for rti_connext_dds_custom_sql_filter
- Clean up all participants upon factory finalization
- Reset context state upon finalization (rmw_connextddsmicro)
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Assume non-null options argument
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* - Return error when retrieving content-filter from a subscription that doesn't have one.
- Rename internal functions related to content-filters
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Fix compilation error, oops.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* - Define RMW_CONNEXT_DEBUG when building Debug libraries.
- Make sure participant is enabled before deleting contained entities when using Connext debug libraries.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Resolve memory leak for finalization on error.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Rename content filter public API.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Add client/service QoS getters (#67)

Signed-off-by: Mauro Passerino <mpasserino@irobot.com>

* Changelogs

Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>

* 0.8.1

* Fix cpplint errors (#69)

* Use static_cast instead of C-style cast

Fixes cpplint error.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Update NOLINT category

Relates to ament/ament_lint#324

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* 0.8.2

Signed-off-by: Audrow Nash <audrow@hey.com>

* Update rti-connext-dds dependency to 6.0.1. (#71)

Now that this package is available in the ROS bootstrap repository for Ubuntu Focal and Jammy, we can bump the expected dependency version.

* 0.8.3

* Add rmw listener apis (#44)

* Add stubs for setting listener callbacks

Signed-off-by: Mauro Passerino <mpasserino@irobot.com>

* Address PR suggestions

Signed-off-by: Mauro Passerino <mpasserino@irobot.com>

* Fix linter issues

Signed-off-by: Mauro Passerino <mpasserino@irobot.com>

Co-authored-by: Mauro Passerino <mpasserino@irobot.com>
Co-authored-by: Alberto Soragna <alberto.soragna@gmail.com>

* Changelog. (#73)

Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>

* 0.9.0

* add stub for content filtered topic

Signed-off-by: Chen Lihui <lihui.chen@sony.com>

* * Rebased branch asorbini/cft on top of 0.9.0.
* Resolved CFT finalization issues on error.
* Verified and cleaned up build for rmw_connextddsmicro.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Move custom SQL filter to rmw_connextdds_common
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Try to resolve linking error on Windows.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Optionally disable writer-side CFT optimizations to support Windows.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* No need to declare private CFT function on Windows.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* remove stub implementation for ContentFilteredTopic.

Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>

* address cpplint error.

Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>

* Avoid conversion warnings on Windows.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Use strtol instead of sscanf to avoid warnings on Windows.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Avoid finalizing participants if factory is not available.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>

Co-authored-by: mauropasse <mauropasse@hotmail.com>
Co-authored-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Co-authored-by: Jacob Perron <jacob@openrobotics.org>
Co-authored-by: Audrow Nash <audrow@hey.com>
Co-authored-by: Steven! Ragnarök <nuclearsandwich@users.noreply.github.com>
Co-authored-by: Steven! Ragnarök <steven@nuclearsandwich.com>
Co-authored-by: iRobot ROS <49500531+irobot-ros@users.noreply.github.com>
Co-authored-by: Mauro Passerino <mpasserino@irobot.com>
Co-authored-by: Alberto Soragna <alberto.soragna@gmail.com>
Co-authored-by: Chris Lalancette <clalancette@openrobotics.org>
Co-authored-by: Chen Lihui <lihui.chen@sony.com>
Co-authored-by: Tomoya Fujita <Tomoya.Fujita@sony.com>

* 0.10.0

Signed-off-by: Audrow Nash <audrow@hey.com>

* Update launch_testing_ros output filter prefixes for Connext6 (#80)

Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>

* Properly initialize CDR stream before using it for filtering (#81)

Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Exclude missing sample info fields when building rmw_connextddsmicro (#79)

* Exclude missing sample info fields when building micro.
* Report features individually for each RMW implementation.
* Return special value for unsupported sequence numbers.

Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Co-authored-by: Chris Lalancette <clalancette@openrobotics.org>

* 0.11.0

Signed-off-by: Audrow Nash <audrow@hey.com>

* Resolve build error with RTI Connext DDS 5.3.1 (#82)

Signed-off-by: Andrea Sorbini <asorbini@rti.com>

* Changelog.

Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>

* 0.11.1

* Use distinct callbacks for each event type

---------

Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Signed-off-by: Chen Lihui <lihui.chen@sony.com>
Signed-off-by: Audrow Nash <audrow@hey.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
Co-authored-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Co-authored-by: Chen Lihui <lihui.chen@sony.com>
Co-authored-by: Andrea Sorbini <asorbini@rti.com>
Co-authored-by: mauropasse <mauropasse@hotmail.com>
Co-authored-by: Jacob Perron <jacob@openrobotics.org>
Co-authored-by: Audrow Nash <audrow@hey.com>
Co-authored-by: Steven! Ragnarök <nuclearsandwich@users.noreply.github.com>
Co-authored-by: Steven! Ragnarök <steven@nuclearsandwich.com>
Co-authored-by: iRobot ROS <49500531+irobot-ros@users.noreply.github.com>
Co-authored-by: Mauro Passerino <mpasserino@irobot.com>
Co-authored-by: Alberto Soragna <alberto.soragna@gmail.com>
Co-authored-by: Chris Lalancette <clalancette@openrobotics.org>
Co-authored-by: Tomoya Fujita <Tomoya.Fujita@sony.com>