Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add unit test to check for node names across rmw impl. #260

Merged
merged 1 commit into from
Mar 30, 2018

Conversation

dirk-thomas
Copy link
Member

Connect to ros2/ros2#438.

@dirk-thomas dirk-thomas added the in review Waiting for review (Kanban column) label Mar 29, 2018
@dirk-thomas dirk-thomas self-assigned this Mar 29, 2018
@dirk-thomas dirk-thomas force-pushed the node_name_in_user_data branch from 6eeb04f to 5972e29 Compare March 29, 2018 20:52
@dirk-thomas dirk-thomas force-pushed the node_name_in_user_data branch from 5972e29 to 9186d24 Compare March 29, 2018 20:54
@Karsten1987
Copy link
Contributor

Would it make sense to also check that the function recognizes when a node is not available anymore? Right now, we test that when a node is started, the node name is populated correctly. What's the behavior when the node exits? Can we test this?

@dirk-thomas
Copy link
Member Author

This could be tested but I am not trying to cover that case in this PR. This is only checking that cross vendor nodes "agree" on how the node name should be exchanged.

@dirk-thomas dirk-thomas merged commit 9d1b3fe into master Mar 30, 2018
@dirk-thomas dirk-thomas deleted the node_name_in_user_data branch March 30, 2018 17:57
@dirk-thomas dirk-thomas removed the in review Waiting for review (Kanban column) label Mar 30, 2018
@dhood
Copy link
Member

dhood commented Mar 30, 2018

since this added new tests I ran a build to check for flakiness. Something showed up, if you could take a look please: Build Status

@dirk-thomas
Copy link
Member Author

The output indicates that the actual test was successful:

[node_name_list] found expected node name

And the second process exited cleanly as a result of that:

(node_name_list) rc 0

Based on that the first process receives a signal to shutdown:

(node_with_name) signal SIGINT

I assume the first process didn't react on that signal and after a timeout was being killed:

(node_with_name) signal SIGTERM
(node_with_name) rc -15

I can't reproduce the problem locally with --retest-until-fail 50 though. The problem doesn't seem to be related to the actual logic of the test but a more general flakiness regarding a reliable shutdown. But especially rmw_fastrtps_cpp is rather quick in shutting down (compared to rmw_connext_cpp).

@dhood
Copy link
Member

dhood commented Mar 30, 2018

Doesn't look like it even received the signal since it didn't print signal_handler(2) like usual. rclcpp::shutdown will remove the signal handler so maybe that's related.
You might need to fiddle with the teardown handlers, sometimes if they interrupt and already-exiting process that causes issues. The node_name_list probably belongs as the second executable so that it gets the primary_exit_handler because that's the one that will exit "naturally".

Also from this test output it doesn't look like ros2/ros2cli#72 has been resolved:
https://ci.ros2.org/job/ci_linux/4161/testReport/junit/test_rclcpp.build.test_rclcpp/test_node_name__rmw_opensplice_cpp__rmw_connext_cpp_/test_node_name/ (from my understanding the blank names shouldn't be returned by get_node_names)

@dhood
Copy link
Member

dhood commented Mar 30, 2018

sorry, I misread this line, node_name_list is already the second executable so should get the primary_exit_handler.
It still seems the signal handler for node_with_name may have been removed by the time launch sends the sigint though.

@dirk-thomas
Copy link
Member Author

You might need to fiddle with the teardown handlers,

While it would certainly be worth investigating and fixing I might not follow up on this launch specific issue in the context of the node name changes though. The problem is not related to this patch (changing the way node names are being passed in DDS and ensure that it works cross vendor).

Also from this test output it doesn't look like ros2/ros2cli#72 has been resolved

The test node only calls the rclcpp API but doesn't filter empty results out. That filtering only happens in the command line tool and is part of ros2/ros2cli#76.

It still seems the signal handler for node_with_name may have been removed by the time launch sends the sigint though.

I don't think this is what is happening here. The node_with_name runs forever until it receives a signal. So the signal handler being deregistered can't be the reason why the process doesn't exit in time. The shutdown logic is only triggered when the signal has been received.

@dirk-thomas
Copy link
Member Author

I committed 19946c4 to visually distinguish the empty line separating two result lists (still an empty line) and lines with empty node names (now showing up as "- ").

@dhood
Copy link
Member

dhood commented Apr 2, 2018

Ok, definitely for the most part the tests seem to be working; they didn't cause any flakiness in the builds over the weekend, so I agree that it isn't necessary to put fixing this test's flakiness at the top of our to-do list if we're confident the issue is elsewhere

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants