[ros2component] find_container_node_names sometimes fails in the presence of other nodes #321

Closed
cottsay opened this issue Aug 24, 2019 · 6 comments · Fixed by #322

Comments

@cottsay
Member

cottsay commented Aug 24, 2019

The ros2component API tests call find_container_node_names, which sometimes fails in the presence of other running tests:

________________________ test_find_container_node_names ________________________
test/test_api.py:30: in test_find_container_node_names
    assert len(find_container_node_names(
ros2component/api/__init__.py:204: in find_container_node_names
    services = get_service_info(node=node, remote_node_name=n.full_name)
../../../../install/ros2node/lib/python3.6/site-packages/ros2node/api/__init__.py:76: in get_service_info
    return get_topics(remote_node_name, node.get_service_names_and_types_by_node)
../../../../install/ros2node/lib/python3.6/site-packages/ros2node/api/__init__.py:59: in get_topics
    names_and_types = func(node.name, node.namespace)
../../../../install/rclpy/lib/python3.6/site-packages/rclpy/node.py:1505: in get_service_names_and_types_by_node
    capsule, node_name, node_namespace)
E   RuntimeError: Failed to get_service_names_and_types: Unable to find GUID for node , at /home/jenkins-agent/workspace/nightly_linux_release/ws/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/rmw_node_info_and_types.cpp:87

On the CI farm, this test commonly runs at the same time as the pendulum_control tests, which causes it to fail. I was able to reproduce the behavior locally by running both tests simultaneously in a loop.

@cottsay cottsay added the bug Something isn't working label Aug 24, 2019
@cottsay cottsay self-assigned this Aug 24, 2019
@ivanpauno
Member

I think the problem is that we first get a list of node names and then use it to list the services of those nodes. If a node disappears in between, get_service_info will fail.

The problem isn't limited to the test; we use the same logic in other places:

with NodeStrategy(args) as node:
    node_names = get_node_names(node=node)
with DirectNode(args) as node:
    container_node_names = find_container_node_names(
        node=node, node_names=node_names
    )

IMO, find_container_node_names should be rewritten to handle the error nicely.
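To illustrate the race and one way of handling it, here is a minimal sketch that runs without ROS. The dictionary-based graph, `get_service_names`, and `find_container_node_names_tolerant` are hypothetical stand-ins, not the ros2cli API, and recognising a container by a `/_container/load_node` service suffix is an assumption made only for the example.

```python
from typing import Dict, List

# Hypothetical stand-in for the ROS graph: node name -> service names.
Graph = Dict[str, List[str]]


def get_service_names(graph: Graph, node_name: str) -> List[str]:
    """Stand-in for get_service_info: raises if the node is no longer present."""
    if node_name not in graph:
        raise RuntimeError(f'Unable to find GUID for node {node_name}')
    return graph[node_name]


def find_container_node_names_tolerant(graph: Graph, node_names: List[str]) -> List[str]:
    """Return the container nodes, skipping nodes that vanished after listing."""
    container_names = []
    for name in node_names:
        try:
            services = get_service_names(graph, name)
        except RuntimeError:
            # The node disappeared between the name listing and this query.
            continue
        # Assumption for illustration: a container exposes this service.
        if any(s.endswith('/_container/load_node') for s in services):
            container_names.append(name)
    return container_names


graph = {
    '/ComponentManager': ['/ComponentManager/_container/load_node'],
    '/pendulum_demo': ['/pendulum_demo/describe_parameters'],
}
node_names = list(graph)      # step 1: snapshot of node names
del graph['/pendulum_demo']   # an unrelated node exits in between
print(find_container_node_names_tolerant(graph, node_names))  # step 2: query services
```

Catching `RuntimeError` this broadly could also hide unrelated failures, so a real fix would probably want a narrower way to detect the "node is gone" case.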

@hidmic
Contributor

hidmic commented Aug 26, 2019

That is correct. I wonder if this is the only place where we have such races.

@cottsay
Member Author

cottsay commented Sep 16, 2019

Is the resolution for this issue progressing? It's been the only nightly failure for Linux release/debug for several days now. It would be great to get those builds clean again.

If the fix is still a ways out, I'd like to reconsider isolating the domain ID of the test and adding another test case for this issue specifically, since the test is failing due to an issue it isn't actually looking for.
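As a rough sketch of what isolating the domain ID could look like, assuming the test launches its ROS processes as subprocesses: the helper below only builds an environment with `ROS_DOMAIN_ID` set. The name `env_with_domain_id` is hypothetical, and how to pick a value that does not collide with concurrently running tests is left open here.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_domain_id(domain_id: int,
                       base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with ROS_DOMAIN_ID set to domain_id."""
    if domain_id < 0:
        raise ValueError('domain_id must be non-negative')
    env = dict(os.environ if base_env is None else base_env)
    env['ROS_DOMAIN_ID'] = str(domain_id)
    return env


# Possible use in a test (not executed here):
#   subprocess.run(['ros2', 'component', 'list'], env=env_with_domain_id(42))
```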

@dirk-thomas
Member

I'd like to reconsider isolating the domain ID of the test

As discussed on previous tickets, there is currently no good approach to selecting a unique, non-colliding domain ID in our test infrastructure.

@ivanpauno
Member

I'm working on this. I expect to finish it today or tomorrow.

@cottsay
Member Author

cottsay commented Sep 16, 2019

Thanks for the update!
