Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OneRobot.rigidlyAttachedParent test fails in jobs #3103

Closed
TSNoble opened this issue Nov 15, 2024 · 3 comments · Fixed by #3105
Closed

OneRobot.rigidlyAttachedParent test fails in jobs #3103

TSNoble opened this issue Nov 15, 2024 · 3 comments · Fixed by #3105
Labels
bug Something isn't working

Comments

@TSNoble
Copy link
Contributor

TSNoble commented Nov 15, 2024

Description

The test mentioned in the title, defined in moveit_core/robot_state/test/robot_state_test.cpp has been frequently failing on Github jobs.

https://github.com/moveit/moveit2/actions/runs/11860308971/job/33055270697
https://github.com/moveit/moveit2/actions/runs/11860056590/job/33054441260
https://github.com/moveit/moveit2/actions/runs/11782535125/job/32817780351
https://github.com/moveit/moveit2/actions/runs/11781483295/job/32814401931

to list a few.

Most notably, this seems to consistently affect the SonarScan job on main, despite other jobs which (I assume run the same tests) consistently passing

ROS Distro

Humble

OS and version

Rolling Docker Image

Source or binary build?

Source

If binary, which release version?

N/A

If source, which branch?

main

Which RMW are you using?

FastRTPS

Steps to Reproduce

Unclear. The test fails fairly consistently on various jobs. I've tried running the test locally 10000 times with the following command...

cd ~/ws_moveit/build/moveit_core/robot_state
./test_robot_state --gtest_filter="*rigid*" --gtest_repeat=10000 | grep "PASSED" | wc -l

Which consistently prints out 10000 passes

Expected behavior

Since the test pass consistently locally, it's reasonable to expect them to do so on the jobs.

Actual behavior

Many PRs have failed with the error given below.

Backtrace or Console output

On the jobs:

    [ RUN      ] OneRobot.rigidlyConnectedParent
    Error:   Group 'base_with_bad_subgroups' has unsatisfied subgroups
             at line 304 in /home/runner/work/moveit2/moveit2/.work/upstream_ws/src/srdfdom/src/model.cpp
    [INFO] [1731689210.102607487] [moveit_1614370902.moveit.core.robot_model]: Loading robot model 'one_robot'...
    [WARN] [1731689210.102743753] [moveit_1614370902.moveit.core.robot_state]: Returning dirty link transforms
    test_robot_state: /home/runner/work/moveit2/moveit2/.work/target_ws/src/moveit2/moveit_core/robot_state/src/robot_state.cpp:1309: const Eigen::Isometry3d& moveit::core::RobotState::getFrameInfo(const std::string&, const moveit::core::LinkModel*&, bool&) const: Assertion `checkLinkTransforms()' failed.

Running locally:

Note: Google Test filter = *rigid*
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from OneRobot
[ RUN      ] OneRobot.rigidlyConnectedParent
Error:   Group 'base_with_bad_subgroups' has unsatisfied subgroups
         at line 304 in /root/ws_moveit/src/srdfdom/src/model.cpp
[WARN] [1731700916.222734076] [moveit_85773106]: exception thrown while creating node for logging: failed to create guard condition: context argument is null, at ./src/rcl/guard_condition.c:65
[WARN] [1731700916.222763620] [moveit_85773106]: if rclcpp::init was not called, messages from this logger may be missing from /rosout
[INFO] [1731700916.222779245] [moveit_85773106.moveit.core.robot_model]: Loading robot model 'one_robot'...
[ERROR] [1731700916.222920470] [moveit_85773106.moveit.core.robot_state]: Unable to find link for frame 'no_object'
[ERROR] [1731700916.222934489] [moveit_85773106.moveit.core.robot_state]: Unable to find link for frame 'object/no_subframe'
[ERROR] [1731700916.222937242] [moveit_85773106.moveit.core.robot_state]: Unable to find link for frame ''
[ERROR] [1731700916.222948465] [moveit_85773106.moveit.core.robot_state]: Unable to find link for frame 'object/'
[ERROR] [1731700916.222960187] [moveit_85773106.moveit.core.robot_state]: Unable to find link for frame '/'
[       OK ] OneRobot.rigidlyConnectedParent (5 ms)
[----------] 1 test from OneRobot (5 ms total)

[----------] Global test environment tear-down
[==========] 1 test

Interestingly, the failure always comes after a Returning dirty link transforms message, which does not occur when the tests pass locally. Given the failure is related to invalid transforms, this might be related.

@TSNoble TSNoble added the bug Something isn't working label Nov 15, 2024
@TSNoble
Copy link
Contributor Author

TSNoble commented Nov 15, 2024

checkLinkTransforms() calls getDirtyLinkTransforms(), which returns dirty_link_transforms_. This variable appears to be updated whenever the RobotState changes, and appears to get reset when updateLinkTransforms() is called.

I think the intent is that updateLinkTransforms() should always be called after modifying the robot state if querying transforms is expected. The OneRobot fixture has an empty teardown, so this is not guaranteed between tests. Some tests call updateLinkTransforms() in their body, so my best guess is that, if the tests are running in a random sequence the failure will occur if it modifies the state and does not call this method.

(Then again, OneRobot only has a robot_model_ member. all RobotStates are local to the tests that use them)

If my assumption is correct, I think the nicest approach would be to add a new test fixture OneRobotWithState, which builds a state and calls updateLinkTransforms() on setup and teardown to guarantee that tests aren't affecting one another.

@TSNoble
Copy link
Contributor Author

TSNoble commented Nov 15, 2024

New theory: The test in question constructs a RobotState from the model, which immediately populates the dirty transforms:

RobotState::RobotState(const RobotModelConstPtr& robot_model)
  : robot_model_(robot_model)
  , has_velocity_(false)
  , has_acceleration_(false)
  , has_effort_(false)
  , dirty_link_transforms_(nullptr)
  , dirty_collision_body_transforms_(nullptr)
  , rng_(nullptr)
{
  if (robot_model == nullptr)
  {
    throw std::invalid_argument("RobotState cannot be constructed with nullptr RobotModelConstPtr");
  }

  dirty_link_transforms_ = robot_model_->getRootJoint();
  init();
}

This seems like the more likely cause, but I'm unsure why this would propagate. I'd usually suspect a race condition issue, but the tests are single-threaded as far as I'm aware

Edit: Actually, I'm unsure why the tests are passing at all given that this implies the dirty links are never reset...

@TSNoble
Copy link
Contributor Author

TSNoble commented Nov 15, 2024

Fixed with #3105. See #3102 for explanation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant