Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MPPI path_follow_critic crash #3456

Closed
tonynajjar opened this issue Mar 8, 2023 · 9 comments
Closed

MPPI path_follow_critic crash #3456

tonynajjar opened this issue Mar 8, 2023 · 9 comments

Comments

@tonynajjar
Copy link
Contributor

tonynajjar commented Mar 8, 2023

Bug report

Required Info:

  • Operating System:
    • Ubuntu 22
  • ROS2 Version:
    • Humble
  • Version or commit hash:
    • latest humble

Steps to reproduce issue

I still can't tell what are the preconditions leading to this crash but I get it frequently (I'll add more info as I discover them)

[controller_server-2]          77:   template<typename _Res, typename _MemPtr, typename _Tp>
[controller_server-2] #6    Source "/code/ros2_ws/src/navigation2/nav2_controller/src/controller_server.cpp", line 404, in computeControl [0x7f9bc496a5af]
[controller_server-2]         402:       updateGlobalPath();
[controller_server-2]         403: 
[controller_server-2]       > 404:       computeAndPublishVelocity();
[controller_server-2]         405: 
[controller_server-2]         406:       if (isGoalReached()) {
[controller_server-2]         407:         RCLCPP_INFO(get_logger(), "Reached the goal!");
[controller_server-2] #5    Source "/code/ros2_ws/src/navigation2/nav2_controller/src/controller_server.cpp", line 483, in computeAndPublishVelocity [0x7f9bc496bee6]
[controller_server-2]         481:   try {
[controller_server-2]         482:     cmd_vel_2d =
[controller_server-2]       > 483:       controllers_[current_controller_]->computeVelocityCommands(
[controller_server-2]         484:       pose,
[controller_server-2]         485:       nav_2d_utils::twist2Dto3D(twist),
[controller_server-2]         486:       goal_checkers_[current_goal_checker_].get());
[controller_server-2] #4    Source "/code/ros2_ws/src/navigation2/nav2_mppi_controller/src/controller.cpp", line 98, in computeVelocityCommands [0x7f9b8a749357]
[controller_server-2]          95:   nav_msgs::msg::Path transformed_plan = path_handler_.transformPath(robot_pose);
[controller_server-2]          96: 
[controller_server-2]          97:   geometry_msgs::msg::TwistStamped cmd =
[controller_server-2]       >  98:     optimizer_.evalControl(robot_pose, robot_speed, transformed_plan, goal_checker);
[controller_server-2]          99: 
[controller_server-2]         100: #ifdef BENCHMARK_TESTING
[controller_server-2]         101:   auto end = std::chrono::system_clock::now();
[controller_server-2] #3    Source "/code/ros2_ws/src/navigation2/nav2_mppi_controller/src/optimizer.cpp", line 139, in evalControl [0x7f9b8a783bd7]
[controller_server-2]         136:   prepare(robot_pose, robot_speed, plan, goal_checker);
[controller_server-2]         137: 
[controller_server-2]         138:   do {
[controller_server-2]       > 139:     optimize();
[controller_server-2]         140:   } while (fallback(critics_data_.fail_flag));
[controller_server-2]         141: 
[controller_server-2]         142:   utils::savitskyGolayFilter(control_sequence_, control_history_, settings_);
[controller_server-2] #2    Source "/code/ros2_ws/src/navigation2/nav2_mppi_controller/src/optimizer.cpp", line 156, in optimize [0x7f9b8a783b62]
[controller_server-2]         153: {
[controller_server-2]         154:   for (size_t i = 0; i < settings_.iteration_count; ++i) {
[controller_server-2]         155:     generateNoisedTrajectories();
[controller_server-2]       > 156:     critic_manager_.evalTrajectoriesScores(critics_data_);
[controller_server-2]         157:     updateControlSequence();
[controller_server-2]         158:   }
[controller_server-2]         159: }
[controller_server-2] #1    Source "/code/ros2_ws/src/navigation2/nav2_mppi_controller/src/critic_manager.cpp", line 74, in evalTrajectoriesScores [0x7f9b8a7b31bf]
[controller_server-2]          71:     if (data.fail_flag) {
[controller_server-2]          72:       break;
[controller_server-2]          73:     }
[controller_server-2]       >  74:     critics_[q]->score(data);
[controller_server-2]          75:   }
[controller_server-2]          76: }
[controller_server-2] #0  | Source "/code/ros2_ws/src/navigation2/nav2_mppi_controller/src/critics/path_follow_critic.cpp", line 44, in setPathCostsIfNotSet
[controller_server-2]     |    43:   utils::setPathFurthestPointIfNotSet(data);
[controller_server-2]     | >  44:   utils::setPathCostsIfNotSet(data, costmap_ros_);
[controller_server-2]     |    45:   const size_t path_size = data.path.x.shape(0) - 1;
[controller_server-2]     | Source "/code/ros2_ws/src/navigation2/nav2_mppi_controller/include/nav2_mppi_controller/tools/utils.hpp", line 410, in findPathCosts
[controller_server-2]     |   408: {
[controller_server-2]     |   409:   if (!data.path_pts_valid) {
[controller_server-2]     | > 410:     findPathCosts(data, costmap_ros);
[controller_server-2]     |   411:   }
[controller_server-2]     |   412: }
[controller_server-2]     | Source "/code/ros2_ws/src/navigation2/nav2_mppi_controller/include/nav2_mppi_controller/tools/utils.hpp", line 385, in operator=
[controller_server-2]     |   383:       using namespace nav2_costmap_2d; // NOLINT
[controller_server-2]     |   384:       case (LETHAL_OBSTACLE):
[controller_server-2]     | > 385:         (*data.path_pts_valid)[idx] = false;
[controller_server-2]     |   386:         continue;
[controller_server-2]     |   387:       case (INSCRIBED_INFLATED_OBSTACLE):
[controller_server-2]       Source "/usr/include/c++/11/bits/stl_bvector.h", line 95, in score [0x7f9b940a4451]
[controller_server-2]          92:       if (__x)
[controller_server-2]          93: 	*_M_p |= _M_mask;
[controller_server-2]          94:       else
[controller_server-2]       >  95: 	*_M_p &= ~_M_mask;
[controller_server-2]          96:       return *this;
[controller_server-2]          97:     }
[controller_server-2] Segmentation fault (Address not mapped to object [(nil)])
[ERROR] [controller_server-2]: process has died [pid 2557471, exit code -11, cmd '/code/ros2_ws/install/nav2_controller/lib/nav2_controller/controller_server --ros-args -p use_sim_time:=True --params-file /tmp/unaliased_st_s9n4u'].

Additional information

@SteveMacenski
Copy link
Member

SteveMacenski commented Mar 8, 2023

I can say that I haven't seen this before and hasn't been reported during our extensive beta testing either. Interesting, very interesting

I suspect its perhaps the -1 here https://github.com/ros-planning/navigation2/blob/main/nav2_mppi_controller/include/nav2_mppi_controller/tools/utils.hpp#L373. If we had a path of size 10, this could create a vector with elements 0-8 (of size 9), while the for loop just below it is iterating from 0-9.

But then the question comes by does this only happen sometimes since we set a (*data.path_pts_valid)[idx] each time regardless. Do you always see the throw only in the LETHAL_OBSTACLE case? Is the idx always the end point?

The other possible situation is that your path is empty or only 1 point long so the length of data.path_pts_valid is invalid in the construction at size 0 or resizing -1. Does this happen on approach to the goal pose or when the path is trivial?

@tonynajjar
Copy link
Contributor Author

Do you always see the throw only in the LETHAL_OBSTACLE case? 

I've seen it 4-5 times so far and only with LETHAL_OBSTACLE

Is the idx always the end point?

I did not check yet, will do tomorroe

The other possible situation is that your path is empty or only 1 point long so the length of data.path_pts_valid is invalid in the construction at size 0 or resizing -1.

Will also check that

Does this happen on approach to the goal pose or when the path is trivial?

It did not seem to happen at a particular distinguishible moment but I could be wrong. I'm pretty sure it happens when we are still far from the goal.

@SteveMacenski
Copy link
Member

I'm pretty sure it happens when we are still far from the goal.

OK. I didn't really think that was the case anyway but was worth just outlining anything I could see as a potential cause. There are protections and exceptions that should be thrown for trivial paths anyway long before that point (but I wasn't 100% sure there wasn't a hole - but if you didn't see it on approach to goal only, we can discard that).

I did not check yet, will do tomorroe

Thanks 👍 I highly suspect that's it.

@tonynajjar
Copy link
Contributor Author

tonynajjar commented Mar 9, 2023

[planner_server-3] [WARN] [1678357276.133696470] [planner_server]: SmacPlannerHybrid: failed to create plan, invalid use: Starting point in lethal space! Cannot create feasible plan..
[planner_server-3] [WARN] [1678357276.133914319] [planner_server]: Planning algorithm SmacPlannerHybrid failed to generate a valid path to (14.44, -84.03)
[planner_server-3] [WARN] [1678357276.133968525] [planner_server]: [compute_path_to_pose] [ActionServer] Aborting handle.

I see this log before the crash, seems related

@tonynajjar
Copy link
Contributor Author

tonynajjar commented Mar 9, 2023

I suspect its perhaps the -1 here

I think you were right here. The crash happens when path_segments_count is 1 because the vector data.path_pts_valid has then size 0 and we try to access it at index 0.

But then the question comes by does this only happen sometimes since we set a (*data.path_pts_valid)[idx] each time regardless

I guess this is the reason https://stackoverflow.com/questions/16620222/vector-going-out-of-bounds-without-giving-error. Replacing [] with at() does indeed yield an error every time

@tonynajjar tonynajjar mentioned this issue Mar 9, 2023
7 tasks
@SteveMacenski
Copy link
Member

Awesome, thanks for the report! There’s always 4 bugs no matter how much testing I do…

@tonynajjar
Copy link
Contributor Author

I should have asked chatgpt way earlier:

image

@doisyg
Copy link
Contributor

doisyg commented Mar 12, 2023

Thanks for fixing this !
I was seeing random controller_server crash after few hours of testing, it may be related.

@tonynajjar
Copy link
Contributor Author

tonynajjar commented Mar 12, 2023

No problem! I did see another crash after this fix, if I remember correctly somewhere in the same critic, but didn't give it much attention as I thought it would reproduce but it hasn't yet. So you might find more, I'll keep an eye open as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants