Feature Request: Feedback for Navigation Failures #3043

karl-schulz · 2022-06-25T10:24:59Z

Summary

When reacting to navigation failures, it is essential to know why it failed to be able to react to it in a reasonable way.

Explanation

Even inside the stack, planner's ComputePathToPose and controller's FollowPath provide no additional information than success/failure and a std_msgs/Empty.
From the outside of navigation, there is also no way of knowing the overall reason for a navigation failure (e.g. Planner or Controller failure, no progress, ...). Calling the NavigateToPose action, the resulting response is just a std_msgs/Empty.

Example where feedback could be useful: In the behavior trees, we often see recovery actions for planner/controller failures. This makes sense if the computation failed because the robot is stuck somehow and turning/waiting can resolve the issue. But if the planner failed because the goal is occupied, it makes no sense that the robot backs up or turns around as soon as it sees it. Different reactions depending on the cause for failure are needed.

What can we do about it?

First of all: Do you agree that this is problematic?

I am not sure what the best way of dealing with this issue is and wanted to openly ask what you think about it.

Some ideas:

Planner and controllers could loosely publish failure reasons on separate topics for anyone to react to it
Alternatively, they could use the blackboard for this
The action interfaces could have error fields

Still, I don't see a clean way of dealing with failure reasons in the behavior trees, the architecture doesn't seem to allow for it.

The text was updated successfully, but these errors were encountered:

vinnnyr · 2022-06-27T16:25:46Z

In terms of implementation, I wonder if some of this is a good application of diagnostics.

I think that is essentially the same idea as

Planner and controllers could loosely publish failure reasons on separate topics for anyone to react to it

mikeferguson · 2022-06-27T18:23:46Z

Just a note, ROS diagnostics are usually meant to be "human readable" - the format is a bunch of strings, so it's not usually the best system if you want to autonomously make decisions based on the data.

mikeferguson · 2022-06-27T18:24:56Z

I will also note - this feature (different recovery behaviors depending on the actual cause of failure) was a frequent request in ROS1 as well - we never actually got there.

SteveMacenski · 2022-06-27T19:06:03Z

But if the planner failed because the goal is occupied, it makes no sense that the robot backs up or turns around as soon as it sees it. Different reactions depending on the cause for failure are needed.

That is all logic to be specified and customized in your behavior tree. Everyone's application is different so there's only so much I can do in the default BTs so that they work for most everyone's use-cases using generalized BT nodes. You could, and should, in that situation create some custom BT nodes for checking if the goal is occupied to take different corrective action if you don't believe a given default recovery is rational. Ex. the flow would be something like 'if can't create path, check if goal is occupied, if so, do XYZ` and you can have a variety of condition nodes for specifying actions.

With that said, there is on-going work to improve the default BTs and I'm definitely open to suggestions!

But thinking about this a bit more generally, we could give users some notion of what failed (e.g. which component crapped out: planner or controller) as a result type to return. Then that information of the action result type could be available in the BT blackboard for other condition nodes to access so that they don't need to derive that context themselves, when readily available from the planner itself. That's already 'something' but not a great deal of context -- although in practical terms planning only fails if (1) there's no valid path (2) the goal or starting points are occupied. Controllers fail when (1) its stuck (2) its in collision already.

If we want more information than just what component failed, you're then reliant on planner plugins to providing you with that proper context about failure - which requires updating the planners to throw standard exceptions / failure codes that can be caught by the planner server and returned via the result interface. This is certainly possible, but would involve a bit of leg work. We would also want to update documentation to make it clear to planner plugin authors that they are expected to return specific codes / throw exceptions in these particular situations (but that's just updating a tutorial + some doxygen in nav2_core).

Diagnostics are not the right answer. We could always improve the Action result fields to contain more information, that's not much of a problem. What's more of a problem is deciding how to what / to populate it with and how so. We need to come up with an exhaustive (but later extendable) list of return codes we'd want to send on failures so that they're API-documentation-able and usable in the BT within their contexts.

But it would be totally reasonable to pass through the error code from the planner/controller/etc to the NavigateToPose action task so that the autonomy system can know what failed and why, that seems totally reasonable / easy once the planner/controller failure codes are handled.

Still, I don't see a clean way of dealing with failure reasons in the behavior trees, the architecture doesn't seem to allow for it.

I disagree with that - we've highlighted now 2 perfectly reasonable ways of doing this with BTs 😉

vinnnyr · 2022-06-27T19:16:40Z

ok I think I totally misunderstood this ticket to be for user knowledge, rather than programmatic recovery :)

karl-schulz · 2022-06-28T15:53:49Z

Thanks for the thoughts, that makes sense!

I disagree with that - we've highlighted now 2 perfectly reasonable ways of doing this with BTs wink

Ok, for example the "Goal is occupied" is then meant to be caught in a separate node in the tree. It probably makes sense to only have one preferred way of doing things and this would be it. Then, a good recoveries tree should e.g. only turn if the goal is not occupied but fail straight away if it is.

Still, the caller of NavigateToPose should ideally have a way of getting this information without having to implement their own logic. Considering your explanations this is probably the more important use case for this kind of feedback.

SteveMacenski · 2022-06-28T18:56:58Z

I think in that case what I mentioned before is a good option. We need to come up with a set of error codes for each major plugin class (probably just Planners and Controllers) that can be sent in the action API response. There are a few ways to communicate that, but exceptions seems like the path of least resistance. For each return code, we'd make a trivial exception deriving from runtime_error that can be caught and set the return code for failure modes. That can then be put in the BT blackboard for use.

Though, I'd argue that having some of those primitive BT nodes like checking if a pose is occupied would be generally useful regardless. That's here nor there I suppose.

The steps are pretty straight forward

Come up with a relatively complete list of the specific types of errors you want to be made aware of (starting or goal is occupied, could not find any valid path, could not find a path within timeout, etc)
Create exceptions in nav2_core/Exceptions.hpp for each of those (really just needs to derive from runtime_error and that's it
Update the plugins to use those specific exceptions in those specific situations. I think most already do checks for start/goal validity so its just a trivial change out to which exception is used
Update the planner/controller server to catch each of the exceptions separately to set the return code for the message
Update the planner/controller BT nodes to put that information into the blackboard for use
Ideally, having a BT XML proposal that uses that blackboard information as an example or perhaps even a new default if that information helps solve some general problem!

Its not surprisingly not a huge amount of work.

SteveMacenski · 2022-07-06T23:32:54Z

Thoughts?

karl-schulz · 2022-07-07T10:20:06Z

Sorry, we are have a bit too much on our plates at the moment :)

One detail question: If I understand you correctly, you would put the error codes/exceptions on the blackboard for planner and controller, probably clearing them when a planner/controller iteration succeeds or a new goal is given. This would mean also putting both of these exceptions in the action result, right?

I generally agree with your points. I also think that it would not be a huge amount of work. However, I think we currently don't have the capacity to implement this (and the required tests etc.) and will resort to the existing BT condition nodes for the moment.

SteveMacenski · 2022-07-07T19:32:00Z

Yes, I think the action results should have some return code field that remaps to specific failure modes that can be used on the blackboard by other nodes to know why a failure was triggered. Each time we get a new result from an algorithm that return code would be updated.

jwallace42 · 2022-08-22T22:11:59Z

I will take a crack at this tomorrow. It should be pretty straight forward.

madgrizzle · 2022-09-16T13:10:33Z

This solution will only work with rolling, correct? i.e., they can't be implemented in galactic and humble because it changes message definitions?

SteveMacenski · 2022-09-16T18:32:46Z

Correct

SteveMacenski · 2022-09-23T23:46:50Z

Which servers here would we want to call this complete? Planner/Controller? Also Smoother, Waypoint, Behavior, BT Navigator?

The BT Navigator / Waypoint would be for returning total statuses beyond just "failure" to what specifically failed to an autonomy stack. So it could be that these are just pass-throughs from the task servers with a flag as to which caused it with a given code.

The smoother / behavior server are the 2 other real task server candidates. I'm not sure we can specifically enumerate the failure modes of a behavior server with such variants as spin to assisted teleop. Behavior Server is more an aggregating server to share resources more than they are really related to any particular task.

Smoothers are pretty specific. Timeout, general failure, collision, could be some options, though none are particularly insightful.

So I'm not sure it would be worthwhile to add them to smoother/behavior, but I'd like to hear some feedback from others.

SteveMacenski · 2022-09-28T19:53:28Z

Planner codes merged!

jwallace42 · 2022-10-14T16:04:10Z

Which servers here would we want to call this complete? Planner/Controller? Also Smoother, Waypoint, Behavior, BT Navigator?

The BT Navigator / Waypoint would be for returning total statuses beyond just "failure" to what specifically failed to an autonomy stack. So it could be that these are just pass-throughs from the task servers with a flag as to which caused it with a given code.

The smoother / behavior server are the 2 other real task server candidates. I'm not sure we can specifically enumerate the failure modes of a behavior server with such variants as spin to assisted teleop. Behavior Server is more an aggregating server to share resources more than they are really related to any particular task.

Smoothers are pretty specific. Timeout, general failure, collision, could be some options, though none are particularly insightful.

So I'm not sure it would be worthwhile to add them to smoother/behavior, but I'd like to hear some feedback from others.

I think it might be worth while to come up with a mock list of errors for each server. My other question is do we care about errors we can't act on?

For the smoother server, I think adding error codes is dependent on the answer to my question above.

For the BT navigator server/Waypoint, coming up with a mock list might be useful. Do you have any ideas of the top of your head?

A think that is doesn't really make sense to add error codes for the behavior server because they don't fail under a common umbrella of errors. If we create more recoveries with some relation between them then this could be useful but even then it would become messy quickly.

SteveMacenski · 2022-10-14T19:58:06Z

I think that's a valuable exercise.

Spin: Invalid inputs or collision detected or TF

Backup / Drive Forward: Invalid inputs or collision detected or TF

AssistedTeleop: TF, perhaps add in progress checkers and such to see if anything is happening successfully? Otherwise, this is really just a pass through.

Smooth: invalid algorithm, Path cannot be smoothed, smoothed path resulted in collision or off of map, smoother time exceeded, TF, invalid inputs(path too short, costmap not updated, etc), invalid algorithm (doesn't exist from request). This is probably meaningful set of error codes to explore.

Wait: None

I think the WPF server would just be a pass through of the error codes from the planners / controller / smoother servers and maybe a few (or 1) WP specific ones like invalid request. I think BT Nav would be similar.

Also, it just came to mind that we don't have an error code yet in the planners / controllers for invalid algorithm request (e.g. FollowPath2 doesn't exist). That might be another good one to add to the core servers in a follow up PR

jwallace42 · 2022-10-20T14:18:29Z

It looks like writing error codes for the smoother could be the next step.

I think the WPF server would just be a pass through of the error codes from the planners / controller / smoother servers and maybe a few (or 1) WP specific ones like invalid request. I think BT Nav would be similar.

I agree. We just need to pipe the error codes up.

Also, it just came to mind that we don't have an error code yet in the planners / controllers for invalid algorithm request (e.g. FollowPath2 doesn't exist). That might be another good one to add to the core servers in a follow up PR

Yep!

SteveMacenski · 2022-10-31T19:53:51Z

So remaining:

Add invalid algorithm selection exception / error code
WPF / NavToPose / NavThroughPose ways to indicate error code and offending server
~~Do AssistedTeleop / spin / backup / forward? Questionable.~~
Add unit tests - all of this exception catching code is now largely not covered by current tests. Some dummy algorithms that throw an exception on calling the main method could be useful and then check the error codes of the server response that handled properly.
how to for future servers Add New Task Server Tutorial
smoother server (+ invalid algorithm + unit tests as above
Doc updates about this change / announcement
Update some BTs to use this work?

SteveMacenski · 2022-11-07T20:05:19Z

@jwallace42 want to do the smoother after the Pipe error one so that we can button up all the major task servers? That would be enough for us to announce the new feature while buttoning up the other small things (tests, invalid algorithm error code, docs - though you started that one)

jwallace42 · 2022-11-07T22:06:48Z

Sure. We have a rough list of the error codes already so it should be straight forward to get that in.

SteveMacenski · 2023-01-03T19:38:00Z

Imminently completed in PR #3324 !

annamannucci · 2023-09-08T08:52:31Z

Hi @jwallace42, @SteveMacenski and all,

I'd like to rise a discussion over the exceptions of current planning plugins.
In the current implementation of the navfn_planner, StartOccupied and GoalOccupied are raised only when the start or the goal pixel are in a lethal obstacle. Though, it would make sense in my opinion to raise this error also when the start or the goal are actually in an INSCRIBED_INFLATED_OBSTACLE.

I already implemented a small change towards this direction here: boschresearch@f626a75

Though, I think it's valid to have this discussion in larger audience also because the proposed change is actually changing the semantics of the two exceptions. Thus, this choice should affect all the others nav2 planners.

What do you think about it?

Best regards,
Anna Mannucci

SteveMacenski · 2023-09-08T17:54:19Z

I think this is a good point. Can you open a PR and we can discuss there? The biggest items I'd ask is making sure you fix all instances of checking the start/goal in all the task servers that have that as an error code option. And probably break out that else statement region search into another function for readability.

CC @jwallace42 should probably review that PR as well

SteveMacenski added the enhancement New feature or request label Jul 7, 2022

jwallace42 mentioned this issue Aug 23, 2022

added result codes for global planner #3146

Merged

7 tasks

jwallace42 mentioned this issue Sep 14, 2022

Control exceptions #3190

Closed

7 tasks

jwallace42 mentioned this issue Oct 2, 2022

Controller exceptions #3227

Merged

7 tasks

jwallace42 mentioned this issue Oct 12, 2022

Expections #3244

Merged

7 tasks

jwallace42 mentioned this issue Oct 17, 2022

exceptions for compute path through poses #3248

Merged

7 tasks

jwallace42 mentioned this issue Oct 20, 2022

Pipe error codes #3251

Merged

7 tasks

SteveMacenski added the in progress label Nov 2, 2022

jwallace42 mentioned this issue Nov 11, 2022

Error code tests #3278

Closed

7 tasks

SteveMacenski closed this as completed in #3251 Nov 18, 2022

SteveMacenski reopened this Nov 18, 2022

jwallace42 mentioned this issue Nov 21, 2022

Smoother error codes #3296

Merged

7 tasks

AlexeyMerzlyakov mentioned this issue Dec 13, 2022

Support result codes for path planner in benchmarking #3323

Merged

7 tasks

jwallace42 mentioned this issue Dec 13, 2022

Behavior Tree uses Error Codes #3324

Merged

7 tasks

SteveMacenski closed this as completed Jan 3, 2023

RBT22 mentioned this issue May 31, 2023

Error handling EnjoyRobotics/navigation2#4

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature Request: Feedback for Navigation Failures #3043

Feature Request: Feedback for Navigation Failures #3043

karl-schulz commented Jun 25, 2022 •

edited

Loading

vinnnyr commented Jun 27, 2022 •

edited

Loading

mikeferguson commented Jun 27, 2022

mikeferguson commented Jun 27, 2022

SteveMacenski commented Jun 27, 2022 •

edited

Loading

vinnnyr commented Jun 27, 2022

karl-schulz commented Jun 28, 2022

SteveMacenski commented Jun 28, 2022 •

edited

Loading

SteveMacenski commented Jul 6, 2022

karl-schulz commented Jul 7, 2022

SteveMacenski commented Jul 7, 2022

jwallace42 commented Aug 22, 2022

madgrizzle commented Sep 16, 2022

SteveMacenski commented Sep 16, 2022

SteveMacenski commented Sep 23, 2022 •

edited

Loading

SteveMacenski commented Sep 28, 2022

jwallace42 commented Oct 14, 2022 •

edited

Loading

SteveMacenski commented Oct 14, 2022 •

edited

Loading

jwallace42 commented Oct 20, 2022 •

edited

Loading

SteveMacenski commented Oct 31, 2022 •

edited

Loading

SteveMacenski commented Nov 7, 2022 •

edited

Loading

jwallace42 commented Nov 7, 2022

SteveMacenski commented Jan 3, 2023

annamannucci commented Sep 8, 2023 •

edited

Loading

SteveMacenski commented Sep 8, 2023 •

edited

Loading

Feature Request: Feedback for Navigation Failures #3043

Feature Request: Feedback for Navigation Failures #3043

Comments

karl-schulz commented Jun 25, 2022 • edited Loading

Summary

Explanation

What can we do about it?

vinnnyr commented Jun 27, 2022 • edited Loading

mikeferguson commented Jun 27, 2022

mikeferguson commented Jun 27, 2022

SteveMacenski commented Jun 27, 2022 • edited Loading

vinnnyr commented Jun 27, 2022

karl-schulz commented Jun 28, 2022

SteveMacenski commented Jun 28, 2022 • edited Loading

SteveMacenski commented Jul 6, 2022

karl-schulz commented Jul 7, 2022

SteveMacenski commented Jul 7, 2022

jwallace42 commented Aug 22, 2022

madgrizzle commented Sep 16, 2022

SteveMacenski commented Sep 16, 2022

SteveMacenski commented Sep 23, 2022 • edited Loading

SteveMacenski commented Sep 28, 2022

jwallace42 commented Oct 14, 2022 • edited Loading

SteveMacenski commented Oct 14, 2022 • edited Loading

jwallace42 commented Oct 20, 2022 • edited Loading

SteveMacenski commented Oct 31, 2022 • edited Loading

SteveMacenski commented Nov 7, 2022 • edited Loading

jwallace42 commented Nov 7, 2022

SteveMacenski commented Jan 3, 2023

annamannucci commented Sep 8, 2023 • edited Loading

SteveMacenski commented Sep 8, 2023 • edited Loading

karl-schulz commented Jun 25, 2022 •

edited

Loading

vinnnyr commented Jun 27, 2022 •

edited

Loading

SteveMacenski commented Jun 27, 2022 •

edited

Loading

SteveMacenski commented Jun 28, 2022 •

edited

Loading

SteveMacenski commented Sep 23, 2022 •

edited

Loading

jwallace42 commented Oct 14, 2022 •

edited

Loading

SteveMacenski commented Oct 14, 2022 •

edited

Loading

jwallace42 commented Oct 20, 2022 •

edited

Loading

SteveMacenski commented Oct 31, 2022 •

edited

Loading

SteveMacenski commented Nov 7, 2022 •

edited

Loading

annamannucci commented Sep 8, 2023 •

edited

Loading

SteveMacenski commented Sep 8, 2023 •

edited

Loading