-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature Request: Feedback for Navigation Failures #3043
Comments
In terms of implementation, I wonder if some of this is a good application of diagnostics. I think that is essentially the same idea as
|
Just a note, ROS diagnostics are usually meant to be "human readable" - the format is a bunch of strings, so it's not usually the best system if you want to autonomously make decisions based on the data. |
I will also note - this feature (different recovery behaviors depending on the actual cause of failure) was a frequent request in ROS1 as well - we never actually got there. |
That is all logic to be specified and customized in your behavior tree. Everyone's application is different so there's only so much I can do in the default BTs so that they work for most everyone's use-cases using generalized BT nodes. You could, and should, in that situation create some custom BT nodes for checking if the goal is occupied to take different corrective action if you don't believe a given default recovery is rational. Ex. the flow would be something like 'if can't create path, check if goal is occupied, if so, do XYZ` and you can have a variety of condition nodes for specifying actions. With that said, there is on-going work to improve the default BTs and I'm definitely open to suggestions! But thinking about this a bit more generally, we could give users some notion of what failed (e.g. which component crapped out: planner or controller) as a result type to return. Then that information of the action result type could be available in the BT blackboard for other condition nodes to access so that they don't need to derive that context themselves, when readily available from the planner itself. That's already 'something' but not a great deal of context -- although in practical terms planning only fails if (1) there's no valid path (2) the goal or starting points are occupied. Controllers fail when (1) its stuck (2) its in collision already. If we want more information than just what component failed, you're then reliant on planner plugins to providing you with that proper context about failure - which requires updating the planners to throw standard exceptions / failure codes that can be caught by the planner server and returned via the Diagnostics are not the right answer. We could always improve the Action result fields to contain more information, that's not much of a problem. What's more of a problem is deciding how to what / to populate it with and how so. We need to come up with an exhaustive (but later extendable) list of return codes we'd want to send on failures so that they're API-documentation-able and usable in the BT within their contexts. But it would be totally reasonable to pass through the error code from the planner/controller/etc to the NavigateToPose action task so that the autonomy system can know what failed and why, that seems totally reasonable / easy once the planner/controller failure codes are handled.
I disagree with that - we've highlighted now 2 perfectly reasonable ways of doing this with BTs 😉 |
ok I think I totally misunderstood this ticket to be for user knowledge, rather than programmatic recovery :) |
Thanks for the thoughts, that makes sense!
Ok, for example the "Goal is occupied" is then meant to be caught in a separate node in the tree. It probably makes sense to only have one preferred way of doing things and this would be it. Then, a good recoveries tree should e.g. only turn if the goal is not occupied but fail straight away if it is. Still, the caller of NavigateToPose should ideally have a way of getting this information without having to implement their own logic. Considering your explanations this is probably the more important use case for this kind of feedback. |
I think in that case what I mentioned before is a good option. We need to come up with a set of error codes for each major plugin class (probably just Planners and Controllers) that can be sent in the action API response. There are a few ways to communicate that, but exceptions seems like the path of least resistance. For each return code, we'd make a trivial exception deriving from Though, I'd argue that having some of those primitive BT nodes like checking if a pose is occupied would be generally useful regardless. That's here nor there I suppose. The steps are pretty straight forward
Its not surprisingly not a huge amount of work. |
Thoughts? |
Sorry, we are have a bit too much on our plates at the moment :) One detail question: If I understand you correctly, you would put the error codes/exceptions on the blackboard for planner and controller, probably clearing them when a planner/controller iteration succeeds or a new goal is given. This would mean also putting both of these exceptions in the action result, right? I generally agree with your points. I also think that it would not be a huge amount of work. However, I think we currently don't have the capacity to implement this (and the required tests etc.) and will resort to the existing BT condition nodes for the moment. |
Yes, I think the action results should have some return code field that remaps to specific failure modes that can be used on the blackboard by other nodes to know why a failure was triggered. Each time we get a new result from an algorithm that return code would be updated. |
I will take a crack at this tomorrow. It should be pretty straight forward. |
This solution will only work with rolling, correct? i.e., they can't be implemented in galactic and humble because it changes message definitions? |
Correct |
Which servers here would we want to call this complete? Planner/Controller? Also Smoother, Waypoint, Behavior, BT Navigator? The BT Navigator / Waypoint would be for returning total statuses beyond just "failure" to what specifically failed to an autonomy stack. So it could be that these are just pass-throughs from the task servers with a flag as to which caused it with a given code. The smoother / behavior server are the 2 other real task server candidates. I'm not sure we can specifically enumerate the failure modes of a behavior server with such variants as spin to assisted teleop. Behavior Server is more an aggregating server to share resources more than they are really related to any particular task. Smoothers are pretty specific. Timeout, general failure, collision, could be some options, though none are particularly insightful. So I'm not sure it would be worthwhile to add them to smoother/behavior, but I'd like to hear some feedback from others. |
Planner codes merged! |
I think it might be worth while to come up with a mock list of errors for each server. My other question is do we care about errors we can't act on? For the smoother server, I think adding error codes is dependent on the answer to my question above. For the BT navigator server/Waypoint, coming up with a mock list might be useful. Do you have any ideas of the top of your head? A think that is doesn't really make sense to add error codes for the behavior server because they don't fail under a common umbrella of errors. If we create more recoveries with some relation between them then this could be useful but even then it would become messy quickly. |
I think that's a valuable exercise. Spin: Invalid inputs or collision detected or TF Backup / Drive Forward: Invalid inputs or collision detected or TF AssistedTeleop: TF, perhaps add in progress checkers and such to see if anything is happening successfully? Otherwise, this is really just a pass through. Smooth: invalid algorithm, Path cannot be smoothed, smoothed path resulted in collision or off of map, smoother time exceeded, TF, invalid inputs(path too short, costmap not updated, etc), invalid algorithm (doesn't exist from request). This is probably meaningful set of error codes to explore. Wait: None I think the WPF server would just be a pass through of the error codes from the planners / controller / smoother servers and maybe a few (or 1) WP specific ones like invalid request. I think BT Nav would be similar. Also, it just came to mind that we don't have an error code yet in the planners / controllers for invalid algorithm request (e.g. |
It looks like writing error codes for the smoother could be the next step.
I agree. We just need to pipe the error codes up.
Yep! |
So remaining:
|
@jwallace42 want to do the smoother after the Pipe error one so that we can button up all the major task servers? That would be enough for us to announce the new feature while buttoning up the other small things (tests, invalid algorithm error code, docs - though you started that one) |
Sure. We have a rough list of the error codes already so it should be straight forward to get that in. |
Imminently completed in PR #3324 ! |
Hi @jwallace42, @SteveMacenski and all, I'd like to rise a discussion over the exceptions of current planning plugins. I already implemented a small change towards this direction here: boschresearch@f626a75 Though, I think it's valid to have this discussion in larger audience also because the proposed change is actually changing the semantics of the two exceptions. Thus, this choice should affect all the others nav2 planners. What do you think about it? Best regards, |
I think this is a good point. Can you open a PR and we can discuss there? The biggest items I'd ask is making sure you fix all instances of checking the start/goal in all the task servers that have that as an error code option. And probably break out that CC @jwallace42 should probably review that PR as well |
Summary
When reacting to navigation failures, it is essential to know why it failed to be able to react to it in a reasonable way.
Explanation
ComputePathToPose
and controller'sFollowPath
provide no additional information than success/failure and astd_msgs/Empty
.NavigateToPose
action, the resulting response is just astd_msgs/Empty
.Example where feedback could be useful: In the behavior trees, we often see recovery actions for planner/controller failures. This makes sense if the computation failed because the robot is stuck somehow and turning/waiting can resolve the issue. But if the planner failed because the goal is occupied, it makes no sense that the robot backs up or turns around as soon as it sees it. Different reactions depending on the cause for failure are needed.
What can we do about it?
First of all: Do you agree that this is problematic?
I am not sure what the best way of dealing with this issue is and wanted to openly ask what you think about it.
Some ideas:
error
fieldsStill, I don't see a clean way of dealing with failure reasons in the behavior trees, the architecture doesn't seem to allow for it.
The text was updated successfully, but these errors were encountered: