BT Action Nodes crash bt_navigator node upon action server call timeout
#2376
Comments
I think a better solution is to handle the exception, not to remove support for exceptions. Other BT nodes will throw, so we should instead make sure that nothing crashes when exceptions are thrown in the behavior tree. It sounds like something got messed up: the BT engine in the BT navigator is wrapped in a try/catch when ticking nodes, and that should be catching exceptions. I agree that if the current node exception behavior isn't allowing recovery behaviors to trigger, that is a problem. A middleware timeout issue should allow a soft re-attempt. Have you looked at the main branch to see whether any of these concerns are resolved? Much has changed since Foxy on that front that isn't backportable and that may have outright resolved many of your concerns. In your PR you mention constructed BT nodes failing, but you also missed that some BT nodes may be getting input values in the constructors themselves. In that case, a crash over a required field would be a reasonable outcome.
Thanks for the feedback, @SteveMacenski. I totally agree with you that preventing exceptions from causing major damage (the ROS2 node dying) is necessary. I will take a look at [...]. I'd say, though, that throwing exceptions, specifically for BT nodes, is to me not the most suitable mechanism for signaling errors.
Regarding your comment: "you also missed that some BT nodes may be getting input values in the constructors themselves. In that case, a crash of a required field would be a reasonable outcome." While this can be an exceptional case, and thus an exception might be the right way to raise the alert, in the cases that I've seen the nodes crash within the constructor for no good reason. Take, for instance, TransformAvailableCondition. This node does not need a [...]. Just as a dummy example to visualize the use case:
As said, there may be cases where throwing an exception in the constructor is appropriate, but to me it is something that should be highly discouraged, as in many cases compulsory inputs are a [...]
There are certain types of failures that shouldn't just be hidden and should probably throw a notice to the behavior tree application that something has clearly gone wrong. If an invalid status / code comes by, that would be a good example of evidence that something is acting up in a way that it shouldn't, and we should throw something. I agree that the robot should be able to recover from transient networking outages, though: the ability to have recovery actions or retries for network timeouts seems reasonable. The BT has exception handling here: https://github.com/ros-planning/navigation2/blob/main/nav2_behavior_tree/src/behavior_tree_engine.cpp#L47. To state that BT.CPP is somehow "incorrect" to use exceptions is invalid. This is just C++ code; you can throw as long as you have a mechanism to catch, which we do. Throwing is an important tool in our developers' tool chest and we will not outright remove it. There are failure conditions where throwing is the most reasonable thing, so that the behavior tree navigator or an application can handle a critical failure.
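To make the "throw as long as you have a mechanism to catch" point concrete, here is a minimal sketch of an engine-level try/catch around the tick loop. It is not the Nav2 `behavior_tree_engine.cpp` code: `NodeStatus`, `RunStatus`, and `run_tree` are hypothetical stand-ins defined only for this illustration, and the tree root is modeled as a plain callable.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>

// Hypothetical stand-ins for this sketch; not the BT.CPP or Nav2 types.
enum class NodeStatus { RUNNING, SUCCESS, FAILURE };
enum class RunStatus { SUCCEEDED, FAILED };

// Ticks the tree root until it stops returning RUNNING. Any std::exception
// thrown while ticking is caught here, so a throwing node ends the run with
// FAILED instead of terminating the process.
RunStatus run_tree(const std::function<NodeStatus()> & tick_root)
{
  try {
    NodeStatus status = NodeStatus::RUNNING;
    while (status == NodeStatus::RUNNING) {
      status = tick_root();
    }
    return status == NodeStatus::SUCCESS ? RunStatus::SUCCEEDED : RunStatus::FAILED;
  } catch (const std::exception & ex) {
    // Report the failure and hand control back to the caller.
    std::cerr << "Behavior tree threw exception: " << ex.what() << std::endl;
    return RunStatus::FAILED;
  }
}
```

With a wrapper like this, a node that throws still aborts the whole tree run, which is why it does not by itself let a recovery branch react to a transient timeout.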
This is not an exceptional case; we do this often in Nav2. If that's not good form for BT.CPP, I'd be happy to field a PR to move all of the get-inputs from the constructors to elsewhere so that is no longer an issue for you. I see your PR, but I think your actual issues are:
With those 2 changes, I think that fixes your mentioned concerns. Those are 2 ideas I can get behind as well, but removing exceptions from BT use overall would not be something I would agree with. @naiveHobo how do you feel about this? I think this brings up a good point that a networking failure should probably [...]. Here are all of the places exceptions are used in the BT:
The only 2 I have potential objections to are the send goal failure and rejected in bt_action_node; the others look OK to me.
Any movement here?
Were the changes that were part of the merge into the foxy-devel branch reverted? I would love to see the changes in galactic as well, since a simple rejection by an action server will cause the entire BT to fail. There is no other clean way to fix this, right?
This was never merged into any branch; the author of the PR has not been responsive.
Actions for this ticket
@philison I believe these 2 below are good candidates for catching the exception in the node itself and returning FAILURE of the task. I'd merge a PR which implemented that. The other exceptions I highlighted in the comment above seem like real BT-terminating error cases that shouldn't be hidden from the application and should probably be caught and end the tree execution. What do you think? These are the 2 networking related exceptions that could be plausibly found transiently, the others are more systemic issues that need to be addressed by an application developer that probably messed something critical up.
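The proposal above (catch the transient, network-related exceptions inside the node and return FAILURE, while letting systemic errors propagate) could look roughly like the sketch below. All names are hypothetical and defined only for illustration: `GoalTimeout` and `GoalRejected` stand in for the two networking exceptions, and `ActionNodeSketch` is not the real `bt_action_node` class. For brevity the sketch returns SUCCESS as soon as the goal is sent.

```cpp
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical stand-ins for this sketch; not the BT.CPP or Nav2 types.
enum class NodeStatus { RUNNING, SUCCESS, FAILURE };

// Hypothetical exception types for the two transient, network-related failures.
struct GoalTimeout : std::runtime_error { using std::runtime_error::runtime_error; };
struct GoalRejected : std::runtime_error { using std::runtime_error::runtime_error; };

class ActionNodeSketch
{
public:
  // send_goal models the call to the action server and may throw.
  explicit ActionNodeSketch(std::function<void()> send_goal)
  : send_goal_(std::move(send_goal)) {}

  NodeStatus tick()
  {
    try {
      send_goal_();
    } catch (const GoalTimeout &) {
      // Transient failure: report FAILURE so the tree can retry or recover.
      return NodeStatus::FAILURE;
    } catch (const GoalRejected &) {
      return NodeStatus::FAILURE;
    }
    // Any other exception type is deliberately not caught and propagates.
    return NodeStatus::SUCCESS;
  }

private:
  std::function<void()> send_goal_;
};
```

Because only the two named exception types are caught, a misconfiguration that throws something else would still terminate the tree run, matching the distinction drawn in the comment between transient and systemic failures.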
Merging shortly
Bug report
Steps to reproduce issue
Expected behavior
[...] BT::NodeStatus::FAILURE whenever a problem occurs.
[...] (bt_navigator) should not die.
Actual behavior
[...] (bt_navigator), thus no recovery node is executed; rather, the whole BT execution is suddenly halted.
[...] ros2 daemon needs to restart to see the bt_navigator node again (after the bt_navigator node is re-launched after it crashes).
Feature request