Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore combining the distance and time replanning BT nodes in 1 tree #1701

Closed
SteveMacenski opened this issue May 7, 2020 · 41 comments
Closed

Comments

@SteveMacenski
Copy link
Member

@naiveHobo did a great job making a distance based replanning decorator node, analog to the constant replanning rate.

In case the robot ends up in a bad situation that it can't move and needs to replan, perhaps we should consider a way to use both time and space decorators to influence the replanning.

A good question is whether its distance traveled or time expired, or distance traveled and time expired.

At high speeds doing "or", we could be replanning quite frequently, however if its "both" then we'd be stuck with the failure modes of both of them.

Alternatively, we could keep them as separate options for users. This ticket is to explore the possibility of combining them into a single workflow.

A potential idea is to back off on the time-based replanning to be every 5 or 10 seconds to account for those issues and do primarily the distance based replanning. I would like to hear other peoples' opinions on this and see what we can come up with.

@naiveHobo
Copy link
Contributor

Adding a scale with speed option is one way to go. The user can define the minimum and maximum values for time and distance for replanning, and maybe something like a scale factor. The counter in the RateController`` and DistanceController``` are scaled according to the current speed.

The time period after which the RateController returns success will increase if the current speed is low (but maxed out after a point to ensure replanning when robot is unable to move). This makes sense as we can assume that quick replanning is not needed when we're traveling at lower speeds.

The distance after which the DistanceController returns success will increase at higher speeds, which means that it sort of maintains its rate.

Scaling the threshold according to speed in either the RateController or the DistanceController effectively means that you're adding a sense of distance traveled to the TimeController and the time expired in DistanceController. We can still have both the decorators since they put limits on the threshold used to trigger the decorator.

@shrijitsingh99
Copy link
Contributor

Adding a scale with speed option is one way to go. The user can define the minimum and maximum values for time and distance for replanning, and maybe something like a scale factor. The counter in the RateController and ``DistanceController``` are scaled according to the current speed.

The DistanceController feels like an InverseRateController, which decreases planning rate as speed increases.

The distance after which the DistanceController returns success will increase at higher speeds, which means that it sort of maintains its rate.

Why not just have a SpeedController instead?
Do you want to decrease the planning rate as the speed increases? At higher speeds, you need to reach quicker so wouldn't a higher planning rate be more suitable?

Having something like a max time since lat plan which triggers replanning irrespective of distance traveled seems like a good idea.

So if max time is 5s, and robot gets stuck a new plan will be generated after 5s. This will ensure distance based replanning doesn't get the robot stuck.

@SteveMacenski
Copy link
Member Author

SteveMacenski commented May 8, 2020

I was also thinking that an option would be to have a maximum rate set, so that if it replanned in the last N milliseconds, if triggered again, pass. That way we can replan at 1 hz (10 hz, 0.1 hz) OR replan on traveling 1 meter (10 cm, 10 meters) but if both happen in quick succession, we skip.

This is starting to get a little convoluted. I'm wondering the wiseness of that. One of the goals was that if using the distance controller that if you got stuck (e.g. you're not moving to go anywhere) that there was still something that would try to replan. The speed controller would fall into the same trap unless it was [min rate, max rate) such that min rate is > 0 for speed 0.0m/s. The BT node would have to be given the robot's maximum speed and these min/max rates to linearly map. Perhaps a recovery could be made to replan on a controller failure might solve that as well. The speed controller is growing on me. If nothing else, we implement it and we add it to our war chest of plugins to chose from down the line.

Given a speed, distance, and time decorators; what do you guys think we should do with them (or others we should consider)? A couple of proposals would be great. Maybe the right answer isn't to combine them, or so, how would we do that? Where else can we use these controllers?

@naiveHobo
Copy link
Contributor

To focus on the initial details first, to combine the distance / time based nodes in a single BT, we will have to implement these nodes as conditions instead of decorators. A decorator inherently has a single child and combining this logic using decorators would need some unnatural BT designing.

Having these nodes as condition nodes also ensures that we can combine them with other higher level conditions (like availability of new goal).

@naiveHobo
Copy link
Contributor

Over to some implementation details:

The speed controller is growing on me. If nothing else, we implement it and we add it to our war chest of plugins to chose from down the line.

I feel that the only scenario where the SpeedController can help is when the speed is 0.0 m/s. When the robot is moving, replanning based on distance / time is more intuitive. If not-moving is the only scenario of use for the SpeedController, it essentially boils down to sort of a progress checking decorator.

@SteveMacenski
Copy link
Member Author

we will have to implement these nodes as conditions instead of decorators

Agreed.

I feel that the only scenario where the SpeedController can help is when the speed is 0.0 m/s. When the robot is moving, replanning based on distance / time is more intuitive. If not-moving is the only scenario of use for the SpeedController, it essentially boils down to sort of a progress checking decorator.

I think that's actually where I wanted that to live anyhow to make it more general, but in a way to cancel navigation tasks that have clearly failed, not to trigger a replan. I can see arguments for it being in the controller server too, and I really don't care that much. They both work and are both clean.

The speed controller could also be useful for other things like throttling cloud publishing of telemetry based on speed to save bandwidth when the robot really isn't moving or adjust costmap update rates. Even if we don't use it here, I think its a valuable asset.

I think its logical after thinking about it for a few minutes, but I agree it doesn't roll of the brain as my first choice. I'm entertaining the option and its pretty novel relative to what's been used in the past. It embeds both the update rate with some minimum bounded rate mathematically and will replan faster the more distance is covered. It seems like a really good single-shot balance. Even if not default, I think its worth implementing that tree and testing it out and seeing how we like it.

@naiveHobo
Copy link
Contributor

I ran some tests with a SpeedController decorator. It feels like the natural solution to the problem proposed in this thread. The SpeedController node I have implemented is essentially a modified RateController which scales the rate according to the current speed.

  • By setting limits on the rate, it ensures the replanning happens even if the robot is not moving, offering a solution for the drawback of the DistanceController.
  • When the robot is moving at high speeds, the rate is increased accordingly and replanning is done more frequently, thus retaining the benefit of using the DistanceController.
  • When you're at a constant speed, the SpeedController boils down to a RateController.

@naiveHobo
Copy link
Contributor

naiveHobo commented May 9, 2020

Here's a rate-speed plot for reference:

Figure_0

The robot is stuck towards the end so the rate of replanning in that case is held at 0.1hz.
This is the ideal behavior that we would expect from a combination of distance / time based conditions, right?

@shrijitsingh99
Copy link
Contributor

Looks great @naiveHobo

The speed controller could also be useful for other things like throttling cloud publishing of telemetry based on speed to save bandwidth when the robot really isn't moving or adjust costmap update rates. Even if we don't use it here, I think its a valuable asset
IMO too this will have a lot of use cases, since a lot of factors can change based on speed as mentioned.

I think we can maybe open a separate discussion on where all it can be beneficial to apply SpeedController.
.

@SteveMacenski
Copy link
Member Author

Agreed, please open a new ticket and summarize this discussion and then add your implementation details.

(I'd also just implement a speed decorator and condition while you're at it because it seems we like to have both analogs around)

Even if we don't use this in the default BT, we'll definitely make a BT with it and merge the code for people to use. Feel free in the mean time to prepare the usual (update READMEs, navigation website, adding a diagram, etc)

@SteveMacenski
Copy link
Member Author

On the topic of Distance + time: We should discuss in this ticket not the PR please the mechanics of the 2 updates and resetting.

@SteveMacenski
Copy link
Member Author

@gimait @naiveHobo can we continue the discussion about how to include time, distance, and new goal in the default BT?

@gimait
Copy link
Contributor

gimait commented May 20, 2020

Following the discussion in #1705, my idea of following the concepts of BehaviorTree.cpp go through skipping the halt function in condition nodes. In general, condition nodes can be initialized when first ticked and idle.

I can of course see the advantage of using the halt function to reset them in cases like onGoalUpdated for the planner: on halt could make it possible to track changes in the goal while other tree branches (different from the planning behavior) are executed. However, in my opinion this is more confusing: you have to consider the state of the system since the last halt, which might have happened (extreme case) hours ago. This is specially problematic for other behaviors such as the recoveries.

In my opinion, there is more advantages to keeping the current halt behavior for condition nodes. It is possible to initialize the condition nodes when idle, isolating the behaviour to the subtree they are part of. I believe this is a more isolated and flexible way of looking at this issue, and potentially what was intended in the design of bt.cpp.

Of course, this comes with difficulties for the planner: how can we trigger the planning when entering the planner subtree (in the first plan or after recovery)? I don't know the best way of doing that, but I bet there is a way of implementing it with the bt structure.

Depending on what solution we end up with, we might also need a new control node that doesn't halt the child nodes when all return failure.


A separate issue that I saw in pr #1705 and that I'm guessing it belong here is: how far would it be a normal distance to trigger the replanning? If we expect to do so in more than 3-4m, maybe the euclidean distance is not the best method to measure the travelled distance (the robot can go across many corners without exiting a 3m radius). If the euclidean distance is an okay measurement, we should probably document somewhere how the travelled distance is measured (I don't know if it's written somewhere already?).

@SteveMacenski
Copy link
Member Author

I'm not sure I understand your proposal, can you make a bulleted concise proposal?

I think where we left things in my thought process was:

  • Add a virtual on_halt() method in the node wrapper
  • In this on_halt function for these conditions, we do a reset of the variables for state (distance since last, time since last)
  • The new goal condition should be first at all times

I think the standing problem is:

  • When we exit because they all fail, the halt function is called either way

I think a possible solution to that is to have the halt() method pass the on_halt() the status code of its return type (failure or succes). Based on that, we reset or leave the variables alone. I think that would then let us only reset when someone actually returns "yes, replan" and not when all fail and halt is called.

Thoughts?

@naiveHobo
Copy link
Contributor

naiveHobo commented May 20, 2020

If I'm understanding this correctly, this will involve the addition of two new node wrappers.

  • A wrapper for the condition nodes where we add the on_halt() method to perform the state reset
  • A wrapper for a control node which will find out if all the children failed or if a child returned success, and then pass this information to the children condition nodes so they can perform their state resets accordingly.

I think there's a couple of problems here:

  • Such a wrapper for condition nodes will have to override the halt() method in order to pass the status code to the on_halt() method. If that's the case, then why not just let the halt() method be virtual and implement the reset logic in there.
  • The condition nodes are supposed to reset their state based on the return type of the other sister condition nodes. And this information is only available to the parent node which means that the parent node will have to send the return type from its children (one succeeded or all failed) to all of its children presumably through the child->halt() call.

I believe a new control node is definitely needed, with some functionalities of Fallback but some additional things as well. The ideal behavior should be:

  • If a child returns failure, tick the next child (like fallback)
  • If a child returns success, halt all children and return success (like fallback)
  • If all children return failure, don't halt any child and return failure (unlike fallback)

The open question is, where do we implement the reset logic. Possible solutions:

  • What @gimait is suggesting, make the condition nodes reset their state in tick() by keeping a check for IDLE state.
  • Add a wrapper for condition nodes where overriding the halt() method is possible. Make all of our specific conditions implement this wrapper and and perform the reset in the overridden halt() method.

EDIT: After a bit more thought, maybe adding the on_halt() method is a good idea. The wrapper for condition nodes will then look like:

halt()
{
  on_halt();
  setStatus(IDLE);
}

And the reset is performed in on_halt().

@SteveMacenski
Copy link
Member Author

SteveMacenski commented May 20, 2020

Is there a way to know the status of the parent node (control flow) to these condition nodes when it calls halt on the children? Right now, I don't think the halt() in any of the condition nodes is actually used, so we have a blank slate. I think its only used in the BT service and BT action node wrappers, that would need an on_halt(), but right now, we're only looking at condition nodes we need to reset state on. I would argue though adding the on_halt() is still something we should add to the BT action / service nodes in case those also have state to reset for new BT action nodes.

I suppose we won't need a halt wrapper for the conditions because there's on override of the conditions halt yet. We can just implement halt ourselves. We just need to find a way to know the state of the parent node calling halt. Maybe changing the halt API to include the parents' status is something @facontidavide would consider? (or providing both) E.g. void halt(const Code & parent_status)

I really don't like the idea of adding new control flow nodes, I don't even like the ones we have already. But that is also an option. The simpler option is to change the halt() API to give the status code of the parent and I think then we're done? We just override halt in these condition nodes to look at that code and reset or not depending if FAIL or SUCCESS.

Again, I think we should still add the on_halt() for the BT service/action nodes for completion's sake, even if currently unused. That could be part of a PR that we can just merge right now if someone would submit one (e.g. add virtual on_halt() in Bt action/service nodes and have it called first in their halt overrides).

@facontidavide
Copy link
Contributor

I haven't read the entire thread, sorry, so I am just answering the line that contains my name 😛

Even if this opens the door in my opinion to nasty anti-patterns, I may provide a const reference/pointer to the parent node. Still, this is bad small to me.

I am trying to understand the problem, but the discussion is spread among multiple threads and it is very hard for me to catch up and give a suggestion about the best pattern.

Do you want to reach out and talk with me in a telco?

@facontidavide
Copy link
Contributor

There is a LOT I am missing in your discussion, but there is something that I need to reming you my friends,

Please look at this code:

https://github.com/BehaviorTree/BehaviorTree.CPP/blob/master/src/control_node.cpp#L44-L60

Please note that:

  • A node is set to IDLE by its parent. It doesn't need to set itself to IDLE in halt().
  • halt() is called only on RUNNING child, not children that returned SUCCESS or FAILURE.

So, when I read what @naiveHobo wrote, I am confused:

If all children return failure, don't halt any child and return failure (unlike fallback)

The halt() callback is not actually executed (in theory), because no children should be in RUNNING node, right?

@SteveMacenski
Copy link
Member Author

SteveMacenski commented May 20, 2020

Indeed, only line that you do need. Either of those would work: state or const ref, we just need to know if its returning with success or failure. The tl;dr of what we're trying to do is get a sense of "why" its being halted, because in certain contexts, we may want to mess with the state of co-children nodes based on if any of them succeeded. Our example is having a BT condition with 3 children: IsNewGoalAvailable, IsReplanningDurationMet, and IsTravelledEnoughToReplan. For these, if any return true, we want to reset the time and distance metric for those Bt nodes such that we don't have 2 replans super close to each other due to one having been met at t = 0.45 and the other at t = 0.46.

@SteveMacenski
Copy link
Member Author

SteveMacenski commented May 20, 2020

halt() is called only on RUNNING child, not children that returned SUCCESS or FAILURE.

Ugh, back to the drawing board fellas. But condition nodes only ever return success or failure my impression was. I suppose we could have them in the running state between triggers such that halt() is called on them when one succeeds. Because this is being ticked constantly, it would be defacto always in the running state except the cycle 1 is triggered. Basically we just need to make sure they're never idle.

@SteveMacenski
Copy link
Member Author

SteveMacenski commented May 20, 2020

I'm pretty sure though that we removed any setting of IDLE manually like you suggest. If we've missed some, its by accident. We're not setting the status in halt() we're needing to reset some internal variables (a timer and a distance)

@facontidavide
Copy link
Contributor

facontidavide commented May 20, 2020

To be pedantic.

Condition is by definition (contract/semantic, how you want to call it) something that can NOT return RUNNING.
Something that may return RUNNING is an Asyncronous Action.

halt() is not cleanup, it is halting a RUNNING Node.

BUT if you want multiple children in RUNNING state at the same time, we go into the dark territory of the ParallelNode. Othe control Nodes can not handle more than a single RUNNING child.

If you want a callback different than halt() ( something like resetNode( parent_status )?) to be called when the parent has done... we may talk about it ;)

@SteveMacenski
Copy link
Member Author

If you want a different callback to be called when the parent has done... we may talk about it ;)

Are you proposing that children can get a callback triggered when its parent completes? That could also do it.

I was operating on false intel. I was under the impression that halt was called on all children nodes of a parent when it ended, regardless of state. I still maintain that halt(const Code & code) or halt(const Node * parent_node) are good ideas, even if they're not useful in our current context.

I suppose we could also just make a custom control node that calls halt on everyone, but when you talk about anti-patterns, needing custom control flow nodes seems to scream that to me.

@gimait
Copy link
Contributor

gimait commented May 20, 2020

Thank you for the clarification @facontidavide. I think through the discussion we forgot about how this works.

It's right that haltChildren only calls halt on running nodes, but set's to IDLE all children nodes, no matter their status.

Following this, I understand why I thought in the first place that it was a good idea to do the initialization of the condition nodes that we are working on IDLE (and why it works fine with the onGoalUpdated node). Plus thanks to this behavior, we can just use the tree structure @facontidavide suggested us couple of weeks ago, without need for any extra nodes.

I don't think we need any halt of any type or extra control nodes. We also don't need to know the state of the parent nodes. Just initialize the condition nodes in idle.
And find a way of triggering the planning when its parent control node is ticked first.

(EDIT: just to avoid confusion with my previous comment, I had in my head that the halt was called on these nodes after talking with @naiveHobo on #1705, and that is why in my previous comment I said we needed an extra control node. With the responses from @facontidavide I got my head back in track and I rectify that: we don't need more control nodes)

@naiveHobo
Copy link
Contributor

Ah, sorry for the confusion.

@facontidavide We are currently resetting the state variables in tick() by checking if the current state of the node is IDLE. This should work in general but I think a virtual resetNode() method is a good addition. This will save the nodes from performing an unnecessary comparison at every tick.

@gimait
Copy link
Contributor

gimait commented May 20, 2020

Wait, another rectification, we do actually need a new control node for this to work.
If we initialize on idle, and the halt children sets all children to idle, we would initialize every time we go through all the conditions even though they failed with the current fallbacks. That was right.

@SteveMacenski
Copy link
Member Author

I disagree with an outright resetNode() API.

If we initialize on idle, and the halt children sets all children to idle, we would initialize every time we go through all the conditions even though they failed with the current fallbacks. That was right.

Glad you came to that yourself ;-) its the exact same problem we were in before with halt, we don't know the state of the parent or other nodes. In fact, this clarification has made things dramatically more complicated. I think the only way forward is with a custom control flow node now that calls halt on all children when any succeed + needing the halt(...) API udate.

@naiveHobo
Copy link
Contributor

I think the only way forward is with a custom control flow node now that calls halt on all children when any succeed

Agreed.

needing the halt(...) API udate.

I don't understand how the halt(...) API would help. ConditionNode doesn't allow the halt() method to be overridden.

I disagree with an outright resetNode() API.

Can you explain why? From what I can understand, resetNode() will allow users to just define what to reset. When to reset can be taken care of by the parent node similar to how the the parent takes care of halting its children. The only thing we'd have to figure out is where to call resetNode().

@SteveMacenski
Copy link
Member Author

SteveMacenski commented May 21, 2020

@facontidavide override final for condition node you're kill'n me man 😉

Can you explain why?

Huge gateway for anti-patterns. We're skirting the edge of what should be allowable, that just encourages bad behavior with maintaining and tracking state from the ticking. You're asking people to ignore the statuses and externally reset.

@naiveHobo
Copy link
Contributor

I think the only way forward is with a custom control flow node now that calls halt on all children when any succeed

If we're going for a custom control flow node, I think it's enough to solve our problem. This control flow node has to set all children to IDLE when any child return success. If all children return failure, then no child is set to IDLE. The state reset can happen in tick() when the node is IDLE like here:

https://github.com/ros-planning/navigation2/blob/6b79b34a08e4ef237d2c21fe6a3e4234b1c88334/nav2_behavior_tree/plugins/condition/goal_updated_condition.cpp#L40

@gimait
Copy link
Contributor

gimait commented May 21, 2020

If we're going for a custom control flow node, I think it's enough to solve our problem

I'm with you there. With that lone thing and doing the initialization with the idle state we are good.

This should work in general but I think a virtual resetNode() method is a good addition.

I'm not sure what is the advantage of this one though, when and how should reset be called?

needing the halt(...) API udate.

I still don't know why we need the halt thing.

@SteveMacenski
Copy link
Member Author

Keep in mind the control node will call halt on children running on exit: https://github.com/BehaviorTree/BehaviorTree.CPP/blob/0f51631dadd7186a415f6c86f01066fadbabdb7c/src/control_node.cpp#L33 so then would we need to override halt to not do this and then make a new haltAllChildren to halt everything regardless of status? I suppose maybe if someone could do a brief workflow of how this will work, that would help.

If we start with idle and tick some and all fail, the control node will exit fail and then all children will be set back to idle, so that's not helpful for retaining state. Are you thinking about putting an if/else on the control exit code as to whether call halt/idle on them? But there's no running state for condition nodes, so really those condition nodes are always idle, success, or failure. So I'm not following the logic.

@SteveMacenski
Copy link
Member Author

@shrijitsingh99 @naiveHobo @gimait given that we now have all of this in, should we look at this, or does the speed controller completely supersede this and we can close the ticket?

@SteveMacenski
Copy link
Member Author

Nudge nudge 😉

@gimait
Copy link
Contributor

gimait commented Jun 30, 2020

😉 I think you can close it, if the speed controller is enough. I don't think we managed to find a solution to make the bt react as we wanted. At least I think I couldn't (when I tried I ended up needing custom and very specific control nodes for this to make it possible, I don't think I found a good solution at the end).
I don't know if @naiveHobo or @shrijitsingh99 were working in any fixes for it?

@SteveMacenski
Copy link
Member Author

OK, we should consider what the new default BT should be though. @naiveHobo especially since you've been working on this area, should we change the tree before we release to foxy? Now's the best time

@naiveHobo
Copy link
Contributor

Ah, yes I stopped looking into ways to combine the rate and distance controller. As @gimait said, most solutions were very specific custom nodes.

I think the speed controller can be a direct replacement for the rate controller. It has the minimum abilities of the rate controller in that it can tick at a constant rate by keeping the minimum-maximum rate equal. And by incorporating the current smoothed speed, it internally has a sense of the distance traveled such that we can set the minimum-maximum speed parameters to ensure that it only ticks after a certain distance is traveled. Anything in between is a natural combination of both the rate and distance controllers.

@SteveMacenski
Copy link
Member Author

OK, should we create a new BT or update the existing BT to use the speed controller for the default tree?

@shrijitsingh99
Copy link
Contributor

I have been out of this discussion for a long time, but I do believe SpeedController is enough as mentioned by everyone.
I think we should add this to the default tree to encourage people to use it, else it might go unnoticed.

@SteveMacenski
Copy link
Member Author

Exactly. Can you open another ticket to discuss a new default behavior tree? We should discuss what we want from it now that we have all these new behavior tree nodes to work with that @gimait @naiveHobo worked on building - lets look at it with fresh eyes

@shrijitsingh99
Copy link
Contributor

Done #1842
Let's continue the discussion there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants