Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Collision Monitor] Performance issue #3402

Closed
HovorunB opened this issue Feb 9, 2023 · 10 comments
Closed

[Collision Monitor] Performance issue #3402

HovorunB opened this issue Feb 9, 2023 · 10 comments
Assignees

Comments

@HovorunB
Copy link
Contributor

HovorunB commented Feb 9, 2023

Hello! Recently when using the collision monitor I discovered that it adds too much latency to the cmd_vel path. The problem to that was waiting for the transform from data source frame to base frame on every callback. I saw that you are using an Advanced API of lookup_transform which relies not only on getting the transform data_source_frame->base_frame but also at fixed_frame->base_frame. This in our case adds latency (we publish fixed_frame->base_frame at 30Hz which adds at most ~33ms to the latency) Publishing fixed_frame->base_frame faster will reduce the latency but i feel like this is more of a workaround rather than a solution.

I wanted to ask whether this is expected and collision monitor could provide such default latency. Maybe you have some thoughts on how to make it run faster?

@SteveMacenski
Copy link
Member

SteveMacenski commented Feb 9, 2023

Please provide some reference to your claim that relies not only on getting the transform data_source_frame->base_frame but also at fixed_frame->base_frame and what specific lines you're pointing to. TF doesn't have "special" frames that it would use as a global reference frame of interest for some "advanced" operation. The name of frames are simply strings which it doesn't know or care about in particular what is represented under what. So if that's happening, its something we're doing in Nav2, I believe, which you should be able to point out.

This in our case adds latency (we publish fixed_frame->base_frame at 30Hz which adds at most ~33ms to the latency) Publishing fixed_frame->base_frame faster will reduce the latency but i feel like this is more of a workaround rather than a solution.

I know this isn't precisely what you said, but are you sure that the issue here isn't that your static transforms from base_frame -> sensor_frame isn't actually throttling the system to 30 hz? I would agree publishing your global localization system (e.g. map->odom) faster isn't a good solution to a collision monitor specific problem -- but if that's actually what's happening, that means you have that same delay setup for anything transforming frames (e.g. your entire perception pipeline, getting robot poses, etc) that is not unique to just the collision monitor. As such, its really a system decision if you should increase this publication rate, and I'd argue, yes, you should, iregardless of the Collision Monitor due to the impact you describe where we actually require that global localization transformation in updating the costmaps or sensor processing.

But like I said, I'm not totally convinced (from my first comment above) that we are doing as you describe in the collision monitor -- so please provide some context (e.g. lines of code + proof that its doing something in map/odom frame) that @AlexeyMerzlyakov and I can start to use to analyze the situation. We definitely should not be doing what you describe, if we are indeed doing it!

@HovorunB
Copy link
Contributor Author

HovorunB commented Feb 9, 2023

Thank you for the reply!

Please provide some reference to your claim that relies not only on getting the transform data_source_frame->base_frame but also at fixed_frame->base_frame and what specific lines you're pointing to.

I'm pointing to This implimentation
which is called from nav2_collision_monitor/src/pointcloud.cpp and then robot_utils.cpp

The way I understand is that it was made like this in order to publish relevant information about the obstacle:
Imagine robot that has a camera on the front side is driving with high speed towards the obstacle. Then it receives sensor data that the obstacle is 2m away. By the time e.g. pointcloud is published into robot's base frame the obstacle would be already closer than 2m away. Therefore we pass fixed_frame to lookup_transform in order to publish the pointcloud relative to the base frame of robot not at the current time, but at the time the pointcloud was captured.

I know this isn't precisely what you said, but are you sure that the issue here isn't that your static transforms from base_frame -> sensor_frame isn't actually throttling the system to 30 hz?

As a workaround in our fork we omited fixed_frame when calling lookup_transform, and now the output of collision monitor is able to keep up at least till 100hz (before this - 20hz was an issue), so i doubt that the problem lies in base_frame -> sensor_frame transform.
Moreover when we changed transform_tolerance parameter to 0, we were getting extrapolation errors from which we can see that while trying to get transform sensor_frame->base_link it fails to get specifically odom->base_link transform
image

I'm not 100% sure that this is exaclty what happens, but since omiting fixed_frame when calling lookup_transform reduced the latency dramatically I'd blame getting the odom->base_link transform as main source of latency

@SteveMacenski
Copy link
Member

SteveMacenski commented Feb 9, 2023

I see! So its our use of TF. @AlexeyMerzlyakov this seems like a bug for you to handle being primarily your development package. I'm assigning you the ticket.

I see what the intent is with that API use - to get the data from time a to time curr over the transforms to have the best estimation possible. That's actually a recent change #3245. I suppose that is more accurate, but come at this cost. Maybe we make it an option which to use or revert to using the less accurate version? The process function is triggered in the callback of the received message, so we're virtually only delayed in time by the communication latency of a Twist message + any processing previous to getData (which is virtually none except getting data from other sources).

@HovorunB if you check for the time difference between curr_time and the data header times, what are you seeing? I'm guessing measured in hundreds or thousands of nanoseconds. If so, 1000ns over 100m/s traveling robot = 0.1mm. Virtually no effect even at rid. high rates of travel. We can just use the older API without fixed frame and remove this concern - if the time differences are low.

Moreover when we changed transform_tolerance parameter to 0

That's a bad call. I can go into more detail, but the error message you sent is what I would have predicted.

I'd blame getting the odom->base_link transform as main source of latency

Oh, that's not the global localizer. Your odometry only publishes at 30 hz?! Yeah, that's way too slow, regardless of Collision Monitor. That should be closer to 100hz, at least, for reasonable state estimation via RL, Fuse, etc. That's definitely a contribution to this problem you should resolve. Again, that will impact your entire perception pipeline.

@SteveMacenski
Copy link
Member

Thank you for bringing this to our attention and explaining the situation to me in more detail

@AlexeyMerzlyakov
Copy link
Collaborator

AlexeyMerzlyakov commented Feb 10, 2023

Yes, the situation is known since CM development times. This lookupTransform() API is used intentionally, as @HovorunB mentioned to consider time swifts between latest data source received and current time:

The way I understand is that it was made like this in order to publish relevant information about the obstacle:
Imagine robot that has a camera on the front side is driving with high speed towards the obstacle. Then it receives sensor data that the obstacle is 2m away. By the time e.g. pointcloud is published into robot's base frame the obstacle would be already closer than 2m away. Therefore we pass fixed_frame to lookup_transform in order to publish the pointcloud relative to the base frame of robot not at the current time, but at the time the pointcloud was captured.

Typical rates for PCL or laser scan (e.g. TurtleBot3) are ~5Hz, which is 200ms maximum time between sensor data released and current CM operation time. If robot is moving with the speed of 10m/s (36 km/h or 22 miles/h, which I see more than realistic scenario for different robot cases, e.g. delivery systems), the obstacle will approach towards the robot at this time by 2 meters. So, in cases of fast moving robots we need to consider these shifts.

if you check for the time difference between curr_time and the data header times, what are you seeing?

During the checking of the problem, I've already made these measurements on TB3 Gazebo simulation, so may I also answer this question? I've got even distribution of diffefence times over [0..200ms] range for laserscan and PCL, as depicted below (OX - ROS time, OY - time difference between curr_time and latest received sensor time):
time_diff_1
time_diff_2

The results are as expected: while new sensor data is not received, the time shift linearly increased. After it, we are addressing updated data and time diff resets to 0. For both laserscan and PCL which have publication time rates at ~5Hz in the experimentation, the results are the same.

The average time for lookupTransform(time considered) API was compared to simple use-case of lookupTransform(time ignored), and got the following results:

lookupTransform(time considered) lookupTransform(time ignored)
Avg time, us 18991 49
Max time, us 92719 788
Min time, us 36 34

So, the way to use simple lookupTransform() API is reasonable.
But since this is not unacceptable for the cases with fast moving robots, I'd might suggest a new ROS-parameter for Collision Monitor, handling which lookupTransform() API will be used. Developer in this case could choose between two options:

  • Better accuracy in cost of performance (fast moving robots)
  • Lower accuracy but best performance (slow robots)

Any other ideas?

@SteveMacenski
Copy link
Member

Typical rates for PCL or laser scan (e.g. TurtleBot3) are ~5Hz, which is 200ms maximum time between sensor data released and current CM operation time. If robot is moving with the speed of 10m/s (36 km/h or 22 miles/h, which I see more than realistic scenario for different robot cases, e.g. delivery systems), the obstacle will approach towards the robot at this time by 2 meters. So, in cases of fast moving robots we need to consider these shifts.

Most laser provide data at 10-20-40hz or higher, the TB3 is not a representative example. I can understand the issue if we use stale data - even though we process the collision monitor at the controller frequency (20+ hz).

Are those increased delays on the table's left column due to the TF tree having the transform available to the global frame as @HovorunB points out? I'm surprised to see that be the case, if our TB3 model publishes its odometry transform at a usual rate of ~100hz. But, woof, it looks like its 30hz 🤦. If we update that to be more appropriate, does that lower our issue?

A parameterization seems reasonable. Though, I think its worth documenting this phenomenon explicitly to help people understand. I do think that having reasonable odometry rates is the "real" solution, its worth supporting less well configured or hobbyist platforms where we can

@AlexeyMerzlyakov
Copy link
Collaborator

AlexeyMerzlyakov commented Feb 13, 2023

Are those increased delays on the table's left column due to the TF tree having the transform available to the global frame as @HovorunB points out?

Yes, lookupTransform(time considered) API calls canTransform() that waits until valid source_id->fixed_id->target_id transforms to be available. This is what causes why we need to wait for the latest odom message to came later than current time (in order to have no extrapolation into the future that, if I correctly understood TF2's code, may cause the exceptions and well-known error messages [1], [2]).

If we update that to be more appropriate, does that lower our issue?

Yes. The lookupTransform() time is directly proportional to the time between subsequent odom publishes:
odom_rate_check

The slope is near 0.5 which is 1/2 from maximum (odom) time we have to wait, which is turned out to be quite expected.

A parameterization seems reasonable. Though, I think its worth documenting this phenomenon explicitly to help people understand. I do think that having reasonable odometry rates is the "real" solution, its worth supporting less well configured or hobbyist platforms where we can

Well, I am ready to prepare the patch with new parameter for Collision Monitor and documentation update as well.

@HovorunB
Copy link
Contributor Author

Thanks for the comments about odom frequency! We are going to investigate and try publishing it faster 🚀

@HovorunB if you check for the time difference between curr_time and the data header times, what are you seeing?

I'm getting up to 200ms, but our robot moves with the speed at most 0.7 m/s, so we can expect an error up to 14cm which for our case of using the FootprintApproach model is totally fine. So new parameter whether to use advanced or simple lookup transform API sounds great! Looking forward.

@SteveMacenski
Copy link
Member

Sounds like a solution!

@SteveMacenski
Copy link
Member

Terminally to be merged, thanks for bringing up the issue @HovorunB !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants