Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new time sync mode SyncFrame #1698

Merged
merged 11 commits into from
May 27, 2022

Conversation

rhaschke
Copy link
Contributor

... to sync TF lookups within the current rendering frame.
Fixes #1693. @AndreasR30, please review.

@dgossow, You implemented the experimental TimePanel functionality 9 years ago. Unfortunately, there are no PR or code comments for the changeset 8c4cccf and I had a hard time guessing the intended functionality.
I understood that SyncExact and SyncApprox are intended to sync against a time signal provided by a specific source Display. For example, all displays deriving from PointCloudCommon call emitTimeSignal() and thus allow this functionality. However, I didn't get the intention of the SyncApprox mode. TF lookups perform an approximate lookup anyway, aren't they? So what's the role of SyncApprox?
As this mode essentially doesn't work (throwing lookup errors on the console), I'm tempted to remove it completely.

screen-recording.mp4

@AndreasR30
Copy link
Contributor

AndreasR30 commented Jan 25, 2022

Hello @rhaschke,
Thank you for this pull request. Unfortunately I don't think that it works as intended it real world scenarios (see comment above).

I was thinking about to use the cache_ in getTransform to solve this problem. However, this would not handle the case correctly:

There is a TF tree odom -> base_link -> chassis, where base_link and chassis are fixed. Then the cache_ would make a snapshot of odom -> base_link. But when odom -> chassis is requested then there is a cache_ miss and a TF with a different stamp could be obtained in lookupTransform().

I can't currently think of a way to solve this here in TimePanel/FrameManager. What about using #1692 and make it only an option in the Global Options?

@rhaschke
Copy link
Contributor Author

Unfortunately I don't think that it works as intended it real world scenarios

Could you please provide python code to mimic your real-world scenario?

What about using #1692 and make it only an option in the Global Options?

I would prefer to find a solution that processes TFs in a separate thread.

@AndreasR30

This comment has been minimized.

@AndreasR30

This comment has been minimized.

@rhaschke rhaschke force-pushed the fix-time-sync branch 2 times, most recently from fcf7113 to ac6ad2b Compare January 26, 2022 14:30
@rhaschke
Copy link
Contributor Author

Thanks for providing the example code. I think I found a good solution now using an adaptable sync_delta_. Of course, during the first few iterations, there might be failing frame lookups, but sync_delta_ should adapt rather quickly.

@AndreasR30
Copy link
Contributor

AndreasR30 commented Jan 26, 2022

@rhaschke

I had not time to test your latest commits yet. But I can tell you where I use SyncApprox in order to understand it better.

SyncApprox can be useful to delay the Transforms with ros::Time(). The latency is obtained from the message delay of the selected display. I sometimes use it in the following scenario:

I have a PointCloud2 with 10Hz and a CNN which takes quite some time to process the cloud. In order to visualize them in a synchronized way in Rviz, I have a node which subscribes to both and republishes both (on a different topic) as soon the detection arrived. This works well. However, things using ros::Time() like RobotDisplay or Grid, are not at the right position in Rviz w.r.t. the point cloud due to its large latency:

SyncApprox helps to delay the TFs with stamp ros::Time() similar to the messages of the selected display:

pc-2022-01-26_18.41.33.mp4

Note how the car jumps to the center of the lidar rings when enabled. SyncExact does not help here, because the TFs are published in a much higher frequency than the message and they are usually not in sync with the selected message.

@rhaschke
Copy link
Contributor Author

rhaschke commented Jan 26, 2022

I'm not sure, I understood your scenario. Let me try to summarize with my own words:

  • There is a FAST PC2 publisher (10Hz)
  • There is a CNN processor, operating on FAST PC2, and republishing it as SLOW PC2 together with TF frames + markers of recognized objects. I guess, this republisher (re)uses the original timestamp of the received input (FAST PC2) message?
    However, because processing is slow this timestamp is - say a second - in the "past" w.r.t. to ros::Time::now().
  • rviz publishes both FAST PC2 and SLOW PC2? And it syncs against SLOW PC2?

Is there some documentation on the syncing behavior that I missed?
It would be great, if you could provide a rosbag of your example (as well as the corresponding rviz config).

current_delta_ should denote the currently measured time offset
sync_delta_ should denote the offset used to compute sync_time_
@rhaschke rhaschke force-pushed the fix-time-sync branch 2 times, most recently from 6fe9120 to e03a00b Compare January 27, 2022 08:37
@rhaschke
Copy link
Contributor Author

@AndreasR30, did you have time to test this PR? Please also comment on #1698 (comment). Thanks.

@AndreasR30
Copy link
Contributor

There is a FAST PC2 publisher (10Hz)

Yes

There is a CNN processor, operating on FAST PC2...

Yes

... and republishing it as SLOW PC2 together with TF frames + markers of recognized objects.

No, the synchronizer is a separate node, which subscribes to original PC2 from the sensor and to Markers of CNN node. No TFs are republished! It republishes both, Markers and PC2 on a different topic when both are available (e.g. prefixed with /synced/*). Now the Markers and the PC2 do match each other in the visualization, when I use those /synced/* topics in Rviz. With original topics, the Markers were always behind the PC2 because of CNN's latency of ~200ms.

Now the only problem left, is when the Fixed Frame is set to odom: The PC2 and the Marker from synced topic lag behind the robot because they are world fixed and they are quite old. One can see this, when the RobotModel is no more in the center of the lidar rings. Setting to ApproxSync helps here because it uses for the robot model no more the most recent stamp (by means of ros::Time()) but a stamp, which lags behind ros::Time::now() by current_delta_, which is obtained from the PC2 latency (in our case ~200ms) of the corresponding PC2Display.

I guess, this republisher (re)uses the original timestamp of the received input (FAST PC2) message?
However, because processing is slow this timestamp is - say a second - in the "past" w.r.t. to ros::Time::now().

Yes, but republisher is a separate node, the sync node.

Is there some documentation on the syncing behavior that I missed?

No I found this feature to be useful by coincidence.

It would be great, if you could provide a rosbag of your example (as well as the corresponding rviz config).

I will try to make one or a synthetic example in Python. But it could could take some days, because I'm busy.

@AndreasR30
Copy link
Contributor

Here is an synthetic example showing when SyncApprox can be useful. It simulates a PC2 with latency of 200ms. Furthermore it simulates the RobotModelDisplay with a Marker. With synchronization off, the robot won't be in the center of the rings. With synchronization approx, the robot will be in the center of the lidar rings.

Caution: There is a bug! For sync approx the fixed_frame_id_ is needed, which is only set in setFixedFrame() callback. So you have to change Fixed Frame to something else and go back to map in order to make SyncApprox work!

sync_approx_test.zip

@AndreasR30
Copy link
Contributor

AndreasR30 commented Jan 30, 2022

Your latest commits seem to work good with the Python script from my previous comment. Aproximate Synchronization works well and the bug mentioned before seems to be eliminated.

However, with the first Python script posted in this PR (unfortunately the "Copy Link" feature does not work so I can't link it) I still see the race conditions (even after few seconds) with Synchronization set to Frame. I cherry-picked your commits on the noetic-devel branch as I only have noetic available now. But I don't think that this causes any difference.

@rhaschke
Copy link
Contributor Author

rhaschke commented Feb 3, 2022

Thanks for providing the script (#1698 (comment)). I only found time to shortly look into this today. I noticed the following problems:

  1. Your script doesn't work with the released version of rviz, which expects perfectly matching timestamps between PC and TF.
    My proposed version relaxes this constraint and makes your script work in SyncApprox mode, which is good.
  2. Your script doesn't really need SyncApprox: Syncing with Exact mode perfectly displays everything relative to the (past) timestamp of the PC.

Thus, I still don't see a need for the SyncApprox mode.

@rhaschke
Copy link
Contributor Author

rhaschke commented Feb 3, 2022

There is a bug! For sync approx the fixed_frame_id_ is needed, which is only set in setFixedFrame() callback.
So you have to change Fixed Frame to something else and go back to map in order to make SyncApprox work!

I cannot confirm this statement. The fixed_frame_id_ is correctly configured at startup.
However, I noticed that rviz doesn't handle latched marker messages correctly (not displaying any marker). Is that the actual problem you observed?

@rhaschke
Copy link
Contributor Author

rhaschke commented Feb 3, 2022

There is a bug! For sync approx the fixed_frame_id_ is needed, which is only set in setFixedFrame() callback.

You are right: While fixed_frame_id_ is looked up during initialization, it actually yields an invalid value of zero at that time because the TF buffer didn't process any messages yet. Fixing this issue (or using your workaround) actually resolves the error messages I observed so far in SyncApprox mode. My new code doesn't use this variable anymore and thus doesn't exhibit the issue as well.

@AndreasR30
Copy link
Contributor

For me the PR looks good. The earlier synthetic example of this PR works (unfortunately I can't link it). However, it takes up to 10 seconds until it works properly. As long there is a TF with a larger delay incoming compared to sync_delta_ the glitches are still there and look even worse than with synchronization off. Not sure whats the reason for it. So this mode is not really the same as using just one thread for TF buffer as proposed in #1692. But I think it is ok.

rhaschke added 3 commits May 27, 2022 18:16
In SyncFrame mode, we use ros::Time::now() for syncing, which is typically
more recent than any TF stamp available, causing extrapolationn exceptions.
To avoid that, we date back sync_time_ by a small amount sync_delta_, that
is determined from observed lookup time differences.
There is no need to explicitly call _getLatestCommonTime().
First, this only succeeds if both frames have exactly the same time stamp.
Second, lookupTransform() will throw an extrapolation exception anyway,
if the lookup fails because the requested timestamp is too new.
rhaschke added a commit to rhaschke/rviz that referenced this pull request May 27, 2022
- New mode SyncFrame, syncing TF lookups within a visualization frame
- Remove experimental status of time syncing
- Simplify and document code
rhaschke added 3 commits May 27, 2022 18:27
... to keep existing config files valid
- New mode SyncFrame, syncing TF lookups within a visualization frame
- Remove experimental status of time syncing
- Simplify and document code
@rhaschke rhaschke merged commit 02159c7 into ros-visualization:noetic-devel May 27, 2022
@rhaschke rhaschke deleted the fix-time-sync branch May 27, 2022 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants