Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Physics CameraSensor lockstep #2746

Merged
merged 17 commits into from
Jul 10, 2020
Merged

Physics CameraSensor lockstep #2746

merged 17 commits into from
Jul 10, 2020

Conversation

mabelzhang
Copy link
Collaborator

@mabelzhang mabelzhang commented Jun 2, 2020

Part of issue #2736. I will keep all the lockstep features in the respect_fps_gz11new branch (instead of merging anything to gazebo11 or gazebo9) until everything lockstep-related is completed, because these features will require changes that are ABI-incompatible, which we will deal with after the feature is working.

This PR is for lockstep on CameraSensor class. Most of the mechanics are from respect_fps branch pulled from the old BitBucket PR https://bitbucket.org/osrf/gazebo/pull-requests/2502/make-sure-cameras-fps-is-strictly-applied
When ready lines in gazebo11 branch (added in this BitBucket PR to help ABI-compatibility https://bitbucket.org/osrf/gazebo/pull-requests/3180/use-direct-api-call-to-scene-setposemsg-to/diff ) are uncommented to set pose directly.
@scpeters: Are the When ready lines backward compatible?

Otherwise, backward compatibility is mostly handled later when I extended to other sensors (#2761 ). The only thing here is that the new test is separated out from the CheckThrottle test into CheckThrottleStrictRate.

The added _strictRate parameter to ServerFixture for tests might be temporary once we extend to all sensors. We either need this parameter, or have lockstep tests load a SDF with the custom <strict_rate:value> tag, instead of following the current tests' pattern to spawn a world. I think loading a SDF might be cleaner, since the <strict_rate:value> tag is a custom tag and we might not want to have that in ServerFixture.

To enable, pass in --lockstep on the command line.

To test:
Quantitative test is in StrictUpdateRate.

Qualitative test:
The new SDF contains two cameras, one high frame rate high resolution, one low frame rate low resolution. A pendulum is added to visually verify that physics is slowed down (added after the screenshots).

500 Hz 1280x720 image drops real time factor to 0.3:
Screenshot from 2020-05-14 20-26-48_500fps

30 Hz 320x240 image real time factor is at 1.0:
Screenshot from 2020-05-14 20-55-38_30fps

The RTF and the pendulum slowing down show that physics is being slowed down to get the 500 fps.

… Add test world file. New test is passing.

Signed-off-by: Mabel Zhang <mabel@openrobotics.org>
Signed-off-by: Mabel Zhang <mabel@openrobotics.org>
Signed-off-by: Mabel Zhang <mabel@openrobotics.org>
Signed-off-by: Mabel Zhang <mabel@openrobotics.org>
Signed-off-by: Mabel Zhang <mabel@openrobotics.org>
Copy link
Contributor

@iche033 iche033 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, overall the strict rate param works for me. I tried changing the world's real time update rate and the camera sensor rate is respected.

On thing I noticed is that these change may have some impact on the RTF. I tried spawning a PR2 model (with no strict rate set) and I get ~0.86 RTF in this branch. But without these changes, I get ~0.95 RTF so close to ~0.1 difference. Maybe we should look at where the slow down occurs later before merging back to the gazebo11 branch.

test/integration/camera_sensor.cc Outdated Show resolved Hide resolved
test/integration/camera_sensor.cc Outdated Show resolved Hide resolved
gazebo/test/ServerFixture.hh Outdated Show resolved Hide resolved
}

/////////////////////////////////////////////////
TEST_F(CameraSensor, CheckThrottleStrictRate)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting test failures 50% of the time. Not sure if increasing the tolerance is a good idea or not.

[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from CameraSensor
[ RUN      ] CameraSensor.CheckThrottleStrictRate
[Msg] Waiting for master.
[Msg] Connected to gazebo master @ http://127.0.0.1:11345
[Msg] Publicized address: 192.168.86.31
[Dbg] [ServerFixture.cc:203] ServerFixture load in 1 seconds, timeout after 600 seconds
[Dbg] [camera_sensor.cc:365] timer [5.119] seconds rate [488.377] fps
/home/osrf/code/gz_ws/src/gazebo/test/integration/camera_sensor.cc:367: Failure
Expected: (rate) > (updateRate * (1 - tolerance)), actual: 488.377 vs 490

Copy link
Collaborator Author

@mabelzhang mabelzhang Jun 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I actually ended up increasing to 0.06 (from 0.02) in another branch... 0.06 being arbitrary for the test numbers on my computer... I thought maybe my computer is slow, but the point of this PR was to make the update rate work regardless of the speed of the machine... so it seems that it isn't quite hitting the rate that's asked of it? Though 488 out of 500 isn't bad, I've gotten as low as 473 before, and that's where 0.06 came from. In normal usage, people might never ask for 500 fps, so it might be okay to raise the tolerance. But then the bit about how this whole lockstep idea is so that any computer can hit the update rate isn't quite satisfied. So, I'm not sure what the original author intended with this tolerance. I can raise it irresponsibly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raised to 0.06 in c134381

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be timing issue in the test itself. dt is calculated by getting time directly from world while imageCount is from function callback which would be affected by transport delay. Ideally we compute dt based on the timestamp of the first and last image message, which could give us a better estimate of rate.

Not sure if it's a lot of work to verify this. I'm also ok with just increasing the tolerance since the test also passed on jenkins.

Copy link
Collaborator Author

@mabelzhang mabelzhang Jun 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ed6fc04 and bd0085b try to get a time based on the images received, which passed the test 10 times with tolerance lowered to the original 0.02.
I didn't find a place where the timestamp is delivered - it seems rendering::Camera::ConnectNewImageFrame() returns the image in a char *, and doesn't return a stamp. Another option is rendering::Camera::LastRenderWallTime(), but that is wall time and not sim time.
So currently, I am getting the first timestamp manually by checking in the waiting loop for imageCount == 0, which means an iteration before the first image is received. It's still not ideal, in that it might be making the test easier to pass.
I'm consistently getting this rate:
[Dbg] [camera_sensor.cc:354] timer [4.999] seconds rate [500.1] fps
Let me know if you think this change is reasonable, i.e. not too lenient...

@mabelzhang
Copy link
Collaborator Author

mabelzhang commented Jun 10, 2020

Re time difference:
There are a few main places that might have affected the timing.

  1. In the main loop in gazebo/Server.cc, rendering::wait_for_render_request("", 0.100); replaces common::Time::MSleep(1);
  2. Set pose is using a new route? This is the 3 When ready blocks that I uncommented, related to updateScenePoses in gazebo/physics/World.cc and gazebo/rendering/Scene.cc. I don't know what these do differently under the hood. @scpeters I think has a better idea?
  3. The preRenderEnded event trigger and the chain of action it sets off in CameraSensor.cc. This is the core of the PR.

The timing difference might explain the SpeedThreadIslandsTest.DualPR2QuickStep test failures on UbuntuNVIDIA CI. I looked at the CI for d80846d and 6afd412, and the 2 new failures I think are caused by this PR are VisualProperty.VisualMessage andSpeedThreadIslandsTest.DualPR2QuickStep.
https://build.osrfoundation.org/job/gazebo-ci-pr_any-ubuntu_auto-amd64-gpu-nvidia/175/
https://build.osrfoundation.org/job/gazebo-ci-pr_any-ubuntu_auto-amd64-gpu-nvidia/181/

The QuickStep is obvious by its name. The VisualMessage one fails when the transparency of a red box is set to 1 and the camera is expected to see the ground which isn't red, but it's still seeing red. Could be a timing problem if it's too slow.

I was hoping the latter commit with its if-conditionals on useStrictRate to separate between strict rate and original behavior would fix those tests, but it didn't...... which means there might still be something that isn't respecting the original behavior even when strict_rate is not set(!). I believe I caught all the places that can have an if-statement. The only remaining big things that I can't make a decision on are the items above.

Signed-off-by: Mabel Zhang <mabel@openrobotics.org>
Signed-off-by: Mabel Zhang <mabel@openrobotics.org>
@iche033
Copy link
Contributor

iche033 commented Jun 10, 2020

With strict rate disabled, one difference between now and before is that we use updateScenePoses to copy all the pose msgs to the rendering side. Before, we were using gz transport. So the difference is that the physics now needs to wait for pose data to be copied as opposed to just publishing them and forget. I don't know if that's enough to cause a noticeable difference in performance though.

Signed-off-by: Mabel Zhang <mabel@openrobotics.org>
Signed-off-by: Mabel Zhang <mabel@openrobotics.org>
Signed-off-by: Mabel Zhang <mabel@openrobotics.org>
Signed-off-by: Mabel Zhang <mabel@openrobotics.org>
Signed-off-by: Mabel Zhang <mabel@openrobotics.org>
Signed-off-by: Mabel Zhang <mabel@openrobotics.org>
@iche033
Copy link
Contributor

iche033 commented Jun 23, 2020

here is a gist of the changes I made to your branch to preserve original behavior / performance. The caveat is that it introduces a --lockstep cmd arg which is now required (in addition to strict_rate sdf param) in order to enable lockstepping. The main change is that when --lockstep is not specified, pose updates are done over pub/sub as opposed to function callback.

Can you take a look and see if performance is better now? thanks.

Signed-off-by: Mabel Zhang <mabel@openrobotics.org>
@mabelzhang
Copy link
Collaborator Author

mabelzhang commented Jun 29, 2020

Thank you for looking into the performance! I tested with the gist, and these are the results:

No --lockstep:

$ gzserver --verbose test/worlds/camera_strict_rate.world
$ gzclient --verbose

Visualize 30 fps camera: 29-30 Hz, 1.0 RTF
Visualize 500 fps camera: 128-150 Hz, 1.0 RTF
63 FPS for both.
The 1.0 RTF tells us that original behavior is respected when --lockstep is not specified. The high fps camera only goes as high as hardware allows.

With --lockstep:

$ gzserver --verbose test/worlds/camera_strict_rate.world --lockstep
$ gzclient --verbose

Visualize 30 fps camera: 29-30 Hz, 0.5-0.6 RTF
Visualize 500 fps camera: 500 Hz, 0.3-0.33 RTF
62-63 FPS for both
The 500 Hz tells us that lockstep is respected when --lockstep is specified.
The only drawback is the RTF slowdown (0.5-0.6, as opposed to 1.0 above) even for the 30 Hz camera that does not have a <strict_rate> specified. I guess that is how much pub/sub is slower than function callback. Does it affect the 30 Hz camera because the pose updates affect everything in the world, not just sensor by sensor? In that case, I wonder if it still makes sense to have <strict_rate> per sensor. Maybe we should just keep the global --lockstep and remove the per-sensor <strict_rate> tag?

Or, is it possible to do function callbacks for lockstep as well, or must it use pub/sub?

Signed-off-by: Mabel Zhang <mabel@openrobotics.org>
@iche033
Copy link
Contributor

iche033 commented Jun 29, 2020

I guess that is how much pub/sub is slower than function callback

oops sorry I just realized I missed a not in my last comment which gave the opposite meaning. Correction: "when --lockstep is not specified, pose updates are done over pub/sub as opposed to function callback".

Does it affect the 30 Hz camera because the pose updates affect everything in the world

yes that's correct. The pose updates are global and we can't tune it for different sensors.

Maybe we should just keep the global --lockstep and remove the per-sensor <strict_rate> tag?

I've also been thinking about this. I don't know if there is a use case of having a mix of strict rate and non-strict rate sensors. If users enable --lockstep we can assume they want accurate results and willing to sacrifice performance. I am ok with applying strict rate to all sensors if --lockstep is specified.

Signed-off-by: Mabel Zhang <mabel@openrobotics.org>
Signed-off-by: Mabel Zhang <mabel@openrobotics.org>
@mabelzhang
Copy link
Collaborator Author

mabelzhang commented Jul 9, 2020

<strict_rate> custom SDF namespace has now been removed.
Lockstep is enabled by --lockstep command line argument only.
Tests are passing (INTEGRATION_camera_sensor).

Ready for another review.

Copy link
Contributor

@iche033 iche033 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, thanks for fixing the test! It's working well for me.

@mabelzhang mabelzhang merged commit 9aa901c into gazebosim:respect_fps_gz11new Jul 10, 2020
@mabelzhang
Copy link
Collaborator Author

Thank you for the review!

@mabelzhang mabelzhang deleted the respect_fps_gz11camOnly branch July 16, 2020 00:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants