Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In place switch to reversed Z rendering and offscreen 32 bit float depth buffers for improved camera depth accuracy #978

Conversation

aftersomemath
Copy link
Contributor

@aftersomemath aftersomemath commented Jul 11, 2023

In #948, it was proposed to switch to reversed Z and 32 bit float offscreen depth buffers if both were a nearly strict improvement. Tests indicate that they are. This PR contains those changes in place.

Using reversed Z rendering with a float32 depth buffer results in improved accuracy across the entire range znear to zfar. (Note: #948 contains results for distances approaching zfar, below are results for znear)

Using a float 32 offscreen Z buffer has a slight performance cost that doesn't seem to matter in practice due to the high cost of mjr_readPixels.

Reversed Z rendering close to znear

The plot below shows reversed Z and float 32 depth buffers result in almost always better depth accuracy close to znear. Code is here.

See #948 for results for depths approaching zfar.

Figure_2

Float 32 offscreen z buffer performance test

I modified record.cc to be a pure offscreen rendering test by disabling physics stepping, etc.

Overall, GPU utilization is up, but FPS are about the same because the workload is CPU or PCIe bus bound when mjr_readPixels is used. There is a 14-42% memory cost depending on resolution, but for most use cases I doubt this is an issue.

800x800 (default resolution)

  • mjr_readPixels + mjDB_INT24: 10,000 frames / 25.22 seconds = 396 fps (~45% GPU utilization, 154 MiB GPU Memory)
  • mjr_readPixels + mjDB_FLOAT32: 10,000 / 26.48 = 377.6 fps (~57%, 176 MiB)
  • not calling mjr_readPixels + mjDB_INT24: 10,000 / 8.39 = 1,192 fps (~90%, 154 MiB)
  • not calling mjr_readPixels + mjDB_FLOAT32: 10,000 / 8.46 = 1,182 fps (~97%, 176 MiB)

3840x2160 (4k)

  • mjr_readPixels + mjDB_INT24: 10,000 / 137.95 = 72.5 fps (~63%, 679 MiB)
  • mjr_readPixels + mjDB_FLOAT32: 10,000 / 143.46 = 69.7 fps (~85%, 965 MiB)
  • not calling mjr_readPixels + mjDB_INT24: 10,000 / 25.72 = 389 fps (100%, 679 MiB)
  • not calling mjr_readPixels + mjDB_FLOAT32: 10,000 / 25.73 = 389 fps (100%, 965 MiB)

Intel 12800HX, NVIDIA A2000 8GB, Ubuntu 22.04, Clang 14, --config Release. GPU stats measured with watch -n1 nvidia-smi. Changes to record.cc are here. To run: time bin/record ../model/humanoid100/humanoid100.xml 50 200 foo

@aftersomemath
Copy link
Contributor Author

Is there still interest in merging this in some form? I would be happy to implement the fixes needed to avoid breaking the interface of mjr_readpixels.

@michael-projectx
Copy link

Is there still interest in merging this in some form? I would be happy to implement the fixes needed to avoid breaking the interface of mjr_readpixels.

I'm interested. I came across the original discussion while trying to implement a depth sensor in mujoco.

@yuvaltassa
Copy link
Collaborator

@aftersomemath, so sorry about this! I'm a bit shocked we've let this hang for 2 months now 😬

As far as I can tell this PR is now a strict improvement and breaks nothing. Is that correct? Which is to say, I don't really see what you mean by

I would be happy to implement the fixes needed to avoid breaking the interface of mjr_readpixels.

@aftersomemath
Copy link
Contributor Author

It's alright, we are all busy.

@saran-t pointed out in an offline conversation that the zbuffer values returned by mjr_readpixels are now reversed. So if this is merged depending code will compile, and promptly break. He had in mind a way to fix that which did require supporting rendering in both formats.

@yuvaltassa
Copy link
Collaborator

Our next release has a bunch of breaking changes (see latest changelog), I suppose we could break this as well?

I think if we unreverse the zbuffer in the Python interface we will actually break very few people.

I guess unreversing with floats will also lose the higher precision, correct? What if in Python we also supported a float64 output? (but always with unreversed semantics)

@saran-t WDYT?

@saran-t
Copy link
Member

saran-t commented Sep 17, 2023

The idea that I suggested to @aftersomemath was to add a flag to mjrContext to indicate whether you want the z buffer flipped.

@yuvaltassa
Copy link
Collaborator

yuvaltassa commented Sep 17, 2023

Yes, that sounds sensible (and backwards compatible).

@aftersomemath, am I correct that flipping floats would effectively undo the benefit of the PR?

Also, does this PR improve z-fighting artifacts for small positive offsets? Right now if the distance between two surfaces is small wrt znear, there is quite a lot of z-fighting. Improving that would make me very happy :)

@aftersomemath
Copy link
Contributor Author

aftersomemath commented Sep 17, 2023

Yes, flipping the reverse Z buffer and returning it as a 32 bit float will undo the benefits.

Casting to a 64 bit double, then flipping, might be ok. But fundamentally would still be numerically challenged. A test should be done. (EDIT: this PRs changes to renderer.py essentially follows this approach, so it will probably work).

Whatever is decided on is fine with me.

I'm not sure if z-fighting will be improved close to znear. Theoretically, the best option for distances extremely close to znear is a non-reversed 32 bit floating point buffer which approaches 0 as Z goes to znear. (Not to be confused with OpenGL's default of approaching -1 as Z goes to znear).

However, this PR makes the 32 bit float buffer available to offscreen rendering only. For onscreen rendering (such as used by simulate) the windowing system appears to create the z-buffer and it is most likely to be a 24 bit integer. Reversed Z rendering helps 24 bit integers buffers a little bit, but probably not enough to be noticeable. It is possible to make onscreen windows use a 32 bit Z buffer (which should improve Z-fighting at all distances). That seems like a seperate PR, but it sounds easy.

@aftersomemath
Copy link
Contributor Author

I pushed adding a flag to mjrContext. By default, the reversed Z buffer is reversed again, which has the expected accuracy penalty and comes with some performance cost (I can measure it if wanted).

@aftersomemath aftersomemath force-pushed the depth-precision-in-place branch from d005538 to 7a70d0f Compare September 26, 2023 17:44
typedef enum mjtDepthMapping_ { // OpenGL depth buffer readout mapping (from znear to zfar)
mjDM_ZEROTOONE = 0, // Legacy default, reverses reversed Z rendering, performance penalty
mjDM_ONETOZERO // Native output of reversed Z rendering, decreases numerical error
} mjtDepthMapping;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think mjtDepthMap or mjtDepth is better.
I don't love ONETOZERO and ZEROTOONE, a bit hard to parse.
What about mjDEPTH_01 and mjDEPTH_10? (latter looks like "ten" but I still think an improvement)

Re comments, lowercase please, also the legacyness and performance penalty is irrelevant here, this should focus on semantics. How about

// standard depth map; 0: znear, 1: zfar
// reversed depth map; 1: znear, 0: zfar

@@ -39,6 +39,10 @@ typedef enum mjtFramebuffer_ { // OpenGL framebuffer option
mjFB_OFFSCREEN // offscreen buffer
} mjtFramebuffer;

typedef enum mjtDepthMapping_ { // OpenGL depth buffer readout mapping (from znear to zfar)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the word "readout" should be deleted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree because mjtDepthMap controls MuJoCo's representation of the depth map regardless of the semantics of the graphics API and buffer.

So I think the comment should be "depth mapping for mjr_readPixels".

@@ -1016,6 +1021,9 @@ void mjr_render(mjrRect viewport, mjvScene* scn, const mjrContext* con) {
// set projection: from light viewpoint
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
// reverse Z rendering mapping (znear, zfar) -> (1, 0)
glTranslatef(0,0,0.5);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spaces after commas please

@aftersomemath
Copy link
Contributor Author

All requested changes implemented, plus a few more nits.

@@ -39,6 +39,10 @@ typedef enum mjtFramebuffer_ { // OpenGL framebuffer option
mjFB_OFFSCREEN // offscreen buffer
} mjtFramebuffer;

typedef enum mjtDepthMap_ { // depth mapping for `mjr_readPixels`
mjDEPTHMAP_01 = 0, // standard depth map; 0: znear, 1: zfar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still means nothing to me... why not mjDEPTH_ZERONEAR and mjDEPTH_ZEROFAR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with this because the neighborhood around zero is where the precision is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, mjDEPTH_ZERONEAR and mjDEPTH_ZEROFAR are way better!

Copy link
Member

@saran-t saran-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, just got the chance to look at this. I'm going to try to override @yuvaltassa on one thing, see below.

@aftersomemath
Copy link
Contributor Author

👍 The name change is done.

Copy link
Member

@saran-t saran-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I'm just going to ask for an entry in changelog.rst and I'll pull this in for internal review. Thanks!

Combined, both options greatly improve camera depth accuracy compared to
OpenGL default Z mapping and int24 depth buffer.

Requires GL_ARB_clip_control and ARB_depth_buffer_float extensions
…flags

By more carefully inverting the operations performed when OpenGL
transforms metric distance to window coordinates, more accurate
depth values can be estimated from the depth buffer
@aftersomemath aftersomemath force-pushed the depth-precision-in-place branch from 3da5dd7 to 9868486 Compare September 30, 2023 00:52
@aftersomemath
Copy link
Contributor Author

Rebased and added entry to changelog.rst. It is not clear how to generate references.h or I would have added the rest of the entries to the docs.

@yuvaltassa
Copy link
Collaborator

Ya we should probably make the doc-updating script public.

@saran-t note that we also need to add the enum to the API docs, bindings etc. I guess we patch @aftersomemath's change and do that ourselves?

@saran-t
Copy link
Member

saran-t commented Oct 2, 2023

Bad news, we test against Mesa 17 internally and it looks like ARB_clip_control isn't implemented by the software renderer.

@saran-t
Copy link
Member

saran-t commented Oct 2, 2023

Ouch, this extension is also not available on macOS. I think we'll need to make this an optional feature depending on whether ARB_clip_control is available or not.

@aftersomemath
Copy link
Contributor Author

Oof, yes it seems so.

I will look for a workaround but it seems unlikely to exist.

ARB_clip_control is in 71% of the drivers listed here. The other extensions used have scores around ~80%.

@saran-t
Copy link
Member

saran-t commented Oct 2, 2023

I think we'll just have to add a runtime conditional on mjGLAD_GL_ARB_clip_control and run through the old logic if it's not there (and emit a mj_warning if mjDEPTH_ZEROFAR is specified when mjGLAD_GL_ARB_clip_control is not supported?)

If GL_ARB_clip_control not available then [znear, zfar] -> [1, -1] instead of [1, 0] in normalized device coordinates

Thus, Z is still reversed, but not shifted, clipping should still work properly, and conversion to window coordinates does not cause accuracy regression compared to non-reversed Z rendering.
@aftersomemath
Copy link
Contributor Author

Pushed a fallback. Reversed Z rendering remains always on, but if ARB_clip_control not available then [znear, zfar] is mapped to [1, -1] instead of [1, 0] in normalized device coordinates.

Allowing Z to always be reversed makes the code simpler since only one comparison convention needs to be supported.

Using the entire normalized device coordinate range ensures clipping works properly and that the conversion to window coordinates does not cause depth accuracy regression after readout when compared to non-reversed Z rendering (used before this PR).

@saran-t
Copy link
Member

saran-t commented Oct 3, 2023

Sadly this is still causing some problem. Our internal CI uses the manylinux2014 Docker image to build and test on Linux, which is based on CentOS7. The distro only seems to make available the "classic" implementation of OSMesa that uses the legacy swrast driver, which doesn't support ARB_depth_buffer_float.

Mesa 18's llvmpipe driver does support the extension, but AFAICT there's no way to access it through OSMesa without us compiling Mesa from source. In principle I'm OK doing this, but it implies that there's a possibility that someone relying on software rendering would be stuck with the same issue.

How much work would it be to make ARB_depth_buffer_float optional also?

@aftersomemath
Copy link
Contributor Author

Making ARB_depth_buffer_float optional is not difficult. I just pushed a fallback.

I spent a little bit of time trying to test on the manylinux2014 docker image but there must be some tricks to get the environment setup correctly. I'll come back to it if this doesn't work.

# http://stackoverflow.com/a/6657284/1461210
# https://www.khronos.org/opengl/wiki/Depth_Buffer_Precision
out = near / (1 - out * (1 - near / far))
# Calculate OpenGL perspective matrix values in float32 precision
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused.. what's this trying to do? Is the intention to break or preserve compatibility for existing users of this class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose is to improve the accuracy of the returned linear depth values and does not break compatibility.

The changes to renderer.py can be summarized is as: "Improve accuracy of linear depth by using segmented, reverse Z rendering and more carefully implement perspective transformation inversion to preserve numerical accuracy".

Without doing the calculations in this way, most of the accuracy improvements are lost due to numerical issues. Again, this should not break backwards compatibility since in the end renderer.py just returns linear depth.

copybara-service bot pushed a commit that referenced this pull request Oct 12, 2023
PiperOrigin-RevId: 572961667
Change-Id: I69011ddf835ae286e30e7f84cca67867cfe94fa4
@saran-t
Copy link
Member

saran-t commented Oct 12, 2023

Merged in 5916470 (not sure why this isn't being marked as merged). Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants