In place switch to reversed Z rendering and offscreen 32 bit float depth buffers for improved camera depth accuracy #978
Is there still interest in merging this in some form? I would be happy to implement the fixes needed to avoid breaking the interface of |
I'm interested. I came across the original discussion while trying to implement a depth sensor in mujoco. |
@aftersomemath, so sorry about this! I'm a bit shocked we've let this hang for 2 months now 😬 As far as I can tell this PR is now a strict improvement and breaks nothing. Is that correct? Which is to say, I don't really see what you mean by
|
It's alright, we are all busy. @saran-t pointed out in an offline conversation that the zbuffer values returned by |
Our next release has a bunch of breaking changes (see latest changelog), I suppose we could break this as well? I think if we unreverse the zbuffer in the Python interface we will actually break very few people. I guess unreversing with floats will also lose the higher precision, correct? What if in Python we also supported a float64 output? (but always with unreversed semantics) @saran-t WDYT? |
The idea that I suggested to @aftersomemath was to add a flag to |
Yes, that sounds sensible (and backwards compatible). @aftersomemath, am I correct that flipping floats would effectively undo the benefit of the PR? Also, does this PR improve z-fighting artifacts for small positive offsets? Right now if the distance between two surfaces is small wrt znear, there is quite a lot of z-fighting. Improving that would make me very happy :) |
Yes, flipping the reverse Z buffer and returning it as a 32 bit float will undo the benefits. Casting to a 64 bit double, then flipping, might be ok, but fundamentally would still be numerically challenged. A test should be done. (EDIT: this PR's changes to renderer.py essentially follow this approach, so it will probably work). Whatever is decided on is fine with me. I'm not sure if z-fighting will be improved close to znear. Theoretically, the best option for distances extremely close to znear is a non-reversed 32 bit floating point buffer which approaches 0 as Z goes to znear. (Not to be confused with OpenGL's default of approaching -1 as Z goes to znear). However, this PR makes the 32 bit float buffer available to offscreen rendering only. For onscreen rendering (such as used by |
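The precision argument above can be checked with a small NumPy sketch. The depth value is a made-up sample, not data from this PR; it stands for a reversed-Z reading close to zfar, where float32 values are small and densely spaced. Flipping it in float32 moves it next to 1, where float32 spacing is much coarser, while flipping after a cast to float64 keeps the value.

```python
import numpy as np

# Hypothetical reversed-Z depth sample near zfar (illustrative value only).
d_rev = np.float32(1.1111e-5)

# Flip to the standard convention (0: znear, 1: zfar) in float32 and in float64.
flipped32 = np.float32(1.0) - d_rev
flipped64 = np.float64(1.0) - np.float64(d_rev)

# Undo the flip in float64 to see how much of the original value survived.
err32 = abs((1.0 - np.float64(flipped32)) - np.float64(d_rev))
err64 = abs((1.0 - flipped64) - np.float64(d_rev))

print("error after float32 flip:", err32)
print("error after float64 flip:", err64)
```

This only models the flip itself; it says nothing about how the GPU produced the depth value in the first place.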
I pushed adding a flag to |
Force-pushed from d005538 to 7a70d0f.
include/mujoco/mjrender.h
Outdated
typedef enum mjtDepthMapping_ { // OpenGL depth buffer readout mapping (from znear to zfar)
  mjDM_ZEROTOONE = 0, // Legacy default, reverses reversed Z rendering, performance penalty
  mjDM_ONETOZERO // Native output of reversed Z rendering, decreases numerical error
} mjtDepthMapping;
I think mjtDepthMap or mjtDepth is better.
I don't love ONETOZERO and ZEROTOONE, a bit hard to parse.
What about mjDEPTH_01 and mjDEPTH_10? (latter looks like "ten" but I still think an improvement)
Re comments, lowercase please, also the legacyness and performance penalty is irrelevant here, this should focus on semantics. How about
// standard depth map; 0: znear, 1: zfar
// reversed depth map; 1: znear, 0: zfar
include/mujoco/mjrender.h
Outdated
@@ -39,6 +39,10 @@ typedef enum mjtFramebuffer_ { // OpenGL framebuffer option
  mjFB_OFFSCREEN // offscreen buffer
} mjtFramebuffer;

typedef enum mjtDepthMapping_ { // OpenGL depth buffer readout mapping (from znear to zfar)
I think the word "readout" should be deleted
I agree because mjtDepthMap controls MuJoCo's representation of the depth map regardless of the semantics of the graphics API and buffer.
So I think the comment should be "depth mapping for mjr_readPixels".
src/render/render_gl3.c
Outdated
@@ -1016,6 +1021,9 @@ void mjr_render(mjrRect viewport, mjvScene* scn, const mjrContext* con) {
// set projection: from light viewpoint
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
// reverse Z rendering mapping (znear, zfar) -> (1, 0)
glTranslatef(0,0,0.5);
spaces after commas please
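For readers following the diff, here is a minimal NumPy sketch of how a translate by 0.5 can turn the default OpenGL depth range into the (1, 0) mapping named in the code comment. The diff excerpt shows only the glTranslatef call; the scale of Z by -0.5 used below is an assumption about what accompanies it, and the helper names are hypothetical.

```python
import numpy as np

def perspective(near, far):
    """Depth-related rows of a standard OpenGL perspective matrix (x, y left as identity)."""
    m = np.eye(4)
    m[2, 2] = -(far + near) / (far - near)
    m[2, 3] = -2.0 * far * near / (far - near)
    m[3, 2] = -1.0
    m[3, 3] = 0.0
    return m

def reverse_z(proj):
    """Prepend translate(0, 0, 0.5) * scale(1, 1, -0.5), so NDC z' = 0.5 - 0.5 * z."""
    translate = np.eye(4)
    translate[2, 3] = 0.5
    scale = np.diag([1.0, 1.0, -0.5, 1.0])  # assumed companion of the translate
    return translate @ scale @ proj

def ndc_depth(proj, distance):
    """NDC depth of a point `distance` in front of the camera (camera looks down -z)."""
    clip = proj @ np.array([0.0, 0.0, -distance, 1.0])
    return clip[2] / clip[3]

near, far = 0.01, 50.0  # illustrative clip planes
standard = perspective(near, far)
reversed_proj = reverse_z(standard)
print(ndc_depth(standard, near), ndc_depth(standard, far))
print(ndc_depth(reversed_proj, near), ndc_depth(reversed_proj, far))
```

The matrix order mirrors the fixed-function call order: each call after glLoadIdentity multiplies the current matrix on the right, so the translate ends up leftmost.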
All requested changes implemented, plus a few more nits. |
include/mujoco/mjrender.h
Outdated
@@ -39,6 +39,10 @@ typedef enum mjtFramebuffer_ { // OpenGL framebuffer option
  mjFB_OFFSCREEN // offscreen buffer
} mjtFramebuffer;

typedef enum mjtDepthMap_ { // depth mapping for `mjr_readPixels`
  mjDEPTHMAP_01 = 0, // standard depth map; 0: znear, 1: zfar
This still means nothing to me... why not mjDEPTH_ZERONEAR and mjDEPTH_ZEROFAR?
I'm ok with this because the neighborhood around zero is where the precision is.
Agreed, mjDEPTH_ZERONEAR and mjDEPTH_ZEROFAR are way better!
Sorry, just got the chance to look at this. I'm going to try to override @yuvaltassa on one thing, see below.
👍 The name change is done. |
Alright, I'm just going to ask for an entry in changelog.rst and I'll pull this in for internal review. Thanks!
Combined, both options greatly improve camera depth accuracy compared to OpenGL default Z mapping and int24 depth buffer. Requires GL_ARB_clip_control and ARB_depth_buffer_float extensions
…flags By more carefully inverting the operations performed when OpenGL transforms metric distance to window coordinates, more accurate depth values can be estimated from the depth buffer
Force-pushed from 3da5dd7 to 9868486.
Rebased and added entry to changelog.rst. It is not clear how to generate references.h or I would have added the rest of the entries to the docs. |
Ya we should probably make the doc-updating script public. @saran-t note that we also need to add the enum to the API docs, bindings etc. I guess we patch @aftersomemath's change and do that ourselves? |
Bad news, we test against Mesa 17 internally and it looks like ARB_clip_control isn't implemented by the software renderer. |
Ouch, this extension is also not available on macOS. I think we'll need to make this an optional feature depending on whether ARB_clip_control is available or not. |
Oof, yes it seems so. I will look for a workaround but it seems unlikely to exist.
|
I think we'll just have to add a runtime conditional on |
If GL_ARB_clip_control not available then [znear, zfar] -> [1, -1] instead of [1, 0] in normalized device coordinates
Thus, Z is still reversed, but not shifted, clipping should still work properly, and conversion to window coordinates does not cause accuracy regression compared to non-reversed Z rendering.
Pushed a fallback. Reversed Z rendering remains always on, but if Allowing Z to always be reversed makes the code simpler since only one comparison convention needs to be supported. Using the entire normalized device coordinate range ensures clipping works properly and that the conversion to window coordinates does not cause depth accuracy regression after readout when compared to non-reversed Z rendering (used before this PR). |
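A simplified float32 model of the two paths may help show why the fallback keeps correctness but not the full precision gain. With clip control, NDC depth already spans [1, 0] and is stored directly. In the fallback, NDC depth spans [1, -1] and the default depth range then maps it to [1, 0], so values near zfar pass through the neighborhood of -1, where float32 spacing is coarse. The numbers below are illustrative and the model ignores how a real GPU evaluates the projection.

```python
import numpy as np

near, far, z = 0.01, 100.0, 90.0  # illustrative values, not taken from the PR

# Ideal reversed window depth (1 at znear, 0 at zfar), computed in float64.
d_exact = near * (far - z) / (z * (far - near))

# With clip control: NDC already spans [1, 0]; window depth equals NDC depth.
window_clip_control = np.float32(d_exact)

# Fallback: NDC spans [1, -1]; the default depth range then maps it to [1, 0].
ndc_fallback = np.float32(2.0) * np.float32(d_exact) - np.float32(1.0)
window_fallback = np.float32(0.5) * ndc_fallback + np.float32(0.5)

err_clip_control = abs(np.float64(window_clip_control) - d_exact)
err_fallback = abs(np.float64(window_fallback) - d_exact)
print("clip control error:", err_clip_control)
print("fallback error:    ", err_fallback)
```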
Sadly this is still causing some problems. Our internal CI uses Mesa 18, whose llvmpipe driver does support the extension, but AFAICT there's no way to access it through OSMesa without us compiling Mesa from source. In principle I'm OK doing this, but it implies that there's a possibility that someone relying on software rendering would be stuck with the same issue. How much work would it be to make ARB_depth_buffer_float optional also?
Making ARB_depth_buffer_float optional is not difficult. I just pushed a fallback. I spent a little bit of time trying to test on the |
# http://stackoverflow.com/a/6657284/1461210
# https://www.khronos.org/opengl/wiki/Depth_Buffer_Precision
out = near / (1 - out * (1 - near / far))
# Calculate OpenGL perspective matrix values in float32 precision
I'm a bit confused.. what's this trying to do? Is the intention to break or preserve compatibility for existing users of this class?
The purpose is to improve the accuracy of the returned linear depth values and does not break compatibility.
The changes to renderer.py can be summarized as: "Improve accuracy of linear depth by using segmented, reverse Z rendering and more carefully implementing perspective transformation inversion to preserve numerical accuracy".
Without doing the calculations in this way, most of the accuracy improvements are lost due to numerical issues. Again, this should not break backwards compatibility since in the end renderer.py just returns linear depth.
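For reference, the textbook conversion from a depth buffer value to linear depth can be sketched as follows. The first function is the formula quoted in the diff context above; the second is its algebraic counterpart for a reversed buffer, obtained by substituting 1 - depth. Both function names are hypothetical, and this is not the more careful float32-aware inversion the PR implements in renderer.py.

```python
import numpy as np

def linear_depth_zero_near(depth, near, far):
    """Metric depth from a standard depth map (0: znear, 1: zfar)."""
    depth = np.asarray(depth, dtype=np.float64)
    return near / (1.0 - depth * (1.0 - near / far))

def linear_depth_zero_far(depth, near, far):
    """Metric depth from a reversed depth map (1: znear, 0: zfar)."""
    depth = np.asarray(depth, dtype=np.float64)
    return near * far / (near + depth * (far - near))

near, far = 0.01, 50.0                        # illustrative clip planes
z = np.array([0.01, 0.5, 10.0, 50.0])         # metric distances to round-trip
d = (1.0 - near / z) / (1.0 - near / far)     # ideal standard depth values
print(linear_depth_zero_near(d, near, far))
print(linear_depth_zero_far(1.0 - d, near, far))
```

Note that forming 1.0 - d here is only for the round-trip check; flipping a real float32 buffer this way is exactly what the discussion above warns against.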
PiperOrigin-RevId: 572961667 Change-Id: I69011ddf835ae286e30e7f84cca67867cfe94fa4
Merged in 5916470 (not sure why this isn't being marked as merged). Thanks! |
In #948, it was proposed to switch to reversed Z and 32 bit float offscreen depth buffers if both were a nearly strict improvement. Tests indicate that they are. This PR contains those changes in place.
Using reversed Z rendering with a float32 depth buffer results in improved accuracy across the entire range znear to zfar. (Note: #948 contains results for distances approaching zfar, below are results for znear)
Using a float 32 offscreen Z buffer has a slight performance cost that doesn't seem to matter in practice due to the high cost of mjr_readPixels.
Reversed Z rendering close to znear
The plot below shows reversed Z and float 32 depth buffers result in almost always better depth accuracy close to znear. Code is here.
See #948 for results for depths approaching zfar.
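As a rough CPU-side illustration of the comparison, the sketch below quantizes an ideal depth value the way each buffer type would store it and measures the error after converting back to metric distance. It is an idealized model with made-up clip planes, not the test code linked above, and it ignores rasterization and GPU arithmetic. The function names are hypothetical.

```python
import numpy as np

def int24_standard_error(z, near, far):
    """Round-trip error for a standard depth map stored as a 24 bit integer."""
    d = (1.0 - near / z) / (1.0 - near / far)
    levels = 2**24 - 1
    d_q = np.round(d * levels) / levels
    z_back = near / (1.0 - d_q * (1.0 - near / far))
    return abs(z_back - z)

def float32_reversed_error(z, near, far):
    """Round-trip error for a reversed depth map stored as a 32 bit float."""
    d = near * (far - z) / (z * (far - near))
    d_q = np.float64(np.float32(d))
    z_back = near * far / (near + d_q * (far - near))
    return abs(z_back - z)

near, far = 0.01, 100.0  # illustrative clip planes
for z in (0.0101, 0.1, 1.0, 10.0, 90.0):
    print(z, int24_standard_error(z, near, far), float32_reversed_error(z, near, far))
```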
Float 32 offscreen z buffer performance test
I modified record.cc to be a pure offscreen rendering test by disabling physics stepping, etc.
Overall, GPU utilization is up, but FPS are about the same because the workload is CPU or PCIe bus bound when mjr_readPixels is used. There is a 14-42% memory cost depending on resolution, but for most use cases I doubt this is an issue.
800x800 (default resolution)
mjr_readPixels + mjDB_INT24: 10,000 frames / 25.22 seconds = 396 fps (~45% GPU utilization, 154 MiB GPU Memory)
mjr_readPixels + mjDB_FLOAT32: 10,000 / 26.48 = 377.6 fps (~57%, 176 MiB)
mjr_readPixels + mjDB_INT24: 10,000 / 8.39 = 1,192 fps (~90%, 154 MiB)
mjr_readPixels + mjDB_FLOAT32: 10,000 / 8.46 = 1,182 fps (~97%, 176 MiB)
3840x2160 (4k)
mjr_readPixels + mjDB_INT24: 10,000 / 137.95 = 72.5 fps (~63%, 679 MiB)
mjr_readPixels + mjDB_FLOAT32: 10,000 / 143.46 = 69.7 fps (~85%, 965 MiB)
mjr_readPixels + mjDB_INT24: 10,000 / 25.72 = 389 fps (100%, 679 MiB)
mjr_readPixels + mjDB_FLOAT32: 10,000 / 25.73 = 389 fps (100%, 965 MiB)
Intel 12800HX, NVIDIA A2000 8GB, Ubuntu 22.04, Clang 14, --config Release. GPU stats measured with watch -n1 nvidia-smi. Changes to record.cc are here. To run:
time bin/record ../model/humanoid100/humanoid100.xml 50 200 foo
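The 14-42% memory figure and the small frame rate change can be reproduced from the reported numbers. The sketch below uses the first INT24/FLOAT32 pair listed for each resolution; the dictionary layout and function names are just for illustration.

```python
# (seconds for 10,000 frames, GPU memory in MiB), copied from the results above
runs = {
    "800x800": {"int24": (25.22, 154), "float32": (26.48, 176)},
    "3840x2160": {"int24": (137.95, 679), "float32": (143.46, 965)},
}
FRAMES = 10_000

def memory_overhead_percent(resolution):
    """Extra GPU memory used by the float32 depth buffer, relative to int24."""
    _, mem_int = runs[resolution]["int24"]
    _, mem_float = runs[resolution]["float32"]
    return 100.0 * (mem_float - mem_int) / mem_int

def fps_change_percent(resolution):
    """Change in frames per second when switching from int24 to float32."""
    fps_int = FRAMES / runs[resolution]["int24"][0]
    fps_float = FRAMES / runs[resolution]["float32"][0]
    return 100.0 * (fps_float - fps_int) / fps_int

for res in runs:
    print(res, memory_overhead_percent(res), fps_change_percent(res))
```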