How does the surface normal GT annotation of NYUv2 come out? #34


Closed
Boatsure opened this issue Jan 15, 2025 · 5 comments

@Boatsure

This question is not aimed at your work specifically; it is a doubt that arose while I was investigating dense prediction research.

How on earth is the normal GT calculated from RGBD images?

Let's say, for example, the NYUv2 dataset.
There is no ready-made normal data on the NYUv2 homepage, and its paper is also vague about how the normals are computed
(Fig. 1: "Given an input image with raw and inpainted depth maps, we compute surface normals and align them to the room by finding three dominant orthogonal directions.").

While all this research evaluates normal estimation on NYUv2, what normal GT is it actually using?

Would you please share the code to generate the normal GT from RGBD images?
How is the normal GT in DSINE generated?
Besides the official NYUv2 dataset, are there any pre-processed versions of NYUv2 used by the mainstream?

Thanks a lot!

@haodong2000
Member

haodong2000 commented Jan 18, 2025

Hi @Boatsure, thanks for your interest!

Unfortunately, we simply borrowed the surface normal benchmark provided by DSINE, so I may not be able to answer your questions.

@Boatsure
Author

DSINE computes the normal GT using either a cross-product or the PlaneSVD algorithm: https://github.com/baegwangbin/DSINE/blob/main/notes/depth_to_normal.ipynb

That is to say, current normal estimation methods take normal GT like this as the training target. So what is the quality of normals computed by these algorithms? We actually ran into a sim2real problem here.

@haodong2000
Member

In my opinion, if the depth is GT, the computed normals can also be viewed as GT.

Would you like to share your sim2real problem?

@Boatsure
Author

Sorry to bother you; what I mean is that obtaining normals from the GT depth through some mathematical calculation is essentially an approximation, just like using some kind of solver to render a normal map in a simulator.

As the DSINE repo says: "We took PlaneSVD from Klasing et al. and added a few modifications to handle depth discontinuities. We encourage you to try using other algorithms as it can potentially improve the quality of the ground truth and hence the performance of the model."

In other words, the PlaneSVD algorithm dates back to 2009, and the quality of the resulting GT can presumably still be improved.
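For reference, the core of the PlaneSVD idea (Klasing et al., 2009) is just a least-squares plane fit over a local neighborhood of back-projected points. A minimal sketch, not the DSINE implementation; `plane_svd_normal` is a hypothetical helper name:

```python
import numpy as np

def plane_svd_normal(points):
    """Fit a plane to an (N, 3) neighborhood of 3D points and return its
    unit normal: the right singular vector associated with the smallest
    singular value of the centered point matrix (the direction of least
    variance, i.e. the plane normal)."""
    centered = points - points.mean(axis=0)
    # Rows of vt are right singular vectors, sorted by singular value;
    # the last row is the least-variance direction.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    n = vt[-1]
    return n / np.linalg.norm(n)
```

In a full depth2normal pipeline this fit would be applied per pixel over a sliding window of back-projected neighbors, with outlier handling around depth discontinuities.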

If the GT normal map can be obtained from the GT depth map through some mathematical calculation, doesn't that make the end-to-end task of normal estimation meaningless? The only task left for dense geometric estimation would be depth estimation.

My fundamental question, and my surprise, is that the community still has no benchmark or mature pipeline for depth2normal methods.

@haodong2000
Member

These are very interesting points, many thanks for sharing!

Regarding the depth2normal problem you mentioned, this is indeed a point we did not pay serious attention to. I've seen several repos like this for normal2depth and this for depth2normal, and the results seem very correct. That's why I usually don't think it's a big deal, and I believe depth2normal and normal2depth have already been mathematically solved.

Regarding the value of image2normal or image2depth: from my perspective they are ill-posed tasks, because both depth and normal are geometry maps projected through a camera, yet the camera intrinsics (focal length and principal point) are typically unknown. Thus I think image-to-{geometry + camera} is a better-posed task, and I strongly recommend MoGe, which I believe is currently the best at image-to-geometry.
