
Fix scaling image space grads #358

Closed
wants to merge 1 commit

Conversation

@jb-ye (Collaborator) commented Aug 22, 2024

I think the correct implementation when scaling grads is to multiply both components by max(width, height) (the two conventions only differ when the training image aspect ratio is not 1). There is no good reason to scale them separately and then compute the norm for densification purposes. (Maybe a better choice than max(width, height) is the focal length, but either way the x and y gradient components should be scaled together.)

See the splatfacto implementation: https://github.com/nerfstudio-project/nerfstudio/blob/main/nerfstudio/models/splatfacto.py#L472

I know the original Inria implementation scales the grads separately because it converts points into NDC rather than image space.
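
For concreteness, here is a minimal sketch of the two conventions under discussion; variable names and values are illustrative, not taken from gsplat's or splatfacto's actual code:

```python
import torch

# grads2d: per-Gaussian image-plane gradients in pixel units, shape [N, 2]
grads2d = torch.randn(1024, 2) * 1e-4
width, height = 1600, 900

# Option A (Inria / current gsplat): scale x and y separately,
# i.e. normalize the gradient into NDC.
grads_ndc = grads2d * torch.tensor([width / 2.0, height / 2.0])
norm_ndc = grads_ndc.norm(dim=-1)

# Option B (this PR / splatfacto-style): scale both components by the same
# factor, preserving the direction (aspect ratio) of the pixel-space gradient.
grads_img = grads2d * (max(width, height) / 2.0)
norm_img = grads_img.norm(dim=-1)

# The two norms coincide (up to the choice of threshold) only when
# width == height; otherwise Option A distorts the gradient direction.
```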

@jb-ye jb-ye requested a review from liruilong940607 August 22, 2024 04:33
@jb-ye jb-ye changed the title Fix scaling screen space grads Fix scaling image space grads Aug 22, 2024
@liruilong940607 (Collaborator) commented Aug 22, 2024

Yeah I was aligning with the original implementation instead of nerfstudio's impl.

And I don't think there is a single correct way here; each way is OK, and it's just that the 2D threshold means something slightly different between these two options. Basically, it comes down to whether you threshold in the NDC [-1, 1] space or in the {width, height}/max(width, height) space, and both are valid to me.

@jb-ye (Collaborator, Author) commented Aug 22, 2024

@liruilong940607 There is a difference here: pixels are independent measurements from the camera sensor, so grad_x and grad_y in pixel space should be scaled equally, because they are both signals derived from those measurements. Converting points to NDC is rather a pure graphics convention, and it doesn't respect signal-processing principles.

BTW, the change also increases PSNR/SSIM in my experiments with a similar number of GSs.

@liruilong940607 (Collaborator) commented Aug 22, 2024

Below is a visual comparison of the two options. The gradient vector (in orange) has a different length under each option, which means that if you adjust the threshold properly, the two options can behave identically (you need to adjust the threshold!).

[Screenshot: diagram comparing the gradient vector lengths under the two scaling options]

With that, I'm not surprised that you get better PSNR, because changing this effectively changes the threshold, and apparently the default threshold may not be optimal in all cases.

As for the question of which one is better: I don't see any reason that one is better than the other. It's just a matter of which space you apply the threshold in.

Does this make sense to you?

@jb-ye (Collaborator, Author) commented Aug 22, 2024

a. I agree with you that what matters is the space in which you calculate the grads and take their norm for thresholding. The grads returned from backward are calculated in pixel coordinate space. If you multiply the factors individually, as is done in gsplat/Inria, it is equivalent to calculating the grads in NDC space. Thresholding grads in NDC space is definitely different from thresholding grads in image space: one cannot adjust the threshold in image space to mimic the behavior in NDC space. One may ask which space makes more sense. One hypothetical example is training Gaussian splatting on image crops of varied sizes: say we have 100 images in the training set, each with a different size; how do we aggregate grads across multiple iterations?
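
A quick numeric check of the claim in (a.), with illustrative values not taken from the PR: for a non-square image, the ratio between the NDC-space norm and the uniformly scaled image-space norm depends on the direction of the pixel-space gradient, so no single threshold adjustment maps one criterion onto the other.

```python
import math

width, height = 1600, 900  # a non-square training image

def norm_ndc(gx, gy):
    # per-axis scaling (Inria / current gsplat convention)
    return math.hypot(gx * width / 2.0, gy * height / 2.0)

def norm_uniform(gx, gy):
    # uniform scaling by max(width, height) (this PR's convention)
    return math.hypot(gx, gy) * max(width, height) / 2.0

# A purely horizontal and a purely vertical gradient of equal pixel magnitude:
for gx, gy in [(1e-4, 0.0), (0.0, 1e-4)]:
    print(norm_ndc(gx, gy) / norm_uniform(gx, gy))
# -> 1.0 for the horizontal gradient, but height/width = 0.5625 for the
#    vertical one, so the two criteria rank Gaussians differently whenever
#    width != height.
```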

b. @devernay and I discussed this, and we think scaling by max(width, height) isn't a good idea either if we want to accommodate training images of varied sizes (which is our use case internally). Because the GS loss is an average of individual pixel losses, we should scale the grad using (width_i * height_i), where i is the index of the training image.

If you agree with (b.), then my new proposal is:

```python
grads *= (info["width"] * info["height"]) / 2.0 * info["n_cameras"]
```

and to reset the default threshold to grow_grad2d = 0.32 (0.0002 * 1600), which basically says: if, for a Gaussian, the norm of the gradient of the total pixel losses (averaged over iterations) is more than 0.32 per pixel unit, we should densify.
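
A rough sketch of how this proposal might slot into the densification bookkeeping. The info keys follow the comment above; the function name, state buffers, and update logic are hypothetical illustrations, not gsplat's actual strategy code:

```python
import torch

def accumulate_grad2d(grads2d, gaussian_ids, state, info, grow_grad2d=0.32):
    """Accumulate per-Gaussian 2D gradient norms under the proposed scaling.

    grads2d:      [M, 2] image-plane gradients in pixel units (from backward)
    gaussian_ids: [M] long tensor mapping each gradient to its Gaussian
    state:        dict with running "grad2d" sums and "count" per Gaussian
    info:         dict carrying "width", "height", "n_cameras" as above
    """
    # Proposed scaling: one shared factor per image, so the x/y aspect ratio
    # of the pixel-space gradient is preserved.
    scale = (info["width"] * info["height"]) / 2.0 * info["n_cameras"]
    norms = (grads2d * scale).norm(dim=-1)

    state["grad2d"].index_add_(0, gaussian_ids, norms)
    state["count"].index_add_(0, gaussian_ids, torch.ones_like(norms))

    # At a refinement step, densify where the running average exceeds the
    # proposed default threshold grow_grad2d = 0.32 (0.0002 * 1600).
    avg = state["grad2d"] / state["count"].clamp_min(1)
    return avg > grow_grad2d
```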

@liruilong940607 (Collaborator) commented Aug 22, 2024

I think normalizing to NDC space is exactly the solution that cancels out the varying image sizes. Say, for the same image, you upsample it 2x (4x more pixels); then both gradient_x and gradient_y become 2x smaller. So normalizing to NDC space (multiplying by width and height, respectively) ends up at the same gradient scale, and thus you do not need to modify the threshold to achieve the same densification effect.
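
A quick arithmetic check of this point (illustrative numbers): upsampling 2x halves the per-pixel gradients while the per-axis NDC factors double, so the NDC-space gradient is unchanged and the threshold can stay fixed.

```python
# Illustrative arithmetic only: one image-plane gradient under 2x upsampling.
width, height = 800, 600
gx, gy = 1e-4, 2e-4  # pixel-space gradients at the base resolution

# NDC-space gradient at the base resolution:
ndc = (gx * width / 2.0, gy * height / 2.0)

# Upsample 2x: image dimensions double, and the same image-space error is
# spread over twice as many pixels per axis, so per-pixel gradients halve.
ndc_2x = ((gx / 2) * (2 * width) / 2.0, (gy / 2) * (2 * height) / 2.0)

assert ndc == ndc_2x  # the NDC-space gradient is resolution-invariant
```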

@jb-ye (Collaborator, Author) commented Aug 22, 2024

a. I agree that if we train with upsampled images, grads in NDC or in max(width, height)-normalized image space should have the same threshold in order to produce a similar number of Gaussians.

b. However, in practice 2x resolution often comes from the camera configuration, in which case we have more information and sharper details in the 2x-resolution captures. The grad calculated from 2x resolution is also less noisy. We effectively need a smaller threshold in NDC or normalized image space to encourage GS to produce more Gaussians and reproduce the sharper details. In fact, many splatfacto users complain that they get poor results when training with very high resolution images.

c. How to scale the grad for thresholding and whether we should respect the aspect ratio in scaling are two separate issues. Agree? One thing I am pretty sure of is that we should keep the aspect ratio of the grads.

@liruilong940607 (Collaborator) commented Aug 22, 2024

To summarize your a. and b.: these two situations require different handling of the gradient normalization. But actually, my view on situation b. ("different images carry different amounts of information") is that it should be handled by giving each image its own ideal threshold. For example, a gradient of 3 pixel units that counts as small on an image with little information might count as large on another image with rich information (e.g., zooming in). So it's not about the image resolution at all; it's about "the amount of information stored per pixel." I would say this is a limitation of using image gradients to guide the densification process: a global threshold does not take "the amount of information stored per pixel" into account. Tweaking the gradient normalization shouldn't be the solution to that.

For c., I still don't see any reason why keeping the aspect ratio is more correct than thresholding in NDC space.

@jb-ye jb-ye closed this Aug 23, 2024