
Fix scaling image space grads #358

Closed
wants to merge 1 commit

Conversation

@jb-ye (Collaborator) commented Aug 22, 2024

I think the correct implementation when scaling grads is to multiply both components by max(width, height) (the two conventions only differ when the training image aspect ratio is not 1). There is no good reason to scale them separately and then compute the norm for densification purposes. (Maybe a better choice than max(width, height) is the focal length, but either way the x and y gradient components should be scaled together.)

See the splatfacto implementation: https://github.com/nerfstudio-project/nerfstudio/blob/main/nerfstudio/models/splatfacto.py#L472

I know the original Inria implementation scales the grads separately because it converts points into NDC rather than image space.
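
For concreteness, here is a minimal sketch of the two conventions under discussion; variable names and values are illustrative, not taken from gsplat's or splatfacto's actual code:

```python
import torch

# grads2d: per-Gaussian image-plane gradients in pixel units, shape [N, 2]
grads2d = torch.randn(1024, 2) * 1e-4
width, height = 1600, 900

# Option A (Inria / current gsplat): scale x and y separately,
# i.e. normalize the gradient into NDC.
grads_ndc = grads2d * torch.tensor([width / 2.0, height / 2.0])
norm_ndc = grads_ndc.norm(dim=-1)

# Option B (this PR / splatfacto-style): scale both components by the same
# factor, preserving the direction (aspect ratio) of the pixel-space gradient.
grads_img = grads2d * (max(width, height) / 2.0)
norm_img = grads_img.norm(dim=-1)

# The two norms coincide (up to the choice of threshold) only when
# width == height; otherwise Option A distorts the gradient direction.
```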

@jb-ye jb-ye requested a review from liruilong940607 August 22, 2024 04:33
@jb-ye jb-ye changed the title Fix scaling screen space grads Fix scaling image space grads Aug 22, 2024
@liruilong940607 (Collaborator) commented Aug 22, 2024

Yeah I was aligning with the original implementation instead of nerfstudio's impl.

And I don't think there is a single correct way here; each way is OK, and it's just that the 2D threshold means something slightly different between these two options. Basically, it comes down to whether you threshold in the NDC [-1, 1] space or in the {width, height}/max(width, height) space, and both are valid to me.

@jb-ye (Collaborator, Author) commented Aug 22, 2024

@liruilong940607 There is a difference here: pixels are independent measurements from the camera sensor, so grad_x and grad_y in pixel space should be scaled equally, because they are both signals derived from those measurements. Converting points to NDC is rather a pure graphics convention, and it doesn't respect signal-processing principles.

BTW, the change also increases PSNR/SSIM in my experiments with a similar number of GSs.

@liruilong940607 (Collaborator) commented Aug 22, 2024

Below is a visual comparison of the two options. The gradient vector (in orange) has a different length under each option, which means that if you adjust the threshold properly, the two options can behave identically (you need to adjust the threshold!).

[Screenshot: diagram comparing the gradient vector lengths under the two scaling options]

With that, I'm not surprised that you get better PSNR, because changing this effectively changes the threshold, and apparently the default threshold may not be optimal in all cases.

As for the question of which one is better: I don't see any reason that one is better than the other. It's just a matter of which space you apply the threshold in.

Does this make sense to you?

@jb-ye (Collaborator, Author) commented Aug 22, 2024

a. I agree with you that what matters is the space in which you calculate the grads and take their norm for thresholding. The grads returned from backward are calculated in pixel coordinate space. If you multiply the factors individually, as is done in gsplat/Inria, it is equivalent to calculating the grads in NDC space. Thresholding grads in NDC space is definitely different from thresholding grads in image space: one cannot adjust the threshold in image space to mimic the behavior in NDC space. One may ask which space makes more sense. One hypothetical example is training Gaussian splatting on image crops of varied sizes: say we have 100 images in the training set, each with a different size; how do we aggregate grads across multiple iterations?
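
A quick numeric check of the claim in (a.), with illustrative values not taken from the PR: for a non-square image, the ratio between the NDC-space norm and the uniformly scaled image-space norm depends on the direction of the pixel-space gradient, so no single threshold adjustment maps one criterion onto the other.

```python
import math

width, height = 1600, 900  # a non-square training image

def norm_ndc(gx, gy):
    # per-axis scaling (Inria / current gsplat convention)
    return math.hypot(gx * width / 2.0, gy * height / 2.0)

def norm_uniform(gx, gy):
    # uniform scaling by max(width, height) (this PR's convention)
    return math.hypot(gx, gy) * max(width, height) / 2.0

# A purely horizontal and a purely vertical gradient of equal pixel magnitude:
for gx, gy in [(1e-4, 0.0), (0.0, 1e-4)]:
    print(norm_ndc(gx, gy) / norm_uniform(gx, gy))
# -> 1.0 for the horizontal gradient, but height/width = 0.5625 for the
#    vertical one, so the two criteria rank Gaussians differently whenever
#    width != height.
```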

b. @devernay and I discussed this, and we think scaling by max(width, height) isn't a good idea either if we want to accommodate training images of varied sizes (which is our use case internally). Because the GS loss is an average of individual pixel losses, we should scale the grad using (width_i * height_i), where i is the index of the training image.

If you agree with (b.), then my new proposal is:

```python
grads *= (info["width"] * info["height"]) / 2.0 * info["n_cameras"]
```

and to reset the default threshold to grow_grad2d = 0.32 (0.0002 * 1600), which basically says: if, for a Gaussian, the norm of the gradient of the total pixel losses (averaged over iterations) is more than 0.32 per pixel unit, we should densify.
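
A rough sketch of how this proposal might slot into the densification bookkeeping. The info keys follow the comment above; the function name, state buffers, and update logic are hypothetical illustrations, not gsplat's actual strategy code:

```python
import torch

def accumulate_grad2d(grads2d, gaussian_ids, state, info, grow_grad2d=0.32):
    """Accumulate per-Gaussian 2D gradient norms under the proposed scaling.

    grads2d:      [M, 2] image-plane gradients in pixel units (from backward)
    gaussian_ids: [M] long tensor mapping each gradient to its Gaussian
    state:        dict with running "grad2d" sums and "count" per Gaussian
    info:         dict carrying "width", "height", "n_cameras" as above
    """
    # Proposed scaling: one shared factor per image, so the x/y aspect ratio
    # of the pixel-space gradient is preserved.
    scale = (info["width"] * info["height"]) / 2.0 * info["n_cameras"]
    norms = (grads2d * scale).norm(dim=-1)

    state["grad2d"].index_add_(0, gaussian_ids, norms)
    state["count"].index_add_(0, gaussian_ids, torch.ones_like(norms))

    # At a refinement step, densify where the running average exceeds the
    # proposed default threshold grow_grad2d = 0.32 (0.0002 * 1600).
    avg = state["grad2d"] / state["count"].clamp_min(1)
    return avg > grow_grad2d
```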

@liruilong940607 (Collaborator) commented Aug 22, 2024

I think normalizing to NDC space is exactly the solution that cancels out the varying image sizes. Say, for the same image, you upsample it 2x (4x more pixels); then both gradient_x and gradient_y become 2x smaller. So normalizing to NDC space (multiplying by width and height, respectively) ends up at the same gradient scale, and thus you do not need to modify the threshold to achieve the same densification effect.
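
A quick arithmetic check of this point (illustrative numbers): upsampling 2x halves the per-pixel gradients while the per-axis NDC factors double, so the NDC-space gradient is unchanged and the threshold can stay fixed.

```python
# Illustrative arithmetic only: one image-plane gradient under 2x upsampling.
width, height = 800, 600
gx, gy = 1e-4, 2e-4  # pixel-space gradients at the base resolution

# NDC-space gradient at the base resolution:
ndc = (gx * width / 2.0, gy * height / 2.0)

# Upsample 2x: image dimensions double, and the same image-space error is
# spread over twice as many pixels per axis, so per-pixel gradients halve.
ndc_2x = ((gx / 2) * (2 * width) / 2.0, (gy / 2) * (2 * height) / 2.0)

assert ndc == ndc_2x  # the NDC-space gradient is resolution-invariant
```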

@jb-ye (Collaborator, Author) commented Aug 22, 2024

a. I agree that if we train with upsampled images, grads in NDC or in max(width, height)-normalized image space should have the same threshold in order to produce a similar number of Gaussians.

b. However, in practice 2x resolution often comes from the camera configuration, in which case we have more information and sharper details in the 2x-resolution captures. The grad calculated from 2x resolution is also less noisy. We effectively need a smaller threshold in NDC or normalized image space to encourage GS to produce more Gaussians and reproduce the sharper details. In fact, many splatfacto users complain that they get poor results when training with very high resolution images.

c. How to scale the grad for thresholding and whether we should respect the aspect ratio in scaling are two separate issues. Agree? One thing I am pretty sure of is that we should keep the aspect ratio of the grads.

@liruilong940607 (Collaborator) commented Aug 22, 2024

To summarize your a. and b.: these two situations require different handling of the gradient normalization. But actually, my view on situation b. ("different images carry different amounts of information") is that it should be handled by giving each image its own ideal threshold. For example, a gradient of 3 pixel units that counts as small on an image with little information might count as large on another image with rich information (e.g., zooming in). So it's not about the image resolution at all; it's about "the amount of information stored per pixel." I would say this is a limitation of using image gradients to guide the densification process: a global threshold does not take "the amount of information stored per pixel" into account. Tweaking the gradient normalization shouldn't be the solution to that.

For c., I still don't see any reason why keeping the aspect ratio is more correct than thresholding in NDC space.

@jb-ye jb-ye closed this Aug 23, 2024