Fix scaling image space grads #358
Conversation
Yeah, I was aligning with the original implementation instead of nerfstudio's impl. And I don't feel there is a single correct way here: each way is OK, and it's just that the 2D threshold becomes slightly different between the two options. It's basically a question of whether you threshold in the NDC [-1, 1] space or in pixel space.
@liruilong940607 There is a difference here: think of pixels as independent measurements from camera sensors; grad_x and grad_y in pixel space should be treated equally in scaling, because they too are derived signals. Converting points to NDC is rather a pure graphics convention and doesn't respect signal-processing principles. BTW, the change also increases PSNR/SSIM in my experiments with a similar number of GSs.
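For concreteness, here is a minimal sketch of the two conventions being compared. The function names, and the assumption that `xy_grad` holds per-Gaussian screen-space gradients in pixel units, are mine for illustration, not gsplat's actual API:

```python
import torch

def densify_norm_ndc(xy_grad: torch.Tensor, width: int, height: int) -> torch.Tensor:
    # Inria-style: rescale each component into NDC [-1, 1] units before
    # taking the norm, so x and y get different weights.
    scale = xy_grad.new_tensor([width / 2.0, height / 2.0])
    return (xy_grad * scale).norm(dim=-1)

def densify_norm_uniform(xy_grad: torch.Tensor, width: int, height: int) -> torch.Tensor:
    # Pixel-space view: scale both components by the same factor, which
    # preserves the direction (aspect ratio) of the gradient vector.
    return (xy_grad * (max(width, height) / 2.0)).norm(dim=-1)
```

The two norms coincide only when width == height; otherwise they rank Gaussians differently against the densification threshold, which is the crux of this thread.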
a. I agree with you that what matters is which space you calculate the grads in before using their norm for thresholding.

b. @devernay and I had a discussion about this, and we think scaling with max(width, height) isn't a good idea either if we want to accommodate training images of various sizes (which is our use case internally). Because the GS loss is an average of individual pixel losses, we should scale the grad using (width_i * height_i), where i is the index of the training image. If you agree with (b.), then my new proposal is to scale the grads per image this way and reset the default threshold.
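A hedged sketch of that proposal (hypothetical names; it assumes pixel-unit grads and simply undoes the 1/(width_i * height_i) factor introduced by averaging the loss over pixels):

```python
import torch

def per_image_grad_norm(xy_grad: torch.Tensor, width_i: int, height_i: int) -> torch.Tensor:
    # xy_grad: [N, 2] screen-space grads accumulated from training image i,
    # in pixel units. Both components get the same per-image factor, so the
    # aspect ratio of the gradient vector is untouched.
    return (xy_grad * float(width_i * height_i)).norm(dim=-1)
```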
I think normalizing to NDC space is exactly the solution that cancels out the varying sizes of images. Say, for the same image, you upsample it 2x (so it has 4x more pixels); then both grad_x and grad_y become 2x smaller. Normalizing to NDC space (multiplying by width/height respectively) thus ends up at the same gradient scale, and you do not need to modify the threshold to achieve the same densification effect.
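A quick numeric check of this argument (a toy example, not gsplat code):

```python
import torch

grad_1x = torch.tensor([0.004, 0.002])   # pixel-space grad at 800x600
grad_2x = grad_1x / 2.0                  # same signal rendered at 1600x1200

ndc_1x = grad_1x * torch.tensor([800 / 2.0, 600 / 2.0])
ndc_2x = grad_2x * torch.tensor([1600 / 2.0, 1200 / 2.0])
assert torch.allclose(ndc_1x, ndc_2x)    # identical NDC-space gradient,
                                         # so the same threshold applies
```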
a. I agree that if we train with upsampled images, grads in NDC space or in max(width, height)-normalized image space should use the same threshold in order to produce a similar number of Gaussians.

b. However, in practice 2x resolution is often achieved through the camera configuration, in which case we have more information and sharper details in the 2x-resolution captures. The grad calculated from the 2x resolution is also less noisy. We effectively need a smaller threshold in NDC or normalized image space to encourage GS to produce more Gaussians and reproduce the sharper details. In fact, many splatfacto users complain that they get poor results when training with very high-resolution images.

c. How to scale the grad for thresholding, and whether we should respect the aspect ratio in scaling, are two separate issues. Agree? One thing I am pretty sure about is keeping the aspect ratio of the grads.
To summarize your a. and b.: these two situations require different handling of the gradient normalization. But actually, my view of situation b. ("different images carry different amounts of information") is that it should be handled by each image having its own ideal threshold. For example, a gradient of 3 pixel units that counts as small on one image with little information might count as large on another image with rich information (e.g., when zooming in). So it's not about the image resolution at all; it's about "the amount of information stored per pixel". I would say this is a limitation of using image gradients to guide the densification process: a global threshold does not take "the amount of information stored per pixel" into account, and tweaking the gradient normalization shouldn't be the fix for that. For c., I still don't see any reason why keeping the aspect ratio is more correct than thresholding in NDC space.
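If one wanted to act on the per-image-threshold idea, a minimal sketch could look like this (an entirely hypothetical API, not from gsplat or splatfacto):

```python
import torch

def densify_mask(grad_norm: torch.Tensor, image_id: int,
                 per_image_thresh: dict, default_thresh: float = 0.0002) -> torch.Tensor:
    # grad_norm: [N] normalized screen-space gradient norms seen from image_id.
    # 0.0002 is the usual 3DGS densify_grad_threshold default; each image may
    # override it according to how much information it carries per pixel.
    thresh = per_image_thresh.get(image_id, default_thresh)
    return grad_norm > thresh
```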
I think the correct implementation when scaling grads is to multiply both components by max(width, height); scaling them separately only differs from this when the training image aspect ratio is not 1. There is no good reason to scale them separately and then compute the norm for densification purposes. (Maybe a better choice than max(width, height) is focal, but again the x and y grads should be scaled together.) See the splatfacto implementation: https://github.com/nerfstudio-project/nerfstudio/blob/main/nerfstudio/models/splatfacto.py#L472

I know the original Inria implementation seems to scale grads separately because it converts points into NDC instead of image space.
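To illustrate why the NDC conversion ends up scaling the components separately, here is a small autograd toy (my own example, not the Inria code):

```python
import torch

width, height = 1600, 1200
xy_pix = torch.tensor([800.0, 600.0], requires_grad=True)
# Pixel -> NDC [-1, 1]: the per-axis factors 2/width and 2/height enter here.
xy_ndc = xy_pix / torch.tensor([width / 2.0, height / 2.0]) - 1.0
xy_ndc.sum().backward()       # stand-in for the photometric loss
print(xy_pix.grad)            # tensor([0.0013, 0.0017]): x and y are scaled
                              # differently whenever width != height
```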