camera_utils: add "vertical" orientation and "focus" center methods #1543

Merged 7 commits into nerfstudio-project:main from more-up-and-center-options on Mar 10, 2023

@f-dy f-dy commented Mar 2, 2023

This PR adds a better vertical direction estimation (named "vertical"), which should work in most cases, even when cameras are in a parallel (e.g. LLFF) configuration, provided that the cameras in such a configuration are looking horizontally.
It also adds a different centering method, called "focus", which is more appropriate for inward-looking configurations. The previous unnamed centering method is now called "poses".

Additionally, we fixed the inconsistent naming of auto_scale_poses, which appeared as scale_poses in a couple of DataParserConfig classes.

Note: The authors of this PR are @f-dy (previously known as @devernay, but I had to create another account for work-related contributions) and @gilureta .

"vertical" orientation method

The existing orientation method, which is called "up", works by averaging the up vector (Y) of all cameras. It has issues, e.g. when cameras were mostly looking down and had an uneven distribution. However, it's still very useful when camera poses are random, for example when scanning a full room rather than a single object.
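For reference, the "up" averaging can be sketched in a few lines of NumPy. This is a minimal illustration, not the nerfstudio code: it assumes `c2w` holds camera-to-world matrices whose first three columns are the camera X, Y and Z axes, and `mean_up_direction` is a hypothetical name.

```python
import numpy as np


def mean_up_direction(c2w: np.ndarray) -> np.ndarray:
    """Average the cameras' Y (up) axes and normalize the result.

    c2w: array of shape (n, 3, 4) or (n, 4, 4), camera-to-world (assumed layout).
    """
    up = c2w[:, :3, 1].mean(axis=0)  # column 1 is the camera Y axis
    return up / np.linalg.norm(up)
```

If most cameras look down from one side of the object, every Y axis leans the same way and so does the average, which is the failure case described above.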

COLMAP has a slightly different approach, but it also works from the cameras' Y direction in 3D, so we expect the same failure cases as the current "up" solution.

This "vertical" orientation method works by finding the 3D direction that is most orthogonal to the X direction of all cameras. This means that this 3D direction projects close to the Y axis of all cameras. This also means that the vanishing point corresponding to the vertical direction should project close to the Y axis. This gives us a vector that may point down instead of up, so we use the previous "up" computation to disambiguate that situation.

This works in all cases where photos are correctly oriented (also a requirement for the "up" method), and also requires some rotation in the camera poses, else the vertical may be any direction in the sagittal plane (the plane spanned by their Z and Y axes), which is shared by all cameras.

We detect that degeneracy: since the vertical is given as the right singular vector corresponding to the smallest singular value of an n×3 matrix, we also check the second smallest singular value. If it is also close to zero, the configuration is degenerate, and we resolve the ambiguity by choosing the direction within that "vertical plane" that best aligns with the previous "up" vector. This is obtained by projecting that "up" vector onto the plane. See comments in the code.
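A rough NumPy sketch of the "vertical" estimation and its degenerate case, for readers who want to experiment. It is not the code in this PR: `vertical_direction` and the `degenerate_ratio` threshold are hypothetical, and `c2w` is assumed to hold camera-to-world matrices whose columns are the camera X, Y, Z axes. The right singular vectors of the n×3 matrix of X axes are obtained here as eigenvectors of the 3×3 matrix XᵀX, which is equivalent.

```python
import numpy as np


def vertical_direction(c2w: np.ndarray, degenerate_ratio: float = 0.1) -> np.ndarray:
    """Direction most orthogonal to all camera X axes, oriented like the mean up."""
    x_axes = c2w[:, :3, 0]           # (n, 3) camera X axes
    up = c2w[:, :3, 1].mean(axis=0)  # previous "up" estimate, used for the sign

    # Eigenvectors of X^T X are the right singular vectors of X;
    # eigh returns eigenvalues (squared singular values) in ascending order.
    evals, evecs = np.linalg.eigh(x_axes.T @ x_axes)
    sing = np.sqrt(np.clip(evals, 0.0, None))

    if sing[1] < degenerate_ratio * sing[2]:
        # Degenerate: all X axes are nearly parallel, so any direction in the
        # plane orthogonal to them fits. Project "up" onto that plane.
        normal = evecs[:, 2]
        vertical = up - np.dot(up, normal) * normal
    else:
        vertical = evecs[:, 0]
        if np.dot(vertical, up) < 0:
            vertical = -vertical  # the singular vector may point down
    return vertical / np.linalg.norm(vertical)
```

The threshold value is only a placeholder; a real implementation would need to pick it with the number of cameras and the noise level in mind.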

"focus" centering method

The "focus" centering method works better than "poses" when cameras are turning around something. It works by finding the 3D intersection of the cameras' optical axes. Of course, 3D lines generally do not intersect exactly, so it finds the point nearest to these axes in the least-squares sense, as described in this Wikipedia page. The solution is $\Theta(n)$.
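The nearest point to a set of 3D lines has a closed form: with unit directions $d_i$ and origins $o_i$, solve $\left(\sum_i (I - d_i d_i^\top)\right) p = \sum_i (I - d_i d_i^\top)\, o_i$. Below is a minimal NumPy sketch of that formula; `nearest_point_to_lines` is a hypothetical name, and the code assumes the lines are not all parallel (otherwise the 3×3 system is singular).

```python
import numpy as np


def nearest_point_to_lines(origins: np.ndarray, directions: np.ndarray) -> np.ndarray:
    """Least-squares point closest to n lines given as (n, 3) origins and directions."""
    d = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    # Projector onto the plane orthogonal to each line: I - d d^T, shape (n, 3, 3)
    proj = np.eye(3)[None] - d[:, :, None] * d[:, None, :]
    a = proj.sum(axis=0)                          # 3x3 normal matrix
    b = np.einsum("nij,nj->i", proj, origins)     # right-hand side
    return np.linalg.solve(a, b)
```

Each camera contributes one 3×3 term to the sums, which is where the linear cost comes from.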

This method is in Multinerf (source) and original NeRF (source).

instant-ngp (source) has a slightly different solution (non-optimal, but maybe more robust): it averages the pairwise intersections of all optical axes (there are $n(n-1)$ in total, making it a $\Theta(n^2)$ solution).
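To illustrate the pairwise idea, here is a simplified, unweighted NumPy sketch. It is not instant-ngp's code: the function names are hypothetical, near-parallel pairs are simply skipped using an assumed `eps` threshold, and it iterates over unordered pairs, which gives the same average as visiting every ordered pair.

```python
import itertools

import numpy as np


def closest_point_between_two_lines(o1, d1, o2, d2, eps=1e-6):
    """Midpoint of the shortest segment joining two 3D lines, or None if near-parallel."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    b = d1 @ d2
    denom = 1.0 - b * b  # squared sine of the angle between the lines
    if denom < eps:
        return None
    w = o1 - o2
    d, e = d1 @ w, d2 @ w
    t1 = (b * e - d) / denom  # parameter of the closest point on line 1
    t2 = (e - b * d) / denom  # parameter of the closest point on line 2
    return 0.5 * ((o1 + t1 * d1) + (o2 + t2 * d2))


def pairwise_focus(origins: np.ndarray, directions: np.ndarray) -> np.ndarray:
    """Average the pairwise closest points over all pairs of lines."""
    points = []
    for i, j in itertools.combinations(range(len(origins)), 2):
        p = closest_point_between_two_lines(origins[i], directions[i], origins[j], directions[j])
        if p is not None:
            points.append(p)
    if not points:
        raise ValueError("all lines are (nearly) parallel")
    return np.mean(points, axis=0)
```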

For better robustness, we added an additional check: we only consider the cameras that "see" the focus of attention. We simply check that the focus of attention is not behind the camera. We loop until the set of "active" cameras doesn't change. That way, if most cameras are inward-looking but a few are outward-looking, the latter will be excluded from the computation. Because we start from the "poses" solution (which is the centroid of the camera origins), it usually does only one iteration, so the solution is $\Theta(n)$ on average, but I'm pretty sure someone can design a camera configuration that leads to the worst case $\Theta(n^2)$ complexity (one camera de-activated at each iteration).
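The pruning loop described above can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the code in this PR: `focus_of_attention_sketch` is a hypothetical name, `c2w` is assumed to be camera-to-world with -Z as the look-at direction, and the active set is only ever shrunk so that the loop terminates after at most n iterations.

```python
import numpy as np


def _nearest_point_to_lines(origins, directions):
    # Least-squares point closest to all lines (directions are unit vectors).
    proj = np.eye(3)[None] - directions[:, :, None] * directions[:, None, :]
    return np.linalg.solve(proj.sum(axis=0), np.einsum("nij,nj->i", proj, origins))


def focus_of_attention_sketch(c2w: np.ndarray) -> np.ndarray:
    origins = c2w[:, :3, 3]
    look = -c2w[:, :3, 2]  # assumed convention: cameras look along -Z
    look = look / np.linalg.norm(look, axis=1, keepdims=True)

    def sees(point):
        # A camera "sees" the point if the point is not behind it.
        return np.einsum("ni,ni->n", point - origins, look) > 0

    focus = origins.mean(axis=0)  # start from the "poses" solution
    active = sees(focus)
    # With fewer than two active cameras, keep the previous solution.
    while active.sum() >= 2:
        focus = _nearest_point_to_lines(origins[active], look[active])
        still_active = active & sees(focus)
        if np.array_equal(still_active, active):
            break  # the active set did not change
        active = still_active
    return focus
```

With six inward-looking cameras on a circle and one outward-looking camera, the outward one is dropped on the first check and the loop stops after a single solve.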

A real generic method would work from the sparse SFM reconstruction or the provided depth maps, simply computing their centroid, or a robust centroid (start from the centroid, then do IRLS with Huber weights).
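A robust centroid of this kind could look like the sketch below, assuming the sparse SfM points are available as an (n, 3) array. `huber_centroid` and its `delta`, `iters` and `tol` parameters are hypothetical; the Huber weight is 1 for residuals below `delta` and `delta / r` beyond it.

```python
import numpy as np


def huber_centroid(points: np.ndarray, delta: float = 1.0, iters: int = 20, tol: float = 1e-9) -> np.ndarray:
    """Robust centroid via IRLS with Huber weights, started from the plain mean."""
    center = points.mean(axis=0)
    for _ in range(iters):
        r = np.linalg.norm(points - center, axis=1)
        # Huber weights: full weight near the center, down-weighted far away.
        w = np.where(r <= delta, 1.0, delta / np.maximum(r, 1e-12))
        new_center = (w[:, None] * points).sum(axis=0) / w.sum()
        converged = np.linalg.norm(new_center - center) < tol
        center = new_center
        if converged:
            break
    return center
```

In practice `delta` would have to be chosen relative to the scene scale, for example from the median distance of the points to the initial centroid.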

COLMAP's model_aligner with alignment_type=plane centers the coordinate system on the centroid of the reconstruction.

Results with previous method

Center is above the real object center, up direction is off.
Image
Image
Image

Results with proposed method

Center is at the object center, up direction is OK (only the last image was produced with the fully trained model)

Image

Image

Image

Here's the camera path, which explains the tilted frame in the previous version:
image

@tancik tancik left a comment

Thanks for adding these other options. The "centering" has been ignored despite it being a very important factor in final reconstruction quality.


There are two centering methods:
poses: The poses are centered around the origin.
focus: The origin is set to the focus of attention of all cameras (the

@tancik tancik Mar 2, 2023

What are your thoughts on having an "auto" option that first tries focus but switches to poses if some percentage of the cameras aren't "active", or based on some other heuristic, such as the focus_pt being far from the average ray?

Contributor Author

Easy to do, so let's make this part of this PR. Should that be a fixed fraction, or an additional parameter?
If it's a fixed fraction (say 0.5), then we could add an "auto" mode that handles that (and reuses the same code).
If it's an additional parameter, then we don't even need a separate mode: "focus" would simply fall back to "poses" if there aren't enough active cameras.

Contributor

Do you have any intuition on how robust a fixed fraction would be?
I think adding another parameter would be fine with me. I also think it might make sense to make this the default option if we feel confident that it will generally improve the results most of the time.

@f-dy f-dy Mar 6, 2023

So I gave it a second thought, and I think that a real "auto" centering method should not work from the camera poses, but should use the sparse 3D feature points from SfM. I'm thinking of the most trivial failure case, which is a camera array configuration à la LLFF: there's no way to guess the scene center from the poses, and only the sparse reconstruction can give you that notion. I'll take a look at how COLMAP does centering. I already know that COLMAP's up direction is very similar to the one already implemented in nerfstudio, and inferior to the one proposed here.

Contributor Author

COLMAP's model_aligner with alignment_type=plane centers the coordinate system on the centroid of the reconstruction.

Contributor

I see, that makes sense. I guess a solution (not for this PR) is to store a centroid in the transforms.json (for the nerfstudio_dataparser) that can override this logic.

@f-dy f-dy force-pushed the more-up-and-center-options branch from 54b04e2 to 026e7b0 Compare March 8, 2023 03:02

f-dy commented Mar 8, 2023

Updated to handle degeneracy in the "vertical" orientation method

f-dy commented Mar 8, 2023

Note: given that the degeneracy is now detected and fixed, "vertical" could be the default orientation method

f-dy commented Mar 8, 2023

working on splitting auto_orient_and_center_poses to pass pylint (done)

@f-dy f-dy marked this pull request as ready for review March 8, 2023 20:31

@f-dy f-dy force-pushed the more-up-and-center-options branch from b915c50 to b6cb614 Compare March 9, 2023 18:22
@tancik tancik left a comment

LGTM, thanks for adding these additional parameters.

@f-dy f-dy force-pushed the more-up-and-center-options branch from 9c0528c to f1a0d16 Compare March 10, 2023 19:08

f-dy commented Mar 10, 2023

I had to resolve conflicts (were tests removed? I couldn't find them)

tancik commented Mar 10, 2023

> I had to resolve conflicts (were tests removed? I couldn't find them)

Yes, they were included in a previous PR that was reverted since it was buggy.

@tancik tancik merged commit a277cd4 into nerfstudio-project:main Mar 10, 2023
f-dy pushed a commit to f-dy/sdfstudio that referenced this pull request Apr 20, 2023
- See nerfstudio-project/nerfstudio#1543
- Also default to "vertical" which works better than "up" (nerfstudio-project/nerfstudio#1765)
- Also rename train_split_percentage to train_split_fraction (nerfstudio-project/nerfstudio#1497)
chris838 pushed a commit to chris838/nerfstudio that referenced this pull request Apr 22, 2023
…erfstudio-project#1543)

* camera_utils: add "vertical" orientation and "focus" center methods

* camera_utils: -Z is the look-at direction

* Update nerfstudio_dataparser.py

* missing import

* fix pylint issues

* camera_utils: handle degenerate case for "vertical" orientation method

* camera_utils: split focus_of_attention computation

---------

Co-authored-by: Frédéric Devernay <f-dy@users.noreply.github.com>