camera_utils: add "vertical" orientation and "focus" center methods #1543
Conversation
Force-pushed from aecca66 to 54b04e2
Thanks for adding these other options. The "centering" has been ignored despite it being a very important factor in final reconstruction quality.
There are two centering methods:
    poses: The poses are centered around the origin.
    focus: The origin is set to the focus of attention of all cameras (the
What are your thoughts on having an "auto" option that first tries "focus" but switches to "poses" if some percentage of the cameras aren't "active", or some other heuristic, like the focus_pt being far from the average ray?
Easy to do, so let's make this part of this PR. Should that be a fixed fraction, or an additional parameter?
If it's a fixed fraction (say 0.5), then we could add the "auto" mode which handles that (and reuse the same code).
If it's an additional parameter, then we don't even need a separate mode: "focus" would simply fall back to "poses" if there aren't enough active cameras.
Do you have any intuition on how robust a fixed fraction would be?
Adding another parameter would be fine with me. It might also make sense to make this the default option if we're confident that it will improve results most of the time.
So I gave it a second thought, and I think that a real "auto" centering method should not work from the camera poses, but should use the sparse 3D feature points from SfM. I'm thinking of the most trivial failure case, which is a camera array configuration à la LLFF: there's no way to guess the scene center from the poses, and only the sparse reconstruction can give you that notion. I'll take a look at how COLMAP does centering. I already know that COLMAP's up direction is very similar to the one already implemented in nerfstudio, and inferior to the one proposed here.
COLMAP's model_aligner with alignment_type=plane centers the coordinate system on the centroid of the reconstruction.
I see, that makes sense. I guess a solution (not for this PR) is to store a centroid in the transforms.json (for the nerfstudio_dataparser) that can override this logic.
Force-pushed from 54b04e2 to 026e7b0
Updated to handle degeneracy in the "vertical" orientation method.
Note: given that the degeneracy is now detected and fixed, "vertical" could be the default orientation method.
Working on splitting auto_orient_and_center_poses to pass pylint (done).
Force-pushed from b915c50 to b6cb614
LGTM, thanks for adding these additional parameters.
Force-pushed from 9c0528c to f1a0d16
I had to resolve conflicts (were tests removed? I couldn't find them).
Yes, they were included in a previous PR that was reverted since it was buggy.
- See nerfstudio-project/nerfstudio#1543
- Also default to "vertical" which works better than "up" (nerfstudio-project/nerfstudio#1765)
- Also rename train_split_percentage to train_split_fraction (nerfstudio-project/nerfstudio#1497)
camera_utils: add "vertical" orientation and "focus" center methods (nerfstudio-project#1543)
* camera_utils: add "vertical" orientation and "focus" center methods
* camera_utils: -Z is the look-at direction
* Update nerfstudio_dataparser.py
* missing import
* fix pylint issues
* camera_utils: handle degenerate case for "vertical" orientation method
* camera_utils: split focus_of_attention computation
Co-authored-by: Frédéric Devernay <f-dy@users.noreply.github.com>
This PR adds a better vertical direction estimation (named "vertical"), which should work in most cases, even when cameras are in a parallel (e.g. LLFF) configuration (except that in this configuration, cameras should be looking horizontally).
It also adds a different centering method, more appropriate for inward-looking configurations, called "focus". The previous unnamed centering method is now called "poses".
Additionally, we fixed the inconsistent naming of auto_scale_poses, which was named scale_poses in a couple of DataParserConfigs.
Note: the authors of this PR are @f-dy (previously known as @devernay, but I had to create another account for work-related contributions) and @gilureta.
"vertical" orientation method
The existing orientation method, called "up", works by averaging the up vectors (Y) of all cameras. It has issues, e.g. when cameras are mostly looking down and unevenly distributed. However, it's still very useful when camera poses are random, for example when scanning a full room rather than a single object.
COLMAP has a slightly different approach, but it also works from the cameras' Y directions in 3D, so we expect the same failure cases as the current "up" solution.
This "vertical" orientation method works by finding the 3D direction that is most orthogonal to the X direction of all cameras. This means that this 3D direction projects close to the Y axis of all cameras. This also means that the vanishing point corresponding to the vertical direction should project close to the Y axis. This gives us a vector that may point down instead of up, so we use the previous "up" computation to disambiguate that situation.
This works in all cases where photos are correctly oriented (also a requirement for the "up" method), but it also requires some rotation across the camera poses; otherwise the vertical may be any direction in the sagittal plane (the plane spanned by the cameras' Z and Y axes), which is then shared by all cameras.
We detect that degeneracy: since the vertical is given as the right singular vector corresponding to the smallest singular value of an n×3 matrix, we also check the second smallest singular value, detect the degeneracy, and resolve the ambiguity by choosing the direction within that "vertical plane" that best aligns with the previous "up" vector. This is obtained by projecting that "up" vector onto the plane. See comments in the code.
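For illustration, here is a minimal numpy sketch of the idea. It assumes an (n, 3, 4) array of camera-to-world poses with columns [x, y, z, t] (so column 0 is the camera X axis and column 1 the camera Y axis); the function name and the degeneracy threshold eps are illustrative, not the exact code merged in this PR:

```python
import numpy as np

def estimate_vertical(poses: np.ndarray, eps: float = 1e-2) -> np.ndarray:
    """Estimate the world vertical from (n, 3, 4) camera-to-world poses."""
    x_axes = poses[:, :3, 0]                  # (n, 3) camera X directions
    up_mean = poses[:, :3, 1].mean(axis=0)    # the previous "up" estimate
    up_mean /= np.linalg.norm(up_mean)
    # The vertical is the direction most orthogonal to all camera X axes:
    # the right singular vector with the smallest singular value.
    _, s, vh = np.linalg.svd(x_axes)          # s is sorted in decreasing order
    vertical = vh[2, :]
    if s[1] < eps * s[0]:
        # Degenerate: the two smallest singular values are both ~0, so any
        # direction in the plane spanned by vh[1] and vh[2] is a candidate.
        # Resolve the ambiguity by projecting the mean up onto that plane.
        plane = vh[1:3, :]                    # (2, 3) orthonormal basis
        vertical = plane.T @ (plane @ up_mean)
        vertical /= np.linalg.norm(vertical)
    # The SVD leaves the sign ambiguous: make the vertical point "up".
    if vertical @ up_mean < 0:
        vertical = -vertical
    return vertical
```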
"focus" centering method
The "focus" centering method works better than "poses" when cameras are turning around something. It works by finding the 3D intersection of the cameras optical axes. Of course 3D lines have no chance to intersect, so it finds the nearest point to these axes, as described in this Wikipedia page. The solution is$\Theta(n)$ .
This method is used in Multinerf (source) and the original NeRF (source).
instant-ngp (source) has a slightly different solution (non-optimal, but maybe more robust): it averages the pairwise intersections of all optical axes (there are $n(n-1)$ in total, making it a $\Theta(n^2)$ solution).
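A sketch of that pairwise variant, taking the midpoint of the mutual perpendicular between each pair of axes (instant-ngp's actual implementation differs in its details, e.g. weighting; this is only the basic idea):

```python
import numpy as np

def pairwise_focus(origins: np.ndarray, directions: np.ndarray) -> np.ndarray:
    """Average of pairwise 'intersections' of all optical axes (O(n^2))."""
    pts = []
    n = len(origins)
    for i in range(n):
        for j in range(i + 1, n):
            o1, d1 = origins[i], directions[i]   # d1, d2 assumed unit-length
            o2, d2 = origins[j], directions[j]
            b = o2 - o1
            c = d1 @ d2
            denom = 1.0 - c * c
            if denom < 1e-8:                     # near-parallel axes: skip pair
                continue
            # Closest points o1 + t1*d1 and o2 + t2*d2 on the two lines.
            t1 = (d1 @ b - c * (d2 @ b)) / denom
            t2 = (c * (d1 @ b) - (d2 @ b)) / denom
            pts.append(0.5 * (o1 + t1 * d1 + o2 + t2 * d2))
    return np.mean(np.stack(pts), axis=0)
```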
For better robustness, we added an additional check: we only consider the cameras that "see" the focus of attention. We simply check that the focus of attention is not behind the camera. We loop until the set of "active" cameras doesn't change. That way, if most cameras are inward-looking but a few are outward-looking, the latter will be excluded from the computation. Because we start from the "poses" solution (which is the centroid of the camera origins), it usually does only one iteration, so the solution is $\Theta(n)$ on average, but I'm pretty sure someone can design a camera configuration that leads to the worst-case $\Theta(n^2)$ complexity (one camera de-activated at each iteration).
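A sketch of that iteration, reusing the focus_of_attention helper from the sketch above (the exact stopping test in the PR may differ):

```python
import numpy as np

def focus_with_active_cameras(origins: np.ndarray, directions: np.ndarray,
                              max_iters: int = 100) -> np.ndarray:
    """Iterate the least-squares focus, excluding cameras that don't see it."""
    focus = origins.mean(axis=0)          # start from the "poses" center
    active = None
    for _ in range(max_iters):
        # A camera "sees" the focus if it lies in front of the camera, i.e.
        # (focus - origin) has a positive component along the look-at axis.
        new_active = ((focus - origins) * directions).sum(axis=1) > 0
        if not new_active.any():
            return focus                  # no inward-looking camera: keep centroid
        if active is not None and np.array_equal(new_active, active):
            break                         # active set is stable: done
        active = new_active
        focus = focus_of_attention(origins[active], directions[active])
    return focus
```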
A truly generic method would work from the sparse SfM reconstruction or the provided depth maps, simply computing their centroid, or a robust centroid (start from the centroid, then do IRLS with Huber weights), as sketched below.
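For instance, a robust centroid via IRLS with Huber weights could look like this sketch (the threshold delta is illustrative and would have to be scaled to the scene size):

```python
import numpy as np

def robust_centroid(points: np.ndarray, delta: float = 1.0,
                    iters: int = 20) -> np.ndarray:
    """Robust centroid of (n, 3) points: IRLS with Huber weights."""
    center = points.mean(axis=0)          # initialize at the plain centroid
    for _ in range(iters):
        r = np.linalg.norm(points - center, axis=1)
        # Huber weights: 1 inside the delta band, delta / r beyond it,
        # so distant outliers contribute less to the weighted mean.
        w = np.where(r <= delta, 1.0, delta / np.maximum(r, 1e-12))
        center = (w[:, None] * points).sum(axis=0) / w.sum()
    return center
```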
COLMAP's model_aligner with alignment_type=plane centers the coordinate system on the centroid of the reconstruction.
Results with previous method
Center is above the real object center, up direction is off.
Results with proposed method
Center is at the object center, up direction is OK (only the last image was produced with the fully trained model)
Here's the camera path, which explains the tilted frame in the previous version: