How to use colmap to generate calibration.json
#7
Comments
The multi-view feature isn't integrated with colmap. It assumes that you already have the camera parameters and have converted them correctly (see the bottom of the Readme for coordinate system details, for example). If you want to use colmap, then for questions 1-3 it's probably best to call the preprocessing step yourself: https://github.com/Fyusion/LLFF/blob/c6e27b1ee59cb18f054ccb0f87a90214dbe70482/llff/poses/pose_utils.py#L259 I'm not sure whether that code supports different camera intrinsics or assumes they are all the same, so you'd need to check the result. In any case, the NR-NeRF code uses colmap results that were passed through that preprocessing (it provides min_bound and max_bound and converts colmap's extrinsics so that they are compatible with NR-NeRF). Regarding 4, the GUI allows changing the camera model to SIMPLE_PINHOLE, if I remember correctly.

You could also consider using the preprocessing wrapper from NR-NeRF, especially if all cameras have the same intrinsics. If the cameras are static, you could run preprocessing on only one timestep (with multiple images). That way you'd get the extrinsics and intrinsics in the correct format for those images. You'd then need to write code to store them in calibration.json.
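For orientation, here is a minimal sketch of how the output of that LLFF preprocessing step could be inspected before writing calibration.json. It assumes the step produced an LLFF-style poses_bounds.npy, where each row holds a flattened 3x5 matrix (a 3x4 camera-to-world pose plus a [height, width, focal] column) followed by two depth values. The helper name split_poses_bounds is hypothetical, and a synthetic array stands in for the real file. How NR-NeRF turns the depth values into min_bound and max_bound is not shown in this thread, so treat the global minimum and maximum printed here only as a starting point to check against the NR-NeRF code.

```python
import numpy as np

def split_poses_bounds(poses_bounds):
    """Split an (N, 17) LLFF-style array into poses, [h, w, f] and depth bounds."""
    arr = np.asarray(poses_bounds, dtype=np.float64)
    if arr.ndim != 2 or arr.shape[1] != 17:
        raise ValueError(f"expected shape (N, 17), got {arr.shape}")
    mats = arr[:, :15].reshape(-1, 3, 5)   # one 3x5 matrix per image
    poses = mats[:, :, :4]                 # 3x4 camera-to-world (LLFF axis convention)
    hwf = mats[:, :, 4]                    # per image: height, width, focal length
    bounds = arr[:, 15:]                   # per image: near and far depth
    return poses, hwf, bounds

# Synthetic stand-in for np.load("poses_bounds.npy"): two identical identity poses.
row = np.concatenate([
    np.hstack([np.eye(3), np.zeros((3, 1)), [[1080.0], [1920.0], [4442.37]]]).ravel(),
    [0.5, 2.0],
])
poses, hwf, bounds = split_poses_bounds(np.stack([row, row]))
print(poses.shape, hwf[0], bounds.min(), bounds.max())
```

Because a single focal length is stored per image, this format cannot represent separate focal_x and focal_y values, which is worth keeping in mind when the cameras differ.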
Thanks for your reply @edgar-tr !
Thanks! I have tried the preprocessing code, using the multi-view pictures of one timestep to reconstruct the scene. However, the bundle adjustment always fails to converge there, while it does converge in my Colmap GUI, so I suspect the cause is the hyperparameters used for feature extraction and feature matching.
Yes, I will have a try.
I wrote a snippet to convert the .txt output of colmap to the JSON format:
import json
import numpy as np

def quaternion2matrix(q):
    # Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix (nested list).
    q0 = q[0]
    q1 = q[1]
    q2 = q[2]
    q3 = q[3]
    m = np.zeros((3, 3))
    m[0, 0] = q0 * q0 + q1 * q1 - q2 * q2 - q3 * q3
    m[0, 1] = 2 * (q1 * q2 - q0 * q3)
    m[0, 2] = 2 * (q1 * q3 + q0 * q2)
    m[1, 0] = 2 * (q1 * q2 + q0 * q3)
    m[1, 1] = q0 * q0 - q1 * q1 + q2 * q2 - q3 * q3
    m[1, 2] = 2 * (q2 * q3 - q0 * q1)
    m[2, 0] = 2 * (q1 * q3 - q0 * q2)
    m[2, 1] = 2 * (q2 * q3 + q0 * q1)
    m[2, 2] = q0 * q0 - q1 * q1 - q2 * q2 + q3 * q3
    return m.tolist()

# This part is copy-pasted from the .txt output of colmap.
# Camera line (SIMPLE_RADIAL layout): width, height, focal length, cx, cy, radial distortion k.
intrinsics = [1920, 1080, 4442.37, 960, 540, -0.0920246]
# Image lines: IMAGE_ID, QW, QX, QY, QZ, TX, TY, TZ, CAMERA_ID, NAME.
extrinsics = [
    '1 0.97039 -0.238968 0.0104339 -0.0335974 -1.07221 -2.75473 2.19396 1 down.jpg',
    '2 0.997979 -0.0634341 0.00200262 -0.00320611 -0.830256 -0.54455 0.924563 1 front.jpg',
    '3 0.969716 -0.0510358 -0.238216 0.0172955 2.78444 -0.361813 0.919001 1 left_30.jpg',
    '4 0.924454 -0.0686466 -0.370858 -0.056008 4.44329 -0.847429 1.52017 1 left_60.jpg',
    '5 0.96706 -0.0668973 0.244203 0.0261598 -4.22796 -0.607595 1.86112 1 right_30.jpg',
    '6 0.924637 -0.0629251 0.373441 0.0403556 -5.59472 -0.66673 2.52344 1 right_60.jpg',
    '7 0.993836 0.110606 -0.00483683 -0.00566639 -0.826491 1.70379 1.96883 1 top.jpg'
]
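
camera_param_dict = {}
for ex in extrinsics:
    fields = ex.split(' ')
    camera_name = fields[-1].split('.')[0]
    # Leading fields: IMAGE_ID, QW, QX, QY, QZ, TX, TY, TZ (CAMERA_ID and NAME are dropped).
    (image_id, qw, qx, qy, qz, tx, ty, tz) = map(float, fields[:-2])
    # Note: colmap stores world-to-camera rotation and translation. NR-NeRF may expect
    # a different convention (see the coordinate system notes in the Readme).
    camera_param_dict[camera_name] = {
        'translation': [tx, ty, tz],
        'rotation': quaternion2matrix((qw, qx, qy, qz)),
        # intrinsics = [width, height, focal, cx, cy, k]
        'center_x': intrinsics[3],
        'center_y': intrinsics[4],
        'focal_x': intrinsics[2],
        'focal_y': intrinsics[2],
        'height': intrinsics[1],
        'width': intrinsics[0]
    }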

# Template: only min_bound and max_bound are kept, the example camera "0" is removed below.
template = {
    "min_bound": 0.0,
    "max_bound": 2.0189487179595886,
    "0": {
        "translation": [-0.041070333333333334, 1.1753333333333333, 0.49935666666666667],
        "rotation": [
            [0.0577962, -0.997661, -0.0364925],
            [0.558001, 0.00197212, 0.829838],
            [-0.827825, -0.0683243, 0.55681]
        ],
        "center_x": 2572.48,
        "center_y": 1875.78,
        "focal_x": 5363.46,
        "focal_y": 5363.46,
        "height": 3840,
        "width": 5120
    }
}
template.pop('0')
for camera_name in camera_param_dict.keys():
    template[camera_name] = camera_param_dict[camera_name]
print(template)
with open("calibration.json", "w") as json_file:
    json.dump(template, json_file, indent=4)
Thanks for sharing the snippet :) The NR-NeRF preprocessing code assumes a SIMPLE_PINHOLE camera model. I have encountered issues with convergence when the images were even slightly distorted. In my case, SIMPLE_PINHOLE wasn't enough but SIMPLE_RADIAL worked via the Colmap GUI. (If the GUI fails with SIMPLE_PINHOLE, I'd suspect some slight distortion to be the reason.) Because the camera model wasn't just SIMPLE_PINHOLE, it wasn't directly compatible with the training code. So I went the way of undistorting the images first (via the Colmap GUI) and then using the standard preprocessing code on them. Alternatively, one could modify the get_rays() functions in nerf_helpers.py to work with a non-pinhole camera model, which should be relatively straightforward. The rest of the code doesn't assume any particular camera model; it takes whatever rays are returned by the get_rays() functions.
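To illustrate the second option, here is a minimal NumPy sketch of per-pixel ray directions for colmap's SIMPLE_RADIAL model (parameters f, cx, cy, k). It is not NR-NeRF's get_rays(); the function name simple_radial_ray_dirs is hypothetical. The sketch assumes pixel centers at half-integer coordinates and returns directions in colmap's camera frame (x right, y down, z forward), so axis flips would most likely still be needed to match NR-NeRF's convention. The radial distortion is inverted with a simple fixed-point iteration, which is adequate only for mild distortion.

```python
import numpy as np

def simple_radial_ray_dirs(width, height, f, cx, cy, k, n_iters=10):
    """Unit ray directions, shape (height, width, 3), in the camera frame."""
    px, py = np.meshgrid(
        np.arange(width, dtype=np.float64) + 0.5,   # assumed pixel-center convention
        np.arange(height, dtype=np.float64) + 0.5,
        indexing="xy",
    )
    # Distorted normalized image coordinates.
    xd = (px - cx) / f
    yd = (py - cy) / f
    # Invert (xd, yd) = (x, y) * (1 + k * r^2) by fixed-point iteration.
    x, y = xd.copy(), yd.copy()
    for _ in range(n_iters):
        factor = 1.0 + k * (x * x + y * y)
        x = xd / factor
        y = yd / factor
    dirs = np.stack([x, y, np.ones_like(x)], axis=-1)
    return dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)

dirs = simple_radial_ray_dirs(width=4, height=2, f=2.0, cx=2.0, cy=1.0, k=-0.09)
print(dirs.shape)
```

With k set to 0 this reduces to plain pinhole rays, which gives a quick way to compare against an existing pinhole implementation.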
Got it. It seems that I should remove the distortion and obtain the intrinsics with the PINHOLE model first. For now I am treating the FOCAL_LENGTH of the SIMPLE_RADIAL model as both FOCAL_X and FOCAL_Y (I am not very familiar with camera models, but according to the colmap tutorial, FOCAL_X seems almost equivalent to FOCAL_LENGTH in SIMPLE_RADIAL).
I guess this may introduce errors, since what get_rays() returns may then be wrong.
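As a reference for the parameter layouts discussed here, the following sketch parses one camera line from colmap's cameras.txt (CAMERA_ID, MODEL, WIDTH, HEIGHT, PARAMS[]) into the intrinsics fields used in calibration.json. The function name parse_colmap_camera_line is hypothetical and only three camera models are handled. For SIMPLE_RADIAL the distortion coefficient k is simply dropped, which is only reasonable if the images have been undistorted or the distortion is negligible.

```python
def parse_colmap_camera_line(line):
    """Parse 'CAMERA_ID MODEL WIDTH HEIGHT PARAMS...' into pinhole intrinsics."""
    fields = line.split()
    model = fields[1]
    width, height = int(fields[2]), int(fields[3])
    params = [float(p) for p in fields[4:]]
    if model == "SIMPLE_PINHOLE":      # f, cx, cy
        f, cx, cy = params
        fx = fy = f
    elif model == "PINHOLE":           # fx, fy, cx, cy
        fx, fy, cx, cy = params
    elif model == "SIMPLE_RADIAL":     # f, cx, cy, k (k is ignored here)
        f, cx, cy, _k = params
        fx = fy = f
    else:
        raise ValueError(f"unsupported camera model: {model}")
    return {
        "center_x": cx, "center_y": cy,
        "focal_x": fx, "focal_y": fy,
        "height": height, "width": width,
    }

cam = parse_colmap_camera_line("1 SIMPLE_RADIAL 1920 1080 4442.37 960 540 -0.0920246")
print(cam)
```

So using the single SIMPLE_RADIAL focal length for both focal_x and focal_y matches the model's definition; the remaining source of error is the ignored distortion term.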
Some updates here. I have fixed the colmap setup and am now using a PINHOLE model calibration, but multi-view training is still not working. When I use a single view, the model generalizes well to unseen latents.
Attached video: M039_single_view_test_with_diff_latend_training_200000_rgb.3.mp4
However, when I train in the multi-view setting, the result is static and blurry for seen views with unseen latents, and it cannot generalize to seen latents.
Attached videos: less_reg_train_fixed_video_rgb.1.mp4, train_spiral_video_rgb.1.mp4
Could you please give me a hint? Maybe there are too few views (I have only 7 views, with fixed camera poses rather than a moving sequence). I have tried using a weaker regularization loss as suggested in this issue, but it does not help.
This looks similar to what I get when the camera extrinsics are in the wrong coordinate system. In such a case, it overfits to each input camera individually using a ton of artifacts (as seen in the spiral video). But because it overfits to each view, it is less sharp. There is no remotely meaningful 3D model learned because the images are inconsistent with how the cameras are positioned in NR-NeRF's coordinate system. Can you look at logs/cameras.obj to see whether the cameras look sensible? Note that even if the cameras look sensible, it could still be that the images at those cameras are rotated weirdly due to some axis flips. I've had to deal with that issue a lot. You could test this by creating a small dataset from only a single timestep, using the normal preprocessing code without any modifications, and training the model on that but with ray_bending=None (and turning off the regularization losses because they'll throw errors without ray bending, I assume). That trains a standard, rigid NeRF on a single timestep. Novel view synthesis should work somewhat in that case, even though seven static cameras are already quite few. If it does create recognizable novel views, then the issue is the camera coordinate systems being wrong.
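One common source of such a mismatch: colmap's images.txt stores the world-to-camera rotation (as a quaternion QW, QX, QY, QZ) and translation, whereas many NeRF-style pipelines work with camera-to-world poses. The sketch below shows that inversion only, with hypothetical function names. Whether NR-NeRF's calibration.json expects exactly this, and which axis flips it needs on top, is not stated in this thread and should be checked against the Readme and logs/cameras.obj.

```python
import numpy as np

def qvec_to_rotmat(qw, qx, qy, qz):
    """Rotation matrix of a (w, x, y, z) quaternion, normalized first."""
    q = np.array([qw, qx, qy, qz], dtype=np.float64)
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def world_to_camera_to_pose(qvec, tvec):
    """Invert x_cam = R @ x_world + t into (camera-to-world rotation, camera center)."""
    rot_w2c = qvec_to_rotmat(*qvec)
    rot_c2w = rot_w2c.T                                    # inverse of a rotation
    center = -rot_c2w @ np.asarray(tvec, dtype=np.float64)  # camera position in world
    return rot_c2w, center

# 90 degree rotation about z with translation (1, 0, 0).
s = np.sqrt(0.5)
rot_c2w, center = world_to_camera_to_pose((s, 0.0, 0.0, s), (1.0, 0.0, 0.0))
print(center)
```

A quick sanity check is that mapping the returned center back through the original world-to-camera transform gives the origin of the camera frame.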
Thanks! It turned out to be wrong intrinsics and poses. I tried a single timestep with a rigid NeRF, and it failed in a similar way to the result above. Then I tried NeRF--, which optimizes the intrinsics and poses via backpropagation, and the result starts to look reasonable.
Attached videos: depth-custom-1.mp4, img-custom-1.mp4
Hi,
I am wondering how I can get the extrinsics/intrinsics from colmap and store them in calibration.json. What I have done so far is use the colmap GUI to get the parameters and export them to .txt:
What more do I need to do to use the multi-view feature?
min_bound, max_bound? There is no such information in the result of colmap. Thanks!