30k export problem ? #8

Closed
TXSevenXT opened this issue Sep 4, 2024 · 16 comments

Comments

@TXSevenXT

Hello !

First of all, thank you for your work and for keeping up with the updates :)

I've just run a test on a 30sec mp4 video with the following code:
python run_single.py --colmap_name cascade --video_extension mp4

Training finishes fine, but a problem occurs when exporting the 30k PLY (the 7k one comes out fine and is usable):
Optimizing /home/xxxxxxx/gs2mesh/splatting_output/custom_nw_iterations30000/cascade
Output folder: /home/xxxxxxx/gs2mesh/splatting_output/custom_nw_iterations30000/cascade [04/09 08:44:34]
Reading camera 86/86 [04/09 08:44:35]
Converting point3d.bin to .ply, will happen only the first time you open the scene. [04/09 08:44:35]
Loading Training Cameras [04/09 08:44:35]
[ INFO ] Encountered quite large input images (>1.6K pixels width), rescaling to 1.6K.
If this is not desired, please explicitly specify '--resolution/-r' as 1 [04/09 08:44:35]
Loading Test Cameras [04/09 08:45:11]
Number of points at initialisation : 43182 [04/09 08:45:11]
Training progress: 23%|█████████▊ | 7000/30000 [09:00<35:06, 10.92it/s, Loss=0.1461526]
[ITER 7000] Evaluating train: L1 0.07914816588163376 PSNR 18.642070770263672 [04/09 08:54:13]

[ITER 7000] Saving Gaussians [04/09 08:54:14]
Training progress: 100%|█████████████████████████████████████████| 30000/30000 [54:02<00:00, 9.25it/s, Loss=0.1150575]

[ITER 30000] Evaluating train: L1 0.06295853853225708 PSNR 20.098556518554688 [04/09 09:39:23]

[ITER 30000] Saving Gaussians [04/09 09:39:24]
Killed
num views: 86
baseline: 0.2705555039768162
RPly: Unable to open file
[Open3D WARNING] Read PLY failed: unable to open file: /home/xxxxxxx/gs2mesh/splatting_output/custom_nw_iterations30000/cascade/point_cloud/iteration_30000/point_cloud.ply
Loading trained model at iteration 30000
Reading camera 86/86
Loading Training Cameras
Loading Test Cameras
Traceback (most recent call last):
File "run_single.py", line 190, in <module>
run_single(args)
File "run_single.py", line 90, in run_single
renderer.prepare_renderer()
File "/home/xxxxxxx/gs2mesh/gs2mesh_utils/renderer_utils.py", line 356, in prepare_renderer
scene = Scene(dataset, self.gaussians, load_iteration=self.splatting_iteration, shuffle=False)
File "/home/xxxxxxx/gs2mesh/third_party/gaussian-splatting/scene/__init__.py", line 78, in __init__
self.gaussians.load_ply(os.path.join(self.model_path,
File "/home/xxxxxxx/gs2mesh/third_party/gaussian-splatting/scene/gaussian_model.py", line 216, in load_ply
plydata = PlyData.read(path)
File "/home/xxxxxxx/anaconda3/envs/gs2mesh/lib/python3.8/site-packages/plyfile.py", line 158, in read
(must_close, stream) = _open_stream(stream, 'read')
File "/home/xxxxxxx/anaconda3/envs/gs2mesh/lib/python3.8/site-packages/plyfile.py", line 1345, in _open_stream
return (True, open(stream, read_or_write[0] + 'b'))
FileNotFoundError: [Errno 2] No such file or directory: '/home/xxxxxxx/gs2mesh/splatting_output/custom_nw_iterations30000/cascade/point_cloud/iteration_30000/point_cloud.ply'

Have you already encountered this problem?
If so, do you have any advice on how to solve it?
For information, I'm running Windows 11, CUDA 11.8, WSL2/Ubuntu 22.04

Thanks for your help :)

Have a nice day ^_^

@yanivw12
Owner

yanivw12 commented Sep 4, 2024

Hi,

It seems like it could be a memory issue when exporting the ply file:
graphdeco-inria/gaussian-splatting#235
A quick fix that they suggest is saving the ply only after 30k iterations and not after 7k. To do that, add the --GS_save_test_iterations 30000 argument and see if it helps.
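
For anyone reading a similar log: a bare `Killed` line right after `Saving Gaussians` usually means the process received SIGKILL, which on Linux is often the out-of-memory killer. Below is a minimal Python sketch that scans a captured log for that pattern. The function name and the idea of capturing the log to a string are illustrative assumptions, not part of gs2mesh.

```python
def last_step_before_kill(log_text):
    """Return the last non-empty line printed before a bare 'Killed' line, or None."""
    previous = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "Killed":
            # The shell prints this when the process was terminated by SIGKILL.
            return previous
        previous = line
    return None


# Sample lines taken from the log in this issue.
sample = """[ITER 30000] Saving Gaussians [04/09 09:39:24]
Killed
num views: 86
"""
print(last_step_before_kill(sample))
```

If the step it reports is the PLY save, the missing `point_cloud.ply` further down the log is a consequence of the kill rather than a separate bug.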

@TXSevenXT
Author

Thank you very much for your help ^^

After activating the backup memory in the nvidia settings, no more worries with the previous error, thank you :)

However, something is missing to extract the meshes:
[ITER 30000] Evaluating train: L1 0.00707147466018796 PSNR 39.55408630371094 [04/09 17:35:16]

[ITER 30000] Saving Gaussians [04/09 17:35:16]

Training complete. [04/09 17:35:23]
num views: 3
baseline: 0.27544912783794784
Loading trained model at iteration 30000
Reading camera 3/3
Loading Training Cameras
Loading Test Cameras
0%| | 0/3 [00:00<?, ?it/s]UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1716905971873/work/aten/src/ATen/native/TensorShape.cpp:3587.)
UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
100%|█████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:16<00:00, 5.43s/it]
Automask must be enabled for masking in script mode. Skipping.
100%|█████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 4.72it/s]
[Open3D WARNING] Write PLY failed: mesh has 0 vertices.
SAVED MESH
[Open3D DEBUG] [ClusterConnectedTriangles] Compute triangle adjacency
[Open3D DEBUG] [ClusterConnectedTriangles] Done computing triangle adjacency
[Open3D DEBUG] [ClusterConnectedTriangles] Done clustering, #clusters=0
[Open3D WARNING] Write PLY failed: mesh has 0 vertices.
SAVED CLEANED MESH

With or without the argument you provided earlier, I arrive at the same point.
It's not a 360° video but rather ‘120-140°’.

@yanivw12
Owner

yanivw12 commented Sep 4, 2024

The problem you have now is that the depth is being cropped too soon, so increasing the horizontal baseline and the truncation limit should solve it. It's better to increase the baseline first and the truncation limit second, though.

To increase the baseline, add the --no-renderer_scene_360 argument, which should calculate a better horizontal baseline given that your scene is not a 360 scene (the resulting baseline should be larger than the current baseline, which is 0.275). If that still doesn't work, increase the truncation limit with the argument --TSDF_max_depth_baselines to something larger than the default 20.
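
To see why both knobs matter, here is a rough Python sketch of how the cut-off distance scales. It assumes the truncation depth is simply the baseline multiplied by `--TSDF_max_depth_baselines`, which is what the argument name suggests. The function name is hypothetical and this is not gs2mesh's actual code.

```python
def truncation_depth(baseline, max_depth_baselines=20):
    """Distance (in scene units) beyond which depth is assumed to be discarded."""
    # Assumption: the limit is expressed as a multiple of the stereo baseline.
    return baseline * max_depth_baselines


# Baseline reported in the log above, with the default limit and a raised one.
print(truncation_depth(0.275))
print(truncation_depth(0.275, 30))
```

Under that assumption, a larger baseline and a larger multiplier both push the cut-off further from the camera, so geometry that was previously truncated can end up inside the fused volume.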

@yanivw12
Owner

yanivw12 commented Sep 4, 2024

Also, note that you only have 3 views in your current model, whereas in the first message that you sent you had 86 views. 3 views for COLMAP and 3DGS is usually insufficient.

@TXSevenXT
Author

> Also, note that you only have 3 views in your current model, whereas in the first message that you sent you had 86 views. 3 views for COLMAP and 3DGS is usually insufficient.

Hi again,

This seems to be related to the '--GS_save_test_iterations 30000' argument
Without it, it finds 86 views.

I'll try with the last arguments and keep you posted.

Thank you very much for your help in any case 😊

@TXSevenXT
Author

Hello !

I have a problem because the program always ends up “killed”...
It seems to be a memory problem but I don't understand why...
It seems to use RAM but little or no VRAM...
Even with 29 frames in 720p... I'm a bit stumped.

I have no problem with 3DGS and 200 images in 1600p :(...

@TXSevenXT
Author

Here is the log for the following command: "python run_single.py --colmap_name cascade --skip_video_extraction --no-renderer_scene_360 --TSDF_max_depth_baselines 30 --GS_save_test_iterations 30000"
Training progress: 100%|██████████████████████████████████████████| 30000/30000 [13:54<00:00, 35.95it/s, Loss=0.0260883]

[ITER 30000] Evaluating train: L1 0.018139631859958174 PSNR 29.62316818237305 [06/09 13:13:30]

[ITER 30000] Saving Gaussians [06/09 13:13:30]

Training complete. [06/09 13:13:47]
num views: 29
baseline: 0.6796290173109062
Loading trained model at iteration 30000
Reading camera 29/29
Loading Training Cameras
Loading Test Cameras
0%| | 0/29 [00:00<?, ?it/s]UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1716905971873/work/aten/src/ATen/native/TensorShape.cpp:3587.)
UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
100%|███████████████████████████████████████████████████████████████████████████████████| 29/29 [00:21<00:00, 1.32it/s]
Automask must be enabled for masking in script mode. Skipping.
10%|████████▋ | 3/29 [01:52<18:30, 42.71s/it]
Killed

@yanivw12
Owner

yanivw12 commented Sep 6, 2024

See #1

Your code gets killed in the TSDF step, and there seems to be some bug with the Open3D TSDF. If you're using CUDA 11.8 and Python 3.8, the problem could be the Ubuntu version you're using (I tested it on 20.04, and you're using 22.04). More details are in the "Common Issues and Tips" section. From what I've tested, the TSDF runs without an issue on Ubuntu 20.04 with Python 3.7 and 3.8.

@TXSevenXT
Author

Thank you for your help, I'll test with Ubuntu 20.04 :)

Last question: is 32 GB of RAM enough to run your code?
I have a 4090 alongside it, but I think the RAM may be a bit borderline for gs2mesh?

Thank you again for your time and advice 😊

@yanivw12
Owner

yanivw12 commented Sep 6, 2024

The GS and Stereo models should run fine on a 4090 (with 24GB of VRAM). The only step that can take up RAM is the TSDF, since it's not a GPU accelerated version(*). I haven't really tested how much RAM it takes, but I assume 32GB is enough.

(*) As a side note, I also tested a GPU accelerated TSDF (https://github.com/andyzeng/tsdf-fusion-python), but from my experience with it, it's much heavier on the GPU memory and I often encountered "CUDA out of memory" (with an L40 with 50GB), especially for larger scenes or higher resolutions. The results also looked visually better with the Open3D version.
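
Since the TSDF step runs on the CPU, it can help to check how much RAM and swap are actually free before launching it. Here is a minimal sketch for Linux (including WSL2) that reads `/proc/meminfo`. The function names are illustrative and nothing here is part of gs2mesh.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict (values in kB)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def available_gib(info):
    """Free RAM plus free swap, converted from kB to GiB."""
    return (info.get("MemAvailable", 0) + info.get("SwapFree", 0)) / (1024 ** 2)


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            gib = available_gib(parse_meminfo(f.read()))
        print(f"{gib:.1f} GiB available (RAM + swap)")
    except FileNotFoundError:
        print("/proc/meminfo not found (not a Linux system?)")
```

Under WSL2 the number reported is whatever the WSL virtual machine was given, which may be less than the physical RAM of the Windows host.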

@TXSevenXT
Author

Hello Yaniv,

Thanks again for your help.
I have installed Ubuntu 20.04 and everything else went well, but I am stuck because diff_gaussian_rasterization needs glibc 2.32 and Ubuntu 20.04 stops at 2.31. I've tried everything to move to version 2.32 but I'm at a dead end.
How did you circumvent the problem please?
(gs2mesh) xxxxxxx@Jeremie:~/gs2mesh$ python run_single.py --colmap_name cascade --skip_video_extraction --no-renderer_scene_360 --TSDF_max_depth_baselines 30
Traceback (most recent call last):
File "run_single.py", line 14, in <module>
from gs2mesh_utils.renderer_utils import Renderer
File "/home/xxxxxxx/gs2mesh/gs2mesh_utils/renderer_utils.py", line 25, in <module>
from gaussian_renderer import render
File "/home/xxxxxxx/gs2mesh/third_party/gaussian-splatting/gaussian_renderer/__init__.py", line 14, in <module>
from diff_gaussian_rasterization import GaussianRasterizationSettings, GaussianRasterizer
File "/home/xxxxxxx/anaconda3/envs/gs2mesh/lib/python3.8/site-packages/diff_gaussian_rasterization/__init__.py", line 15, in <module>
from . import _C
ImportError: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /home/xxxxxxx/anaconda3/envs/gs2mesh/lib/python3.8/site-packages/diff_gaussian_rasterization/_C.cpython-38-x86_64-linux-gnu.so)

@yanivw12
Owner

I checked on the server I'm using, and I'm also running glibc 2.31...
Maybe clean the previous installation (the entire diff_gaussian_rasterization folder) and re-install?
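
A possible explanation for the ImportError is that the compiled `_C` extension was built against a newer glibc than the one now installed (for example, a build left over from the Ubuntu 22.04 environment), in which case rebuilding it on the current system should remove the `GLIBC_2.32` requirement. The sketch below only compares version numbers using Python's standard `platform.libc_ver()`. The helper names are hypothetical.

```python
import platform


def version_tuple(version):
    """Turn '2.31' into (2, 31) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def glibc_satisfies(required, installed=None):
    """True if the installed glibc is at least the required version."""
    if installed is None:
        # Returns e.g. ('glibc', '2.31'); may be ('', '') on non-glibc systems.
        installed = platform.libc_ver()[1]
    if not installed:
        return False
    return version_tuple(installed) >= version_tuple(required)


print(platform.libc_ver())
print(glibc_satisfies("2.32"))
```

If this prints `False` for `"2.32"` on a system where other people run the same code successfully, the extension binary rather than the system is the more likely culprit.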

@TXSevenXT
Author

Same problem with WSL Ubuntu 20.04 :

Optimizing /home/xsevenx/gs2mesh/splatting_output/custom_nw_iterations30000/cascade
Output folder: /home/xsevenx/gs2mesh/splatting_output/custom_nw_iterations30000/cascade [11/09 14:16:53]
Reading camera 29/29 [11/09 14:16:53]
Converting point3d.bin to .ply, will happen only the first time you open the scene. [11/09 14:16:53]
Loading Training Cameras [11/09 14:16:53]
Loading Test Cameras [11/09 14:16:54]
Number of points at initialisation : 17352 [11/09 14:16:54]
Training progress: 100%|██████████████████████████████████████████| 30000/30000 [14:48<00:00, 33.75it/s, Loss=0.0265618]

[ITER 30000] Evaluating train: L1 0.018692961521446706 PSNR 29.656937789916995 [11/09 14:31:48]

[ITER 30000] Saving Gaussians [11/09 14:31:49]

Training complete. [11/09 14:32:06]
num views: 29
baseline: 0.6802225502353557
Loading trained model at iteration 30000
Reading camera 29/29
Loading Training Cameras
Loading Test Cameras
0%| | 0/29 [00:00<?, ?it/s]UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1716905971873/work/aten/src/ATen/native/TensorShape.cpp:3587.)
UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
100%|███████████████████████████████████████████████████████████████████████████████████| 29/29 [00:25<00:00, 1.13it/s]
Automask must be enabled for masking in script mode. Skipping.
14%|███████████▌ | 4/29 [01:57<14:14, 34.17s/it]

It's using about 140 GB of swap...
Same 29 files in 720p (about 500 KB per image on average) with this command:
python run_single.py --colmap_name cascade --skip_video_extraction --no-renderer_scene_360 --TSDF_max_depth_baselines 30 --GS_save_test_iterations 30000

@yanivw12
Owner

Try using a lower resolution for the grid (higher TSDF_voxel). 34 seconds per iteration of TSDF is way too much (should take a couple of seconds max). I suggest visualizing the depth maps that TSDF is using with the custom_data.ipynb notebook. It might help understand why.
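
As a rough illustration of why a coarser grid helps: for a fixed volume, the number of voxels in a dense grid grows with the cube of the inverse voxel size, so doubling the voxel size cuts the count by about 8x. The sketch below uses a hypothetical helper and a dense-grid assumption. Open3D's scalable TSDF allocates blocks sparsely, so the real saving will differ.

```python
def relative_voxel_count(old_voxel, new_voxel):
    """Ratio of dense-grid voxel counts after changing the voxel edge length."""
    if old_voxel <= 0 or new_voxel <= 0:
        raise ValueError("voxel sizes must be positive")
    # Same volume, so the count scales with (1 / edge length) cubed.
    return (old_voxel / new_voxel) ** 3


# Doubling the voxel size, expressed in whatever units TSDF_voxel uses.
print(relative_voxel_count(2, 4))
```

The same reasoning applies to per-frame integration time, which is why a coarser grid can bring each TSDF iteration back down to a few seconds.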

@TXSevenXT
Author

Really nice, I've tried --TSDF_voxel 4 instead of 2 and the result is great :)
The cleaning step needs better settings, but the PLY is clean.

cascade_voxel_4

Thank you very much for your help :)

I'll try with ubuntu 24.04 in order to see if all is Ok =)

@yanivw12
Owner

Looks great! It's nice seeing people using the method on their own data. That was one of the things I was anticipating the most when releasing the code.

I'm closing the issue for now, feel free to re-open if necessary.
