Hello, and thanks for your code.
I've spent a couple of days trying to run this code on custom data I prepared by running COLMAP on some images. I managed to get past many problems along the way, but I'm now stuck on the following and don't know what to do:
dataset total: train 330
dataset [NerfSynthFtDataset] was created
../checkpoints/col_nerfsynth/yandex/*_net_ray_marching.pth
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Continue training from 0 epoch
Iter: 0
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
opt.act_type!!!!!!!!! LeakyReLU
self.points_embeding torch.Size([1, 841, 32])
querier device cuda:0 0
neural_params [('module.neural_points.xyz', torch.Size([841, 3]), False), ('module.neural_points.points_embeding', torch.Size([1, 841, 32]), True), ('module.neural_points.points_conf', torch.Size([1, 841, 1]), True), ('module.neural_points.points_dir', torch.Size([1, 841, 3]), True), ('module.neural_points.points_color', torch.Size([1, 841, 3]), True), ('module.neural_points.Rw2c', torch.Size([3, 3]), False)]
model [MvsPointsVolumetricModel] was created
opt.resume_iter!!!!!!!!! 0
loading ray_marching from ../checkpoints/col_nerfsynth/yandex/0_net_ray_marching.pth
------------------- Networks -------------------
[Network ray_marching] Total number of parameters: 0.377M
------------------------------------------------
# training images = 330
saving model (yandex, epoch 0, total_steps 0)
Traceback (most recent call last):
File "train_ft.py", line 1081, in <module>
main()
File "train_ft.py", line 937, in main
model.optimize_parameters(total_steps=total_steps)
File "/home/daddywesker/Dioram/yandex/pointnerf/run/../models/neural_points_volumetric_model.py", line 217, in optimize_parameters
self.backward(total_steps)
File "/home/daddywesker/Dioram/yandex/pointnerf/run/../models/mvs_points_volumetric_model.py", line 104, in backward
self.loss_total.backward()
File "/home/daddywesker/anaconda3/envs/limap/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/home/daddywesker/anaconda3/envs/limap/lib/python3.8/site-packages/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
end loading
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.
I've tried to debug this by attaching to the process launched by the bash script in the w_colmap_n360 folder, but I have no clue so far. Any advice?
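For reference, the kind of check I've been trying looks roughly like the sketch below; it just inspects whether the total loss is still attached to the autograd graph before backward() is called. The loss_names / loss_total attribute names here are guesses based on the traceback, not necessarily PointNeRF's actual internals:

import torch

def check_loss_graph(model):
    # List any per-term losses the model exposes and report whether each
    # one carries a grad_fn (i.e. is still connected to the autograd graph).
    for name in getattr(model, "loss_names", []):
        loss = getattr(model, "loss_" + name, None)
        if isinstance(loss, torch.Tensor):
            print(name, "requires_grad:", loss.requires_grad, "grad_fn:", loss.grad_fn)
    # The RuntimeError above means loss_total has no grad_fn: every tensor
    # feeding it was either detached or computed only from parameters/buffers
    # with requires_grad=False (e.g. frozen neural-point tensors from the checkpoint).
    assert model.loss_total.requires_grad, "loss_total is detached from the graph"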
Alright, I've sort of fixed this by setting load_points=0 in the .sh file.
The problem now is that the project tries to allocate an enormous amount of VRAM:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 77.28 GiB (GPU 0; 7.80 GiB total capacity; 62.91 MiB already allocated; 6.40 GiB free; 92.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
end loading
I have 330 images, 1024*1024 each. Is that too much?
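If it helps, a 77 GiB allocation usually points at the number of rays or points processed in a single step rather than at the dataset size; 330 images at 1024*1024 only matters through how many rays get sampled per iteration. The usual workaround is to chunk the per-ray work, roughly like the sketch below (render_fn and the chunk size are placeholders, not the project's actual API):

import torch

def render_in_chunks(render_fn, rays, chunk=4096):
    # Run the renderer over slices of the ray batch so peak VRAM is bounded
    # by `chunk` rays at a time instead of a full 1024*1024 image.
    outputs = [render_fn(rays[i:i + chunk]) for i in range(0, rays.shape[0], chunk)]
    return torch.cat(outputs, dim=0)

Reducing the per-step ray sample size (or the rendered resolution) in the training options should have the same effect, assuming the scripts expose such a setting.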