PyCUDA error when launching train_ft on custom colmap data #88

Open
DaddyWesker opened this issue Aug 3, 2023 · 1 comment

@DaddyWesker

Hello and thanks for your code.

I've spent a couple of days trying to run this code on custom data I obtained by running COLMAP on some images. I managed to work through many of the problems that came up along the way, but I'm now stuck on the following and don't know what to do:

dataset total: train 330
dataset [NerfSynthFtDataset] was created
../checkpoints/col_nerfsynth/yandex/*_net_ray_marching.pth
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Continue training from 0 epoch
Iter: 0
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
opt.act_type!!!!!!!!! LeakyReLU
self.points_embeding torch.Size([1, 841, 32])
querier device cuda:0 0
neural_params [('module.neural_points.xyz', torch.Size([841, 3]), False), ('module.neural_points.points_embeding', torch.Size([1, 841, 32]), True), ('module.neural_points.points_conf', torch.Size([1, 841, 1]), True), ('module.neural_points.points_dir', torch.Size([1, 841, 3]), True), ('module.neural_points.points_color', torch.Size([1, 841, 3]), True), ('module.neural_points.Rw2c', torch.Size([3, 3]), False)]
model [MvsPointsVolumetricModel] was created
opt.resume_iter!!!!!!!!! 0
loading ray_marching  from  ../checkpoints/col_nerfsynth/yandex/0_net_ray_marching.pth
------------------- Networks -------------------
[Network ray_marching] Total number of parameters: 0.377M
------------------------------------------------
# training images = 330
saving model (yandex, epoch 0, total_steps 0)
Traceback (most recent call last):
  File "train_ft.py", line 1081, in <module>
    main()
  File "train_ft.py", line 937, in main
    model.optimize_parameters(total_steps=total_steps)
  File "/home/daddywesker/Dioram/yandex/pointnerf/run/../models/neural_points_volumetric_model.py", line 217, in optimize_parameters
    self.backward(total_steps)
  File "/home/daddywesker/Dioram/yandex/pointnerf/run/../models/mvs_points_volumetric_model.py", line 104, in backward
    self.loss_total.backward()
  File "/home/daddywesker/anaconda3/envs/limap/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/daddywesker/anaconda3/envs/limap/lib/python3.8/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
end loading
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.

I've tried to debug this problem by attaching to the process launched by the bash script in the w_colmap_n360 folder, but I have no clue so far. Any advice?
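
For what it's worth, the RuntimeError above means the final loss tensor is not attached to the autograd graph, which usually happens when every tensor it was computed from had requires_grad=False (e.g. frozen/loaded parameters, or a forward pass run under torch.no_grad()). A minimal sketch of how to check this, assuming you can grab the underlying nn.Module and the loss_total tensor named in the traceback (the exact attribute paths in Point-NeRF may differ):

```python
import torch

def report_grad_state(net: torch.nn.Module, loss: torch.Tensor) -> None:
    # Count trainable vs. frozen parameters; if everything feeding the loss
    # is frozen, backward() raises "element 0 of tensors does not require grad".
    trainable = [n for n, p in net.named_parameters() if p.requires_grad]
    frozen = [n for n, p in net.named_parameters() if not p.requires_grad]
    print(f"trainable params: {len(trainable)}, frozen params: {len(frozen)}")

    # A loss that is detached from the graph has requires_grad=False and no grad_fn.
    print("loss requires_grad:", loss.requires_grad, "grad_fn:", loss.grad_fn)
```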

@DaddyWesker
Author

Alright, I've sort of fixed this: I set load_points=0 in the .sh file.

The problem now is that this project tries to allocate an enormous amount of VRAM:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 77.28 GiB (GPU 0; 7.80 GiB total capacity; 62.91 MiB already allocated; 6.40 GiB free; 92.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
end loading

I have 330 images, 1024×1024 each. Is that too much?
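
A rough back-of-the-envelope check (my own estimate, not taken from the repo): 77.28 GiB is close to what it would cost to hold roughly 60 float32 values for every pixel of all 330 1024×1024 images at once, so the failing op looks like it is processing the whole image set in one go rather than a sampled ray batch:

```python
num_images = 330
pixels_per_image = 1024 * 1024
floats_per_pixel = 60            # hypothetical per-pixel payload (e.g. samples * channels)
bytes_needed = num_images * pixels_per_image * floats_per_pixel * 4  # float32 = 4 bytes
print(bytes_needed / 2**30)      # ~77.3 GiB, matching the allocation in the error above
```

If that's what is happening, lowering the image resolution or the per-iteration ray/image batch size in the config should shrink the request far more than allocator tuning; max_split_size_mb only mitigates fragmentation and cannot help with a single 77 GiB allocation on an 8 GiB card.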
