Training crashes after 7000 #235

nivibilla · 2023-09-24T09:54:33Z

Hi,

training gets to 7000 steps and then outputs ^C. It doesn't save a point cloud either.


XinyueZ · 2023-09-28T14:02:10Z

I guess it could be OOM.

jerome3o · 2023-10-23T21:09:39Z

I also get this, but I'm not sure it's OOM. It seems to happen only around the time the Gaussians for iteration 7000 are saved.

I was even running it in a Docker container, and it crashed the host machine.

@nivibilla did you ever figure out what was causing this?
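One way to check whether the save step causes a memory spike is to record the process's peak resident memory before and after it. The sketch below uses only the standard `resource` module, so it is Unix only. `peak_rss_mib` and `report_peak` are hypothetical helpers, and the allocation is a stand-in workload, not the real save step. If the kernel's OOM killer terminates the process, the "after" line never prints. On Linux the kernel log (`dmesg`) usually records such kills.

```python
import resource
import sys

import numpy as np


def peak_rss_mib() -> float:
    """Peak resident set size of this process so far, in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def report_peak(label, fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and print the peak RSS before and after it."""
    before = peak_rss_mib()
    result = fn(*args, **kwargs)
    after = peak_rss_mib()
    print(f"{label}: peak RSS {before:.0f} MiB -> {after:.0f} MiB")
    return result


if __name__ == "__main__":
    # Stand-in workload (about 20 MB) in place of the real save step.
    report_peak("allocate", lambda: np.ones(5_000_000, dtype=np.float32).sum())
```

In a training script you might wrap the save call, for example `report_peak("save", scene.save, iteration)`. This assumes your copy of `train.py` saves through `scene.save(iteration)`.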

jpeng2012 · 2023-12-04T06:00:47Z

Similar issue. It showed "Killed" after 7000 iterations for db/drjohnson, but tandt/train works fine.

GaneshBannur · 2024-02-21T11:40:33Z

I had the same error and it was due to OOM. When saving Gaussians there is a spike in CPU RAM usage. You're training with 423 images in Colab so I'm guessing the RAM consumption was already high. When it tried to save Gaussians, consumption must have spiked and caused an OOM.
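To put rough numbers on the spike, the sketch below compares the memory needed by a packed structured array with the memory needed by a list of per-row Python tuples. It assumes 62 float32 values per Gaussian, the layout suggested by the field names later in this thread (SH degree 3: xyz, normals, 3 `f_dc`, 45 `f_rest`, opacity, 3 scales, 4 rotations). `estimate_save_bytes` is a hypothetical helper. It measures tuple overhead on a small sample and extrapolates, so treat the result as an order-of-magnitude estimate, not an exact figure.

```python
import sys

import numpy as np

# Assumed attribute layout of one saved Gaussian with SH degree 3.
names = (["x", "y", "z", "nx", "ny", "nz"]
         + [f"f_dc_{i}" for i in range(3)]
         + [f"f_rest_{i}" for i in range(45)]
         + ["opacity"]
         + [f"scale_{i}" for i in range(3)]
         + [f"rot_{i}" for i in range(4)])
dtype = np.dtype([(n, "f4") for n in names])


def estimate_save_bytes(num_gaussians, sample=1000):
    """Rough estimate: packed structured array vs. a list of per-row tuples."""
    # Structured array; a concatenated (N, 62) float32 array costs the same again.
    packed = dtype.itemsize * num_gaussians
    rows = np.zeros((sample, len(names)), dtype=np.float32)
    tuples = list(map(tuple, rows))
    # Each tuple holds one NumPy scalar object per attribute; measure both.
    per_row = sum(sys.getsizeof(t) + sum(sys.getsizeof(v) for v in t)
                  for t in tuples) / sample
    return packed, int(per_row * num_gaussians)


packed, as_tuples = estimate_save_bytes(3_000_000)
print(f"packed: {packed / 2**30:.2f} GiB, list of tuples: {as_tuples / 2**30:.2f} GiB")
```

The tuple representation is several times larger than the packed array. That is consistent with RAM usage spiking during saving when it is already high.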

The quick fix is to skip saving the Gaussians at iteration 7000 and so avoid the spike: save only at iteration 30,000 (or whatever your last iteration is) by passing `--save_iterations 30000` to train.py. However, the spike when saving at 30,000 may still cause it to fail.
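The sketch below shows why passing a single value drops the 7000 save. It assumes train.py declares `--save_iterations` with argparse `nargs="+"` and a default of `[7_000, 30_000]`, and that it also adds the final iteration to the list. Check your copy of the script before relying on either detail. `iterations_that_save` is a hypothetical helper.

```python
from argparse import ArgumentParser

# Assumed to mirror how train.py declares these options.
parser = ArgumentParser()
parser.add_argument("--iterations", type=int, default=30_000)
parser.add_argument("--save_iterations", nargs="+", type=int, default=[7_000, 30_000])


def iterations_that_save(argv):
    """Return the sorted iterations at which Gaussians would be written."""
    args = parser.parse_args(argv)
    # Assumption: the final iteration is always saved as well.
    return sorted(set(args.save_iterations) | {args.iterations})


print(iterations_that_save([]))                             # [7000, 30000]
print(iterations_that_save(["--save_iterations", "30000"]))  # [30000]
```

Because a value given on the command line replaces the default list rather than extending it, only the final save remains.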

The better fix is given in #667. It decreases CPU RAM consumption and prevents this.

LaurensDiels · 2024-03-06T13:31:47Z

I also ran into the same issue. In my case it was not OOM-related, though. I was able to solve the problem by changing the line

```python
elements[:] = list(map(tuple, attributes))
```

in `save_ply` in `scene/gaussian_model.py` to an explicit loop:

```python
for i in range(len(elements)):
    elements[i] = tuple(attributes[i])
```
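The toy comparison below checks that the explicit loop writes the same data as the original one-line assignment. The sizes and the reduced set of fields are hypothetical stand-ins for the arrays built inside `save_ply`. The loop assigns row by row, so the full list of tuples never exists in memory at once. This sketch does not explain why the change fixed a non-OOM crash.

```python
import numpy as np

# Toy stand-ins for the arrays built inside save_ply (hypothetical sizes/fields).
dtype_full = [("x", "f4"), ("y", "f4"), ("z", "f4"), ("opacity", "f4")]
attributes = np.random.default_rng(0).random((5, 4), dtype=np.float32)

# Original version: build a list with one tuple per row, then assign it at once.
a = np.empty(attributes.shape[0], dtype=dtype_full)
a[:] = list(map(tuple, attributes))

# Explicit loop: convert and assign one row at a time.
b = np.empty(attributes.shape[0], dtype=dtype_full)
for i in range(len(b)):
    b[i] = tuple(attributes[i])

print(np.array_equal(a, b))  # True
```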

guwinston · 2024-12-12T08:34:16Z

I encountered the same problem. In my case, when training reached 7000 iterations and called save_ply, the code got stuck there and couldn't continue. The reason seems to be that the line `elements[:] = list(map(tuple, attributes))` converts every row into a tuple via `map`, which is too much work for a large number of Gaussians, so I changed it to:

```python
# Fill the structured array one field (column) at a time instead of row by row
elements['x'] = xyz[:, 0]
elements['y'] = xyz[:, 1]
elements['z'] = xyz[:, 2]
elements['nx'] = normals[:, 0]
elements['ny'] = normals[:, 1]
elements['nz'] = normals[:, 2]
for i in range(f_dc.shape[1]):
    elements['f_dc_{}'.format(i)] = f_dc[:, i]
for i in range(f_rest.shape[1]):
    elements['f_rest_{}'.format(i)] = f_rest[:, i]
elements['opacity'] = opacities[:, 0]
for i in range(scale.shape[1]):
    elements['scale_{}'.format(i)] = scale[:, i]
for i in range(rotation.shape[1]):
    elements['rot_{}'.format(i)] = rotation[:, i]
```

This works for me; I hope it is helpful.
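The small comparison below checks that the column-wise version produces the same structured array as the original tuple-based line. It uses toy arrays with hypothetical sizes and a reduced set of fields. It also shows an alternative that is not from the thread: `numpy.lib.recfunctions.unstructured_to_structured`, which converts the concatenated 2-D attribute array into the structured dtype in a single call.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

n = 4  # hypothetical number of Gaussians
rng = np.random.default_rng(0)
xyz = rng.random((n, 3), dtype=np.float32)
opacities = rng.random((n, 1), dtype=np.float32)
rotation = rng.random((n, 4), dtype=np.float32)

names = ["x", "y", "z", "opacity"] + [f"rot_{i}" for i in range(4)]
dtype_full = [(name, "f4") for name in names]
attributes = np.concatenate((xyz, opacities, rotation), axis=1)

# Column-wise fill, as in the workaround: one vectorized copy per field.
cols = np.empty(n, dtype=dtype_full)
cols["x"] = xyz[:, 0]
cols["y"] = xyz[:, 1]
cols["z"] = xyz[:, 2]
cols["opacity"] = opacities[:, 0]
for i in range(rotation.shape[1]):
    cols[f"rot_{i}"] = rotation[:, i]

# Alternative (not from the thread): convert the whole 2-D array in one call.
packed = rfn.unstructured_to_structured(attributes, dtype=np.dtype(dtype_full))

# Reference: the original row-by-row tuple conversion.
ref = np.empty(n, dtype=dtype_full)
ref[:] = list(map(tuple, attributes))

print(np.array_equal(cols, ref), np.array_equal(packed, ref))  # True True
```

Neither variant creates one Python tuple per Gaussian. That is the likely reason they stay fast for large scenes where the original line stalls.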

GaneshBannur mentioned this issue on Feb 25, 2024: no point cloud result was generated #672 (Open)

GaneshBannur mentioned this issue on May 3, 2024: Training stops at the end when the model is being saved #782 (Open)

yanivw12 mentioned this issue on Sep 4, 2024: 30k export problem ? yanivw12/gs2mesh#8 (Closed)
