Getting an error while training during the validation step after 0th epoch! #157

porwalnaman01 · 2021-06-25T12:09:34Z

Hello there, loved your work and paper. I am facing an issue during training: the model trains for the first epoch, but the validation step of that epoch fails with the error below. Hope you can help me here. I am using PyTorch 1.8.0 on Ubuntu 18.04.

Epoch 0 | Avg.Loss 0.0849: 100%|###############################################| 5004/5004 [02:07<00:00, 39.24 images/s]
KITTI_tiny-kitti_tiny-velodyne: 0%| | 0.00/5.00 [00:00<?, ? images/s]
Traceback (most recent call last):
  File "scripts/train.py", line 68, in <module>
    train(args.file)
  File "scripts/train.py", line 63, in train
    trainer.fit(model_wrapper)
  File "/disk1/dan/Naman/packnet-sfm/packnet_sfm/trainers/horovod_trainer.py", line 65, in fit
    validation_output = self.validate(val_dataloaders, module)
  File "/disk1/dan/Naman/packnet-sfm/packnet_sfm/trainers/horovod_trainer.py", line 120, in validate
    output = module.validation_step(batch, i, n)
  File "/disk1/dan/Naman/packnet-sfm/packnet_sfm/models/model_wrapper.py", line 194, in validation_step
    output = self.evaluate_depth(batch)
  File "/disk1/dan/Naman/packnet-sfm/packnet_sfm/models/model_wrapper.py", line 302, in evaluate_depth
    inv_depths[0], inv_depths_flipped[0], method='mean')
  File "/disk1/dan/Naman/packnet-sfm/packnet_sfm/utils/depth.py", line 247, in post_process_inv_depth
    B,C, H, W = inv_depth.shape
ValueError: not enough values to unpack (expected 4, got 3)

tzanis-anevlavis · 2021-07-01T00:53:05Z

Hey, did you figure this out by any chance? I also get the same when trying to overfit KITTI tiny.

VitorGuizilini-TRI · 2021-07-01T01:14:45Z

Are you using the latest version of the repository, after the most recent commit?

tzanis-anevlavis · 2021-07-01T01:20:12Z

Yes, working with the latest version and docker. It seems that both inputs to 'post_process_inv_depth( )' have a dimension mismatch.

GaneshAdam · 2021-07-01T05:49:17Z

Hey, did you guys solve this by any chance? I am facing a similar issue. Any help is appreciated. Thanks!!!

tzanis-anevlavis · 2021-07-01T06:51:16Z

It seems that the functions compute_depth_metrics( ) and post_process_inv_depth( ), which are both called in evaluate_depth( ) (line 291 of model_wrapper.py), expect their inputs to be of shape [B, C, H, W], but only the first element along the batch dimension is passed, so the inputs arrive with three dimensions instead of four. In this example B=1, but the functions still expect the batch dimension to be present.

So I replaced inv_depths[0] and inv_depths_flipped[0] with inv_depths and inv_depths_flipped at lines 295 and 301 within evaluate_depth( ). Now it seems to be working, and I got the following results from overfitting KITTI

I did not have a lot of time to dig more into the code and understand if my reasoning above is correct, but maybe @VitorGuizilini-TRI can verify or shed some more light! :)
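
To illustrate the reasoning above, here is a minimal sketch in plain Python. It is not code from the repository: `unpack_bchw` and `shape_after_index0` are hypothetical helpers, and the shape (1, 1, 192, 640) is an assumed example. It relies only on the fact that a tensor's `.shape` unpacks like a tuple and that indexing a tensor with `[0]` drops its first axis.

```python
def unpack_bchw(shape):
    """Mimic the `B, C, H, W = inv_depth.shape` line from the traceback."""
    B, C, H, W = shape
    return B, C, H, W


def shape_after_index0(shape):
    """Shape of `tensor[0]`: indexing the first axis removes that axis."""
    return tuple(shape[1:])


# Assumed example shape of a batched inverse-depth map (B=1, C=1, H=192, W=640).
batched = (1, 1, 192, 640)

# The full 4D shape unpacks cleanly into B, C, H, W.
print(unpack_bchw(batched))

# The shape left after `[0]` has only three entries, so unpacking fails
# with the same message as in the report.
try:
    unpack_bchw(shape_after_index0(batched))
except ValueError as err:
    message = str(err)
    print(message)
```

If the value handed to post_process_inv_depth( ) is already a single [B, C, H, W] tensor, as the comment above assumes, passing it without the `[0]` keeps all four dimensions.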

porwalnaman01 · 2021-07-01T07:01:47Z

@GaneshAdam Pulling the latest committed version of the code and then replacing the packnet_sfm/datasets/augmentation.py file with the older version worked for me. I don't know why, but without replacing augmentation.py with the older version I still got an error. You may first try the latest version of the code; if it still gives an error, you can try replacing the augmentation.py file with its older version. I am using Ubuntu 18.04, CUDA 11.1 and a conda environment.

GaneshAdam · 2021-07-01T08:19:43Z

@janis10, @porwalnaman01 Thank you for the quick response. I tried the solution suggested by @janis10, and it's working now.

stellarpower · 2021-08-13T19:45:07Z

I am still running into this as of 6e3161f, running the sample command provided in the readme (for the KITTI overfit). I have had to modify the build and bump the versions of dependencies, as our GPU isn't supported by the older version of CUDA (you can see this in my fork here), so I had assumed this was related to breaking changes in PyTorch. However, as this issue still occurs on a recent commit, I am guessing it may need to be re-opened.

Happy to provide more details; however, I was trying to help a colleague who couldn't get this to build, so I only have a very high-level idea of what the repo is doing.

porwalnaman01 changed the title ~~Getting a wierd error while training during the validation step after 0th epoch!~~ Getting an error while training during the validation step after 0th epoch! Jun 25, 2021

porwalnaman01 closed this as completed Jun 28, 2021

stellarpower added a commit to stellarpower/packnet-sfm that referenced this issue Aug 25, 2021

Fix for TRI-ML#157

76692aa

stellarpower mentioned this issue Aug 25, 2021

Bump to Cuda 11.1; Implement fix for #157 #174

Open

jhan15 mentioned this issue Mar 3, 2022

a bug for load tiny DDAD #185

Open

MiyataYuya mentioned this issue Apr 4, 2023

is:issue is:open ValueError: not enough values to unpack (expected 4, got 3) Why does this error occur when testing the kitti_tinny dataset? #245

Open
