
Getting an error while training during the validation step after 0th epoch! #157

Closed
porwalnaman01 opened this issue Jun 25, 2021 · 8 comments

Comments

@porwalnaman01

Hello there, I loved your work and paper. I am facing an issue during training: the model trains for the first epoch, but the validation step at the end of that epoch fails with the error below. I hope you can help me here. I am using PyTorch 1.8.0 on Ubuntu 18.04.

```
Epoch 0 | Avg.Loss 0.0849: 100%|###############################################| 5004/5004 [02:07<00:00, 39.24 images/s]
KITTI_tiny-kitti_tiny-velodyne: 0%| | 0.00/5.00 [00:00<?, ? images/s]
Traceback (most recent call last):
  File "scripts/train.py", line 68, in <module>
    train(args.file)
  File "scripts/train.py", line 63, in train
    trainer.fit(model_wrapper)
  File "/disk1/dan/Naman/packnet-sfm/packnet_sfm/trainers/horovod_trainer.py", line 65, in fit
    validation_output = self.validate(val_dataloaders, module)
  File "/disk1/dan/Naman/packnet-sfm/packnet_sfm/trainers/horovod_trainer.py", line 120, in validate
    output = module.validation_step(batch, i, n)
  File "/disk1/dan/Naman/packnet-sfm/packnet_sfm/models/model_wrapper.py", line 194, in validation_step
    output = self.evaluate_depth(batch)
  File "/disk1/dan/Naman/packnet-sfm/packnet_sfm/models/model_wrapper.py", line 302, in evaluate_depth
    inv_depths[0], inv_depths_flipped[0], method='mean')
  File "/disk1/dan/Naman/packnet-sfm/packnet_sfm/utils/depth.py", line 247, in post_process_inv_depth
    B,C, H, W = inv_depth.shape
ValueError: not enough values to unpack (expected 4, got 3)
```
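The last frame of the traceback fails on the tuple unpacking `B,C, H, W = inv_depth.shape`. The minimal reproduction below uses a plain tuple as a stand-in for the tensor's shape, so PyTorch is not needed. The helper name `unpack_bchw` and the 192x640 size are illustrative assumptions. It shows that any 3-D shape triggers exactly this message:

```python
def unpack_bchw(shape):
    """Unpack a shape the way the failing line in post_process_inv_depth() does.

    `shape` is a plain tuple standing in for `inv_depth.shape`
    (hypothetical helper, for illustration only).
    """
    B, C, H, W = shape  # raises ValueError unless len(shape) == 4
    return B, C, H, W


# A 4-D batch [B, C, H, W] unpacks without problems.
ok = unpack_bchw((1, 1, 192, 640))
print(ok)  # (1, 1, 192, 640)

# A 3-D shape (batch dimension missing) reproduces the error from the log.
try:
    unpack_bchw((1, 192, 640))
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)  # not enough values to unpack (expected 4, got 3)
```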

porwalnaman01 changed the title from "Getting a wierd error while training during the validation step after 0th epoch!" to "Getting an error while training during the validation step after 0th epoch!" on Jun 25, 2021
@tzanis-anevlavis

tzanis-anevlavis commented Jul 1, 2021

Hey, did you figure this out by any chance? I get the same error when trying to overfit KITTI tiny.

@VitorGuizilini-TRI
Collaborator

Are you using the latest version of the repository, after the most recent commit?

@tzanis-anevlavis

Yes, I'm working with the latest version and Docker. It seems that both inputs to `post_process_inv_depth()` have a dimension mismatch.

@GaneshAdam

Hey, did you solve this by any chance? I am facing a similar issue. Any help is appreciated. Thanks!

@tzanis-anevlavis

tzanis-anevlavis commented Jul 1, 2021

It seems that the functions `compute_depth_metrics()` and `post_process_inv_depth()`, which are both called in `evaluate_depth()` (line 291 in model_wrapper.py), expect their inputs to have shape [B, C, H, W]. However, indexing with [0] passes only the first element of the batch, which drops the batch dimension. In this example B=1, so the functions receive a 3-D tensor of shape [C, H, W] instead of the 4-D tensor they expect.
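The shape-only sketch below illustrates this reasoning without PyTorch. Tensors are represented by their shape tuples, and `take_first` is a hypothetical helper that mimics `x[0]`. It compares two cases:

- Case 1 assumes the code was originally written for a list of multi-scale inverse depth maps, which would explain the `[0]` index. This is an assumption, not something confirmed in this thread.
- Case 2 is a single batched tensor, where `[0]` selects the first sample and drops the batch dimension.

```python
def take_first(x):
    """Mimic `x[0]` (hypothetical helper).

    On a list, return its first element; on a "tensor" (represented here
    by its shape tuple), drop the leading batch dimension.
    """
    if isinstance(x, list):
        return x[0]
    return x[1:]


B, C, H, W = 1, 1, 192, 640  # illustrative sizes

# Case 1 (assumed design): a list of per-scale inverse depth maps.
multi_scale = [(B, C, H // 2**i, W // 2**i) for i in range(4)]
print(take_first(multi_scale))  # (1, 1, 192, 640): still 4-D

# Case 2: a single batched tensor; [0] selects the first sample.
single = (B, C, H, W)
print(take_first(single))  # (1, 192, 640): 3-D, which triggers the ValueError
```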

So I replaced `inv_depths[0]` and `inv_depths_flipped[0]` with `inv_depths` and `inv_depths_flipped` at lines 295 and 301 within `evaluate_depth()`. Now it seems to work, and I got the following results from overfitting KITTI:

[Screenshot of the overfitting results: "Screen Shot 2021-06-30 at 11 38 00 PM"]

I did not have much time to dig into the code and confirm whether my reasoning above is correct, but maybe @VitorGuizilini-TRI can verify or shed some more light! :)
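A slightly more defensive variant of the index change in `evaluate_depth()` is sketched below. The helper `select_full_resolution` is hypothetical and not part of packnet-sfm. It indexes `[0]` only when the model output is a list or tuple of per-scale maps; otherwise it returns the batched tensor unchanged. It should therefore behave the same whichever form `evaluate_depth()` receives. `FakeTensor` only carries a shape so that the sketch runs without PyTorch.

```python
class FakeTensor:
    """Shape-only stand-in for torch.Tensor, used only to run this sketch."""

    def __init__(self, *shape):
        self.shape = shape


def select_full_resolution(inv_depths):
    """Return a 4-D [B, C, H, W] inverse depth batch (hypothetical helper).

    If `inv_depths` is a list/tuple of per-scale outputs, take the first
    (assumed highest-resolution) entry; if it is already a single batched
    tensor, return it as-is instead of indexing into the batch dimension.
    """
    if isinstance(inv_depths, (list, tuple)):
        return inv_depths[0]
    return inv_depths


batch = FakeTensor(1, 1, 192, 640)
scales = [FakeTensor(1, 1, 192 // 2**i, 640 // 2**i) for i in range(4)]

print(select_full_resolution(scales).shape)  # (1, 1, 192, 640)
print(select_full_resolution(batch).shape)   # (1, 1, 192, 640)
```

Under the same assumptions, the call in `evaluate_depth()` would read `post_process_inv_depth(select_full_resolution(inv_depths), select_full_resolution(inv_depths_flipped), method='mean')`. A real `torch.Tensor` is not a list or tuple, so it would be passed through unchanged.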

@porwalnaman01
Author

@GaneshAdam Pulling the latest committed version of the code and then replacing the packnet_sfm/datasets/augmentation.py file with its older version worked for me. I don't know why, but without that replacement I still got an error. You may first try the latest version of the code; if it still gives an error, try replacing augmentation.py with its older version. I am using Ubuntu 18.04, CUDA 11.1 and a conda environment.

@GaneshAdam

@janis10, @porwalnaman01, thank you for the quick response. I tried the solution suggested by @janis10, and it's working now.

@stellarpower

I am still running into this as of 6e3161f when running the sample command provided in the README (for the KITTI overfit). I had to modify the build and bump the dependency versions because our GPU isn't supported by the older version of CUDA (you can see this in my fork here), so I had assumed this was related to breaking changes in PyTorch. However, since this issue was reported recently, I'm guessing it may need to be reopened.

Happy to provide more details. However, I was trying to help a colleague who couldn't get this to build, so I only have a very high-level idea of what the repo is doing.

Labels: None yet · Projects: None yet · Development: No branches or pull requests · 5 participants