shocked by the initial val result #18

Open
myalos opened this issue Feb 10, 2023 · 21 comments

Comments

@myalos

myalos commented Feb 10, 2023

Before training, the val function evaluates the initial model on the NYUv2 test set, and the result is:
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3
0.323 | 0.448 | 1.002 | 0.365 | 0.520 | 0.783 | 0.905
That shocks me. Am I wrong? Why does the initial model perform so well on the NYUv2 test set?

@niujinshuchong
Contributor

Hi, the depth map is aligned with GT depth during evaluation.
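To illustrate why alignment can make an untrained model look reasonable, here is a minimal sketch. It assumes the alignment is per-image median scaling, which is common in self-supervised depth evaluation; the thread does not confirm the exact method, and the helper names `median_align` and `abs_rel` are hypothetical.

```python
import numpy as np

def median_align(pred, gt):
    """Rescale pred so its median matches the median of gt (assumed alignment)."""
    return pred * (np.median(gt) / np.median(pred))

def abs_rel(pred, gt):
    """Mean absolute relative error, the abs_rel metric reported above."""
    return float(np.mean(np.abs(pred - gt) / gt))

rng = np.random.default_rng(0)
gt = rng.uniform(0.5, 5.0, size=(48, 64))   # synthetic ground-truth depth in metres
pred = np.full_like(gt, 0.01)               # constant output at the wrong scale

raw_err = abs_rel(pred, gt)                         # error without alignment
aligned_err = abs_rel(median_align(pred, gt), gt)   # error after median scaling
print(raw_err, aligned_err)
```

Even a constant prediction scores much better after alignment, because the global scale is taken from the ground truth rather than learned.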

@Cresynia

Hi, I read your blog and noticed that you have run StructDepth before. I ran into a problem when running it. It fails with: 'normD_down = D_down + norm_down RuntimeError: The size of tensor a (378) must match the size of tensor b (281) at non-singleton dimension 2'. I don't know what I did wrong. I'm a beginner and this is my first time running it, so I really hope you can help. Thanks!

@myalos
Author

myalos commented Mar 17, 2023

Hi,
rgb = torch.permute(rgb, (0, 2, 3, 1))
rgb_down = self.pdist(rgb[:, 1:, :, :], rgb[:, :-1, :, :])
rgb_right = self.pdist(rgb[:, :, 1:, :], rgb[:, :, :-1, :])
...
aligned_norm = torch.permute(rgb, (0, 2, 3, 1))
norm_down = self.pdist(aligned_norm[:, 1:, :, :], aligned_norm[:, :-1, :, :])
norm_right = self.pdist(aligned_norm[:, :, 1:, :], aligned_norm[:, :, :-1, :])

I remember I changed these two places before; I hope this helps. By the way, did you run monodepth2 on NYUv2 successfully?
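The shape logic behind this change can be sketched as follows. This assumes self.pdist is torch.nn.PairwiseDistance, which in current PyTorch reduces over the last dimension; the NumPy function `pairwise_distance` below is only a hypothetical stand-in for it, and the tensor sizes are made up for illustration.

```python
import numpy as np

def pairwise_distance(a, b, eps=1e-6):
    """NumPy stand-in (assumption): 2-norm of (a - b + eps) over the last axis."""
    return np.linalg.norm(a - b + eps, axis=-1)

rgb = np.random.rand(2, 3, 8, 10)            # N, C, H, W
rgb_nhwc = np.transpose(rgb, (0, 2, 3, 1))   # N, H, W, C: channels last

# Differences between vertically and horizontally adjacent pixels;
# the distance is taken over the channel axis, which is now last.
rgb_down = pairwise_distance(rgb_nhwc[:, 1:, :, :], rgb_nhwc[:, :-1, :, :])
rgb_right = pairwise_distance(rgb_nhwc[:, :, 1:, :], rgb_nhwc[:, :, :-1, :])

print(rgb_down.shape)   # one row fewer than the input height
print(rgb_right.shape)  # one column fewer than the input width
```

With channels last, slicing dimension 1 removes a row and slicing dimension 2 removes a column, so the results keep an (N, H-1, W) and (N, H, W-1) layout that can be added to depth differences of the same layout.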

@Cresynia

Cresynia commented Mar 17, 2023 via email

@Cresynia

Problem solved! Thanks, I can run it now!
Could you also give me some advice? Sometimes I feel very confused, because I feel I know almost nothing, can do almost nothing, and still have a lot to learn. I have studied this project for a while, but it never feels clear to me and I can't get the point, because I'm weak in all aspects. Could you give me some advice, and can I ask you questions if I run into problems?
Thanks a lot for your help! Best wishes!

@myalos
Author

myalos commented Mar 18, 2023

I understand this feeling of confusion. But I am not an expert in depth estimation (maybe beginner plus), and your expectations of my ability make me a little nervous. I have run indoor SfMLearner and StructDepth before. If you have questions, feel free to ask, and I will reply when I see them.

@Cresynia

Cresynia commented Mar 18, 2023 via email

@Cresynia

Hi! When I ran this project, I ran into some problems. I want to ask whether you have ever met this one. I'm still looking for a solution.
image

I'm doing my graduation project based on StructDepth, but I have run into some problems. Because of my limited knowledge, the only change I made was adding attention to the network. The result is not good: instead of improving, it got worse, and I'm trying to find the reason. My guess is this: StructDepth is trained from P2Net's model, and the two share the same network. If I change the network, the result gets worse, and StructDepth has high requirements for the pretrained model; otherwise the result is not good. I don't know whether this reasoning is right. So I tried adding attention to P2Net first and then using StructDepth. I don't know whether I can do it this way. I have no direction. Thanks!

@myalos
Author

myalos commented Apr 28, 2023

A1:
Maybe you should change this:
color_aug = transforms.ColorJitter.get_params(self.brightness, self.contrast, self.saturation, self.hue)
to this:
color_aug = transforms.ColorJitter(self.brightness, self.contrast, self.saturation, self.hue)
A2:
Sorry, I cannot give you advice, since I am not familiar with depth + attention.

@Cresynia

Thanks! I can run it! Sincere thanks for your help!
As for the second question, I just wanted to ask whether the changed network could be the reason for the StructDepth result. That was just my guess, and I'm going to try it. I hope I can graduate smoothly. Thanks!

@Cresynia

Sorry, I have another question. When I run it, this occurred:
image
image
I have no idea why this happens.

@myalos
Author

myalos commented Apr 28, 2023

I suggest setting a breakpoint and debugging; I cannot find the reason from this information. Maybe the gradients exploded, or there is some other cause. I have not met this case before.

@Cresynia

OK, thanks!

@Cresynia

Do you know why this happens when I debug?
image

It's so strange. I haven't reached val_dataset yet, but it occurred, right at the beginning. If I step into a function, it occurs again. Will this affect code execution?

@Cresynia

I created a new one and the NameError problem disappeared. I think I may have changed something without noticing. I'm now running it to see whether the problem occurs again.

@myalos
Author

myalos commented Apr 29, 2023

Press Ctrl+F to search for where val_dataset is used, and comment it out.

@Cresynia

Thanks! The problem disappeared miraculously. I created a new one and it didn't occur. But the NaN problem still occurs sometimes, and sometimes it runs fine. I don't know why. I hope it won't occur again. Now I'm going to write eval_res_for_each_epoch.txt the way StructDepth does, because the results are not convenient to read otherwise. I also want to use TensorBoard on the server, but I couldn't open the URL it provides. Do you have any solutions? Thanks!
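A per-epoch results file like the one mentioned here could be written roughly as below. This is only a sketch: the helper `append_eval_result` and the line layout are assumptions, not StructDepth's actual format, and the single row reuses the initial metrics reported at the top of this thread.

```python
import os
import tempfile

METRIC_NAMES = ["abs_rel", "sq_rel", "rmse", "rmse_log", "a1", "a2", "a3"]

def append_eval_result(path, epoch, metrics):
    """Append one line of metrics for an epoch (hypothetical helper and layout)."""
    is_new = not os.path.exists(path)
    with open(path, "a") as f:
        if is_new:
            # Write the header once, when the file is first created.
            f.write("epoch | " + " | ".join(METRIC_NAMES) + "\n")
        f.write(f"{epoch} | " + " | ".join(f"{m:.3f}" for m in metrics) + "\n")

# Written to a temporary directory so the sketch does not touch a real log.
log_path = os.path.join(tempfile.mkdtemp(), "eval_res_for_each_epoch.txt")
append_eval_result(log_path, 0, [0.323, 0.448, 1.002, 0.365, 0.520, 0.783, 0.905])

with open(log_path) as f:
    lines = f.read().splitlines()
print("\n".join(lines))
```

Opening the file in append mode keeps earlier epochs intact if training is resumed.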

@myalos
Author

myalos commented Apr 29, 2023

A1: You need to set a breakpoint and debug, search online, or ask someone who is familiar with this bug. Many things can cause NaN problems, and I have limited experience in deep learning.
A2: Launch the tensorboard with --ip = 0.0.0.0

@Cresynia

Q1: OK, thanks!
Q2: It still doesn't work. I used the command:
tensorboard --logdir=./ --host=0.0.0.0
and opened ip:6006, but it still doesn't work.

@myalos
Author

myalos commented Apr 29, 2023

The command "tensorboard with --ip = 0.0.0.0" works for me.

@Cresynia

Cresynia commented Apr 29, 2023 via email
