This repository has been archived by the owner on Sep 2, 2024. It is now read-only.

Cityscapes experiment #2

maria8899 opened this issue Feb 7, 2019 · 24 comments

@maria8899 commented Feb 7, 2019

Hi,
Thanks for open sourcing the code, this is great!
Could you share your json parameter file for cityscapes?
Also, I think the depth_mean.npy file needed to run it is missing.

Thanks.

@ozansener (Collaborator)

We will update the code in a few days with depth_mean.npy and the config files. But in the meantime, here are the config files if you do not want to wait:

  • Parameter Set w/ Approximation:
    optimizer=Adam|batch_size=8|lr=0.0005|dataset=cityscapes|normalization_type=none|algorithm=mgda|use_approximation=True

  • Parameter Set w/o Approximation:
    optimizer=Adam|batch_size=8|lr=0.0001|dataset=cityscapes|normalization_type=none|algorithm=mgda|use_approximation=False

depth_mean.npy is the average depth map of the training set. We use it to make the depth input zero-mean.
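
For anyone who wants these as a config dict, here is a minimal sketch of turning the pipe-separated strings above into one (the key names come from the strings themselves; the repo's actual JSON config format may differ):

def parse_param_string(s):
    """Parse 'key=value|key=value|...' into a dict with basic type casting."""
    def cast(v):
        if v in ("True", "False"):
            return v == "True"
        for typ in (int, float):
            try:
                return typ(v)
            except ValueError:
                pass
        return v
    return {k: cast(v) for k, v in (pair.split("=", 1) for pair in s.split("|"))}

cfg = parse_param_string(
    "optimizer=Adam|batch_size=8|lr=0.0005|dataset=cityscapes|"
    "normalization_type=none|algorithm=mgda|use_approximation=True"
)
# cfg == {'optimizer': 'Adam', 'batch_size': 8, 'lr': 0.0005, 'dataset': 'cityscapes',
#         'normalization_type': 'none', 'algorithm': 'mgda', 'use_approximation': True}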

@maria8899 (Author)

Thanks. Will the updated code run with PyTorch 1.0? I am running into a few problems since some features are deprecated (e.g. volatile variables, .data[0], etc.).

@maria8899 (Author) commented Feb 8, 2019

I am also having an error with the FW step:
sol, min_norm = MinNormSolver.find_min_norm_element([grads[t] for t in tasks])

----> 1 sol, min_norm = MinNormSolver.find_min_norm_element([grads[t] for t in tasks])

~/MultiObjectiveOptimization/min_norm_solvers.py in find_min_norm_element(vecs)
     99         # Solution lying at the combination of two points
    100         dps = {}
--> 101         init_sol, dps = MinNormSolver._min_norm_2d(vecs, dps)
    102 
    103         n=len(vecs)

~/MultiObjectiveOptimization/min_norm_solvers.py in _min_norm_2d(vecs, dps)
     42                     dps[(i, j)] = 0.0
     43                     for k in range(len(vecs[i])):
---> 44                         dps[(i,j)] += torch.dot(vecs[i][k], vecs[j][k]).data[0]
     45                     dps[(j, i)] = dps[(i, j)]
     46                 if (i,i) not in dps:

RuntimeError: dot: Expected 1-D argument self, but got 4-D

What exactly should grads and [grads[t] for t in tasks] contain?

Edit: the solution is to replace it with
torch.dot(vecs[i][k].view(-1), vecs[j][k].view(-1)).item()
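
For anyone hitting the same RuntimeError, here is a simplified sketch of the pairwise dot-product loop from _min_norm_2d with that fix applied (not the repo's exact code; vecs[i] is the list of gradient tensors for task i, as in the traceback above):

import torch

def pairwise_dot_products(vecs, dps):
    # Fill dps[(i, j)] with the dot product of task i's and task j's gradients,
    # summed over the per-layer tensors in each list.
    for i in range(len(vecs)):
        for j in range(i, len(vecs)):
            if (i, j) not in dps:
                # view(-1) flattens the 4-D gradients so torch.dot accepts them
                # on PyTorch >= 1.0, and .item() replaces the removed .data[0].
                dps[(i, j)] = sum(
                    torch.dot(vecs[i][k].view(-1), vecs[j][k].view(-1)).item()
                    for k in range(len(vecs[i]))
                )
                dps[(j, i)] = dps[(i, j)]
    return dps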

@SimonVandenhende

I can confirm that those changes worked for me to get the code running with PyTorch 1.0.
I was able to reproduce the results for the single-task models, but so far no luck with the MGDA method.
Did you have to make any other changes, @maria8899?

@ozansener (Collaborator)

@r0456230 Can you tell me what exactly you are trying to reproduce? The config files I posted as a comment should give the exact results of MGDA w/ and w/o approximation.

Please note that we report the disparity metric in the paper but compute the depth metric in the code; the depth map is separately converted into disparity as post-processing. If the issue is depth, this should explain it.

mIoU should exactly match what the code reports and what the paper reports. We used the parameters I posted as a comment.

@ozansener (Collaborator)

@maria8899 Although we are planning to support PyTorch 1.0, I am not sure when that will be. I will also update the README with the exact versions of each Python module we used. PyTorch was 0.3.1.

@SimonVandenhende

@ozansener I was able to reproduce the results from the paper for the single-task models using your code (depth, instance segmentation, and semantic segmentation on Cityscapes).
However, when I run the code with the parameters posted above, after 50 epochs the models seem to be far from the results obtained in the paper.

@maria8899 (Author) commented Feb 13, 2019

I think I have managed to make it work with PyTorch 1.0, but I still need to check the results and train it fully.
@r0456230 I haven't made many other changes; the main problem was in this FW step. Have you set up the scales/tasks correctly in the JSON file?

@YoungGer commented Feb 18, 2019

@maria8899 @r0456230 could you please tell me how to solve the missing depth_mean.npy problem?

I tried the code below, but I'm not sure if it's correct

depth_mean = np.mean([depth!=0])
depth[depth!=0] = (depth[depth!=0] - depth_mean) / self.DEPTH_STD
#depth[depth!=0] = (depth[depth!=0] - self.DEPTH_MEAN[depth!=0]) / self.DEPTH_STD

@maria8899 (Author) commented Feb 18, 2019

@YoungGer you need to compute a mean image (per pixel) using all the training images (or just a few, to get an approximation) of the Cityscapes disparity dataset.
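
A rough sketch of that computation (the path pattern and any resizing are assumptions, not the repo's loader; it just accumulates a per-pixel average over the training disparity maps):

import glob
import numpy as np
from PIL import Image

# Hypothetical location of the Cityscapes disparity PNGs; adjust as needed.
paths = sorted(glob.glob("cityscapes/disparity/train/*/*_disparity.png"))

mean_img = None
for p in paths:
    d = np.array(Image.open(p), dtype=np.float64)  # resize here if your loader does
    mean_img = d if mean_img is None else mean_img + d
mean_img /= len(paths)

np.save("depth_mean.npy", mean_img.astype(np.float32))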

@YoungGer

@YoungGer you need to compute a mean image (per pixel) using all the training images (or just a few, to get an approximation) of the Cityscapes disparity dataset.

I know, thank you for your help.

@JulienSiems

@YoungGer Have you noticed that the find_min_norm_element method actually uses projected gradient descent? Only find_min_norm_element_FW is the Frank-Wolfe algorithm discussed in the paper. They are only guaranteed to be equivalent when the number of tasks is 2.
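
For reference, the two-task case has a closed-form solution, which is why the two routines can only be guaranteed to agree there; a small standalone sketch (standard result, not copied from the repo):

import torch

def min_norm_two_tasks(v1, v2):
    # Return gamma in [0, 1] minimizing ||gamma * v1 + (1 - gamma) * v2||^2
    # for flattened gradient vectors v1, v2. With more than two tasks an
    # iterative solver (projected gradient descent or Frank-Wolfe) is needed.
    diff = v1 - v2
    denom = torch.dot(diff, diff)
    if denom.item() < 1e-12:          # v1 == v2: every gamma gives the same norm
        return 0.5
    gamma = torch.dot(v2 - v1, v2) / denom
    return float(gamma.clamp(0.0, 1.0))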

@kilianyp commented Mar 29, 2019

EDIT: Obviously, I realised right after sending this that question 2 is explained by the optimization. Question 1 still remains.

Hi @ozansener ,
thanks for publishing your code!

I have two questions after reading this answer by @maria8899:

Edit: the solution is to replace it with
torch.dot(vecs[i][k].view(-1), vecs[j][k].view(-1)).item()

In your code, the z variable returned by the 'backbone' of the network is passed to each task.
Its gradient is then used in the find_min_norm algorithm.

First of all, as maria noted, the gradient is a 4-D variable that first needs to be reshaped to 1-D in torch 1.0.1.
I compared the behavior to torch 0.3.1 and it does lead to the same result, but it raised some questions, which may well come from my incomplete understanding of your paper.

  1. The gradient still has the batch dimension. Why do you calculate the min-norm point over all samples as one big vector instead of, for example, averaging or summing over the batch dimension? Isn't that what effectively happens after the reshaping? This is just intuitive reasoning, comparing it with stochastic gradient descent.
  2. Why is there a batch dimension at all? From the paper it is not quite clear to me what should be fed into the FrankWolfeSolver, but shouldn't it be the gradient with respect to some parameters instead of an output variable? Or does that not matter and lead to the same result?

Thanks a lot!

@ozansener (Collaborator) commented May 1, 2019

@kilsenp First let me answer question 2.

  • 2: You are right that if you apply MGDA directly, it should be gradients with respect to the parameters. However, one of the main contributions of the paper is showing that you can instead feed gradients with respect to the representations. This is basically Section 3.3 of the paper, and what we compute in the code is $\nabla_Z$ (see the sketch below).

  • 1: No, you need the batch dimension since the forward pass of the network is different for each image. You can read Section 3.3 of the paper in detail to understand what's going on.
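
Schematically, the computation described in these two points could look like the following sketch (placeholder names, not the repo's exact training loop; the point is that each task's gradient is taken with respect to the shared representation z and fed to the min-norm solver):

import torch

def mgda_ub_step(encoder, heads, criteria, x, targets, min_norm_solver):
    # Shared forward pass; z is the representation passed to every task head.
    z = encoder(x)

    grads = []
    for t, head in heads.items():
        # A detached copy that requires grad is a leaf whose .grad is dL_t/dz,
        # so this backward pass does not touch the encoder parameters.
        z_t = z.detach().requires_grad_(True)
        loss_t = criteria[t](head(z_t), targets[t])
        loss_t.backward()
        # The batch dimension stays: each image has its own forward pass.
        grads.append([z_t.grad.clone()])

    # Min-norm combination weights over the per-task gradients w.r.t. z.
    weights, _ = min_norm_solver(grads)

    # One weighted backward pass through the whole network for the real update.
    z = encoder(x)
    total = sum(float(w) * criteria[t](heads[t](z), targets[t])
                for w, t in zip(weights, heads))
    total.backward()
    return weights, total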

@liyangliu

Hi, @maria8899, @kilsenp,
Have you reproduced the results on MultiMNIST or CityScapes? Thanks.

@youandeme

Hi @liyangliu,
Have you reproduced the results on MultiMNIST? I have tried, but only got results similar to grid search. Would you mind telling me the params you chose? Thanks.

@liyangliu

@youandeme, I haven't reproduced the results on MultiMNIST. I used the same hyper-parameters mentioned by the author in #9, but cannot surpass the uniform scaling baseline. Also, I noticed that in the "Gradient Surgery" paper (supplementary materials) other researchers report results on MultiMNIST different from this MOO paper's. So I suspect that others also have difficulty reproducing the results on MultiMNIST following this MOO paper.

@ozansener (Collaborator)

@liyangliu @youandeme How are you evaluating MultiMNIST? We did not release any test set; actually, there is no test set. The code generates a random test set every time you run it. For all methods, you simply use the hyper-params I posted. Then you save every epoch's result and choose the epoch with the best val accuracy. Then you call the MultiMNIST test, which will generate a random test set and evaluate on it; calling the MultiMNIST loader with the test param should do the trick. If you evaluate this way, the results will not match exactly since the test set is randomly generated, but the order of the methods is preserved.
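
In sketch form, the selection protocol described above amounts to something like this (the checkpoint and loader functions are placeholders, not the repo's API):

def select_and_test(val_accuracy_by_epoch, load_checkpoint, evaluate_on_fresh_test):
    # Pick the epoch with the best validation accuracy, then evaluate that
    # checkpoint once on a freshly generated random MultiMNIST test split.
    best_epoch = max(val_accuracy_by_epoch, key=val_accuracy_by_epoch.get)
    model = load_checkpoint(best_epoch)
    return evaluate_on_fresh_test(model)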

@liyangliu commented May 4, 2020

Hi @ozansener, as you mentioned, the order of the different methods (MGDA-UB vs. uniform scaling) should stay the same whatever test set I use. But on the validation set, I cannot see any superiority of MGDA-UB over uniform scaling. Also, on CityScapes I cannot reproduce the results reported in the paper. Actually, I find that the single-task baseline is better than the reported one (10.28 vs. 11.34 and 64.04 vs. 60.68 on the instance and semantic segmentation tasks, respectively). I obtained these numbers with your provided code, so maybe I made some mistakes?

@ozansener (Collaborator)

@liyangliu For MultiMNIST, I think there are issues since we did not release a test set. Everyone reports slightly different numbers. In hindsight, we should have released the test set, but we did not even save it. So I would say please report whatever numbers you obtained for MultiMNIST. For Cityscapes, though, it is strange, as many people have reproduced the numbers. Please send me an e-mail about CityScapes so we can discuss.

@liyangliu commented May 5, 2020

Thanks, @ozansener. On CityScapes I re-ran your code on the instance & semantic segmentation tasks and got the following results for MGDA-UB and SINGLE task, respectively:

method           instance  semantic
MGDA-UB          15.88     64.53
SINGLE           10.28     64.04
MGDA-UB (paper)  10.25     66.63
SINGLE (paper)   11.34     60.08

It seems that the performance of instance segmentation is a bit strange.

@ozansener (Collaborator)

@liyangliu The instance segmentation one looks strange. Are you using the hyper-params I posted for both single-task and multi-task? Also, are the tasks uniformly scaled, or are you doing any search? Let me know the setup.

@liyangliu commented May 8, 2020

@ozansener, I used exactly the hyper-params you posted for single- & multi-task training. I used 1/0 and 0/1 scales for single-task training (instance and semantic segmentation, respectively) and didn't do any grid search.

@AwesomeLemon

Sorry for an off-topic question, but I have trouble even running the training on CityScapes: for a 256x512 input I get a 32x64 output, while the target is 256x512. The smaller output makes sense to me because of the non-dilated earlier layers and max-pooling.
So could someone please clarify whether the target should indeed have the same dimensions as the input, and if so, where the spatial upsampling is supposed to happen?
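
(Not the repo's answer, just an illustration: if the targets are kept at input resolution, one common pattern is to bilinearly upsample the logits to the label size before computing the loss, e.g.:)

import torch
import torch.nn.functional as F

# Shapes from the question: backbone output 32x64, labels 256x512; 19 classes assumed.
logits = torch.randn(8, 19, 32, 64)             # (B, num_classes, h, w)
labels = torch.randint(0, 19, (8, 256, 512))    # (B, H, W) class indices

logits_up = F.interpolate(logits, size=labels.shape[-2:],
                          mode="bilinear", align_corners=False)
loss = F.cross_entropy(logits_up, labels)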
