This repository has been archived by the owner on Apr 9, 2024. It is now read-only.

Sub-optimal results on customized KITTI dataset #14

Open
ruili3 opened this issue Apr 2, 2023 · 3 comments

ruili3 commented Apr 2, 2023

Hi!

Thank you for sharing the code of this impressive work.
Since there are no config files or training lists for the Waymo dataset, I used a workaround and built a KITTI benchmark based on the KITTI_STEP dataset. It has annotated instance labels for the train/val sets, with ~5000 images usable for evaluating the ARI metric. I chose ~3500 for training and ~1500 for testing, and trained with the depth supervision of SAVi++. Since there are no optical flow annotations, I use only LiDAR points for supervision.

After training the network, both the depth and segmentation predictions are worse than reported in the paper. The FG-ARI is around 8.0, which is quite low. Visualizations for a single frame of the sequence are shown below (from top to bottom: image, interpolated_depth_gt, segmentations_gt, depth_pred, segmentations_pred).
[image attachment]

Considering the large discrepancy between the KITTI and Waymo datasets, I have some questions about the settings you used in the Waymo experiments:

  • In addition to LiDAR supervision, do you use other signals (e.g., optical flow, as used in MOVi) for supervision on Waymo?
  • Do you supervise the depth in log-depth space (i.e., log(d+1))?
  • Are there customized settings for the Waymo data that differ from those used for the MOVi dataset?

It would be great if you could offer further insights on how to achieve comparable results on an outdoor dataset. Thank you!

tkipf (Contributor) commented Apr 3, 2023

Thanks a lot for your message! It is difficult to say whether the method will work on KITTI, as the dataset is quite a bit smaller, and the larger diversity of Waymo Open might be helpful in stabilizing the approach.

Regarding your questions:

  • SAVi++ on Waymo Open uses only LiDAR as a prediction signal.
  • We train using mean-squared error on the log(d+1)-normalized LiDAR signal, and we compute the loss only for observed LiDAR points (i.e., we mask out empty parts of the depth image).
  • All settings are described in the paper -- we use a hybrid ResNet + Transformer encoder and apply crop-resize data augmentation. The data augmentation hyperparameters for Waymo Open differ from those for MOVi.
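For concreteness, the masked log-depth loss described above can be sketched as follows. This is an illustrative NumPy version under my own naming (`masked_log_depth_loss` is not a function from the repository):

```python
import numpy as np

def masked_log_depth_loss(pred_log_depth, lidar_depth):
    """MSE on log(d + 1), computed only at pixels with an observed
    LiDAR return (depth > 0). Hypothetical helper, not the repo's API."""
    mask = lidar_depth > 0.0           # observed LiDAR points only
    target = np.log1p(lidar_depth)     # log(d + 1) normalization
    sq_err = (pred_log_depth - target) ** 2
    # Average over observed pixels only; guard against an empty mask.
    return np.sum(sq_err * mask) / np.maximum(np.sum(mask), 1)
```

The key detail is the mask: pixels without a LiDAR return contribute nothing to the loss, so the network is never penalized on the empty parts of the sparse depth image.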

Hope this helps!

ruili3 (Author) commented Apr 3, 2023

Thank you for your feedback!

Do you have any suggestions on the image resolution? In our implementation, the input/output image size is set to 192x640, which differs from the Waymo setting. I noticed the broadcast decoder upsamples the resolution by 16x, and there is also a fixed patch size (i.e., 8) during the encoding stage. In your experience, is it necessary to tune these parameters according to the input resolution?

Thank you very much!

gamaleldin (Contributor) commented

Thanks for reaching out. In the Waymo experiments, we used a resolution of 128x192 and a high-resolution version at 256x384. There are two parameters to adapt the decoder to different resolutions:

  1. Adjust the starting resolution of the decoder by changing "resolution": (8, 12) in our decoder config.
  2. Add more convT (transposed convolution) layers to the decoder.

Adjusting these parameters has implications for FLOPs and/or the number of model parameters.

For the 192x640 resolution you mentioned, I believe adjusting the decoder resolution parameter from (8, 12) to (12, 40) may be worth trying. I expect this to be computationally expensive given the high resolution.
A more computationally efficient setting to try is:

  1. Adjust the decoder resolution parameter to (6, 20)

  2. Adjust the decoder "backbone" to the following:

     ```python
     "backbone": ml_collections.ConfigDict({
         "module": "savi.modules.CNN",
         "features": [64, 64, 64, 64, 64],
         "kernel_size": [(5, 5), (5, 5), (5, 5), (5, 5), (5, 5)],
         "strides": [(2, 2), (2, 2), (2, 2), (2, 2), (2, 2)],
         "layer_transpose": [True, True, True, True, True],
     }),
     ```
    
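As a sanity check on the arithmetic (my own sketch; `decoder_output_resolution` is a hypothetical helper, not a function from the repo), each stride-2 convT layer doubles the spatial resolution. Assuming the 16x upsampling mentioned earlier corresponds to four stride-2 layers, both suggested starting resolutions reach 192x640:

```python
def decoder_output_resolution(start, strides):
    """Spatial size after a stack of transposed convolutions:
    each stride-(sh, sw) convT multiplies height by sh and width by sw."""
    h, w = start
    for sh, sw in strides:
        h, w = h * sh, w * sw
    return (h, w)

# Default config: start (8, 12), four stride-2 convT layers -> 16x upsampling.
print(decoder_output_resolution((8, 12), [(2, 2)] * 4))   # (128, 192)
# Suggested for 192x640: start (12, 40) with the default four layers.
print(decoder_output_resolution((12, 40), [(2, 2)] * 4))  # (192, 640)
# Alternative above: start (6, 20) with five stride-2 layers -> 32x upsampling.
print(decoder_output_resolution((6, 20), [(2, 2)] * 5))   # (192, 640)
```

The (6, 20) variant trades one extra convT layer for a smaller broadcast grid, which is why it can be cheaper overall despite reaching the same output size.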
