This repository has been archived by the owner on Apr 9, 2024. It is now read-only.

Sub-optimal results on customized KITTI dataset #14

Open
ruili3 opened this issue Apr 2, 2023 · 3 comments

ruili3 commented Apr 2, 2023

Hi!

Thank you for sharing the code of this impressive work.
Since there are no config files or training lists for the Waymo dataset, I used a workaround and built a KITTI benchmark based on the KITTI_STEP dataset. It has annotated instance labels for the train/val sets, with ~5000 images usable for evaluating the ARI metric. I chose ~3500 for training and ~1500 for testing, and trained with the depth supervision of SAVi++. Since there are no optical flow annotations, I use only LiDAR points for supervision.

After training the network, both the depth and segmentation predictions are worse than reported in the paper. The FG-ARI is around 8.0, which is quite low. Visualizations for a single frame of the sequence are shown below (from top to bottom: image, interpolated_depth_gt, segmentations_gt, depth_pred, segmentations_pred).
[image attachment]

Considering the large discrepancy between the KITTI and Waymo datasets, I have some questions about the settings you used in the Waymo experiments:

  • In addition to LiDAR supervision, do you use other signals (e.g., optical flow, as used in MOVi) for supervision on Waymo?
  • Do you supervise the depth in log-depth space (i.e., log(d+1))?
  • Are there customized settings for the Waymo data that differ from those used for the MOVi dataset?

It would be great if you could offer further insights on how to achieve comparable results on an outdoor dataset. Thank you!

tkipf (Contributor) commented Apr 3, 2023

Thanks a lot for your message! It is difficult to say whether the method will work on KITTI, as the dataset is quite a bit smaller, and the larger diversity of Waymo Open might be helpful in stabilizing the approach.

Regarding your questions:

  • SAVi++ on Waymo Open uses only LiDAR as a prediction signal.
  • We train using mean-squared error on the log(d+1)-normalized LiDAR signal, and we compute the loss only for observed LiDAR points (i.e., we mask out empty parts of the depth image).
  • All settings are described in the paper -- we use a hybrid ResNet + Transformer encoder and apply crop-resize data augmentation. The data augmentation hyperparameters for Waymo Open differ from those for MOVi.
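For concreteness, the masked log-depth loss described above can be sketched as follows. This is an illustrative NumPy version under my own naming (`masked_log_depth_loss` is not a function from the repository):

```python
import numpy as np

def masked_log_depth_loss(pred_log_depth, lidar_depth):
    """MSE on log(d + 1), computed only at pixels with an observed
    LiDAR return (depth > 0). Hypothetical helper, not the repo's API."""
    mask = lidar_depth > 0.0           # observed LiDAR points only
    target = np.log1p(lidar_depth)     # log(d + 1) normalization
    sq_err = (pred_log_depth - target) ** 2
    # Average over observed pixels only; guard against an empty mask.
    return np.sum(sq_err * mask) / np.maximum(np.sum(mask), 1)
```

The key detail is the mask: pixels without a LiDAR return contribute nothing to the loss, so the network is never penalized on the empty parts of the sparse depth image.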

Hope this helps!

ruili3 (Author) commented Apr 3, 2023

Thank you for your feedback!

Do you have any suggestions on the image resolution? In our implementation, the input/output image size is set to 192x640, which differs from the Waymo setting. I noticed the broadcast decoder upsamples the resolution by 16x, and there is also a fixed patch size (i.e., 8) during the encoding stage. In your experience, is it necessary to tune these parameters according to the input resolution?

Thank you very much!

gamaleldin (Contributor) commented

Thanks for reaching out. In the Waymo experiments, we used a resolution of 128x192 and a high-resolution version at 256x384. There are two parameters to adapt the decoder to different resolutions:

  1. Adjust the starting resolution of the decoder by changing "resolution": (8, 12) in our decoder config.
  2. Add more convT (transposed convolution) layers to the decoder.

Adjusting these parameters has implications for FLOPs and/or the number of model parameters.

For the 192x640 resolution you mentioned, I believe adjusting the decoder resolution parameter from (8, 12) to (12, 40) may be worth trying. I expect this to be computationally expensive given the high resolution.
A more computationally efficient setting to try is:

  1. Adjust the decoder resolution parameter to (6, 20)

  2. Adjust the decoder "backbone" to the following:

     ```python
     "backbone": ml_collections.ConfigDict({
         "module": "savi.modules.CNN",
         "features": [64, 64, 64, 64, 64],
         "kernel_size": [(5, 5), (5, 5), (5, 5), (5, 5), (5, 5)],
         "strides": [(2, 2), (2, 2), (2, 2), (2, 2), (2, 2)],
         "layer_transpose": [True, True, True, True, True],
     }),
     ```
    
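As a sanity check on the arithmetic (my own sketch; `decoder_output_resolution` is a hypothetical helper, not a function from the repo), each stride-2 convT layer doubles the spatial resolution. Assuming the 16x upsampling mentioned earlier corresponds to four stride-2 layers, both suggested starting resolutions reach 192x640:

```python
def decoder_output_resolution(start, strides):
    """Spatial size after a stack of transposed convolutions:
    each stride-(sh, sw) convT multiplies height by sh and width by sw."""
    h, w = start
    for sh, sw in strides:
        h, w = h * sh, w * sw
    return (h, w)

# Default config: start (8, 12), four stride-2 convT layers -> 16x upsampling.
print(decoder_output_resolution((8, 12), [(2, 2)] * 4))   # (128, 192)
# Suggested for 192x640: start (12, 40) with the default four layers.
print(decoder_output_resolution((12, 40), [(2, 2)] * 4))  # (192, 640)
# Alternative above: start (6, 20) with five stride-2 layers -> 32x upsampling.
print(decoder_output_resolution((6, 20), [(2, 2)] * 5))   # (192, 640)
```

The (6, 20) variant trades one extra convT layer for a smaller broadcast grid, which is why it can be cheaper overall despite reaching the same output size.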
