Question related to Ablation study & CSS Net five layers freeze #7

Open
taeyeopl opened this issue Oct 14, 2021 · 6 comments

Comments

@taeyeopl

taeyeopl commented Oct 14, 2021

Thanks for sharing the great work!
I have two simple questions related to the ablation study & CSS Net freeze part.

Q1. Can you explain the difference between (R,t) / (R,t),s / (R,t),s,z in Table 3 of the main paper?

image

Due to my limited understanding, it is hard for me to see the difference clearly, including in the implementation. Is it for making a label, or for variables in DeepSDF training? I'm curious because I can't find where all of [(R,t), s, and z] take effect in your code.

def get_kitti_label(dsdf, grid, latent, scale, trans, yaw, p_WC, bbox):

Q2. In the code, conv1, bn1, and layer1 are frozen. Can you explain how the number of layers (5) is counted?
I saw in supplementary C.1 (CSS Net) that "the first five layers are frozen in order to prevent overfitting
to peculiarities of the rendered data".

_freeze_module(self.conv1)

@taeyeopl taeyeopl changed the title Question related to CSS Net five layers freeze Question related to Ablation study & CSS Net five layers freeze Oct 14, 2021
@xmyqsh

xmyqsh commented Oct 19, 2021

A1: R (rotation), t (translation), s (scale), z (shape latent code, 3-dim in this paper).
R, t, and s can be estimated by 3D-3D correspondence estimation. One set of 3D points is the back-projected LiDAR frustum points from NOCS. The other set is points from the DeepSDF-rendered model (which is normalized and centered, just like sampling on a CAD model).
Because of the 1-to-1 correspondence between NOCS (a 2D map) and the DeepSDF model points, we can sample correspondence pairs, and then the 3D-3D estimation can be solved with the Kabsch or Procrustes algorithm.

z is the latent vector conditioning DeepSDF, which can be computed for each SDF shape model by MAP estimation via auto-decoding in DeepSDF. The MAP auto-decoding process is expensive, so the resulting z is saved as a CSS label. z can then be predicted by css_net and used as the conditioning input for DeepSDF.
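As an illustration (not this repo's code), recovering the similarity transform (s, R, t) from such sampled correspondence pairs can be done with the Umeyama variant of the Kabsch algorithm; a minimal NumPy sketch:

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Estimate s, R, t such that dst ≈ s * R @ src + t.
    src, dst: (N, 3) arrays of corresponding 3D points."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_s, dst - mu_d
    var_s = (src_c ** 2).sum() / len(src)      # variance of source points
    cov = dst_c.T @ src_c / len(src)           # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                           # avoid reflections
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t
```

With noise-free correspondences this recovers the transform exactly; in practice it is usually wrapped in RANSAC to handle outlier pairs.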

def get_kitti_label(dsdf, grid, latent, scale, trans, yaw, p_WC, bbox):
The latent is generated by the MAP procedure described above, but the related code is not in this repo; it should be in the author's other repo.
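For intuition only, here is a minimal sketch of MAP auto-decoding of a latent code, assuming a hypothetical `decoder(z, xyz)` callable; this is not the author's actual code, and DeepSDF itself uses a clamped L1 data term:

```python
import torch

def map_latent(decoder, points, sdf_gt, latent_dim=3, iters=500, reg=1e-4):
    """MAP-style auto-decoding of a latent code for one shape (sketch).
    decoder(z, xyz) -> predicted SDF values; points: (N, 3); sdf_gt: (N, 1)."""
    z = torch.zeros(latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=1e-2)
    for _ in range(iters):
        opt.zero_grad()
        pred = decoder(z.expand(points.shape[0], -1), points)
        # data term + zero-mean Gaussian prior on z (the prior makes it MAP)
        loss = torch.nn.functional.mse_loss(pred, sdf_gt) + reg * (z ** 2).sum()
        loss.backward()
        opt.step()
    return z.detach()
```

Running hundreds of optimizer steps per shape is why the result is cached as a CSS label rather than recomputed, and why css_net predicting z in one forward pass is attractive.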

A2: "the first five layers" means the first five conv layers: 1 (self.conv1) + 4 (self.layer1: in ResNet-18, layer1 has 2 BasicBlocks with 2 convs each) = 5.

There is a bug in the freeze code; see PR #8 for details.
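For reference, a minimal sketch of that counting, using a hypothetical stand-in `basic_block` instead of torchvision's ResNet (not the repo's code):

```python
import torch.nn as nn

def _freeze_module(module):
    # stop gradient updates for every parameter in the module
    for p in module.parameters():
        p.requires_grad = False

def basic_block(ch):
    # minimal stand-in for a ResNet-18 BasicBlock: two 3x3 convs
    return nn.Sequential(
        nn.Conv2d(ch, ch, 3, padding=1, bias=False), nn.BatchNorm2d(ch), nn.ReLU(),
        nn.Conv2d(ch, ch, 3, padding=1, bias=False), nn.BatchNorm2d(ch),
    )

conv1 = nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False)
bn1 = nn.BatchNorm2d(64)
layer1 = nn.Sequential(basic_block(64), basic_block(64))  # 2 blocks x 2 convs

for m in (conv1, bn1, layer1):
    _freeze_module(m)

# conv1 contributes 1 conv layer, layer1 contributes 4: 1 + 4 = 5
n_frozen_convs = 1 + sum(isinstance(m, nn.Conv2d) for m in layer1.modules())
```

bn1 is frozen alongside conv1 but, being a BatchNorm layer, is not counted among the five conv layers.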

@taeyeopl
Author

Thanks for the explanation!
I understood each component but still have some misunderstandings.
Sorry for my poor understanding.

Q1. Can you explain each experimental setting clearly?
It would really help me understand the ablation.
As I understand it, based on the equation:
image

  1. [setting 1] (R,t) means s is not multiplied in; only R and t are used to transform the DeepSDF rendering points into the LiDAR coordinate frame.
  2. [setting 2] (R,t,s) means the same as equation (10).
  3. [setting 3] (R,t,s,z) -> the z part is quite hard to understand, because it seems like a must-have for optimization. Can you explain the difference without/with z?
    image

@xmyqsh

xmyqsh commented Oct 19, 2021

Setting 3 is the default setting of this repo; settings 1 and 2 are not currently supported.
The purpose of the conditioning latent code z is that one DeepSDF can cover all model shapes, instead of needing one DeepSDF per model shape.
I cannot picture what settings 1 and 2 look like without more detail from the paper or code.
We need the author to explain settings 1 and 2.
@zakharos

@taeyeopl
Author

taeyeopl commented Oct 22, 2021

I think it is not desirable to compare [one z covering all model shapes (single class, car)] against [one DeepSDF per model shape] as an ablation study, because the original DeepSDF already covers all models of a single class (car). It could make sense if a single model covered multiple classes (car, bike, etc.). Moreover, a driving scenario can't adopt one DeepSDF per model shape; it would be challenging to build a model for every car.

Nevertheless, I would appreciate it if you could explain each setting in order to have a clear understanding of the ablation study.
@zakharos

@zakharos
Collaborator

zakharos commented Nov 5, 2021

Hi @taeyeop-lee! I apologize for the delay! Please find the answers to your questions below:

Ablation setup
The goal of the ablation is to demonstrate how different components of the pipeline affect the final downstream performance (detection). In particular, the 3 settings you are referring to demonstrate how different optimization variables - R (rotation), t (translation), s (scale), and z (latent shape code) - affect the end performance.

  • [setting 1] (R,t): in this setting we only optimize over R and t, and use the initial scale prediction s without optimizing it.
  • [setting 2] (R,t,s): here we optimize over rotation, translation, and scale.
  • [setting 3] (R,t,s,z): finally, in this case we additionally optimize over the z variable, which makes it possible to also change the shape of the model during optimization.

From the results in Table 3, we see that setting 3 gives the best overall performance.
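The three settings can be pictured as choosing which variables receive gradients during the optimization; a hypothetical helper (not the repo's code) sketching this:

```python
import torch

def make_opt_variables(R0, t0, s0, z0, setting):
    """Pick which variables are optimized per ablation setting:
    1 -> (R, t); 2 -> (R, t, s); 3 -> (R, t, s, z).
    R0, t0, s0, z0 are the initial predictions."""
    R = R0.clone().requires_grad_(True)
    t = t0.clone().requires_grad_(True)
    s = s0.clone().requires_grad_(setting >= 2)  # setting 1 keeps the initial scale fixed
    z = z0.clone().requires_grad_(setting >= 3)  # shape code only optimized in setting 3
    opt = torch.optim.Adam([v for v in (R, t, s, z) if v.requires_grad], lr=1e-2)
    return (R, t, s, z), opt
```

Frozen variables still enter the objective with their initial values; they just never move during the refinement.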

Frozen layers
@xmyqsh is absolutely right in describing how frozen layers are computed.

@xmyqsh

xmyqsh commented Nov 15, 2021

@zakharos
Actually, my doubt is about [setting 1]: what is the value of the initial scale s, and what is its range, [0, inf)? I don't think it can achieve such a good result without optimizing s.
@taeyeop-lee Good suggestion. A further experiment is needed to verify it.
