
kitti-pretrain loss and acc problem #8

Open · XuekuanWang opened this issue Nov 10, 2022 · 8 comments

No description provided.

XuekuanWang (Author) commented Nov 10, 2022

Hello,
We have trained the SimIPU KITTI model and found that the intro loss is lower than the cross loss, but the intro top-1 accuracy is also lower.

2022-11-10 14:44:13,589 - mmdet - INFO - Epoch [122][30/58] lr: 2.964e-04, eta: 1:24:15, time: 2.173, data_time: 0.245, memory: 59185, cross_acc_top1: 71.7124, cross_acc_top5: 93.2814, cross_loss: 6.9118, intro_loss: 2.8096, intro_acc_top1: 36.9674, intro_acc_top5: 73.1265, loss: 9.7215
2022-11-10 14:46:14,595 - mmdet - INFO - Epoch [123][30/58] lr: 2.934e-04, eta: 1:23:11, time: 2.193, data_time: 0.189, memory: 59185, cross_acc_top1: 71.1550, cross_acc_top5: 92.9795, cross_loss: 6.9552, intro_loss: 2.8046, intro_acc_top1: 36.8098, intro_acc_top5: 72.9778, loss: 9.7598

I would expect the intro branch to be easier to learn, so its accuracy should be higher. Is this result correct?

XuekuanWang changed the title intro → kitti-pretrain loss and acc problem on Nov 10, 2022
zhyever (Owner) commented Nov 10, 2022

Maybe not. Remember that we need to adopt a matching algorithm to get positive pairs in the intro branch, and there can be mistakes in the matching. Also, the positive pairs do not sit at exactly the same spatial position in 3D space. These things also make intro-learning more difficult.

However, even with all these problems, the intro-branch features live in the same representation space (i.e., both are extracted by PointNet in this work), so their similarity is higher than that between image features and LiDAR features (cross-branch). Hence, the loss can be small (features are more similar) while the accuracy is lower (positives are hard to distinguish from negatives).

These are intuitive arguments. To go a step further, I guess one would need to study exactly when the contrastive loss becomes lower.
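For reference, here is a minimal sketch of how an InfoNCE-style loss and top-1 accuracy are typically computed. This is a generic illustration, not necessarily the exact implementation in this repo; it only shows that the reported loss and accuracy come from the same similarity logits but measure different things.

```python
import torch
import torch.nn.functional as F

def info_nce_with_acc(query, key, temperature=0.07):
    """Generic InfoNCE loss with top-1 accuracy; positives are matched by row index."""
    q = F.normalize(query, dim=1)
    k = F.normalize(key, dim=1)
    logits = q @ k.t() / temperature                       # (N, N) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)      # i-th query matches i-th key
    loss = F.cross_entropy(logits, labels)                 # low when positives score relatively high
    acc1 = (logits.argmax(dim=1) == labels).float().mean() # strict: positive must rank first
    return loss, acc1

# toy usage with random features
q, k = torch.randn(8, 128), torch.randn(8, 128)
loss, acc1 = info_nce_with_acc(q, k)
```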

XuekuanWang (Author) commented Nov 10, 2022

Thanks.

I also tried to reproduce the experimental results of the paper, but failed.

3D-AP

| Setting | Easy | Mod | Hard |
| -- | -- | -- | -- |
| paper | 81.32% | 70.88% | 66.19% |
| paper, w/o pretrain | 79.17% | 68.58% | 64.81% |
| 100 epochs (pretrain) + 100 epochs (downstream task) | 79.49% | 68.54% | 64.23% |
| w/o pretrain, 40 epochs | 77.59% | 67.57% | 61.78% |
| w/o pretrain, 100 epochs | 78.60% | 68.75% | 64.66% |

What could be the reason?

1) Is my kitti-pretrain model correct? Does this log look right?

2022-11-09 03:34:19,568 - mmdet - INFO - Epoch [100][90/116] lr: 3.697e-04, eta: 0:00:20, time: 0.999, data_time: 0.033, memory: 29223, cross_acc_top1: 75.8513, cross_acc_top5: 95.9727, cross_loss: 6.0896, intro_loss: 2.7013, intro_acc_top1: 37.7396, intro_acc_top5: 73.4069, loss: 8.7908
2022-11-09 03:34:45,791 - mmdet - INFO - Saving checkpoint at 100 epochs

2) I also tried to use the released kitti-pretrain model, but it fails to load: many keys do not match.

Can you help me reproduce the experimental results of the paper? Thanks.

zhyever (Owner) commented Nov 10, 2022

Please refer to the 3D detection log presented here. The model performance in the last several epochs is consistently better than the baseline w/o pre-training. No other tricks were adopted in our experiments, and the log corresponds exactly to the experiment reported in our paper.

For the pre-trained model, please let me know which keys are missing. I am not sure whether I uploaded the wrong models.

XuekuanWang (Author) commented Nov 10, 2022

OK, thanks.
This is my log when loading the pre-trained model SimIPU_kitti_50e.pth.
There is a warning: "unexpected key in source state_dict".

'please set runner in your config.', UserWarning)
2022-11-10 17:54:12,058 - mmdet - INFO - load checkpoint from local path: /root/paddlejob/workspace/env_run/kuan/exp/simipu/SimIPU_kitti_50e.pth
2022-11-10 17:54:12,135 - mmdet - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: backbone.conv1.weight, backbone.bn1.weight, backbone.bn1.bias, backbone.bn1.running_mean, backbone.bn1.running_var, backbone.bn1.num_batches_tracked, backbone.layer1.0.conv1.weight, backbone.layer1.0.bn1.weight, backbone.layer1.0.bn1.bias, backbone.layer1.0.bn1.running_mean, backbone.layer1.0.bn1.running_var, backbone.layer1.0.bn1.num_batches_tracked, backbone.layer1.0.conv2.weight, backbone.layer1.0.bn2.weight, backbone.layer1.0.bn2.bias, backbone.layer1.0.bn2.running_mean, backbone.layer1.0.bn2.running_var, backbone.layer1.0.bn2.num_batches_tracked, backbone.layer1.0.conv3.weight, backbone.layer1.0.bn3.weight, backbone.layer1.0.bn3.bias, backbone.layer1.0.bn3.running_mean, backbone.layer1.0.bn3.running_var, backbone.layer1.0.bn3.num_batches_tracked, backbone.layer1.0.downsample.0.weight, backbone.layer1.0.downsample.1.weight, backbone.layer1.0.downsample.1.bias, backbone.layer1.0.downsample.1.running_mean, backbone.layer1.0.downsample.1.running_var, backbone.layer1.0.downsample.1.num_batches_tracked, backbone.layer1.1.conv1.weight, backbone.layer1.1.bn1.weight, backbone.layer1.1.bn1.bias, backbone.layer1.1.bn1.running_mean, backbone.layer1.1.bn1.running_var, backbone.layer1.1.bn1.num_batches_tracked, backbone.layer1.1.conv2.weight, backbone.layer1.1.bn2.weight, backbone.layer1.1.bn2.bias, backbone.layer1.1.bn2.running_mean, backbone.layer1.1.bn2.running_var, backbone.layer1.1.bn2.num_batches_tracked, backbone.layer1.1.conv3.weight, backbone.layer1.1.bn3.weight, backbone.layer1.1.bn3.bias, backbone.layer1.1.bn3.running_mean, backbone.layer1.1.bn3.running_var, backbone.layer1.1.bn3.num_batches_tracked, backbone.layer1.2.conv1.weight, backbone.layer1.2.bn1.weight, backbone.layer1.2.bn1.bias, backbone.layer1.2.bn1.running_mean, backbone.layer1.2.bn1.running_var, backbone.layer1.2.bn1.num_batches_tracked, backbone.layer1.2.conv2.weight, backbone.layer1.2.bn2.weight, backbone.layer1.2.bn2.bias, backbone.layer1.2.bn2.running_mean, backbone.layer1.2.bn2.running_var, backbone.layer1.2.bn2.num_batches_tracked, backbone.layer1.2.conv3.weight, backbone.layer1.2.bn3.weight, backbone.layer1.2.bn3.bias, backbone.layer1.2.bn3.running_mean, backbone.layer1.2.bn3.running_var, backbone.layer1.2.bn3.num_batches_tracked, backbone.layer2.0.conv1.weight, backbone.layer2.0.bn1.weight, backbone.layer2.0.bn1.bias, backbone.layer2.0.bn1.running_mean, backbone.layer2.0.bn1.running_var, backbone.layer2.0.bn1.num_batches_tracked, backbone.layer2.0.conv2.weight, backbone.layer2.0.bn2.weight, backbone.layer2.0.bn2.bias, backbone.layer2.0.bn2.running_mean, backbone.layer2.0.bn2.running_var, backbone.layer2.0.bn2.num_batches_tracked, backbone.layer2.0.conv3.weight, backbone.layer2.0.bn3.weight, backbone.layer2.0.bn3.bias, backbone.layer2.0.bn3.running_mean, backbone.layer2.0.bn3.running_var, backbone.layer2.0.bn3.num_batches_tracked, backbone.layer2.0.downsample.0.weight, backbone.layer2.0.downsample.1.weight, backbone.layer2.0.downsample.1.bias, backbone.layer2.0.downsample.1.running_mean, backbone.layer2.0.downsample.1.running_var, backbone.layer2.0.downsample.1.num_batches_tracked, backbone.layer2.1.conv1.weight, backbone.layer2.1.bn1.weight, backbone.layer2.1.bn1.bias, backbone.layer2.1.bn1.running_mean, backbone.layer2.1.bn1.running_var, backbone.layer2.1.bn1.num_batches_tracked, backbone.layer2.1.conv2.weight, backbone.layer2.1.bn2.weight, backbone.l
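One quick way to see what is in the checkpoint is to list the top-level key prefixes and compare them with the names the detector expects (e.g. `img_backbone`). A minimal sketch, assuming a standard mmcv-style checkpoint where the weights are nested under `state_dict`:

```python
import torch
from collections import Counter

ckpt = torch.load("SimIPU_kitti_50e.pth", map_location="cpu")
sd = ckpt.get("state_dict", ckpt)  # mmcv checkpoints usually nest weights under "state_dict"

# Count the top-level prefixes, e.g. "backbone", "img_backbone", ...
prefixes = Counter(key.split(".")[0] for key in sd.keys())
for name, count in prefixes.most_common():
    print(f"{name}: {count} tensors")
```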

zhyever (Owner) commented Nov 10, 2022

Could you please try other SimIPU pre-trained models, so that I can check whether I uploaded the wrong model?

XuekuanWang (Author) commented

OK, I will try another pre-trained model.
Below are the parameters of moca_r50_kitti. The keys that do not match are "backbone.conv1.weight, backbone.bn1.weight, ...", right?

```
(img_backbone): ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): ResLayer(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
```

zhyever (Owner) commented Nov 29, 2022

:D Hi, I would like to ask if there is any update.

Remember that during pre-training there are actually two encoders (image and point cloud). So when loading the parameters into the downstream model, the point-cloud part can be mismatched while the image-encoder parameters are loaded successfully.

If there is a bug, you could rename the keys in the parameter dict provided in this repo. For example, you may change `backbone` to `img_backbone`, as in the sketch below.
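A minimal sketch of such a rename, assuming an mmcv-style checkpoint with the weights nested under `state_dict` (the output filename is just a placeholder):

```python
import torch

ckpt = torch.load("SimIPU_kitti_50e.pth", map_location="cpu")
sd = ckpt.get("state_dict", ckpt)  # mmcv checkpoints usually nest weights under "state_dict"

# Rename the pre-training image-encoder keys ("backbone.*") to the detector's
# naming ("img_backbone.*"); leave all other keys (e.g. the point-cloud branch)
# unchanged, so the loader will simply warn about any that remain unused.
new_sd = {}
for key, value in sd.items():
    if key.startswith("backbone."):
        new_sd["img_backbone." + key[len("backbone."):]] = value
    else:
        new_sd[key] = value

torch.save({"state_dict": new_sd}, "SimIPU_kitti_50e_img_backbone.pth")
```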
