Questions on reproducing the reported results on MS COCO #30
We honestly haven't encountered any case where ASL has not easily outperformed cross entropy. Here are some training tricks we used (they are quite standard and can also be found in public repositories like this one); see if something resonates differently with your framework:
That's what I can think of off the top of my head.
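For reference, here is a minimal sketch of the kind of standard input pipeline such public repositories use (torchvision; an illustrative assumption, not the authors' exact recipe):

```python
import torchvision.transforms as T

# Hypothetical "standard tricks" training transforms for multi-label COCO;
# squish resize and AutoAugment are the ones discussed in this thread.
train_transform = T.Compose([
    T.Resize((448, 448)),     # squish resize: keep every object in view
    T.RandomHorizontalFlip(),
    T.AutoAugment(),          # torchvision's AutoAugment (ImageNet policy)
    T.ToTensor(),
])
```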
Thank you for the information. Since I tried the cross entropy loss and the proposed ASL with the same training configurations, with only standard data augmentations, I would expect ASL to have produced at least slightly better results. As for these training tricks, I do believe they have the potential to improve performance (for both cross entropy and ASL). But if I apply these tricks to ASL training, I also need to apply them to traditional cross entropy training to verify that the improvement comes from ASL (rather than from the tricks). So, I am wondering whether the experiments (for the same backbone) that produced the results in Fig. 8 used all of these tricks (e.g. EMA, AutoAugment)? Also, to my understanding, both cross entropy and ASL in Fig. 8 are initialized from the corresponding models (ResNet/TResNet) pretrained on ImageNet, right? Besides, could you please also specify some more details for the following hyperparameters, so that I do not have to try them all:
Thank you!
I think that our approaches ("philosophies" :) ) to deep learning are a bit different. "Training tricks" is a bit of an underwhelming name for the most important thing in deep learning. They are not extra methods you should choose whether to use or not; they are the essence, the bread-and-butter. I would be very proud if someone categorized ASL as a good training trick. Training without proper augmentations, for example, is unacceptable in my opinion: the training would quickly overfit, no matter what your loss function is. EMA is also essential in any modern deep learning scheme. It usually outscores the regular model by ~1%, and generalizes better. My answers to your questions:
All of the training tricks I mentioned are used in this repository. All the best.
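For concreteness, a minimal sketch of the weight-EMA idea mentioned above (plain PyTorch; the decay value is an illustrative assumption, not the authors' setting):

```python
import copy
import torch

class ModelEma:
    """Keeps an exponential moving average of model weights (sketch only).

    decay=0.999 is an illustrative value, not the authors' setting.
    """
    def __init__(self, model, decay=0.999):
        self.ema = copy.deepcopy(model).eval()
        self.decay = decay
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # Blend each EMA tensor toward the current model tensor; skip
        # non-float buffers such as BatchNorm's num_batches_tracked.
        for ema_v, v in zip(self.ema.state_dict().values(),
                            model.state_dict().values()):
            if ema_v.dtype.is_floating_point:
                ema_v.mul_(self.decay).add_(v, alpha=1.0 - self.decay)
```

Typical usage is to call `ema.update(model)` after every optimizer step and to evaluate (and report) the `ema.ema` copy.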
Hello @shuijundadoudou, can you share the training setup that got ~82.5 mAP with the cross-entropy loss? For example, the optimizer, learning rate, augmentations, and any training tricks?
Hi @mrT23, thank you for the details. When you say "squish resizing", do you mean unlocking the aspect ratio during resizing so that objects are stretched? What is the reason behind that? Is it that crop resizing could potentially crop some objects out of the image? Since the COCO dataset has positional information (bounding boxes, masks), does it make sense to use that information to mark certain objects as negative when they are cropped out?
Read more about squish vs. crop resizing here: crop resizing basically works only on ImageNet, because the vast majority of objects there are zoomed-in and centered.
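To make the distinction concrete, a small torchvision sketch (the target size is illustrative):

```python
import torchvision.transforms as T

# "Squish" resizing: force the image to the target size. The aspect ratio
# changes (objects get stretched), but every object stays in view.
squish_resize = T.Resize((448, 448))

# "Crop" resizing: resize a (random) sub-region. Objects near the borders
# can be cut out entirely, which silently makes multi-label targets wrong.
crop_resize = T.RandomResizedCrop(448)
```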
Thank you @mrT23 for the elaboration. Really helpful.
This would be helpful. By the way, I think @shuijundadoudou used the asymmetric loss (ASL), though.
I agree. We cannot share our training code as-is due to commercial limitations, but once public code is shared, we can try to help improve it and get results similar to those in the article.
I also want to reproduce the results on MS COCO, but due to my limited GPU resources and limited time, I resize the images to 224×224. I use no training tricks and keep the learning rate constant at 1e-4. At first I thought that even if I couldn't reach 86.6 mAP, I could get at least 70 mAP, which would be enough for me. But after about 100 iterations in epoch 1, the loss decreases from 230 to 90 and then stops decreasing, and the validation mAP is 6, which is very low. I wonder whether it's because I didn't implement the tricks, or because I should not resize the images to 224, or simply because my code is wrong. Here is my training code (I haven't changed other files like tresnet.py); I am a deep learning beginner and would be very grateful if you could point out my problems, as this has puzzled me for quite a long time.

```python
# -*- coding: utf-8 -*-
import argparse

parser = argparse.ArgumentParser(description='PyTorch ImageNet Training')

def main():
    ...  # the body of the training loop was elided in the original post

if __name__ == '__main__':
    main()
```
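For anyone debugging a similar setup, here is a minimal sketch of one reasonable training configuration (Adam with a OneCycle schedule instead of a constant learning rate). The values and the `model`/`train_loader` placeholders are illustrative assumptions, not the authors' exact recipe:

```python
import torch
from src.loss_functions.losses import AsymmetricLoss  # this repository's loss

# Assumptions: `model` outputs raw logits of shape (batch, 80) and
# `train_loader` yields (images, multi-hot float targets).
epochs = 40  # illustrative
criterion = AsymmetricLoss(gamma_neg=4, gamma_pos=1, clip=0.05)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=2e-4, epochs=epochs, steps_per_epoch=len(train_loader))

for epoch in range(epochs):
    for images, targets in train_loader:
        loss = criterion(model(images), targets)  # ASL applies sigmoid internally
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()
```

A constant 1e-4 learning rate with no schedule and no augmentation can easily stall the way described above; a warmup-plus-decay schedule and the augmentations discussed in this thread are usually the first things to try.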
@GhostWnd Hello, have you found the cause of your problem? I am running into the same issue. Thanks!
Many thanks for your details, which really help me a lot!
Hi,
First, thank you for sharing this exciting work.
I was trying to reproduce the results on the MS COCO dataset with my own training framework. First, I used the cross entropy loss
loss_function = AsymmetricLoss(gamma_neg=0, gamma_pos=0, clip=0)
to establish the baseline. The result (with a ResNet101 backbone) was ~82.5% mAP, quite similar to the result reported in Fig. 8 of the paper. Then, I replaced the loss function with
loss_function = AsymmetricLoss(gamma_neg=4, gamma_pos=1, clip=0.05)
(all other hyperparameters were kept the same). However, I only got ~82.1% mAP. Also, the traditional focal loss
loss_function = AsymmetricLoss(gamma_neg=2, gamma_pos=2, clip=0)
cannot outperform the baseline (~82.5%) under the same configuration. I am curious about what is wrong with my training process. Could you also please share some training tricks? For example, a snippet of code for adjusting the learning rate, training transforms similar to those used for validation here, etc. Or do you have any suggestions?
Thank you.
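For readers mapping the three configurations above onto the paper: with p = sigmoid(logit) and a shifted probability p_m = max(p - clip, 0) for the negatives, a condensed sketch of the ASL formulation (my own reading of the paper, not the repository's exact code) is:

```python
import torch

def asymmetric_loss(logits, targets, gamma_neg=4, gamma_pos=1, clip=0.05, eps=1e-8):
    """Condensed sketch of the paper's ASL formulation.

    gamma_neg = gamma_pos = 0, clip = 0       -> plain BCE (the baseline above)
    gamma_neg = gamma_pos = 2, clip = 0       -> symmetric focal loss
    gamma_neg = 4, gamma_pos = 1, clip = 0.05 -> ASL
    """
    p = torch.sigmoid(logits)
    p_m = (p - clip).clamp(min=0)  # probability shifting for negatives
    loss_pos = targets * (1 - p).pow(gamma_pos) * torch.log(p.clamp(min=eps))
    loss_neg = (1 - targets) * p_m.pow(gamma_neg) * torch.log((1 - p_m).clamp(min=eps))
    return -(loss_pos + loss_neg).sum()
```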