How to train my own datasets (format is like coco datasets) #54
Comments
@EDG-Zola You do not need to change this code.
Thanks for the great work! Just a related question: if I have 29 classes, should _C.MODEL.FCOS.NUM_CLASSES be set to 30?
@sunpeng981712364 If the 29 classes do not contain the background class, |
Hi, I used fcos_demo.py to visualize the results and they look right. But when I evaluate with tools/testnet.py using the COCO protocol, all the AP/AR values are close to zero. Do I need to change tools/testnet.py?
@sunpeng981712364 I am not sure what is wrong with your code. It might be helpful to debug your code line by line. |
@tianzhi0549 Thank you for your prompt reply (#^.^#) hehe
Can one or two 1080Ti GPUs be used for training?
@liuguanglyc I think you can, but maybe you need to use a smaller input size (e.g., 600px). |
Hi @tianzhi0549, I am trying to train on my own dataset with fcos_R_101_FPN_2x. However, I encountered the error RuntimeError: Error(s) in loading state_dict for GeneralizedRCNN: I also removed all the previous checkpoints from ~/.torch/models/. Could you please advise on the steps to retrain a model with a custom COCO-style dataset? Thank you so much! Training Command
fcos_R_101_FPN_2x.yaml
Should I follow the retraining instructions from maskrcnn-benchmark and trim the last layers? And should I also add the dataset entry in the __init__.py file?
@heng2j I don't think it is necessary if you have converted your datasets into the coco-style format. |
Hi @tianzhi0549, thank you for your quick response. And should I trim the last layers if I am retraining from the provided FCOS_R_101_FPN_2x.pth?
@heng2j You might need to do that if you want to fine-tune from coco pre-trained models. |
Thank you for confirming, @tianzhi0549! One more related question: I am doing a form of incremental learning that requires manual feature extraction. Do you have any suggestions on best practices for extracting features with FCOS? My target objects can be as small as 16x16 or less. Once again, thank you so much for your help and your great work!
Hi @tianzhi0549, for fine-tuning with the pretrained model FCOS_R_101_FPN_2x.pth, as you suggested I removed only the following 2 keys from the head: ['module.rpn.head.cls_logits.weight', 'module.rpn.head.cls_logits.bias']. However, training completed immediately after it started. Could you please advise on the proper way to retrain, so that we know how to better use FCOS for our own domain? loading annotations into memory...
@heng2j You also need to remove solver states in the checkpoint. |
Hi @tianzhi0549, would you mind pointing out how to remove the solver states from the checkpoint?
And which solver states should I pay attention to? And @sunpeng981712364, could you please also shed some light on how you did it?
@heng2j Do you use our provided pre-trained models? We have removed all solver states in them. |
Hi @tianzhi0549, yes, I'm using your provided pre-trained model FCOS_R_101_FPN_2x.pth, and I encountered the issue above. Would you mind taking a look at the full log in my previous comment, which includes all the parameters set up for the training? I'm also wondering which keys in the head I should remove from your checkpoints. I only removed ['module.rpn.head.cls_logits.weight', 'module.rpn.head.cls_logits.bias']. I would love to know how to properly train with your model.
@heng2j Please post your full log here.
Hi @tianzhi0549 , Here you go: [FCOS]$ python -m torch.distributed.launch --nproc_per_node=1 --master_port=$((RANDOM + 10000)) tools/train_net.py --skip-test --config-file configs/fcos/fcos_R_101_FPN_2x.yaml DATALOADER.NUM_WORKERS 2 OUTPUT_DIR training_dir/fcos_R_101_FPN_2x OS: CentOS Linux 7 (Core) Python version: 3.7 Nvidia driver version: 418.56 Versions of relevant libraries: |
@heng2j Sorry, it's our fault. We did not remove |
Hi @tianzhi0549, I was thinking of removing the iterations as well. Thank you for confirming, and thank you so much for your timely help! I will give it a try later today.
Hi @tianzhi0549 , thank you it works! And I am training the model now. |
@heng2j Happy to know this. |
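For readers following this exchange, the clean-up discussed above (dropping the class-specific `cls_logits` weights together with the solver state and the iteration counter) can be sketched roughly as follows. This is a minimal sketch, not the repository's own `remove_solver_states.py`: the function name `strip_for_finetuning` is hypothetical, and the top-level keys `model`, `optimizer`, `scheduler` and `iteration` are assumptions about the checkpoint layout that should be checked against the actual file.

```python
# Assumed top-level bookkeeping keys; inspect your own checkpoint to confirm them.
SOLVER_KEYS = ("optimizer", "scheduler", "iteration")


def strip_for_finetuning(checkpoint,
                         drop_suffixes=("cls_logits.weight", "cls_logits.bias")):
    """Return a copy without solver state and without class-specific head weights.

    `checkpoint` is a plain dict; `drop_suffixes` must be a tuple of name endings.
    The input dict is not modified.
    """
    # Drop optimizer/scheduler state and the stored iteration counter.
    cleaned = {k: v for k, v in checkpoint.items() if k not in SOLVER_KEYS}
    if "model" in cleaned:
        # Drop weights whose shape depends on the number of classes.
        cleaned["model"] = {
            name: weight
            for name, weight in cleaned["model"].items()
            if not name.endswith(drop_suffixes)
        }
    return cleaned


if __name__ == "__main__":
    # Toy stand-in for a checkpoint; real values would be tensors.
    toy = {
        "model": {
            "module.rpn.head.cls_logits.weight": "W",
            "module.rpn.head.cls_logits.bias": "b",
            "module.backbone.body.stem.conv1.weight": "W0",  # illustrative name
        },
        "optimizer": {},
        "scheduler": {},
        "iteration": 90000,
    }
    cleaned = strip_for_finetuning(toy)
    print(sorted(cleaned))           # ['model']
    print(sorted(cleaned["model"]))  # ['module.backbone.body.stem.conv1.weight']
```

With a real checkpoint, the dictionary would typically come from `torch.load(path, map_location="cpu")` and be written back with `torch.save(cleaned, new_path)`.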
Hi, v: i + 1 for i, v in enumerate(self.coco.getCatIds()) The error is in Step 2: pycocotools is not finding the categories in the provided annotation file. Has anyone else faced a similar problem? If so, what is the solution? Thank you.
@shahdate Can you try to reinstall coco? |
Hi @tianzhi0549, thank you for your reply. I completely deleted and reinstalled coco multiple times, but it is still not working.
@shahdate Are you sure you are using correct annotation json files of COCO? |
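One way to check whether an annotation file is the culprit is to look at its `categories` list directly and rebuild the same id mapping quoted above. The following is a rough sketch using only the standard library rather than pycocotools; the helper name `contiguous_category_map` is hypothetical, and sorting the ids is an assumption made here so the result is deterministic.

```python
import json


def contiguous_category_map(annotation):
    """Map each COCO category id to a 1-based contiguous label.

    Raises ValueError when the annotation dict has no categories, which is
    the situation where a category lookup would come back empty.
    """
    categories = annotation.get("categories") or []
    if not categories:
        raise ValueError("annotation file has no 'categories' entries")
    cat_ids = sorted(cat["id"] for cat in categories)
    # Same shape as the expression quoted in the question above.
    return {v: i + 1 for i, v in enumerate(cat_ids)}


if __name__ == "__main__":
    # Hypothetical two-class annotation with non-contiguous ids.
    text = (
        '{"images": [], "annotations": [], '
        '"categories": [{"id": 3, "name": "car"}, {"id": 7, "name": "bus"}]}'
    )
    print(contiguous_category_map(json.loads(text)))  # {3: 1, 7: 2}
```

For a file on disk, the dict would come from `json.load(open(path))`; an empty or missing `categories` list points to a problem in the conversion step rather than in the training code.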
Hi, |
@shahdate Happy to know that. |
Hi |
Hello! Would you mind telling me where to add my dataset in step 1? I cannot find the right place to add my dataset in defaults.py. Thank you very much! In order to train FCOS on your own dataset, you need to,
@hello-piger I have edited it. Please check it again. |
thank you for your quick response. |
Hello, the model is very good!
@sunpeng981712364 Have you finished training? Mine trains, but at inference time I get no results.
May I ask what kind of annotations you use for training? Should we include "segmentation" in the COCO labels for training?
Why should we use _coco_style as the suffix of our own dataset names? Are there any particular requirements?
Hey, I found this file ( FCOS/maskrcnn_benchmark/config/defaults.py Line 284 in ff8376b
If I change it, there is a dimension error: size mismatch for rpn.head.cls_logits.weight: copying a param with shape torch.Size([80, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 256, 3, 3]).
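A size-mismatch error like the one quoted here names one offending key at a time; it can help to list all such keys up front by comparing shapes. Below is a small sketch of that comparison on plain shape tuples. The function name `mismatched_keys` is hypothetical, and the second entry in the demo is made up for illustration.

```python
def mismatched_keys(checkpoint_shapes, model_shapes):
    """Return the names present in both mappings whose shapes differ."""
    return sorted(
        name
        for name, shape in checkpoint_shapes.items()
        if name in model_shapes and tuple(shape) != tuple(model_shapes[name])
    )


if __name__ == "__main__":
    # Shapes taken from the error message above, plus one illustrative entry.
    ckpt = {
        "rpn.head.cls_logits.weight": (80, 256, 3, 3),
        "rpn.head.bbox_pred.weight": (4, 256, 3, 3),
    }
    model = {
        "rpn.head.cls_logits.weight": (1, 256, 3, 3),
        "rpn.head.bbox_pred.weight": (4, 256, 3, 3),
    }
    print(mismatched_keys(ckpt, model))  # ['rpn.head.cls_logits.weight']
```

With real tensors, each mapping could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`; the keys returned are the candidates to drop from the checkpoint before fine-tuning.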
@sunpeng981712364 Hey, have you figured out the 0 AP problem?
Well, I don't think those 3 modifications alone are enough to train on custom datasets.
Hi, I'm trying to train on a custom dataset with 4 classes, starting from the pre-trained model downloaded from this repository. I ran the remove-solver-states script on the downloaded .pth file and am using the result in the .yaml. However, I keep getting the error below. Please tell me which step I'm missing. Thanks! 2020-09-17 06:17:38,972 fcos_core.utils.checkpoint INFO: Loading checkpoint from pretrained_models/FCOS_syncbn_bs32_c128_MNV2_FPN_1x_wo_solver_states.pth
@sathyamsn Please remove the weights FCOS/tools/remove_solver_states.py Line 20 in 9a01528
Thanks for the quick response. I added the removals below to the remove-solver-states code. ################################################# But this time the entire training was skipped, evaluation started directly, and mAP=0. Please help. Thanks. 2020-09-17 06:51:04,827 fcos_core.trainer INFO: Start training
I just noticed that, unlike in TensorFlow, the configured step count must be higher than the step the pre-trained model was trained to. My pre-trained model was trained to 90K, so when I set 100000, training started. Thanks. However, mAP is very low. On analysis, the detected bboxes are much smaller than the actual ground-truth bboxes. Any suggestions?
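The behaviour described here (training skipped until the step setting exceeded 90K) is consistent with a trainer that resumes from the iteration stored in the checkpoint and stops at the configured maximum. A tiny worked illustration of that arithmetic, under that assumption and with a hypothetical helper name:

```python
def iterations_to_run(start_iter, max_iter):
    """Number of training iterations left when resuming at start_iter."""
    return max(0, max_iter - start_iter)


if __name__ == "__main__":
    # Checkpoint trained to 90K, schedule also ends at 90K: nothing left to do.
    print(iterations_to_run(90000, 90000))   # 0
    # Raising the limit to 100000 leaves 10000 iterations.
    print(iterations_to_run(90000, 100000))  # 10000
```

Removing the stored iteration counter from the checkpoint, as discussed earlier in the thread, corresponds to `start_iter = 0`, in which case the full schedule runs.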
Can you tell me in detail how to remove the weights?
@autumnfairytale7 As mentioned in the previous comment by @tianzhi0549, run FCOS/tools/remove_solver_states.py on your pre-trained model and remove the weights named in your error message.
@tianzhi0549 Hi, I have 5 classes including background, but my AP values are all 1.0. What factors might cause this problem? Thanks.
I have now converted my dataset to COCO format, and I want to train on it using FCOS. I consulted GETTING_STARTED.md in the mmdetection repo, which has a tutorial on training with custom datasets. But in the FCOS repo, the file FCOS/maskrcnn_benchmark/data/datasets/coco.py differs from /mmdetection/mmdet/datasets/coco.py. Are there any suggestions?