How to train my own datasets (format is like coco datasets) #54

Xavier-Zeng · 2019-05-30T14:06:49Z

I have converted my dataset to COCO format, and I want to train FCOS on it. I consulted GETTING_STARTED.md in the mmdetection repo, which has a tutorial on training with your own datasets. But in the FCOS repo, the file FCOS/maskrcnn_benchmark/data/datasets/coco.py is different from /mmdetection/mmdet/datasets/coco.py. Are there any suggestions?

tianzhi0549 · 2019-05-31T03:06:19Z

@EDG-Zola You do not need to change this code.
To train FCOS on your own dataset, you need to:

1. Add your dataset to FCOS/fcos_core/config/paths_catalog.py, at the "coco_2017_train": { entry (line 10 in efb76e4). Please use _coco_style as the suffix of your dataset names.
2. In https://github.com/tianzhi0549/FCOS/blob/master/configs/fcos/fcos_R_50_FPN_1x.yaml, change DATASETS to your own ones.
3. Modify MODEL.FCOS.NUM_CLASSES in FCOS/maskrcnn_benchmark/config/defaults.py (line 284 in ff8376b: _C.MODEL.FCOS.NUM_CLASSES = 81 # the number of classes including background) if your dataset has a different number of classes.
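As a rough illustration of step 1, the sketch below mimics the shape of a catalog entry in plain Python. It is not the repository's code: the directory paths and the `lookup` helper are hypothetical, and the idea that the catalog selects the COCO loader by checking for "coco" in the dataset name is an assumption based on upstream maskrcnn-benchmark, so check paths_catalog.py in your own checkout.

```python
import os

# Hypothetical stand-in for the DATASETS dict in paths_catalog.py.
# All paths below are made up for illustration.
DATA_DIR = "datasets"
DATASETS = {
    "my_data_train_coco_style": {
        "img_dir": "my_data/train",
        "ann_file": "my_data/annotations/train.json",
    },
    "my_data_test_coco_style": {
        "img_dir": "my_data/test",
        "ann_file": "my_data/annotations/test.json",
    },
}


def lookup(name):
    """Resolve a dataset name to its image root and annotation file."""
    # Assumption: names are dispatched on the substring "coco",
    # which would explain the suggested _coco_style suffix.
    if "coco" not in name:
        raise RuntimeError("Dataset not available: {}".format(name))
    attrs = DATASETS[name]
    return {
        "root": os.path.join(DATA_DIR, attrs["img_dir"]),
        "ann_file": os.path.join(DATA_DIR, attrs["ann_file"]),
    }
```

The names registered here are the ones that would then go into DATASETS.TRAIN and DATASETS.TEST in the YAML config (step 2).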

sunpeng981712364 · 2019-06-03T09:06:49Z

Thanks for the great work! Just a related question: if I have 29 classes, should _C.MODEL.FCOS.NUM_CLASSES be set to 30?

tianzhi0549 · 2019-06-03T10:42:46Z

@sunpeng981712364 If the 29 classes do not contain the background class, NUM_CLASSES should be set as 30.
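A minimal sketch of that arithmetic, assuming a COCO-style annotation dict whose "categories" list holds only foreground classes. The helper name `fcos_num_classes` is hypothetical and not part of the FCOS code base.

```python
def fcos_num_classes(coco_annotations):
    """Return the number of foreground categories plus one for background."""
    category_ids = {cat["id"] for cat in coco_annotations["categories"]}
    return len(category_ids) + 1


# Toy annotation dict with 29 foreground categories (ids 1..29).
annotations = {
    "categories": [{"id": i, "name": "class_%d" % i} for i in range(1, 30)]
}
print(fcos_num_classes(annotations))  # 29 foreground classes + 1 background = 30
```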

sunpeng981712364 · 2019-06-04T05:00:57Z

Hi, I used fcos_demo.py to visualize the results and they look right. But when I run prediction with tools/test_net.py under the COCO protocol, all the AP/AR values are close to zero. Do I need to change tools/test_net.py?
@tianzhi0549
Should the following code be added?
top_predictions = self.select_top_predictions(predictions)

tianzhi0549 · 2019-06-04T05:14:59Z

@sunpeng981712364 I am not sure what is wrong with your code. It might be helpful to debug your code line by line.

sunpeng981712364 · 2019-06-04T05:17:25Z

@tianzhi0549 Thank you for your timely reply (#^.^#) hehe

liuguanglyc · 2019-06-18T03:42:32Z

Can one or two 1080Ti GPUs be used for training?

tianzhi0549 · 2019-06-18T04:33:13Z

@liuguanglyc I think you can, but maybe you need to use a smaller input size (e.g., 600px).

heng2j · 2019-07-31T05:28:02Z

Hi @tianzhi0549, I am trying to train on my own dataset with fcos_R_101_FPN_2x.

However, I encountered the following error:

RuntimeError: Error(s) in loading state_dict for GeneralizedRCNN:
size mismatch for rpn.head.cls_logits.weight: copying a param with shape torch.Size([80, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 256, 3, 3]).
size mismatch for rpn.head.cls_logits.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([11]).

I also removed all the previous checkpoints from ~/.torch/models/.

Would you please advise on the steps to retrain a model with my own COCO-style dataset? Thank you so much!

Training Command

python -m torch.distributed.launch \
    --nproc_per_node=3 \
    --master_port=$((RANDOM + 10000)) \
    tools/train_net.py \
    --skip-test \
    --config-file configs/fcos/fcos_R_101_FPN_2x.yaml \
    DATALOADER.NUM_WORKERS 2 \
    OUTPUT_DIR training_dir/fcos_R_101_FPN_2x

fcos_R_101_FPN_2x.yaml

MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "../FCOS/FCOS_R_101_FPN_2x.pth"
  RPN_ONLY: True
  FCOS_ON: True
  BACKBONE:
    CONV_BODY: "R-101-FPN-RETINANET"
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
  RETINANET:
    USE_C5: False # FCOS uses P5 instead of C5
DATASETS:
  TRAIN: ("my_data_train_coco_style", "my_data_val_coco_style")
  TEST: ("my_data_test_coco_style",)
INPUT:
  MIN_SIZE_RANGE_TRAIN: (640, 800)
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MAX_SIZE_TEST: 1333
DATALOADER:
  SIZE_DIVISIBILITY: 32
SOLVER:
  BASE_LR: 0.01
  WEIGHT_DECAY: 0.0001
  STEPS: (120000, 160000)
  MAX_ITER: 180000
  IMS_PER_BATCH: 4
  WARMUP_METHOD: "constant"

heng2j · 2019-07-31T05:43:21Z

Should I follow the retraining instructions from maskrcnn-benchmark and trim the last layers? And should I also add the dataset statement in the __init__.py file?

tianzhi0549 · 2019-07-31T05:58:57Z

@heng2j I don't think it is necessary if you have converted your datasets into the coco-style format.

heng2j · 2019-07-31T11:08:36Z

Hi @tianzhi0549, thank you for your quick response. What about trimming the last layers if I am retraining with the given FCOS_R_101_FPN_2x.pth?

tianzhi0549 · 2019-07-31T11:41:15Z

@heng2j You might need to do that if you want to fine-tune from coco pre-trained models.
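One way to picture the trimming is the sketch below. It assumes the checkpoint is a dict with a "model" entry that maps parameter names to tensors, and that the class-dependent parameters are the rpn.head.cls_logits weight and bias named in the size-mismatch error above. The helper `trim_cls_logits` is hypothetical. Plain values stand in for tensors so the example runs without PyTorch; in practice the dict would typically come from torch.load and be written back with torch.save.

```python
# Parameter-name suffixes whose shapes depend on the number of classes.
CLS_SUFFIXES = ("rpn.head.cls_logits.weight", "rpn.head.cls_logits.bias")


def trim_cls_logits(checkpoint):
    """Return a copy of the checkpoint without the classification logits.

    Matching on suffixes handles keys with or without a "module." prefix.
    """
    model_state = checkpoint["model"]
    trimmed = {
        name: value
        for name, value in model_state.items()
        if not name.endswith(CLS_SUFFIXES)
    }
    removed = sorted(set(model_state) - set(trimmed))
    new_checkpoint = dict(checkpoint)
    new_checkpoint["model"] = trimmed
    return new_checkpoint, removed


# Toy checkpoint; the values are placeholders, not real weights.
checkpoint = {
    "model": {
        "module.rpn.head.cls_logits.weight": "w",
        "module.rpn.head.cls_logits.bias": "b",
        "module.rpn.head.bbox_pred.weight": "r",
    }
}
trimmed_checkpoint, removed_keys = trim_cls_logits(checkpoint)
print(removed_keys)
```

The remaining layers keep their pre-trained values, while the removed ones are left to be initialized for the new number of classes.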

heng2j · 2019-07-31T11:48:44Z

Thank you for your confirmation @tianzhi0549! One more related question: I am performing a kind of incremental learning that requires manual feature extraction. Do you have any suggestions on best practices for extracting features with FCOS? My target objects can be as small as 16x16 or less. Once again, thank you so much for your help and your great work!

heng2j · 2019-08-01T02:28:29Z

Hi @tianzhi0549, for fine-tuning with the pretrained model FCOS_R_101_FPN_2x.pth, as you suggested, I removed only the following 2 keys from the head.

['module.rpn.head.cls_logits.weight', 'module.rpn.head.cls_logits.bias']

However, training completed immediately after it started. Would you please advise on the proper way to retrain, so that we know how to better use FCOS for our own domain?

loading annotations into memory...
Done (t=0.14s)
creating index...
index created!
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
2019-07-31 21:59:50,050 maskrcnn_benchmark.trainer INFO: Start training
2019-07-31 21:59:50,285 maskrcnn_benchmark.trainer INFO: Total training time: 0:00:00.234026 (0.0000 s / it)

Full log:

[FCOS]$ python -m torch.distributed.launch     --nproc_per_node=1     --master_port=$((RANDOM + 10000))     tools/train_net.py     --skip-test     --config-file configs/fcos/fcos_R_101_FPN_2x.yaml     DATALOADER.NUM_WORKERS 2     OUTPUT_DIR training_dir/fcos_R_101_FPN_2x
2019-07-31 21:59:34,252 maskrcnn_benchmark INFO: Using 1 GPUs
2019-07-31 21:59:34,252 maskrcnn_benchmark INFO: Namespace(config_file='configs/fcos/fcos_R_101_FPN_2x.yaml', distributed=False, local_rank=0, opts=['DATALOADER.NUM_WORKERS', '2', 'OUTPUT_DIR', 'training_dir/fcos_R_101_FPN_2x'], skip_test=True)
2019-07-31 21:59:34,252 maskrcnn_benchmark INFO: Collecting env info (might take some time)
2019-07-31 21:59:43,826 maskrcnn_benchmark INFO: 
PyTorch version: 1.0.0
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: CentOS Linux 7 (Core)
GCC version: (GCC) 4.9.1
CMake version: Could not collect

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti


Nvidia driver version: 418.56
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] Could not collect
[conda] cuda92                    1.0                           0    pytorch
[conda] pytorch                   1.0.0           py3.7_cuda9.0.176_cudnn7.4.1_1    pytorch
[conda] torchvision               0.2.1                      py_2    pytorch
        Pillow (6.1.0)
2019-07-31 21:59:43,826 maskrcnn_benchmark INFO: Loaded configuration file configs/fcos/fcos_R_101_FPN_2x.yaml
2019-07-31 21:59:43,827 maskrcnn_benchmark INFO: 
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "../FCOS/FCOS_R_101_FPN_2x.pth"
  RPN_ONLY: True
  FCOS_ON: True
  BACKBONE:
    CONV_BODY: "R-101-FPN-RETINANET"
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
  RETINANET:
    USE_C5: False # FCOS uses P5 instead of C5
DATASETS:
  TRAIN: ("cofga_train_cocostyle", "cofga_val_cocostyle")
  TEST: ("cofga_test_cocostyle",)
INPUT:
  MIN_SIZE_RANGE_TRAIN: (640, 800)
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MAX_SIZE_TEST: 1333
DATALOADER:
  SIZE_DIVISIBILITY: 32
SOLVER:
  BASE_LR: 0.01
  WEIGHT_DECAY: 0.0001
  STEPS: (120000, 160000)
  MAX_ITER: 180000
  IMS_PER_BATCH: 1
  WARMUP_METHOD: "constant"
2019-07-31 21:59:43,828 maskrcnn_benchmark INFO: Running with config:
DATALOADER:
  ASPECT_RATIO_GROUPING: True
  NUM_WORKERS: 2
  SIZE_DIVISIBILITY: 32
DATASETS:
  TEST: ('cofga_test_cocostyle',)
  TRAIN: ('cofga_train_cocostyle', 'cofga_val_cocostyle')
INPUT:
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_RANGE_TRAIN: (640, 800)
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN: (800,)
  PIXEL_MEAN: [102.9801, 115.9465, 122.7717]
  PIXEL_STD: [1.0, 1.0, 1.0]
  TO_BGR255: True
MODEL:
  BACKBONE:
    CONV_BODY: R-101-FPN-RETINANET
    FREEZE_CONV_BODY_AT: 2
    USE_GN: False
  CLS_AGNOSTIC_BBOX_REG: False
  DEVICE: cuda
  FBNET:
    ARCH: default
    ARCH_DEF: 
    BN_TYPE: bn
    DET_HEAD_BLOCKS: []
    DET_HEAD_LAST_SCALE: 1.0
    DET_HEAD_STRIDE: 0
    DW_CONV_SKIP_BN: True
    DW_CONV_SKIP_RELU: True
    KPTS_HEAD_BLOCKS: []
    KPTS_HEAD_LAST_SCALE: 0.0
    KPTS_HEAD_STRIDE: 0
    MASK_HEAD_BLOCKS: []
    MASK_HEAD_LAST_SCALE: 0.0
    MASK_HEAD_STRIDE: 0
    RPN_BN_TYPE: 
    RPN_HEAD_BLOCKS: 0
    SCALE_FACTOR: 1.0
    WIDTH_DIVISOR: 1
  FCOS:
    FPN_STRIDES: [8, 16, 32, 64, 128]
    INFERENCE_TH: 0.05
    LOSS_ALPHA: 0.25
    LOSS_GAMMA: 2.0
    NMS_TH: 0.6
    NUM_CLASSES: 2
    NUM_CONVS: 4
    PRE_NMS_TOP_N: 1000
    PRIOR_PROB: 0.01
  FCOS_ON: True
  FPN:
    USE_GN: False
    USE_RELU: False
  GROUP_NORM:
    DIM_PER_GP: -1
    EPSILON: 1e-05
    NUM_GROUPS: 32
  KEYPOINT_ON: False
  MASK_ON: False
  META_ARCHITECTURE: GeneralizedRCNN
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
    NUM_GROUPS: 1
    RES2_OUT_CHANNELS: 256
    RES5_DILATION: 1
    STEM_FUNC: StemWithFixedBatchNorm
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: True
    TRANS_FUNC: BottleneckWithFixedBatchNorm
    WIDTH_PER_GROUP: 64
  RETINANET:
    ANCHOR_SIZES: (32, 64, 128, 256, 512)
    ANCHOR_STRIDES: (8, 16, 32, 64, 128)
    ASPECT_RATIOS: (0.5, 1.0, 2.0)
    BBOX_REG_BETA: 0.11
    BBOX_REG_WEIGHT: 4.0
    BG_IOU_THRESHOLD: 0.4
    FG_IOU_THRESHOLD: 0.5
    INFERENCE_TH: 0.05
    LOSS_ALPHA: 0.25
    LOSS_GAMMA: 2.0
    NMS_TH: 0.4
    NUM_CLASSES: 81
    NUM_CONVS: 4
    OCTAVE: 2.0
    PRE_NMS_TOP_N: 1000
    PRIOR_PROB: 0.01
    SCALES_PER_OCTAVE: 3
    STRADDLE_THRESH: 0
    USE_C5: False
  RETINANET_ON: False
  ROI_BOX_HEAD:
    CONV_HEAD_DIM: 256
    DILATION: 1
    FEATURE_EXTRACTOR: ResNet50Conv5ROIFeatureExtractor
    MLP_HEAD_DIM: 1024
    NUM_CLASSES: 81
    NUM_STACKED_CONVS: 4
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_SCALES: (0.0625,)
    PREDICTOR: FastRCNNPredictor
    USE_GN: False
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    BBOX_REG_WEIGHTS: (10.0, 10.0, 5.0, 5.0)
    BG_IOU_THRESHOLD: 0.5
    DETECTIONS_PER_IMG: 100
    FG_IOU_THRESHOLD: 0.5
    NMS: 0.5
    POSITIVE_FRACTION: 0.25
    SCORE_THRESH: 0.05
    USE_FPN: False
  ROI_KEYPOINT_HEAD:
    CONV_LAYERS: (512, 512, 512, 512, 512, 512, 512, 512)
    FEATURE_EXTRACTOR: KeypointRCNNFeatureExtractor
    MLP_HEAD_DIM: 1024
    NUM_CLASSES: 17
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_SCALES: (0.0625,)
    PREDICTOR: KeypointRCNNPredictor
    RESOLUTION: 14
    SHARE_BOX_FEATURE_EXTRACTOR: True
  ROI_MASK_HEAD:
    CONV_LAYERS: (256, 256, 256, 256)
    DILATION: 1
    FEATURE_EXTRACTOR: ResNet50Conv5ROIFeatureExtractor
    MLP_HEAD_DIM: 1024
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_SCALES: (0.0625,)
    POSTPROCESS_MASKS: False
    POSTPROCESS_MASKS_THRESHOLD: 0.5
    PREDICTOR: MaskRCNNC4Predictor
    RESOLUTION: 14
    SHARE_BOX_FEATURE_EXTRACTOR: True
    USE_GN: False
  RPN:
    ANCHOR_SIZES: (32, 64, 128, 256, 512)
    ANCHOR_STRIDE: (16,)
    ASPECT_RATIOS: (0.5, 1.0, 2.0)
    BATCH_SIZE_PER_IMAGE: 256
    BG_IOU_THRESHOLD: 0.3
    FG_IOU_THRESHOLD: 0.7
    FPN_POST_NMS_TOP_N_TEST: 2000
    FPN_POST_NMS_TOP_N_TRAIN: 2000
    MIN_SIZE: 0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOP_N_TEST: 1000
    POST_NMS_TOP_N_TRAIN: 2000
    PRE_NMS_TOP_N_TEST: 6000
    PRE_NMS_TOP_N_TRAIN: 12000
    RPN_HEAD: SingleConvRPNHead
    STRADDLE_THRESH: 0
    USE_FPN: False
  RPN_ONLY: True
  USE_SYNCBN: False
  WEIGHT: ../FCOS/FCOS_R_101_FPN_2x.pth
OUTPUT_DIR: training_dir/fcos_R_101_FPN_2x
PATHS_CATALOG: ../FCOS/maskrcnn_benchmark/config/paths_catalog.py
SOLVER:
  BASE_LR: 0.01
  BIAS_LR_FACTOR: 2
  CHECKPOINT_PERIOD: 2500
  GAMMA: 0.1
  IMS_PER_BATCH: 1
  MAX_ITER: 180000
  MOMENTUM: 0.9
  STEPS: (120000, 160000)
  WARMUP_FACTOR: 0.3333333333333333
  WARMUP_ITERS: 500
  WARMUP_METHOD: constant
  WEIGHT_DECAY: 0.0001
  WEIGHT_DECAY_BIAS: 0
TEST:
  DETECTIONS_PER_IMG: 100
  EXPECTED_RESULTS: []
  EXPECTED_RESULTS_SIGMA_TOL: 4
  IMS_PER_BATCH: 8
2019-07-31 21:59:49,205 maskrcnn_benchmark.utils.checkpoint INFO: Loading checkpoint from ../FCOS/FCOS_R_101_FPN_2x.pth
...
loading annotations into memory...
Done (t=0.14s)
creating index...
index created!
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
2019-07-31 21:59:50,050 maskrcnn_benchmark.trainer INFO: Start training
2019-07-31 21:59:50,285 maskrcnn_benchmark.trainer INFO: Total training time: 0:00:00.234026 (0.0000 s / it)

tianzhi0549 · 2019-08-01T02:31:38Z

@heng2j You also need to remove solver states in the checkpoint.

heng2j · 2019-08-01T02:50:06Z

Hi @tianzhi0549, would you mind showing me how to remove the solver states in the checkpoint?

heng2j · 2019-08-01T03:09:08Z

And which solver states should I pay attention to? @sunpeng981712364, could you please also shed some light on how you did it?

tianzhi0549 · 2019-08-01T04:46:22Z

@heng2j Do you use our provided pre-trained models? We have removed all solver states in them.

heng2j · 2019-08-01T11:06:16Z

Hi @tianzhi0549, yes, I'm using your provided pre-trained model FCOS_R_101_FPN_2x.pth, and I encountered the above issue.

Would you mind taking a look at the full log in my previous comment, which includes all the parameters set up for the training? I'm also wondering which keys in the head I should remove from your checkpoint.

I only removed ['module.rpn.head.cls_logits.weight', 'module.rpn.head.cls_logits.bias'].

I would love to know how to properly train with your model.

tianzhi0549 · 2019-08-01T11:31:08Z

@heng2j Please post your full log here.

heng2j · 2019-08-01T11:33:50Z

Hi @tianzhi0549 ,

Here you go. It is the same full log that I posted in my earlier comment above, from the launch command through the final "Total training time: 0:00:00.234026 (0.0000 s / it)" line.

tianzhi0549 · 2019-08-01T11:46:12Z

@heng2j Sorry, it's our fault. We did not remove the iteration counter from the released checkpoints. Please remove it yourself by following the code in https://github.com/tianzhi0549/FCOS/blob/master/tools/remove_solver_states.py.
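The idea can be sketched as follows. This is a hypothetical helper, not the contents of remove_solver_states.py. The "optimizer" and "iteration" entries are mentioned in this thread, while "scheduler" is an assumption about what else such a checkpoint may carry. Plain values stand in for the real state so the example runs without PyTorch; in practice the dict would typically come from torch.load and be written back with torch.save.

```python
# Top-level checkpoint entries that describe the solver, not the model.
SOLVER_KEYS = ("optimizer", "scheduler", "iteration")


def strip_solver_states(checkpoint):
    """Return a copy of the checkpoint that keeps only non-solver entries."""
    return {
        key: value
        for key, value in checkpoint.items()
        if key not in SOLVER_KEYS
    }


# Toy checkpoint with placeholder values.
checkpoint = {
    "model": {"some.layer.weight": "w"},
    "optimizer": {"state": {}},
    "scheduler": {"last_epoch": 180000},
    "iteration": 180000,
}
clean = strip_solver_states(checkpoint)
print(sorted(clean))  # only the model weights remain
```

With the iteration entry gone, a new run has no stored step to resume from and starts counting from the beginning.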

heng2j · 2019-08-01T11:53:40Z

Hi @tianzhi0549, I was thinking of removing the iteration as well. Thank you for your confirmation, and thank you so much for your timely help! I will give it a try later today.

heng2j · 2019-08-01T18:13:58Z

Hi @tianzhi0549, thank you, it works! I am training the model now.

tianzhi0549 · 2019-08-02T04:30:08Z

@heng2j Happy to know this.

Shahadate-Rezvy · 2019-08-06T10:11:53Z

Hi,
I got the following error in step 2 when training on a COCO dataset with the maskrcnn_benchmark model. Any suggestions, please?

v: i + 1 for i, v in enumerate(self.coco.getCatIds())
File "/home/zst19phu/anaconda3/envs/ptorch/lib/python3.7/site-packages/pycocotools-2.0-py3.7-linux-x86_64.egg/pycocotools/coco.py", line 170, in getCatIds
cats = self.dataset['categories']
KeyError: 'categories'

The error is in step 2: pycocotools cannot find the categories in the provided annotation file. Has anyone else faced a similar problem? If so, what is the solution?

Thank you.

tianzhi0549 · 2019-08-06T13:39:56Z

@shahdate Can you try to reinstall coco?

Shahadate-Rezvy · 2019-08-06T14:20:12Z

Hi @tianzhi0549,

Thank you for your reply. I completely deleted and reinstalled COCO multiple times, but it is still not working.

tianzhi0549 · 2019-08-07T04:41:58Z

@shahdate Are you sure you are using the correct COCO annotation JSON files?
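A quick sanity check along these lines can catch the KeyError: 'categories' case before training starts. This is a minimal sketch: the helper names are hypothetical, and the three required top-level keys reflect the usual COCO detection layout rather than anything stated in this thread.

```python
import json

# Top-level keys a COCO-style detection annotation file is expected to have.
REQUIRED_KEYS = ("images", "annotations", "categories")


def missing_coco_keys(dataset):
    """Return the required top-level keys that are absent from the dict."""
    return [key for key in REQUIRED_KEYS if key not in dataset]


def check_annotation_file(path):
    """Load a JSON annotation file and report its missing top-level keys."""
    with open(path) as f:
        dataset = json.load(f)
    return missing_coco_keys(dataset)


# A dict without "categories" would trigger the KeyError shown above.
print(missing_coco_keys({"images": [], "annotations": []}))
```

An empty result only means the top-level layout looks right. It does not validate the individual image, annotation, or category records.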

Shahadate-Rezvy · 2019-08-07T14:09:22Z

Hi,
Many thanks. Problem solved. I was using JSON files that did not follow the COCO format. Now I use COCO-format JSON files and CUDA 10 instead of 9, and it works.
Again, many thanks.

tianzhi0549 · 2019-08-08T02:46:43Z

@shahdate Happy to know that.

Shahadate-Rezvy · 2019-08-30T05:28:07Z

Hi
I have created a maskrcnn_benchmark model for medical images with 3 classes (high grade, normal grade, and low grade). My model detects only the high-grade class, not the other two. I used a COCO-style JSON file and trained for 5000 iterations. Any help, please?
Many thanks.

hello-piger · 2019-09-04T06:09:26Z

Hello! Would you mind telling me where to add my dataset in step 1? I cannot find the right place to add my dataset in defaults.py. Thank you very much!

In order to train FCOS on your own dataset, you need to,
1.Add you dataset to https://github.com/tianzhi0549/FCOS/blob/master/maskrcnn_benchmark/config/defaults.py. Please use _coco_style as the suffix of your dataset names.
2.In https://github.com/tianzhi0549/FCOS/blob/master/configs/fcos/fcos_R_50_FPN_1x.yaml, change DATASETS to your own ones.
3.Modify MODEL.FCOS.NUM_CLASSES in

tianzhi0549 · 2019-09-04T06:25:04Z

@hello-piger I have edited it. Please check it again.

hello-piger · 2019-09-04T06:55:03Z

thank you for your quick response.

menggege321 · 2019-09-15T08:06:45Z

Hello, the model is very good!

dreamhighchina · 2019-10-08T02:42:13Z

@sunpeng981712364 Have you finished training? Mine trains, but at inference time there are no results.

milliema · 2019-10-16T06:28:21Z

May I ask what kind of annotations you use for training? Should we include "segmentation" in the COCO labels for training?

milliema · 2019-10-16T08:52:10Z

Why should we use _coco_style as the suffix of our own dataset names? Are there any particular requirements?

Finniu · 2019-11-07T07:56:25Z

Hey, I found that this file (FCOS/maskrcnn_benchmark/config/defaults.py, line 284 in ff8376b: _C.MODEL.FCOS.NUM_CLASSES = 81 # the number of classes including background) is not in my freshly cloned folder, and it is not used during training either. I checked the code, and the file actually used is /FCOS/fcos_core/config/defaults.py, so should I change the number of classes in that file?

If I change it there, I get a dimension mismatch:

size mismatch for rpn.head.cls_logits.weight: copying a param with shape torch.Size([80, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 256, 3, 3]).

Finniu · 2019-11-07T07:59:57Z

@sunpeng981712364 Hey, have you figured out the near-zero AP problem?

alen-mask · 2020-01-07T08:55:19Z

Well, I don't think those 3 modifications alone are enough to train on custom datasets.
The thresholds_for_classes also has to be changed. Otherwise the trained model's scores are compared against thresholds tuned for COCO, and the output can be empty (at least, my data returns very low scores compared to the COCO setting).
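To make the point concrete, here is a minimal sketch of per-class score thresholding in plain Python. The function name, the (label, score) representation, and the convention that labels start at 1 with 0 reserved for background are assumptions for illustration, not the demo script's actual code.

```python
def filter_by_class_threshold(detections, thresholds_for_classes):
    """Keep detections whose score exceeds the threshold of their class.

    detections: list of (label, score) pairs, labels starting at 1.
    thresholds_for_classes: one threshold per foreground class.
    """
    kept = []
    for label, score in detections:
        # label - 1 maps the 1-based class label to a 0-based list index.
        if score > thresholds_for_classes[label - 1]:
            kept.append((label, score))
    return kept


# Three hypothetical classes. Thresholds tuned for another dataset may be
# far too strict for a model that produces lower scores.
detections = [(1, 0.6), (2, 0.3), (3, 0.9)]
strict = [0.5, 0.5, 0.95]
relaxed = [0.2, 0.2, 0.2]
print(filter_by_class_threshold(detections, strict))
print(filter_by_class_threshold(detections, relaxed))
```

The threshold list needs one entry per class of the custom dataset. A list sized for COCO's 80 classes would silently apply unrelated values.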

sathyamsn · 2020-09-17T06:22:48Z

Hi, I'm trying to train on a custom dataset with 4 classes, starting from the pre-trained model downloaded from this repo. I ran the remove-solver-states script on the downloaded .pth file and used the result in the .yaml. However, I keep getting the error below. Please tell me which step I'm missing. Thanks!

2020-09-17 06:17:38,972 fcos_core.utils.checkpoint INFO: Loading checkpoint from pretrained_models/FCOS_syncbn_bs32_c128_MNV2_FPN_1x_wo_solver_states.pth
Traceback (most recent call last):
File "tools/train_net.py", line 180, in <module>
main()
File "tools/train_net.py", line 173, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 59, in train
extra_checkpoint_data = checkpointer.load(cfg.MODEL.WEIGHT)
File "/home/sathya/FCOS/fcos_core/utils/checkpoint.py", line 62, in load
self._load_model(checkpoint)
File "/home/sathya/FCOS/fcos_core/utils/checkpoint.py", line 98, in _load_model
load_state_dict(self.model, checkpoint.pop("model"))
File "/home/sathya/FCOS/fcos_core/utils/model_serialization.py", line 80, in load_state_dict
model.load_state_dict(model_state_dict)
File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 779, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DistributedDataParallel:
size mismatch for module.rpn.head.cls_logits.weight: copying a param with shape torch.Size([80, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([4, 128, 3, 3]).
size mismatch for module.rpn.head.cls_logits.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([4]).

tianzhi0549 · 2020-09-17T06:35:05Z

@sathyamsn Please remove the weights module.rpn.head.cls_logits.weight from the pre-trained checkpoint. If you do not know how to remove the weights, please refer to FCOS/tools/remove_solver_states.py (line 20 in 9a01528: del model["optimizer"]).

sathyamsn · 2020-09-17T15:14:33Z

Thanks for the quick response. I added the removals below to the remove-solver-states code.

#################################################
del model["model"]["module.rpn.head.cls_logits.weight"]
del model["model"]["module.rpn.head.cls_logits.bias"]
#################################################

But this time the entire training was skipped, evaluation started directly, and mAP = 0. Please help. Thanks.

2020-09-17 06:51:04,827 fcos_core.trainer INFO: Start training
Done (t=0.00s)
creating index...
index created!
1%|# | 4/734 [00:00<04:14, 2.87it/s]loading annotations into memory...
loading annotations into memory...
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading annotations into memory...
loading annotations into memory...
Done (t=0.06s)
creating index...
2020-09-17 06:51:04,965 fcos_core.trainer INFO: Total training time: 0:00:00.136299 (0.0000 s / it)
loading annotations into memory...
index created!
1%|#6
index created!
2020-09-17 06:51:05,042 fcos_core.inference INFO: Start evaluation on coco_cust_validation dataset(5867 images).
1%|##1 | 8/734 [00:00<02:23, 5.05it/s]Done (t=0.06s)

sathyamsn · 2020-09-18T03:55:40Z

I just noticed that, unlike in TensorFlow, the number of steps to train for has to be higher than the step stored in the pre-trained model. My pre-trained model was trained up to 90K, so when I gave 100000, training started. Thanks.

However, mAP is very low. On analysis, the detected bboxes are much smaller than the actual ground-truth bboxes. Any suggestions?
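The skipped-training behaviour reported earlier in this thread is consistent with a loop that resumes from the iteration stored in the checkpoint and stops at the configured maximum. The sketch below is a simplified model of such a loop, not the trainer's actual code, and the function name is hypothetical.

```python
def training_iterations(start_iter, max_iter):
    """Yield the iteration numbers a resumed run would execute (simplified)."""
    iteration = start_iter
    while iteration < max_iter:
        iteration += 1
        yield iteration


# Checkpoint already at the configured maximum: nothing left to run.
print(len(list(training_iterations(90000, 90000))))
# Higher maximum: only the remaining steps are run.
print(len(list(training_iterations(90000, 100000))))
# No stored iteration (solver states removed): the full schedule is run.
print(len(list(training_iterations(0, 90000))))
```

Under this model, raising the maximum makes training start, but the run covers only the remaining steps and still resumes whatever optimizer and scheduler state the checkpoint carries.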

autumnfairytale7 · 2020-10-06T05:21:11Z

Can you tell me how to remove the weights in detail?

sathyamsn · 2020-10-09T05:06:08Z

@autumnfairytale7 As mentioned in the previous comment by @tianzhi0549, run FCOS/tools/remove_solver_states.py on your pre-trained model and remove the weights named in your error message.
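If the error message is hard to read, one way to find the offending weights is to compare parameter shapes directly. This is a minimal sketch with a hypothetical helper, `mismatched_keys`. It assumes both mappings use the same key naming (for example, both with or without the `module.` prefix) and that every value exposes a `.shape`, as PyTorch tensors do.

```python
def mismatched_keys(checkpoint_params, model_params):
    """List parameter names present in both mappings whose shapes differ.

    `checkpoint_params` would be the "model" entry of the checkpoint, and
    `model_params` the state dict of the freshly built model (assumption).
    """
    bad = []
    for name, value in checkpoint_params.items():
        if name not in model_params:
            continue  # a key missing from the model is a separate problem
        if tuple(value.shape) != tuple(model_params[name].shape):
            bad.append(name)
    return bad
```

The returned names are the candidates to delete from the checkpoint before fine-tuning.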

EiMaker · 2021-12-23T09:55:38Z

@tianzhi0549 Hi, I have 5 classes including background, but my AP values are all 1.0. What factors might cause this problem? Thanks.

tianzhi0549 closed this as completed Jun 4, 2019

sfzhang15 mentioned this issue Dec 11, 2019

how to test my own dataset？ sfzhang15/ATSS#5

Closed

sfzhang15 mentioned this issue Dec 23, 2019

tips for finetuning on private dataset based on the pretrained model sfzhang15/ATSS#16

Closed

sfzhang15 mentioned this issue Feb 5, 2020

I want use custom dataset sfzhang15/ATSS#19

Closed
