Box2Mask failing for custom class of just one class. #9

Open
VikasRajashekar opened this issue Jan 12, 2023 · 6 comments
@VikasRajashekar

I get the following error:


  File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 2805, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: weight tensor should be defined either for all 2 classes or no classes but got weight tensor of shape: [81] at /tmp/pip-req-build-g2m34a_4/aten/src/THCUNN/generic/ClassNLLCriterion.cu:43
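For context, a minimal standalone sketch (my own reproduction, not code from the repo) of why this fires: with num_things_classes=1 the classification head predicts 2 logits (1 thing class + background), while class_weight in the config still carries 81 entries (80 COCO classes + background):

import torch
import torch.nn.functional as F

# Standalone reproduction: 2 predicted classes vs. an 81-entry weight vector.
logits = torch.randn(4, 2)                 # (batch, num_classes=2)
target = torch.randint(0, 2, (4,))         # labels in [0, 2)
weight = torch.tensor([1.0] * 80 + [0.1])  # 81 entries, left over from COCO

F.cross_entropy(logits, target, weight=weight)
# RuntimeError: weight tensor should be defined either for all 2 classes
# or no classes but got weight tensor of shape: [81]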

Attaching my config:

_base_ = [
    '../_base_/datasets/coco_panoptic.py', '../_base_/default_runtime.py'
]

model = dict(
    type='Box2Mask',
    backbone=dict(
        type='ResNet',
        depth=101,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=-1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='https://download.pytorch.org/models/resnet101-b641f3a9.pth')),
    panoptic_head=dict(
        type='Box2MaskHead',
        in_channels=[256, 512, 1024, 2048],  # pass to pixel_decoder inside
        strides=[4, 8, 16, 32],
        feat_channels=256,
        out_channels=256,
        num_things_classes=1,
        num_stuff_classes=0,
        num_queries=100,
        num_transformer_feat_level=3,
        pixel_decoder=dict(
            type='MSDeformAttnPixelDecoder',
            num_outs=3,
            norm_cfg=dict(type='GN', num_groups=32),
            act_cfg=dict(type='ReLU'),
            encoder=dict(
                type='DetrTransformerEncoder',
                num_layers=6,
                transformerlayers=dict(
                    type='BaseTransformerLayer',
                    attn_cfgs=dict(
                        type='MultiScaleDeformableAttention',
                        embed_dims=256,
                        num_heads=8,
                        num_levels=3,
                        num_points=4,
                        im2col_step=64,
                        dropout=0.0,
                        batch_first=False,
                        norm_cfg=None,
                        init_cfg=None),
                    ffn_cfgs=dict(
                        type='FFN',
                        embed_dims=256,
                        feedforward_channels=1024,
                        num_fcs=2,
                        ffn_drop=0.0,
                        act_cfg=dict(type='ReLU', inplace=True)),
                    operation_order=('self_attn', 'norm', 'ffn', 'norm')),
                init_cfg=None),
            positional_encoding=dict(
                type='SinePositionalEncoding', num_feats=128, normalize=True),
            init_cfg=None),
        enforce_decoder_input_project=False,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True),
        transformer_decoder=dict(
            type='DetrTransformerDecoder',
            return_intermediate=True,
            num_layers=9,
            transformerlayers=dict(
                type='DetrTransformerDecoderLayer',
                attn_cfgs=dict(
                    type='MultiheadAttention',
                    embed_dims=256,
                    num_heads=8,
                    attn_drop=0.0,
                    proj_drop=0.0,
                    dropout_layer=None,
                    batch_first=False),
                ffn_cfgs=dict(
                    embed_dims=256,
                    feedforward_channels=2048,
                    num_fcs=2,
                    act_cfg=dict(type='ReLU', inplace=True),
                    ffn_drop=0.0,
                    dropout_layer=None,
                    add_identity=True),
                feedforward_channels=2048,
                operation_order=('cross_attn', 'norm', 'self_attn', 'norm',
                                 'ffn', 'norm')),
            init_cfg=None),
        loss_cls=dict(
            type='CrossEntropyLoss',
            use_sigmoid=False,
            loss_weight=2.0,
            reduction='mean',
            class_weight=[1.0] * 80 + [0.1]),
        loss_mask=dict(
            type='LevelsetLoss',
            loss_weight=1.0),
        loss_box=dict(
            type='BoxProjectionLoss',
            loss_weight=5.0)),
    panoptic_fusion_head=dict(
        type='MaskFormerFusionHead',
        num_things_classes=1,
        num_stuff_classes=0,
        loss_panoptic=None,
        init_cfg=None),
    train_cfg=dict(
        assigner=dict(
            type='MaskHungarianAssigner',
            cls_cost=dict(type='ClassificationCost', weight=2.0),
            dice_cost=dict(type='BoxMatchingCost', weight=5.0, pred_act=True, eps=1.0)),
        sampler=dict(type='MaskPseudoSampler')),
    test_cfg=dict(
        panoptic_on=False,
        semantic_on=False,
        instance_on=True,
        max_per_image=1500,
        iou_thr=0.8,
        filter_low_score=True),
    init_cfg=None)

# dataset settings
image_size = (1024, 1024)
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
pad_cfg = dict(img=(128, 128, 128), masks=0, seg=255)
train_pipeline = [
    dict(type='LoadImageFromFile', to_float32=True),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=False),
    dict(type='GenerateBoxMask'),  # generate box mask
    dict(type='RandomFlip', flip_ratio=0.5),
    # large scale jittering
    dict(
        type='Resize',
        img_scale=image_size,
        ratio_range=(0.1, 2.0),
        multiscale_mode='range',
        keep_ratio=True),
    dict(
        type='RandomCrop',
        crop_size=image_size,
        crop_type='absolute',
        recompute_bbox=True,
        allow_negative_crop=True),
    dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-5, 1e-5), keep_empty=True),
    dict(type='Pad', size=image_size, pad_val=pad_cfg),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='DefaultFormatBundle', img_to_float=True),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Pad', size_divisor=32, pad_val=pad_cfg),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
dataset_type = 'CocoDataset'
data_root = '/data/coco/'
classes = ('cell',)
data = dict(
    _delete_=True,
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file='/netscratch/nkhalid/vikas/Dataset/LiveCell/annotations/livecell_coco_train.json',
        img_prefix='/netscratch/nkhalid/vikas/Dataset/LiveCell/images/train',
        pipeline=train_pipeline,
        classes=classes),
    val=dict(
        type=dataset_type,
        ann_file='/netscratch/nkhalid/vikas/Dataset/LiveCell/annotations/livecell_coco_test.json',
        img_prefix='/netscratch/nkhalid/vikas/Dataset/LiveCell/images/test',
        pipeline=test_pipeline,
        classes=classes),
    test=dict(
       type=dataset_type,
       ann_file='/netscratch/nkhalid/vikas/Dataset/LiveCell/annotations/livecell_coco_val.json',
       img_prefix='/netscratch/nkhalid/vikas/Dataset/LiveCell/images/val',
       pipeline=test_pipeline,
       classes=classes))

embed_multi = dict(lr_mult=1.0, decay_mult=0.0)
# optimizer
optimizer = dict(
    type='AdamW',
    lr=0.0001,
    weight_decay=0.05,
    eps=1e-8,
    betas=(0.9, 0.999),
    paramwise_cfg=dict(
        custom_keys={
            'backbone': dict(lr_mult=0.1, decay_mult=1.0),
            'query_embed': embed_multi,
            'query_feat': embed_multi,
            'level_embed': embed_multi,
        },
        norm_decay_mult=0.0))
optimizer_config = dict(grad_clip=dict(max_norm=0.01, norm_type=2))

lr_config = dict(
    policy='step',
    gamma=0.1,
    by_epoch=False,
    step=[327778, 355092],
    warmup='linear',
    warmup_by_epoch=False,
    warmup_ratio=1.0,  # no warmup
    warmup_iters=10)

max_iters = 368750
runner = dict(type='IterBasedRunner', max_iters=max_iters)

log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook', by_epoch=False),
        dict(type='TensorboardLoggerHook', by_epoch=False)
    ])

interval = 5000
workflow = [('train', interval)]
checkpoint_config = dict(
    by_epoch=False, interval=interval, save_last=True, max_keep_ckpts=3)
dynamic_intervals = [(max_iters // interval * interval + 1, max_iters)]

evaluation = dict(interval=interval, dynamic_intervals=dynamic_intervals, metric=['bbox', 'segm'])
find_unused_parameters = True
work_dir = './work_dirs/box2mask_r101_coco_50e'
load_from = '/netscratch/rajashekar/SAIL/BoxInst/models/box2mask_r101_coco_50e.pth'


@LiWentomng
Owner

@VikasRajashekar
Line #94 in the config file also needs to be changed for one class, as follows:
class_weight=[1.0] * 1 + [0.1])
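
That is, the loss_cls block in the config becomes (other fields unchanged):

loss_cls=dict(
    type='CrossEntropyLoss',
    use_sigmoid=False,
    loss_weight=2.0,
    reduction='mean',
    class_weight=[1.0] * 1 + [0.1])  # 1 thing class + background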

Btw, the lr_config steps and max_iters (a 50-epoch COCO schedule by default) need to be scaled proportionally to the number of your training images.
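
For example, a hypothetical sizing sketch (the image count is assumed, not from the repo): with 3,000 training images and the config's effective batch size of 16 (samples_per_gpu=2 on 8 GPUs), a 50-epoch schedule works out to:

# Hypothetical sizing sketch; 3000 images is an assumed dataset size.
num_images = 3000
batch_size = 16                              # samples_per_gpu=2 on 8 GPUs
epochs = 50
iters_per_epoch = num_images // batch_size   # 187
max_iters = epochs * iters_per_epoch         # 9350
# Keep the two decay steps at the same relative positions as the default
# COCO schedule (327778/368750 ~ 0.89 and 355092/368750 ~ 0.96 of training):
step = [int(max_iters * 0.89), int(max_iters * 0.96)]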

@VikasRajashekar
Author

VikasRajashekar commented Jan 29, 2023

@LiWentomng Thanks for the input.
I did change it. However, I now face the following issue.

  File "/netscratch/rajashekar/SAIL/BoxInst2/BoxInstSeg-main/mmdet/models/seg_heads/panoptic_fusion_heads/maskformer_fusion_head.py", line 140, in instance_postprocess
    scores_per_image, top_indices = scores.flatten(0, 1).topk(
RuntimeError: selected index k out of range

I debugged the values of scores.flatten(0, 1), max_per_image, and mask_cls in maskformer_fusion_head.py.
For each image they are always as follows:

scores.flatten(0, 1).shape = torch.Size([100])
max_per_image = 1500
mask_cls.shape = torch.Size([100, 2])
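
Given those shapes, the failure is simply torch.topk being asked for k=1500 from a 100-element tensor: with 80 classes the flattened scores have 100 x 80 = 8000 entries, so topk(1500) succeeds, but with one class there are only 100. A sketch of a possible guard (an assumed workaround on my side, not the repo's fix) is to clamp k, or equivalently to set max_per_image <= num_queries in test_cfg:

import torch

# Assumed workaround sketch: clamp k before calling topk.
num_queries, num_classes = 100, 1                            # one thing class
scores = torch.rand(num_queries, num_classes).flatten(0, 1)  # 100 elements

max_per_image = 1500
k = min(max_per_image, scores.numel())  # torch.topk needs k <= numel()
scores_per_image, top_indices = scores.topk(k, sorted=False)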

I tried hardcoding the topk k to 100 and ran the evaluation, but got very poor results:

COCOeval_opt.evaluate() finished in 12.34 seconds.

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=2000 ] = 0.121
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=2000 ] = 0.239
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=2000 ] = 0.112
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=2000 ] = 0.082
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=2000 ] = 0.152
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=2000 ] = 0.249
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.165
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=500 ] = 0.165
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=2000 ] = 0.165
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=2000 ] = 0.096
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=2000 ] = 0.207
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=2000 ] = 0.377


Evaluating segm...
Loading and preparing results...
DONE (t=1.51s)
creating index...
index created!
Changing MaxDets and areas
Evaluate annotation type *segm*
COCOeval_opt.evaluate() finished in 18.51 seconds.

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=2000 ] = 0.003
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=2000 ] = 0.006
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=2000 ] = 0.003
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=2000 ] = 0.003
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=2000 ] = 0.001
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=2000 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=500 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=2000 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=2000 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=2000 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=2000 ] = 0.000

I am attaching the config file and the corresponding log file.
config_logs.zip

Am I missing something? Or is it a bug?
Looking forward to your input.

@VikasRajashekar
Author

@LiWentomng looking for your input.

@VikasRajashekar
Author

@LiWentomng Any update?

@LiWentomng
Owner

@VikasRajashekar
Sorry for the late reply! I have been busy recently, so I had no input on this issue.
Have you tested BoxInst and BoxLevelset from this repo on your dataset? Do they run well?

I have tested Box2Mask on the ICDAR2019 dataset with one class, and it runs well without the above error RuntimeError: selected index k out of range.

I suggest you try BoxInst and BoxLevelset first.

Any further questions can be discussed.

@Aayushktyagi

Aayushktyagi commented Feb 10, 2024

I also observed the same trend. For BoxLevelset and Box2Mask, performance on a single class is very poor. For BoxInst it is decent.
@VikasRajashekar did you try BoxLevelset?
