Box2Mask failing for custom class of just one class. #9

Open
VikasRajashekar opened this issue Jan 12, 2023 · 6 comments
@VikasRajashekar

I get the following error:


  File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 2805, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: weight tensor should be defined either for all 2 classes or no classes but got weight tensor of shape: [81] at /tmp/pip-req-build-g2m34a_4/aten/src/THCUNN/generic/ClassNLLCriterion.cu:43
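For context, a minimal standalone sketch (my own reproduction, not code from the repo) of why this fires: with num_things_classes=1 the classification head predicts 2 logits (1 thing class + background), while class_weight in the config still carries 81 entries (80 COCO classes + background):

import torch
import torch.nn.functional as F

# Standalone reproduction: 2 predicted classes vs. an 81-entry weight vector.
logits = torch.randn(4, 2)                 # (batch, num_classes=2)
target = torch.randint(0, 2, (4,))         # labels in [0, 2)
weight = torch.tensor([1.0] * 80 + [0.1])  # 81 entries, left over from COCO

F.cross_entropy(logits, target, weight=weight)
# RuntimeError: weight tensor should be defined either for all 2 classes
# or no classes but got weight tensor of shape: [81]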

Attaching my config:

_base_ = [
    '../_base_/datasets/coco_panoptic.py', '../_base_/default_runtime.py'
]

model = dict(
    type='Box2Mask',
    backbone=dict(
        type='ResNet',
        depth=101,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=-1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='https://download.pytorch.org/models/resnet101-b641f3a9.pth')),
    panoptic_head=dict(
        type='Box2MaskHead',
        in_channels=[256, 512, 1024, 2048],  # pass to pixel_decoder inside
        strides=[4, 8, 16, 32],
        feat_channels=256,
        out_channels=256,
        num_things_classes=1,
        num_stuff_classes=0,
        num_queries=100,
        num_transformer_feat_level=3,
        pixel_decoder=dict(
            type='MSDeformAttnPixelDecoder',
            num_outs=3,
            norm_cfg=dict(type='GN', num_groups=32),
            act_cfg=dict(type='ReLU'),
            encoder=dict(
                type='DetrTransformerEncoder',
                num_layers=6,
                transformerlayers=dict(
                    type='BaseTransformerLayer',
                    attn_cfgs=dict(
                        type='MultiScaleDeformableAttention',
                        embed_dims=256,
                        num_heads=8,
                        num_levels=3,
                        num_points=4,
                        im2col_step=64,
                        dropout=0.0,
                        batch_first=False,
                        norm_cfg=None,
                        init_cfg=None),
                    ffn_cfgs=dict(
                        type='FFN',
                        embed_dims=256,
                        feedforward_channels=1024,
                        num_fcs=2,
                        ffn_drop=0.0,
                        act_cfg=dict(type='ReLU', inplace=True)),
                    operation_order=('self_attn', 'norm', 'ffn', 'norm')),
                init_cfg=None),
            positional_encoding=dict(
                type='SinePositionalEncoding', num_feats=128, normalize=True),
            init_cfg=None),
        enforce_decoder_input_project=False,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True),
        transformer_decoder=dict(
            type='DetrTransformerDecoder',
            return_intermediate=True,
            num_layers=9,
            transformerlayers=dict(
                type='DetrTransformerDecoderLayer',
                attn_cfgs=dict(
                    type='MultiheadAttention',
                    embed_dims=256,
                    num_heads=8,
                    attn_drop=0.0,
                    proj_drop=0.0,
                    dropout_layer=None,
                    batch_first=False),
                ffn_cfgs=dict(
                    embed_dims=256,
                    feedforward_channels=2048,
                    num_fcs=2,
                    act_cfg=dict(type='ReLU', inplace=True),
                    ffn_drop=0.0,
                    dropout_layer=None,
                    add_identity=True),
                feedforward_channels=2048,
                operation_order=('cross_attn', 'norm', 'self_attn', 'norm',
                                 'ffn', 'norm')),
            init_cfg=None),
        loss_cls=dict(
            type='CrossEntropyLoss',
            use_sigmoid=False,
            loss_weight=2.0,
            reduction='mean',
            class_weight=[1.0] * 80 + [0.1]),
        loss_mask=dict(
            type='LevelsetLoss',
            loss_weight=1.0),
        loss_box=dict(
            type='BoxProjectionLoss',
            loss_weight=5.0)),
    panoptic_fusion_head=dict(
        type='MaskFormerFusionHead',
        num_things_classes=1,
        num_stuff_classes=0,
        loss_panoptic=None,
        init_cfg=None),
    train_cfg=dict(
        assigner=dict(
            type='MaskHungarianAssigner',
            cls_cost=dict(type='ClassificationCost', weight=2.0),
            dice_cost=dict(type='BoxMatchingCost', weight=5.0, pred_act=True, eps=1.0)),
        sampler=dict(type='MaskPseudoSampler')),
    test_cfg=dict(
        panoptic_on=False,
        semantic_on=False,
        instance_on=True,
        max_per_image=1500,
        iou_thr=0.8,
        filter_low_score=True),
    init_cfg=None)

# dataset settings
image_size = (1024, 1024)
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
pad_cfg = dict(img=(128, 128, 128), masks=0, seg=255)
train_pipeline = [
    dict(type='LoadImageFromFile', to_float32=True),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=False),
    dict(type='GenerateBoxMask'),  # generate box mask
    dict(type='RandomFlip', flip_ratio=0.5),
    # large scale jittering
    dict(
        type='Resize',
        img_scale=image_size,
        ratio_range=(0.1, 2.0),
        multiscale_mode='range',
        keep_ratio=True),
    dict(
        type='RandomCrop',
        crop_size=image_size,
        crop_type='absolute',
        recompute_bbox=True,
        allow_negative_crop=True),
    dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-5, 1e-5), keep_empty=True),
    dict(type='Pad', size=image_size, pad_val=pad_cfg),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='DefaultFormatBundle', img_to_float=True),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Pad', size_divisor=32, pad_val=pad_cfg),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
dataset_type = 'CocoDataset'
data_root = '/data/coco/'
classes = ('cell',)
data = dict(
    _delete_=True,
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file='/netscratch/nkhalid/vikas/Dataset/LiveCell/annotations/livecell_coco_train.json',
        img_prefix='/netscratch/nkhalid/vikas/Dataset/LiveCell/images/train',
        pipeline=train_pipeline,
        classes=classes),
    val=dict(
        type=dataset_type,
        ann_file='/netscratch/nkhalid/vikas/Dataset/LiveCell/annotations/livecell_coco_test.json',
        img_prefix='/netscratch/nkhalid/vikas/Dataset/LiveCell/images/test',
        pipeline=test_pipeline,
        classes=classes),
    test=dict(
       type=dataset_type,
       ann_file='/netscratch/nkhalid/vikas/Dataset/LiveCell/annotations/livecell_coco_val.json',
       img_prefix='/netscratch/nkhalid/vikas/Dataset/LiveCell/images/val',
       pipeline=test_pipeline,
       classes=classes))

embed_multi = dict(lr_mult=1.0, decay_mult=0.0)
# optimizer
optimizer = dict(
    type='AdamW',
    lr=0.0001,
    weight_decay=0.05,
    eps=1e-8,
    betas=(0.9, 0.999),
    paramwise_cfg=dict(
        custom_keys={
            'backbone': dict(lr_mult=0.1, decay_mult=1.0),
            'query_embed': embed_multi,
            'query_feat': embed_multi,
            'level_embed': embed_multi,
        },
        norm_decay_mult=0.0))
optimizer_config = dict(grad_clip=dict(max_norm=0.01, norm_type=2))

lr_config = dict(
    policy='step',
    gamma=0.1,
    by_epoch=False,
    step=[327778, 355092],
    warmup='linear',
    warmup_by_epoch=False,
    warmup_ratio=1.0,  # no warmup
    warmup_iters=10)

max_iters = 368750
runner = dict(type='IterBasedRunner', max_iters=max_iters)

log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook', by_epoch=False),
        dict(type='TensorboardLoggerHook', by_epoch=False)
    ])

interval = 5000
workflow = [('train', interval)]
checkpoint_config = dict(
    by_epoch=False, interval=interval, save_last=True, max_keep_ckpts=3)
dynamic_intervals = [(max_iters // interval * interval + 1, max_iters)]

evaluation = dict(interval=interval, dynamic_intervals=dynamic_intervals, metric=['bbox', 'segm'])
find_unused_parameters = True
work_dir = './work_dirs/box2mask_r101_coco_50e'
load_from = '/netscratch/rajashekar/SAIL/BoxInst/models/box2mask_r101_coco_50e.pth'


@LiWentomng
Owner

@VikasRajashekar
Line #94 in the config file also needs to be changed for one class, as follows:
class_weight=[1.0] * 1 + [0.1])
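
That is, the loss_cls block in the config becomes (other fields unchanged):

loss_cls=dict(
    type='CrossEntropyLoss',
    use_sigmoid=False,
    loss_weight=2.0,
    reduction='mean',
    class_weight=[1.0] * 1 + [0.1])  # 1 thing class + background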

Btw, the lr_config steps and max_iters (a 50-epoch COCO schedule by default) need to be scaled proportionally to the number of your training images.
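
For example, a hypothetical sizing sketch (the image count is assumed, not from the repo): with 3,000 training images and the config's effective batch size of 16 (samples_per_gpu=2 on 8 GPUs), a 50-epoch schedule works out to:

# Hypothetical sizing sketch; 3000 images is an assumed dataset size.
num_images = 3000
batch_size = 16                              # samples_per_gpu=2 on 8 GPUs
epochs = 50
iters_per_epoch = num_images // batch_size   # 187
max_iters = epochs * iters_per_epoch         # 9350
# Keep the two decay steps at the same relative positions as the default
# COCO schedule (327778/368750 ~ 0.89 and 355092/368750 ~ 0.96 of training):
step = [int(max_iters * 0.89), int(max_iters * 0.96)]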

@VikasRajashekar
Author

VikasRajashekar commented Jan 29, 2023

@LiWentomng Thanks for the input.
I did change it. However, I now face the following issue.

  File "/netscratch/rajashekar/SAIL/BoxInst2/BoxInstSeg-main/mmdet/models/seg_heads/panoptic_fusion_heads/maskformer_fusion_head.py", line 140, in instance_postprocess
    scores_per_image, top_indices = scores.flatten(0, 1).topk(
RuntimeError: selected index k out of range

I debugged the values of scores.flatten(0, 1), max_per_image, and mask_cls in maskformer_fusion_head.py.
For each image they are always as follows:

scores.flatten(0, 1).shape = torch.Size([100])
max_per_image = 1500
mask_cls.shape = torch.Size([100, 2])
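
Given those shapes, the failure is simply torch.topk being asked for k=1500 from a 100-element tensor: with 80 classes the flattened scores have 100 x 80 = 8000 entries, so topk(1500) succeeds, but with one class there are only 100. A sketch of a possible guard (an assumed workaround on my side, not the repo's fix) is to clamp k, or equivalently to set max_per_image <= num_queries in test_cfg:

import torch

# Assumed workaround sketch: clamp k before calling topk.
num_queries, num_classes = 100, 1                            # one thing class
scores = torch.rand(num_queries, num_classes).flatten(0, 1)  # 100 elements

max_per_image = 1500
k = min(max_per_image, scores.numel())  # torch.topk needs k <= numel()
scores_per_image, top_indices = scores.topk(k, sorted=False)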

I tried hardcoding the topk k to 100 and ran the evaluation, but got very poor results:

COCOeval_opt.evaluate() finished in 12.34 seconds.

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=2000 ] = 0.121
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=2000 ] = 0.239
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=2000 ] = 0.112
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=2000 ] = 0.082
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=2000 ] = 0.152
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=2000 ] = 0.249
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.165
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=500 ] = 0.165
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=2000 ] = 0.165
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=2000 ] = 0.096
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=2000 ] = 0.207
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=2000 ] = 0.377


Evaluating segm...
Loading and preparing results...
DONE (t=1.51s)
creating index...
index created!
Changing MaxDets and areas
Evaluate annotation type *segm*
COCOeval_opt.evaluate() finished in 18.51 seconds.

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=2000 ] = 0.003
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=2000 ] = 0.006
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=2000 ] = 0.003
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=2000 ] = 0.003
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=2000 ] = 0.001
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=2000 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=500 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=2000 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=2000 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=2000 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=2000 ] = 0.000

I am attaching the config file and the corresponding log file.
config_logs.zip

Am I missing something? Or is it a bug?
Looking forward to your input.

@VikasRajashekar
Author

@LiWentomng looking for your input.

@VikasRajashekar
Author

@LiWentomng Any update?

@LiWentomng
Owner

@VikasRajashekar
Sorry for the late reply! I have been busy recently, so I had no input on this issue.
Have you tested BoxInst and BoxLevelset from this repo on your dataset? Do they run well?

I have tested Box2Mask on the ICDAR2019 dataset with one class, and it runs well without the above error RuntimeError: selected index k out of range.

I suggest you try BoxInst and BoxLevelset first.

Any further questions can be discussed.

@Aayushktyagi

Aayushktyagi commented Feb 10, 2024

I also observed the same trend. For BoxLevelset and Box2Mask, performance on a single class is very poor. For BoxInst it is decent.
@VikasRajashekar did you try BoxLevelset?
