Abstract: Weakly-supervised object localization (WSOL) methods aim to capture the extent of the target object without full supervision such as bounding boxes or segmentation masks. Although numerous studies have been conducted in the research field of WSOL, we find that most existing methods are less effective at localizing small objects. In this paper, we first analyze why previous studies have overlooked this problem. Based on the analysis, we propose two remedies: 1) new evaluation metrics and a dataset to accurately measure localization performance for small objects, and 2) a novel consistency learning framework to zoom in on small objects so the model can perceive them more clearly. Our extensive experimental results demonstrate that the proposed method significantly improves small object localization on four different backbone networks and four different datasets, without sacrificing the performance of medium and large objects. In addition to these gains, our method can be easily applied to existing WSOL methods as it does not require any changes to the model architecture or data input pipeline.
Official implementation of "Small Object Matters in Weakly Supervised Object Localization"
Most of our code originates from this repository.
Run the following commands to build and start the Docker container. Modify the `pytorch/pytorch:latest` tag in `Dockerfile`, if necessary.
```bash
docker build . -t wsol_test
docker run -it -d --gpus '"device=0"' --shm-size=16G --name wsol_test wsol_test:latest
docker exec -it wsol_test /bin/bash
```
Environment (as reported by `pip freeze`):
```
munch==2.5.0
sklearn==0.0
opencv-python==4.5.5.64
torch==1.11.0
torchvision==0.12.0
```
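If you prefer to set up the environment without Docker, here is a minimal installation sketch (it assumes a Python version compatible with torch 1.11, and installs `scikit-learn` directly instead of the deprecated `sklearn` stub listed above):

```bash
# Install the packages listed above without Docker (sketch only).
pip install munch==2.5.0 scikit-learn opencv-python==4.5.5.64 \
    torch==1.11.0 torchvision==0.12.0
```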
We borrowed the dataset preparation scripts from the original repository.
ImageNet
To prepare the ImageNet data, download the ImageNet "train" and "val" splits from here and put the downloaded files at `dataset/ILSVRC2012_img_train.tar` and `dataset/ILSVRC2012_img_val.tar`. Then, run the following command from the root directory to extract the images.

```bash
sh dataset/prepare_imagenet.sh
```
CUB
Run the following command to download the original CUB dataset and extract the image files in the root directory.

```bash
sh dataset/prepare_cub.sh
```
Note: you can also download the CUBV2 dataset from here and the CUBSmall dataset from here. Put the downloaded file at `dataset/CUBV2.tar` and then run the above script.
OpenImages
To download and extract the files, run the following command from the root directory.

```bash
sh dataset/prepare_openimages.sh
```
Note: you can also download the OpenImages30k dataset from here (images, masks). Put the downloaded `OpenImages_images.zip` and `OpenImages_annotations.zip` files in the `dataset` directory and run the above script.
Run the following command to train the ResNet50 network on the ImageNet dataset.

```bash
sh scripts/resnet_imagenet_ours.sh
```
To reproduce our experimental results, we include all training scripts for the three backbones and three datasets in `./scripts/`.
- You must modify the `--data_root` and `--mask_root` arguments to point to your own local paths (see the sketch below).
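A minimal sketch of how those lines might look inside one of the training scripts; the paths below are placeholders for your local dataset and mask locations:

```bash
--data_root /your/path/to/dataset \
--mask_root /your/path/to/masks \
```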
Training logs and checkpoints will be saved in `./train_log/`.
In our paper, we applied our method to three state-of-the-art methods:
Domain Adaptation (DA), Bridging the Gap (Brid) and IVR.
To reproduce the results, first download the pretrained models here. Then, run the corresponding command:
- Domain Adaptation (ResNet50, ImageNet): `sh scripts/resnet_imagenet_ours_with_da.sh`
- Bridging the Gap (ResNet50, ImageNet): `sh scripts/resnet_imagenet_ours_with_brid.sh`
- IVR (ResNet50, ImageNet): `sh scripts/resnet_imagenet_ours_with_ivr.sh`
The `percentile` values for each dataset and architecture are reported in the following table.
| ImageNet | ResNet50 | VGG16 | Inception V3 |
|---|---|---|---|
| `percentile` | 0.3 | 0.2 | 0.4 |
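If you need to set this value manually, we assume (based on the `percentile=self.args.percentile` argument in the integration snippet further below) that it is exposed as a `--percentile` flag; a sketch for ResNet50 on ImageNet:

```bash
# Hypothetical flag; verify the flag name against the provided scripts.
--percentile 0.3 \
```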
To evaluate a trained checkpoint with the size-based metrics, add the following arguments to the script.
```bash
--checkpoint_path train_log/resnet_imagenet_ours/last_checkpoint.pth.tar \
--eval_on_val_and_test False \
--eval_size_ratio True
```
Hyperparameters (λ₁, λ₂, λ₃, τ, ν) for each architecture and dataset:

ImageNet
| Architecture | λ₁ | λ₂ | λ₃ | τ | ν |
|---|---|---|---|---|---|
| ResNet50 | 0.90 | 0.10 | 0.90 | 0.15 | 0.30 |
| VGG16 | 0.80 | 0.50 | 0.80 | 0.50 | 0.90 |
| Inception | 0.60 | 0.70 | 0.70 | 0.50 | 0.60 |

CUB
| Architecture | λ₁ | λ₂ | λ₃ | τ | ν |
|---|---|---|---|---|---|
| ResNet50 | 0.50 | 0.20 | 0.80 | 0.20 | 0.80 |
| VGG16 | 0.90 | 0.10 | 0.70 | 0.70 | 0.20 |
| Inception | 1.00 | 0.20 | 0.80 | 0.40 | 0.10 |

OpenImages
| Architecture | λ₁ | λ₂ | λ₃ | τ | ν |
|---|---|---|---|---|---|
| ResNet50 | 1.50 | 0.50 | 0.50 | 0.30 | 1.00 |
| VGG16 | 1.00 | 0.70 | 0.70 | 0.05 | 0.10 |
| Inception | 0.20 | 1.30 | 0.90 | 0.15 | 0.10 |
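If you want to plug these values into your own scripts, note that the flag names below are our assumption, inferred from the `CropCAM` arguments in the integration snippet further below (λ₁/λ₂/λ₃ as `--loss_ratio`/`--loss_pos`/`--loss_neg`, τ as `--crop_threshold`, ν as `--crop_ratio`); please verify the mapping against the provided scripts:

```bash
# Hypothetical flags for the ResNet50 / ImageNet row of the table above.
--loss_ratio 0.90 \
--loss_pos 0.10 \
--loss_neg 0.90 \
--crop_threshold 0.15 \
--crop_ratio 0.30 \
```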
- `--eval_on_val_and_test`: whether to evaluate on the val or the test split.
- `--eval_size_ratio`: print the scores evaluated with the `MaxBoxAcc^S` and `MaxBoxAcc^mean` metrics.
You can download pre-trained models here.
- Pretrained models trained on ImageNet, CUB, and OpenImages using three architectures (ResNet50, VGG16, InceptionV3) are available.
- File name example: `resnet_imagenet_ours.pth.tar` is the model trained on the ImageNet dataset using ResNet50.
- We uploaded all the models to the Google Drive of an anonymous account.
To evaluate a pre-trained model, set the `--checkpoint_path` argument to the path of the downloaded file, for example:
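A minimal sketch, assuming the downloaded checkpoint was saved to a local `checkpoints/` directory (placeholder path):

```bash
--checkpoint_path checkpoints/resnet_imagenet_ours.pth.tar \
```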
Our method can be easily applied to other methods:
- First, copy `./wsol/method/crop.py` into your code repository.
- Next, add the following snippets to your code.
```python
# main.py
self.crop_module = wsol.method.CropCAM(self.args.large_feature_map,
                                       self.args.original_feature_map,
                                       architecture=self.args.architecture,
                                       # Hyperparameters
                                       loss_ratio=self.args.loss_ratio,
                                       loss_pos=self.args.loss_pos,
                                       loss_neg=self.args.loss_neg,
                                       crop_threshold=self.args.crop_threshold,
                                       crop_ratio=self.args.crop_ratio,
                                       # For CAAM
                                       attention_cam=self.args.attention_cam,
                                       # For attaching the module to other WSOL methods
                                       wsol_method=self.args.wsol_method,
                                       other_method_loss_ratio=self.args.other_method_loss_ratio,
                                       crop_method_loss_ratio=self.args.crop_method_loss_ratio,
                                       # For the different normalization methods
                                       norm_method=self.args.norm_method,
                                       percentile=self.args.percentile,
                                       crop_with_norm=self.args.crop_with_norm)
```
```python
# main.py
# Use the crop module once crop_start_epoch is reached.
if epoch >= self.args.crop_start_epoch:
    output_dict = self.crop_module.forward(self.model, images, target)
    logits = output_dict["logits"]
    loss, att_loss, cls_loss = self.crop_module.get_loss(output_dict=output_dict, target=target)
    return logits, loss, att_loss, cls_loss
```
```python
# resnet.py
if crop:
    # Return the tensors the crop module needs to build CAMs.
    return {'cam_weights': self.fc.weight[labels],
            'logits': logits, 'feature_map': x}
```
- We assume that your model instance is stored in the variable `self.model`.
- You might need to modify the above snippets to apply them to your code repository.
- In that case, you can refer to our implementation in `main.py` and `wsol/resnet.py`.