Skip to content

Latest commit

 

History

History

hornet

HorNet

HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions

Abstract

Recent progress in vision Transformers exhibits great success in various tasks driven by the new spatial modeling mechanism based on dot-product self-attention. In this paper, we show that the key ingredients behind the vision Transformers, namely input-adaptive, long-range and high-order spatial interactions, can also be efficiently implemented with a convolution-based framework. We present the Recursive Gated Convolution (g nConv) that performs high-order spatial interactions with gated convolutions and recursive designs. The new operation is highly flexible and customizable, which is compatible with various variants of convolution and extends the two-order interactions in self-attention to arbitrary orders without introducing significant extra computation. g nConv can serve as a plug-and-play module to improve various vision Transformers and convolution-based models. Based on the operation, we construct a new family of generic vision backbones named HorNet. Extensive experiments on ImageNet classification, COCO object detection and ADE20K semantic segmentation show HorNet outperform Swin Transformers and ConvNeXt by a significant margin with similar overall architecture and training configurations. HorNet also shows favorable scalability to more training data and a larger model size. Apart from the effectiveness in visual encoders, we also show g nConv can be applied to task-specific decoders and consistently improve dense prediction performance with less computation. Our results demonstrate that g nConv can be a new basic module for visual modeling that effectively combines the merits of both vision Transformers and CNNs. Code is available at https://github.com/raoyongming/HorNet.

How to use it?

Predict image

from mmpretrain import inference_model

predict = inference_model('hornet-tiny_3rdparty_in1k', 'demo/bird.JPEG')
print(predict['pred_class'])
print(predict['pred_score'])

Use the model

import torch
from mmpretrain import get_model

model = get_model('hornet-tiny_3rdparty_in1k', pretrained=True)
inputs = torch.rand(1, 3, 224, 224)
out = model(inputs)
print(type(out))
# To extract features.
feats = model.extract_feat(inputs)
print(type(feats))

Test Command

Prepare your dataset according to the docs.

Test:

python tools/test.py configs/hornet/hornet-tiny_8xb128_in1k.py https://download.openmmlab.com/mmclassification/v0/hornet/hornet-tiny_3rdparty_in1k_20220915-0e8eedff.pth

Models and results

Image Classification on ImageNet-1k

Model Pretrain Params (M) Flops (G) Top-1 (%) Top-5 (%) Config Download
hornet-tiny_3rdparty_in1k* From scratch 22.41 3.98 82.84 96.24 config model
hornet-tiny-gf_3rdparty_in1k* From scratch 22.99 3.90 82.98 96.38 config model
hornet-small_3rdparty_in1k* From scratch 49.53 8.83 83.79 96.75 config model
hornet-small-gf_3rdparty_in1k* From scratch 50.40 8.71 83.98 96.77 config model
hornet-base_3rdparty_in1k* From scratch 87.26 15.58 84.24 96.94 config model
hornet-base-gf_3rdparty_in1k* From scratch 88.42 15.42 84.32 96.95 config model

Models with * are converted from the official repo. The config files of these models are only for inference. We haven't reproduce the training results.

Citation

@article{rao2022hornet,
  title={HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions},
  author={Rao, Yongming and Zhao, Wenliang and Tang, Yansong and Zhou, Jie and Lim, Ser-Lam and Lu, Jiwen},
  journal={arXiv preprint arXiv:2207.14284},
  year={2022}
}