[NEW!] Check out our latest work involution, accepted to CVPR'21, which bridges convolution and self-attention operators.
PyTorch implementation of LambdaNetworks: Modeling long-range Interactions without Attention.
Lambda Networks apply the associative law of matrix multiplication to reverse the computation order of self-attention, achieving linear computational complexity with respect to content interactions.
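As a rough illustration of this trick (not the actual layer in this repository, which also includes positional lambdas and multi-query heads), the content branch of a lambda layer can be written as two contractions; the function name and tensor shapes below are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

# Minimal sketch of the content-lambda trick (positional lambdas omitted).
# Shapes: queries q (b, n, k), keys k_ (b, m, k), values v (b, m, d).
def content_lambda(q, k_, v):
    # Standard attention would form an (n x m) map: softmax(q @ k_.T) @ v
    # Lambda layers instead normalize the keys and contract them with the
    # values first, producing a compact (k x d) "content lambda":
    k_norm = F.softmax(k_, dim=1)                   # softmax over the m context positions
    lam = torch.einsum('bmk,bmd->bkd', k_norm, v)   # content lambda: (b, k, d)
    # Every query is applied to the same lambda, so cost is linear in n and m:
    return torch.einsum('bnk,bkd->bnd', q, lam)     # output: (b, n, d)

# Example usage:
# y = content_lambda(torch.randn(2, 196, 16), torch.randn(2, 196, 16), torch.randn(2, 196, 64))
```

Because the keys and values are contracted first, no pairwise attention map over all query-context pairs is ever materialized, which is where the linear complexity of the content term comes from.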
Similar techniques have been used previously in A2-Net and CGNL. Check out a collection of self-attention modules in another repository, dot-product-attention.
✓ SGD optimizer, initial learning rate 0.1, momentum 0.9, weight decay 0.0001
✓ 130 epochs, batch size 256, 8x Tesla V100 GPUs, cosine LR decay
✓ label smoothing 0.1 (see the setup sketch below)
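For reference, a minimal sketch of this recipe using the standard torch.optim API; the Conv2d module is only a placeholder standing in for Lambda-ResNet-50, and the data pipeline and 8-GPU distributed launch are omitted:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 64, 3)  # placeholder; swap in Lambda-ResNet-50
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
# cosine decay of the learning rate over the 130 training epochs
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=130)
# label smoothing 0.1 (the label_smoothing argument requires PyTorch >= 1.10)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
```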
| Architecture | Parameters | FLOPs | Top-1 / Top-5 Acc. (%) | Download |
| --- | --- | --- | --- | --- |
| Lambda-ResNet-50 | 14.995M | 6.576G | 78.208 / 93.820 | model \| log |
If you find this repository useful in your research, please cite:
@InProceedings{Li_2021_CVPR,
    author = {Li, Duo and Hu, Jie and Wang, Changhu and Li, Xiangtai and She, Qi and Zhu, Lei and Zhang, Tong and Chen, Qifeng},
    title = {Involution: Inverting the Inherence of Convolution for Visual Recognition},
    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2021}
}

@inproceedings{bello2021lambdanetworks,
    author = {Irwan Bello},
    title = {LambdaNetworks: Modeling long-range Interactions without Attention},
    booktitle = {International Conference on Learning Representations},
    year = {2021}
}