By Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He
UC San Diego, Facebook AI Research
- Introduction
- Citation
- Requirements and Dependencies
- Training
- ImageNet Pretrained Models
- Third-party re-implementations
- Congrats to the ILSVRC 2017 classification challenge winner WMW. ResNeXt is the foundation of their new SENet architecture (a ResNeXt-152 (64 x 4d) with the Squeeze-and-Excitation module)!
- Check out Figure 6 in the new Memory-Efficient Implementation of DenseNets paper for a comparison between ResNeXts and DenseNets. (DenseNet cosine is a DenseNet trained with a cosine learning rate schedule.)
This repository contains a Torch implementation of the ResNeXt algorithm for image classification. The code is based on fb.resnet.torch.
ResNeXt is a simple, highly modularized network architecture for image classification. Our network is constructed by repeating a building block that aggregates a set of transformations with the same topology. Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy exposes a new dimension, which we call “cardinality” (the size of the set of transformations), as an essential factor in addition to the dimensions of depth and width.
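As a rough sketch of the idea in symbols (the notation here is chosen for illustration: $x$ is the block input, $\mathcal{T}_i$ is the $i$-th transformation, all sharing one topology, and $C$ is the cardinality), the aggregated transformation and the resulting residual block can be written as:

$$
\mathcal{F}(x) = \sum_{i=1}^{C} \mathcal{T}_i(x), \qquad y = x + \sum_{i=1}^{C} \mathcal{T}_i(x)
$$

Setting $C = 1$ gives a single transformation plus the identity shortcut, so under this reading cardinality is an extra axis to vary alongside depth and width.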
Figure: Training curves on ImageNet-1K. (Left): ResNet/ResNeXt-50 with the same complexity (~4.1 billion FLOPs, ~25 million parameters); (Right): ResNet/ResNeXt-101 with the same complexity (~7.8 billion FLOPs, ~44 million parameters).
If you use ResNeXt in your research, please cite the paper:
@article{Xie2016,
title={Aggregated Residual Transformations for Deep Neural Networks},
author={Saining Xie and Ross Girshick and Piotr Dollár and Zhuowen Tu and Kaiming He},
journal={arXiv preprint arXiv:1611.05431},
year={2016}
}
See the fb.resnet.torch installation instructions for a step-by-step guide.
- Install Torch on a machine with a CUDA GPU
- Install cuDNN v4 or v5 and the Torch cuDNN bindings
- Download the ImageNet dataset and move validation images to labeled subfolders
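The last step, moving validation images into labeled subfolders, can be sketched in Python as below. This is only an illustration, not the script from fb.resnet.torch: the function name `sort_into_class_folders` and the `labels` mapping (image file name to class label) are assumptions made for this example, and how you build that mapping depends on the label file you have.

```python
import shutil
from pathlib import Path


def sort_into_class_folders(val_dir, labels):
    """Move each image in val_dir into a subfolder named after its label.

    `labels` is a hypothetical dict mapping image file name -> class label
    (for ImageNet, a WordNet ID such as "n01440764").
    Returns the number of files that were moved.
    """
    val_dir = Path(val_dir)
    moved = 0
    for filename, label in labels.items():
        src = val_dir / filename
        if not src.is_file():
            continue  # already moved or not present; skip it
        class_dir = val_dir / label
        class_dir.mkdir(exist_ok=True)
        shutil.move(str(src), str(class_dir / filename))
        moved += 1
    return moved
```

Files that are missing from `val_dir` are skipped, so the function can be re-run safely after a partial move.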
Please follow fb.resnet.torch for the general usage of the code, including how to use pretrained ResNeXt models for your own task.
Two new hyperparameters must be specified to determine the bottleneck template:
-baseWidth and -cardinality
baseWidth | cardinality |
---|---|
64 | 1 |
40 | 2 |
24 | 4 |
14 | 8 |
4 | 32 |
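The pairs in the table above appear to trade width for cardinality while keeping the block complexity roughly constant. A minimal sketch of that check follows, using the weight-count formula C · (256·d + 3·3·d·d + d·256) from the paper for a bottleneck with 256 input and output channels. It assumes that baseWidth is the per-path width d in that block and ignores biases and batch normalization; `bottleneck_params` is a name made up for this example.

```python
def bottleneck_params(cardinality, base_width, in_out_channels=256):
    """Approximate weight count of one bottleneck block.

    Each of the `cardinality` paths is assumed to be
    1x1 (in -> d), 3x3 (d -> d), 1x1 (d -> out), with d = base_width.
    Biases and batch-norm parameters are ignored.
    """
    d = base_width
    per_path = in_out_channels * d + 3 * 3 * d * d + d * in_out_channels
    return cardinality * per_path


# (baseWidth, cardinality) pairs from the table above
configs = [(64, 1), (40, 2), (24, 4), (14, 8), (4, 32)]
for base_width, cardinality in configs:
    print(base_width, cardinality, bottleneck_params(cardinality, base_width))
```

Under these assumptions every configuration lands close to 70k weights (between 69,632 and 71,456).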
To train ResNeXt-50 (32x4d) on 8 GPUs for ImageNet:
th main.lua -dataset imagenet -bottleneckType resnext_C -depth 50 -baseWidth 4 -cardinality 32 -batchSize 256 -nGPU 8 -nThreads 8 -shareGradInput true -data [imagenet-folder]
To reproduce CIFAR results (e.g. ResNeXt 16x64d for cifar10) on 8 GPUs:
th main.lua -dataset cifar10 -bottleneckType resnext_C -depth 29 -baseWidth 64 -cardinality 16 -weightDecay 5e-4 -batchSize 128 -nGPU 8 -nThreads 8 -shareGradInput true
To get comparable results using 2/4 GPUs, you should change the batch size and the corresponding learning rate:
th main.lua -dataset cifar10 -bottleneckType resnext_C -depth 29 -baseWidth 64 -cardinality 16 -weightDecay 5e-4 -batchSize 64 -nGPU 4 -LR 0.05 -nThreads 8 -shareGradInput true
th main.lua -dataset cifar10 -bottleneckType resnext_C -depth 29 -baseWidth 64 -cardinality 16 -weightDecay 5e-4 -batchSize 32 -nGPU 2 -LR 0.025 -nThreads 8 -shareGradInput true
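The two commands above follow a linear scaling pattern: halving the batch size halves the learning rate, and the per-GPU batch stays at 16. A small sketch of that rule is shown below. It assumes the 8-GPU command, which passes no -LR flag, runs at a learning rate of 0.1 with batch size 128; that value is consistent with the 0.05 and 0.025 settings but is an assumption here, and the helper names are made up for this example.

```python
def scaled_lr(batch_size, ref_batch_size=128, ref_lr=0.1):
    """Scale the learning rate linearly with the total batch size.

    ref_batch_size and ref_lr are assumed reference values (see lead-in).
    """
    return ref_lr * batch_size / ref_batch_size


def per_gpu_batch(batch_size, n_gpu):
    """Number of samples each GPU sees per step."""
    return batch_size // n_gpu


# (batchSize, nGPU) from the three CIFAR commands
for batch_size, n_gpu in [(128, 8), (64, 4), (32, 2)]:
    print(batch_size, n_gpu, scaled_lr(batch_size), per_gpu_batch(batch_size, n_gpu))
```

For other GPU counts the same rule gives a starting point, not a guarantee of matching accuracy.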
Note: CIFAR datasets are downloaded and processed automatically the first time they are used. The CIFAR results in the arXiv paper are based on pre-activated bottleneck blocks and a batch size of 256. We found that better CIFAR test accuracy can be achieved using the original bottleneck blocks and a batch size of 128.
ImageNet pretrained models are licensed under CC BY-NC 4.0.
Network | GFLOPs | Top-1 Error (%) | Download |
---|---|---|---|
ResNet-50 (1x64d) | ~4.1 | 23.9 | Original ResNet-50 |
ResNeXt-50 (32x4d) | ~4.1 | 22.2 | Download (191MB) |
ResNet-101 (1x64d) | ~7.8 | 22.0 | Original ResNet-101 |
ResNeXt-101 (32x4d) | ~7.8 | 21.2 | Download (338MB) |
ResNeXt-101 (64x4d) | ~15.6 | 20.4 | Download (638MB) |
Besides our Torch implementation, we also recommend the following third-party re-implementations and extensions: