In-Place Activated BatchNorm for Memory-Optimized Training of DNNs


In-Place Activated BatchNorm

In-Place Activated BatchNorm for Memory-Optimized Training of DNNs

In-Place Activated BatchNorm (InPlace-ABN) is a novel approach to reduce the memory required for training deep networks. It allows for up to 50% memory savings in modern architectures such as ResNet, ResNeXt and Wider ResNet by redefining BN + non-linear activation as a single in-place operation, while dropping or recomputing intermediate buffers as needed.

This repository contains a PyTorch implementation of the InPlace-ABN layer, as well as some training scripts to reproduce the ImageNet classification results reported in our paper.

We have also released the inference code for semantic segmentation, together with the model trained on Mapillary Vistas that reached the #1 position on the Mapillary Vistas Semantic Segmentation leaderboard. More information can be found at the bottom of this page.

Citation

If you use In-Place Activated BatchNorm in your research, please cite:

@inproceedings{rotabulo2017place,
  title={In-Place Activated BatchNorm for Memory-Optimized Training of DNNs},
  author={Rota Bul\`o, Samuel and Porzi, Lorenzo and Kontschieder, Peter},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2018}
}

Overview

When processing a BN-Activation-Convolution sequence in the forward pass, most deep learning frameworks need to store two big buffers: the input x of BN and the input z of Conv. This is necessary because the standard implementations of the backward passes of BN and Conv depend on their inputs to calculate the gradients. By replacing the BN-Activation sequence with InPlace-ABN, we can safely discard x, thus saving up to 50% GPU memory at training time. To achieve this, we rewrite the backward pass of BN in terms of its output y, which is in turn reconstructed from z by inverting the activation function.

The parametrization of the BN scaling factor differs from standard BN, in order to ensure an invertible transformation. Specifically, the scaling factor becomes |γ| + ε, where γ is the learned BN weight and ε is a small positive constant.
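To make the invertibility argument concrete, here is a minimal scalar sketch in plain Python. It assumes a leaky ReLU activation with a non-zero slope and a scaling factor of the form |γ| + ε; the function names and the default values of eps and slope are illustrative choices, not the library's API.

```python
import math


def leaky_relu(y, slope=0.01):
    """Leaky ReLU: identity for non-negative inputs, scaled otherwise."""
    return y if y >= 0 else slope * y


def leaky_relu_inverse(z, slope=0.01):
    """Invert leaky ReLU; only possible because slope is non-zero."""
    return z if z >= 0 else z / slope


def abn_forward(x_hat, gamma, beta, eps=1e-5, slope=0.01):
    """Scale-and-shift of an already normalized value, then activation.

    The scale |gamma| + eps is strictly positive (assumed parametrization),
    so the affine step can always be undone.
    """
    scale = abs(gamma) + eps
    y = scale * x_hat + beta
    return leaky_relu(y, slope)


def abn_recover_normalized_input(z, gamma, beta, eps=1e-5, slope=0.01):
    """Reconstruct y from z, then the normalized input from y."""
    scale = abs(gamma) + eps
    y = leaky_relu_inverse(z, slope)
    return (y - beta) / scale


# Round trip: the stored output z is enough to recover the normalized input.
z = abn_forward(-1.5, gamma=0.0, beta=0.3)
assert math.isclose(abn_recover_normalized_input(z, gamma=0.0, beta=0.3), -1.5,
                    rel_tol=1e-6)
```

Note that the round trip still works with gamma = 0, which would not be invertible if the raw weight were used as the scale.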

Requirements

To install PyTorch, please refer to https://github.com/pytorch/pytorch#installation.

NOTE 1: our code requires PyTorch v1.1 or later

NOTE 2: we are only able to provide support for Linux platforms and CUDA versions >= 10.0

NOTE 3: in general, it is not possible to load weights from a network trained with standard BN into an InPlace-ABN network without severe performance degradation, due to the different handling of BN scaling parameters

To install the package containing the iABN layers:

pip install inplace-abn

Note that some parts of InPlace-ABN have native C++/CUDA implementations, meaning that the command above will need to compile them.

Alternatively, to download and install the latest version of our library, also obtaining a copy of the Imagenet / Vistas scripts:

git clone https://github.com/mapillary/inplace_abn.git
cd inplace_abn
python setup.py install
cd scripts
pip install -r requirements.txt

The last of the commands above will install some additional libraries required by the Imagenet / Vistas scripts.

Force compiling with CUDA

In order to force the compilation of the native CUDA functions on systems that do not have access to a GPU (e.g. Docker containers), two environment variables have to be set:

export TORCH_CUDA_ARCH_LIST="{archs}"
export IABN_FORCE_CUDA=1

where {archs} is a list of target CUDA architectures, given either by name or by compute capability, e.g. Pascal;Volta or 6.0;7.0.

Training on ImageNet-1k

Here you can find the results from our arXiv paper (top-1 / top-5 scores), together with the corresponding trained models and their md5 checksums. The model files provided below are made available under the license attached to ImageNet.

| Network | Batch size | Crop 224 | Crop 224, 10-crops | Crop 320 | Trained model (md5) |
|---|---|---|---|---|---|
| ResNeXt101, Std-BN | 256 | 77.04 / 93.50 | 78.72 / 94.47 | 77.92 / 94.28 | 448438885986d14db5e870b95f814f91 |
| ResNeXt101, InPlace-ABN | 512 | 78.08 / 93.79 | 79.52 / 94.66 | 79.38 / 94.67 | 3b7a221cbc076410eb12c8dd361b7e4e |
| ResNeXt152, InPlace-ABN | 256 | 78.28 / 94.04 | 79.73 / 94.82 | 79.56 / 94.67 | 2c8d572587961ed74611d534c5b2e9ce |
| WideResNet38, InPlace-ABN | 256 | 79.72 / 94.78 | 81.03 / 95.43 | 80.69 / 95.27 | 1c085ab70b789cc1d6c1594f7a761007 |
| ResNeXt101, InPlace-ABN sync | 256 | 77.70 / 93.78 | 79.18 / 94.60 | 78.98 / 94.56 | 0a85a21847b15e5a242e17bf3b753849 |
| DenseNet264, InPlace-ABN | 256 | 78.57 / 94.17 | 79.72 / 94.93 | 79.49 / 94.89 | 0b413d67b725619441d0646d663865bf |
| ResNet50v1, InPlace-ABN sync | 512 | 75.53 / 92.59 | 77.04 / 93.57 | 76.60 / 93.49 | 2522ca639f7fdfd7c0089ba1f5f6c2e8 |
| ResNet34v1, InPlace-ABN sync | 512 | 73.27 / 91.34 | 75.19 / 92.66 | 74.87 / 92.42 | 61515c1484911c3cc753d405131e1dda |
| ResNet101v1, InPlace-ABN sync | 512 | 77.07 / 93.45 | 78.58 / 94.40 | 78.25 / 94.19 | 1552ae0f3d610108df702135f56bd27b |
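Since each model is listed with an md5 checksum, a downloaded file can be checked before use. The following is a minimal sketch using only Python's hashlib; the helper names md5_of_file and matches_checksum, and the example path in the usage comment, are hypothetical and not part of this repository.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's md5 with an expected hex string (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Hypothetical usage, with the checksum copied from the list above:
# matches_checksum("/path/to/model.pth.tar", "3b7a221cbc076410eb12c8dd361b7e4e")
```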

Data preparation

Our script uses torchvision.datasets.ImageFolder for loading ImageNet data, which expects folders organized as follows:

root/train/[class_id1]/xxx.{jpg,png,jpeg}
root/train/[class_id1]/xxy.{jpg,png,jpeg}
root/train/[class_id2]/xxz.{jpg,png,jpeg}
...

root/val/[class_id1]/asdas.{jpg,png,jpeg}
root/val/[class_id1]/123456.{jpg,png,jpeg}
root/val/[class_id2]/__32_.{jpg,png,jpeg}
...

Images can have any name, as long as the extension is that of a recognized image format. Class ids are also free-form, but they are expected to match between train and validation data. Note that the training data in the standard ImageNet distribution is already given in the required format, while validation images need to be split into class sub-folders as described above.
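As a rough sketch of the validation split described above, the helper below moves each validation image into a sub-folder named after its class id. It assumes a plain-text labels file with one "file_name class_id" pair per line; that file format and the function name are assumptions made for illustration, as the standard ImageNet distribution ships its validation labels in a different form that would first need converting.

```python
import shutil
from pathlib import Path


def split_val_into_class_folders(val_dir, labels_file):
    """Move images in val_dir into val_dir/<class_id>/ sub-folders.

    labels_file is assumed (hypothetically) to contain lines of the form
    "<file_name> <class_id>". Returns the number of images moved.
    """
    val_dir = Path(val_dir)
    moved = 0
    with open(labels_file) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            file_name, class_id = parts
            src = val_dir / file_name
            if not src.is_file():
                continue  # already moved, or not present
            dst_dir = val_dir / class_id
            dst_dir.mkdir(exist_ok=True)
            shutil.move(str(src), str(dst_dir / file_name))
            moved += 1
    return moved
```

The class ids used here must be the same folder names as under root/train, since the two splits are expected to match.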

Training

The main training script is scripts/train_imagenet.py. It supports training on ImageNet, or any other dataset formatted as described above, while keeping a log of relevant metrics in Tensorboard format and periodically saving snapshots. Most training parameters can be specified in a JSON-formatted configuration file. All parameters not explicitly specified in the configuration file are set to their defaults; the complete list of configurable parameters and their default values is available in scripts/imagenet/config.py.
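The "unspecified parameters fall back to defaults" behaviour can be pictured as a recursive merge of the JSON file over a dictionary of defaults. The sketch below only illustrates that idea: the section and parameter names in DEFAULTS are hypothetical, and the actual logic and names are those in scripts/imagenet/config.py.

```python
import copy
import json

# Hypothetical defaults, for illustration only.
DEFAULTS = {
    "optimizer": {"lr": 0.1, "momentum": 0.9},
    "data": {"batch_size": 256},
}


def merge_config(defaults, overrides):
    """Return a copy of defaults with overrides applied section by section."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_config(json_text, defaults=DEFAULTS):
    """Parse a JSON configuration string and fill in missing defaults."""
    return merge_config(defaults, json.loads(json_text))


config = load_config('{"data": {"batch_size": 512}}')
assert config["data"]["batch_size"] == 512   # overridden by the file
assert config["optimizer"]["lr"] == 0.1      # taken from the defaults
```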

Our arXiv results can be reproduced by running scripts/train_imagenet.py with the configuration files in scripts/experiments. As an example, the command to train ResNeXt101 with InPlace-ABN, Leaky ReLU and batch_size = 512 is:

cd scripts
python -m torch.distributed.launch --nproc_per_node <n. GPUs per node> train_imagenet.py --log-dir /path/to/tensorboard/logs experiments/resnext101_ipabn_lr_512.json /path/to/imagenet/root

Validation

Validation is run by scripts/train_imagenet.py at the end of every training epoch. To validate a trained model, you can use the scripts/test_imagenet.py script, which allows for 10-crops validation and transferring weights across compatible networks (e.g. from ResNeXt101 with ReLU to ResNeXt101 with Leaky ReLU). This script accepts the same configuration files as scripts/train_imagenet.py, but note that the scale_val and crop_val parameters are ignored in favour of the --scale and --crop command-line arguments.

As an example, to validate the ResNeXt101 trained above using 10-crops of size 224 from images scaled to 256 pixels, you can run:

cd scripts
python -m torch.distributed.launch --nproc_per_node <n. GPUs per node> test_imagenet.py --crop 224 --scale 256 --ten_crops experiments/resnext101_ipabn_lr_512.json /path/to/checkpoint /path/to/imagenet/root
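For readers unfamiliar with 10-crops evaluation, the sketch below enumerates the ten views usually meant by the term: the four corner crops and the center crop, each with and without horizontal mirroring. It is an illustration in plain Python, not the code used by test_imagenet.py; the function names are hypothetical, and it assumes the image has already been rescaled.

```python
def five_crop_boxes(width, height, crop):
    """Return (left, top, right, bottom) boxes for 4 corners + center."""
    if crop > width or crop > height:
        raise ValueError("crop size must not exceed the image size")
    left_c = int(round((width - crop) / 2.0))
    top_c = int(round((height - crop) / 2.0))
    return [
        (0, 0, crop, crop),                            # top-left
        (width - crop, 0, width, crop),                # top-right
        (0, height - crop, crop, height),              # bottom-left
        (width - crop, height - crop, width, height),  # bottom-right
        (left_c, top_c, left_c + crop, top_c + crop),  # center
    ]


def ten_crop_views(width, height, crop):
    """Return ten (box, mirrored) pairs: five crops, plain and mirrored."""
    boxes = five_crop_boxes(width, height, crop)
    return [(box, False) for box in boxes] + [(box, True) for box in boxes]


# Crops of size 224 from a 256 x 256 image, as in the command above.
views = ten_crop_views(256, 256, 224)
assert len(views) == 10
assert views[4] == ((16, 16, 240, 240), False)  # center crop, not mirrored
```

The network's predictions on the ten views are then typically averaged to produce the final score for the image.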

Usage for Semantic Segmentation on Cityscapes and Mapillary Vistas

We have successfully used InPlace-ABN with a DeepLab3 segmentation head that was trained on top of the WideResNet38 model above. Thanks to the memory saved by InPlace-ABN, we can significantly increase the amount of input data fed to this model, which eventually allowed us to obtain #1 positions on the Cityscapes, Mapillary Vistas, AutoNUE, Kitti and ScanNet segmentation leaderboards. The training settings mostly follow the description in our paper.

Mapillary Vistas pre-trained model

We release our WideResNet38 + DeepLab3 segmentation model trained on the Mapillary Vistas research set. This is the model used to reach #1 position on the MVD semantic segmentation leaderboard. The segmentation model file provided below is made available under a CC BY-NC-SA 4.0 license.

| Network | mIOU | Trained model (md5) |
|---|---|---|
| WideResNet38 + DeepLab3 | 53.42 | 913f78486a34aa1577a7cd295e8a33bb |

To use this, please download the .pth.tar model file linked above and run the test_vistas.py script as follows:

cd scripts
python test_vistas.py /path/to/model.pth.tar /path/to/input/folder /path/to/output/folder

The script will process all .png, .jpg and .jpeg images from the input folder and write the predictions in the output folder as .png images. For additional options, e.g. test time augmentation, please consult the script's help message.

The test-set result reported above was obtained using only scale 1.0 + flipping.
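Flipping as test-time augmentation generally means running the model on the image and on its horizontal mirror, mirroring the second prediction back, and averaging the two. The sketch below shows this on nested Python lists; model_fn is a hypothetical stand-in for a segmentation model that maps a 2D input to a 2D score map of the same size, and this is not the code used by test_vistas.py.

```python
def hflip(grid):
    """Mirror a 2D list (rows of values) left to right."""
    return [row[::-1] for row in grid]


def predict_with_flip(model_fn, image):
    """Average the prediction on the image and on its mirrored copy."""
    direct = model_fn(image)
    # Predict on the mirrored input, then mirror the result back so that
    # both score maps are aligned with the original image.
    flipped_back = hflip(model_fn(hflip(image)))
    return [
        [(a + b) / 2.0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(direct, flipped_back)
    ]


# Toy model whose output depends on the column position (hypothetical).
def toy_model(image):
    return [[value + col for col, value in enumerate(row)] for row in image]


assert predict_with_flip(toy_model, [[0, 0, 0]]) == [[1.0, 1.0, 1.0]]
```

In the toy example the model's left-to-right bias (0, 1, 2) cancels out after averaging, which is the intended effect of the augmentation.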

Changelog

Update 04 Jul. 2019: version 1.0.0

  • Complete rewrite of the CUDA code following the most recent native BN implementation from PyTorch
  • Improved synchronized BN implementation, correctly handling different per-GPU batch sizes and PyTorch distributed groups
  • The iABN layers are now packaged in an installable Python library to simplify use in other projects
  • The Imagenet / Vistas scripts are still available in the scripts folder
  • Now requires PyTorch 1.1

Update 08 Jan. 2019:

  • Enabled multiprocessing and InPlace-ABN synchronization over multiple processes (previously using threads). This now requires using DistributedDataParallel instead of DataParallel
  • Added compatibility with fp16 (currently allows fp16 input but requires the module to stay in fp32 mode)
  • Now requires PyTorch 1.0

Update Feb. 2019:

  • Added ResNet34v1, ResNet50v1 and ResNet101v1 ImageNet-1k pre-trained models

We have modified the ImageNet training code and BN synchronization in order to work with multiple processes. We have also added compatibility of our InPlace-ABN module with fp16.
