Human segmentation models, training/inference code, and trained weights, implemented in PyTorch.
- UNet: backbones MobileNetV2 (all alphas and expansions), ResNetV1 (all num_layers)
- DeepLabV3+: backbones ResNetV1 (num_layers=18,34,50,101), VGG16_bn
- BiSeNet: backbones ResNetV1 (num_layers=18)
- PSPNet: backbones ResNetV1 (num_layers=18,34,50,101)
- ICNet: backbones ResNetV1 (num_layers=18,34,50,101)
To assess the architecture, memory usage, forward time (on either CPU or GPU), number of parameters, and number of FLOPs of a network, use this command:
python measure_model.py
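As a point of reference, counting parameters in plain PyTorch is a one-liner; the sketch below uses a torchvision MobileNetV2 only as a stand-in and is not the script itself:

```python
import torch
import torchvision

# Stand-in model; measure_model.py reports these figures for the networks listed above.
model = torchvision.models.mobilenet_v2(num_classes=2)
num_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {num_params / 1e6:.2f}M")
```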
Portrait Segmentation (Human/Background)
- Automatic Portrait Segmentation for Image Stylization: 1800 images
- Supervisely Person: 5711 images
- Python 3.6.x is used in this repository.
- Clone the repository:
git clone --recursive https://github.com/AntiAegis/Human-Segmentation-PyTorch.git
cd Human-Segmentation-PyTorch
git submodule sync
git submodule update --init --recursive
- To install the required packages, activate the virtual environment (managed here with virtualenvwrapper) and use pip:
workon humanseg
pip install -r requirements.txt
pip install -e models/pytorch-image-models
- To train a network from scratch, for example DeepLabV3+, use this command:
python train.py --config config/config_DeepLab.json --device 0
where config/config_DeepLab.json is the configuration file, which contains the network, dataloader, optimizer, loss, metric, and visualization settings.
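The snippet below is only a convenience for inspecting such a file; it assumes nothing about the key names and simply prints the top-level sections found in the JSON:

```python
import json

# Print the top-level sections of a training configuration
# (network, dataloader, optimizer, loss, metric, visualization, ...).
with open("config/config_DeepLab.json") as f:
    config = json.load(f)

for section in config:
    print(section)
```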
- To resume training from a checkpoint, use this command:
python train.py --config config/config_DeepLab.json --device 0 --resume path_to_checkpoint/model_best.pth
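For reference, resuming in PyTorch typically amounts to restoring the model and optimizer state from the saved checkpoint. The sketch below only illustrates that mechanism; the model is a stand-in and the checkpoint key names ("state_dict", "optimizer", "epoch") are assumptions that may differ from what train.py actually saves.

```python
import torch
import torchvision

# Stand-in model/optimizer; train.py builds these from the configuration file.
model = torchvision.models.segmentation.deeplabv3_resnet50(num_classes=2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Key names are assumptions, not guaranteed to match the repository's checkpoints.
checkpoint = torch.load("path_to_checkpoint/model_best.pth", map_location="cpu")
model.load_state_dict(checkpoint["state_dict"])
optimizer.load_state_dict(checkpoint["optimizer"])
start_epoch = checkpoint.get("epoch", 0) + 1
```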
- Training progress can be monitored with TensorBoard by enabling the visualization mode in the configuration file.
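Once visualization is enabled, launch TensorBoard and point it at the log directory defined in the configuration file (the directory below is a placeholder; check the configuration for the actual path):

tensorboard --logdir path_to_log_dir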
There are two modes of inference: video and webcam.
python inference_video.py --watch --use_cuda --checkpoint path_to_checkpoint/model_best.pth
python inference_webcam.py --use_cuda --checkpoint path_to_checkpoint/model_best.pth
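At its core, webcam inference looks roughly like the sketch below: read frames with OpenCV, resize to the model input size, run a forward pass, and overlay the predicted mask. The model and preprocessing here are stand-ins, not the repository's actual API.

```python
import cv2
import torch
import torchvision

# Stand-in network; inference_webcam.py builds the model from the checkpoint.
model = torchvision.models.segmentation.deeplabv3_resnet50(num_classes=2).eval()

cap = cv2.VideoCapture(0)
with torch.no_grad():
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Preprocess: resize to 320x320, BGR -> RGB, scale to [0, 1]
        rgb = cv2.cvtColor(cv2.resize(frame, (320, 320)), cv2.COLOR_BGR2RGB)
        x = torch.from_numpy(rgb).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        # Forward pass and argmax over the two classes (background / human)
        logits = model(x)["out"]
        mask = logits.argmax(dim=1)[0].byte().numpy() * 255
        mask = cv2.resize(mask, (frame.shape[1], frame.shape[0]))
        # Simple green overlay of the predicted human region
        overlay = frame.copy()
        overlay[mask > 0] = (0, 255, 0)
        cv2.imshow("segmentation", cv2.addWeighted(frame, 0.6, overlay, 0.4, 0))
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```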
- Networks are trained on a combined dataset built from the two datasets mentioned above, with 6627 training and 737 testing images.
- The model input size is set to 320×320.
- The reported CPU and GPU times are the inference time averaged over 10 runs (preceded by 10 warm-up runs) with batch size 1; see the timing sketch after the results table.
- The mIoU is measured on the testing subset (737 images) from the combined dataset.
- Hardware configuration for benchmarking:
CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
GPU: GeForce GTX 1050 Mobile, CUDA 9.0
Model | Parameters | FLOPs | CPU time | GPU time | mIoU |
---|---|---|---|---|---|
UNet_MobileNetV2 (alpha=1.0, expansion=6) | 4.7M | 1.3G | 167ms | 17ms | 91.37% |
UNet_ResNet18 | 16.6M | 9.1G | 165ms | 21ms | 90.09% |
DeepLabV3+_ResNet18 | 16.6M | 9.1G | 133ms | 28ms | 91.21% |
BiSeNet_ResNet18 | 11.9M | 4.7G | 88ms | 10ms | 87.02% |
PSPNet_ResNet18 | 12.6M | 20.7G | 235ms | 666ms | --- |
ICNet_ResNet18 | 11.6M | 2.0G | 48ms | 55ms | 86.27% |
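For reproducibility, a timing loop in the spirit of the protocol above (10 warm-up runs, then the mean of 10 timed runs at batch size 1) can be written as follows; the model is a placeholder, so substitute one of the networks from this repository.

```python
import time
import torch
import torchvision

def average_forward_time(model, device, input_size=320, warmup=10, runs=10):
    """Mean forward time in milliseconds for batch size 1, after warm-up runs."""
    model = model.to(device).eval()
    x = torch.randn(1, 3, input_size, input_size, device=device)
    with torch.no_grad():
        for _ in range(warmup):
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()
        start = time.time()
        for _ in range(runs):
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()
    return (time.time() - start) / runs * 1000

# Placeholder network; replace with a model built by this repository.
net = torchvision.models.mobilenet_v2(num_classes=2)
print(f"CPU: {average_forward_time(net, torch.device('cpu')):.1f}ms")
if torch.cuda.is_available():
    print(f"GPU: {average_forward_time(net, torch.device('cuda')):.1f}ms")
```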