Eduardo committed on Dec 5, 2019 · 0 parents · commit d4a46fa · 27 changed files with 3,240 additions and 0 deletions.
`.gitignore`:
```
# Files
*.pyc
*.pyo
*/__pycache__/
*/*/__pycache__/
*/*/*/__pycache__/
eval/save_results/
eval/save_color/
save/
```
# ERFNet (PyTorch version)

This code is a toolbox that uses **PyTorch** for training and evaluating the **ERFNet** architecture for semantic segmentation.

**For the original Torch version, please go [HERE](https://github.com/Eromera/erfnet).**

NOTE: This PyTorch version achieves slightly better results than the Torch version used in the paper: 72.1 IoU on the Cityscapes val set and 69.8 IoU on the test set.

![Example segmentation](example_segmentation.png?raw=true "Example segmentation")
## Publications

If you use this software in your research, please cite our publications:

**"Efficient ConvNet for Real-time Semantic Segmentation"**, E. Romera, J. M. Alvarez, L. M. Bergasa and R. Arroyo, IEEE Intelligent Vehicles Symposium (IV), pp. 1789-1794, Redondo Beach (California, USA), June 2017. **[Best Student Paper Award]** [[pdf]](http://www.robesafe.uah.es/personal/eduardo.romera/pdfs/Romera17iv.pdf)

**"ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation"**, E. Romera, J. M. Alvarez, L. M. Bergasa and R. Arroyo, IEEE Transactions on Intelligent Transportation Systems (T-ITS), December 2017. [[pdf]](http://www.robesafe.uah.es/personal/eduardo.romera/pdfs/Romera17tits.pdf)
## Packages

For instructions, please refer to the README in each folder:

* [train](train) contains tools for training the network for semantic segmentation.
* [eval](eval) contains tools for evaluating/visualizing the network's output.
* [imagenet](imagenet) contains the script and model for pretraining ERFNet's encoder on ImageNet.
* [trained_models](trained_models) contains the trained models used in the papers. NOTE: the PyTorch models are slightly different from the Torch models.
## Requirements

* [**The Cityscapes dataset**](https://www.cityscapes-dataset.com/): Download "leftImg8bit" for the RGB images and "gtFine" for the labels. **Please note that for training you should use the "_labelTrainIds" files and not the "_labelIds" ones; you can download the [cityscapes scripts](https://github.com/mcordts/cityscapesScripts) and use the [converter](https://github.com/mcordts/cityscapesScripts/blob/master/cityscapesscripts/preparation/createTrainIdLabelImgs.py) to generate trainIds from labelIds** (a single-file remapping sketch is shown after this list).
* [**Python 3.6**](https://www.python.org/): If you don't have Python 3.6 on your system, I recommend installing it with [Anaconda](https://www.anaconda.com/download/#linux).
* [**PyTorch**](http://pytorch.org/): Make sure to install the PyTorch version for Python 3.6 with CUDA support (the code has only been tested with CUDA 8.0).
* **Additional Python packages**: numpy, matplotlib, Pillow, torchvision and visdom (optional, for the --visualize flag).
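For a single label image, the remapping performed by that converter looks roughly like the hedged sketch below. The filenames are placeholders, and it assumes the cityscapesScripts package is installed so that `cityscapesscripts.helpers.labels` is importable; use the official createTrainIdLabelImgs.py script to convert the whole dataset.

```
# Hedged sketch: remap one "_labelIds" image to "_labelTrainIds" using the label
# definitions shipped with cityscapesScripts. Filenames are placeholders.
import numpy as np
from PIL import Image
from cityscapesscripts.helpers.labels import labels  # assumes cityscapesScripts is installed

# Lookup table from labelId to trainId; unknown/ignored ids map to 255
id_to_trainid = np.full(256, 255, dtype=np.uint8)
for label in labels:
    if 0 <= label.id < 256:
        id_to_trainid[label.id] = label.trainId if 0 <= label.trainId < 255 else 255

label_ids = np.array(Image.open('aachen_000000_000019_gtFine_labelIds.png'))
Image.fromarray(id_to_trainid[label_ids]).save('aachen_000000_000019_gtFine_labelTrainIds.png')
```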
In Anaconda, you can install them with:
```
conda install numpy matplotlib torchvision Pillow
conda install -c conda-forge visdom
```

If you use pip (make sure it is configured for Python 3.6), you can install them with:

```
pip install numpy matplotlib torchvision Pillow visdom
```
## License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which allows personal and research use only. For a commercial license please contact the authors. You can view a license summary here: http://creativecommons.org/licenses/by-nc/4.0/
# Functions for evaluating/visualizing the network's output

Currently there are four usable scripts for evaluation:
- eval_cityscapes_color
- eval_cityscapes_server
- eval_iou
- eval_forwardTime
## eval_cityscapes_color.py

This script produces color segmentations of the Cityscapes images for visualization purposes. By default it saves the images in the eval/save_color/ folder. You can also visualize the results in visdom with the --visualize flag.

**Options:** Specify the Cityscapes folder path with the '--datadir' option. Select the Cityscapes subset with '--subset' ('val', 'test', 'train' or 'demoSequence'). For other options, check the bottom of the file.

**Example:**
```
python eval_cityscapes_color.py --datadir /home/datasets/cityscapes/ --subset val
```
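The color mapping itself can be illustrated with a small hedged sketch that paints a trainIds prediction with the official Cityscapes palette. The filenames are placeholders, it assumes cityscapesScripts is installed, and it is not the script's actual implementation.

```
# Hedged sketch: colorize a trainIds prediction with the Cityscapes palette.
import numpy as np
from PIL import Image
from cityscapesscripts.helpers.labels import labels  # assumes cityscapesScripts is installed

# Lookup table from trainId (0-18) to RGB color; everything else stays black
palette = np.zeros((256, 3), dtype=np.uint8)
for label in labels:
    if 0 <= label.trainId < 255:
        palette[label.trainId] = label.color

pred = np.array(Image.open('prediction_trainIds.png'))        # placeholder filename
Image.fromarray(palette[pred]).save('prediction_color.png')   # RGB visualization
```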
## eval_cityscapes_server.py

This script produces segmentations of the Cityscapes images and converts the output indices back to the original 'labelIds', so the results can be evaluated with the official Cityscapes scripts (evalPixelLevelSemanticLabeling.py) or uploaded to the Cityscapes test server. By default it saves the images in the eval/save_results/ folder.

**Options:** Specify the Cityscapes folder path with the '--datadir' option. Select the Cityscapes subset with '--subset' ('val', 'test', 'train' or 'demoSequence'). For other options, check the bottom of the file.

**Example:**
```
python eval_cityscapes_server.py --datadir /home/datasets/cityscapes/ --subset val
```
## eval_iou.py

This script calculates the IoU (mean and per-class) on a subset for which labels are available, such as the Cityscapes val/train sets.

**Options:** Specify the Cityscapes folder path with the '--datadir' option. Select the Cityscapes subset with '--subset' ('val' or 'train'). For other options, check the bottom of the file.

**Example:**
```
python eval_iou.py --datadir /home/datasets/cityscapes/ --subset val
```
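As a reference for what is being computed, mean and per-class IoU can be derived from a confusion matrix accumulated over the subset. The snippet below is an illustrative sketch, not the repository's own IoU implementation.

```
# Illustrative sketch: per-class and mean IoU from a confusion matrix,
# where conf[i, j] counts pixels of true class i predicted as class j.
import numpy as np

def per_class_iou(conf):
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    denom = tp + fp + fn
    return np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)

conf = np.array([[50, 2],      # toy 2-class example
                 [3, 45]])
ious = per_class_iou(conf)
print(ious, np.nanmean(ious))  # per-class IoU and mean IoU (ignoring absent classes)
```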
## eval_forwardTime.py

This script loads a model specified with '-m' and enters a loop that continuously estimates the forward pass time (fwt) at the specified resolution.

**Options:** The '--width' option specifies the width (default: 1024) and '--height' specifies the height (default: 512). For other options, check the bottom of the file.

**Example:**
```
python eval_forwardTime.py
```

**NOTE**: The paper values were obtained with a single Titan X (Maxwell) and a Jetson TX1 using the original Torch code. The PyTorch code is a bit faster, but CUDA half precision (FP16) currently causes problems with some PyTorch versions, so this code only runs at FP32 (a bit slower).
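For reference, the gist of such a timing loop looks like the hedged sketch below. It assumes a CUDA device and the ERFNet class from the network definition included later in this commit; it is not the script itself.

```
# Hedged sketch of a forward-pass timing loop at the default 1024x512 resolution.
import time
import torch

model = ERFNet(num_classes=20).cuda().eval()   # ERFNet as defined in erfnet.py below
x = torch.randn(1, 3, 512, 1024).cuda()        # dummy input: --height 512, --width 1024

with torch.no_grad():
    for _ in range(10):                        # warm-up iterations
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()                   # wait for all GPU work before reading the clock
print('fwt: %.2f ms' % ((time.time() - start) / 100 * 1000))
```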
```
# Code with dataset loader for VOC12 and Cityscapes (adapted from bodokaiser/piwise code)
# Sept 2017
# Eduardo Romera
#######################

import numpy as np
import os

from PIL import Image

from torch.utils.data import Dataset

EXTENSIONS = ['.jpg', '.png']

def load_image(file):
    return Image.open(file)

def is_image(filename):
    return any(filename.endswith(ext) for ext in EXTENSIONS)

def is_label(filename):
    return filename.endswith("_labelTrainIds.png")

def image_path(root, basename, extension):
    return os.path.join(root, f'{basename}{extension}')

def image_path_city(root, name):
    return os.path.join(root, f'{name}')

def image_basename(filename):
    return os.path.basename(os.path.splitext(filename)[0])


class VOC12(Dataset):

    def __init__(self, root, input_transform=None, target_transform=None):
        self.images_root = os.path.join(root, 'images')
        self.labels_root = os.path.join(root, 'labels')

        self.filenames = [image_basename(f)
            for f in os.listdir(self.labels_root) if is_image(f)]
        self.filenames.sort()

        self.input_transform = input_transform
        self.target_transform = target_transform

    def __getitem__(self, index):
        filename = self.filenames[index]

        with open(image_path(self.images_root, filename, '.jpg'), 'rb') as f:
            image = load_image(f).convert('RGB')
        with open(image_path(self.labels_root, filename, '.png'), 'rb') as f:
            label = load_image(f).convert('P')

        if self.input_transform is not None:
            image = self.input_transform(image)
        if self.target_transform is not None:
            label = self.target_transform(label)

        return image, label

    def __len__(self):
        return len(self.filenames)


class cityscapes(Dataset):

    def __init__(self, root, input_transform=None, target_transform=None, subset='val'):
        self.images_root = os.path.join(root, 'leftImg8bit/' + subset)
        self.labels_root = os.path.join(root, 'gtFine/' + subset)

        self.filenames = [os.path.join(dp, f) for dp, dn, fn in os.walk(os.path.expanduser(self.images_root)) for f in fn if is_image(f)]
        self.filenames.sort()

        self.filenamesGt = [os.path.join(dp, f) for dp, dn, fn in os.walk(os.path.expanduser(self.labels_root)) for f in fn if is_label(f)]
        self.filenamesGt.sort()

        self.input_transform = input_transform
        self.target_transform = target_transform

    def __getitem__(self, index):
        filename = self.filenames[index]
        filenameGt = self.filenamesGt[index]

        #print(filename)

        with open(image_path_city(self.images_root, filename), 'rb') as f:
            image = load_image(f).convert('RGB')
        with open(image_path_city(self.labels_root, filenameGt), 'rb') as f:
            label = load_image(f).convert('P')

        if self.input_transform is not None:
            image = self.input_transform(image)
        if self.target_transform is not None:
            label = self.target_transform(label)

        return image, label, filename, filenameGt

    def __len__(self):
        return len(self.filenames)
```
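A hedged usage sketch for the cityscapes loader defined above. The dataset root, resolution and transforms are illustrative values rather than settings fixed by this file, and the Resize call uses the older PIL-constant interpolation style of the torchvision versions from that era.

```
# Hedged usage sketch: wrap the cityscapes dataset above in a DataLoader.
import numpy as np
import torch
from PIL import Image
from torch.utils.data import DataLoader
from torchvision.transforms import Compose, Resize, ToTensor

def label_to_tensor(img):
    # keep labels as integer class indices (no scaling to [0, 1])
    return torch.from_numpy(np.array(img)).long()

input_transform = Compose([Resize((512, 1024)), ToTensor()])
target_transform = Compose([Resize((512, 1024), Image.NEAREST), label_to_tensor])

dataset = cityscapes('/home/datasets/cityscapes/', input_transform, target_transform, subset='val')
loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=4)

images, labels, names, names_gt = next(iter(loader))
print(images.shape, labels.shape)  # e.g. torch.Size([4, 3, 512, 1024]) torch.Size([4, 512, 1024])
```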
```
# ERFNet full network definition for Pytorch
# Sept 2017
# Eduardo Romera
#######################

import torch
import torch.nn as nn
import torch.nn.init as init
import torch.nn.functional as F


class DownsamplerBlock (nn.Module):
    def __init__(self, ninput, noutput):
        super().__init__()

        self.conv = nn.Conv2d(ninput, noutput-ninput, (3, 3), stride=2, padding=1, bias=True)
        self.pool = nn.MaxPool2d(2, stride=2)
        self.bn = nn.BatchNorm2d(noutput, eps=1e-3)

    def forward(self, input):
        output = torch.cat([self.conv(input), self.pool(input)], 1)
        output = self.bn(output)
        return F.relu(output)


class non_bottleneck_1d (nn.Module):
    def __init__(self, chann, dropprob, dilated):
        super().__init__()

        self.conv3x1_1 = nn.Conv2d(chann, chann, (3, 1), stride=1, padding=(1, 0), bias=True)

        self.conv1x3_1 = nn.Conv2d(chann, chann, (1, 3), stride=1, padding=(0, 1), bias=True)

        self.bn1 = nn.BatchNorm2d(chann, eps=1e-03)

        self.conv3x1_2 = nn.Conv2d(chann, chann, (3, 1), stride=1, padding=(1*dilated, 0), bias=True, dilation=(dilated, 1))

        self.conv1x3_2 = nn.Conv2d(chann, chann, (1, 3), stride=1, padding=(0, 1*dilated), bias=True, dilation=(1, dilated))

        self.bn2 = nn.BatchNorm2d(chann, eps=1e-03)

        self.dropout = nn.Dropout2d(dropprob)

    def forward(self, input):

        output = self.conv3x1_1(input)
        output = F.relu(output)
        output = self.conv1x3_1(output)
        output = self.bn1(output)
        output = F.relu(output)

        output = self.conv3x1_2(output)
        output = F.relu(output)
        output = self.conv1x3_2(output)
        output = self.bn2(output)

        if (self.dropout.p != 0):
            output = self.dropout(output)

        return F.relu(output+input)    #+input = identity (residual connection)


class Encoder(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.initial_block = DownsamplerBlock(3, 16)

        self.layers = nn.ModuleList()

        self.layers.append(DownsamplerBlock(16, 64))

        for x in range(0, 5):    #5 times
            self.layers.append(non_bottleneck_1d(64, 0.1, 1))

        self.layers.append(DownsamplerBlock(64, 128))

        for x in range(0, 2):    #2 times
            self.layers.append(non_bottleneck_1d(128, 0.1, 2))
            self.layers.append(non_bottleneck_1d(128, 0.1, 4))
            self.layers.append(non_bottleneck_1d(128, 0.1, 8))
            self.layers.append(non_bottleneck_1d(128, 0.1, 16))

        #only for encoder mode:
        self.output_conv = nn.Conv2d(128, num_classes, 1, stride=1, padding=0, bias=True)

    def forward(self, input, predict=False):
        output = self.initial_block(input)

        for layer in self.layers:
            output = layer(output)

        if predict:
            output = self.output_conv(output)

        return output


class UpsamplerBlock (nn.Module):
    def __init__(self, ninput, noutput):
        super().__init__()
        self.conv = nn.ConvTranspose2d(ninput, noutput, 3, stride=2, padding=1, output_padding=1, bias=True)
        self.bn = nn.BatchNorm2d(noutput, eps=1e-3)

    def forward(self, input):
        output = self.conv(input)
        output = self.bn(output)
        return F.relu(output)


class Decoder (nn.Module):
    def __init__(self, num_classes):
        super().__init__()

        self.layers = nn.ModuleList()

        self.layers.append(UpsamplerBlock(128, 64))
        self.layers.append(non_bottleneck_1d(64, 0, 1))
        self.layers.append(non_bottleneck_1d(64, 0, 1))

        self.layers.append(UpsamplerBlock(64, 16))
        self.layers.append(non_bottleneck_1d(16, 0, 1))
        self.layers.append(non_bottleneck_1d(16, 0, 1))

        self.output_conv = nn.ConvTranspose2d(16, num_classes, 2, stride=2, padding=0, output_padding=0, bias=True)

    def forward(self, input):
        output = input

        for layer in self.layers:
            output = layer(output)

        output = self.output_conv(output)

        return output


class ERFNet(nn.Module):
    def __init__(self, num_classes, encoder=None):    #use encoder to pass pretrained encoder
        super().__init__()

        if (encoder == None):
            self.encoder = Encoder(num_classes)
        else:
            self.encoder = encoder
        self.decoder = Decoder(num_classes)

    def forward(self, input, only_encode=False):
        if only_encode:
            return self.encoder.forward(input, predict=True)
        else:
            output = self.encoder(input)    #predict=False by default
            return self.decoder.forward(output)
```
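A hedged sanity-check sketch for the model above. The num_classes=20 value and the 512x1024 input are example values (19 Cityscapes classes plus void), and this snippet is not part of the original file.

```
# Hedged sketch: build ERFNet and check that the decoder restores the input resolution.
import torch

model = ERFNet(num_classes=20)   # e.g. 19 Cityscapes training classes + void
model.eval()

with torch.no_grad():
    x = torch.randn(1, 3, 512, 1024)          # dummy RGB image, NCHW
    full = model(x)                           # encoder + decoder
    enc = model(x, only_encode=True)          # encoder-only prediction at 1/8 resolution

print(full.shape)   # expected: torch.Size([1, 20, 512, 1024])
print(enc.shape)    # expected: torch.Size([1, 20, 64, 128])
```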