
Commit

Full reupload after code downtime
Eduardo committed Dec 5, 2019
0 parents commit d4a46fa
Showing 27 changed files with 3,240 additions and 0 deletions.
9 changes: 9 additions & 0 deletions .gitignore
#Files
*.pyc
*.pyo
*/__pycache__/
*/*/__pycache__/
*/*/*/__pycache__/
eval/save_results/
eval/save_color/
save/
49 changes: 49 additions & 0 deletions README.md
# ERFNet (PyTorch version)

This code is a toolbox that uses **PyTorch** for training and evaluating the **ERFNet** architecture for semantic segmentation.

**For the original Torch version please go [HERE](https://github.com/Eromera/erfnet)**

NOTE: This PyTorch version achieves slightly better results than the Torch version used in the paper: 72.1 IoU on the val set and 69.8 IoU on the test set.

![Example segmentation](example_segmentation.png?raw=true "Example segmentation")

## Publications

If you use this software in your research, please cite our publications:

**"Efficient ConvNet for Real-time Semantic Segmentation"**, E. Romera, J. M. Alvarez, L. M. Bergasa and R. Arroyo, IEEE Intelligent Vehicles Symposium (IV), pp. 1789-1794, Redondo Beach (California, USA), June 2017.
**[Best Student Paper Award]** [[pdf]](http://www.robesafe.uah.es/personal/eduardo.romera/pdfs/Romera17iv.pdf)

**"ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation"**, E. Romera, J. M. Alvarez, L. M. Bergasa and R. Arroyo, Transactions on Intelligent Transportation Systems (T-ITS), December 2017. [[pdf]](http://www.robesafe.uah.es/personal/eduardo.romera/pdfs/Romera17tits.pdf)

## Packages
For instructions please refer to the README in each folder:

* [train](train) contains tools for training the network for semantic segmentation.
* [eval](eval) contains tools for evaluating/visualizing the network's output.
* [imagenet](imagenet) contains the script and model for pretraining ERFNet's encoder on ImageNet.
* [trained_models](trained_models) contains the trained models used in the papers. NOTE: the PyTorch models are slightly different from the Torch models.

## Requirements

* [**The Cityscapes dataset**](https://www.cityscapes-dataset.com/): Download "leftImg8bit" for the RGB images and "gtFine" for the labels. **Please note that for training you should use the "_labelTrainIds" images and not the "_labelIds"; you can download the [cityscapes scripts](https://github.com/mcordts/cityscapesScripts) and use the [converter](https://github.com/mcordts/cityscapesScripts/blob/master/cityscapesscripts/preparation/createTrainIdLabelImgs.py) to generate trainIds from labelIds (a usage sketch is given at the end of this section).**
* [**Python 3.6**](https://www.python.org/): If you don't have Python 3.6 on your system, I recommend installing it with [Anaconda](https://www.anaconda.com/download/#linux).
* [**PyTorch**](http://pytorch.org/): Make sure to install the PyTorch version for Python 3.6 with CUDA support (the code has only been tested with CUDA 8.0).
* **Additional Python packages**: numpy, matplotlib, Pillow, torchvision and visdom (optional, needed only for the --visualize flag).

With Anaconda you can install them with:
```
conda install numpy matplotlib torchvision Pillow
conda install -c conda-forge visdom
```

If you use pip (make sure it is configured for Python 3.6), you can install them with:

```
pip install numpy matplotlib torchvision Pillow visdom
```
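
To generate the "_labelTrainIds" images required for training, run the converter from the cityscapesScripts repository on your dataset folder. A minimal sketch, assuming the repository is cloned locally (the dataset path is illustrative; the converter reads the dataset root from the CITYSCAPES_DATASET environment variable):

```
import os
import subprocess

# Point the converter at the Cityscapes root folder (path is illustrative)
os.environ["CITYSCAPES_DATASET"] = "/home/datasets/cityscapes"

# Run the conversion script from the cloned cityscapesScripts repository;
# it writes a "*_labelTrainIds.png" next to each "*_labelIds.png" label
subprocess.run(
    ["python", "cityscapesscripts/preparation/createTrainIdLabelImgs.py"],
    check=True,
)
```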

## License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which allows for personal and research use only. For a commercial license please contact the authors. You can view a license summary here: http://creativecommons.org/licenses/by-nc/4.0/
55 changes: 55 additions & 0 deletions eval/README.md
# Functions for evaluating/visualizing the network's output

There are currently four scripts for evaluating the network's output:
- eval_cityscapes_color
- eval_cityscapes_server
- eval_iou
- eval_forwardTime

## eval_cityscapes_color.py

This script produces color segmentations of the Cityscapes images for visualization purposes. By default it saves the images in the eval/save_color/ folder. You can also visualize results in visdom with the --visualize flag.

**Options:** Specify the Cityscapes folder path with the '--datadir' option. Select the Cityscapes subset with '--subset' ('val', 'test', 'train' or 'demoSequence'). For other options, check the bottom of the file.

**Examples:**
```
python eval_cityscapes_color.py --datadir /home/datasets/cityscapes/ --subset val
```

## eval_cityscapes_server.py

This script produces segmentations of the Cityscapes images and converts the output indices to the original 'labelIds', so they can be evaluated with the scripts from the Cityscapes dataset (evalPixelLevelSemanticLabeling.py) or uploaded to the Cityscapes test server. By default it saves the images in the eval/save_results/ folder. (A sketch of the labelId conversion is given after the example below.)

**Options:** Specify the Cityscapes folder path with the '--datadir' option. Select the Cityscapes subset with '--subset' ('val', 'test', 'train' or 'demoSequence'). For other options, check the bottom of the file.

**Examples:**
```
python eval_cityscapes_server.py --datadir /home/datasets/cityscapes/ --subset val
```
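
For reference, the trainId-to-labelId conversion this script performs can also be written with the cityscapesScripts label definitions. A minimal sketch (not this script's own implementation), assuming predictions are HxW numpy arrays of trainIds:

```
import numpy as np
from cityscapesscripts.helpers.labels import trainId2label

def trainids_to_labelids(pred):
    # pred: HxW array of trainIds predicted by the network
    out = np.zeros_like(pred)
    for train_id, label in trainId2label.items():
        out[pred == train_id] = label.id  # map each trainId back to the original labelId
    return out
```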

## eval_iou.py

This script calculates the IoU (mean and per-class) on a subset with labels available, such as the Cityscapes val/train sets. (A sketch of the underlying computation is given after the example below.)

**Options:** Specify the Cityscapes folder path with the '--datadir' option. Select the Cityscapes subset with '--subset' ('val' or 'train'). For other options, check the bottom of the file.

**Examples:**
```
python eval_iou.py --datadir /home/datasets/cityscapes/ --subset val
```
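
For reference, per-class IoU is typically computed from a confusion matrix accumulated over the whole subset. A minimal sketch of that computation (illustrative, not the repository's own iouEval code), assuming flat LongTensors of class indices and a reasonably recent PyTorch:

```
import torch

def iou_per_class(preds, labels, num_classes=19, ignore_index=255):
    # Accumulate a num_classes x num_classes confusion matrix, ignoring void pixels
    mask = labels != ignore_index
    hist = torch.bincount(
        num_classes * labels[mask] + preds[mask],
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes).float()
    intersection = hist.diag()
    union = hist.sum(0) + hist.sum(1) - intersection
    return intersection / union.clamp(min=1)  # per-class IoU; .mean() gives mIoU
```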

## eval_forwardTime.py
This script loads the model specified with '-m' and enters a loop that continuously estimates the forward-pass time (fwt) at the specified resolution. (A sketch of such a timing loop is given after the example below.)

**Options:** '--width' specifies the width (default: 1024) and '--height' specifies the height (default: 512). For other options, check the bottom of the file.

**Examples:**
```
python eval_forwardTime.py
```
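
The timing loop is conceptually similar to the following sketch (names and sizes are illustrative; note that the GPU must be synchronized before reading the clock, since CUDA calls are asynchronous):

```
import time
import torch
from erfnet import ERFNet

model = ERFNet(num_classes=20).cuda().eval()
images = torch.randn(1, 3, 512, 1024).cuda()  # batch of 1 at the default 1024x512 resolution

with torch.no_grad():
    for i in range(20):
        start = time.time()
        outputs = model(images)
        torch.cuda.synchronize()  # wait for the GPU to finish before stopping the clock
        print("Forward pass time: %.4f s" % (time.time() - start))
```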

**NOTE**: Paper values were obtained with a single Titan X (Maxwell) and a Jetson TX1 using the original Torch code. The PyTorch code is a bit faster, but CUDA half precision (FP16) currently causes problems on some PyTorch versions, so this code runs only at FP32 (a bit slower).



100 changes: 100 additions & 0 deletions eval/dataset.py
# Code with dataset loader for VOC12 and Cityscapes (adapted from bodokaiser/piwise code)
# Sept 2017
# Eduardo Romera
#######################

import numpy as np
import os

from PIL import Image

from torch.utils.data import Dataset

EXTENSIONS = ['.jpg', '.png']

def load_image(file):
return Image.open(file)

def is_image(filename):
return any(filename.endswith(ext) for ext in EXTENSIONS)

def is_label(filename):
return filename.endswith("_labelTrainIds.png")

def image_path(root, basename, extension):
return os.path.join(root, f'{basename}{extension}')

def image_path_city(root, name):
return os.path.join(root, f'{name}')

def image_basename(filename):
return os.path.basename(os.path.splitext(filename)[0])

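# Generic VOC12-style loader: expects an 'images' folder with .jpg files and a
# 'labels' folder with .png files sharing the same basenames.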
class VOC12(Dataset):

def __init__(self, root, input_transform=None, target_transform=None):
self.images_root = os.path.join(root, 'images')
self.labels_root = os.path.join(root, 'labels')

self.filenames = [image_basename(f)
for f in os.listdir(self.labels_root) if is_image(f)]
self.filenames.sort()

self.input_transform = input_transform
self.target_transform = target_transform

def __getitem__(self, index):
filename = self.filenames[index]

with open(image_path(self.images_root, filename, '.jpg'), 'rb') as f:
image = load_image(f).convert('RGB')
with open(image_path(self.labels_root, filename, '.png'), 'rb') as f:
label = load_image(f).convert('P')

if self.input_transform is not None:
image = self.input_transform(image)
if self.target_transform is not None:
label = self.target_transform(label)

return image, label

def __len__(self):
return len(self.filenames)


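# Expects the standard Cityscapes layout: leftImg8bit/<subset>/<city>/*.png for
# the RGB images and gtFine/<subset>/<city>/*_labelTrainIds.png for the labels.
# Note that os.walk already returns full paths here, so the dataset root should
# be passed as an absolute path (as in the usage examples in the READMEs).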
class cityscapes(Dataset):

def __init__(self, root, input_transform=None, target_transform=None, subset='val'):
self.images_root = os.path.join(root, 'leftImg8bit/' + subset)
self.labels_root = os.path.join(root, 'gtFine/' + subset)

self.filenames = [os.path.join(dp, f) for dp, dn, fn in os.walk(os.path.expanduser(self.images_root)) for f in fn if is_image(f)]
self.filenames.sort()

self.filenamesGt = [os.path.join(dp, f) for dp, dn, fn in os.walk(os.path.expanduser(self.labels_root)) for f in fn if is_label(f)]
self.filenamesGt.sort()

self.input_transform = input_transform
self.target_transform = target_transform

def __getitem__(self, index):
filename = self.filenames[index]
filenameGt = self.filenamesGt[index]

with open(image_path_city(self.images_root, filename), 'rb') as f:
image = load_image(f).convert('RGB')
with open(image_path_city(self.labels_root, filenameGt), 'rb') as f:
label = load_image(f).convert('P')

if self.input_transform is not None:
image = self.input_transform(image)
if self.target_transform is not None:
label = self.target_transform(label)

return image, label, filename, filenameGt

def __len__(self):
return len(self.filenames)
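
A minimal usage sketch for this loader (the path and transform are illustrative; the eval scripts build their own input/target transforms):

```
from torchvision.transforms import ToTensor
from dataset import cityscapes

dataset = cityscapes('/home/datasets/cityscapes/', input_transform=ToTensor(),
                     target_transform=None, subset='val')

image, label, filename, filenameGt = dataset[0]
print(filename, image.shape)  # image is a 3xHxW float tensor, label is a PIL image here
```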

153 changes: 153 additions & 0 deletions eval/erfnet.py
# ERFNet full network definition for PyTorch
# Sept 2017
# Eduardo Romera
#######################

import torch
import torch.nn as nn
import torch.nn.init as init
import torch.nn.functional as F


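# Downsampler block: concatenates a stride-2 3x3 convolution (noutput-ninput
# channels) with a stride-2 max-pooling branch (ninput channels), halving the
# resolution while widening the channels from ninput to noutput.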
class DownsamplerBlock (nn.Module):
def __init__(self, ninput, noutput):
super().__init__()

self.conv = nn.Conv2d(ninput, noutput-ninput, (3, 3), stride=2, padding=1, bias=True)
self.pool = nn.MaxPool2d(2, stride=2)
self.bn = nn.BatchNorm2d(noutput, eps=1e-3)

def forward(self, input):
output = torch.cat([self.conv(input), self.pool(input)], 1)
output = self.bn(output)
return F.relu(output)


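# Factorized ("non-bottleneck-1D") residual block: each 3x3 convolution is
# decomposed into a 3x1 followed by a 1x3 convolution, with dilation in the
# second pair, reducing parameters and computation for a similar receptive field.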
class non_bottleneck_1d (nn.Module):
def __init__(self, chann, dropprob, dilated):
super().__init__()

self.conv3x1_1 = nn.Conv2d(chann, chann, (3, 1), stride=1, padding=(1,0), bias=True)

self.conv1x3_1 = nn.Conv2d(chann, chann, (1,3), stride=1, padding=(0,1), bias=True)

self.bn1 = nn.BatchNorm2d(chann, eps=1e-03)

self.conv3x1_2 = nn.Conv2d(chann, chann, (3, 1), stride=1, padding=(1*dilated,0), bias=True, dilation = (dilated,1))

self.conv1x3_2 = nn.Conv2d(chann, chann, (1,3), stride=1, padding=(0,1*dilated), bias=True, dilation = (1, dilated))

self.bn2 = nn.BatchNorm2d(chann, eps=1e-03)

self.dropout = nn.Dropout2d(dropprob)


def forward(self, input):

output = self.conv3x1_1(input)
output = F.relu(output)
output = self.conv1x3_1(output)
output = self.bn1(output)
output = F.relu(output)

output = self.conv3x1_2(output)
output = F.relu(output)
output = self.conv1x3_2(output)
output = self.bn2(output)

if (self.dropout.p != 0):
output = self.dropout(output)

return F.relu(output+input) #+input = identity (residual connection)


class Encoder(nn.Module):
def __init__(self, num_classes):
super().__init__()
self.initial_block = DownsamplerBlock(3,16)

self.layers = nn.ModuleList()

self.layers.append(DownsamplerBlock(16,64))

for x in range(0, 5): #5 times
self.layers.append(non_bottleneck_1d(64, 0.1, 1))

self.layers.append(DownsamplerBlock(64,128))

for x in range(0, 2): #2 times
self.layers.append(non_bottleneck_1d(128, 0.1, 2))
self.layers.append(non_bottleneck_1d(128, 0.1, 4))
self.layers.append(non_bottleneck_1d(128, 0.1, 8))
self.layers.append(non_bottleneck_1d(128, 0.1, 16))

#only for encoder mode:
self.output_conv = nn.Conv2d(128, num_classes, 1, stride=1, padding=0, bias=True)

def forward(self, input, predict=False):
output = self.initial_block(input)

for layer in self.layers:
output = layer(output)

if predict:
output = self.output_conv(output)

return output


class UpsamplerBlock (nn.Module):
def __init__(self, ninput, noutput):
super().__init__()
self.conv = nn.ConvTranspose2d(ninput, noutput, 3, stride=2, padding=1, output_padding=1, bias=True)
self.bn = nn.BatchNorm2d(noutput, eps=1e-3)

def forward(self, input):
output = self.conv(input)
output = self.bn(output)
return F.relu(output)

class Decoder (nn.Module):
def __init__(self, num_classes):
super().__init__()

self.layers = nn.ModuleList()

self.layers.append(UpsamplerBlock(128,64))
self.layers.append(non_bottleneck_1d(64, 0, 1))
self.layers.append(non_bottleneck_1d(64, 0, 1))

self.layers.append(UpsamplerBlock(64,16))
self.layers.append(non_bottleneck_1d(16, 0, 1))
self.layers.append(non_bottleneck_1d(16, 0, 1))

self.output_conv = nn.ConvTranspose2d( 16, num_classes, 2, stride=2, padding=0, output_padding=0, bias=True)

def forward(self, input):
output = input

for layer in self.layers:
output = layer(output)

output = self.output_conv(output)

return output


class ERFNet(nn.Module):
def __init__(self, num_classes, encoder=None): #use encoder to pass pretrained encoder
super().__init__()

        if encoder is None:
self.encoder = Encoder(num_classes)
else:
self.encoder = encoder
self.decoder = Decoder(num_classes)

def forward(self, input, only_encode=False):
if only_encode:
return self.encoder.forward(input, predict=True)
else:
output = self.encoder(input) #predict=False by default
return self.decoder.forward(output)
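
A minimal forward-pass sketch for the full network (shapes are illustrative; 20 classes assumed, i.e. the 19 Cityscapes classes plus void):

```
import torch
from erfnet import ERFNet

model = ERFNet(num_classes=20)
x = torch.randn(1, 3, 512, 1024)  # NCHW input at the default eval resolution
out = model(x)                    # encoder downsamples 8x, decoder upsamples back
print(out.shape)                  # torch.Size([1, 20, 512, 1024])
```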
