
Commit

Full reupload after code downtime
Eduardo committed Dec 5, 2019
0 parents commit d4a46fa
Showing 27 changed files with 3,240 additions and 0 deletions.
9 changes: 9 additions & 0 deletions .gitignore
#Files
*.pyc
*.pyo
*/__pycache__/
*/*/__pycache__/
*/*/*/__pycache__/
eval/save_results/
eval/save_color/
save/
49 changes: 49 additions & 0 deletions README.md
# ERFNet (PyTorch version)

This code is a toolbox that uses **PyTorch** for training and evaluating the **ERFNet** architecture for semantic segmentation.

**For the original Torch version please go [HERE](https://github.com/Eromera/erfnet)**

NOTE: This PyTorch version achieves slightly better results than the Torch version used in the paper: 72.1 IoU on the val set and 69.8 IoU on the test set.

![Example segmentation](example_segmentation.png?raw=true "Example segmentation")

## Publications

If you use this software in your research, please cite our publications:

**"Efficient ConvNet for Real-time Semantic Segmentation"**, E. Romera, J. M. Alvarez, L. M. Bergasa and R. Arroyo, IEEE Intelligent Vehicles Symposium (IV), pp. 1789-1794, Redondo Beach (California, USA), June 2017.
**[Best Student Paper Award]** [[pdf]](http://www.robesafe.uah.es/personal/eduardo.romera/pdfs/Romera17iv.pdf)

**"ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation"**, E. Romera, J. M. Alvarez, L. M. Bergasa and R. Arroyo, Transactions on Intelligent Transportation Systems (T-ITS), December 2017. [[pdf]](http://www.robesafe.uah.es/personal/eduardo.romera/pdfs/Romera17tits.pdf)

## Packages
For instructions please refer to the README in each folder:

* [train](train) contains tools for training the network for semantic segmentation.
* [eval](eval) contains tools for evaluating/visualizing the network's output.
* [imagenet](imagenet) contains the script and model for pretraining ERFNet's encoder on ImageNet.
* [trained_models](trained_models) contains the trained models used in the papers. NOTE: the PyTorch models are slightly different from the Torch models.

## Requirements

* [**The Cityscapes dataset**](https://www.cityscapes-dataset.com/): Download "leftImg8bit" for the RGB images and "gtFine" for the labels. **Please note that for training you should use the "_labelTrainIds" images and not the "_labelIds"; you can download the [cityscapes scripts](https://github.com/mcordts/cityscapesScripts) and use the [converter](https://github.com/mcordts/cityscapesScripts/blob/master/cityscapesscripts/preparation/createTrainIdLabelImgs.py) to generate trainIds from labelIds (a usage sketch is given at the end of this section).**
* [**Python 3.6**](https://www.python.org/): If you don't have Python 3.6 on your system, I recommend installing it with [Anaconda](https://www.anaconda.com/download/#linux).
* [**PyTorch**](http://pytorch.org/): Make sure to install the PyTorch version for Python 3.6 with CUDA support (the code has only been tested with CUDA 8.0).
* **Additional Python packages**: numpy, matplotlib, Pillow, torchvision and visdom (optional, needed only for the --visualize flag).

With Anaconda you can install them with:
```
conda install numpy matplotlib torchvision Pillow
conda install -c conda-forge visdom
```

If you use pip (make sure it is configured for Python 3.6), you can install them with:

```
pip install numpy matplotlib torchvision Pillow visdom
```
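
To generate the "_labelTrainIds" images required for training, run the converter from the cityscapesScripts repository on your dataset folder. A minimal sketch, assuming the repository is cloned locally (the dataset path is illustrative; the converter reads the dataset root from the CITYSCAPES_DATASET environment variable):

```
import os
import subprocess

# Point the converter at the Cityscapes root folder (path is illustrative)
os.environ["CITYSCAPES_DATASET"] = "/home/datasets/cityscapes"

# Run the conversion script from the cloned cityscapesScripts repository;
# it writes a "*_labelTrainIds.png" next to each "*_labelIds.png" label
subprocess.run(
    ["python", "cityscapesscripts/preparation/createTrainIdLabelImgs.py"],
    check=True,
)
```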

## License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which allows for personal and research use only. For a commercial license please contact the authors. You can view a license summary here: http://creativecommons.org/licenses/by-nc/4.0/
55 changes: 55 additions & 0 deletions eval/README.md
# Functions for evaluating/visualizing the network's output

There are currently four scripts for evaluating the network's output:
- eval_cityscapes_color
- eval_cityscapes_server
- eval_iou
- eval_forwardTime

## eval_cityscapes_color.py

This script produces color segmentations of the Cityscapes images for visualization purposes. By default it saves the images in the eval/save_color/ folder. You can also visualize results in visdom with the --visualize flag.

**Options:** Specify the Cityscapes folder path with the '--datadir' option. Select the Cityscapes subset with '--subset' ('val', 'test', 'train' or 'demoSequence'). For other options, check the bottom of the file.

**Examples:**
```
python eval_cityscapes_color.py --datadir /home/datasets/cityscapes/ --subset val
```

## eval_cityscapes_server.py

This script produces segmentations of the Cityscapes images and converts the output indices to the original 'labelIds', so they can be evaluated with the scripts from the Cityscapes dataset (evalPixelLevelSemanticLabeling.py) or uploaded to the Cityscapes test server. By default it saves the images in the eval/save_results/ folder. (A sketch of the labelId conversion is given after the example below.)

**Options:** Specify the Cityscapes folder path with the '--datadir' option. Select the Cityscapes subset with '--subset' ('val', 'test', 'train' or 'demoSequence'). For other options, check the bottom of the file.

**Examples:**
```
python eval_cityscapes_server.py --datadir /home/datasets/cityscapes/ --subset val
```
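
For reference, the trainId-to-labelId conversion this script performs can also be written with the cityscapesScripts label definitions. A minimal sketch (not this script's own implementation), assuming predictions are HxW numpy arrays of trainIds:

```
import numpy as np
from cityscapesscripts.helpers.labels import trainId2label

def trainids_to_labelids(pred):
    # pred: HxW array of trainIds predicted by the network
    out = np.zeros_like(pred)
    for train_id, label in trainId2label.items():
        out[pred == train_id] = label.id  # map each trainId back to the original labelId
    return out
```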

## eval_iou.py

This script calculates the IoU (mean and per-class) on a subset with labels available, such as the Cityscapes val/train sets. (A sketch of the underlying computation is given after the example below.)

**Options:** Specify the Cityscapes folder path with the '--datadir' option. Select the Cityscapes subset with '--subset' ('val' or 'train'). For other options, check the bottom of the file.

**Examples:**
```
python eval_iou.py --datadir /home/datasets/cityscapes/ --subset val
```
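
For reference, per-class IoU is typically computed from a confusion matrix accumulated over the whole subset. A minimal sketch of that computation (illustrative, not the repository's own iouEval code), assuming flat LongTensors of class indices and a reasonably recent PyTorch:

```
import torch

def iou_per_class(preds, labels, num_classes=19, ignore_index=255):
    # Accumulate a num_classes x num_classes confusion matrix, ignoring void pixels
    mask = labels != ignore_index
    hist = torch.bincount(
        num_classes * labels[mask] + preds[mask],
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes).float()
    intersection = hist.diag()
    union = hist.sum(0) + hist.sum(1) - intersection
    return intersection / union.clamp(min=1)  # per-class IoU; .mean() gives mIoU
```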

## eval_forwardTime.py
This script loads the model specified with '-m' and enters a loop that continuously estimates the forward-pass time (fwt) at the specified resolution. (A sketch of such a timing loop is given after the example below.)

**Options:** '--width' specifies the width (default: 1024) and '--height' specifies the height (default: 512). For other options, check the bottom of the file.

**Examples:**
```
python eval_forwardTime.py
```
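
The timing loop is conceptually similar to the following sketch (names and sizes are illustrative; note that the GPU must be synchronized before reading the clock, since CUDA calls are asynchronous):

```
import time
import torch
from erfnet import ERFNet

model = ERFNet(num_classes=20).cuda().eval()
images = torch.randn(1, 3, 512, 1024).cuda()  # batch of 1 at the default 1024x512 resolution

with torch.no_grad():
    for i in range(20):
        start = time.time()
        outputs = model(images)
        torch.cuda.synchronize()  # wait for the GPU to finish before stopping the clock
        print("Forward pass time: %.4f s" % (time.time() - start))
```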

**NOTE**: Paper values were obtained with a single Titan X (Maxwell) and a Jetson TX1 using the original Torch code. The PyTorch code is a bit faster, but CUDA half precision (FP16) currently causes problems on some PyTorch versions, so this code runs only at FP32 (a bit slower).



100 changes: 100 additions & 0 deletions eval/dataset.py
# Code with dataset loader for VOC12 and Cityscapes (adapted from bodokaiser/piwise code)
# Sept 2017
# Eduardo Romera
#######################

import numpy as np
import os

from PIL import Image

from torch.utils.data import Dataset

EXTENSIONS = ['.jpg', '.png']

def load_image(file):
return Image.open(file)

def is_image(filename):
return any(filename.endswith(ext) for ext in EXTENSIONS)

def is_label(filename):
return filename.endswith("_labelTrainIds.png")

def image_path(root, basename, extension):
return os.path.join(root, f'{basename}{extension}')

def image_path_city(root, name):
return os.path.join(root, f'{name}')

def image_basename(filename):
return os.path.basename(os.path.splitext(filename)[0])

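# Generic VOC12-style loader: expects an 'images' folder with .jpg files and a
# 'labels' folder with .png files sharing the same basenames.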
class VOC12(Dataset):

def __init__(self, root, input_transform=None, target_transform=None):
self.images_root = os.path.join(root, 'images')
self.labels_root = os.path.join(root, 'labels')

self.filenames = [image_basename(f)
for f in os.listdir(self.labels_root) if is_image(f)]
self.filenames.sort()

self.input_transform = input_transform
self.target_transform = target_transform

def __getitem__(self, index):
filename = self.filenames[index]

with open(image_path(self.images_root, filename, '.jpg'), 'rb') as f:
image = load_image(f).convert('RGB')
with open(image_path(self.labels_root, filename, '.png'), 'rb') as f:
label = load_image(f).convert('P')

if self.input_transform is not None:
image = self.input_transform(image)
if self.target_transform is not None:
label = self.target_transform(label)

return image, label

def __len__(self):
return len(self.filenames)


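# Expects the standard Cityscapes layout: leftImg8bit/<subset>/<city>/*.png for
# the RGB images and gtFine/<subset>/<city>/*_labelTrainIds.png for the labels.
# Note that os.walk already returns full paths here, so the dataset root should
# be passed as an absolute path (as in the usage examples in the READMEs).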
class cityscapes(Dataset):

def __init__(self, root, input_transform=None, target_transform=None, subset='val'):
self.images_root = os.path.join(root, 'leftImg8bit/' + subset)
self.labels_root = os.path.join(root, 'gtFine/' + subset)

self.filenames = [os.path.join(dp, f) for dp, dn, fn in os.walk(os.path.expanduser(self.images_root)) for f in fn if is_image(f)]
self.filenames.sort()

self.filenamesGt = [os.path.join(dp, f) for dp, dn, fn in os.walk(os.path.expanduser(self.labels_root)) for f in fn if is_label(f)]
self.filenamesGt.sort()

self.input_transform = input_transform
self.target_transform = target_transform

def __getitem__(self, index):
filename = self.filenames[index]
filenameGt = self.filenamesGt[index]

with open(image_path_city(self.images_root, filename), 'rb') as f:
image = load_image(f).convert('RGB')
with open(image_path_city(self.labels_root, filenameGt), 'rb') as f:
label = load_image(f).convert('P')

if self.input_transform is not None:
image = self.input_transform(image)
if self.target_transform is not None:
label = self.target_transform(label)

return image, label, filename, filenameGt

def __len__(self):
return len(self.filenames)
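
A minimal usage sketch for this loader (the path and transform are illustrative; the eval scripts build their own input/target transforms):

```
from torchvision.transforms import ToTensor
from dataset import cityscapes

dataset = cityscapes('/home/datasets/cityscapes/', input_transform=ToTensor(),
                     target_transform=None, subset='val')

image, label, filename, filenameGt = dataset[0]
print(filename, image.shape)  # image is a 3xHxW float tensor, label is a PIL image here
```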

153 changes: 153 additions & 0 deletions eval/erfnet.py
# ERFNet full network definition for PyTorch
# Sept 2017
# Eduardo Romera
#######################

import torch
import torch.nn as nn
import torch.nn.init as init
import torch.nn.functional as F


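# Downsampler block: concatenates a stride-2 3x3 convolution (noutput-ninput
# channels) with a stride-2 max-pooling branch (ninput channels), halving the
# resolution while widening the channels from ninput to noutput.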
class DownsamplerBlock (nn.Module):
def __init__(self, ninput, noutput):
super().__init__()

self.conv = nn.Conv2d(ninput, noutput-ninput, (3, 3), stride=2, padding=1, bias=True)
self.pool = nn.MaxPool2d(2, stride=2)
self.bn = nn.BatchNorm2d(noutput, eps=1e-3)

def forward(self, input):
output = torch.cat([self.conv(input), self.pool(input)], 1)
output = self.bn(output)
return F.relu(output)


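# Factorized ("non-bottleneck-1D") residual block: each 3x3 convolution is
# decomposed into a 3x1 followed by a 1x3 convolution, with dilation in the
# second pair, reducing parameters and computation for a similar receptive field.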
class non_bottleneck_1d (nn.Module):
def __init__(self, chann, dropprob, dilated):
super().__init__()

self.conv3x1_1 = nn.Conv2d(chann, chann, (3, 1), stride=1, padding=(1,0), bias=True)

self.conv1x3_1 = nn.Conv2d(chann, chann, (1,3), stride=1, padding=(0,1), bias=True)

self.bn1 = nn.BatchNorm2d(chann, eps=1e-03)

self.conv3x1_2 = nn.Conv2d(chann, chann, (3, 1), stride=1, padding=(1*dilated,0), bias=True, dilation = (dilated,1))

self.conv1x3_2 = nn.Conv2d(chann, chann, (1,3), stride=1, padding=(0,1*dilated), bias=True, dilation = (1, dilated))

self.bn2 = nn.BatchNorm2d(chann, eps=1e-03)

self.dropout = nn.Dropout2d(dropprob)


def forward(self, input):

output = self.conv3x1_1(input)
output = F.relu(output)
output = self.conv1x3_1(output)
output = self.bn1(output)
output = F.relu(output)

output = self.conv3x1_2(output)
output = F.relu(output)
output = self.conv1x3_2(output)
output = self.bn2(output)

if (self.dropout.p != 0):
output = self.dropout(output)

return F.relu(output+input) #+input = identity (residual connection)


class Encoder(nn.Module):
def __init__(self, num_classes):
super().__init__()
self.initial_block = DownsamplerBlock(3,16)

self.layers = nn.ModuleList()

self.layers.append(DownsamplerBlock(16,64))

for x in range(0, 5): #5 times
self.layers.append(non_bottleneck_1d(64, 0.1, 1))

self.layers.append(DownsamplerBlock(64,128))

for x in range(0, 2): #2 times
self.layers.append(non_bottleneck_1d(128, 0.1, 2))
self.layers.append(non_bottleneck_1d(128, 0.1, 4))
self.layers.append(non_bottleneck_1d(128, 0.1, 8))
self.layers.append(non_bottleneck_1d(128, 0.1, 16))

#only for encoder mode:
self.output_conv = nn.Conv2d(128, num_classes, 1, stride=1, padding=0, bias=True)

def forward(self, input, predict=False):
output = self.initial_block(input)

for layer in self.layers:
output = layer(output)

if predict:
output = self.output_conv(output)

return output


class UpsamplerBlock (nn.Module):
def __init__(self, ninput, noutput):
super().__init__()
self.conv = nn.ConvTranspose2d(ninput, noutput, 3, stride=2, padding=1, output_padding=1, bias=True)
self.bn = nn.BatchNorm2d(noutput, eps=1e-3)

def forward(self, input):
output = self.conv(input)
output = self.bn(output)
return F.relu(output)

class Decoder (nn.Module):
def __init__(self, num_classes):
super().__init__()

self.layers = nn.ModuleList()

self.layers.append(UpsamplerBlock(128,64))
self.layers.append(non_bottleneck_1d(64, 0, 1))
self.layers.append(non_bottleneck_1d(64, 0, 1))

self.layers.append(UpsamplerBlock(64,16))
self.layers.append(non_bottleneck_1d(16, 0, 1))
self.layers.append(non_bottleneck_1d(16, 0, 1))

self.output_conv = nn.ConvTranspose2d( 16, num_classes, 2, stride=2, padding=0, output_padding=0, bias=True)

def forward(self, input):
output = input

for layer in self.layers:
output = layer(output)

output = self.output_conv(output)

return output


class ERFNet(nn.Module):
def __init__(self, num_classes, encoder=None): #use encoder to pass pretrained encoder
super().__init__()

        if encoder is None:
self.encoder = Encoder(num_classes)
else:
self.encoder = encoder
self.decoder = Decoder(num_classes)

def forward(self, input, only_encode=False):
if only_encode:
return self.encoder.forward(input, predict=True)
else:
output = self.encoder(input) #predict=False by default
return self.decoder.forward(output)
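
A minimal forward-pass sketch for the full network (shapes are illustrative; 20 classes assumed, i.e. the 19 Cityscapes classes plus void):

```
import torch
from erfnet import ERFNet

model = ERFNet(num_classes=20)
x = torch.randn(1, 3, 512, 1024)  # NCHW input at the default eval resolution
out = model(x)                    # encoder downsamples 8x, decoder upsamples back
print(out.shape)                  # torch.Size([1, 20, 512, 1024])
```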
