diff --git a/README.md b/README.md
index 0c9bed8..bf0e525 100644
--- a/README.md
+++ b/README.md
@@ -1,244 +1,337 @@
-# DeepLab with PyTorch
-
-This is an unofficial **PyTorch** implementation to train **DeepLab v2** model (ResNet backbone) [[1](##references)] on **COCO-Stuff** dataset [[2](##references)].
-DeepLab is one of the CNN architectures for semantic image segmentation.
-COCO-Stuff is a semantic segmentation dataset, which includes 164k images annotated with 171 thing/stuff classes (+ unlabeled).
-
-This repository aims to reproduce the official score of DeepLab v2 on COCO-Stuff datasets.
-The model can be trained both on the latest [COCO-Stuff 164k](https://github.com/nightrome/cocostuff) and [COCO-Stuff 10k](https://github.com/nightrome/cocostuff10k), *without building the official implementation in Caffe*.
-[Pre-trained models are provided](#models----omit-in-toc).
-ResNet-based DeepLab v3/v3+ are also included, although they are not tested.
-```torch.hub``` is supported.
-
-- [Setup](#setup)
-- [Training](#training)
-- [Evaluation](#evaluation)
-- [Performance](#performance)
-- [Demo](#demo)
-- [Misc](#misc)
-- [References](#references)
+# DeepLab with PyTorch

-## Setup
+This is an unofficial **PyTorch** implementation of **DeepLab v2** [[1](#references)] with a **ResNet** backbone. The **COCO-Stuff** [[2](#references)] and **PASCAL VOC** [[3](#references)] datasets are supported. The initial weights (`.caffemodel`) officially provided by the authors can be converted and used without building the Caffe API. DeepLab v3/v3+ models with the identical backbone are also included (although not tested). [```torch.hub``` is supported](#torchhub).

-### Requirements
+## Performance

-For anaconda users:
+Pretrained models are provided. Note that the 2D interpolation methods differ from the original implementation, which leads to slightly better results.
+
+### COCO-Stuff
+
+| Train set | Eval set | CRF? | Code | Pixel Accuracy | Mean Accuracy | Mean IoU | FreqW IoU |
+| :--- | :---: | :---: | :--- | :---: | :---: | :---: | :---: |
+| 10k train† (Model) | 10k val | | Original [2] | 65.1 | 45.5 | 34.4 | 50.4 |
+| 10k train† (Model) | 10k val | | Ours | 65.8 | 45.7 | 34.8 | 51.2 |
+| 10k train† (Model) | 10k val | ✓ | Ours | 67.1 | 46.4 | 35.6 | 52.5 |
+| 164k train (Model) | 10k val | | Ours | 68.4 | 55.6 | 44.2 | 55.1 |
+| 164k train (Model) | 10k val | ✓ | Ours | 69.2 | 55.9 | 45.0 | 55.9 |
+| 164k train (Model) | 164k val | | Ours | 66.8 | 51.2 | 39.1 | 51.5 |
+| 164k train (Model) | 164k val | ✓ | Ours | 67.6 | 51.5 | 39.7 | 52.3 |
+
+† Images and labels are pre-warped to square-shape 513x513
+
+### PASCAL VOC 2012
+
+| Train set | Eval set | CRF? | Code | Pixel Accuracy | Mean Accuracy | Mean IoU | FreqW IoU |
+| :--- | :---: | :---: | :--- | :---: | :---: | :---: | :---: |
+| trainaug (Model) | val | | Original [3] | - | - | 76.35 | - |
+| trainaug (Model) | val | | Ours | 94.64 | 86.50 | 76.65 | 90.41 |
+| trainaug (Model) | val | ✓ | Original [3] | - | - | 77.69 | - |
+| trainaug (Model) | val | ✓ | Ours | 95.04 | 86.64 | 77.93 | 91.06 |
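+
+The four scores above are the standard semantic segmentation metrics. For reference, here is a minimal NumPy sketch of how they can be derived from a confusion matrix (illustrative only; the repository computes them in its own evaluation code):
+
+```python
+import numpy as np
+
+def scores(confusion):
+    # confusion[i, j]: number of pixels with ground-truth class i predicted as class j
+    tp = np.diag(confusion)
+    gt = confusion.sum(axis=1)    # pixels per ground-truth class
+    pred = confusion.sum(axis=0)  # pixels per predicted class
+    iou = tp / np.maximum(gt + pred - tp, 1)  # classes absent from the split count as 0 here
+    freq = gt / confusion.sum()
+    return {
+        "Pixel Accuracy": tp.sum() / confusion.sum(),
+        "Mean Accuracy": np.mean(tp / np.maximum(gt, 1)),
+        "Mean IoU": np.mean(iou),
+        "FreqW IoU": (freq * iou).sum(),
+    }
+```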
-```sh -# Please modify CUDA option according to your environment -conda env create --file config/conda_env.yaml -``` +## Setup -* python 2.7+/3.6+ -* [pytorch](https://pytorch.org/) 0.4.1+ -* [torchvision](https://pytorch.org/) -* [torchnet](https://github.com/pytorch/tnt) -* [pydensecrf](https://github.com/lucasb-eyer/pydensecrf) -* [tensorflow](https://www.tensorflow.org/install/) (for tensorboard) -* [tensorboardX](https://github.com/lanpa/tensorboard-pytorch) 1.0+ -* opencv 3.0.0+ -* tqdm -* click -* addict -* h5py -* scipy -* matplotlib -* yaml -* joblib - -### Datasets - -COCO-Stuff 164k is the latest version and recommended. - -
-COCO-Stuff 164k (click to show the structure) -
-├── images
-│   ├── train2017
-│   │   ├── 000000000009.jpg
-│   │   └── ...
-│   └── val2017
-│       ├── 000000000139.jpg
-│       └── ...
-└── annotations
-    ├── train2017
-    │   ├── 000000000009.png
-    │   └── ...
-    └── val2017
-        ├── 000000000139.png
-        └── ...
-
-
-
-
-1. Run the script below to download the dataset (20GB+).
+### Requirements

-```sh
-./scripts/setup_cocostuff164k.sh
-```
+* Python 2.7+/3.6+
+* Anaconda environment

-2. Set the path to the dataset in ```config/cocostuff164k.yaml```.
+Then set up the environment from `conda_env.yaml`. Please modify the CUDA option as needed (default: `cudatoolkit=10.0`).

-```yaml
-DATASET: cocostuff164k
-ROOT: # <- Write here
-...
+```console
+$ conda env create -f configs/conda_env.yaml
+$ conda activate deeplab-pytorch
 ```
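+
+Optionally, a quick sanity check that the environment sees PyTorch and a GPU (a generic snippet, not part of this repository):
+
+```python
+import torch
+
+# Installed PyTorch version and CUDA availability
+print(torch.__version__)
+print(torch.cuda.is_available())
+```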
-COCO-Stuff 10k (click to show the structure) -
-├── images
-│   ├── COCO_train2014_000000000077.jpg
-│   └── ...
-├── annotations
-│   ├── COCO_train2014_000000000077.mat
-│   └── ...
-└── imageLists
-    ├── all.txt
-    ├── test.txt
-    └── train.txt
-
-
-
-
-1. Run the script below to download the dataset (2GB).
-
-```sh
-./scripts/setup_cocostuff10k.sh
-```
+### Datasets

-2. Set the path to the dataset in ```config/cocostuff10k.yaml```.
+Setup instructions are provided at each link.

-```yaml
-DATASET: cocostuff10k
-ROOT: # <- Write here
-...
-```
+* [COCO-Stuff 10k/164k](data/datasets/cocostuff/README.md)
+* [PASCAL VOC 2012](data/datasets/voc12/README.md)

-### Initial parameters
+### Initial weights

 1. Run the script below to download caffemodel pre-trained on ImageNet and 91-class COCO (1GB+).

 ```sh
-./scripts/setup_caffemodels.sh
+$ bash scripts/setup_caffemodels.sh
 ```

-2. Convert the caffemodel to pytorch compatible. No need to build the official DeepLab!
+2. Convert the caffemodel to a PyTorch-compatible format. No need to build the official Caffe API!

 ```sh
-# This generates deeplabv2_resnet101_COCO_init.pth from init.caffemodel
-python convert.py --dataset init
+# This generates "deeplabv2_resnet101_COCO_init.pth" from "init.caffemodel"
+$ python convert.py --dataset coco
 ```
-You can also convert an included ```train2_iter_20000.caffemodel``` for PASCAL VOC 2012 dataset. See [here](config/README.md#voc12yaml).

 ## Training

-Training, evaluation, and some demos are all through the [```.yaml``` configuration files](config/README.md).
+Please see [```./scripts/train_eval.sh```](scripts/train_eval.sh) for example usage.

-```sh
-# Train DeepLab v2 on COCO-Stuff 164k
-python main.py train --config config/cocostuff164k.yaml
-```
-
-```sh
-# Monitor a cross-entropy loss
-tensorboard --logdir runs
-```
-
-Default settings:
-
-- All the GPUs visible to the process are used. Please specify the scope with ```CUDA_VISIBLE_DEVICES=```.
-- Stochastic gradient descent (SGD) is used with momentum of 0.9 and initial learning rate of 2.5e-4. Polynomial learning rate decay is employed; the learning rate is multiplied by ```(1-iter/iter_max)**power``` at every 10 iterations.
-- Weights are updated 20k iterations for COCO-Stuff 10k and 100k iterations for COCO-Stuff 164k, with a mini-batch of 10. The batch is not processed at once due to high occupancy of video memories, instead, gradients of small batches are aggregated, and weight updating is performed at the end (```batch_size * iter_size = 10```).
-- Input images are initially warped to 513x513, randomly re-scaled by factors ranging from 0.5 to 1.5, zero-padded if needed, and randomly cropped to 321x321 so that the input size is fixed during training (see the example below).
-- The label indices range from 0 to 181 and the model outputs a 182-dim categorical distribution, but only [171 classes](https://github.com/nightrome/cocostuff/blob/master/labels.md) are supervised with COCO-Stuff.
-- Loss is defined as a sum of responses from multi-scale inputs (1x, 0.75x, 0.5x) and element-wise max across the scales. The "unlabeled" class (index -1) is ignored in the loss computation.
-- Moving average loss (```average_loss``` in Caffe) can be monitored in TensorBoard.
-- GPU memory usage is approx. 11.2 GB with the default setting (tested on the single Titan X). You can reduce it with a small ```batch_size```.
+```
+Usage: main.py train [OPTIONS]
+
+  Training DeepLab by v2 protocol
+
+Options:
+  -c, --config-path FILENAME  Dataset configuration file in YAML  [required]
+  --cuda / --cpu              Enable CUDA if available  [default: --cuda]
+  --help                      Show this message and exit.
+```
+
+To monitor the loss, learning rate, and GPU usage:
+
+```sh
+$ tensorboard --logdir data/logs
+```
+
+Common settings:
+
+- **Model**: DeepLab v2 with ResNet-101 backbone. Dilation rates of ASPP are (6, 12, 18, 24). Output stride is 8.
+- **Multi-GPU**: All the GPUs visible to the process are used. Please specify the scope with ```CUDA_VISIBLE_DEVICES=```.
+- **Multi-scale loss**: Loss is defined as a sum of responses from multi-scale inputs (1x, 0.75x, 0.5x) and element-wise max across the scales. The *unlabeled* class is ignored in the loss computation.
+- **Gradient accumulation**: The mini-batch of 10 samples is not processed at once due to high GPU memory occupancy. Instead, gradients of small batches of 5 samples are accumulated for 2 iterations, and the weight update is performed at the end (```batch_size * iter_size = 10```). GPU memory usage is approx. 11.2 GB with the default setting (tested on a single Titan X). You can reduce it with a smaller ```batch_size```.
+- **Learning rate**: Stochastic gradient descent (SGD) is used with momentum of 0.9 and an initial learning rate of 2.5e-4. Polynomial learning rate decay is employed; the learning rate is multiplied by ```(1-iter/iter_max)**power``` every 10 iterations. Both this schedule and the gradient accumulation are illustrated in the sketch after the dataset notes below.
+- **Monitoring**: Moving average loss (```average_loss``` in Caffe) can be monitored in TensorBoard.
+
+[COCO-Stuff 164k](configs/cocostuff164k.yaml):
+- **#Iterations**: Updated for 100k iterations.
+- **#Classes**: The label indices range from 0 to 181, and the model outputs a 182-dim categorical distribution, but only [171 classes](https://github.com/nightrome/cocostuff/blob/master/labels.md) are supervised with COCO-Stuff. Index 255 is an unlabeled class to be ignored.
+- **Preprocessing**: (1) Input images are randomly re-scaled by factors ranging from 0.5 to 1.5, (2) padded if needed, and (3) randomly cropped to 321x321.
+
+[COCO-Stuff 10k](configs/cocostuff10k.yaml):
+- **#Iterations**: Updated for 20k iterations.
+- **#Classes**: Same as the 164k version above.
+- **Preprocessing**: (1) Input images are initially warped to 513x513 squares, (2) randomly re-scaled by factors ranging from 0.5 to 1.5, (3) padded if needed, and (4) randomly cropped to 321x321 so that the input size is fixed during training.
+
+[PASCAL VOC 2012](configs/voc12.yaml):
+- **#Iterations**: Updated for 20k iterations.
+- **#Classes**: 20 foreground objects + 1 background. Index 255 is an unlabeled class to be ignored.
+- **Preprocessing**: (1) Input images are randomly re-scaled by factors ranging from 0.5 to 1.5, (2) padded if needed, and (3) randomly cropped to 321x321.
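+
+A minimal, self-contained sketch of the learning rate schedule and gradient accumulation described above (illustrative only; the actual training loop in this repository differs in details such as per-layer learning rates):
+
+```python
+import torch
+import torch.nn as nn
+
+model = nn.Conv2d(3, 182, 1)  # stand-in for DeepLabV2_ResNet101_MSC
+optimizer = torch.optim.SGD(model.parameters(), lr=2.5e-4, momentum=0.9, weight_decay=5.0e-4)
+base_lr, power, iter_max, iter_size, lr_decay = 2.5e-4, 0.9, 100000, 2, 10
+
+for iteration in range(1, iter_max + 1):
+    optimizer.zero_grad()
+    for _ in range(iter_size):
+        # Accumulate gradients of small batches: 5 samples x 2 steps = 10 per update
+        images = torch.randn(5, 3, 321, 321)
+        loss = model(images).mean()  # placeholder for the multi-scale CE loss
+        (loss / iter_size).backward()
+    optimizer.step()
+    if iteration % lr_decay == 0:
+        # Polynomial decay, applied every 10 iterations
+        new_lr = base_lr * (1.0 - float(iteration) / iter_max) ** power
+        for group in optimizer.param_groups:
+            group["lr"] = new_lr
+```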
-Processed image vs. label examples:
-
-![Data](docs/data.png)
+Processed image vs. label examples in COCO-Stuff:
+
+![Data](docs/datasets/cocostuff.png)

 ## Evaluation

-```sh
-# Evaluate the final model on COCO-Stuff 164k validation set
-python main.py test --config config/cocostuff164k.yaml \
-    --model-path data/models/deeplab_resnet101/cocostuff164k/checkpoint_final.pth
+To compute the following scores:
+
+* Pixel accuracy
+* Mean accuracy
+* Mean IoU
+* Frequency weighted IoU
+
+```
+Usage: main.py test [OPTIONS]
+
+  Evaluation on validation set
+
+Options:
+  -c, --config-path FILENAME  Dataset configuration file in YAML  [required]
+  -m, --model-path PATH       PyTorch model to be loaded  [required]
+  --cuda / --cpu              Enable CUDA if available  [default: --cuda]
+  --help                      Show this message and exit.
 ```

-You can run CRF post-processing with a option ```--crf```. See ```--help``` for more details.
-
-## Performance
-
-### Validation scores
+To perform CRF post-processing:
+
+```
+Usage: main.py crf [OPTIONS]
+
+  CRF post-processing on pre-computed logits
+
+Options:
+  -c, --config-path FILENAME  Dataset configuration file in YAML  [required]
+  -j, --n-jobs INTEGER        Number of parallel jobs  [default: # cores]
+  --help                      Show this message and exit.
+```
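+
+The CRF step is plain DenseCRF post-processing. Roughly, it behaves like the sketch below, written with ```pydensecrf``` and using the parameter names from the ```CRF:``` section of the configs (the repository's actual wrapper is ```libs/utils.DenseCRF```; the exact parameter mapping there is assumed here):
+
+```python
+import numpy as np
+import pydensecrf.densecrf as dcrf
+from pydensecrf.utils import unary_from_softmax
+
+def dense_crf(image, probs, iter_max=10, pos_xy_std=1, pos_w=3, bi_xy_std=67, bi_rgb_std=3, bi_w=4):
+    """image: HxWx3 uint8 array, probs: CxHxW softmax probabilities."""
+    c, h, w = probs.shape
+    crf = dcrf.DenseCRF2D(w, h, c)
+    crf.setUnaryEnergy(unary_from_softmax(probs))
+    # Smoothness kernel (position only) and appearance kernel (position + color)
+    crf.addPairwiseGaussian(sxy=pos_xy_std, compat=pos_w)
+    crf.addPairwiseBilateral(sxy=bi_xy_std, srgb=bi_rgb_std, rgbim=np.ascontiguousarray(image), compat=bi_w)
+    q = crf.inference(iter_max)
+    return np.array(q).reshape(c, h, w)  # refined probabilities; argmax over axis 0 for labels
+```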
-|   | Train set | Eval set | CRF? | Pixel Accuracy | Mean Accuracy | Mean IoU | FreqW IoU |
-| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
-| [**Official (Caffe)**](https://github.com/nightrome/cocostuff10k) | **10k train** | **10k val** | **No** | **65.1 %** | **45.5 %** | **34.4 %** | **50.4 %** |
-| **This repo** | **10k train** | **10k val** | **No** | **65.3 %** | **45.3 %** | **34.4 %** | **50.5 %** |
-| This repo | 10k train | 10k val | Yes | 66.7 % | 45.9 % | 35.5 % | 51.9 % |
-| This repo | 164k train | 10k val | No | 67.6 % | 54.9 % | 43.2 % | 53.9 % |
-| This repo | 164k train | 10k val | Yes | 68.7 % | 55.3 % | 44.4 % | 55.1 % |
-| This repo | 164k train | 164k val | No | 65.7 % | 49.7 % | 37.6 % | 50.0 % |
-| This repo | 164k train | 164k val | Yes | 66.8 % | 50.1 % | 38.5 % | 51.1 % |
-
-### Models
-
-* [Models trained on COCO-Stuff 10k/164k (*.pth)](https://drive.google.com/drive/folders/1m3wyXvvWy-IvGmdFS_dsQCRXhFNhek8_?usp=sharing)
-* [Scores (*.json)](https://drive.google.com/drive/folders/1PouglnlwsyHTwdSo_d55WgMgdnxbxmE6?usp=sharing)

 ## Demo

-### From an image
+| COCO-Stuff 164k | COCO-Stuff 10k | PASCAL VOC 2012 | Pretrained COCO |
+| :----------------------------------: | :---------------------------------: | :--------------------------: | :------------------------------: |
+| ![](docs/examples/cocostuff164k.png) | ![](docs/examples/cocostuff10k.png) | ![](docs/examples/voc12.png) | ![](docs/examples/coco_init.png) |

-```bash
-python demo.py single --config config/cocostuff164k.yaml \
-    --model-path <PATH TO MODEL> \
-    --image-path <PATH TO IMAGE> \
-    --crf
+### Single image
+
+```
+Usage: demo.py single [OPTIONS]
+
+  Inference from a single image
+
+Options:
+  -c, --config-path FILENAME  Dataset configuration file in YAML  [required]
+  -m, --model-path PATH       PyTorch model to be loaded  [required]
+  -i, --image-path PATH       Image to be processed  [required]
+  --cuda / --cpu              Enable CUDA if available  [default: --cuda]
+  --crf                       CRF post-processing  [default: False]
+  --help                      Show this message and exit.
 ```

-### From a webcam
+### Webcam

-A class of mouseovered pixel is shown in terminal.
+The class of the pixel under the mouse cursor is shown in the terminal.

-```bash
-python demo.py live --config config/cocostuff164k.yaml \
-    --model-path <PATH TO MODEL> \
-    --camera-id <CAMERA ID> \
-    --crf
+```
+Usage: demo.py live [OPTIONS]
+
+  Inference from camera stream
+
+Options:
+  -c, --config-path FILENAME  Dataset configuration file in YAML  [required]
+  -m, --model-path PATH       PyTorch model to be loaded  [required]
+  --cuda / --cpu              Enable CUDA if available  [default: --cuda]
+  --crf                       CRF post-processing  [default: False]
+  --camera-id INTEGER         Device ID  [default: 0]
+  --help                      Show this message and exit.
 ```

-### torch.hub
+### torch.hub
+
 Model setup with 3 lines.

 ```python
 import torch.hub
-
 model = torch.hub.load("kazuto1011/deeplab-pytorch", "deeplabv2_resnet101", n_classes=182)
-model.load_state_dict(torch.load("cocostuff164k_iter100k.pth"))
+model.load_state_dict(torch.load("deeplabv2_resnet101_msc-cocostuff164k-100000.pth"))
 ```
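+
+Once loaded, inference follows the same pattern as ```demo.py```: subtract the BGR mean, forward, upsample the logits, and take the pixel-wise argmax. A rough sketch, assuming the mean values from the ```IMAGE.MEAN``` entries of the configs (the random array stands in for an image loaded with OpenCV):
+
+```python
+import numpy as np
+import torch
+import torch.nn.functional as F
+
+model.eval()
+with torch.no_grad():
+    image = np.random.rand(513, 513, 3).astype(np.float32) * 255  # stand-in for cv2.imread(...)
+    image -= np.array([104.008, 116.669, 122.675], dtype=np.float32)  # BGR mean subtraction
+    x = torch.from_numpy(np.ascontiguousarray(image.transpose(2, 0, 1)))[None]  # HWC -> 1x3xHxW
+    logits = model(x)
+    logits = F.interpolate(logits, size=(513, 513), mode="bilinear", align_corners=False)
+    labelmap = logits.argmax(dim=1)[0].numpy()  # HxW predicted class indices
+```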
 ## Misc

-### Image processing
+### Difference with Caffe version

-Default setting warps an image and a label to square-shape as the official code does.
-To preserve aspect ratio in the image preprocessing, please modify ```.yaml``` as follows:
+* While the official code employs 1/16 bilinear interpolation (```Interp``` layer) to downsample a label for the 0.5x input only, this codebase does so for both the 0.5x and 0.75x inputs with nearest interpolation (```PIL.Image.resize```, [related issue](https://github.com/kazuto1011/deeplab-pytorch/issues/51)).
+* Bilinear interpolation on images and logits is performed with ```align_corners=False```.

-```yaml
-BATCH_SIZE:
-  TEST: 1
-WARP_IMAGE: False
-```
+### Training batch normalization

-### Training batch normalization
-
-This codebase only supports DeepLab v2 training which freezes batch normalization layers, although v3/v3+ protocols require training them. If training their parameters as well in your projects, please install [the extra library](https://hangzhang.org/PyTorch-Encoding/) below.
+This codebase only supports the DeepLab v2 training protocol, which freezes batch normalization layers, although the v3/v3+ protocols require training them. If you also need to train their parameters on multiple GPUs in your projects, please install [the extra library](https://hangzhang.org/PyTorch-Encoding/) below.

 ```bash
 pip install torch-encoding
@@ -256,8 +349,17 @@ except:

 ## References

-1. L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. *IEEE TPAMI*, 2018.
-[Project](http://liangchiehchen.com/projects/DeepLab.html) / [Code](https://bitbucket.org/aquariusjay/deeplab-public-ver2) / [arXiv paper](https://arxiv.org/abs/1606.00915)
+1. L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. *IEEE TPAMI*, 2018.
+[Project](http://liangchiehchen.com/projects/DeepLab.html) / [Code](https://bitbucket.org/aquariusjay/deeplab-public-ver2) / [arXiv paper](https://arxiv.org/abs/1606.00915)

 2. H. Caesar, J. Uijlings, V. Ferrari. COCO-Stuff: Thing and Stuff Classes in Context. In *CVPR*, 2018.
-[Project](https://github.com/nightrome/cocostuff) / [Code](https://github.com/nightrome/cocostuff) / [arXiv paper](https://arxiv.org/abs/1612.03716)
\ No newline at end of file
+[Project](https://github.com/nightrome/cocostuff) / [arXiv paper](https://arxiv.org/abs/1612.03716)
+
+3. M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, A. Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. *IJCV*, 2010.
+[Project](http://host.robots.ox.ac.uk/pascal/VOC) / +[Paper](http://host.robots.ox.ac.uk/pascal/VOC/pubs/everingham10.pdf) \ No newline at end of file diff --git a/config/README.md b/config/README.md deleted file mode 100644 index 59f2ef3..0000000 --- a/config/README.md +++ /dev/null @@ -1,11 +0,0 @@ -## ```cocostuff10k.yaml``` - -COCO-Stuff 10k. See the main [```README.md```](../README.md#default-settings). - -## ```cocostuff164k.yaml``` - -COCO-Stuff 164k. See the main [```README.md```](../README.md#default-settings). - -## ```voc12.yaml``` - -PASCAL VOC2012. If you want to try the official voc12 caffemodel, please convert with the ```convert.py --dataset voc12``` and use this configuration file. \ No newline at end of file diff --git a/config/conda_env.yaml b/config/conda_env.yaml deleted file mode 100644 index 826a12f..0000000 --- a/config/conda_env.yaml +++ /dev/null @@ -1,25 +0,0 @@ -name: deeplab-pytorch -channels: - - pytorch - - conda-forge - - defaults -dependencies: - - pytorch - - torchvision - # - cudatoolkit=10.0 - - h5py - - scipy - - matplotlib - - pyyaml - - click - - tqdm - # - clang - # - clangxx - - pydensecrf - - pip: - - torchnet==0.0.2 - - opencv-python==3.4.1.15 - - tensorboardX==1.2 - - tensorflow==1.9.0 - - addict - - joblib \ No newline at end of file diff --git a/configs/coco.yaml b/configs/coco.yaml new file mode 100644 index 0000000..b358bd6 --- /dev/null +++ b/configs/coco.yaml @@ -0,0 +1,58 @@ +EXP: + ID: coco + OUTPUT_DIR: data + +DATASET: + NAME: coco + ROOT: + LABELS: ./data/datasets/coco/labels.txt + N_CLASSES: 91 + IGNORE_LABEL: + SCALES: + SPLIT: + TRAIN: + VAL: + TEST: + +DATALOADER: + NUM_WORKERS: 0 + +IMAGE: + MEAN: + R: 122.675 + G: 116.669 + B: 104.008 + SIZE: + BASE: + TRAIN: + TEST: 513 + +MODEL: + NAME: DeepLabV1_ResNet101 + N_BLOCKS: [3, 4, 23, 3] + ATROUS_RATES: + INIT_MODEL: + +SOLVER: + BATCH_SIZE: + TRAIN: 5 + TEST: 1 + ITER_MAX: 100000 + ITER_SIZE: 2 + ITER_SAVE: 5000 + ITER_TB: 100 + LR_DECAY: 10 + LR: 2.5e-4 + MOMENTUM: 0.9 + OPTIMIZER: sgd + POLY_POWER: 0.9 + WEIGHT_DECAY: 5.0e-4 + AVERAGE_LOSS: 20 + +CRF: + ITER_MAX: 10 + POS_W: 3 + POS_XY_STD: 1 + BI_W: 4 + BI_XY_STD: 67 + BI_RGB_STD: 3 diff --git a/config/cocostuff10k.yaml b/configs/cocostuff10k.yaml similarity index 67% rename from config/cocostuff10k.yaml rename to configs/cocostuff10k.yaml index c6686e6..d798908 100644 --- a/config/cocostuff10k.yaml +++ b/configs/cocostuff10k.yaml @@ -1,10 +1,13 @@ +EXP: + ID: cocostuff10k + OUTPUT_DIR: data + DATASET: NAME: cocostuff10k ROOT: /media/kazuto1011/Extra/cocostuff/cocostuff-10k-v1.1 - LABELS: ./data/datasets/cocostuff/labels_2.txt - N_CLASSES: 182 - IGNORE_LABEL: -1 - WARP_IMAGE: True + LABELS: ./data/datasets/cocostuff/labels.txt + N_CLASSES: 182 + IGNORE_LABEL: 255 SCALES: [0.5, 0.75, 1.0, 1.25, 1.5] SPLIT: TRAIN: train @@ -20,17 +23,15 @@ IMAGE: G: 116.669 B: 104.008 SIZE: - TRAIN: - BASE: 513 - CROP: 321 + BASE: + TRAIN: 321 TEST: 513 MODEL: NAME: DeepLabV2_ResNet101_MSC N_BLOCKS: [3, 4, 23, 3] ATROUS_RATES: [6, 12, 18, 24] - INIT_MODEL: ./data/models/deeplab_resnet101/coco_init/deeplabv2_resnet101_COCO_init.pth - SAVE_DIR: ./data/models/deeplab_resnet101/cocostuff10k + INIT_MODEL: data/models/coco/deeplabv1_resnet101/caffemodel/deeplabv1_resnet101-coco.pth SOLVER: BATCH_SIZE: @@ -46,7 +47,6 @@ SOLVER: OPTIMIZER: sgd POLY_POWER: 0.9 WEIGHT_DECAY: 5.0e-4 - LOG_DIR: runs/cocostuff10k AVERAGE_LOSS: 20 CRF: @@ -55,4 +55,4 @@ CRF: POS_XY_STD: 1 BI_W: 4 BI_XY_STD: 67 - BI_RGB_STD: 3 \ No newline at end of file + BI_RGB_STD: 3 diff 
--git a/config/cocostuff164k.yaml b/configs/cocostuff164k.yaml similarity index 70% rename from config/cocostuff164k.yaml rename to configs/cocostuff164k.yaml index 7e4a788..7e6fb45 100644 --- a/config/cocostuff164k.yaml +++ b/configs/cocostuff164k.yaml @@ -1,10 +1,13 @@ +EXP: + ID: cocostuff164k + OUTPUT_DIR: data + DATASET: NAME: cocostuff164k ROOT: /media/kazuto1011/Extra/cocostuff/cocostuff-164k - LABELS: ./data/datasets/cocostuff/labels_2.txt + LABELS: ./data/datasets/cocostuff/labels.txt N_CLASSES: 182 IGNORE_LABEL: 255 - WARP_IMAGE: True SCALES: [0.5, 0.75, 1.0, 1.25, 1.5] SPLIT: TRAIN: train2017 @@ -20,22 +23,20 @@ IMAGE: G: 116.669 B: 104.008 SIZE: - TRAIN: - BASE: 513 - CROP: 321 + BASE: # None + TRAIN: 321 TEST: 513 MODEL: NAME: DeepLabV2_ResNet101_MSC N_BLOCKS: [3, 4, 23, 3] ATROUS_RATES: [6, 12, 18, 24] - INIT_MODEL: ./data/models/deeplab_resnet101/coco_init/deeplabv2_resnet101_COCO_init.pth - SAVE_DIR: ./data/models/deeplab_resnet101/cocostuff164k + INIT_MODEL: data/models/coco/deeplabv1_resnet101/caffemodel/deeplabv1_resnet101-coco.pth SOLVER: BATCH_SIZE: TRAIN: 5 - TEST: 5 + TEST: 1 ITER_MAX: 100000 ITER_SIZE: 2 ITER_SAVE: 5000 @@ -46,7 +47,6 @@ SOLVER: OPTIMIZER: sgd POLY_POWER: 0.9 WEIGHT_DECAY: 5.0e-4 - LOG_DIR: runs/cocostuff164k AVERAGE_LOSS: 20 CRF: @@ -55,4 +55,4 @@ CRF: POS_XY_STD: 1 BI_W: 4 BI_XY_STD: 67 - BI_RGB_STD: 3 \ No newline at end of file + BI_RGB_STD: 3 diff --git a/configs/conda_env.yaml b/configs/conda_env.yaml new file mode 100644 index 0000000..fe23345 --- /dev/null +++ b/configs/conda_env.yaml @@ -0,0 +1,26 @@ +name: deeplab-pytorch +channels: + - pytorch + - conda-forge + - defaults +dependencies: + # - clang # For MacOS + # - clangxx # For MacOS + - click + - cudatoolkit=10.0 + - cython + - matplotlib + - pytorch + - pyyaml + - scipy + - torchvision + - tqdm + - pip: + - addict + - black + - joblib + - opencv-python + - pydensecrf + - tensorboardX + - tensorflow + - torchnet diff --git a/config/voc12.yaml b/configs/voc12.yaml similarity index 67% rename from config/voc12.yaml rename to configs/voc12.yaml index 310cd7f..8dcef1b 100644 --- a/config/voc12.yaml +++ b/configs/voc12.yaml @@ -1,15 +1,18 @@ +EXP: + ID: voc12 + OUTPUT_DIR: data + DATASET: - NAME: voc12 - ROOT: + NAME: vocaug + ROOT: /media/kazuto1011/Extra/VOCdevkit LABELS: ./data/datasets/voc12/labels.txt N_CLASSES: 21 IGNORE_LABEL: 255 - WARP_IMAGE: False SCALES: [0.5, 0.75, 1.0, 1.25, 1.5] SPLIT: - TRAIN: - VAL: - TEST: + TRAIN: train_aug + VAL: val + TEST: test DATALOADER: NUM_WORKERS: 0 @@ -20,22 +23,20 @@ IMAGE: G: 116.669 B: 104.008 SIZE: - TRAIN: - BASE: 513 - CROP: 321 + BASE: # None + TRAIN: 321 TEST: 513 MODEL: NAME: DeepLabV2_ResNet101_MSC N_BLOCKS: [3, 4, 23, 3] ATROUS_RATES: [6, 12, 18, 24] - INIT_MODEL: ./data/models/deeplab_resnet101/coco_init/deeplabv2_resnet101_COCO_init.pth - SAVE_DIR: ./data/models/deeplab_resnet101/voc12 + INIT_MODEL: data/models/coco/deeplabv1_resnet101/caffemodel/deeplabv1_resnet101-coco.pth SOLVER: BATCH_SIZE: TRAIN: 5 - TEST: 5 + TEST: 1 ITER_MAX: 20000 ITER_SIZE: 2 ITER_SAVE: 5000 @@ -46,7 +47,6 @@ SOLVER: OPTIMIZER: sgd POLY_POWER: 0.9 WEIGHT_DECAY: 5.0e-4 - LOG_DIR: runs/voc12 AVERAGE_LOSS: 20 CRF: @@ -55,4 +55,4 @@ CRF: POS_XY_STD: 1 BI_W: 4 BI_XY_STD: 67 - BI_RGB_STD: 3 \ No newline at end of file + BI_RGB_STD: 3 diff --git a/convert.py b/convert.py index 31ae01c..7812fec 100644 --- a/convert.py +++ b/convert.py @@ -8,14 +8,16 @@ from __future__ import absolute_import, division, print_function import re -from collections import OrderedDict 
+import traceback +from collections import Counter, OrderedDict import click import numpy as np import torch +from addict import Dict from libs import caffe_pb2 -from libs.models import DeepLabV2_ResNet101_MSC +from libs.models import DeepLabV1_ResNet101, DeepLabV2_ResNet101_MSC def parse_caffemodel(model_path): @@ -25,18 +27,24 @@ def parse_caffemodel(model_path): # Check trainable layers print( - *set([(layer.type, len(layer.blobs)) for layer in caffemodel.layer]), sep="\n" + *Counter( + [(layer.type, len(layer.blobs)) for layer in caffemodel.layer] + ).most_common(), + sep="\n", ) params = OrderedDict() previous_layer_type = None for layer in caffemodel.layer: - print("{} ({}): {}".format(layer.name, layer.type, len(layer.blobs))) - # Skip the shared branch if "res075" in layer.name or "res05" in layer.name: continue + print( + "\033[34m[Caffe]\033[00m", + "{} ({}): {}".format(layer.name, layer.type, len(layer.blobs)), + ) + # Convolution or Dilated Convolution if "Convolution" in layer.type: params[layer.name] = {} @@ -58,6 +66,12 @@ def parse_caffemodel(model_path): params[layer.name]["dilation"] = layer.convolution_param.dilation[0] else: params[layer.name]["dilation"] = 1 + # Fully-connected + elif "InnerProduct" in layer.type: + params[layer.name] = {} + params[layer.name]["weight"] = list(layer.blobs[0].data) + if len(layer.blobs) == 2: + params[layer.name]["bias"] = list(layer.blobs[1].data) # Batch Normalization elif "BatchNorm" in layer.type: params[layer.name] = {} @@ -71,12 +85,18 @@ def parse_caffemodel(model_path): params[layer.name][ "momentum" ] = layer.batch_norm_param.moving_average_fraction + params[layer.name]["num_batches_tracked"] = np.array(0) batch_norm_layer = layer.name # Scale elif "Scale" in layer.type: assert previous_layer_type == "BatchNorm" params[batch_norm_layer]["weight"] = list(layer.blobs[0].data) params[batch_norm_layer]["bias"] = list(layer.blobs[1].data) + elif "Pooling" in layer.type: + params[layer.name] = {} + params[layer.name]["kernel_size"] = layer.pooling_param.kernel_size + params[layer.name]["stride"] = layer.pooling_param.stride + params[layer.name]["padding"] = layer.pooling_param.pad previous_layer_type = layer.type @@ -84,9 +104,9 @@ def parse_caffemodel(model_path): # Hard coded translater -def translate_layer_name(source): +def translate_layer_name(source, target="base"): def layer_block_branch(source, target): - target += ".layer{}".format(source[0][0]) + target += "layer{}".format(source[0][0]) if len(source[0][1:]) == 1: block = {"a": 1, "b": 2, "c": 3}.get(source[0][1:]) else: @@ -104,12 +124,19 @@ def layer_block_branch(source, target): return target source = source.split("_") - target = "base" - if "conv1" in source[0]: - target += ".layer1.conv1.conv" + if "pool" in source[0]: + target += "layer1.pool" + elif "fc" in source[0]: + if len(source) == 3: + stage = source[2] + target += "aspp.{}".format(stage) + else: + target += "fc" + elif "conv1" in source[0]: + target += "layer1.conv1.conv" elif "conv1" in source[1]: - target += ".layer1.conv1.bn" + target += "layer1.conv1.bn" elif "res" in source[0]: source[0] = source[0].replace("res", "") target = layer_block_branch(source, target) @@ -118,92 +145,108 @@ def layer_block_branch(source, target): source[0] = source[0].replace("bn", "") target = layer_block_branch(source, target) target += ".bn" - elif "fc" in source[0]: - # Skip if coco_init - if len(source) == 3: - stage = source[2] - target += ".aspp.{}".format(stage) return target @click.command() -@click.option("-d", 
"--dataset", required=True, type=click.Choice(["voc12", "init"])) +@click.option( + "-d", + "--dataset", + type=click.Choice(["voc12", "coco"]), + required=True, + help="Caffemodel", +) def main(dataset): + """ + Convert caffemodels to pytorch models + """ + WHITELIST = ["kernel_size", "stride", "padding", "dilation", "eps", "momentum"] - CONFIG = { - "voc12": { - "path_caffe_model": "data/models/deeplab_resnet101/voc12/train2_iter_20000.caffemodel", - "path_pytorch_model": "data/models/deeplab_resnet101/voc12/deeplabv2_resnet101_VOC2012.pth", - "n_classes": 21, - }, - "init": { - "path_caffe_model": "data/models/deeplab_resnet101/coco_init/init.caffemodel", - "path_pytorch_model": "data/models/deeplab_resnet101/coco_init/deeplabv2_resnet101_COCO_init.pth", - "n_classes": 91, - }, - }.get(dataset) - - params = parse_caffemodel(CONFIG["path_caffe_model"]) - - model = DeepLabV2_ResNet101_MSC(n_classes=CONFIG["n_classes"]) + CONFIG = Dict( + { + "voc12": { + # For loading the provided VOC 2012 caffemodel + "PATH_CAFFE_MODEL": "data/models/voc12/deeplabv2_resnet101_msc/caffemodel/train2_iter_20000.caffemodel", + "PATH_PYTORCH_MODEL": "data/models/voc12/deeplabv2_resnet101_msc/caffemodel/deeplabv2_resnet101_msc-vocaug.pth", + "N_CLASSES": 21, + "MODEL": "DeepLabV2_ResNet101_MSC", + "HEAD": "base.", + }, + "coco": { + # For loading the provided initial weights pre-trained on COCO + "PATH_CAFFE_MODEL": "data/models/coco/deeplabv1_resnet101/caffemodel/init.caffemodel", + "PATH_PYTORCH_MODEL": "data/models/coco/deeplabv1_resnet101/caffemodel/deeplabv1_resnet101-coco.pth", + "N_CLASSES": 91, + "MODEL": "DeepLabV1_ResNet101", + "HEAD": "", + }, + }.get(dataset) + ) + + params = parse_caffemodel(CONFIG.PATH_CAFFE_MODEL) + + model = eval(CONFIG.MODEL)(n_classes=CONFIG.N_CLASSES) model.eval() - own_state = model.state_dict() + reference_state_dict = model.state_dict() rel_tol = 1e-7 - state_dict = OrderedDict() - for layer_name, layer_dict in params.items(): - for param_name, values in layer_dict.items(): - if param_name in WHITELIST and dataset != "coco_init" and dataset != "init": - attribute = translate_layer_name(layer_name) - attribute = eval("model." + attribute + "." + param_name) - if isinstance(attribute, tuple): - assert ( - attribute[0] == values - ), "Inconsistent values: {}@{}, {}@{}".format( - attribute[0], - translate_layer_name(layer_name) + "." + param_name, - values, - layer_name, + converted_state_dict = OrderedDict() + for caffe_layer, caffe_layer_dict in params.items(): + for param_name, caffe_values in caffe_layer_dict.items(): + pytorch_layer = translate_layer_name(caffe_layer, CONFIG.HEAD) + if pytorch_layer: + pytorch_param = pytorch_layer + "." + param_name + + # Parameter check + if param_name in WHITELIST: + pytorch_values = eval("model." 
+ pytorch_param) + if isinstance(pytorch_values, tuple): + assert ( + pytorch_values[0] == caffe_values + ), "Inconsistent values: {} @{} (Caffe), {} @{} (PyTorch)".format( + caffe_values, + caffe_layer + "/" + param_name, + pytorch_values, + pytorch_param, + ) + else: + assert ( + abs(pytorch_values - caffe_values) < rel_tol + ), "Inconsistent values: {} @{} (Caffe), {} @{} (PyTorch)".format( + caffe_values, + caffe_layer + "/" + param_name, + pytorch_values, + pytorch_param, + ) + print( + "\033[34m[Passed!]\033[00m", + (caffe_layer + "/" + param_name).ljust(35), + "->", + pytorch_param, + ) + continue + + # Weight conversion + if pytorch_param in reference_state_dict: + caffe_values = torch.tensor(caffe_values) + caffe_values = caffe_values.view_as( + reference_state_dict[pytorch_param] ) - else: - assert ( - abs(attribute - values) < rel_tol - ), "Inconsistent values: {}@{}, {}@{}".format( - attribute, - translate_layer_name(layer_name) + "." + param_name, - values, - layer_name, + converted_state_dict[pytorch_param] = caffe_values + print( + "\033[32m[Copied!]\033[00m", + (caffe_layer + "/" + param_name).ljust(35), + "->", + pytorch_param, ) - print( - layer_name.ljust(20), - "->", - param_name, - attribute, - values, - ": Checked!", - ) - continue - param_name = translate_layer_name(layer_name) + "." + param_name - if param_name in own_state: - values = torch.FloatTensor(values) - values = values.view_as(own_state[param_name]) - state_dict[param_name] = values - print(layer_name.ljust(20), "->", param_name, ": Copied!") - - try: - print("\033[32mVerify the converted model\033[00m") - model.load_state_dict(state_dict) - except: - import traceback - - traceback.print_exc() - print("\033[32mVerify with ignoring ASPP (strict=False)\033[00m") - model.load_state_dict(state_dict, strict=False) - - print("Saving to", CONFIG["path_pytorch_model"]) - torch.save(state_dict, CONFIG["path_pytorch_model"]) + + print("\033[32mVerify the converted model\033[00m") + model.load_state_dict(converted_state_dict) + + print('Saving to "{}"'.format(CONFIG.PATH_PYTORCH_MODEL)) + torch.save(converted_state_dict, CONFIG.PATH_PYTORCH_MODEL) if __name__ == "__main__": diff --git a/data/models/.gitignore b/data/.gitignore similarity index 56% rename from data/models/.gitignore rename to data/.gitignore index e679bd5..c73f4b7 100644 --- a/data/models/.gitignore +++ b/data/.gitignore @@ -1,4 +1,6 @@ *.pth *.pth.tar *.caffemodel -*.npy \ No newline at end of file +*.npy +*.prototxt +*.zip \ No newline at end of file diff --git a/data/datasets/coco/labels.txt b/data/datasets/coco/labels.txt new file mode 100644 index 0000000..83604e0 --- /dev/null +++ b/data/datasets/coco/labels.txt @@ -0,0 +1,91 @@ +0 background +1 person +2 bicycle +3 car +4 motorcycle +5 airplane +6 bus +7 train +8 truck +9 boat +10 traffic light +11 fire hydrant +12 street sign +13 stop sign +14 parking meter +15 bench +16 bird +17 cat +18 dog +19 horse +20 sheep +21 cow +22 elephant +23 bear +24 zebra +25 giraffe +26 hat +27 backpack +28 umbrella +29 shoe +30 eye glasses +31 handbag +32 tie +33 suitcase +34 frisbee +35 skis +36 snowboard +37 sports ball +38 kite +39 baseball bat +40 baseball glove +41 skateboard +42 surfboard +43 tennis racket +44 bottle +45 plate +46 wine glass +47 cup +48 fork +49 knife +50 spoon +51 bowl +52 banana +53 apple +54 sandwich +55 orange +56 broccoli +57 carrot +58 hot dog +59 pizza +60 donut +61 cake +62 chair +63 couch +64 potted plant +65 bed +66 mirror +67 dining table +68 window +69 desk +70 toilet +71 door 
+72 tv +73 laptop +74 mouse +75 remote +76 keyboard +77 cell phone +78 microwave +79 oven +80 toaster +81 sink +82 refrigerator +83 blender +84 book +85 clock +86 vase +87 scissors +88 teddy bear +89 hair drier +90 toothbrush \ No newline at end of file diff --git a/data/datasets/cocostuff/README.md b/data/datasets/cocostuff/README.md new file mode 100644 index 0000000..f748afc --- /dev/null +++ b/data/datasets/cocostuff/README.md @@ -0,0 +1,76 @@ +# COCO-Stuff + +This is an instruction for setting up COCO-Stuff dataset. +COCO-Stuff 164k is the latest version and recommended. + +![](../../../docs/datasets/cocostuff.png) + +## COCO-Stuff 164k + +### Setup + +1. Run the script below to download the dataset (20GB+). + +```sh +$ bash ./scripts/setup_cocostuff164k.sh [PATH TO DOWNLOAD] +``` + +2. Set the path to the dataset in ```configs/cocostuff164k.yaml```. + +```yaml +DATASET: cocostuff164k + ROOT: # <- Write here +... +``` + +### Dataset structure + +``` +├── images +│ ├── train2017 +│ │ ├── 000000000009.jpg +│ │ └── ... +│ └── val2017 +│ ├── 000000000139.jpg +│ └── ... +└── annotations + ├── train2017 + │ ├── 000000000009.png + │ └── ... + └── val2017 + ├── 000000000139.png + └── ... +``` + +## COCO-Stuff 10k + +### Setup + +1. Run the script below to download the dataset (2GB). + +```sh +$ bash ./scripts/setup_cocostuff10k.sh [PATH TO DOWNLOAD] +``` + +2. Set the path to the dataset in ```configs/cocostuff10k.yaml```. + +```yaml +DATASET: cocostuff10k + ROOT: # <- Write here +... +``` + +### Dataset structure + +``` +├── images +│ ├── COCO_train2014_000000000077.jpg +│ └── ... +├── annotations +│ ├── COCO_train2014_000000000077.mat +│ └── ... +└── imageLists + ├── all.txt + ├── test.txt + └── train.txt +``` diff --git a/data/datasets/cocostuff/labels.txt b/data/datasets/cocostuff/labels.txt index c5d66fd..cf9f484 100644 --- a/data/datasets/cocostuff/labels.txt +++ b/data/datasets/cocostuff/labels.txt @@ -1,183 +1,182 @@ -0 unlabeled -1 person -2 bicycle -3 car -4 motorcycle -5 airplane -6 bus -7 train -8 truck -9 boat -10 traffic light -11 fire hydrant -12 street sign -13 stop sign -14 parking meter -15 bench -16 bird -17 cat -18 dog -19 horse -20 sheep -21 cow -22 elephant -23 bear -24 zebra -25 giraffe -26 hat -27 backpack -28 umbrella -29 shoe -30 eye glasses -31 handbag -32 tie -33 suitcase -34 frisbee -35 skis -36 snowboard -37 sports ball -38 kite -39 baseball bat -40 baseball glove -41 skateboard -42 surfboard -43 tennis racket -44 bottle -45 plate -46 wine glass -47 cup -48 fork -49 knife -50 spoon -51 bowl -52 banana -53 apple -54 sandwich -55 orange -56 broccoli -57 carrot -58 hot dog -59 pizza -60 donut -61 cake -62 chair -63 couch -64 potted plant -65 bed -66 mirror -67 dining table -68 window -69 desk -70 toilet -71 door -72 tv -73 laptop -74 mouse -75 remote -76 keyboard -77 cell phone -78 microwave -79 oven -80 toaster -81 sink -82 refrigerator -83 blender -84 book -85 clock -86 vase -87 scissors -88 teddy bear -89 hair drier -90 toothbrush -91 hair brush -92 banner -93 blanket -94 branch -95 bridge -96 building-other -97 bush -98 cabinet -99 cage -100 cardboard -101 carpet -102 ceiling-other -103 ceiling-tile -104 cloth -105 clothes -106 clouds -107 counter -108 cupboard -109 curtain -110 desk-stuff -111 dirt -112 door-stuff -113 fence -114 floor-marble -115 floor-other -116 floor-stone -117 floor-tile -118 floor-wood -119 flower -120 fog -121 food-other -122 fruit -123 furniture-other -124 grass -125 gravel -126 ground-other -127 hill -128 house -129 leaves -130 light 
-131 mat -132 metal -133 mirror-stuff -134 moss -135 mountain -136 mud -137 napkin -138 net -139 paper -140 pavement -141 pillow -142 plant-other -143 plastic -144 platform -145 playingfield -146 railing -147 railroad -148 river -149 road -150 rock -151 roof -152 rug -153 salad -154 sand -155 sea -156 shelf -157 sky-other -158 skyscraper -159 snow -160 solid-other -161 stairs -162 stone -163 straw -164 structural-other -165 table -166 tent -167 textile-other -168 towel -169 tree -170 vegetable -171 wall-brick -172 wall-concrete -173 wall-other -174 wall-panel -175 wall-stone -176 wall-tile -177 wall-wood -178 water-other -179 waterdrops -180 window-blind -181 window-other -182 wood \ No newline at end of file +0 person +1 bicycle +2 car +3 motorcycle +4 airplane +5 bus +6 train +7 truck +8 boat +9 traffic light +10 fire hydrant +11 street sign +12 stop sign +13 parking meter +14 bench +15 bird +16 cat +17 dog +18 horse +19 sheep +20 cow +21 elephant +22 bear +23 zebra +24 giraffe +25 hat +26 backpack +27 umbrella +28 shoe +29 eye glasses +30 handbag +31 tie +32 suitcase +33 frisbee +34 skis +35 snowboard +36 sports ball +37 kite +38 baseball bat +39 baseball glove +40 skateboard +41 surfboard +42 tennis racket +43 bottle +44 plate +45 wine glass +46 cup +47 fork +48 knife +49 spoon +50 bowl +51 banana +52 apple +53 sandwich +54 orange +55 broccoli +56 carrot +57 hot dog +58 pizza +59 donut +60 cake +61 chair +62 couch +63 potted plant +64 bed +65 mirror +66 dining table +67 window +68 desk +69 toilet +70 door +71 tv +72 laptop +73 mouse +74 remote +75 keyboard +76 cell phone +77 microwave +78 oven +79 toaster +80 sink +81 refrigerator +82 blender +83 book +84 clock +85 vase +86 scissors +87 teddy bear +88 hair drier +89 toothbrush +90 hair brush +91 banner +92 blanket +93 branch +94 bridge +95 building-other +96 bush +97 cabinet +98 cage +99 cardboard +100 carpet +101 ceiling-other +102 ceiling-tile +103 cloth +104 clothes +105 clouds +106 counter +107 cupboard +108 curtain +109 desk-stuff +110 dirt +111 door-stuff +112 fence +113 floor-marble +114 floor-other +115 floor-stone +116 floor-tile +117 floor-wood +118 flower +119 fog +120 food-other +121 fruit +122 furniture-other +123 grass +124 gravel +125 ground-other +126 hill +127 house +128 leaves +129 light +130 mat +131 metal +132 mirror-stuff +133 moss +134 mountain +135 mud +136 napkin +137 net +138 paper +139 pavement +140 pillow +141 plant-other +142 plastic +143 platform +144 playingfield +145 railing +146 railroad +147 river +148 road +149 rock +150 roof +151 rug +152 salad +153 sand +154 sea +155 shelf +156 sky-other +157 skyscraper +158 snow +159 solid-other +160 stairs +161 stone +162 straw +163 structural-other +164 table +165 tent +166 textile-other +167 towel +168 tree +169 vegetable +170 wall-brick +171 wall-concrete +172 wall-other +173 wall-panel +174 wall-stone +175 wall-tile +176 wall-wood +177 water-other +178 waterdrops +179 window-blind +180 window-other +181 wood \ No newline at end of file diff --git a/data/datasets/cocostuff/labels_2.txt b/data/datasets/cocostuff/labels_2.txt deleted file mode 100644 index cf9f484..0000000 --- a/data/datasets/cocostuff/labels_2.txt +++ /dev/null @@ -1,182 +0,0 @@ -0 person -1 bicycle -2 car -3 motorcycle -4 airplane -5 bus -6 train -7 truck -8 boat -9 traffic light -10 fire hydrant -11 street sign -12 stop sign -13 parking meter -14 bench -15 bird -16 cat -17 dog -18 horse -19 sheep -20 cow -21 elephant -22 bear -23 zebra -24 giraffe -25 hat -26 backpack -27 umbrella -28 shoe -29 
eye glasses -30 handbag -31 tie -32 suitcase -33 frisbee -34 skis -35 snowboard -36 sports ball -37 kite -38 baseball bat -39 baseball glove -40 skateboard -41 surfboard -42 tennis racket -43 bottle -44 plate -45 wine glass -46 cup -47 fork -48 knife -49 spoon -50 bowl -51 banana -52 apple -53 sandwich -54 orange -55 broccoli -56 carrot -57 hot dog -58 pizza -59 donut -60 cake -61 chair -62 couch -63 potted plant -64 bed -65 mirror -66 dining table -67 window -68 desk -69 toilet -70 door -71 tv -72 laptop -73 mouse -74 remote -75 keyboard -76 cell phone -77 microwave -78 oven -79 toaster -80 sink -81 refrigerator -82 blender -83 book -84 clock -85 vase -86 scissors -87 teddy bear -88 hair drier -89 toothbrush -90 hair brush -91 banner -92 blanket -93 branch -94 bridge -95 building-other -96 bush -97 cabinet -98 cage -99 cardboard -100 carpet -101 ceiling-other -102 ceiling-tile -103 cloth -104 clothes -105 clouds -106 counter -107 cupboard -108 curtain -109 desk-stuff -110 dirt -111 door-stuff -112 fence -113 floor-marble -114 floor-other -115 floor-stone -116 floor-tile -117 floor-wood -118 flower -119 fog -120 food-other -121 fruit -122 furniture-other -123 grass -124 gravel -125 ground-other -126 hill -127 house -128 leaves -129 light -130 mat -131 metal -132 mirror-stuff -133 moss -134 mountain -135 mud -136 napkin -137 net -138 paper -139 pavement -140 pillow -141 plant-other -142 plastic -143 platform -144 playingfield -145 railing -146 railroad -147 river -148 road -149 rock -150 roof -151 rug -152 salad -153 sand -154 sea -155 shelf -156 sky-other -157 skyscraper -158 snow -159 solid-other -160 stairs -161 stone -162 straw -163 structural-other -164 table -165 tent -166 textile-other -167 towel -168 tree -169 vegetable -170 wall-brick -171 wall-concrete -172 wall-other -173 wall-panel -174 wall-stone -175 wall-tile -176 wall-wood -177 water-other -178 waterdrops -179 window-blind -180 window-other -181 wood \ No newline at end of file diff --git a/data/datasets/voc12/README.md b/data/datasets/voc12/README.md new file mode 100644 index 0000000..ec44ea5 --- /dev/null +++ b/data/datasets/voc12/README.md @@ -0,0 +1,60 @@ +# PASCAL VOC 2012 + +This is an instruction for setting up PASCAL VOC dataset. + +![](../../../docs/datasets/voc12.png) + +1. Download PASCAL VOC 2012. + +```sh +$ bash scripts/setup_voc12.sh [PATH TO DOWNLOAD] +``` + +``` +/VOCdevkit +└── VOC2012 + ├── Annotations + ├── ImageSets + │ └── Segmentation + ├── JPEGImages + ├── SegmentationObject + └── SegmentationClass +``` + +2. Add SBD augmentated training data as `SegmentationClassAug`. + + +* Convert by yourself ([here](https://github.com/shelhamer/fcn.berkeleyvision.org/tree/master/data/pascal)). +* Or download pre-converted files ([here](https://github.com/DrSleep/tensorflow-deeplab-resnet#evaluation)). + +3. Download official image sets as `ImageSets/SegmentationAug`. + +* From https://ucla.app.box.com/s/rd9z2xvwsfpksi7mi08i2xqrj7ab4keb/file/55053033642 + +```sh +/VOCdevkit +└── VOC2012 + ├── Annotations + ├── ImageSets + │ ├── Segmentation + │ └── SegmentationAug # ADDED!! + │ ├── test.txt + │ ├── train_aug.txt + │ ├── train.txt + │ ├── trainval_aug.txt + │ ├── trainval.txt + │ └── val.txt + ├── JPEGImages + ├── SegmentationObject + ├── SegmentationClass + └── SegmentationClassAug # ADDED!! + └── 2007_000032.png +``` + +1. Set the path to the dataset in ```configs/voc12.yaml```. + +```yaml +DATASET: voc12 + ROOT: # <- Write here +... 
+``` diff --git a/data/models/deeplab_resnet101/coco_init/.gitkeep b/data/models/coco/deeplabv1_resnet101/caffemodel/.gitkeep similarity index 100% rename from data/models/deeplab_resnet101/coco_init/.gitkeep rename to data/models/coco/deeplabv1_resnet101/caffemodel/.gitkeep diff --git a/data/models/deeplab_resnet101/cocostuff164k/.gitkeep b/data/models/deeplab_resnet101/cocostuff164k/.gitkeep deleted file mode 100644 index e69de29..0000000 diff --git a/data/models/deeplab_resnet101/init/.gitkeep b/data/models/deeplab_resnet101/init/.gitkeep deleted file mode 100644 index e69de29..0000000 diff --git a/data/models/deeplab_resnet101/voc12/.gitkeep b/data/models/deeplab_resnet101/voc12/.gitkeep deleted file mode 100644 index e69de29..0000000 diff --git a/data/models/deeplab_resnet101/cocostuff10k/.gitkeep b/data/models/voc12/deeplabv2_resnet101_msc/caffemodel/.gitkeep similarity index 100% rename from data/models/deeplab_resnet101/cocostuff10k/.gitkeep rename to data/models/voc12/deeplabv2_resnet101_msc/caffemodel/.gitkeep diff --git a/demo.py b/demo.py index ec4d573..efe8fb6 100644 --- a/demo.py +++ b/demo.py @@ -19,7 +19,7 @@ import yaml from addict import Dict -from libs.models import DeepLabV2_ResNet101_MSC +from libs.models import * from libs.utils import DenseCRF @@ -43,15 +43,6 @@ def get_classtable(CONFIG): return classes -def setup_model(model_path, n_classes, device): - model = DeepLabV2_ResNet101_MSC(n_classes=n_classes) - state_dict = torch.load(model_path, map_location=lambda storage, loc: storage) - model.load_state_dict(state_dict) - model.eval() - model.to(device) - return model - - def setup_postprocessor(CONFIG): # CRF post-processor postprocessor = DenseCRF( @@ -89,19 +80,18 @@ def preprocessing(image, device, CONFIG): def inference(model, image, raw_image=None, postprocessor=None): - B, C, H, W = image.shape + _, _, H, W = image.shape # Image -> Probability map logits = model(image) - logits = F.interpolate(logits, size=(H, W), mode="bilinear", align_corners=True) - probs = F.softmax(logits, dim=1) - probs = probs.data.cpu().numpy()[0] + logits = F.interpolate(logits, size=(H, W), mode="bilinear", align_corners=False) + probs = F.softmax(logits, dim=1)[0] + probs = probs.cpu().numpy() # Refine the prob map with CRF if postprocessor and raw_image is not None: probs = postprocessor(raw_image, probs) - # Pixel-wise argmax labelmap = np.argmax(probs, axis=0) return labelmap @@ -110,26 +100,59 @@ def inference(model, image, raw_image=None, postprocessor=None): @click.group() @click.pass_context def main(ctx): + """ + Demo with a trained model + """ + print("Mode:", ctx.invoked_subcommand) -@main.command(help="Inference from a single image") -@click.option("-c", "--config", type=str, required=True, help="yaml") -@click.option("-i", "--image-path", type=str, required=True) -@click.option("-m", "--model-path", type=str, required=True, help="pth") -@click.option("--cuda/--no-cuda", default=True, help="Switch GPU/CPU") -@click.option("--crf", is_flag=True, help="CRF post processing") -def single(config, image_path, model_path, cuda, crf): - # Disable autograd globally - torch.set_grad_enabled(False) +@main.command() +@click.option( + "-c", + "--config-path", + type=click.File(), + required=True, + help="Dataset configuration file in YAML", +) +@click.option( + "-m", + "--model-path", + type=click.Path(exists=True), + required=True, + help="PyTorch model to be loaded", +) +@click.option( + "-i", + "--image-path", + type=click.Path(exists=True), + required=True, + help="Image to be 
processed", +) +@click.option( + "--cuda/--cpu", default=True, help="Enable CUDA if available [default: --cuda]" +) +@click.option("--crf", is_flag=True, show_default=True, help="CRF post-processing") +def single(config_path, model_path, image_path, cuda, crf): + """ + Inference from a single image + """ # Setup + CONFIG = Dict(yaml.load(config_path)) device = get_device(cuda) - CONFIG = Dict(yaml.load(open(config))) + torch.set_grad_enabled(False) + classes = get_classtable(CONFIG) - model = setup_model(model_path, CONFIG.DATASET.N_CLASSES, device) postprocessor = setup_postprocessor(CONFIG) if crf else None + model = eval(CONFIG.MODEL.NAME)(n_classes=CONFIG.DATASET.N_CLASSES) + state_dict = torch.load(model_path, map_location=lambda storage, loc: storage) + model.load_state_dict(state_dict) + model.eval() + model.to(device) + print("Model:", CONFIG.MODEL.NAME) + # Inference image = cv2.imread(image_path, cv2.IMREAD_COLOR) image, raw_image = preprocessing(image, device, CONFIG) @@ -158,25 +181,47 @@ def single(config, image_path, model_path, cuda, crf): plt.show() -@main.command(help="Inference from camera stream") -@click.option("-c", "--config", type=str, required=True, help="yaml") -@click.option("-m", "--model-path", type=str, required=True, help="pth") -@click.option("--cuda/--no-cuda", default=True, help="Switch GPU/CPU") -@click.option("--crf", is_flag=True, help="CRF post processing") -@click.option("--camera-id", type=int, default=0) -def live(config, model_path, cuda, crf, camera_id): - # Disable autograd globally - torch.set_grad_enabled(False) - # Auto-tune cuDNN - torch.backends.cudnn.benchmark = True +@main.command() +@click.option( + "-c", + "--config-path", + type=click.File(), + required=True, + help="Dataset configuration file in YAML", +) +@click.option( + "-m", + "--model-path", + type=click.Path(exists=True), + required=True, + help="PyTorch model to be loaded", +) +@click.option( + "--cuda/--cpu", default=True, help="Enable CUDA if available [default: --cuda]" +) +@click.option("--crf", is_flag=True, show_default=True, help="CRF post-processing") +@click.option("--camera-id", type=int, default=0, show_default=True, help="Device ID") +def live(config_path, model_path, cuda, crf, camera_id): + """ + Inference from camera stream + """ # Setup + CONFIG = Dict(yaml.load(config_path)) device = get_device(cuda) - CONFIG = Dict(yaml.load(open(config))) + torch.set_grad_enabled(False) + torch.backends.cudnn.benchmark = True + classes = get_classtable(CONFIG) - model = setup_model(model_path, CONFIG.DATASET.N_CLASSES, device) postprocessor = setup_postprocessor(CONFIG) if crf else None + model = eval(CONFIG.MODEL.NAME)(n_classes=CONFIG.DATASET.N_CLASSES) + state_dict = torch.load(model_path, map_location=lambda storage, loc: storage) + model.load_state_dict(state_dict) + model.eval() + model.to(device) + print("Model:", CONFIG.MODEL.NAME) + # UVC camera stream cap = cv2.VideoCapture(camera_id) cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*"YUYV")) @@ -197,7 +242,7 @@ def mouse_event(event, x, y, flags, labelmap): cv2.namedWindow(window_name, cv2.WINDOW_AUTOSIZE) while True: - ret, frame = cap.read() + _, frame = cap.read() image, raw_image = preprocessing(frame, device, CONFIG) labelmap = inference(model, image, raw_image, postprocessor) colormap = colorize(labelmap) diff --git a/docs/datasets/cocostuff.png b/docs/datasets/cocostuff.png new file mode 100644 index 0000000..74c8a04 Binary files /dev/null and b/docs/datasets/cocostuff.png differ diff --git a/docs/data.png 
b/docs/datasets/voc12.png similarity index 66% rename from docs/data.png rename to docs/datasets/voc12.png index 8bc51c8..290b4a6 100644 Binary files a/docs/data.png and b/docs/datasets/voc12.png differ diff --git a/docs/examples/coco_init.png b/docs/examples/coco_init.png new file mode 100644 index 0000000..e01e3ad Binary files /dev/null and b/docs/examples/coco_init.png differ diff --git a/docs/examples/cocostuff10k.png b/docs/examples/cocostuff10k.png new file mode 100644 index 0000000..ca4d71c Binary files /dev/null and b/docs/examples/cocostuff10k.png differ diff --git a/docs/examples/cocostuff164k.png b/docs/examples/cocostuff164k.png new file mode 100644 index 0000000..d5f1c27 Binary files /dev/null and b/docs/examples/cocostuff164k.png differ diff --git a/docs/examples/voc12.png b/docs/examples/voc12.png new file mode 100644 index 0000000..8f90aed Binary files /dev/null and b/docs/examples/voc12.png differ diff --git a/hubconf.py b/hubconf.py index e38bff0..94f496f 100644 --- a/hubconf.py +++ b/hubconf.py @@ -5,6 +5,8 @@ # URL: https://kazuto1011.github.io # Date: 20 December 2018 +from __future__ import print_function + def deeplabv2_resnet101(**kwargs): """ @@ -12,6 +14,12 @@ def deeplabv2_resnet101(**kwargs): n_classes (int): the number of classes """ + if kwargs["pretrained"]: + raise NotImplementedError( + "Please download from " + "https://github.com/kazuto1011/deeplab-pytorch/tree/master#pretrained-models" + ) + from libs.models.deeplabv2 import DeepLabV2 from libs.models.msc import MSC diff --git a/libs/datasets/__init__.py b/libs/datasets/__init__.py index 7aae0d2..314a751 100644 --- a/libs/datasets/__init__.py +++ b/libs/datasets/__init__.py @@ -1,6 +1,11 @@ -from __future__ import absolute_import +from .voc import VOC, VOCAug from .cocostuff import CocoStuff10k, CocoStuff164k def get_dataset(name): - return {"cocostuff10k": CocoStuff10k, "cocostuff164k": CocoStuff164k}[name] + return { + "cocostuff10k": CocoStuff10k, + "cocostuff164k": CocoStuff164k, + "voc": VOC, + "vocaug": VOCAug, + }[name] diff --git a/libs/datasets/base.py b/libs/datasets/base.py new file mode 100644 index 0000000..efb6a8e --- /dev/null +++ b/libs/datasets/base.py @@ -0,0 +1,123 @@ +#!/usr/bin/env python +# coding: utf-8 +# +# Author: Kazuto Nakashima +# URL: http://kazuto1011.github.io +# Created: 2017-10-30 + +import random + +import cv2 +import numpy as np +import torch +from PIL import Image +from torch.utils import data + + +class _BaseDataset(data.Dataset): + """ + Base dataset class + """ + + def __init__( + self, + root, + split, + ignore_label, + mean_bgr, + augment=True, + base_size=None, + crop_size=321, + scales=(1.0), + flip=True, + ): + self.root = root + self.split = split + self.ignore_label = ignore_label + self.mean_bgr = np.array(mean_bgr) + self.augment = augment + self.base_size = base_size + self.crop_size = crop_size + self.scales = scales + self.flip = flip + self.files = [] + self._set_files() + + cv2.setNumThreads(0) + + def _set_files(self): + """ + Create a file path/image id list. 
+ """ + raise NotImplementedError() + + def _load_data(self, image_id): + """ + Load the image and label in numpy.ndarray + """ + raise NotImplementedError() + + def _augmentation(self, image, label): + # Scaling + h, w = label.shape + if self.base_size: + if h > w: + h, w = (self.base_size, int(self.base_size * w / h)) + else: + h, w = (int(self.base_size * h / w), self.base_size) + scale_factor = random.choice(self.scales) + h, w = (int(h * scale_factor), int(w * scale_factor)) + image = cv2.resize(image, (w, h), interpolation=cv2.INTER_LINEAR) + label = Image.fromarray(label).resize((w, h), resample=Image.NEAREST) + label = np.asarray(label, dtype=np.int64) + + # Padding to fit for crop_size + h, w = label.shape + pad_h = max(self.crop_size - h, 0) + pad_w = max(self.crop_size - w, 0) + pad_kwargs = { + "top": 0, + "bottom": pad_h, + "left": 0, + "right": pad_w, + "borderType": cv2.BORDER_CONSTANT, + } + if pad_h > 0 or pad_w > 0: + image = cv2.copyMakeBorder(image, value=self.mean_bgr, **pad_kwargs) + label = cv2.copyMakeBorder(label, value=self.ignore_label, **pad_kwargs) + + # Cropping + h, w = label.shape + start_h = random.randint(0, h - self.crop_size) + start_w = random.randint(0, w - self.crop_size) + end_h = start_h + self.crop_size + end_w = start_w + self.crop_size + image = image[start_h:end_h, start_w:end_w] + label = label[start_h:end_h, start_w:end_w] + + if self.flip: + # Random flipping + if random.random() < 0.5: + image = np.fliplr(image).copy() # HWC + label = np.fliplr(label).copy() # HW + return image, label + + def __getitem__(self, index): + image_id, image, label = self._load_data(index) + if self.augment: + image, label = self._augmentation(image, label) + # Mean subtraction + image -= self.mean_bgr + # HWC -> CHW + image = image.transpose(2, 0, 1) + return image_id, image.astype(np.float32), label.astype(np.int64) + + def __len__(self): + return len(self.files) + + def __repr__(self): + fmt_str = "Dataset: " + self.__class__.__name__ + "\n" + fmt_str += " # data: {}\n".format(self.__len__()) + fmt_str += " Split: {}\n".format(self.split) + fmt_str += " Root: {}".format(self.root) + return fmt_str diff --git a/libs/datasets/cocostuff.py b/libs/datasets/cocostuff.py index a7dd313..d132d48 100644 --- a/libs/datasets/cocostuff.py +++ b/libs/datasets/cocostuff.py @@ -5,151 +5,26 @@ # URL: http://kazuto1011.github.io # Created: 2017-10-30 -from __future__ import print_function +from __future__ import absolute_import, print_function -import glob import os.path as osp -import random -from collections import defaultdict from glob import glob import cv2 -import h5py import numpy as np import scipy.io as sio import torch +from PIL import Image from torch.utils import data -from tqdm import tqdm +from .base import _BaseDataset -class _CocoStuff(data.Dataset): - """COCO-Stuff base class""" - def __init__( - self, - root, - split="train", - base_size=513, - crop_size=321, - mean=(104.008, 116.669, 122.675), - scale=(0.5, 0.75, 1.0, 1.25, 1.5), - warp=True, - flip=True, - preload=False, - ): - self.root = root - self.split = split - self.base_size = base_size - self.crop_size = crop_size - self.mean = np.array(mean) - self.scale = scale - self.warp = warp - self.flip = flip - self.preload = preload - - self.files = [] - self.images = [] - self.labels = [] - self.ignore_label = None - - self._set_files() - - if self.preload: - self._preload_data() - - cv2.setNumThreads(0) - - def _set_files(self): - raise NotImplementedError() - - def _transform(self, image, label): - # Mean 
subtraction - image -= self.mean - # Pre-scaling - if self.warp: - base_size = (self.base_size,) * 2 - else: - raw_h, raw_w = label.shape - if raw_h > raw_w: - base_size = (int(self.base_size * raw_w / raw_h), self.base_size) - else: - base_size = (self.base_size, int(self.base_size * raw_h / raw_w)) - image = cv2.resize(image, base_size, interpolation=cv2.INTER_LINEAR) - label = cv2.resize(label, base_size, interpolation=cv2.INTER_NEAREST) - if self.scale is not None: - # Scaling - scale_factor = random.choice(self.scale) - scale_kwargs = {"dsize": None, "fx": scale_factor, "fy": scale_factor} - image = cv2.resize(image, interpolation=cv2.INTER_LINEAR, **scale_kwargs) - label = cv2.resize(label, interpolation=cv2.INTER_NEAREST, **scale_kwargs) - scale_h, scale_w = label.shape - # Padding - pad_h = max(max(base_size[1], self.crop_size) - scale_h, 0) - pad_w = max(max(base_size[0], self.crop_size) - scale_w, 0) - pad_kwargs = { - "top": 0, - "bottom": pad_h, - "left": 0, - "right": pad_w, - "borderType": cv2.BORDER_CONSTANT, - } - if pad_h > 0 or pad_w > 0: - image = cv2.copyMakeBorder(image, value=(0.0, 0.0, 0.0), **pad_kwargs) - label = cv2.copyMakeBorder(label, value=self.ignore_label, **pad_kwargs) - # Random cropping - base_h, base_w = label.shape - start_h = random.randint(0, base_h - self.crop_size) - start_w = random.randint(0, base_w - self.crop_size) - end_h = start_h + self.crop_size - end_w = start_w + self.crop_size - image = image[start_h:end_h, start_w:end_w] - label = label[start_h:end_h, start_w:end_w] - if self.flip: - # Random flipping - if random.random() < 0.5: - image = np.fliplr(image).copy() # HWC - label = np.fliplr(label).copy() # HW - # HWC -> CHW - image = image.transpose(2, 0, 1) - return image, label - - def _load_data(self, image_id): - raise NotImplementedError() - - def _preload_data(self): - for image_id in tqdm( - self.files, desc="Preloading...", leave=False, dynamic_ncols=True - ): - image, label = self._load_data(image_id) - self.images.append(image) - self.labels.append(label) - - def __getitem__(self, index): - if self.preload: - image, label = self.images[index], self.labels[index] - else: - image_id = self.files[index] - image, label = self._load_data(image_id) - image, label = self._transform(image, label) - return image.astype(np.float32), label.astype(np.int64) - - def __len__(self): - return len(self.files) - - def __repr__(self): - fmt_str = "Dataset " + self.__class__.__name__ + "\n" - fmt_str += " Number of datapoints: {}\n".format(self.__len__()) - fmt_str += " Split: {}\n".format(self.split) - fmt_str += " Root Location: {}\n".format(self.root) - return fmt_str - - -class CocoStuff10k(_CocoStuff): +class CocoStuff10k(_BaseDataset): """COCO-Stuff 10k dataset""" - def __init__(self, version="1.1", **kwargs): - self.version = version - self.ignore_label = -1 + def __init__(self, warp_image=True, **kwargs): + self.warp_image = warp_image super(CocoStuff10k, self).__init__(**kwargs) def _set_files(self): @@ -162,32 +37,28 @@ def _set_files(self): else: raise ValueError("Invalid split name: {}".format(self.split)) - def _load_data(self, image_id): + def _load_data(self, index): # Set paths + image_id = self.files[index] image_path = osp.join(self.root, "images", image_id + ".jpg") label_path = osp.join(self.root, "annotations", image_id + ".mat") - # Load an image + # Load an image and label image = cv2.imread(image_path, cv2.IMREAD_COLOR).astype(np.float32) - # Load a label map - if self.version == "1.1": - label = 
sio.loadmat(label_path)["S"].astype(np.int64) - label -= 1 # unlabeled (0 -> -1) - elif self.version == "1.0": - label = np.array(h5py.File(label_path, "r")["S"], dtype=np.int64) - label = label.transpose(1, 0) - label -= 2 # unlabeled (1 -> -1) - else: - raise NotImplementedError( - "1.0 or 1.1 expected, but got: {}".format(self.version) - ) - return image, label - - -class CocoStuff164k(_CocoStuff): + label = sio.loadmat(label_path)["S"] + label -= 1 # unlabeled (0 -> -1) + label[label == -1] = 255 + # Warping: this is just for reproducing the official scores on GitHub + if self.warp_image: + image = cv2.resize(image, (513, 513), interpolation=cv2.INTER_LINEAR) + label = Image.fromarray(label).resize((513, 513), resample=Image.NEAREST) + label = np.asarray(label) + return image_id, image, label + + +class CocoStuff164k(_BaseDataset): """COCO-Stuff 164k dataset""" def __init__(self, **kwargs): - self.ignore_label = 255 super(CocoStuff164k, self).__init__(**kwargs) def _set_files(self): @@ -202,14 +73,15 @@ def _set_files(self): else: raise ValueError("Invalid split name: {}".format(self.split)) - def _load_data(self, image_id): + def _load_data(self, index): # Set paths + image_id = self.files[index] image_path = osp.join(self.root, "images", self.split, image_id + ".jpg") label_path = osp.join(self.root, "annotations", self.split, image_id + ".png") - # Load an image + # Load an image and label image = cv2.imread(image_path, cv2.IMREAD_COLOR).astype(np.float32) - label = cv2.imread(label_path, cv2.IMREAD_GRAYSCALE).astype(np.int64) - return image, label + label = cv2.imread(label_path, cv2.IMREAD_GRAYSCALE) + return image_id, image, label def get_parent_class(value, dictionary): @@ -231,30 +103,32 @@ def get_parent_class(value, dictionary): import matplotlib.pyplot as plt import matplotlib.cm as cm import torchvision - from torchvision.utils import make_grid import yaml + from torchvision.utils import make_grid + from tqdm import tqdm kwargs = {"nrow": 10, "padding": 50} batch_size = 100 - dataset_root = "/media/kazuto1011/Extra/cocostuff/cocostuff-164k" - dataset = CocoStuff164k(root=dataset_root, split="train2017") - + dataset = CocoStuff164k( + root="/media/kazuto1011/Extra/cocostuff/cocostuff-164k", + split="train2017", + ignore_label=255, + mean_bgr=(104.008, 116.669, 122.675), + augment=True, + crop_size=321, + scales=(0.5, 0.75, 1.0, 1.25, 1.5), + flip=True, + ) print(dataset) loader = data.DataLoader(dataset, batch_size=batch_size) - for i, (images, labels) in tqdm( + for i, (image_ids, images, labels) in tqdm( enumerate(loader), total=np.ceil(len(dataset) / batch_size), leave=False ): - if i == 0: - mean = ( - torch.tensor((104.008, 116.669, 122.675)) - .unsqueeze(0) - .unsqueeze(2) - .unsqueeze(3) - ) + mean = torch.tensor((104.008, 116.669, 122.675))[None, :, None, None] images += mean.expand_as(images) image = make_grid(images, pad_value=-1, **kwargs).numpy() image = np.transpose(image, (1, 2, 0)) @@ -271,7 +145,7 @@ def get_parent_class(value, dictionary): label = label.astype(np.uint8) tiled_images = np.hstack((image, label)) - # cv2.imwrite("./docs/data.png", tiled_images) + # cv2.imwrite("./docs/datasets/cocostuff.png", tiled_images) plt.imshow(np.dstack((tiled_images[..., 2::-1], tiled_images[..., 3]))) plt.show() break diff --git a/libs/datasets/voc.py b/libs/datasets/voc.py new file mode 100644 index 0000000..80b03f4 --- /dev/null +++ b/libs/datasets/voc.py @@ -0,0 +1,140 @@ +#!/usr/bin/env python +# coding: utf-8 +# +# Author: Kazuto Nakashima +# URL: 
https://kazuto1011.github.io
+# Date: 08 February 2019
+
+from __future__ import absolute_import, print_function
+
+import os.path as osp
+
+import cv2
+import numpy as np
+import torch
+from PIL import Image
+from torch.utils import data
+
+from .base import _BaseDataset
+
+
+class VOC(_BaseDataset):
+    """
+    PASCAL VOC Segmentation dataset
+    """
+
+    def __init__(self, year=2012, **kwargs):
+        self.year = year
+        super(VOC, self).__init__(**kwargs)
+
+    def _set_files(self):
+        self.root = osp.join(self.root, "VOC{}".format(self.year))
+        self.image_dir = osp.join(self.root, "JPEGImages")
+        self.label_dir = osp.join(self.root, "SegmentationClass")
+
+        if self.split in ["train", "trainval", "val", "test"]:
+            file_list = osp.join(
+                self.root, "ImageSets/Segmentation", self.split + ".txt"
+            )
+            file_list = tuple(open(file_list, "r"))
+            file_list = [id_.rstrip() for id_ in file_list]
+            self.files = file_list
+        else:
+            raise ValueError("Invalid split name: {}".format(self.split))
+
+    def _load_data(self, index):
+        # Set paths; _BaseDataset.__getitem__ passes an index, so resolve the
+        # image id first (cf. VOCAug below). Note that image_dir/label_dir
+        # already include self.root.
+        image_id = self.files[index]
+        image_path = osp.join(self.image_dir, image_id + ".jpg")
+        label_path = osp.join(self.label_dir, image_id + ".png")
+        # Load an image and label
+        image = cv2.imread(image_path, cv2.IMREAD_COLOR).astype(np.float32)
+        label = np.asarray(Image.open(label_path), dtype=np.int32)
+        return image_id, image, label
+
+
+class VOCAug(_BaseDataset):
+    """
+    PASCAL VOC Segmentation dataset with extra annotations
+    """
+
+    def __init__(self, year=2012, **kwargs):
+        self.year = year
+        super(VOCAug, self).__init__(**kwargs)
+
+    def _set_files(self):
+        self.root = osp.join(self.root, "VOC{}".format(self.year))
+
+        if self.split in ["train", "train_aug", "trainval", "trainval_aug", "val"]:
+            file_list = osp.join(
+                self.root, "ImageSets/SegmentationAug", self.split + ".txt"
+            )
+            file_list = tuple(open(file_list, "r"))
+            file_list = [id_.rstrip().split(" ") for id_ in file_list]
+            self.files, self.labels = list(zip(*file_list))
+        else:
+            raise ValueError("Invalid split name: {}".format(self.split))
+
+    def _load_data(self, index):
+        # Set paths
+        image_id = self.files[index].split("/")[-1].split(".")[0]
+        image_path = osp.join(self.root, self.files[index][1:])
+        label_path = osp.join(self.root, self.labels[index][1:])
+        # Load an image and label
+        image = cv2.imread(image_path, cv2.IMREAD_COLOR).astype(np.float32)
+        label = np.asarray(Image.open(label_path), dtype=np.int32)
+        return image_id, image, label
+
+
+if __name__ == "__main__":
+    import matplotlib
+    import matplotlib.pyplot as plt
+    import matplotlib.cm as cm
+    import torchvision
+    import yaml
+    from torchvision.utils import make_grid
+    from tqdm import tqdm
+
+    kwargs = {"nrow": 10, "padding": 50}
+    batch_size = 100
+
+    dataset = VOCAug(
+        root="/media/kazuto1011/Extra/VOCdevkit",
+        split="train_aug",
+        ignore_label=255,
+        mean_bgr=(104.008, 116.669, 122.675),
+        year=2012,
+        augment=True,
+        base_size=None,
+        crop_size=513,
+        scales=(0.5, 0.75, 1.0, 1.25, 1.5),
+        flip=True,
+    )
+    print(dataset)
+
+    loader = data.DataLoader(dataset, batch_size=batch_size)
+
+    for i, (image_ids, images, labels) in tqdm(
+        enumerate(loader), total=np.ceil(len(dataset) / batch_size), leave=False
+    ):
+        if i == 0:
+            mean = torch.tensor((104.008, 116.669, 122.675))[None, :, None, None]
+            images += mean.expand_as(images)
+            image = make_grid(images, pad_value=-1, **kwargs).numpy()
+            image = np.transpose(image, (1, 2, 0))
+            mask = np.zeros(image.shape[:2])
+            mask[(image != -1)[..., 0]] = 255
+            image = np.dstack((image, mask)).astype(np.uint8)
+
+            labels 
= labels[:, np.newaxis, ...] + label = make_grid(labels, pad_value=255, **kwargs).numpy() + label_ = np.transpose(label, (1, 2, 0))[..., 0].astype(np.float32) + label = cm.jet_r(label_ / 21.0) * 255 + mask = np.zeros(label.shape[:2]) + label[..., 3][(label_ == 255)] = 0 + label = label.astype(np.uint8) + + tiled_images = np.hstack((image, label)) + # cv2.imwrite("./docs/datasets/voc12.png", tiled_images) + plt.imshow(np.dstack((tiled_images[..., 2::-1], tiled_images[..., 3]))) + plt.show() + break diff --git a/libs/models/__init__.py b/libs/models/__init__.py index 62a0640..1e67048 100644 --- a/libs/models/__init__.py +++ b/libs/models/__init__.py @@ -1,25 +1,33 @@ from __future__ import absolute_import from .resnet import * +from .deeplabv1 import * from .deeplabv2 import * from .deeplabv3 import * from .deeplabv3plus import * from .msc import * -def init_weights(model): - for m in model.modules(): - if isinstance(m, nn.Conv2d): - nn.init.kaiming_normal_(m.weight) - if m.bias is not None: - nn.init.constant_(m.bias, 0) - elif isinstance(m, nn.Linear): - nn.init.kaiming_normal_(m.weight) - if m.bias is not None: - nn.init.constant_(m.bias, 0) - elif isinstance(m, nn.BatchNorm2d): - nn.init.constant_(m.weight, 1) - if m.bias is not None: - nn.init.constant_(m.bias, 0) +def init_weights(module): + if isinstance(module, nn.Conv2d): + nn.init.kaiming_normal_(module.weight) + if module.bias is not None: + nn.init.constant_(module.bias, 0) + elif isinstance(module, nn.Linear): + nn.init.kaiming_normal_(module.weight) + if module.bias is not None: + nn.init.constant_(module.bias, 0) + elif isinstance(module, nn.BatchNorm2d): + nn.init.constant_(module.weight, 1) + if module.bias is not None: + nn.init.constant_(module.bias, 0) + + +def ResNet101(n_classes): + return ResNet(n_classes=n_classes, n_blocks=[3, 4, 23, 3]) + + +def DeepLabV1_ResNet101(n_classes): + return DeepLabV1(n_classes=n_classes, n_blocks=[3, 4, 23, 3]) def DeepLabV2_ResNet101_MSC(n_classes): @@ -40,7 +48,7 @@ def DeepLabV2S_ResNet101_MSC(n_classes): ) -def DeepLabV3_ResNet101_MSC(n_classes, output_stride): +def DeepLabV3_ResNet101_MSC(n_classes, output_stride=16): if output_stride == 16: atrous_rates = [6, 12, 18] elif output_stride == 8: @@ -48,19 +56,22 @@ def DeepLabV3_ResNet101_MSC(n_classes, output_stride): else: NotImplementedError - return MSC( - base=DeepLabV3( - n_classes=n_classes, - n_blocks=[3, 4, 23, 3], - atrous_rates=atrous_rates, - multi_grids=[1, 2, 4], - output_stride=output_stride, - ), - scales=[0.5, 0.75], + base = DeepLabV3( + n_classes=n_classes, + n_blocks=[3, 4, 23, 3], + atrous_rates=atrous_rates, + multi_grids=[1, 2, 4], + output_stride=output_stride, ) + for name, module in base.named_modules(): + if ".bn" in name: + module.momentum = 0.9997 -def DeepLabV3Plus_ResNet101_MSC(n_classes, output_stride): + return MSC(base=base, scales=[0.5, 0.75]) + + +def DeepLabV3Plus_ResNet101_MSC(n_classes, output_stride=16): if output_stride == 16: atrous_rates = [6, 12, 18] elif output_stride == 8: @@ -68,13 +79,16 @@ def DeepLabV3Plus_ResNet101_MSC(n_classes, output_stride): else: NotImplementedError - return MSC( - base=DeepLabV3Plus( - n_classes=n_classes, - n_blocks=[3, 4, 23, 3], - atrous_rates=atrous_rates, - multi_grids=[1, 2, 4], - output_stride=output_stride, - ), - scales=[0.5, 0.75], + base = DeepLabV3Plus( + n_classes=n_classes, + n_blocks=[3, 4, 23, 3], + atrous_rates=atrous_rates, + multi_grids=[1, 2, 4], + output_stride=output_stride, ) + + for name, module in base.named_modules(): + if ".bn" in name: 
+ module.momentum = 0.9997 + + return MSC(base=base, scales=[0.5, 0.75]) diff --git a/libs/models/deeplabv1.py b/libs/models/deeplabv1.py new file mode 100644 index 0000000..ac5e673 --- /dev/null +++ b/libs/models/deeplabv1.py @@ -0,0 +1,40 @@ +#!/usr/bin/env python +# coding: utf-8 +# +# Author: Kazuto Nakashima +# URL: https://kazuto1011.github.io +# Date: 19 February 2019 + +from __future__ import absolute_import, print_function + +import torch +import torch.nn as nn +import torch.nn.functional as F + +from .resnet import _ResLayer, _Stem + + +class DeepLabV1(nn.Sequential): + """ + DeepLab v1: Dilated ResNet + 1x1 Conv + Note that this is just a container for loading the pretrained COCO model and not mentioned as "v1" in papers. + """ + + def __init__(self, n_classes, n_blocks): + super(DeepLabV1, self).__init__() + self.add_module("layer1", _Stem()) + self.add_module("layer2", _ResLayer(n_blocks[0], 64, 64, 256, 1, 1)) + self.add_module("layer3", _ResLayer(n_blocks[1], 256, 128, 512, 2, 1)) + self.add_module("layer4", _ResLayer(n_blocks[2], 512, 256, 1024, 1, 2)) + self.add_module("layer5", _ResLayer(n_blocks[3], 1024, 512, 2048, 1, 4)) + self.add_module("fc", nn.Conv2d(2048, n_classes, 1)) + + +if __name__ == "__main__": + model = DeepLabV1(n_classes=21, n_blocks=[3, 4, 23, 3]) + model.eval() + image = torch.randn(1, 3, 513, 513) + + print(model) + print("input:", image.shape) + print("output:", model(image).shape) diff --git a/libs/models/deeplabv2.py b/libs/models/deeplabv2.py index bc8fc82..c2e3729 100644 --- a/libs/models/deeplabv2.py +++ b/libs/models/deeplabv2.py @@ -5,7 +5,7 @@ # URL: http://kazuto1011.github.io # Created: 2017-11-19 -from collections import OrderedDict +from __future__ import absolute_import, print_function import torch import torch.nn as nn @@ -15,11 +15,13 @@ class _ASPP(nn.Module): - """Atrous Spatial Pyramid Pooling""" + """ + Atrous spatial pyramid pooling (ASPP) + """ def __init__(self, in_ch, out_ch, rates): super(_ASPP, self).__init__() - for i, rate in enumerate(zip(rates)): + for i, rate in enumerate(rates): self.add_module( "c{}".format(i), nn.Conv2d(in_ch, out_ch, 3, 1, padding=rate, dilation=rate, bias=True), @@ -34,7 +36,10 @@ def forward(self, x): class DeepLabV2(nn.Sequential): - """DeepLab v2 (OS=8)""" + """ + DeepLab v2: Dilated ResNet + ASPP + Output stride is fixed at 8 + """ def __init__(self, n_classes, n_blocks, atrous_rates): super(DeepLabV2, self).__init__() diff --git a/libs/models/deeplabv3.py b/libs/models/deeplabv3.py index 3c2575d..2d1f821 100644 --- a/libs/models/deeplabv3.py +++ b/libs/models/deeplabv3.py @@ -5,6 +5,8 @@ # URL: http://kazuto1011.github.io # Created: 2018-03-26 +from __future__ import absolute_import, print_function + from collections import OrderedDict import torch @@ -15,13 +17,15 @@ class _ASPP(nn.Module): - """Atrous Spatial Pyramid Pooling with image pool""" + """ + Atrous spatial pyramid pooling with image-level feature + """ def __init__(self, n_in, n_out, rates): super(_ASPP, self).__init__() self.stages = nn.Module() self.stages.add_module("c0", _ConvBnReLU(n_in, n_out, 1, 1, 0, 1)) - for i, rate in enumerate(zip(rates)): + for i, rate in enumerate(rates): self.stages.add_module( "c{}".format(i + 1), _ConvBnReLU(n_in, n_out, 3, 1, padding=rate, dilation=rate), @@ -37,7 +41,7 @@ def __init__(self, n_in, n_out, rates): def forward(self, x): h = self.imagepool(x) - h = [F.interpolate(h, size=x.shape[2:], mode="bilinear", align_corners=True)] + h = [F.interpolate(h, size=x.shape[2:], mode="bilinear", 
align_corners=False)] for stage in self.stages.children(): h += [stage(x)] h = torch.cat(h, dim=1) @@ -45,7 +49,9 @@ def forward(self, x): class DeepLabV3(nn.Sequential): - """DeepLab v3""" + """ + DeepLab v3: Dilated ResNet with multi-grid + improved ASPP + """ def __init__(self, n_classes, n_blocks, atrous_rates, multi_grids, output_stride): super(DeepLabV3, self).__init__() diff --git a/libs/models/deeplabv3plus.py b/libs/models/deeplabv3plus.py index 592c837..71f095a 100644 --- a/libs/models/deeplabv3plus.py +++ b/libs/models/deeplabv3plus.py @@ -5,6 +5,8 @@ # URL: http://kazuto1011.github.io # Created: 2018-03-26 +from __future__ import absolute_import, print_function + from collections import OrderedDict import torch @@ -16,7 +18,9 @@ class DeepLabV3Plus(nn.Module): - """DeepLab v3+""" + """ + DeepLab v3+: Dilated ResNet with multi-grid + improved ASPP + decoder + """ def __init__(self, n_classes, n_blocks, atrous_rates, multi_grids, output_stride): super(DeepLabV3Plus, self).__init__() @@ -60,10 +64,10 @@ def forward(self, x): h = self.layer5(h) h = self.aspp(h) h = self.fc1(h) - h = F.interpolate(h, size=h_.shape[2:], mode="bilinear", align_corners=True) + h = F.interpolate(h, size=h_.shape[2:], mode="bilinear", align_corners=False) h = torch.cat((h, h_), dim=1) h = self.fc2(h) - h = F.interpolate(h, size=x.shape[2:], mode="bilinear", align_corners=True) + h = F.interpolate(h, size=x.shape[2:], mode="bilinear", align_corners=False) return h @@ -73,7 +77,7 @@ def forward(self, x): n_blocks=[3, 4, 23, 3], atrous_rates=[6, 12, 18], multi_grids=[1, 2, 4], - output_stride=8, + output_stride=16, ) model.eval() image = torch.randn(1, 3, 513, 513) diff --git a/libs/models/msc.py b/libs/models/msc.py index ed863aa..c51d798 100644 --- a/libs/models/msc.py +++ b/libs/models/msc.py @@ -11,25 +11,30 @@ class MSC(nn.Module): - """Multi-scale inputs""" + """ + Multi-scale inputs + """ - def __init__(self, base, scales=[0.5, 0.75]): + def __init__(self, base, scales=None): super(MSC, self).__init__() self.base = base - self.scales = scales + if scales: + self.scales = scales + else: + self.scales = [0.5, 0.75] def forward(self, x): # Original logits = self.base(x) + _, _, H, W = logits.shape interp = lambda l: F.interpolate( - l, size=logits.shape[2:], mode="bilinear", align_corners=True + l, size=(H, W), mode="bilinear", align_corners=False ) # Scaled logits_pyramid = [] for p in self.scales: - size = [int(s * p) for s in x.shape[2:]] - h = F.interpolate(x, size=size, mode="bilinear", align_corners=True) + h = F.interpolate(x, scale_factor=p, mode="bilinear", align_corners=False) logits_pyramid.append(self.base(h)) # Pixel-wise max diff --git a/libs/models/resnet.py b/libs/models/resnet.py index 62e8b63..24d5f69 100644 --- a/libs/models/resnet.py +++ b/libs/models/resnet.py @@ -5,6 +5,8 @@ # URL: http://kazuto1011.github.io # Created: 2017-11-19 +from __future__ import absolute_import, print_function + from collections import OrderedDict import torch @@ -20,6 +22,10 @@ class _ConvBnReLU(nn.Sequential): + """ + Cascade of 2D convolution, batch norm, and ReLU. + """ + BATCH_NORM = _BATCH_NORM def __init__( @@ -39,7 +45,9 @@ def __init__( class _Bottleneck(nn.Module): - """Bottleneck Unit""" + """ + Bottleneck block of MSRA ResNet. 
+ """ def __init__(self, in_ch, mid_ch, out_ch, stride, dilation, downsample): super(_Bottleneck, self).__init__() @@ -61,7 +69,9 @@ def forward(self, x): class _ResLayer(nn.Sequential): - """Residual blocks""" + """ + Residual layer with multi grids + """ def __init__( self, n_layers, in_ch, mid_ch, out_ch, stride, dilation, multi_grids=None @@ -88,10 +98,43 @@ def __init__( class _Stem(nn.Sequential): """ - The 1st Residual Layer + The 1st Residual Layer. + Note that the max pooling is different from both MSRA and FAIR ResNet. """ def __init__(self): super(_Stem, self).__init__() self.add_module("conv1", _ConvBnReLU(3, 64, 7, 2, 3, 1)) self.add_module("pool", nn.MaxPool2d(3, 2, 1, ceil_mode=True)) + + +class ResNet(nn.Module): + def __init__(self, n_classes, n_blocks): + super(ResNet, self).__init__() + self.add_module("layer1", _Stem()) + self.add_module("layer2", _ResLayer(n_blocks[0], 64, 64, 256, 1, 1)) + self.add_module("layer3", _ResLayer(n_blocks[1], 256, 128, 512, 2, 1)) + self.add_module("layer4", _ResLayer(n_blocks[2], 512, 256, 1024, 2, 1)) + self.add_module("layer5", _ResLayer(n_blocks[3], 1024, 512, 2048, 2, 1)) + self.add_module("pool5", nn.AdaptiveAvgPool2d(1)) + self.add_module("fc", nn.Linear(2048, n_classes)) + + def forward(self, x): + h = self.layer1(x) + h = self.layer2(h) + h = self.layer3(h) + h = self.layer4(h) + h = self.layer5(h) + h = self.pool5(h) + h = self.fc(h.view(h.size(0), -1)) + return h + + +if __name__ == "__main__": + model = ResNet(n_classes=1000, n_blocks=[3, 4, 23, 3]) + model.eval() + image = torch.randn(1, 3, 224, 224) + + print(model) + print("input:", image.shape) + print("output:", model(image).shape) diff --git a/main.py b/main.py index a15d112..709b076 100644 --- a/main.py +++ b/main.py @@ -5,11 +5,11 @@ # URL: https://kazuto1011.github.io # Date: 07 January 2019 - from __future__ import absolute_import, division, print_function import json -import os.path as osp +import multiprocessing +import os import click import joblib @@ -19,39 +19,33 @@ import torch.nn.functional as F import yaml from addict import Dict +from PIL import Image from tensorboardX import SummaryWriter from torchnet.meter import MovingAverageValueMeter from tqdm import tqdm from libs.datasets import get_dataset -from libs.models import DeepLabV2_ResNet101_MSC +from libs.models import * from libs.utils import DenseCRF, PolynomialLR, scores +def makedirs(dirs): + if not os.path.exists(dirs): + os.makedirs(dirs) + + def get_device(cuda): cuda = cuda and torch.cuda.is_available() device = torch.device("cuda" if cuda else "cpu") if cuda: - current_device = torch.cuda.current_device() - print("Device:", torch.cuda.get_device_name(current_device)) + print("Device:") + for i in range(torch.cuda.device_count()): + print(" {}:".format(i), torch.cuda.get_device_name(i)) else: print("Device: CPU") return device -def setup_model(model_path, n_classes, train=True): - model = DeepLabV2_ResNet101_MSC(n_classes=n_classes) - state_dict = torch.load(model_path, map_location=lambda storage, loc: storage) - if train: - model.load_state_dict(state_dict, strict=False) # to skip ASPP - model = nn.DataParallel(model) - else: - model.load_state_dict(state_dict) - model = nn.DataParallel(model) - model.eval() - return model - - def get_params(model, key): # For Dilated FCN if key == "1x": @@ -74,41 +68,65 @@ def get_params(model, key): yield m[1].bias -def resize_labels(labels, shape): - labels = labels.unsqueeze(1).float() # Add channel axis - labels = F.interpolate(labels, shape, mode="nearest") 
- labels = labels.squeeze(1).long() - return labels +def resize_labels(labels, size): + """ + Downsample labels for 0.5x and 0.75x logits by nearest interpolation. + Other nearest methods result in misaligned labels. + -> F.interpolate(labels, shape, mode='nearest') + -> cv2.resize(labels, shape, interpolation=cv2.INTER_NEAREST) + """ + new_labels = [] + for label in labels: + label = label.float().numpy() + label = Image.fromarray(label).resize(size, resample=Image.NEAREST) + new_labels.append(np.asarray(label)) + new_labels = torch.LongTensor(new_labels) + return new_labels @click.group() @click.pass_context def main(ctx): + """ + Training and evaluation + """ print("Mode:", ctx.invoked_subcommand) @main.command() -@click.option("-c", "--config", type=str, required=True, help="yaml") -@click.option("--cuda/--no-cuda", default=True, help="Switch GPU/CPU") -def train(config, cuda): - # Auto-tune cuDNN - torch.backends.cudnn.benchmark = True +@click.option( + "-c", + "--config-path", + type=click.File(), + required=True, + help="Dataset configuration file in YAML", +) +@click.option( + "--cuda/--cpu", default=True, help="Enable CUDA if available [default: --cuda]" +) +def train(config_path, cuda): + """ + Training DeepLab by v2 protocol + """ # Configuration + CONFIG = Dict(yaml.load(config_path)) device = get_device(cuda) - CONFIG = Dict(yaml.load(open(config))) + torch.backends.cudnn.benchmark = True - # Dataset 10k or 164k + # Dataset dataset = get_dataset(CONFIG.DATASET.NAME)( root=CONFIG.DATASET.ROOT, split=CONFIG.DATASET.SPLIT.TRAIN, - base_size=CONFIG.IMAGE.SIZE.TRAIN.BASE, - crop_size=CONFIG.IMAGE.SIZE.TRAIN.CROP, - mean=(CONFIG.IMAGE.MEAN.B, CONFIG.IMAGE.MEAN.G, CONFIG.IMAGE.MEAN.R), - warp=CONFIG.DATASET.WARP_IMAGE, - scale=CONFIG.DATASET.SCALES, + ignore_label=CONFIG.DATASET.IGNORE_LABEL, + mean_bgr=(CONFIG.IMAGE.MEAN.B, CONFIG.IMAGE.MEAN.G, CONFIG.IMAGE.MEAN.R), + augment=True, + base_size=CONFIG.IMAGE.SIZE.BASE, + crop_size=CONFIG.IMAGE.SIZE.TRAIN, + scales=CONFIG.DATASET.SCALES, flip=True, ) + print(dataset) # DataLoader loader = torch.utils.data.DataLoader( @@ -119,10 +137,27 @@ def train(config, cuda): ) loader_iter = iter(loader) - # Model - model = setup_model(CONFIG.MODEL.INIT_MODEL, CONFIG.DATASET.N_CLASSES, train=True) + # Model check + print("Model:", CONFIG.MODEL.NAME) + assert ( + CONFIG.MODEL.NAME == "DeepLabV2_ResNet101_MSC" + ), 'Currently support only "DeepLabV2_ResNet101_MSC"' + + # Model setup + model = eval(CONFIG.MODEL.NAME)(n_classes=CONFIG.DATASET.N_CLASSES) + state_dict = torch.load(CONFIG.MODEL.INIT_MODEL) + print(" Init:", CONFIG.MODEL.INIT_MODEL) + for m in model.base.state_dict().keys(): + if m not in state_dict.keys(): + print(" Skip init:", m) + model.base.load_state_dict(state_dict, strict=False) # to skip ASPP + model = nn.DataParallel(model) model.to(device) + # Loss definition + criterion = nn.CrossEntropyLoss(ignore_index=CONFIG.DATASET.IGNORE_LABEL) + criterion.to(device) + # Optimizer optimizer = torch.optim.SGD( # cf lr_mult and decay_mult in train.prototxt @@ -154,14 +189,21 @@ def train(config, cuda): power=CONFIG.SOLVER.POLY_POWER, ) - # Loss definition - criterion = nn.CrossEntropyLoss(ignore_index=CONFIG.DATASET.IGNORE_LABEL) - criterion.to(device) - - # TensorBoard logger - writer = SummaryWriter(CONFIG.SOLVER.LOG_DIR) + # Setup loss logger + writer = SummaryWriter(os.path.join(CONFIG.EXP.OUTPUT_DIR, "logs", CONFIG.EXP.ID)) average_loss = MovingAverageValueMeter(CONFIG.SOLVER.AVERAGE_LOSS) + # Path to save models + checkpoint_dir = 
os.path.join( + CONFIG.EXP.OUTPUT_DIR, + "models", + CONFIG.EXP.ID, + CONFIG.MODEL.NAME.lower(), + CONFIG.DATASET.SPLIT.TRAIN, + ) + makedirs(checkpoint_dir) + print("Checkpoint dst:", checkpoint_dir) + # Freeze the batch norm pre-trained on COCO model.train() model.module.base.freeze_bn() @@ -169,7 +211,6 @@ def train(config, cuda): for iteration in tqdm( range(1, CONFIG.SOLVER.ITER_MAX + 1), total=CONFIG.SOLVER.ITER_MAX, - leave=False, dynamic_ncols=True, ): @@ -179,26 +220,23 @@ def train(config, cuda): loss = 0 for _ in range(CONFIG.SOLVER.ITER_SIZE): try: - images, labels = next(loader_iter) + _, images, labels = next(loader_iter) except: loader_iter = iter(loader) - images, labels = next(loader_iter) - - images = images.to(device) - labels = labels.to(device) + _, images, labels = next(loader_iter) # Propagate forward - logits = model(images) + logits = model(images.to(device)) # Loss iter_loss = 0 for logit in logits: # Resize labels for {100%, 75%, 50%, Max} logits _, _, H, W = logit.shape - labels_ = resize_labels(labels, shape=(H, W)) - iter_loss += criterion(logit, labels_) + labels_ = resize_labels(labels, size=(H, W)) + iter_loss += criterion(logit, labels_.to(device)) - # Backpropagate (just compute gradients wrt the loss) + # Propagate backward (just compute gradients wrt the loss) iter_loss /= CONFIG.SOLVER.ITER_SIZE iter_loss.backward() @@ -216,9 +254,16 @@ def train(config, cuda): if iteration % CONFIG.SOLVER.ITER_TB == 0: writer.add_scalar("loss/train", average_loss.value()[0], iteration) for i, o in enumerate(optimizer.param_groups): - writer.add_scalar("lr/group{}".format(i), o["lr"], iteration) - if False: # This produces a large log file - for name, param in model.named_parameters(): + writer.add_scalar("lr/group_{}".format(i), o["lr"], iteration) + for i in range(torch.cuda.device_count()): + writer.add_scalar( + "gpu/device_{}/memory_cached".format(i), + torch.cuda.memory_cached(i) / 1024 ** 3, + iteration, + ) + + if False: + for name, param in model.module.base.named_parameters(): name = name.replace(".", "/") # Weight/gradient distribution writer.add_histogram(name, param, iteration, bins="auto") @@ -231,50 +276,51 @@ def train(config, cuda): if iteration % CONFIG.SOLVER.ITER_SAVE == 0: torch.save( model.module.state_dict(), - osp.join(CONFIG.MODEL.SAVE_DIR, "checkpoint_{}.pth".format(iteration)), + os.path.join(checkpoint_dir, "checkpoint_{}.pth".format(iteration)), ) - # To verify progress separately - torch.save( - model.module.state_dict(), - osp.join(CONFIG.MODEL.SAVE_DIR, "checkpoint_current.pth"), - ) - torch.save( - model.module.state_dict(), - osp.join(CONFIG.MODEL.SAVE_DIR, "checkpoint_final.pth"), + model.module.state_dict(), os.path.join(checkpoint_dir, "checkpoint_final.pth") ) @main.command() -@click.option("-c", "--config", type=str, required=True, help="yaml") -@click.option("-m", "--model-path", type=str, required=True, help="pth") -@click.option("--cuda/--no-cuda", default=True, help="Switch GPU/CPU") -@click.option("--crf", is_flag=True, help="CRF post processing") -def test(config, model_path, cuda, crf): - # Disable autograd globally - torch.set_grad_enabled(False) +@click.option( + "-c", + "--config-path", + type=click.File(), + required=True, + help="Dataset configuration file in YAML", +) +@click.option( + "-m", + "--model-path", + type=click.Path(exists=True), + required=True, + help="PyTorch model to be loaded", +) +@click.option( + "--cuda/--cpu", default=True, help="Enable CUDA if available [default: --cuda]" +) +def test(config_path, 
model_path, cuda): + """ + Evaluation on validation set + """ - # Setup + # Configuration + CONFIG = Dict(yaml.load(config_path)) device = get_device(cuda) - CONFIG = Dict(yaml.load(open(config))) - - # If the image size never change, - if CONFIG.DATASET.WARP_IMAGE: - # Auto-tune cuDNN - torch.backends.cudnn.benchmark = True + torch.set_grad_enabled(False) - # Dataset 10k or 164k + # Dataset dataset = get_dataset(CONFIG.DATASET.NAME)( root=CONFIG.DATASET.ROOT, split=CONFIG.DATASET.SPLIT.VAL, - base_size=CONFIG.IMAGE.SIZE.TEST, - crop_size=None, - mean=(CONFIG.IMAGE.MEAN.B, CONFIG.IMAGE.MEAN.G, CONFIG.IMAGE.MEAN.R), - warp=CONFIG.DATASET.WARP_IMAGE, - scale=None, - flip=False, + ignore_label=CONFIG.DATASET.IGNORE_LABEL, + mean_bgr=(CONFIG.IMAGE.MEAN.B, CONFIG.IMAGE.MEAN.G, CONFIG.IMAGE.MEAN.R), + augment=False, ) + print(dataset) # DataLoader loader = torch.utils.data.DataLoader( @@ -285,9 +331,106 @@ def test(config, model_path, cuda, crf): ) # Model - model = setup_model(model_path, CONFIG.DATASET.N_CLASSES, train=False) + model = eval(CONFIG.MODEL.NAME)(n_classes=CONFIG.DATASET.N_CLASSES) + state_dict = torch.load(model_path, map_location=lambda storage, loc: storage) + model.load_state_dict(state_dict) + model = nn.DataParallel(model) + model.eval() model.to(device) + # Path to save logits + logit_dir = os.path.join( + CONFIG.EXP.OUTPUT_DIR, + "features", + CONFIG.EXP.ID, + CONFIG.MODEL.NAME.lower(), + CONFIG.DATASET.SPLIT.VAL, + "logit", + ) + makedirs(logit_dir) + print("Logit dst:", logit_dir) + + # Path to save scores + save_dir = os.path.join( + CONFIG.EXP.OUTPUT_DIR, + "scores", + CONFIG.EXP.ID, + CONFIG.MODEL.NAME.lower(), + CONFIG.DATASET.SPLIT.VAL, + ) + makedirs(save_dir) + save_path = os.path.join(save_dir, "scores.json") + print("Score dst:", save_path) + + preds, gts = [], [] + for image_ids, images, gt_labels in tqdm( + loader, total=len(loader), dynamic_ncols=True + ): + # Image + images = images.to(device) + + # Forward propagation + logits = model(images) + + # Save on disk for CRF post-processing + for image_id, logit in zip(image_ids, logits): + filename = os.path.join(logit_dir, image_id + ".npy") + np.save(filename, logit.cpu().numpy()) + + # Pixel-wise labeling + _, H, W = gt_labels.shape + logits = F.interpolate( + logits, size=(H, W), mode="bilinear", align_corners=False + ) + probs = F.softmax(logits, dim=1) + labels = torch.argmax(probs, dim=1) + + preds += list(labels.cpu().numpy()) + gts += list(gt_labels.numpy()) + + # Pixel Accuracy, Mean Accuracy, Class IoU, Mean IoU, Freq Weighted IoU + score = scores(gts, preds, n_class=CONFIG.DATASET.N_CLASSES) + + with open(save_path, "w") as f: + json.dump(score, f, indent=4, sort_keys=True) + + +@main.command() +@click.option( + "-c", + "--config-path", + type=click.File(), + required=True, + help="Dataset configuration file in YAML", +) +@click.option( + "-j", + "--n-jobs", + type=int, + default=multiprocessing.cpu_count(), + show_default=True, + help="Number of parallel jobs", +) +def crf(config_path, n_jobs): + """ + CRF post-processing on pre-computed logits + """ + + # Configuration + CONFIG = Dict(yaml.load(config_path)) + torch.set_grad_enabled(False) + print("# jobs:", n_jobs) + + # Dataset + dataset = get_dataset(CONFIG.DATASET.NAME)( + root=CONFIG.DATASET.ROOT, + split=CONFIG.DATASET.SPLIT.VAL, + ignore_label=CONFIG.DATASET.IGNORE_LABEL, + mean_bgr=(CONFIG.IMAGE.MEAN.B, CONFIG.IMAGE.MEAN.G, CONFIG.IMAGE.MEAN.R), + augment=False, + ) + print(dataset) + # CRF post-processor postprocessor = DenseCRF( 
iter_max=CONFIG.CRF.ITER_MAX, @@ -298,37 +441,61 @@ def test(config, model_path, cuda, crf): bi_w=CONFIG.CRF.BI_W, ) - preds, gts = [], [] - for images, labels in tqdm( - loader, total=len(loader), leave=False, dynamic_ncols=True - ): - # Image - images = images.to(device) - _, H, W = labels.shape + # Path to logit files + logit_dir = os.path.join( + CONFIG.EXP.OUTPUT_DIR, + "features", + CONFIG.EXP.ID, + CONFIG.MODEL.NAME.lower(), + CONFIG.DATASET.SPLIT.VAL, + "logit", + ) + print("Logit src:", logit_dir) + if not os.path.isdir(logit_dir): + print("Logit not found, run first: python main.py test [OPTIONS]") + quit() + + # Path to save scores + save_dir = os.path.join( + CONFIG.EXP.OUTPUT_DIR, + "scores", + CONFIG.EXP.ID, + CONFIG.MODEL.NAME.lower(), + CONFIG.DATASET.SPLIT.VAL, + ) + makedirs(save_dir) + save_path = os.path.join(save_dir, "scores_crf.json") + print("Score dst:", save_path) - # Forward propagation - logits = model(images) - logits = F.interpolate(logits, size=(H, W), mode="bilinear", align_corners=True) - probs = F.softmax(logits, dim=1) - probs = probs.data.cpu().numpy() - - # Postprocessing - if crf: - # images: (B,C,H,W) -> (B,H,W,C) - images = images.data.cpu().numpy().astype(np.uint8).transpose(0, 2, 3, 1) - probs = joblib.Parallel(n_jobs=-1)( - [joblib.delayed(postprocessor)(*pair) for pair in zip(images, probs)] - ) + # Process per sample + def process(i): + image_id, image, gt_label = dataset.__getitem__(i) + + filename = os.path.join(logit_dir, image_id + ".npy") + logit = np.load(filename) - labelmaps = np.argmax(probs, axis=1) + _, H, W = image.shape + logit = torch.FloatTensor(logit)[None, ...] + logit = F.interpolate(logit, size=(H, W), mode="bilinear", align_corners=False) + prob = F.softmax(logit, dim=1)[0].numpy() + + image = image.astype(np.uint8).transpose(1, 2, 0) + prob = postprocessor(image, prob) + label = np.argmax(prob, axis=0) + + return label, gt_label + + # CRF in multi-process + results = joblib.Parallel(n_jobs=n_jobs, verbose=10, pre_dispatch="all")( + [joblib.delayed(process)(i) for i in range(len(dataset))] + ) - preds += list(labelmaps) - gts += list(labels.numpy()) + preds, gts = zip(*results) # Pixel Accuracy, Mean Accuracy, Class IoU, Mean IoU, Freq Weighted IoU score = scores(gts, preds, n_class=CONFIG.DATASET.N_CLASSES) - with open(model_path.replace(".pth", ".json"), "w") as f: + with open(save_path, "w") as f: json.dump(score, f, indent=4, sort_keys=True) diff --git a/scripts/setup_caffemodels.sh b/scripts/setup_caffemodels.sh index aed2c05..06d95d8 100755 --- a/scripts/setup_caffemodels.sh +++ b/scripts/setup_caffemodels.sh @@ -1,21 +1,18 @@ #!/bin/bash # Download released caffemodels -wget -nc http://liangchiehchen.com/projects/released/deeplab_aspp_resnet101/prototxt_and_model.zip +wget -nc -P ./data http://liangchiehchen.com/projects/released/deeplab_aspp_resnet101/prototxt_and_model.zip -unzip -n prototxt_and_model.zip +unzip -n ./data/prototxt_and_model.zip -d ./data # Move caffemodels to data directories ## MSCOCO -mv init.caffemodel data/models/deeplab_resnet101/coco_init +mv ./data/init.caffemodel ./data/models/coco/deeplabv1_resnet101/caffemodel ## PASCAL VOC 2012 -mv train_iter_20000.caffemodel data/models/deeplab_resnet101/voc12 -mv train2_iter_20000.caffemodel data/models/deeplab_resnet101/voc12 - -# Remove *.prototxt -rm *.prototxt +mv ./data/train_iter_20000.caffemodel ./data/models/voc12/deeplabv2_resnet101_msc/caffemodel +mv ./data/train2_iter_20000.caffemodel ./data/models/voc12/deeplabv2_resnet101_msc/caffemodel echo 
=============================================================================================== echo "Next, try running script below:" -echo -e "\033[32m python convert.py --dataset coco_init \033[00m" +echo -e "\033[32m python convert.py --dataset coco \033[00m" echo =============================================================================================== \ No newline at end of file diff --git a/scripts/setup_voc12.sh b/scripts/setup_voc12.sh new file mode 100755 index 0000000..401b024 --- /dev/null +++ b/scripts/setup_voc12.sh @@ -0,0 +1,9 @@ +#!/bin/bash + +DATASET_DIR=$1 + +# Download PASCAL VOC12 (2GB) +wget -nc -P $DATASET_DIR http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar + +# Extract images, annotations, etc. +tar -xvf $DATASET_DIR/VOCtrainval_11-May-2012.tar -C $DATASET_DIR \ No newline at end of file diff --git a/scripts/train_eval.sh b/scripts/train_eval.sh new file mode 100755 index 0000000..bde1b6d --- /dev/null +++ b/scripts/train_eval.sh @@ -0,0 +1,40 @@ +#!/bin/bash + +set -x + + +# 0. Choose from {voc12, cocostuff10k, cocostuff164k} +DATASET=voc12 + + +# 1. Train DeepLab v2 on ${DATASET} +python main.py train \ +-c configs/${DATASET}.yaml + +# Trained models are saved into +# data/models/${DATASET}/deeplabv2_resnet101_msc/*/checkpoint_5000.pth +# data/models/${DATASET}/deeplabv2_resnet101_msc/*/checkpoint_10000.pth +# data/models/${DATASET}/deeplabv2_resnet101_msc/*/checkpoint_15000.pth +# ... + +# Tensorboard logs are in data/logs. + + +# 2. Evaluate the model on val set +python main.py test \ +-c configs/${DATASET}.yaml \ +-m data/models/${DATASET}/deeplabv2_resnet101_msc/*/checkpoint_final.pth + +# Validation scores on 4 metrics are saved as +# data/scores/${DATASET}/deeplabv2_resnet101_msc/*/scores.json + +# Logits are saved into +# data/features/${DATASET}/deeplabv2_resnet101_msc/*/logit/... + + +# 3. Re-evaluate the model with CRF post-processing +python main.py crf \ +-c configs/${DATASET}.yaml + +# Scores with CRF on 4 metrics are saved as +# data/scores/${DATASET}/deeplabv2_resnet101_msc/*/scores_crf.json \ No newline at end of file
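
For reference, the input contract shared by every dataset class in this patch (`_BaseDataset.__getitem__`): images are loaded by OpenCV as BGR `float32`, the per-channel BGR mean is subtracted, and the array is transposed HWC → CHW before batching. A minimal standalone sketch; the helper name `preprocess` and the example path are ours, not part of the library:

```python
import cv2
import numpy as np

MEAN_BGR = np.array((104.008, 116.669, 122.675))

def preprocess(path):
    # cv2.imread returns HWC in BGR order, matching the Caffe-trained weights
    image = cv2.imread(path, cv2.IMREAD_COLOR).astype(np.float32)
    image -= MEAN_BGR                 # per-channel BGR mean subtraction
    return image.transpose(2, 0, 1)  # HWC -> CHW for torch
```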
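`resize_labels` in `main.py` goes through PIL's `Image.resize(..., resample=Image.NEAREST)` rather than `F.interpolate(mode="nearest")` or `cv2.INTER_NEAREST`, because the nearest-neighbour coordinate conventions differ and can pick different source pixels near region boundaries. A toy comparison under assumed sizes (which pixels disagree, if any, depends on the shapes and library versions):

```python
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image

label = np.arange(9, dtype=np.float32).reshape(3, 3)

# PIL: the convention resize_labels uses
via_pil = np.asarray(Image.fromarray(label).resize((2, 2), resample=Image.NEAREST))

# torch: the convention resize_labels deliberately avoids
via_torch = (
    F.interpolate(torch.from_numpy(label)[None, None], size=(2, 2), mode="nearest")
    .squeeze()
    .numpy()
)

print(via_pil)    # e.g. [[0. 2.] [6. 8.]]
print(via_torch)  # e.g. [[0. 1.] [3. 4.]] -- offset by one source pixel
```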
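The `MSC` wrapper (`libs/models/msc.py` above) runs the base network on 1.0x, 0.5x, and 0.75x inputs, upsamples all logits to the 1.0x resolution, and fuses them with a pixel-wise max; during training it also returns the per-scale logits, which is why the `train` loop sums the loss over `logits`. A shape-only sketch of the fusion step (tensor sizes are illustrative, not taken from the code):

```python
import torch
import torch.nn.functional as F

n_classes = 21
logits = torch.randn(1, n_classes, 65, 65)  # 1.0x branch
logits_pyramid = [
    torch.randn(1, n_classes, 33, 33),  # 0.5x branch
    torch.randn(1, n_classes, 49, 49),  # 0.75x branch
]

# Upsample every scaled output back to the 1.0x spatial size
interp = lambda l: F.interpolate(
    l, size=logits.shape[2:], mode="bilinear", align_corners=False
)
logits_all = [logits] + [interp(l) for l in logits_pyramid]

# Pixel-wise max over scales
logits_max = torch.max(torch.stack(logits_all), dim=0)[0]
print(logits_max.shape)  # torch.Size([1, 21, 65, 65])
```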
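Finally, the `train` command decays the learning rate with the poly policy via `PolynomialLR` from `libs/utils`: lr = base_lr * (1 - iter / max_iter) ** power, applied per parameter group so the 1x/10x/20x multipliers produced by `get_params` keep their ratios. A self-contained sketch of the schedule only; the hyperparameters below are assumptions for illustration, not values read from the configs:

```python
import torch

param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=2.5e-4, momentum=0.9, weight_decay=5e-4)
base_lrs = [g["lr"] for g in optimizer.param_groups]
max_iter, power = 20000, 0.9

def poly_lr_step(iteration):
    # Scale every group relative to its own base lr
    factor = (1.0 - iteration / max_iter) ** power
    for group, base_lr in zip(optimizer.param_groups, base_lrs):
        group["lr"] = base_lr * factor

for iteration in range(1, 11):
    # ... forward / backward / optimizer.step() would go here ...
    poly_lr_step(iteration)
    print(iteration, optimizer.param_groups[0]["lr"])
```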