[Refactor] Refactor dataset_converters for restoration datasets #1690

Merged: 15 commits, Mar 15, 2023
30 changes: 30 additions & 0 deletions tools/dataset_converters/classic5/README.md
@@ -0,0 +1,30 @@
# Preparing Classic5 Dataset

<!-- [DATASET] -->

```bibtex
@article{zhang2017beyond,
title={Beyond a {Gaussian} denoiser: Residual learning of deep {CNN} for image denoising},
author={Zhang, Kai and Zuo, Wangmeng and Chen, Yunjin and Meng, Deyu and Zhang, Lei},
journal={IEEE Transactions on Image Processing},
year={2017},
volume={26},
number={7},
pages={3142-3155},
}
```

The test datasets can be downloaded from [here](https://github.com/cszn/DnCNN/tree/master/testsets).

The folder structure should look like:

```text
mmediting
├── mmedit
├── tools
├── configs
├── data
|   ├── Classic5
|   |   ├── input
|   |   ├── target
```
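
As a quick sanity check of this layout, the following minimal sketch compares the file names in `input` and `target`. It assumes that every degraded image in `input` has a ground-truth image with the same file name in `target`; that pairing convention and the helper name `find_unpaired` are assumptions for illustration and are not part of MMEditing.

```python
import os


def find_unpaired(dataset_root):
    """Return file names that appear in only one of `input` / `target`.

    Assumes paired images share the same file name (an assumption,
    not something the README states).
    """
    input_names = set(os.listdir(os.path.join(dataset_root, 'input')))
    target_names = set(os.listdir(os.path.join(dataset_root, 'target')))
    only_input = sorted(input_names - target_names)
    only_target = sorted(target_names - input_names)
    return only_input, only_target


if __name__ == '__main__':
    root = 'data/Classic5'  # path taken from the tree above
    if os.path.isdir(root):
        only_input, only_target = find_unpaired(root)
        print('only in input:', only_input)
        print('only in target:', only_target)
    else:
        print(f'{root} not found; download the dataset first.')
```

Two empty lists suggest the folders are consistently paired under the stated assumption.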
28 changes: 28 additions & 0 deletions tools/dataset_converters/classic5/README_zh-CN.md
@@ -0,0 +1,28 @@
# Preparing Classic5 Dataset

<!-- [DATASET] -->

```bibtex
@article{zhang2017beyond,
title={Beyond a {Gaussian} denoiser: Residual learning of deep {CNN} for image denoising},
author={Zhang, Kai and Zuo, Wangmeng and Chen, Yunjin and Meng, Deyu and Zhang, Lei},
journal={IEEE Transactions on Image Processing},
year={2017},
volume={26},
number={7},
pages={3142-3155},
}
```

The test datasets can be downloaded from [here](https://github.com/cszn/DnCNN/tree/master/testsets).

The folder structure should look like:

```text
mmediting
├── mmedit
├── tools
├── configs
├── data
|   ├── Classic5
```
31 changes: 31 additions & 0 deletions tools/dataset_converters/denoising/README.md
@@ -0,0 +1,31 @@
# Preparing Denoising Dataset

<!-- [DATASET] -->

```bibtex
@inproceedings{Zamir2021Restormer,
title={Restormer: Efficient Transformer for High-Resolution Image Restoration},
author={Syed Waqas Zamir and Aditya Arora and Salman Khan and Munawar Hayat and Fahad Shahbaz Khan and Ming-Hsuan Yang},
booktitle={CVPR},
year={2022}
}
```

The test datasets (Set12, BSD68, CBSD68, Kodak, McMaster, Urban100) can be downloaded from [here](https://drive.google.com/file/d/1mwMLt-niNqcQpfN_ZduG9j4k6P_ZkOl0/).

The folder structure should look like:

```text
mmediting
├── mmedit
├── tools
├── configs
├── data
|   ├── denoising_gaussian_test
|   |   ├── Set12
|   |   ├── BSD68
|   |   ├── CBSD68
|   |   ├── Kodak
|   |   ├── McMaster
|   |   ├── Urban100
```
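
After extracting the archive, a short check like the one below can report which of the expected test-set folders are still missing. This is only a sketch: the folder names are copied from the tree above, while the helper name `missing_test_sets` is hypothetical.

```python
from pathlib import Path

# Folder names taken from the tree above.
EXPECTED_SETS = ['Set12', 'BSD68', 'CBSD68', 'Kodak', 'McMaster', 'Urban100']


def missing_test_sets(root):
    """Return the expected test-set folders that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_SETS if not (root / name).is_dir()]


if __name__ == '__main__':
    missing = missing_test_sets('data/denoising_gaussian_test')
    print('missing test sets:', missing or 'none')
```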
31 changes: 31 additions & 0 deletions tools/dataset_converters/denoising/README_zh-CN.md
@@ -0,0 +1,31 @@
# Preparing Denoising Dataset

<!-- [DATASET] -->

```bibtex
@inproceedings{Zamir2021Restormer,
title={Restormer: Efficient Transformer for High-Resolution Image Restoration},
author={Syed Waqas Zamir and Aditya Arora and Salman Khan and Munawar Hayat and Fahad Shahbaz Khan and Ming-Hsuan Yang},
booktitle={CVPR},
year={2022}
}
```

The test datasets (Set12, BSD68, CBSD68, Kodak, McMaster, Urban100) can be downloaded from [here](https://drive.google.com/file/d/1mwMLt-niNqcQpfN_ZduG9j4k6P_ZkOl0/).

The folder structure should look like:

```text
mmediting
├── mmedit
├── tools
├── configs
├── data
|   ├── denoising_gaussian_test
|   |   ├── Set12
|   |   ├── BSD68
|   |   ├── CBSD68
|   |   ├── Kodak
|   |   ├── McMaster
|   |   ├── Urban100
```
42 changes: 42 additions & 0 deletions tools/dataset_converters/deraining/README.md
@@ -0,0 +1,42 @@
# Preparing Deraining Dataset

<!-- [DATASET] -->

```bibtex
@inproceedings{Zamir2021Restormer,
title={Restormer: Efficient Transformer for High-Resolution Image Restoration},
author={Syed Waqas Zamir and Aditya Arora and Salman Khan and Munawar Hayat and Fahad Shahbaz Khan and Ming-Hsuan Yang},
booktitle={CVPR},
year={2022}
}
```

The test datasets (Rain100H, Rain100L, Test100, Test1200, Test2800) can be downloaded from [here](https://drive.google.com/file/d/1P_-RAvltEoEhfT-9GrWRdpEi6NSswTs8/).

The folder structure should look like:

```text
mmediting
├── mmedit
├── tools
├── configs
├── data
|   ├── Rain100H
|   |   ├── input
|   |   ├── target
|   ├── Rain100L
|   |   ├── input
|   |   ├── target
|   ├── Test100
|   |   ├── input
|   |   ├── target
|   ├── Test1200
|   |   ├── input
|   |   ├── target
|   ├── Test2800
|   |   ├── input
|   |   ├── target
```
42 changes: 42 additions & 0 deletions tools/dataset_converters/deraining/README_zh-CN.md
@@ -0,0 +1,42 @@
# Preparing Deraining Dataset

<!-- [DATASET] -->

```bibtex
@inproceedings{Zamir2021Restormer,
title={Restormer: Efficient Transformer for High-Resolution Image Restoration},
author={Syed Waqas Zamir and Aditya Arora and Salman Khan and Munawar Hayat and Fahad Shahbaz Khan and Ming-Hsuan Yang},
booktitle={CVPR},
year={2022}
}
```

The test datasets (Rain100H, Rain100L, Test100, Test1200, Test2800) can be downloaded from [here](https://drive.google.com/file/d/1P_-RAvltEoEhfT-9GrWRdpEi6NSswTs8/).

The folder structure should look like:

```text
mmediting
├── mmedit
├── tools
├── configs
├── data
|   ├── Rain100H
|   |   ├── input
|   |   ├── target
|   ├── Rain100L
|   |   ├── input
|   |   ├── target
|   ├── Test100
|   |   ├── input
|   |   ├── target
|   ├── Test1200
|   |   ├── input
|   |   ├── target
|   ├── Test2800
|   |   ├── input
|   |   ├── target
```
18 changes: 17 additions & 1 deletion tools/dataset_converters/df2k_ost/README.md
@@ -37,7 +37,7 @@ mmediting
For faster IO, we recommend cropping the images into sub-images. We provide such a script:

```shell
- python tools/dataset_converters/super-resolution/df2k_ost/preprocess_df2k_ost_dataset.py --data-root ./data/df2k_ost
+ python tools/dataset_converters/df2k_ost/preprocess_df2k_ost_dataset.py --data-root ./data/df2k_ost
```

The generated data is stored under `df2k_ost` and the data structure is as follows, where `_sub` indicates the sub-images.
@@ -51,9 +51,25 @@ mmediting
│ ├── df2k_ost
│ │ ├── GT
│ │ ├── GT_sub
│ │ ├── meta_info_df2k_ost.txt
...
```
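
The sub-image cropping recommended above can be pictured as a sliding window over each ground-truth image. The sketch below is an assumption about how such a window is commonly laid out, since the cropping logic of `preprocess_df2k_ost_dataset.py` is not shown here. The parameter names `crop_size`, `step` and `thresh_size` mirror the script's options, but the helper names and the numeric values are hypothetical.

```python
def crop_starts(length, crop_size, step, thresh_size):
    """Return the start offsets of crop windows along one image axis."""
    if length < crop_size:
        return []
    starts = list(range(0, length - crop_size + 1, step))
    # If the strip left uncovered at the border is larger than thresh_size,
    # add one extra window that ends exactly at the border.
    if length - (starts[-1] + crop_size) > thresh_size:
        starts.append(length - crop_size)
    return starts


def crop_boxes(height, width, crop_size, step, thresh_size):
    """Return (top, left, bottom, right) boxes for all sub-images."""
    return [(y, x, y + crop_size, x + crop_size)
            for y in crop_starts(height, crop_size, step, thresh_size)
            for x in crop_starts(width, crop_size, step, thresh_size)]


if __name__ == '__main__':
    # Illustrative values only, not the script's defaults.
    boxes = crop_boxes(height=1000, width=1500, crop_size=480,
                       step=240, thresh_size=0)
    print(f'{len(boxes)} sub-images, first box: {boxes[0]}')
```

Under this scheme every sub-image has the same size, which is consistent with the fixed shape recorded for each entry in the annotation file.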

## Prepare annotation list

If you use the annotation mode for the dataset, you first need to prepare a specific `txt` file.

Each line in the annotation file contains an image name and the image shape (usually that of the ground-truth image), separated by a space.

Example of an annotation file:

```text
0001_s001.png (480,480,3)
0001_s002.png (480,480,3)
```

Note that `preprocess_df2k_ost_dataset.py` generates a default annotation file (`meta_info_df2k_ost.txt`).
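
To illustrate the format, here is a minimal sketch of a reader for such an annotation file. It is not MMEditing's own loader; the helper names `parse_anno_line` and `read_anno_file` are hypothetical. The pattern tolerates optional spaces after the commas, because the script in this pull request writes the shape as `(480, 480, 3)` while the example above shows `(480,480,3)`.

```python
import re

# File name, whitespace, then a parenthesised shape "(H,W,C)".
# Spaces after the commas are tolerated.
_LINE_RE = re.compile(r'^(\S+)\s+\((\d+),\s*(\d+),\s*(\d+)\)$')


def parse_anno_line(line):
    """Split one annotation line into (file_name, (h, w, c))."""
    match = _LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f'Malformed annotation line: {line!r}')
    name = match.group(1)
    shape = tuple(int(v) for v in match.groups()[1:])
    return name, shape


def read_anno_file(path):
    """Parse every non-empty line of an annotation file."""
    with open(path) as f:
        return [parse_anno_line(line) for line in f if line.strip()]


if __name__ == '__main__':
    print(parse_anno_line('0001_s001.png (480,480,3)'))
```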

## Prepare LMDB dataset for DF2K_OST

If you want to use LMDB datasets for faster IO speed, you can make LMDB files by:
18 changes: 17 additions & 1 deletion tools/dataset_converters/df2k_ost/README_zh-CN.md
@@ -37,7 +37,7 @@ mmediting
For faster IO, we recommend cropping the images into sub-images. We provide such a script:

```shell
- python tools/dataset_converters/super-resolution/df2k_ost/preprocess_df2k_ost_dataset.py --data-root ./data/df2k_ost
+ python tools/dataset_converters/df2k_ost/preprocess_df2k_ost_dataset.py --data-root ./data/df2k_ost
```

The generated data is stored under `df2k_ost` and the data structure is as follows, where `_sub` indicates the sub-images.
@@ -51,9 +51,25 @@ mmediting
│ ├── df2k_ost
│ │ ├── GT
│ │ ├── GT_sub
│ │ ├── meta_info_df2k_ost.txt
...
```

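## Prepare annotation list

If you want to use the `annotation mode` for the dataset, you first need to prepare an annotation file in `txt` format.

Each line in the annotation file contains an image name and the image shape (usually that of the ground-truth image), separated by a space.

Example of an annotation file: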

```text
0001_s001.png (480,480,3)
0001_s002.png (480,480,3)
```

Note that the `preprocess_df2k_ost_dataset.py` script generates an annotation file by default.

## Prepare LMDB dataset for DF2K_OST

If you want to use LMDB datasets for faster IO speed, you can make LMDB files by:
31 changes: 27 additions & 4 deletions tools/dataset_converters/df2k_ost/preprocess_df2k_ost_dataset.py
@@ -8,9 +8,23 @@
import cv2
import lmdb
import mmcv
import mmengine
import numpy as np


def generate_anno_file(args):
    """Generate annotation file for DF2K_OST datasets from the ground-truth
    folder."""

    print('Generate annotation files ...')
    txt_file = osp.join(args.data_root, args.anno_path)
    mmengine.utils.mkdir_or_exist(osp.dirname(txt_file))
    img_list = sorted(os.listdir(osp.join(args.data_root, 'GT_sub')))
    with open(txt_file, 'w') as f:
        for img in img_list:
            f.write(f'{img} ({args.crop_size}, {args.crop_size}, 3)\n')


def main_extract_subimages(args):
    """A multi-thread tool to crop large images to sub-images for faster IO.

@@ -34,8 +48,8 @@ def main_extract_subimages(args):
    opt['compression_level'] = args.compression_level

    # HR images
-    opt['input_folder'] = osp.join(args.data_root, 'df2k_ost/GT')
-    opt['save_folder'] = osp.join(args.data_root, 'df2k_ost/GT_sub')
+    opt['input_folder'] = osp.join(args.data_root, 'GT')
+    opt['save_folder'] = osp.join(args.data_root, 'GT_sub')
    opt['crop_size'] = args.crop_size
    opt['step'] = args.step
    opt['thresh_size'] = args.thresh_size
@@ -60,10 +74,10 @@ def extract_subimages(opt):
        print(f'Folder {save_folder} already exists. Exit.')
        sys.exit(1)

-    img_list = list(mmcv.scandir(input_folder, suffix='png'))
+    img_list = list(mmengine.scandir(input_folder, suffix='png'))
    img_list = [osp.join(input_folder, v) for v in img_list]

-    prog_bar = mmcv.ProgressBar(len(img_list))
+    prog_bar = mmengine.ProgressBar(len(img_list))
    pool = Pool(opt['n_thread'])
    for path in img_list:
        pool.apply_async(
@@ -305,6 +319,12 @@ def parse_args():
        description='Prepare DF2K_OST dataset',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('--data-root', help='dataset root')
    parser.add_argument(
        '--anno-path',
        nargs='?',
        default='meta_info_df2k_ost.txt',
        type=str,
        help='annotation file path')
    parser.add_argument(
        '--crop-size',
        type=int,
@@ -349,6 +369,9 @@ def parse_args():
    # extract subimages
    main_extract_subimages(args)

    # generate annotation files
    generate_anno_file(args)

    # prepare lmdb files if necessary
    if args.make_lmdb:
        make_lmdb_for_df2k_ost(args.data_root)
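
The new `--anno-path` option is declared with `nargs='?'` and a `default`. The standalone sketch below uses only the standard `argparse` module and a throwaway parser, not the script's full `parse_args`, to show what that combination yields in three cases. The file name `my_list.txt` is a hypothetical value.

```python
import argparse

# Throwaway parser that reproduces only the --anno-path declaration.
parser = argparse.ArgumentParser()
parser.add_argument(
    '--anno-path',
    nargs='?',
    default='meta_info_df2k_ost.txt',
    type=str,
    help='annotation file path')

# Flag omitted: the default is used.
absent = parser.parse_args([]).anno_path

# Flag given with a value: the value is used.
given = parser.parse_args(['--anno-path', 'my_list.txt']).anno_path

# Flag given without a value: argparse falls back to `const`,
# which is None unless it is set explicitly.
bare = parser.parse_args(['--anno-path']).anno_path

print(absent, given, bare)
```

Because a bare `--anno-path` yields `None`, joining it with the data root in `generate_anno_file` would presumably fail, so passing an explicit file name or omitting the flag is the safer usage.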