Add mim download odl dataset (#10460)
hhaAndroid authored Jun 28, 2023
1 parent 5036dc5 commit b3c1165
Showing 8 changed files with 108 additions and 1 deletion.
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -1,6 +1,7 @@
include requirements/*.txt
include mmdet/VERSION
include mmdet/.mim/model-index.yml
include mmdet/.mim/dataset-index.yml
include mmdet/.mim/demo/*/*
recursive-include mmdet/.mim/configs *.py *.yml
recursive-include mmdet/.mim/tools *.sh *.py
17 changes: 17 additions & 0 deletions dataset-index.yml
@@ -0,0 +1,17 @@
voc2007:
  dataset: PASCAL_VOC2007
  download_root: data
  data_root: data
  script: tools/dataset_converters/scripts/preprocess_voc2007.sh

voc2012:
  dataset: PASCAL_VOC2012
  download_root: data
  data_root: data
  script: tools/dataset_converters/scripts/preprocess_voc2012.sh

coco2017:
  dataset: COCO_2017
  download_root: data
  data_root: data/coco
  script: tools/dataset_converters/scripts/preprocess_coco2017.sh
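
Each index entry pairs an OpenDataLab dataset name with a download directory, a final data directory, and a preprocessing script. As a minimal sketch of how a tool might consume such an entry, the snippet below mirrors the `coco2017` entry as a plain dict and builds the command that would run its script. The helper name `build_preprocess_command` is hypothetical, and the argument order is an assumption based on `DOWNLOAD_DIR=$1` and `DATA_ROOT=$2` in the scripts added by this commit; MIM's actual internals may differ.

```python
# Hypothetical mirror of the `coco2017` entry in dataset-index.yml.
COCO2017_ENTRY = {
    "dataset": "COCO_2017",
    "download_root": "data",
    "data_root": "data/coco",
    "script": "tools/dataset_converters/scripts/preprocess_coco2017.sh",
}


def build_preprocess_command(entry):
    """Return the argv list for running an entry's preprocessing script.

    Assumes the script takes the download directory and the data root as
    its first and second positional arguments.
    """
    required = ("dataset", "download_root", "data_root", "script")
    missing = [key for key in required if key not in entry]
    if missing:
        raise KeyError(f"index entry is missing keys: {missing}")
    return ["bash", entry["script"], entry["download_root"], entry["data_root"]]
```

The returned list could be passed to `subprocess.run` from the repository root; it is built as a list rather than a string so that paths containing spaces are not split by a shell.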
28 changes: 28 additions & 0 deletions docs/en/user_guides/dataset_prepare.md
@@ -280,3 +280,31 @@ data
```

The above folders include all data of ADE20K's semantic segmentation, instance segmentation, and panoptic segmentation.

### Download from OpenDataLab

By using [OpenDataLab](https://opendatalab.com/), researchers can obtain free datasets in a unified format across various fields. The platform's search function lets researchers find the dataset they are looking for quickly and easily, and the unified format makes it easier to work on tasks that span multiple datasets.

Currently, MIM supports downloading the VOC and COCO datasets from OpenDataLab with a single command. More datasets will be supported in the future. You can also download the datasets you need directly from the OpenDataLab platform and then convert them to the format required by MMDetection.

If you use MIM to download, make sure that your MIM version is greater than v0.3.8. You can update it with the following command:

```Bash
pip install -U openmim
```
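
The version requirement above can also be checked programmatically. The following is a minimal sketch, assuming the package is installed under the distribution name `openmim` and uses plain dotted numeric versions; the helper names are hypothetical, and versions with suffixes such as `rc1` are not handled.

```python
from importlib import metadata


def parse_version(text):
    """Turn a version like 'v0.3.9' or '0.3.9' into a tuple of ints.

    Only plain dotted numeric versions are handled; this is a sketch.
    """
    return tuple(int(part) for part in text.lstrip("v").split("."))


def is_newer_than(installed, minimum="0.3.8"):
    """Return True if `installed` is strictly greater than `minimum`."""
    return parse_version(installed) > parse_version(minimum)


def openmim_is_recent_enough():
    """Check the installed openmim distribution, if any."""
    try:
        return is_newer_than(metadata.version("openmim"))
    except metadata.PackageNotFoundError:
        return False
```

Comparing integer tuples rather than strings matters here: as strings, `"0.3.10"` would sort before `"0.3.8"`.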

```Bash
# install OpenDataLab CLI tools
pip install -U opendatalab
# log in to OpenDataLab (register an account first if you do not have one)
odl login

# download voc2007 and preprocess by MIM
mim download mmdet --dataset voc2007

# download voc2012 and preprocess by MIM
mim download mmdet --dataset voc2012

# download coco2017 and preprocess by MIM
mim download mmdet --dataset coco2017
```
28 changes: 28 additions & 0 deletions docs/zh_cn/user_guides/dataset_prepare.md
@@ -277,3 +277,31 @@ data
```

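The above folders include all data for ADE20K's semantic segmentation, instance segmentation, and panoptic segmentation.

### Download from OpenDataLab

[OpenDataLab](https://opendatalab.com/) provides free, open-source datasets for AI researchers. Through OpenDataLab, researchers can obtain classic datasets from various fields in a unified format. The platform's search function lets researchers find the datasets they need quickly and easily, and its unified format makes it convenient to develop tasks that span multiple datasets.

Currently, MIM supports downloading the VOC and COCO datasets from OpenDataLab with a single command, and more datasets will be supported later. You can also visit the OpenDataLab platform directly to download the datasets you need and then convert them to the format required by MMDetection.

If you use MIM to download, make sure that the version is greater than v0.3.8. You can update it with the following command: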

```Bash
pip install -U openmim
```

```Bash
# install OpenDataLab CLI tools
pip install -U opendatalab
# log in to OpenDataLab (register an account first if you do not have one)
odl login

# download voc2007 and preprocess by MIM
mim download mmdet --dataset voc2007

# download voc2012 and preprocess by MIM
mim download mmdet --dataset voc2012

# download coco2017 and preprocess by MIM
mim download mmdet --dataset coco2017
```
4 changes: 3 additions & 1 deletion setup.py
@@ -154,7 +154,9 @@ def add_mim_extension():
     else:
         return
 
-    filenames = ['tools', 'configs', 'demo', 'model-index.yml']
+    filenames = [
+        'tools', 'configs', 'demo', 'model-index.yml', 'dataset-index.yml'
+    ]
     repo_path = osp.dirname(__file__)
     mim_path = osp.join(repo_path, 'mmdet', '.mim')
     os.makedirs(mim_path, exist_ok=True)
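
The rest of `add_mim_extension` is collapsed in this view. As a rough, hedged sketch of what placing the listed entries under `mmdet/.mim` could look like in copy mode, the snippet below copies each existing file or directory into the target folder. The helper name `copy_into_mim` is hypothetical, and the sketch leaves out the symlink mode that the real function chooses in the branch above.

```python
import os
import os.path as osp
import shutil


def copy_into_mim(repo_path, mim_path, filenames):
    """Copy each existing file or directory from repo_path into mim_path.

    Returns the names that were actually copied. Entries that do not exist
    in repo_path are skipped.
    """
    os.makedirs(mim_path, exist_ok=True)
    copied = []
    for name in filenames:
        src = osp.join(repo_path, name)
        dst = osp.join(mim_path, name)
        if not osp.exists(src):
            continue
        # Remove a stale copy so the result matches the source.
        if osp.isdir(dst) and not osp.islink(dst):
            shutil.rmtree(dst)
        elif osp.lexists(dst):
            os.remove(dst)
        if osp.isdir(src):
            shutil.copytree(src, dst)
        else:
            shutil.copyfile(src, dst)
        copied.append(name)
    return copied
```

With `dataset-index.yml` added to the list, the index file would end up at `mmdet/.mim/dataset-index.yml`, which is the path the new `MANIFEST.in` line includes in the package.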
15 changes: 15 additions & 0 deletions tools/dataset_converters/scripts/preprocess_coco2017.sh
@@ -0,0 +1,15 @@
#!/usr/bin/env bash

DOWNLOAD_DIR=$1
DATA_ROOT=$2

# Extract the image archives into the data root.
unzip "$DOWNLOAD_DIR/COCO_2017/raw/Images/val2017.zip" -d "$DATA_ROOT"
unzip "$DOWNLOAD_DIR/COCO_2017/raw/Images/train2017.zip" -d "$DATA_ROOT"
unzip "$DOWNLOAD_DIR/COCO_2017/raw/Images/test2017.zip" -d "$DATA_ROOT"
unzip "$DOWNLOAD_DIR/COCO_2017/raw/Images/unlabeled2017.zip" -d "$DATA_ROOT"
# Extract the annotation archives into the data root.
unzip "$DOWNLOAD_DIR/COCO_2017/raw/Annotations/stuff_annotations_trainval2017.zip" -d "$DATA_ROOT"
unzip "$DOWNLOAD_DIR/COCO_2017/raw/Annotations/panoptic_annotations_trainval2017.zip" -d "$DATA_ROOT"
unzip "$DOWNLOAD_DIR/COCO_2017/raw/Annotations/image_info_unlabeled2017.zip" -d "$DATA_ROOT"
unzip "$DOWNLOAD_DIR/COCO_2017/raw/Annotations/image_info_test2017.zip" -d "$DATA_ROOT"
unzip "$DOWNLOAD_DIR/COCO_2017/raw/Annotations/annotations_trainval2017.zip" -d "$DATA_ROOT"
# Remove $DATA_ROOT/COCO_2017 if it exists.
rm -rf "$DATA_ROOT/COCO_2017"
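
For environments without `unzip`, the extraction step of the script above could be approximated with Python's standard library. This is only a sketch: the function name `extract_archives` is hypothetical, and it extracts every `.zip` file directly inside one directory instead of naming each archive as the script does.

```python
import zipfile
from pathlib import Path


def extract_archives(archive_dir, data_root):
    """Extract every .zip file directly inside archive_dir into data_root.

    Returns the sorted list of archive file names that were extracted.
    """
    target = Path(data_root)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(archive_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(archive.name)
    return extracted
```

Under the layout the script assumes, it would be called once for `COCO_2017/raw/Images` and once for `COCO_2017/raw/Annotations`, both with the same `data_root`.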
8 changes: 8 additions & 0 deletions tools/dataset_converters/scripts/preprocess_voc2007.sh
@@ -0,0 +1,8 @@
#!/usr/bin/env bash

DOWNLOAD_DIR=$1
DATA_ROOT=$2

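# Extract the trainval and test archives into the data root.
tar -xvf "$DOWNLOAD_DIR/PASCAL_VOC2007/raw/VOCtrainval_06-Nov-2007.tar" -C "$DATA_ROOT"
tar -xvf "$DOWNLOAD_DIR/PASCAL_VOC2007/raw/VOCtestnoimgs_06-Nov-2007.tar" -C "$DATA_ROOT"
# Remove $DATA_ROOT/PASCAL_VOC2007 after extraction.
rm -rf "$DATA_ROOT/PASCAL_VOC2007"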
8 changes: 8 additions & 0 deletions tools/dataset_converters/scripts/preprocess_voc2012.sh
@@ -0,0 +1,8 @@
#!/usr/bin/env bash

DOWNLOAD_DIR=$1
DATA_ROOT=$2

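# Extract the trainval and test archives into the data root.
tar -xvf "$DOWNLOAD_DIR/PASCAL_VOC2012/raw/VOCtrainval_11-May-2012.tar" -C "$DATA_ROOT"
tar -xvf "$DOWNLOAD_DIR/PASCAL_VOC2012/raw/VOC2012test.tar" -C "$DATA_ROOT"
# Remove $DATA_ROOT/PASCAL_VOC2012 after extraction.
rm -rf "$DATA_ROOT/PASCAL_VOC2012"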
