Commit

Merge pull request #92 from openvinotoolkit/develop
Release v0.1.5
Maxim Zhiltsov authored Jan 23, 2021
2 parents 7407d12 + c7e1fdf commit e2d2fa0
Showing 94 changed files with 4,073 additions and 1,013 deletions.
25 changes: 20 additions & 5 deletions CHANGELOG.md
@@ -6,21 +6,36 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).


## [Unreleased]
## 01/23/2021 - Release v0.1.5
### Added
-
- `WiderFace` dataset format (<https://github.com/openvinotoolkit/datumaro/pull/65>, <https://github.com/openvinotoolkit/datumaro/pull/90>)
- Function to transform annotations to labels (<https://github.com/openvinotoolkit/datumaro/pull/66>)
- Dataset splits for classification, detection and re-id tasks (<https://github.com/openvinotoolkit/datumaro/pull/68>, <https://github.com/openvinotoolkit/datumaro/pull/81>)
- `VGGFace2` dataset format (<https://github.com/openvinotoolkit/datumaro/pull/69>, <https://github.com/openvinotoolkit/datumaro/pull/82>)
- Unique image count statistic (<https://github.com/openvinotoolkit/datumaro/pull/87>)
- Installation with pip by name `datumaro`

### Changed
-
- `Dataset` class extended with new operations: `save`, `load`, `export`, `import_from`, `detect`, `run_model` (<https://github.com/openvinotoolkit/datumaro/pull/71>)
- Allowed importing `Extractor`-only defined formats (in `Project.import_from`, `dataset.import_from` and CLI/`project import`) (<https://github.com/openvinotoolkit/datumaro/pull/71>)
- `datum project ...` commands replaced with `datum ...` commands (<https://github.com/openvinotoolkit/datumaro/pull/84>)
- Supported more image formats in `ImageNet` extractors (<https://github.com/openvinotoolkit/datumaro/pull/85>)
- Allowed adding `Importer`-defined formats as project sources (`source add`) (<https://github.com/openvinotoolkit/datumaro/pull/86>)
- Added max search depth in `ImageDir` format and importers (<https://github.com/openvinotoolkit/datumaro/pull/86>)

### Deprecated
-
- `datum project ...` CLI context (<https://github.com/openvinotoolkit/datumaro/pull/84>)

### Removed
-

### Fixed
-
- Allow plugins inherited from `Extractor` (instead of only `SourceExtractor`) (<https://github.com/openvinotoolkit/datumaro/pull/70>)
- Windows installation with `pip` for `pycocotools` (<https://github.com/openvinotoolkit/datumaro/pull/73>)
- `YOLO` extractor path matching on Windows (<https://github.com/openvinotoolkit/datumaro/pull/73>)
- Fixed inplace file copying when saving images (<https://github.com/openvinotoolkit/datumaro/pull/76>)
- Fixed `labelmap` parameter type checking in `VOC` converter (<https://github.com/openvinotoolkit/datumaro/pull/76>)
- Fixed model copying on addition in CLI (<https://github.com/openvinotoolkit/datumaro/pull/94>)

### Security
-
55 changes: 41 additions & 14 deletions README.md
@@ -44,23 +44,23 @@ CVAT annotations ---> Publication, statistics etc.
- Convert only non-`occluded` annotations from a [CVAT](https://github.com/opencv/cvat) project to TFrecord:
```bash
# export Datumaro dataset in CVAT UI, extract somewhere, go to the project dir
datum project filter -e '/item/annotation[occluded="False"]' \
datum filter -e '/item/annotation[occluded="False"]' \
--mode items+anno --output-dir not_occluded
datum project export --project not_occluded \
datum export --project not_occluded \
--format tf_detection_api -- --save-images
```

- Annotate MS COCO dataset, extract image subset, re-annotate it in [CVAT](https://github.com/opencv/cvat), update old dataset:
```bash
# Download COCO dataset http://cocodataset.org/#download
# Put images to coco/images/ and annotations to coco/annotations/
datum project import --format coco --input-path <path/to/coco>
datum project export --filter '/image[images_I_dont_like]' --format cvat \
datum import --format coco --input-path <path/to/coco>
datum export --filter '/image[images_I_dont_like]' --format cvat \
--output-dir reannotation
# import dataset and images to CVAT, re-annotate
# export Datumaro project, extract to 'reannotation-upd'
datum project project merge reannotation-upd
datum project export --format coco
datum merge reannotation-upd
datum export --format coco
```

- Annotate instance polygons in [CVAT](https://github.com/opencv/cvat), export as masks in COCO:
@@ -72,18 +72,18 @@ CVAT annotations ---> Publication, statistics etc.
- Apply an OpenVINO detection model to some COCO-like dataset,
then compare annotations with ground truth and visualize in TensorBoard:
```bash
datum project import --format coco --input-path <path/to/coco>
datum import --format coco --input-path <path/to/coco>
# create model results interpretation script
datum model add mymodel openvino \
--weights model.bin --description model.xml \
--interpretation-script parse_results.py
datum model run --model mymodel --output-dir mymodel_inference/
datum project diff mymodel_inference/ --format tensorboard --output-dir diff
datum diff mymodel_inference/ --format tensorboard --output-dir diff
```

- Change colors in PASCAL VOC-like `.png` masks:
```bash
datum project import --format voc --input-path <path/to/voc/dataset>
datum import --format voc --input-path <path/to/voc/dataset>

# Create a color map file with desired colors:
#
@@ -93,24 +93,42 @@ CVAT annotations ---> Publication, statistics etc.
#
# Save as mycolormap.txt

datum project export --format voc_segmentation -- --label-map mycolormap.txt
datum export --format voc_segmentation -- --label-map mycolormap.txt
# add "--apply-colormap=0" to save grayscale (indexed) masks
# check "--help" option for more info
# use "datum --loglevel debug" for extra conversion info
```

- Create a custom COCO-like dataset:
```python
import numpy as np
from datumaro.components.extractor import (DatasetItem,
    Bbox, LabelCategories, AnnotationType)
from datumaro.components.dataset import Dataset

dataset = Dataset(categories={
    AnnotationType.label: LabelCategories.from_iterable(['cat', 'dog'])
})
dataset.put(DatasetItem(id=0, image=np.ones((5, 5, 3)), annotations=[
    Bbox(1, 2, 3, 4, label=0),
]))
dataset.export('test_dataset', 'coco')
```

<!--lint enable list-item-bullet-indent-->
<!--lint enable list-item-indent-->

## Features

[(Back to top)](#table-of-contents)

- Dataset reading, writing, conversion in any direction. Supported formats:
- Dataset reading, writing, conversion in any direction. [Supported formats](docs/user_manual.md#supported-formats):
- [COCO](http://cocodataset.org/#format-data) (`image_info`, `instances`, `person_keypoints`, `captions`, `labels`*)
- [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/htmldoc/index.html) (`classification`, `detection`, `segmentation`, `action_classification`, `person_layout`)
- [YOLO](https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data) (`bboxes`)
- [TF Detection API](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md) (`bboxes`, `masks`)
- [WIDER Face](http://shuoyang1213.me/WIDERFACE/) (`bboxes`)
- [VGGFace2](https://github.com/ox-vgg/vgg_face2) (`landmarks`, `bboxes`)
- [MOT sequences](https://arxiv.org/pdf/1906.04567.pdf)
- [MOTS PNG](https://www.vision.rwth-aachen.de/page/mots)
- [ImageNet](http://image-net.org/)
Expand All @@ -129,6 +147,14 @@ CVAT annotations ---> Publication, statistics etc.
- polygons to instance masks and vice versa
- apply a custom colormap for mask annotations
- rename or remove dataset labels
- Splitting a dataset into multiple subsets like `train`, `val`, and `test`:
- random split
- task-specific splits based on annotations,
which keep initial label and attribute distributions
- for classification task, based on labels
- for detection task, based on bboxes
- for re-identification task, based on labels,
avoiding having same IDs in training and test splits
- Dataset quality checking
- Simple checking for errors
- Comparison with model inference
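
The label-based classification split listed above can be illustrated with a small, library-free sketch. This is not Datumaro's implementation: the `stratified_split` helper, its arguments and the one-label-per-item input layout are assumptions made for illustration only. It shows how splitting each label group separately keeps the initial label distribution in every subset.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios, seed=0):
    """Split item indices into named subsets, keeping label proportions.

    labels: one class label per item (hypothetical input layout).
    ratios: subset name -> fraction, e.g. {'train': 0.5, 'val': 0.25, 'test': 0.25}.
    """
    rng = random.Random(seed)

    # Group item indices by their label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    names = list(ratios)
    subsets = {name: [] for name in names}
    for indices in by_label.values():
        rng.shuffle(indices)
        start = 0
        cumulative = 0.0
        for position, name in enumerate(names):
            cumulative += ratios[name]
            if position == len(names) - 1:
                # The last subset takes the remainder, so no item is lost to rounding.
                end = len(indices)
            else:
                end = round(cumulative * len(indices))
            subsets[name].extend(indices[start:end])
            start = end
    return subsets


labels = ['cat'] * 8 + ['dog'] * 4
splits = stratified_split(labels, {'train': 0.5, 'val': 0.25, 'test': 0.25})
for name, indices in splits.items():
    print(name, sorted(indices))
```

A re-identification split would additionally have to group items by identity so that the same ID never lands in both training and test subsets; that constraint is not covered by this sketch.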
@@ -162,7 +188,7 @@ python -m virtualenv venv
Install Datumaro package:

``` bash
pip install 'git+https://github.com/openvinotoolkit/datumaro'
pip install datumaro
```

## Usage
@@ -208,13 +234,14 @@ dataset = dataset.transform(project.env.transforms.get('remap_labels'),
{'cat': 'dog', # rename cat to dog
'truck': 'car', # rename truck to car
'person': '', # remove this label
}, default='delete')
}, default='delete') # remove everything else

# iterate over dataset elements
for item in dataset:
    print(item.id, item.annotations)

# export the resulting dataset in COCO format
project.env.converters.get('coco').convert(dataset, save_dir='dst/dir')
dataset.export('dst/dir', 'coco')
```
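
The mapping semantics annotated in the comments above (rename, remove via an empty string, and `default='delete'` for everything else) can be sketched without Datumaro. The standalone `remap_labels` function below is a hypothetical stand-in, not the library transform; it assumes a flat list of label names and uses `None` to mark a removed label.

```python
def remap_labels(labels, mapping, default='keep'):
    """Apply a rename/remove mapping to a list of label names.

    mapping: old name -> new name; an empty string means "remove this label".
    default: what to do with labels missing from the mapping,
             either 'keep' or 'delete' (assumed option names).
    Returns a list where removed labels are replaced by None.
    """
    if default not in ('keep', 'delete'):
        raise ValueError("default must be 'keep' or 'delete'")

    result = []
    for label in labels:
        if label in mapping:
            # An empty target name removes the label.
            result.append(mapping[label] or None)
        elif default == 'keep':
            result.append(label)
        else:
            result.append(None)
    return result


remapped = remap_labels(
    ['cat', 'truck', 'person', 'bus'],
    {'cat': 'dog', 'truck': 'car', 'person': ''},
    default='delete')
print(remapped)
```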

> Check our [developer guide](docs/developer_guide.md) for additional information.
2 changes: 1 addition & 1 deletion datumaro/cli/__init__.py
@@ -1,4 +1,4 @@

# Copyright (C) 2019-2020 Intel Corporation
# Copyright (C) 2019-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT
25 changes: 16 additions & 9 deletions datumaro/cli/__main__.py
@@ -1,5 +1,5 @@

# Copyright (C) 2019-2020 Intel Corporation
# Copyright (C) 2019-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

@@ -58,18 +58,25 @@ def make_parser():
_LogManager._define_loglevel_option(parser)

known_contexts = [
('project', contexts.project, "Actions on projects (datasets)"),
('source', contexts.source, "Actions on data sources"),
('model', contexts.model, "Actions on models"),
('project', contexts.project, "Actions with project (deprecated)"),
('source', contexts.source, "Actions with data sources"),
('model', contexts.model, "Actions with models"),
]
known_commands = [
('create', commands.create, "Create project"),
('add', commands.add, "Add source to project"),
('remove', commands.remove, "Remove source from project"),
('export', commands.export, "Export project"),
('import', commands.import_, "Create project from existing dataset"),
('add', commands.add, "Add data source to project"),
('remove', commands.remove, "Remove data source from project"),
('export', commands.export, "Export project in some format"),
('filter', commands.filter, "Filter project"),
('transform', commands.transform, "Transform project"),
('merge', commands.merge, "Merge projects"),
('convert', commands.convert, "Convert dataset into another format"),
('diff', commands.diff, "Compare projects with intersection"),
('ediff', commands.ediff, "Compare projects for equality"),
('stats', commands.stats, "Compute project statistics"),
('info', commands.info, "Print project info"),
('explain', commands.explain, "Run Explainable AI algorithm for model"),
('merge', commands.merge, "Merge datasets"),
('convert', commands.convert, "Convert dataset"),
]

# Argparse doesn't support subparser groups:
13 changes: 10 additions & 3 deletions datumaro/cli/commands/__init__.py
@@ -1,6 +1,13 @@

# Copyright (C) 2019-2020 Intel Corporation
# Copyright (C) 2019-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

from . import add, create, explain, export, remove, merge, convert
# pylint: disable=redefined-builtin

from . import (
create, add, remove, import_,
explain,
export, merge, convert, transform, filter,
diff, ediff, stats,
info
)
3 changes: 1 addition & 2 deletions datumaro/cli/commands/add.py
@@ -1,5 +1,4 @@

# Copyright (C) 2019-2020 Intel Corporation
# Copyright (C) 2020-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

49 changes: 12 additions & 37 deletions datumaro/cli/commands/convert.py
@@ -1,5 +1,4 @@

# Copyright (C) 2019-2020 Intel Corporation
# Copyright (C) 2019-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

@@ -9,6 +8,7 @@
import os.path as osp

from datumaro.components.project import Environment
from datumaro.components.dataset import Dataset

from ..contexts.project import FilterModes
from ..util import CliException, MultilineFormatter, make_file_name
@@ -63,51 +63,29 @@ def convert_command(args):
env = Environment()

try:
converter = env.converters.get(args.output_format)
converter = env.converters[args.output_format]
except KeyError:
raise CliException("Converter for format '%s' is not found" % \
args.output_format)
extra_args = converter.from_cmdline(args.extra_args)
def converter_proxy(extractor, save_dir):
return converter.convert(extractor, save_dir, **extra_args)
extra_args = converter.parse_cmdline(args.extra_args)

filter_args = FilterModes.make_filter_args(args.filter_mode)

fmt = args.input_format
if not args.input_format:
matches = []
for format_name in env.importers.items:
log.debug("Checking '%s' format...", format_name)
importer = env.make_importer(format_name)
try:
match = importer.detect(args.source)
if match:
log.debug("format matched")
matches.append((format_name, importer))
except NotImplementedError:
log.debug("Format '%s' does not support auto detection.",
format_name)

matches = env.detect_dataset(args.source)
if len(matches) == 0:
log.error("Failed to detect dataset format. "
"Try to specify format with '-if/--input-format' parameter.")
return 1
elif len(matches) != 1:
log.error("Multiple formats match the dataset: %s. "
"Try to specify format with '-if/--input-format' parameter.",
', '.join(m[0] for m in matches))
', '.join(matches))
return 2

format_name, importer = matches[0]
args.input_format = format_name
fmt = matches[0]
log.info("Source dataset format detected as '%s'", args.input_format)
else:
try:
importer = env.make_importer(args.input_format)
if hasattr(importer, 'from_cmdline'):
extra_args = importer.from_cmdline()
except KeyError:
raise CliException("Importer for format '%s' is not found" % \
args.input_format)

source = osp.abspath(args.source)

@@ -121,15 +99,12 @@ def converter_proxy(extractor, save_dir):
(osp.basename(source), make_file_name(args.output_format)))
dst_dir = osp.abspath(dst_dir)

project = importer(source)
dataset = project.make_dataset()
dataset = Dataset.import_from(source, fmt)

log.info("Exporting the dataset")
dataset.export_project(
save_dir=dst_dir,
converter=converter_proxy,
filter_expr=args.filter,
**filter_args)
if args.filter:
dataset = dataset.filter(args.filter, **filter_args)
dataset.export(format=args.output_format, save_dir=dst_dir, **extra_args)

log.info("Dataset exported to '%s' as '%s'" % \
(dst_dir, args.output_format))
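
The format auto-detection flow in this hunk (collect matching formats, fail on zero matches, fail on more than one) can be summarized in a standalone sketch. The `detect_formats` and `pick_single_format` helpers and the toy detectors are hypothetical; they do not call Datumaro's `Environment` and only mirror the control flow visible in the diff.

```python
def detect_formats(path, detectors):
    """Return the names of all formats whose detector accepts the path.

    detectors: format name -> callable(path) -> bool (hypothetical stand-ins
    for importer detection). A detector may raise NotImplementedError when
    the format does not support auto-detection; such formats are skipped.
    """
    matches = []
    for name, detect in detectors.items():
        try:
            if detect(path):
                matches.append(name)
        except NotImplementedError:
            continue
    return matches


def pick_single_format(matches):
    """Return the only matching format or raise if detection is not unique."""
    if len(matches) == 0:
        raise ValueError("Failed to detect dataset format")
    if len(matches) != 1:
        raise ValueError("Multiple formats match the dataset: %s"
            % ', '.join(matches))
    return matches[0]


def no_detection(path):
    # Models a format without auto-detection support.
    raise NotImplementedError()


# Toy detectors keyed on the file extension, for illustration only.
detectors = {
    'coco': lambda path: path.endswith('.json'),
    'voc': lambda path: path.endswith('.xml'),
    'custom': no_detection,
}

fmt = pick_single_format(detect_formats('annotations.json', detectors))
print(fmt)
```

Raising an exception instead of returning the exit codes `1` and `2` is a simplification of this sketch; the CLI code above logs an error and returns a status code.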
3 changes: 1 addition & 2 deletions datumaro/cli/commands/create.py
@@ -1,5 +1,4 @@

# Copyright (C) 2019-2020 Intel Corporation
# Copyright (C) 2019-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

7 changes: 7 additions & 0 deletions datumaro/cli/commands/diff.py
@@ -0,0 +1,7 @@
# Copyright (C) 2019-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

# pylint: disable=unused-import

from ..contexts.project import build_diff_parser as build_parser
7 changes: 7 additions & 0 deletions datumaro/cli/commands/ediff.py
@@ -0,0 +1,7 @@
# Copyright (C) 2019-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

# pylint: disable=unused-import

from ..contexts.project import build_ediff_parser as build_parser
3 changes: 1 addition & 2 deletions datumaro/cli/commands/explain.py
@@ -1,5 +1,4 @@

# Copyright (C) 2019-2020 Intel Corporation
# Copyright (C) 2019-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

3 changes: 1 addition & 2 deletions datumaro/cli/commands/export.py
@@ -1,5 +1,4 @@

# Copyright (C) 2019-2020 Intel Corporation
# Copyright (C) 2019-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

7 changes: 7 additions & 0 deletions datumaro/cli/commands/filter.py
@@ -0,0 +1,7 @@
# Copyright (C) 2020-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

# pylint: disable=unused-import

from ..contexts.project import build_filter_parser as build_parser