[Datumaro] Update documentation #2059

Merged
merged 3 commits into from
Aug 21, 2020
2 changes: 1 addition & 1 deletion datumaro/CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -129,7 +129,7 @@ Plugins reside in plugin directories:
- `<project_dir>/.datumaro/plugins` for project-specific components

A plugin is a python file or package with any name, which exports some symbols.
To export a symbol put it to `exports` list of the module like this:
To export a symbol, put it to `exports` list of the module like this:

``` python
class MyComponent1: ...
12 changes: 6 additions & 6 deletions datumaro/README.md
@@ -1,13 +1,13 @@
# Dataset Framework (Datumaro)
# Dataset Management Framework (Datumaro)

A framework to build, transform, and analyze datasets.

<!--lint disable fenced-code-flag-->
```
CVAT annotations -- ---> Annotation tool
... \ /
\ /
COCO-like dataset -----> Datumaro ---> dataset ------> Model training
... / \
/ \
VOC-like dataset -- ---> Publication etc.
```
<!--lint enable fenced-code-flag-->
@@ -55,12 +55,12 @@ VOC-like dataset -- ---> Publication etc.
- Dataset building operations:
- Merging multiple datasets into one
- Dataset filtering with custom conditions, for instance:
- remove all annotations except polygons of a certain class
- remove polygons of a certain class
- remove images without a specific class
- remove occluded annotations from images
- remove `occluded` annotations from images
- keep only vertically-oriented images
- remove small area bounding boxes from annotations
- Annotation conversions, for instance
- Annotation conversions, for instance:
- polygons to instance masks and vice versa
- apply a custom colormap for mask annotations
- rename or remove dataset labels
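
The filtering conditions listed above can be read as predicates over dataset items and their annotations. The following is a minimal sketch of that idea, not Datumaro's API: `Box`, `Item`, `filter_items`, and the area threshold `MIN_AREA` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Box:
    # Hypothetical bounding-box annotation (not a Datumaro class)
    label: str
    w: float
    h: float
    occluded: bool = False


@dataclass
class Item:
    # Hypothetical dataset item: image size plus its annotations
    image_w: int
    image_h: int
    boxes: List[Box] = field(default_factory=list)


def filter_items(items: List[Item],
                 keep_item: Callable[[Item], bool],
                 keep_box: Callable[[Box], bool]) -> List[Item]:
    """Drop items failing keep_item; in the rest, drop boxes failing keep_box."""
    result = []
    for item in items:
        if not keep_item(item):
            continue
        kept = [b for b in item.boxes if keep_box(b)]
        result.append(Item(item.image_w, item.image_h, kept))
    return result


MIN_AREA = 100  # assumed threshold for "small area" boxes

items = [
    Item(480, 640, [Box("cat", 50, 40),
                    Box("dog", 3, 2),
                    Box("cat", 30, 30, occluded=True)]),
    Item(640, 480, [Box("cat", 100, 80)]),
]

filtered = filter_items(
    items,
    # keep only vertically-oriented images
    keep_item=lambda it: it.image_h > it.image_w,
    # remove occluded and small-area boxes
    keep_box=lambda b: not b.occluded and b.w * b.h >= MIN_AREA,
)
```

Only the first item survives here, and only its large, non-occluded `cat` box is kept.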
2 changes: 1 addition & 1 deletion datumaro/datumaro/cli/contexts/project/__init__.py
@@ -579,7 +579,7 @@ def build_transform_parser(parser_ctor=argparse.ArgumentParser):
|n
Examples:|n
- Convert instance polygons to masks:|n
|s|stransform -n polygons_to_masks
|s|stransform -t polygons_to_masks
""" % ', '.join(builtins),
formatter_class=MultilineFormatter)
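
The `|n` and `|s` markers in the help text suggest a formatter that turns them into line breaks and literal spaces after whitespace is collapsed. The sketch below illustrates that reading only. `render_help_markers` is a hypothetical helper, and the real `MultilineFormatter` may behave differently (for example, it may also wrap long lines).

```python
import re


def render_help_markers(text: str) -> str:
    """Render '|n' as a line break and '|s' as a space (assumed semantics)."""
    # Collapse whitespace runs first, as argparse does before wrapping help text
    text = re.sub(r"\s+", " ", text).strip()
    # '|n' starts a new line; swallow the spaces left around it
    text = re.sub(r"\s*\|n\s*", "\n", text)
    # '|s' keeps indentation that whitespace collapsing would otherwise remove
    return text.replace("|s", " ")


raw = """
    Examples:|n
    - Convert instance polygons to masks:|n
    |s|stransform -t polygons_to_masks
"""

rendered = render_help_markers(raw)
print(rendered)
```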

9 changes: 8 additions & 1 deletion datumaro/datumaro/plugins/transforms.py
@@ -409,6 +409,13 @@ def transform_item(self, item):
.format(item=item))

class RemapLabels(Transform, CliPlugin):
"""
Changes labels in the dataset.|n
Examples:|n
- Rename 'person' to 'car' and 'cat' to 'dog', keep 'bus', remove others:|n
|s|sremap_labels -l person:car -l bus:bus -l cat:dog --default delete
"""

DefaultAction = Enum('DefaultAction', ['keep', 'delete'])

@staticmethod
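
The docstring example (`-l person:car -l bus:bus -l cat:dog --default delete`) can be read as a mapping plus a fallback action for labels that are not listed. The sketch below shows that semantics under stated assumptions. `parse_label_pair` and `remap_labels` are hypothetical helpers, not the `RemapLabels` implementation, and a deleted label is represented here as `None`.

```python
from enum import Enum

DefaultAction = Enum('DefaultAction', ['keep', 'delete'])


def parse_label_pair(pair: str):
    """Split a 'src:dst' string as passed to -l (assumed format)."""
    src, sep, dst = pair.partition(':')
    if not sep or not src:
        raise ValueError("expected 'src:dst', got %r" % pair)
    return src, dst


def remap_labels(labels, mapping, default=DefaultAction.keep):
    """Return {old_label: new_label}, with None marking a removed label."""
    remapped = {}
    for label in labels:
        if label in mapping:
            remapped[label] = mapping[label]   # explicitly listed label
        elif default is DefaultAction.keep:
            remapped[label] = label            # unlisted label is kept
        else:
            remapped[label] = None             # unlisted label is removed
    return remapped


mapping = dict(parse_label_pair(p)
               for p in ['person:car', 'bus:bus', 'cat:dog'])
labels = ['person', 'bus', 'cat', 'tree']

deleted_others = remap_labels(labels, mapping, DefaultAction.delete)
kept_others = remap_labels(labels, mapping, DefaultAction.keep)
```

With `--default delete` the unlisted `tree` label is dropped, while with the default `keep` it passes through unchanged.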
@@ -428,7 +435,7 @@ def build_cmdline_parser(cls, **kwargs):
parser.add_argument('--default',
choices=[a.name for a in cls.DefaultAction],
default=cls.DefaultAction.keep.name,
help="Action for unspecified labels")
help="Action for unspecified labels (default: %(default)s)")
return parser

def __init__(self, extractor, mapping, default=None):
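
The `%(default)s` specifier added to the help string is standard `argparse` behavior: the parser substitutes the argument's default value when it formats the help. The standalone sketch below reproduces only the `--default` option from this hunk. `build_parser` and the `prog` name are assumptions for illustration, not part of the plugin.

```python
import argparse
from enum import Enum

DefaultAction = Enum('DefaultAction', ['keep', 'delete'])


def build_parser():
    # Standalone parser; the real plugin builds its parser via CliPlugin
    parser = argparse.ArgumentParser(prog='remap_labels')
    parser.add_argument('--default',
        choices=[a.name for a in DefaultAction],
        default=DefaultAction.keep.name,
        # argparse expands %(default)s to the value of `default`
        help="Action for unspecified labels (default: %(default)s)")
    return parser


parser = build_parser()
help_text = parser.format_help()
args = parser.parse_args(['--default', 'delete'])
action = DefaultAction[args.default]  # map the string back to the enum
```

Printing `help_text` shows the default inline, so the help string does not need to be updated if the default changes.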
24 changes: 13 additions & 11 deletions datumaro/docs/design.md
@@ -49,6 +49,8 @@ Datumaro is:
- Versioning (for images, annotations, subsets, sources etc., comparison)
- Documentation generation
- Provision of iterators for user code
- Dataset downloading
- Dataset generation
- Dataset building (export in a specific format, indexation, statistics, documentation)
- Dataset exporting to other formats
- Dataset debugging (run inference, generate dataset slices, compute statistics)
@@ -111,25 +113,25 @@ can be downloaded by user to be operated on with Datumaro CLI.
- [ ] with TensorBoard

- Calculation of statistics for datasets
- [ ] Pixel mean, std
- [ ] Object counts (detection scenario)
- [ ] Image-Class distribution (classification scenario)
- [ ] Pixel-Class distribution (segmentation scenario)
- [ ] Image clusters
- [x] Pixel mean, std
- [x] Object counts (detection scenario)
- [x] Image-Class distribution (classification scenario)
- [x] Pixel-Class distribution (segmentation scenario)
- [ ] Image similarity clusters
- [ ] Custom statistics

- Dataset building
- [x] Composite dataset building
- [ ] Annotation remapping
- [ ] Subset splitting
- [x] Class remapping
- [x] Subset splitting
- [x] Dataset filtering (`extract`)
- [x] Dataset merging (`merge`)
- [ ] Dataset item editing (`edit`)

- Dataset comparison (`diff`)
- [x] Annotation-annotation comparison
- [x] Annotation-inference comparison
- [ ] Annotation quality estimation (for CVAT)
- [x] Annotation quality estimation (for CVAT)
- Provide a simple method to check
annotation quality with a model and generate summary

@@ -142,9 +144,9 @@ can be downloaded by user to be operated on with Datumaro CLI.
- [x] Task export
- [x] Datumaro project export
- [x] Dataset export
- [ ] Original raw data (images, a video file) can be downloaded (exported)
- [x] Original raw data (images, a video file) can be downloaded (exported)
together with annotations or just have links
on CVAT server (in the future support S3, etc)
on CVAT server (in future, support S3, etc)
- [x] Be able to use local files instead of remote links
- [ ] Specify cache directory
- [x] Use case "annotate for model training"
@@ -154,7 +156,7 @@ can be downloaded by user to be operated on with Datumaro CLI.
- convert to a training format
- train a DL model
- [x] Use case "annotate - reannotate problematic images - merge"
- [ ] Use case "annotate and estimate quality"
- [x] Use case "annotate and estimate quality"
- create a task
- annotate
- estimate quality of annotations
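
Two of the statistics marked as done in the checklist, object counts and image-class distribution, differ only in whether repeated labels within one image are counted. The sketch below shows this on plain Python data. The `(image_name, labels)` tuple layout and both function names are assumptions, not Datumaro's statistics API.

```python
from collections import Counter


def object_counts(dataset):
    """Count every annotated object per class (detection scenario)."""
    return Counter(label for _, labels in dataset for label in labels)


def image_class_distribution(dataset):
    """Count images containing each class at least once (classification scenario)."""
    return Counter(label for _, labels in dataset for label in set(labels))


# Hypothetical dataset: (image name, labels of its annotations)
dataset = [
    ("a.jpg", ["cat", "cat", "dog"]),
    ("b.jpg", ["dog"]),
    ("c.jpg", []),
]

objects = object_counts(dataset)
images = image_class_distribution(dataset)
```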