Boost segmentation import performance (#1261)
<!-- Contributing guide:
https://github.com/openvinotoolkit/datumaro/blob/develop/CONTRIBUTING.md
-->

### Summary

When analyzing the import performance for `cityscapes` and
`kaggle_image_mask`, I found that the main bottleneck is `np.unique`,
which is called to find the unique class indices within each mask.
Analysis before PR:

![image](https://github.com/openvinotoolkit/datumaro/assets/89109581/3b75783b-c01b-4a99-8319-385abb430ead)

Analysis after PR:

![image](https://github.com/openvinotoolkit/datumaro/assets/89109581/6fd0e946-a765-414d-918c-89c798f7eb32)

Instead of finding the unique class indices within each mask, I changed the
importers to iterate over all class indices defined in the dataset.
As a result, import is 5 times faster.
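
The change described above can be sketched as follows. This is a minimal illustration rather than the code from this PR: `labels_via_unique` and `labels_via_known_ids` are hypothetical helper names, and the mask and label ids are made up. It shows that scanning a mask for a small, known set of dataset label ids gives the same labels as `np.unique` when every value in the mask is a known label id.

```python
from typing import List

import numpy as np


def labels_via_unique(mask: np.ndarray) -> List[int]:
    """Per-mask approach: collect the distinct values found in the mask."""
    return [int(v) for v in np.unique(mask)]


def labels_via_known_ids(mask: np.ndarray, known_ids: List[int]) -> List[int]:
    """Dataset-level approach: test each known label id for membership."""
    # For a NumPy array, `label_id in mask` behaves like (mask == label_id).any().
    return [label_id for label_id in known_ids if label_id in mask]


# Hypothetical 4x4 mask that uses labels 0, 2 and 5 out of six dataset labels.
mask = np.array(
    [
        [0, 0, 2, 2],
        [0, 0, 2, 2],
        [5, 5, 0, 0],
        [5, 5, 0, 0],
    ],
    dtype=np.int32,
)
known_ids = [0, 1, 2, 3, 4, 5]

per_mask = labels_via_unique(mask)
dataset_level = labels_via_known_ids(mask, known_ids)
print(per_mask, dataset_level)
```

The two approaches differ when a mask contains values outside the known label ids: `np.unique` reports them, while the dataset-level scan ignores them.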

### How to test
<!-- Describe the testing procedure for reviewers, if changes are
not fully covered by unit tests or manual testing can be complicated.
-->

### Checklist
<!-- Put an 'x' in all the boxes that apply -->
- [x] I have added unit tests to cover my changes.
- [ ] I have added integration tests to cover my changes.
- [x] I have added the description of my changes into
[CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).
- [ ] I have updated the
[documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs)
accordingly

### License

- [ ] I submit _my code changes_ under the same [MIT
License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE)
that covers the project.
  Feel free to contact the maintainers if that's a concern.
- [ ] I have updated the license header for each file (see an example
below).

```python
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT
```
wonjuleee authored Feb 13, 2024
1 parent 8e77887 commit 7d3b237
Showing 8 changed files with 74 additions and 153 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -35,6 +35,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   (<https://github.com/openvinotoolkit/datumaro/pull/1245>)
 - Enable image backend and color channel format to be selectable
   (<https://github.com/openvinotoolkit/datumaro/pull/1246>)
+- Boost up `CityscapesBase` and `KaggleImageMaskBase` by dropping `np.unique`
+  (<https://github.com/openvinotoolkit/datumaro/pull/1261>)
 - Enhance RISE algortihm for explainable AI
   (<https://github.com/openvinotoolkit/datumaro/pull/1263>)
 - Enhance explore unit test to use real dataset from ImageNet
40 changes: 21 additions & 19 deletions src/datumaro/plugins/data_formats/cityscapes.py
@@ -265,30 +265,35 @@ def _load_items(self):
             recursive=True,
         )
         mask_suffix = CityscapesPath.GT_INSTANCE_MASK_SUFFIX
+
+        self._categories = self._load_categories(
+            self._path, use_train_label_map=mask_suffix is CityscapesPath.LABEL_TRAIN_IDS_SUFFIX
+        )
+
+        label_ids = []
+        for label_cat in self._categories[AnnotationType.label]:
+            label_id, _ = self._categories[AnnotationType.label].find(label_cat.name)
+            if label_id:
+                label_ids.append(label_id)
+
         for mask_path in masks:
             item_id = self._get_id_from_mask_path(mask_path, mask_suffix)

             anns = []
             instances_mask = load_image(mask_path, dtype=np.int32)
-            segm_ids = np.unique(instances_mask)
-            for segm_id in segm_ids:
-                # either is_crowd or ann_id should be set
-                if segm_id < 1000:
-                    label_id = segm_id
-                    is_crowd = True
-                    ann_id = None
-                else:
-                    label_id = segm_id // 1000
-                    is_crowd = False
-                    ann_id = segm_id % 1000
+            mask_id = 1
+            for label_id in label_ids:
+                if label_id not in instances_mask:
+                    continue
+                binary_mask = self._lazy_extract_mask(instances_mask, label_id)
                 anns.append(
                     Mask(
-                        image=self._lazy_extract_mask(instances_mask, segm_id),
+                        id=mask_id,
+                        image=binary_mask,
                         label=label_id,
-                        id=ann_id,
-                        attributes={"is_crowd": is_crowd},
                     )
                 )
+                mask_id += 1

             image = image_path_by_id.pop(item_id, None)
             if image:
@@ -303,9 +308,6 @@ def _load_items(self):
                 id=item_id, subset=self._subset, media=Image.from_file(path=path)
             )

-        self._categories = self._load_categories(
-            self._path, use_train_label_map=mask_suffix is CityscapesPath.LABEL_TRAIN_IDS_SUFFIX
-        )
         return items

     @staticmethod
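
One detail of the label-id collection loop added to `_load_items` in `cityscapes.py`: `if label_id:` is a truthiness test, so it skips an index of `0` as well as `None`. The sketch below uses a hypothetical stand-in for the categories lookup (`find_label` is not Datumaro's API; it only mimics a lookup that returns an index or `None`) to show how this differs from an explicit `is not None` check. Whether excluding index 0 is intended here, for example to skip a background class, is not stated in the PR.

```python
from typing import List, Optional

# Hypothetical label names; index 0 plays the role of a background class.
LABELS = ["background", "road", "car"]


def find_label(name: str) -> Optional[int]:
    """Stand-in lookup: return the index of `name`, or None if it is unknown."""
    return LABELS.index(name) if name in LABELS else None


def collect_truthy(names: List[str]) -> List[int]:
    """Mirrors `if label_id:`; drops None and also index 0."""
    ids = []
    for name in names:
        label_id = find_label(name)
        if label_id:
            ids.append(label_id)
    return ids


def collect_not_none(names: List[str]) -> List[int]:
    """Explicit check; drops only unknown names."""
    ids = []
    for name in names:
        label_id = find_label(name)
        if label_id is not None:
            ids.append(label_id)
    return ids


names = ["background", "road", "car", "unknown"]
print(collect_truthy(names))    # [1, 2]
print(collect_not_none(names))  # [0, 1, 2]
```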
@@ -429,8 +431,8 @@ def _apply_impl(self):
                     masks,
                     instance_ids=[
                         self._label_id_mapping(m.label)
-                        if m.attributes.get("is_crowd", False)
-                        else self._label_id_mapping(m.label) * 1000 + (m.id or (i + 1))
+                        # if m.attributes.get("is_crowd", False)
+                        # else self._label_id_mapping(m.label) * 1000 + (m.id or (i + 1))
                         for i, m in enumerate(masks)
                     ],
                     instance_labels=[self._label_id_mapping(m.label) for m in masks],
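
For reference, the instance-id convention handled by the code removed from `_load_items` and commented out in `_apply_impl` can be sketched as below. The helper names `decode_segm_id` and `encode_segm_id` are hypothetical; the arithmetic follows the removed lines: values below 1000 are treated as a crowd region whose value is the label id, and other values pack the label id and instance id as `label_id * 1000 + instance_id`.

```python
from typing import Optional, Tuple


def decode_segm_id(segm_id: int) -> Tuple[int, Optional[int], bool]:
    """Return (label_id, instance_id, is_crowd) for one mask value."""
    if segm_id < 1000:
        # Crowd region: the value is the label id, with no instance id.
        return segm_id, None, True
    return segm_id // 1000, segm_id % 1000, False


def encode_segm_id(label_id: int, instance_id: Optional[int], is_crowd: bool) -> int:
    """Inverse of decode_segm_id for the values it can produce."""
    if is_crowd or instance_id is None:
        return label_id
    return label_id * 1000 + instance_id


print(decode_segm_id(7))             # (7, None, True)
print(decode_segm_id(26003))         # (26, 3, False)
print(encode_segm_id(26, 3, False))  # 26003
```

Judging from the diff, the importer now creates one mask per label id, so the per-instance id and the `is_crowd` attribute are no longer set on import.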
6 changes: 4 additions & 2 deletions src/datumaro/plugins/data_formats/kaggle/base.py
@@ -215,6 +215,7 @@ def __init__(
         self._path = path
         self._mask_path = mask_path

+        self._label_ids = []
         self._categories = self._load_categories(labelmap_file)
         self._items = self._load_items()

@@ -241,6 +242,7 @@ def _load_categories(self, label_map_file: Optional[str]):
         for label_name, label_color in label_map.items():
             label_id = label_categories.find(label_name)[0]
             colormap[label_id] = label_color
+            self._label_ids.append(label_id)

         categories[AnnotationType.mask] = MaskCategories(colormap)

@@ -260,8 +262,8 @@ def _lazy_extract_mask(mask, c):
                     instances_mask = load_image(
                         osp.join(self._mask_path, mask_name), dtype=np.int32
                     )
-                    label_ids = np.unique(instances_mask)
-                    for label_id in label_ids:
+                    # label_ids = np.unique(instances_mask)
+                    for label_id in self._label_ids:
                         anns.append(
                             Mask(
                                 image=_lazy_extract_mask(instances_mask, label_id),
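
The Kaggle hunk above calls `_lazy_extract_mask` once per dataset label id. A minimal sketch of that lazy pattern follows, assuming the helper returns a callable that builds the binary mask only when invoked. The body of the real helper is not shown in this diff, so `lazy_extract_mask` here is an illustrative stand-in, and the mask values are made up.

```python
from typing import Callable

import numpy as np


def lazy_extract_mask(mask: np.ndarray, label_id: int) -> Callable[[], np.ndarray]:
    """Defer the comparison until the binary mask is actually requested."""
    return lambda: mask == label_id


# Hypothetical mask that uses labels 0 and 1; label 2 never occurs.
instances_mask = np.array([[0, 1], [1, 1]], dtype=np.int32)
label_ids = [0, 1, 2]

# Creating the callables is cheap: no per-pixel work happens yet.
lazy_masks = {
    label_id: lazy_extract_mask(instances_mask, label_id) for label_id in label_ids
}

# The per-pixel comparison runs only when a callable is invoked.
pixel_counts = {label_id: int(fn().sum()) for label_id, fn in lazy_masks.items()}
print(pixel_counts)  # {0: 1, 1: 3, 2: 0}
```

A label id that does not occur in the mask, such as 2 here, yields an all-False mask. The `cityscapes.py` change skips such ids with `if label_id not in instances_mask: continue`; the Kaggle lines visible in this hunk show no equivalent check.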