Delete annotation and image when the sum of the annotations reaches a certain size #1203
Hi @DP1701, unfortunately Datumaro does not currently provide exactly the functionality you want. I'll add it to our development backlog. In the meantime, you can implement it with your own Python script. Here is an example I wrote showing how it can be done with Datumaro:
import numpy as np
import datumaro as dm
### Create example dataset ###
def create_example_dataset() -> dm.Dataset:
    # 10x10 blank RGB image shared by all items
    blank_img = dm.Image.from_numpy(np.zeros([10, 10, 3], dtype=np.uint8))
    categories = ["label_1", "label_2", "label_3"]
    # Flat [x0, y0, x1, y1, ...] corner list of a 1x1 box
    points_of_1x1_box = np.array([0, 0, 0, 1, 1, 1, 1, 0])

    item_not_to_drop = dm.DatasetItem(
        id="item_not_to_drop",
        media=blank_img,
        annotations=[
            dm.Polygon(
                points=label + points_of_1x1_box,
                label=label,
            )
            for label in range(len(categories))
        ],
    )
    item_drop_by_big_polygon = dm.DatasetItem(
        id="item_drop_by_big_polygon",
        media=blank_img,
        annotations=[
            dm.Polygon(
                points=8 * points_of_1x1_box,  # 8x8 box
                label=0,
            )
        ],
    )
    item_drop_by_polygon_union = dm.DatasetItem(
        id="item_drop_by_polygon_union",
        media=blank_img,
        annotations=[
            dm.Polygon(
                points=offset + 4 * points_of_1x1_box,  # 10 4x4 boxes placed along the diagonal
                label=1,
            )
            for offset in range(10)
        ],
    )
    return dm.Dataset.from_iterable(
        iterable=[
            item_not_to_drop,
            item_drop_by_big_polygon,
            item_drop_by_polygon_union,
        ],
        categories=categories,
    )

dataset = create_example_dataset()
### Print result
print(dataset)
for item in dataset:
    print(item)

Dataset
size=3
source_path=None
media_type=<class 'datumaro.components.media.Image'>
annotated_items_count=3
annotations_count=14
subsets
default: # of items=3, # of annotated items=3, # of annotations=14, annotation types=['polygon']
infos
categories
label: ['label_1', 'label_2', 'label_3']
DatasetItem(id='item_not_to_drop', subset='default', media=ImageFromNumpy(data=array([[[0, 0, 0], ...), annotations=[Polygon(id=0, attributes={}, group=0, object_id=-1, points=[0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0], label=0, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 1.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 2.0], label=2, z_order=0)], attributes={})
DatasetItem(id='item_drop_by_big_polygon', subset='default', media=ImageFromNumpy(data=array([[[0, 0, 0], ...), annotations=[Polygon(id=0, attributes={}, group=0, object_id=-1, points=[0.0, 0.0, 0.0, 8.0, 8.0, 8.0, 8.0, 0.0], label=0, z_order=0)], attributes={})
DatasetItem(id='item_drop_by_polygon_union', subset='default', media=ImageFromNumpy(data=array([[[0, 0, 0], ...), annotations=[Polygon(id=0, attributes={}, group=0, object_id=-1, points=[0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 4.0, 0.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0, 1.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[2.0, 2.0, 2.0, 6.0, 6.0, 6.0, 6.0, 2.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[3.0, 3.0, 3.0, 7.0, 7.0, 7.0, 7.0, 3.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[4.0, 4.0, 4.0, 8.0, 8.0, 8.0, 8.0, 4.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[5.0, 5.0, 5.0, 9.0, 9.0, 9.0, 9.0, 5.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[6.0, 6.0, 6.0, 10.0, 10.0, 10.0, 10.0, 6.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[7.0, 7.0, 7.0, 11.0, 11.0, 11.0, 11.0, 7.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[8.0, 8.0, 8.0, 12.0, 12.0, 12.0, 12.0, 8.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[9.0, 9.0, 9.0, 13.0, 13.0, 13.0, 13.0, 9.0], label=1, z_order=0)], attributes={})
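As a quick sanity check on the example data above, the area of a single polygon can be computed by hand with the shoelace formula. The sketch below is plain Python and independent of Datumaro; the helper name `shoelace_area` is hypothetical, and it assumes a simple (non-self-intersecting) polygon given in the same flat `[x0, y0, x1, y1, ...]` layout as the `points` printed above.

```python
def shoelace_area(flat_points):
    """Area of a simple polygon given as a flat [x0, y0, x1, y1, ...] list."""
    xs = flat_points[0::2]
    ys = flat_points[1::2]
    n = len(xs)
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n  # next vertex, wrapping around to close the polygon
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(twice_area) / 2.0


unit_box = [0, 0, 0, 1, 1, 1, 1, 0]   # the 1x1 box from the example
big_box = [8 * v for v in unit_box]   # the 8x8 box from the example

print(shoelace_area(unit_box))
print(shoelace_area(big_box))
```

Note that this only covers one polygon at a time. Overlapping polygons of the same class need a proper union, which is why a geometry library is a reasonable choice for the real filter.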
### Remove if the maximum union area of polygons > 1 / 4 image size ###
from collections import defaultdict

import shapely.geometry as sg


def get_max_polygon_area(polygon_group_by_label: dict[int, list[sg.Polygon]]) -> float:
    max_area = 0.0
    for polygons in polygon_group_by_label.values():
        # Merge all polygons of one label so overlapping regions are counted once
        union = sg.Polygon()
        for polygon in polygons:
            union = union.union(polygon)
        max_area = max(max_area, union.area)
    return max_area
### Gather item id and subset to remove
items_to_remove = []
for item in dataset:
    height, width = item.media_as(dm.Image).size
    image_size = height * width

    # Group the polygons of this item by label
    polygon_group_by_label = defaultdict(list)
    for ann in item.annotations:
        if not isinstance(ann, dm.Polygon):
            continue
        polygon_group_by_label[ann.label].append(sg.Polygon(ann.get_points()))

    max_polygon_area = get_max_polygon_area(polygon_group_by_label)
    if max_polygon_area > 1 / 4 * image_size:
        print(
            f"item_id: {item.id}, max_polygon_area: {max_polygon_area}, image_size: {image_size}, "
            f"Remove this item: {item.id}"
        )
        items_to_remove.append((item.id, item.subset))
    else:
        print(f"item_id: {item.id}, max_polygon_area: {max_polygon_area}, image_size: {image_size}")
### Remove from the dataset
for item_id, subset in items_to_remove:
    dataset.remove(id=item_id, subset=subset)

### Print result
print(dataset)
for item in dataset:
    print(item)

item_id: item_not_to_drop, max_polygon_area: 1.0, image_size: 100
item_id: item_drop_by_big_polygon, max_polygon_area: 64.0, image_size: 100, Remove this item: item_drop_by_big_polygon
item_id: item_drop_by_polygon_union, max_polygon_area: 79.0, image_size: 100, Remove this item: item_drop_by_polygon_union
Dataset
size=1
source_path=None
media_type=<class 'datumaro.components.media.Image'>
annotated_items_count=1
annotations_count=3
subsets
default: # of items=1, # of annotated items=1, # of annotations=3, annotation types=['polygon']
infos
categories
label: ['label_1', 'label_2', 'label_3']
DatasetItem(id='item_not_to_drop', subset='default', media=ImageFromNumpy(data=array([[[0, 0, 0], ...), annotations=[Polygon(id=0, attributes={}, group=0, object_id=-1, points=[0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0], label=0, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 1.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 2.0], label=2, z_order=0)], attributes={})

I hope this is helpful for your work.
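The `79.0` reported for `item_drop_by_polygon_union` can be cross-checked without shapely. The sketch below is not part of the script above: it assumes axis-aligned boxes with integer corners (true for this example only) and counts the unit cells covered by at least one box. The helper name `union_area_of_boxes` is hypothetical.

```python
def union_area_of_boxes(boxes):
    """Union area of axis-aligned boxes (x0, y0, x1, y1) with integer corners.

    Counts each covered 1x1 cell once, so overlaps are not double-counted.
    """
    covered = set()
    for x0, y0, x1, y1 in boxes:
        for x in range(x0, x1):
            for y in range(y0, y1):
                covered.add((x, y))
    return len(covered)


# Ten 4x4 boxes shifted by one unit along the diagonal, as in the example item
boxes = [(k, k, k + 4, k + 4) for k in range(10)]
union_area = union_area_of_boxes(boxes)

image_size = 10 * 10
threshold = image_size / 4  # 1/4 of the image

print(union_area, threshold, union_area > threshold)
```

The first box contributes 16 cells and each following box adds 7 new ones (16 minus the 3x3 overlap with its predecessor), giving 16 + 9 * 7 = 79, which exceeds the threshold of 25. Summing the ten areas naively would give 160 instead, which is why the union matters.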
<!-- Contributing guide: https://github.com/openvinotoolkit/datumaro/blob/develop/CONTRIBUTING.md -->

### Summary

- Ticket no. 127146
- Same as title
- Updated the Jupyter notebook example as well.
- It is raised by this user requirement, #1203

### How to test

Added some unit tests as well.

### Checklist

<!-- Put an 'x' in all the boxes that apply -->
- [x] I have added unit tests to cover my changes.
- [ ] I have added integration tests to cover my changes.
- [x] I have added the description of my changes into [CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).
- [x] I have updated the [documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs) accordingly

### License

- [x] I submit _my code changes_ under the same [MIT License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.
- [x] I have updated the license header for each file (see an example below).

```python
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT
```

---------

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
Thanks @vinnamkim for providing the workaround. @DP1701, I hope the above works for you. I will close this issue.
Hello everyone,
I have a coco_instance dataset that contains several polygons. I would like to filter it as follows: if one or more polygons of a certain class take up more than 1/4 of the image (by image resolution), I would like to delete the image and all annotations in it. Can this be achieved with Datumaro, or should I write my own Python script for this?
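For illustration, a standalone script along these lines could work directly on the COCO JSON structure. This is a rough sketch under stated assumptions, not a Datumaro feature: it assumes every annotation carries a populated `area` field and every image has `width` and `height`, and it sums areas per class rather than computing a true union, so overlapping polygons are counted more than once and some images may be dropped too eagerly. The function name `filter_coco` and the `max_fraction` parameter are hypothetical.

```python
from collections import defaultdict


def filter_coco(coco, max_fraction=0.25):
    """Drop images where the summed area of one class exceeds max_fraction of the image."""
    image_sizes = {img["id"]: img["width"] * img["height"] for img in coco["images"]}

    # Sum annotation areas per (image, category) pair
    area_by_image_and_class = defaultdict(float)
    for ann in coco["annotations"]:
        area_by_image_and_class[(ann["image_id"], ann["category_id"])] += ann["area"]

    images_to_drop = {
        image_id
        for (image_id, _), area in area_by_image_and_class.items()
        if area > max_fraction * image_sizes[image_id]
    }

    # Return a new dict; the input is left unchanged
    return {
        **coco,
        "images": [img for img in coco["images"] if img["id"] not in images_to_drop],
        "annotations": [
            ann for ann in coco["annotations"] if ann["image_id"] not in images_to_drop
        ],
    }


coco = {
    "images": [
        {"id": 1, "width": 10, "height": 10},
        {"id": 2, "width": 10, "height": 10},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "area": 1.0},
        {"id": 2, "image_id": 2, "category_id": 1, "area": 20.0},
        {"id": 3, "image_id": 2, "category_id": 1, "area": 10.0},
    ],
    "categories": [{"id": 1, "name": "label_1"}],
}

filtered = filter_coco(coco)
print([img["id"] for img in filtered["images"]])
print([ann["id"] for ann in filtered["annotations"]])
```

In practice the dictionary would be read with `json.load` and written back with `json.dump`. If overlapping polygons are common in the dataset, a geometric union per class gives a more accurate result than summing `area`.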