Merge branch 'develop' into mergeback/1.9.1
yunchu committed Sep 27, 2024
2 parents ec9f3ba + c4d7bb4 commit ae2eda5
Showing 36 changed files with 1,042 additions and 39 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/publish_to_pypi.yml
@@ -80,12 +80,12 @@ jobs:
file_glob: true
- name: Publish package distributions to PyPI
if: ${{ steps.check-tag.outputs.match != '' }}
uses: pypa/gh-action-pypi-publish@v1.10.1
uses: pypa/gh-action-pypi-publish@v1.10.2
with:
password: ${{ secrets.PYPI_API_TOKEN }}
- name: Publish package distributions to TestPyPI
if: ${{ steps.check-tag.outputs.match == '' }}
uses: pypa/gh-action-pypi-publish@v1.10.1
uses: pypa/gh-action-pypi-publish@v1.10.2
with:
password: ${{ secrets.TESTPYPI_API_TOKEN }}
repository-url: https://test.pypi.org/legacy/
18 changes: 17 additions & 1 deletion CHANGELOG.md
@@ -5,9 +5,23 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## \[Q4 2024 Release 1.9.1\]
## \[Unreleased\]

### New features
- Support KITTI 3D format
(<https://github.com/openvinotoolkit/datumaro/pull/1619>)
- Add PseudoLabeling transform for unlabeled dataset
(<https://github.com/openvinotoolkit/datumaro/pull/1594>)

### Enhancements
- Raise an appropriate error when exporting a datumaro dataset if its subset name contains path separators.
(<https://github.com/openvinotoolkit/datumaro/pull/1615>)
- Update docs for transform plugins
(<https://github.com/openvinotoolkit/datumaro/pull/1599>)

### Bug fixes

## Q4 2024 Release 1.9.1
### Enhancements
- Support multiple labels for kaggle format
(<https://github.com/openvinotoolkit/datumaro/pull/1607>)
@@ -22,6 +36,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### New features
- Add a new CLI command: datum format
(<https://github.com/openvinotoolkit/datumaro/pull/1570>)
- Add a new Cuboid2D annotation type
(<https://github.com/openvinotoolkit/datumaro/pull/1601>)
- Support language dataset for DmTorchDataset
(<https://github.com/openvinotoolkit/datumaro/pull/1592>)

58 changes: 57 additions & 1 deletion docs/source/docs/command-reference/context_free/transform.md
@@ -101,7 +101,10 @@ Basic dataset item manipulations:
- [`remove_images`](#remove_images) - Removes specific images
- [`remove_annotations`](#remove_annotations) - Removes annotations
- [`remove_attributes`](#remove_attributes) - Removes attributes
- [`astype_annotations`](#astype_annotations) - Convert annotation type
- [`astype_annotations`](#astype_annotations) - Transforms annotation types
- [`pseudo_labeling`](#pseudo_labeling) - Generates pseudo labels for unlabeled data
- [`correct`](#correct) - Corrects annotation types
- [`clean`](#clean) - Removes noisy data from tabular datasets

Subset manipulations:
- [`random_split`](#random_split) - Splits dataset into subsets
@@ -826,6 +829,35 @@ bbox_values_decrement [-h]
Optional arguments:
- `-h`, `--help` (flag) - Show this help message and exit

#### `pseudo_labeling`

Assigns pseudo-labels to items in a dataset based on their similarity to predefined labels. This class is useful for semi-supervised learning when dealing with missing or uncertain labels.

The process includes:

- Similarity Computation: Uses hashing techniques to compute the similarity between items and predefined labels.
- Pseudo-Label Assignment: Assigns the most similar label as a pseudo-label to each item.

Attributes:

- `extractor` (IDataset) - Provides access to dataset items and their annotations.
- `labels` (Optional[List[str]]) - List of predefined labels for pseudo-labeling. Defaults to all available labels if not provided.
- `explorer` (Optional[Explorer]) - Computes hash keys for items and labels. If not provided, a new Explorer is created.

Usage:
```console
pseudo_labeling [-h] [--labels LABELS]
```

Optional arguments:
- `-h`, `--help` (flag) - Show this help message and exit
- `--labels` (str) - Comma-separated list of label names for pseudo-labeling

Examples:
- Assign pseudo-labels based on predefined labels
```console
datum transform -t pseudo_labeling -- --labels 'label1,label2'
```
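
The two steps described for `pseudo_labeling` (similarity computation, then assignment of the most similar label) can be illustrated with a minimal sketch. This is not Datumaro's implementation: it assumes every item and every label has already been reduced to a fixed-length integer hash key, and it uses Hamming distance as the similarity measure. The helper names `hamming_distance` and `assign_pseudo_label` and the sample hash values are hypothetical.

```python
from typing import Dict


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two integer hash keys differ."""
    return bin(a ^ b).count("1")


def assign_pseudo_label(item_hash: int, label_hashes: Dict[str, int]) -> str:
    """Return the label whose hash key is closest to the item's hash key."""
    return min(
        label_hashes,
        key=lambda name: hamming_distance(item_hash, label_hashes[name]),
    )


# Hypothetical 8-bit hash keys for two predefined labels.
label_hashes = {"label1": 0b11110000, "label2": 0b00001111}

# An unlabeled item whose hash differs from label1 in one bit
# and from label2 in seven bits.
print(assign_pseudo_label(0b11110001, label_hashes))
```

A smaller Hamming distance means a more similar hash, so `min` picks the best-matching label; ties go to whichever label appears first in the dictionary.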

#### `correct`

Correct the dataset from a validation report
@@ -838,3 +870,27 @@ correct [-h] [-r REPORT_PATH]
Optional arguments:
- `-h`, `--help` (flag) - Show this help message and exit
- `-r`, `--reports` (str) - A validation report from a 'validate' CLI (default=validation_reports.json)

#### `clean`

Refines and preprocesses media items in a dataset, focusing on string, numeric, and categorical data. This transform is designed to clean and improve the quality of the data, making it more suitable for analysis and modeling.

The cleaning process includes:

- String Data: Removes unnecessary characters using NLP techniques.
- Numeric Data: Identifies and handles outliers and missing values.
- Categorical Data: Cleans and refines categorical information.

Usage:
```console
clean [-h]
```

Optional arguments:
- `-h`, `--help` (flag) - Show this help message and exit

Examples:
- Clean and preprocess dataset items
```console
datum transform -t clean
```
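
The diff does not show which rules `clean` applies, so the following is only a rough sketch of what string and numeric cleaning of this kind can look like. Everything in it is an assumption: the function names `clean_text` and `clean_numeric`, the choice to keep only letters, digits and whitespace, the median fill for missing values, and the 1.5 × IQR fences for outliers.

```python
import re
import statistics
from typing import List, Optional


def clean_text(text: str) -> str:
    """Drop characters other than letters, digits and whitespace, then collapse whitespace."""
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_numeric(values: List[Optional[float]]) -> List[float]:
    """Fill missing values with the median and clamp outliers to the 1.5 * IQR fences."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    q1, _, q3 = statistics.quantiles(present, n=4)
    low = q1 - 1.5 * (q3 - q1)
    high = q3 + 1.5 * (q3 - q1)
    # Missing entries become the median; everything is then clamped to [low, high].
    return [min(max(median if v is None else v, low), high) for v in values]


print(clean_text("Hello,   world!!"))
print(clean_numeric([10, 11, 12, None, 13, 14, 15, 1000]))
```

Clamping keeps the row count unchanged, which matters when each value belongs to a dataset item; dropping outlier rows instead would be an equally valid design under different requirements.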
2 changes: 2 additions & 0 deletions docs/source/docs/data-formats/formats/datumaro.md
@@ -73,6 +73,8 @@ A Datumaro dataset directory should have the following structure:
└── ...
```

Note that the subset name shouldn't contain path separators.

If your dataset does not follow the above directory structure,
Datumaro cannot detect and import it as the Datumaro format properly.
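
A check for the subset-name rule could look like the sketch below. The function name `validate_subset_name` and the use of `ValueError` are illustrative assumptions rather than Datumaro's API; the commit itself introduces a dedicated `PathSeparatorInSubsetNameError` for this case.

```python
import os


def validate_subset_name(subset: str) -> str:
    """Return the subset name unchanged, or raise if it contains a path separator."""
    # Check both POSIX and Windows separators regardless of the current platform.
    separators = {"/", "\\", os.sep}
    if os.altsep:
        separators.add(os.altsep)
    if any(sep in subset for sep in separators):
        raise ValueError(f"Subset name '{subset}' contains path separator(s).")
    return subset


print(validate_subset_name("train"))
```

Rejecting both `/` and `\` on every platform keeps an exported dataset portable, since a subset name typically ends up in a file name.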

2 changes: 2 additions & 0 deletions docs/source/docs/data-formats/formats/datumaro_binary.md
@@ -113,6 +113,8 @@ A DatumaroBinary dataset directory should have the following structure:
└── ...
```

Note that the subset name shouldn't contain path separators.

If your dataset does not follow the above directory structure,
Datumaro cannot detect and import it as the DatumaroBinary format properly.

12 changes: 12 additions & 0 deletions docs/source/docs/release_notes.rst
@@ -4,6 +4,18 @@ Release Notes
.. toctree::
   :maxdepth: 1

v1.9.1 (2024 Q3)
----------------

Enhancements
^^^^^^^^^^^^
- Support multiple labels for kaggle format
- Use DataFrame.map instead of DataFrame.applymap

Bug fixes
^^^^^^^^^
- Fix StreamDataset merging when importing in eager mode

v1.9.0 (2024 Q3)
----------------

36 changes: 36 additions & 0 deletions src/datumaro/components/annotation.py
@@ -50,6 +50,7 @@ class AnnotationType(IntEnum):
    feature_vector = 13
    tabular = 14
    rotated_bbox = 15
    cuboid_2d = 16


COORDINATE_ROUNDING_DIGITS = 2
@@ -1363,6 +1364,41 @@ def wrap(item, **kwargs):
        return attr.evolve(item, **d)


@attrs(slots=True, init=False, order=False)
class Cuboid2D(Annotation):
    """
    Cuboid2D annotation class. This class represents a 3D bounding box defined by its point coordinates
    in the following way:
    [(x1, y1), (x2, y2), (x3, y3), (x4, y4), (x5, y5), (x6, y6), (x7, y7), (x8, y8)].
          6---7
         /|  /|
        5-+-8 |
        | 2 + 3
        |/  |/
        1---4
    Attributes:
        _type (AnnotationType): The type of annotation, set to `AnnotationType.cuboid_2d`.
    Methods:
        __init__: Initializes the Cuboid2D with its coordinates.
        wrap: Creates a new Cuboid2D instance with updated attributes.
    """

    _type = AnnotationType.cuboid_2d
    points = field(default=None)
    label: Optional[int] = field(
        converter=attr.converters.optional(int), default=None, kw_only=True
    )
    z_order: int = field(default=0, validator=default_if_none(int), kw_only=True)

    def __init__(self, _points: Iterable[Tuple[float, float]], *args, **kwargs):
        kwargs.pop("points", None)  # comes from wrap()
        self.__attrs_init__(points=_points, *args, **kwargs)


@attrs(slots=True, order=False)
class PointsCategories(Categories):
"""
6 changes: 6 additions & 0 deletions src/datumaro/components/annotations/matcher.py
@@ -35,6 +35,7 @@
    "ImageAnnotationMatcher",
    "HashKeyMatcher",
    "FeatureVectorMatcher",
    "Cuboid2DMatcher",
]


@@ -378,3 +379,8 @@ def distance(self, a, b):
            b = Points([p for pt in b.as_polygon() for p in pt])

        return OKS(a, b, sigma=self.sigma)


@attrs
class Cuboid2DMatcher(ShapeMatcher):
    pass
6 changes: 6 additions & 0 deletions src/datumaro/components/annotations/merger.py
@@ -12,6 +12,7 @@
    AnnotationMatcher,
    BboxMatcher,
    CaptionsMatcher,
    Cuboid2DMatcher,
    Cuboid3dMatcher,
    FeatureVectorMatcher,
    HashKeyMatcher,
@@ -210,3 +211,8 @@ class TabularMerger(AnnotationMerger, TabularMatcher):
@attrs
class RotatedBboxMerger(_ShapeMerger, RotatedBboxMatcher):
    pass


@attrs
class Cuboid2DMerger(_ShapeMerger, Cuboid2DMatcher):
    pass
10 changes: 10 additions & 0 deletions src/datumaro/components/errors.py
@@ -342,6 +342,16 @@ def __str__(self):
        return f"Item {self.item_id} is repeated in the source sequence."


@define(auto_exc=False)
class PathSeparatorInSubsetNameError(DatasetError):
    subset: str = field()

    def __str__(self):
        return (
            f"Failed to export the subset '{self.subset}': subset name contains path separator(s)."
        )


class DatasetQualityError(DatasetError):
    pass

3 changes: 3 additions & 0 deletions src/datumaro/components/merge/intersect_merge.py
@@ -19,6 +19,7 @@
    AnnotationMerger,
    BboxMerger,
    CaptionsMerger,
    Cuboid2DMerger,
    Cuboid3dMerger,
    EllipseMerger,
    FeatureVectorMerger,
@@ -455,6 +456,8 @@ def _for_type(t, **kwargs):
                return _make(TabularMerger, **kwargs)
            elif t is AnnotationType.rotated_bbox:
                return _make(RotatedBboxMerger, **kwargs)
            elif t is AnnotationType.cuboid_2d:
                return _make(Cuboid2DMerger, **kwargs)
            else:
                raise NotImplementedError("Type %s is not supported" % t)

34 changes: 34 additions & 0 deletions src/datumaro/components/visualizer.py
@@ -19,6 +19,7 @@
    AnnotationType,
    Bbox,
    Caption,
    Cuboid2D,
    Cuboid3d,
    DepthAnnotation,
    Ellipse,
@@ -661,6 +662,39 @@ def _draw_cuboid_3d(
    ) -> None:
        raise NotImplementedError(f"{ann.type} is not implemented yet.")

    def _draw_cuboid_2d(
        self,
        ann: Cuboid2D,
        label_categories: Optional[LabelCategories],
        fig: Figure,
        ax: Axes,
        context: List,
    ) -> None:
        import matplotlib.patches as patches

        points = ann.points
        color = self._get_color(ann)
        label_text = label_categories[ann.label].name if label_categories is not None else ann.label

        # Define the faces based on vertex indices

        faces = [
            [points[i] for i in [0, 1, 2, 3]],  # Bottom face
            [points[i] for i in [4, 5, 6, 7]],  # Top face
            [points[i] for i in [0, 1, 5, 4]],  # Front face
            [points[i] for i in [1, 2, 6, 5]],  # Right face
            [points[i] for i in [2, 3, 7, 6]],  # Back face
            [points[i] for i in [3, 0, 4, 7]],  # Left face
        ]
        ax.text(points[0][0], points[0][1] - self.text_y_offset, label_text, color=color)

        # Draw each face
        for face in faces:
            polygon = patches.Polygon(
                face, fill=False, linewidth=self.bbox_linewidth, edgecolor=color
            )
            ax.add_patch(polygon)
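
To make the vertex ordering concrete, the sketch below builds the same six faces from eight 2D points using plain tuples, without matplotlib or Datumaro. The helper name `cuboid_faces` and the sample coordinates are made up for illustration; only the index lists are taken from `_draw_cuboid_2d` above.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Vertex indices of each face, copied from the drawing code.
FACE_INDICES = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [0, 1, 5, 4],
    [1, 2, 6, 5],
    [2, 3, 7, 6],
    [3, 0, 4, 7],
]


def cuboid_faces(points: Sequence[Point]) -> List[List[Point]]:
    """Group the eight projected corner points into six quadrilateral faces."""
    if len(points) != 8:
        raise ValueError("A 2D cuboid needs exactly 8 points")
    return [[points[i] for i in face] for face in FACE_INDICES]


# Hypothetical projected corners: points 1-4 form one quad, points 5-8 the other.
points = [(0, 10), (3, 8), (13, 8), (10, 10), (0, 0), (3, -2), (13, -2), (10, 0)]
faces = cuboid_faces(points)
print(len(faces), faces[2])
```

Each corner index appears in exactly three of the six lists, so drawing every face as an unfilled polygon traces each of the twelve cuboid edges twice.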

    def _draw_super_resolution_annotation(
        self,
        ann: SuperResolutionAnnotation,
13 changes: 13 additions & 0 deletions src/datumaro/plugins/data_formats/datumaro/base.py
@@ -11,6 +11,7 @@
    AnnotationType,
    Bbox,
    Caption,
    Cuboid2D,
    Cuboid3d,
    Ellipse,
    GroupType,
@@ -378,6 +379,18 @@ def _load_annotations(self, item: Dict):

                elif ann_type == AnnotationType.hash_key:
                    continue
                elif ann_type == AnnotationType.cuboid_2d:
                    loaded.append(
                        Cuboid2D(
                            list(map(tuple, points)),
                            label=label_id,
                            id=ann_id,
                            attributes=attributes,
                            group=group,
                            object_id=object_id,
                            z_order=z_order,
                        )
                    )
                else:
                    raise NotImplementedError()
            except Exception as e: