Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix validator and add notebooks and document for level-up validator (#…
…933)

<!-- Contributing guide: https://github.com/openvinotoolkit/datumaro/blob/develop/CONTRIBUTING.md -->

### Summary

<!--
Resolves #111 and #222.
Depends on #1000 (for series of dependent commits).

This PR introduces this capability to make the project better in this and that.

- Added this feature
- Removed that feature
- Fixed the problem #1234
-->

### How to test

<!-- Describe the testing procedure for reviewers, if changes are not fully covered by unit tests or manual testing can be complicated. -->

### Checklist

<!-- Put an 'x' in all the boxes that apply -->

- [ ] I have added unit tests to cover my changes.
- [ ] I have added integration tests to cover my changes.
- [ ] I have added the description of my changes into [CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).
- [ ] I have updated the [documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs) accordingly

### License

- [ ] I submit _my code changes_ under the same [MIT License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE) that covers the project.
  Feel free to contact the maintainers if that's a concern.
- [ ] I have updated the license header for each file (see an example below).

```python
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT
```

---------

Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
Co-authored-by: Vinnam Kim <vinnam.kim@intel.com>
Showing 9 changed files with 952 additions and 457 deletions.
6 changes: 3 additions & 3 deletions
docs/source/docs/level-up/basic_skills/04_detect_data_format.rst
28 changes: 0 additions & 28 deletions
docs/source/docs/level-up/intermediate_skills/08_data_refinement.md
This file was deleted.
73 changes: 73 additions & 0 deletions
docs/source/docs/level-up/intermediate_skills/08_data_validate.rst
@@ -0,0 +1,73 @@
===========================
Level 8: Dataset Validation
===========================

When creating a dataset, imbalances between categories arise naturally, and the minority class
may have very few data points. In addition, annotations may be inconsistent across annotators or
drift over time. Training a model on such data requires extra care, and it may be necessary to
filter or correct the data in advance. Datumaro provides data validation functionality for this
purpose.
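
To make the two problems named above concrete, the following is a minimal sketch in plain Python that counts samples per label and flags labels with few samples and an overall imbalance. It does not use Datumaro; the function name `label_imbalance`, the two threshold values, and the toy label list are assumptions made for illustration only, not Datumaro defaults.

```python
from collections import Counter


def label_imbalance(labels, few_samples_thr=2, imbalance_ratio_thr=5.0):
    """Summarize class balance for a flat list of label names.

    Both thresholds are illustrative assumptions, not Datumaro defaults.
    """
    counts = Counter(labels)
    if not counts:
        return {"counts": {}, "few_samples": [], "imbalanced": False}

    most = max(counts.values())
    least = min(counts.values())
    return {
        "counts": dict(counts),
        # Labels whose sample count is at or below the threshold.
        "few_samples": sorted(
            name for name, n in counts.items() if n <= few_samples_thr
        ),
        # Compare the most frequent label against the least frequent one.
        "imbalanced": most / least >= imbalance_ratio_thr,
    }


# Toy annotation list (hypothetical data).
labels = ["cat"] * 10 + ["dog"] * 8 + ["bird"] * 2
summary = label_imbalance(labels)
print(summary)
```

A real validator checks far more than this (for example, annotation attributes and bounding-box geometry), so the sketch only illustrates why such checks are useful before training.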

Validation errors and warnings are described in more detail :ref:`here <Validate>`.
A Python example of using the validator is available `here <https://github.com/openvinotoolkit/datumaro/blob/develop/notebooks/11_validate.ipynb>`_.

.. tab-set::

    .. tab-item:: Python

        .. code-block:: python

            from datumaro.components.dataset import Dataset
            from datumaro.components.environment import Environment
            from datumaro.plugins.validators import DetectionValidator

            data_path = '/path/to/data'

            # Detect the dataset format, then import the dataset using the first match
            env = Environment()
            detected_formats = env.detect_dataset(data_path)
            dataset = Dataset.import_from(data_path, detected_formats[0])

            # Or use ClassificationValidator or SegmentationValidator
            validator = DetectionValidator()
            reports = validator.validate(dataset)

    .. tab-item:: ProjectCLI

        With the project-based CLI, we first need to create a project:

        .. code-block:: bash

            datum project create -o <path/to/project>

        We now import the MS-COCO validation data into the project:

        .. code-block:: bash

            datum project import --format coco_instances -p <path/to/project> <path/to/coco>

        (Optional) When we import data, the change is automatically committed in the project.
        This can be shown through ``log``:

        .. code-block:: bash

            datum project log -p <path/to/project>

        (Optional) We can check the imported dataset information, such as subsets, number of data
        items, or categories, through ``dinfo``:

        .. code-block:: bash

            datum project dinfo -p <path/to/project>

        Finally, we validate the data within the project:

        .. code-block:: bash

            datum validate --task-type <classification/detection/segmentation> --subset <subset_name> -p <path/to/project>

        The validation report is saved as ``validation-report-<subset_name>.json``.
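
Once a report exists, a quick tally of anomalies by severity and type helps decide what to fix first. The sketch below assumes the report is a dictionary with a `validation_reports` list whose entries carry `anomaly_type` and `severity` keys; that layout is an assumption to check against the Validate reference, and the sample entries, the helper names `summarize_reports` and `load_report`, and the `val` subset name are hypothetical.

```python
import json
from collections import Counter


def summarize_reports(report):
    """Count anomalies per (severity, anomaly_type) pair.

    Assumes `report` has a "validation_reports" list of dicts; missing
    keys fall back to "unknown" so unexpected entries are still counted.
    """
    tally = Counter()
    for entry in report.get("validation_reports", []):
        key = (entry.get("severity", "unknown"), entry.get("anomaly_type", "unknown"))
        tally[key] += 1
    return tally


def load_report(path):
    """Load a JSON report written by the CLI, e.g. validation-report-val.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical sample that mirrors the assumed report layout.
sample_report = {
    "validation_reports": [
        {"anomaly_type": "ImbalancedLabels", "severity": "warning"},
        {"anomaly_type": "FewSamplesInLabel", "severity": "warning"},
        {"anomaly_type": "NegativeLength", "severity": "error"},
    ],
}

tally = summarize_reports(sample_report)
for (severity, anomaly_type), count in sorted(tally.items()):
    print(f"{severity:8s} {anomaly_type:20s} {count}")
```

The same function can be applied to the `reports` value returned by the Python API or to `load_report("validation-report-val.json")`, provided the actual structure matches the assumption above.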