Add intermediate skill docs for filter (#996)
- Ticket no.107285
- Update intermediate skill documentation for filter
  - Add python, CLI, ProjectCLI examples
sooahleex authored May 17, 2023
1 parent 8fe4cf0 commit 332879d
Showing 8 changed files with 115 additions and 19 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -11,6 +11,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
(<https://github.com/openvinotoolkit/datumaro/pull/976>)
- Add SynthiaSfImporter and SynthiaAlImporter
(<https://github.com/openvinotoolkit/datumaro/pull/987>)
- Add intermediate skill docs for filter
(<https://github.com/openvinotoolkit/datumaro/pull/996>)

### Enhancements
- Use autosummary for fully-automatic Python module docs generation
@@ -1,5 +1,5 @@
============================
Level 11: Project Versioning
Level 12: Project Versioning
============================

Project versioning is a concept unique to Datumaro. Datumaro project includes a data source and revision tree,
@@ -1,5 +1,5 @@
=================================
Level 12: Pseudo Label Generation
Level 13: Pseudo Label Generation
=================================

TBD
12 changes: 6 additions & 6 deletions docs/source/docs/level-up/advanced_skills/index.rst
@@ -5,30 +5,30 @@ Advanced Skills
:maxdepth: 1
:hidden:

11_project_versioning
12_pseudo_label_generation
12_project_versioning
13_pseudo_label_generation

.. grid:: 1 2 2 2
:gutter: 2

.. grid-item-card::

.. button-ref:: 11_project_versioning
.. button-ref:: 12_project_versioning
:color: primary
:outline:
:expand:

Level 11: Project Versioning
Level 12: Project Versioning

:bdg-success:`ProjectCLI`

.. grid-item-card::

.. button-ref:: 12_pseudo_label_generation
.. button-ref:: 13_pseudo_label_generation
:color: primary
:outline:
:expand:

Level 12: Pseudo Label Generation
Level 13: Pseudo Label Generation

:bdg-success:`ProjectCLI`
@@ -0,0 +1,65 @@
===========================
Level 9: Dataset Filtering
===========================

As more public data becomes available, the need for data filtering grows. Raw data often contains irrelevant or
unnecessary information, which makes it harder to extract insights or to use the data effectively for decision-making.
Data filtering is the process of selecting the relevant data points and excluding the irrelevant ones in order to
improve the quality and usability of the data. As the volume and complexity of data continue to grow, filtering becomes
an increasingly important part of data management and analysis. By filtering a dataset, we can create a subset that is
tailored to our specific needs.

In this tutorial, we provide a simple example of filtering a dataset by its items and annotations. To select the items
or annotations that satisfy a given condition, we write the filter as an XPath expression, which is evaluated against an
XML representation of each dataset item. Refer to this `XPATH <https://devhints.io/xpath>`_ cheat sheet to write your own filter.
A detailed description of the filter operation is given in :doc:`Filter <../../command-reference/context_free/filter>`.
A more advanced Python example is given in :doc:`this notebook <../../jupyter_notebook_examples/notebooks/04_filter>`.
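
To make the idea of an XPath filter more concrete, here is a minimal sketch that uses only the Python standard library. The XML layout, the item ids, and the helper ``select_ids`` are hypothetical and simplified for illustration; they are not Datumaro's exact internal encoding. Note that ``xml.etree.ElementTree`` supports only a subset of XPath, so the queries are written relative to a wrapping root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML view of three dataset items
# (an assumption for illustration, not Datumaro's exact encoding).
XML_VIEW = """
<dataset>
  <item><id>img_001</id><subset>train</subset>
    <annotation><type>bbox</type><label>cat</label><occluded>True</occluded></annotation>
  </item>
  <item><id>img_002</id><subset>test</subset>
    <annotation><type>bbox</type><label>dog</label><occluded>False</occluded></annotation>
  </item>
  <item><id>img_003</id><subset>test</subset>
    <annotation><type>bbox</type><label>cat</label><occluded>True</occluded></annotation>
  </item>
</dataset>
"""

root = ET.fromstring(XML_VIEW)


def select_ids(xpath):
    """Return the ids of the <item> elements matched by an ElementTree XPath query."""
    return [item.findtext("id") for item in root.findall(xpath)]


# Counterpart of '/item[subset="test"]':
# items whose <subset> child has the text "test".
test_ids = select_ids("./item[subset='test']")

# Counterpart of '/item/annotation[occluded="True"]':
# match the annotations, then step back up to the items that own them.
occluded_ids = select_ids("./item/annotation[occluded='True']/..")

print(test_ids)
print(occluded_ids)
```

The predicate in square brackets compares the text of a child element, which is why the value is written as a quoted string even for a flag such as ``occluded``.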

.. tab-set::

.. tab-item:: ProjectCLI

With the project-based CLI, we first create a project and import the dataset into it:

.. code-block:: bash

   datum project create --output-dir <path/to/project>
   datum project import --format datumaro --project <path/to/project> <path/to/data>

We then filter the dataset with:

.. code-block:: bash

   datum filter -e <how/to/filter/dataset> --project <path/to/project>

We can set ``<how/to/filter/dataset>`` to our own filter, such as ``'/item/annotation[label="cat" and area > 85]'``.
This example keeps only the items that have a bbox annotation with the `cat` label and a bbox area (`w * h`) greater than 85.
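
The condition in that filter can be read as an ordinary predicate over annotations. The sketch below restates it in plain Python over hypothetical records; the dictionaries and the helpers ``bbox_area`` and ``matches`` are assumptions made for illustration and are not part of the Datumaro API.

```python
# Hypothetical records: each item carries a list of bbox annotations (x, y, w, h).
items = [
    {"id": "img_001", "annotations": [{"label": "cat", "bbox": (10, 10, 20, 5)}]},
    {"id": "img_002", "annotations": [{"label": "cat", "bbox": (0, 0, 8, 10)}]},
    {"id": "img_003", "annotations": [{"label": "dog", "bbox": (5, 5, 30, 30)}]},
]


def bbox_area(annotation):
    """Area of a bbox annotation, computed as w * h."""
    _, _, w, h = annotation["bbox"]
    return w * h


def matches(annotation, label="cat", min_area=85):
    """True if the annotation has the given label and an area strictly above min_area."""
    return annotation["label"] == label and bbox_area(annotation) > min_area


# An item is kept when at least one of its annotations satisfies the predicate.
kept_ids = [item["id"] for item in items if any(matches(a) for a in item["annotations"])]
print(kept_ids)
```

Here only the first item is kept: the second has a `cat` bbox that is too small (area 80), and the third is large enough but carries a different label.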

.. tab-item:: CLI

Without a project, we can filter a dataset directly with:

.. code-block:: bash

   datum filter <target> -e <how/to/filter/dataset> --output-dir <path/to/output>

We can use ``--overwrite`` instead of setting ``--output-dir``.
We can also set ``<how/to/filter/dataset>`` to our own filter, such as ``'/item[subset="test"]'``,
to keep only the items whose `subset` is `test`.

.. tab-item:: Python

With the Python API, we can filter items as below:

.. code-block:: python

   from datumaro.components.dataset import Dataset

   # Import a dataset stored in the Datumaro format
   dataset_path = '/path/to/data'
   dataset = Dataset.import_from(dataset_path, 'datumaro')

   # Apply the XPath filter expression to the dataset
   filtered_result = dataset.filter('<how/to/filter/dataset>')

We can set ``<how/to/filter/dataset>`` to our own filter, such as ``'/item/annotation[occluded="True"]'``.
This example keeps only the items that have an annotation whose `occluded` attribute is `True`.
@@ -1,5 +1,5 @@
=====================================================
Level 9: Dataset Exploration from a Query Image/Text
Level 10: Dataset Exploration from a Query Image/Text
=====================================================


@@ -34,13 +34,27 @@ The Python example for the usage of explorer is described in :doc:`here <../../j
topk = 20
topk_result = explorer.explore_topk(query, topk)
.. tab-item:: CLI

Without a project, we can simply ``explore`` a dataset with

.. code-block:: bash

   datum explore <target> --query QUERY -topk TOPK_NUM

``QUERY`` can be an image file path, a text description, or a list of both.

``TOPK_NUM`` is an integer that sets the number of similar results to find for the query.

The exploration result is printed to the log, and the result files are copied into the ``explore_result`` folder.

.. tab-item:: ProjectCLI

With the project-based CLI, we first need to ``create`` a project with

.. code-block:: bash
datum project create -o <path/to/project>
datum project create --output-dir <path/to/project>
We now ``import`` the data into the project through

@@ -52,10 +66,10 @@ The Python example for the usage of explorer is described in :doc:`here <../../j

.. code-block:: bash
datum explore -q QUERY -topk TOPK_NUM -p <path/to/project>
datum explore --query QUERY -topk TOPK_NUM -p <path/to/project>
``QUERY`` can be an image file path, a text description, or a list of both.

``TOPK_NUM`` is an integer that sets the number of similar results to find for the query.

The exploration result is printed to the log, and the visualized result is saved as ``explorer.png``.
The exploration result is printed to the log, and the result files are copied into the ``explore_result`` folder.
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
===========================
Level 10: Data Generation
Level 11: Data Generation
===========================


27 changes: 21 additions & 6 deletions docs/source/docs/level-up/intermediate_skills/index.rst
@@ -10,8 +10,9 @@ Intermediate Skills
06_data_comparison
07_data_merge
08_data_validate
09_data_exploration
10_data_generation
09_data_filtering
10_data_exploration
11_data_generation

.. grid:: 1 2 2 2
:gutter: 2
@@ -65,24 +66,38 @@ Intermediate Skills

.. grid-item-card::

.. button-ref:: 09_data_exploration
.. button-ref:: 09_data_filtering
:color: primary
:outline:
:expand:

Level 09: Data Exploration
Level 09: Data Filtering

:bdg-success:`ProjectCLI`
:bdg-info:`CLI`
:bdg-warning:`Python`

.. grid-item-card::

.. button-ref:: 10_data_exploration
:color: primary
:outline:
:expand:

Level 10: Data Exploration

:bdg-warning:`Python`
:bdg-info:`CLI`
:bdg-success:`ProjectCLI`

.. grid-item-card::

.. button-ref:: 10_data_generation
.. button-ref:: 11_data_generation
:color: primary
:outline:
:expand:

Level 10: Data Generation
Level 11: Data Generation

:bdg-info:`CLI`
:bdg-warning:`Python`
