Add intermediate skill docs for filter (#996)
- Ticket no.107285
- Update intermediate skill documentation for filter
- Add python, CLI, ProjectCLI examples
Showing 8 changed files with 115 additions and 19 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
2 changes: 1 addition & 1 deletion
...advanced_skills/11_project_versioning.rst → ...advanced_skills/12_project_versioning.rst
2 changes: 1 addition & 1 deletion
...ced_skills/12_pseudo_label_generation.rst → ...ced_skills/13_pseudo_label_generation.rst
@@ -1,5 +1,5 @@
=================================
-Level 12: Pseudo Label Generation
+Level 13: Pseudo Label Generation
=================================

TBD
65 changes: 65 additions & 0 deletions
docs/source/docs/level-up/intermediate_skills/09_data_filtering.rst
@@ -0,0 +1,65 @@
===========================
Level 9: Dataset Filtering
===========================

With the increasing availability of public data, the need for data filtering has become more apparent. Raw data often
contains irrelevant or unnecessary information, which makes it difficult to extract the desired insights or to use the data
effectively for decision-making. Data filtering is the process of selecting the relevant data points and removing the
irrelevant ones in order to improve the quality and usability of the data. As the volume and complexity of data continue
to grow, filtering becomes an increasingly important part of data management and analysis. By filtering a dataset, we
obtain a subset that is tailored to our specific needs.

In this tutorial, we provide a simple example of filtering a dataset by item and by annotation. To select the part of a
dataset that satisfies some condition, we write the condition as an XPath query over an XML representation of each item.
Refer to this `XPATH <https://devhints.io/xpath>`_ cheat sheet to write your own filter.
The detailed description of the filter operation is given in :doc:`Filter <../../command-reference/context_free/filter>`.
A more advanced Python example is given in :doc:`this notebook <../../jupyter_notebook_examples/notebooks/04_filter>`.
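
As a rough illustration of how such a query reads an item, the sketch below builds a small XML tree and evaluates an
item-level predicate with Python's standard ``xml.etree.ElementTree``. The ``<dataset>`` wrapper, the sample item ids,
and the ``select_item_ids`` helper are hypothetical and exist only for this example. The XML layout is an assumption and
may differ from the representation Datumaro actually uses.

.. code-block:: python

    import xml.etree.ElementTree as ET

    # Hypothetical XML view of two dataset items (the layout is an assumption).
    DOC = """
    <dataset>
      <item><id>img_001</id><subset>train</subset></item>
      <item><id>img_002</id><subset>test</subset></item>
    </dataset>
    """

    def select_item_ids(xml_text, xpath):
        """Return the ids of the items matched by an ElementTree XPath query."""
        root = ET.fromstring(xml_text)
        return [item.findtext("id") for item in root.findall(xpath)]

    # ElementTree only accepts relative paths, hence the leading "./".
    print(select_item_ids(DOC, './item[subset="test"]'))

ElementTree supports only a subset of XPath, so numeric comparisons such as ``area > 85`` cannot be evaluated this way.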

.. tab-set::

    .. tab-item:: ProjectCLI

        With the project-based CLI, we first create a project and import a dataset into it:

        .. code-block:: bash

            # Create an empty project, then import a dataset in the Datumaro format
            datum project create --output-dir <path/to/project>
            datum project import --format datumaro --project <path/to/project> <path/to/data>

        We then filter the dataset with:

        .. code-block:: bash

            datum filter -e <how/to/filter/dataset> --project <path/to/project>

        Replace ``<how/to/filter/dataset>`` with your own filter, for example ``'/item/annotation[label="cat" and area > 85]'``.
        This example keeps only the items that have a bbox annotation with the ``cat`` label and a bbox area (``w * h``) greater than 85.
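
        To make the meaning of this predicate concrete, here is a plain-Python sketch of the same test, independent of
        Datumaro. The dictionary layout and the ``keeps_item`` helper are hypothetical. Only the condition itself
        (label ``cat``, ``w * h`` greater than 85) comes from the filter above.

        .. code-block:: python

            def bbox_area(bbox):
                """Area of a bounding box, computed as width * height."""
                return bbox["w"] * bbox["h"]

            def keeps_item(annotations, label="cat", min_area=85):
                """True if at least one bbox has the label and an area above min_area."""
                return any(
                    ann["label"] == label and bbox_area(ann) > min_area
                    for ann in annotations
                )

            # Hypothetical items: each one is a list of bbox annotations.
            small_cat = [{"label": "cat", "w": 5, "h": 10}]    # area 50
            large_cat = [{"label": "cat", "w": 10, "h": 10}]   # area 100
            large_dog = [{"label": "dog", "w": 20, "h": 20}]   # area 400

            print(keeps_item(small_cat), keeps_item(large_cat), keeps_item(large_dog))

        Both conditions must hold for the same annotation, so an item with a small cat and a large dog is still rejected.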

    .. tab-item:: CLI

        Without a project, we can filter a dataset directly:

        .. code-block:: bash

            datum filter <target> -e <how/to/filter/dataset> --output-dir <path/to/output>

        We can pass ``--overwrite`` instead of setting ``--output-dir``.
        Replace ``<how/to/filter/dataset>`` with your own filter, for example ``'/item[subset="test"]'``,
        which keeps only the items whose ``subset`` is ``test``.
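
        When the command is assembled from a script, the filter expression has to be quoted so that the shell does not
        interpret its brackets and double quotes. The sketch below uses the standard ``shlex`` module for this. The
        ``build_filter_command`` helper and the sample paths are hypothetical, and only the flags shown above are used.

        .. code-block:: python

            import shlex

            def build_filter_command(target, expression, output_dir):
                """Assemble the argument list for the filter command shown above."""
                return ["datum", "filter", target, "-e", expression, "--output-dir", output_dir]

            args = build_filter_command("path/to/data", '/item[subset="test"]', "path/to/output")

            # shlex.join quotes each argument so the line can be pasted into a POSIX shell.
            print(shlex.join(args))

        Passing ``args`` as a list to ``subprocess.run`` avoids the quoting problem entirely, because no shell is involved.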

    .. tab-item:: Python

        With the Python API, we can filter items as follows:

        .. code-block:: python

            from datumaro.components.dataset import Dataset

            dataset_path = '/path/to/data'
            dataset = Dataset.import_from(dataset_path, 'datumaro')

            # Keep only the items that match the filter expression
            filtered_result = dataset.filter('<how/to/filter/dataset>')

        Replace ``<how/to/filter/dataset>`` with your own filter, for example ``'/item/annotation[occluded="True"]'``.
        This example keeps only the items that have an annotation whose ``occluded`` attribute is ``True``.
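
        The value is written as the string ``"True"`` because an XPath equality test against an element compares text.
        The sketch below mimics this with the standard library, under the assumption that each annotation attribute is
        exposed to the query as a child element holding ``str(value)``. The ``encode_annotation`` helper and the sample
        data are hypothetical and are not part of the Datumaro API.

        .. code-block:: python

            import xml.etree.ElementTree as ET

            def encode_annotation(parent, label, attributes):
                """Append an <annotation> element whose attributes become child elements."""
                ann = ET.SubElement(parent, "annotation")
                ET.SubElement(ann, "label").text = label
                for name, value in attributes.items():
                    # Assumption: values are compared as text, so True becomes "True".
                    ET.SubElement(ann, name).text = str(value)
                return ann

            item = ET.Element("item")
            encode_annotation(item, "cat", {"occluded": True})
            encode_annotation(item, "dog", {"occluded": False})

            # ElementTree needs a relative path, so the query starts at the item element.
            matched = item.findall('./annotation[occluded="True"]')
            print([ann.findtext("label") for ann in matched])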
2 changes: 1 addition & 1 deletion
...ntermediate_skills/10_data_generation.rst → ...ntermediate_skills/11_data_generation.rst
@@ -1,5 +1,5 @@
===========================
-Level 10: Data Generation
+Level 11: Data Generation
===========================