Add intermediate skill docs for filter #996
79170b1 to b44bea6
Codecov Report
Patch and project coverage have no change.
Additional details and impacted files
@@ Coverage Diff @@
## develop #996 +/- ##
========================================
Coverage 78.53% 78.53%
========================================
Files 233 233
Lines 26749 26749
Branches 5320 5320
========================================
+ Hits 21007 21008 +1
Misses 4497 4497
+ Partials 1245 1244 -1
docs/source/docs/level-up/intermediate_skills/09_data_filtering.rst
datum filter -e <how/to/filter/dataset> --project <path/to/project>

We can set ``<how/to/filter/dataset>`` as your own filter like ``'/item/annotation[label="cat" and area > 85]'``.
This example commands will filter only items through the bbox annotations which have `cat` label and bbox area (`w * h`) more than 85.
- This example commands will filter only items through the bbox annotations which have `cat` label and bbox area (`w * h`) more than 85.
+ This example command will filter only items through the bbox annotations which have `cat` label and bbox area (`w * h`) more than 85.
To my knowledge, '/item/annotation[...]' removes annotations, not items (actually, '/item[annotation...]' removes items), but your sentence reads as if the items themselves are removed.
datumaro/datumaro/cli/commands/filter.py
Lines 73 to 82 in 8fe4cf0
- Filter images with large-area bboxes:|n
|s|s%(prog)s -e '/item[annotation/type="bbox" and
annotation/area>2000]'|n
|n
- Filter out all irrelevant annotations from items:|n
|s|s%(prog)s -m a -e '/item/annotation[label = "person"]'|n
|n
- Filter out all irrelevant annotations from items:|n
|s|s%(prog)s -m a -e '/item/annotation[label="cat" and
area > 99.5]'|n
Did you verify the actual behavior of Datumaro for this command? The scope of the skill-up page task also includes this verification.
I checked this command myself. Before running it, the dataset had 5000 items; afterwards, the result had only 184 items. As far as I know, the filter works per item.
I looked into it closely and found that --mode is an important argument here, because it determines whether items or annotations are filtered. Therefore, an explanation of --mode should be added to this section.
datumaro/datumaro/cli/commands/filter.py
Lines 97 to 104 in 8fe4cf0
parser.add_argument(
    "-m",
    "--mode",
    default=FilterModes.i.name,
    type=FilterModes.parse,
    help="Filter mode (options: %s; default: %s)"
    % (", ".join(FilterModes.list_options()), "%(default)s"),
)
datumaro/datumaro/cli/util/project.py
Lines 193 to 204 in 8fe4cf0
def make_filter_args(cls, mode):
    if mode == cls.items:
        return {}
    elif mode == cls.annotations:
        return {"filter_annotations": True}
    elif mode == cls.items_annotations:
        return {
            "filter_annotations": True,
            "remove_empty": True,
        }
    else:
        raise NotImplementedError()
datumaro/datumaro/components/dataset.py
Lines 897 to 900 in 8fe4cf0
if filter_annotations:
    return self.transform(XPathAnnotationsFilter, xpath=expr, remove_empty=remove_empty)
else:
    return self.transform(XPathDatasetFilter, xpath=expr)
datumaro/datumaro/components/filter.py
Lines 289 to 291 in 8fe4cf0
if self._remove_empty and len(annotations) == 0:
    return None
return self.wrap_item(item, annotations=annotations)
What did you pass for --mode? It seems that you used -m i, so the items disappeared. Then, what is the difference between '/item/annotation[...]' and '/item[annotation...]'?
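To make the mode handling quoted above easier to reason about, here is a rough toy model. It is only a sketch and not Datumaro's implementation: items are plain dicts, the `matches` function is a hypothetical stand-in for the XPath expression, and the mode strings simply mirror the `items`, `annotations`, and `items_annotations` names from `make_filter_args`.

```python
# Toy model of the three filter modes. This is NOT Datumaro's implementation:
# items are plain dicts and `matches` stands in for the XPath expression.

def matches(ann):
    # Stand-in for '/item/annotation[label="cat" and area > 85]'.
    return ann["label"] == "cat" and ann["area"] > 85


def toy_filter(items, mode):
    """Filter `items` under a hypothetical mode name.

    "items": keep or drop whole items, annotations untouched.
    "annotations": keep every item, drop non-matching annotations.
    "items_annotations": drop non-matching annotations, then drop empty items.
    """
    result = []
    for item in items:
        kept = [a for a in item["annotations"] if matches(a)]
        if mode == "items":
            if kept:
                result.append(item)
        elif mode == "annotations":
            result.append({**item, "annotations": kept})
        elif mode == "items_annotations":
            if kept:
                result.append({**item, "annotations": kept})
        else:
            raise NotImplementedError(mode)
    return result


dataset = [
    {"id": "a", "annotations": [{"label": "cat", "area": 100},
                                {"label": "dog", "area": 300}]},
    {"id": "b", "annotations": [{"label": "cat", "area": 50}]},
    {"id": "c", "annotations": []},
]

for mode in ("items", "annotations", "items_annotations"):
    out = toy_filter(dataset, mode)
    print(mode, [(it["id"], len(it["annotations"])) for it in out])
```

On this toy data, the "items" mode shrinks the dataset and leaves the surviving item's annotations intact, while the "annotations" mode keeps the dataset length and only trims annotations. That would be consistent with the drop from 5000 to 184 items reported above if the default item mode was in effect.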
When I applied the command datum filter source-1 -e '/item/annotation[label="cat" and area > 85]', dinfo for source-1 is
length: 184
categories: label
label:
count: 80
labels: person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light (and 70 more)
subsets: val2017
'val2017':
length: 184
categories: label
label:
count: 80
labels: person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light (and 70 more)
And for the command datum filter source-4 -m a -e '/item/annotation[label="cat" and area > 85]', dinfo for source-4 is
length: 5000
categories: label
label:
count: 80
labels: person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light (and 70 more)
subsets: val2017
'val2017':
length: 5000
categories: label
label:
count: 80
labels: person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light (and 70 more)
I imported source-1 and source-4 from the same dataset, COCO val2017. If I use '/annotation[label="cat" and area > 85]' as the filter, it does not work.
The detailed explanation of this command is already in filter.md, and I linked that page from the intermediate skill page for filter. In the intermediate skill page we only give users a simple example; users who want more detail can go to filter.md, check it, and set the mode and filter accordingly.
LGTM.