Description
🚀 Feature
Improve error handling for empty directories in make_dataset().
Motivation
datasets.folder.make_dataset() requires the class_to_idx parameter, which is then used to collect the instances:
vision/torchvision/datasets/folder.py, lines 69 to 74 in 945f3a8
Currently, make_dataset() is used in four places, and in all of them class_to_idx is generated the same way:
- vision/torchvision/datasets/folder.py, line 126 in 945f3a8, with def _find_classes(dir): at vision/torchvision/datasets/folder.py, lines 164 to 167 in 945f3a8
- vision/torchvision/datasets/hmdb51.py, lines 65 to 71 in 945f3a8
- vision/torchvision/datasets/kinetics.py, lines 58 to 60 in 945f3a8
- vision/torchvision/datasets/ucf101.py, lines 58 to 60 in 945f3a8
Furthermore, only DatasetFolder has a built-in check for whether make_dataset() found any samples:
vision/torchvision/datasets/folder.py, lines 127 to 132 in 945f3a8
While this is better than passing silently and failing somewhere else (#2903), it still misses the underlying issue in the case of a directory without subfolders: no classes were found in the first place.
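To make the failure mode concrete, here is a minimal, stdlib-only sketch. It is not the torchvision code: discover_classes and collect_samples are hypothetical stand-ins that only mimic the flow described above (subfolders become classes, then files are collected per class).

```python
import os
import tempfile


def discover_classes(root):
    # Hypothetical stand-in: every subfolder of root is treated as a class.
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return classes, {name: idx for idx, name in enumerate(classes)}


def collect_samples(root, class_to_idx):
    # Hypothetical stand-in: gather (path, class index) pairs per class folder.
    samples = []
    for name, idx in sorted(class_to_idx.items()):
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            samples.append((os.path.join(class_dir, fname), idx))
    return samples


# An empty root directory: no subfolders, hence no classes.
with tempfile.TemporaryDirectory() as root:
    classes, class_to_idx = discover_classes(root)
    samples = collect_samples(root, class_to_idx)

# Both are empty, so a later "no samples found" check fires even though the
# actual problem is that there were no class folders to begin with.
print(classes, samples)
```

In this sketch the empty class mapping passes through silently, and only the empty sample list can be detected afterwards.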
Pitch
I propose three things:
- Factor out the implementation of the DatasetFolder._find_classes() method into a find_classes() function, similar to what we did with make_dataset in "'make_dataset' as staticmethod of 'DatasetFolder'" (#3215).
- Raise an expressive error in find_classes() if no classes were found.
- Make the class_to_idx parameter optional in make_dataset and call find_classes if it is omitted.
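A rough sketch of what the first two points could look like, using only the standard library. The exception type and the message wording are assumptions made for illustration, not a decided design.

```python
import os
import tempfile


def find_classes(directory):
    """Return (classes, class_to_idx) for the subfolders of directory."""
    classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
    if not classes:
        # Fail here, where the cause is known. FileNotFoundError and the
        # message text are assumptions of this sketch.
        raise FileNotFoundError(f"Couldn't find any class folder in {directory}.")
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx


with tempfile.TemporaryDirectory() as root:
    # Empty root: the error names the real problem.
    try:
        find_classes(root)
    except FileNotFoundError as error:
        message = str(error)

    # With class folders present, the mapping is built as before.
    os.mkdir(os.path.join(root, "cat"))
    os.mkdir(os.path.join(root, "dog"))
    classes, class_to_idx = find_classes(root)
```

Raising inside find_classes() means every caller gets the same message without repeating the check.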
With this we are as flexible as before while we remove duplicated code.
- If one does not want the default behavior, class_to_idx can still be passed explicitly.
- If one needs the returned classes, e.g. the video datasets, a call could look like this:
    self.classes, class_to_idx = find_classes(root)
    self.samples = make_dataset(root, class_to_idx, ...)
- If one only needs the samples, calling self.samples = make_dataset(root, ...) is enough.
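The three calling patterns above can be sketched as follows. This is a simplified illustration, not the torchvision implementation: the extensions default, the flat per-class file listing, and the find_classes() body are assumptions, and the real make_dataset() takes further parameters.

```python
import os
import tempfile


def find_classes(directory):
    # Simplified stand-in for the proposed function.
    classes = sorted(e.name for e in os.scandir(directory) if e.is_dir())
    if not classes:
        raise FileNotFoundError(f"Couldn't find any class folder in {directory}.")
    return classes, {name: idx for idx, name in enumerate(classes)}


def make_dataset(directory, class_to_idx=None, extensions=(".jpg",)):
    # Assumed signature: class_to_idx defaults to None and is derived on demand.
    if class_to_idx is None:
        _, class_to_idx = find_classes(directory)
    samples = []
    for name in sorted(class_to_idx):
        class_dir = os.path.join(directory, name)
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(extensions):
                samples.append((os.path.join(class_dir, fname), class_to_idx[name]))
    return samples


with tempfile.TemporaryDirectory() as root:
    for name in ("cat", "dog"):
        os.mkdir(os.path.join(root, name))
        open(os.path.join(root, name, "0.jpg"), "w").close()

    # Only the samples are needed: class_to_idx is derived internally.
    implicit = make_dataset(root)

    # The classes are needed as well: call find_classes() first.
    classes, class_to_idx = find_classes(root)
    explicit = make_dataset(root, class_to_idx)

    # Non-default behavior: pass a custom mapping explicitly.
    only_dogs = make_dataset(root, {"dog": 0})
```

Under these assumptions, the implicit and explicit calls return the same samples, while the custom mapping restricts collection to the listed class.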
cc @pmeier