Verification of string arguments in datasets? #1132

pmeier · 2019-07-18T09:47:35Z

In the spirit of standardizing the datasets (#1080), I think it would be a good idea to have something like a verify_str_arg() function that checks whether a given argument is a str (or a valid subclass) and whether it has a valid value. Right now we have a multitude of slightly different variants:

vision/torchvision/datasets/celeba.py

Lines 76 to 86 in bbd363c

    
    if split.lower() == "train":
        split = 0
    elif split.lower() == "valid":
        split = 1
    elif split.lower() == "test":
        split = 2
    elif split.lower() == "all":
        split = None
    else:
        raise ValueError('Wrong split entered! Please use "train", '
                         '"valid", "test", or "all"')

vision/torchvision/datasets/sbd.py

Lines 68 to 69 in bbd363c

    
    if mode not in ("segmentation", "boundaries"):
        raise ValueError("Argument mode should be 'segmentation' or 'boundaries'")

vision/torchvision/datasets/voc.py

Lines 103 to 106 in bbd363c

    
    if not os.path.exists(split_f):
        raise ValueError(
            'Wrong image_set entered! Please use image_set="train" '
            'or image_set="trainval" or image_set="val"')
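To make the differences concrete, here is a minimal sketch that isolates the first two checks as standalone functions. The names check_celeba_split, check_sbd_mode and error_type are hypothetical and exist only for this illustration; the bodies mirror the snippets above.

```python
def check_celeba_split(split):
    # Mirrors the celeba.py snippet: case-insensitive, maps names to codes.
    name = split.lower()
    if name == "train":
        return 0
    elif name == "valid":
        return 1
    elif name == "test":
        return 2
    elif name == "all":
        return None
    raise ValueError('Wrong split entered! Please use "train", '
                     '"valid", "test", or "all"')


def check_sbd_mode(mode):
    # Mirrors the sbd.py snippet: case-sensitive membership test.
    if mode not in ("segmentation", "boundaries"):
        raise ValueError("Argument mode should be 'segmentation' or 'boundaries'")
    return mode


def error_type(func, arg):
    # Hypothetical helper: name of the exception func(arg) raises, or None.
    try:
        func(arg)
    except Exception as exc:
        return type(exc).__name__
    return None


print(error_type(check_celeba_split, "TRAIN"))    # None: upper case is accepted
print(error_type(check_sbd_mode, "Segmentation")) # ValueError: case matters here
print(error_type(check_celeba_split, 1))          # AttributeError: int has no .lower()
print(error_type(check_sbd_mode, 1))              # ValueError
```

So the same kind of user mistake surfaces as a different exception type and a different message depending on the dataset.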

Is this something we want to do?

fmassa · 2019-07-23T13:55:00Z

I think the split names depend on the dataset. For example, train / val / valminusminival for some version of COCO.

Some consistency might be good, but not at the expense of generality.

pmeier · 2019-07-23T15:28:29Z

My mistake for choosing these examples with varying argument names, but that is not what I meant. I don't want to generalize the values of the split or similar arguments. What I propose is a function (probably in datasets.utils) that handles the verification:

def verify_str_arg(value, valid_values, arg):
    if not isinstance(value, str):
        raise ...
    if value not in valid_values:
        raise ...
    return value

With that, all error messages would be the same across datasets. For the examples above, this would result in:

split = verify_str_arg(split, ("train", "valid", "test", "all"), "split")

mode = verify_str_arg(mode, ("segmentation", "boundaries"), "mode")

image_set = verify_str_arg(image_set, ("train", "trainval", "val"), "image_set")
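For reference, a runnable sketch of the proposed helper could look like the following. The exception types (TypeError, ValueError) and the message wording are assumptions made for illustration; the proposal above leaves both raise statements open, and this is not necessarily what any eventual implementation uses.

```python
def verify_str_arg(value, valid_values, arg):
    # Assumption: a non-str argument is reported as a TypeError.
    if not isinstance(value, str):
        raise TypeError(
            f"Expected type str for argument {arg}, "
            f"but got type {type(value).__name__}."
        )
    # Assumption: an unknown value is reported as a ValueError that
    # names the argument and lists the accepted values.
    if value not in valid_values:
        valid = "', '".join(valid_values)
        raise ValueError(
            f"Unknown value '{value}' for argument {arg}. "
            f"Valid values are '{valid}'."
        )
    return value


# Usage: a misspelled split now yields the same message shape everywhere.
try:
    verify_str_arg("validation", ("train", "valid", "test", "all"), "split")
except ValueError as exc:
    print(exc)
```

Note that this sketch is case-sensitive and returns the string unchanged, so a dataset such as CelebA would still map the verified string to its internal code afterwards.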

fmassa · 2019-07-25T09:32:26Z

Oh, that's definitely something that we would want to have! I completely misunderstood your point.

pmeier · 2019-07-26T10:54:07Z

Closed by #1167

fmassa added the "module: datasets" and "needs discussion" labels on Jul 23, 2019

fmassa added the "enhancement" and "help wanted" labels and removed the "needs discussion" label on Jul 25, 2019

pmeier mentioned this issue Jul 25, 2019

Standardize str argument verification in datasets #1167

Merged

pmeier closed this as completed Jul 26, 2019
