Skip to content

Verification of string arguments in datasets? #1132

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pmeier opened this issue Jul 18, 2019 · 4 comments
Closed

Verification of string arguments in datasets? #1132

pmeier opened this issue Jul 18, 2019 · 4 comments

Comments

@pmeier
Copy link
Collaborator

pmeier commented Jul 18, 2019

In the spirit of standardizing the datasets (#1080), I think it would be a good idea to have something like a verify_str_arg() function that checks whether if a given argument is a str (or a valid subclass) and if hat has a valid value. Right now we have a multitude of slightly different variants:

if split.lower() == "train":
split = 0
elif split.lower() == "valid":
split = 1
elif split.lower() == "test":
split = 2
elif split.lower() == "all":
split = None
else:
raise ValueError('Wrong split entered! Please use "train", '
'"valid", "test", or "all"')

if mode not in ("segmentation", "boundaries"):
raise ValueError("Argument mode should be 'segmentation' or 'boundaries'")

if not os.path.exists(split_f):
raise ValueError(
'Wrong image_set entered! Please use image_set="train" '
'or image_set="trainval" or image_set="val"')

Is this something we want to do?

@fmassa
Copy link
Member

fmassa commented Jul 23, 2019

I think the split names depend on the dataset. For example, train / val / valminusminival for some version of COCO.

Some consistency might be good, but not at the expense of generality.

@pmeier
Copy link
Collaborator Author

pmeier commented Jul 23, 2019

My mistake for choosing these examples with varying argument names, but that is not what I meant. I don't want to generalize the values of the split or similar arguments. What I propose is a function (probably in datasets.utils) that handles the verification:

def verify_str_arg(value, valid_values, arg):
    if not isinstance(value, str):
        raise ...
    if value not in valid_values:
        raise ...
    return value

With that all error messages would be the same across datasets. For the examples this would result in

split = verify_str_arg(split, ("train", "valid", "test", "all"), "split")
mode = verify_str_arg(mode, ("segmentation", "boundaries"), "mode")
image_set = verify_str_arg(image_set, ("train", "trainval", "val"), "image_set")

@fmassa
Copy link
Member

fmassa commented Jul 25, 2019

Oh, that's definitely something that we would want to have! I completely misunderstood your point.

@pmeier
Copy link
Collaborator Author

pmeier commented Jul 26, 2019

Closed by #1167

@pmeier pmeier closed this as completed Jul 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants