Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of bool parameters. #2427

Merged
merged 7 commits into from
Jul 9, 2018

Conversation

riga
Copy link
Contributor

@riga riga commented May 24, 2018

Description, Motivation and Context

The current BoolParameter implementation does not allow to negate parameters on the command line. Example:

class MyTask(luigi.Task):
    param_a = luigi.BoolParameter(default=False)
    param_b = luigi.BoolParameter(default=True) # cannot be set to False on the CLI
    ...

So right now, there is no way to set param_b to False from the cli. Of course, users can redefine param_b to mean the opposite e.g. no_param_b, but this might become cumbersome in the long run (at least for us it kind of does). Also, there are some luigi-internal bool parameters (e.g. send_failure_email, check_unfulfilled_deps) which are true by default, so one has to toggle the value manually in the config file for running individual tasks.

The behavior I have in mind is:

luigi MyTask
# -> param_a = False (as before)

luigi MyTask --param_a
# -> param_a = True (as before)

luigi MyTask --param_a true
# -> param_a = True (new)

luigi MyTask --param_a false
# -> param_a = False (new)

# same behavior for param_b except for the default as before
luigi MyTask
# -> param_b = True (as before)

To achieve this, one needs to pass different values of nargs and const to the ArgumentParser, which requires only a minor change. I also updated BoolParameter.normalize() which was originally introduced in #1461. Until now it returned bool(value) or None. I think it is safe to return the result of parse() or None for back-compat in case of an error.

Does this sound reasonable to you?

Have you tested this? If so, how?

I extended all existing tests that deal with BoolParamter parsing to cover the update behavior.

Copy link
Collaborator

@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm cool with the motivation and functionality here.

Left a couple comments/questions.

Also, please have a look at the failing tests.

Thanks!

test/helpers.py Outdated
@@ -160,7 +160,7 @@ def run_locally(self, args):
temp = CmdlineParser._instance
try:
CmdlineParser._instance = None
run_exit_status = luigi.run(['--local-scheduler', '--no-lock'] + args)
run_exit_status = luigi.run(args + ['--local-scheduler', '--no-lock'])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for the order swap here?

Copy link
Contributor Author

@riga riga May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I digged a bit deeper in the argparse module and it turns out that cases like

program --optional-arg(nargs="?") positional-arg

cannot be properly interpreted as argparse consumes optional arguments prior to positionals. So internally, "positional-arg" is considered the value of optional-arg although it does not necessarily require a value.

I swapped the order above just to test something, but apparently the change in this PR would require all bool parameters to be placed behind the task name on the command line. This also causes some tests to fail. The "Running from the Command Line" docs do not encourage to have parameters before the task family, and I personally would never use the cli this way but still, prohibiting it in the first place is a major change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you're fine with that change? If yes, I'll also add a note to the docs on that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, i'm fine with it - so long as tests pass (they aren't currently)

parser_kwargs = super(BoolParameter, cls)._parser_kwargs(*args, **kwargs)
parser_kwargs.update({
"nargs": "?",
"const": True,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the importance of nargs and const?

Copy link
Contributor Author

@riga riga May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argparse translates action="store_true" into nargs=0, const=True which means that no value (nargs=0) is expected for that argument, and if the argument but no value is given, it should be interpreted as True (const=True). The proposed feature is almost identical, except for the nargs="?" part.

@Tarrasch
Copy link
Contributor

Tarrasch commented Jun 5, 2018

Did you consider adding a different version of BoolParameter or a parameter to it's constructor? I'm so afraid of changing the existing behavior will cause breakage (even if we think it's backward compatible)

@riga
Copy link
Contributor Author

riga commented Jun 6, 2018

@Tarrasch This is actually what I fear as well. I added a parameter improved_parsing to the BoolParameter init which steers the parsing behavior. It defaults to the class attribute improved_parsing which itself defaults to False, so the current behavior is still standard. However, one can do

luigi.BoolParameter.improved_parsing = True

at an early stage to trigger the improved parsing for all parameters, including internal ones.

@riga riga mentioned this pull request Jun 6, 2018
@Tarrasch
Copy link
Contributor

Tarrasch commented Jun 6, 2018

Sounds good! Could we rename it to something like this?

luigi.BoolParameter.parsing = luigi.BoolParameter.(IMPLICIT_PARSING/EXPLICIT_PARSING)

Just an idea that could improve readability.

Copy link
Contributor

@Tarrasch Tarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine, but can you just change all the examples back so that --local-scheduler can be passed first? Shouldn't be any issues now that the "explicit" way is optional and not default right?

@@ -125,12 +125,12 @@ def test_cmdline_local_scheduler(self, logger):

@mock.patch("logging.getLogger")
def test_cmdline_other_task(self, logger):
luigi.run(['--local-scheduler', '--no-lock', 'SomeTask', '--n', '1000'])
luigi.run(['SomeTask', '--local-scheduler', '--no-lock', '--n', '1000'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be needed to change right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔︎

self.assertEqual(dict(MockTarget.fs.get_all_data()), {'/tmp/test_1000': b'done'})

@mock.patch("logging.getLogger")
def test_cmdline_ambiguous_class(self, logger):
self.assertRaises(Exception, luigi.run, ['--local-scheduler', '--no-lock', 'AmbiguousClass'])
self.assertRaises(Exception, luigi.run, ['AmbiguousClass', '--local-scheduler', '--no-lock'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, we don't want code who pass --local-scheduler first to break.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔︎

@riga
Copy link
Contributor Author

riga commented Jun 18, 2018

Yes, you're right, they don't require any changes.

@riga
Copy link
Contributor Author

riga commented Jun 29, 2018

The requested changes are included now, so feel free to review again :)

@Tarrasch
Copy link
Contributor

Tarrasch commented Jul 4, 2018

Sorry for the late review. It looks all good but I would like some docs too. Simple class-documentation for BoolParameter would be easiest I think.

@Tarrasch Tarrasch merged commit d994bca into spotify:master Jul 9, 2018
thisiscab pushed a commit to glossier/luigi that referenced this pull request Aug 3, 2018
thisiscab pushed a commit to glossier/luigi that referenced this pull request Aug 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants