Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add plural flags --nodes, --tags, and --load-versions to kedro run #2301

Merged
merged 31 commits into from
Feb 16, 2023

Conversation

AhdraMeraliQB
Copy link
Contributor

@AhdraMeraliQB AhdraMeraliQB commented Feb 8, 2023

Signed-off-by: Ahdra Merali ahdra.merali@quantumblack.com

NOTE: Kedro datasets are moving from kedro.extras.datasets to a separate kedro-datasets package in
kedro-plugins repository. Any changes to the dataset implementations
should be done by opening a pull request in that repository.

Description

This PR completes #2244 by adding pluralised flags for the node, tag, and load-version options on kedro run

For Reviewers:

  • _split_load_versions was added to handle input strings, as previously --load-version would process the input as tuples due to multiple=True.
  • .strip() is used to trim the whitespaces from user input
  • I couldn't find any unit tests for the individual tags; I've added tests for the new flags but have not duplicated them for the old singular flags as they will just be deleted in 0.19.0. These tests check inputs with a singular value, multiple comma-separated values, and multiple comma-and-space-separated values.
  • The test for --load-versions is subsumed under the test for _split_load_versions just as testing --load-version was by test_reformat_load_versions.
  • For the period between 0.18.5 and 0.19.0 users will be able to use both versions of the flags in their kedro run commands. Whilst I don't think they will, is it worth considering limiting this behaviour?

Development notes

The helper function _split_load_versions was added as --load-version and --load-versions return different types, the singular returns a tuple and the plural a string. This is as a result of using multiple=True for the singular flag.

The both new and old tags have been tested manually to ensure they (still) behave as expected.

  • Add plural flags
  • Test they work for single inputs
  • Test they work for multiple, comma-separated inputs
  • Test behaviour against whitespaces
    N/B: Tags are robust against whitespaces
  • Trim whitespaces for load-versions
  • Trim whitespaces for nodes
  • Add tests for plural flags (single input, multiple input)
  • Make sure tests check whitespaces handling in input
  • Test behaviour when using both singular and plural flags in one command - it works!
  • Add test for _split_load_versions
  • Add additional tests (if necessary)

Checklist

Ahdra Merali added 2 commits February 8, 2023 17:27
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
@AhdraMeraliQB
Copy link
Contributor Author

Question for reviewers: Do we want to restrict users from using both --node and --nodes while both are available?

Ahdra Merali added 5 commits February 8, 2023 17:50
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Ahdra Merali and others added 2 commits February 9, 2023 15:15
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
env,
runner,
is_async,
node_names,
nodes_names,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is named this way for differentiation from the already existing node_names - in #2245 I'd suggest keeping node_names as the parameter name

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine to keep node_names for now in #2245 but as part of #2247 I think we should consider just calling this nodes. (Just while I remember, Ivan's argument for why we might not want to do this: it's deliberately written as node_names to deter people from trying to pass node objects themselves to the underlying session.run. My argument for it is consistency and simplicity, particularly when we eventually return to #1423)

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
@AhdraMeraliQB AhdraMeraliQB marked this pull request as ready for review February 9, 2023 17:09
AhdraMeraliQB and others added 4 commits February 10, 2023 12:07
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! I think the one thing missing is a test (or tests) to check that providing both --node and --nodes and tags, and load versions, works properly as well. Because now we're not really checking that these lines work:

tag = tag + tags
node_names = node_names + nodes_names
load_version = {**load_version, **load_versions}

AhdraMeraliQB and others added 4 commits February 13, 2023 09:34
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Copy link
Contributor

@antonymilne antonymilne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great PR 👍 Super clear explanations and excellent testing ⭐ I'm a little bit confused about _get_values_as_tuple, but that's not really a result of your changes. I wonder if that could be simplified after (or as part of) the ticket to remove the old flags, since it will no longer need to handle anything more complicated than just a single string (?). Happy to chat more about this anyway so we can try to figure it out!

Question for reviewers: Do we want to restrict users from using both --node and --nodes while both are available?

I think it's fine as it is. I don't think anyone will really try to do that, it's only going to be possible for a limited period of time, plus they will already get the deprecation warning and things will work ok when you use both. So not worth the extra code required to handle it - doing the + to combine arguments together like you do now is more than enough I think.

kedro/framework/cli/project.py Show resolved Hide resolved
env,
runner,
is_async,
node_names,
nodes_names,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine to keep node_names for now in #2245 but as part of #2247 I think we should consider just calling this nodes. (Just while I remember, Ivan's argument for why we might not want to do this: it's deliberately written as node_names to deter people from trying to pass node objects themselves to the underlying session.run. My argument for it is consistency and simplicity, particularly when we eventually return to #1423)

@@ -412,6 +438,16 @@ def run(
tag = _get_values_as_tuple(tag) if tag else tag
node_names = _get_values_as_tuple(node_names) if node_names else node_names

# temporary duplicates for the plural flags
tags = _get_values_as_tuple(tags) if tags else tuple(tags)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, but do we need these if/else clauses? Since _get_values_as_tuple("") == tuple("") == tuple() already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch - was following convention from the earlier implementation but looks like they weren't needed there either. My suspicion is that multiple=True implies a default value of an empty tuple 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unable to find any documentation on the matter but manual testing does expose that values on options where multiple=True will be tuples if the default is unspecified, even when a value is not provided. I'll remove the unnecessary ifs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, that's what I suspected about multiple=True 👍 Just to double check though, it's ok to remove the if/else for both the multiple=True and the multiple=False versions of these arguments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! For the other flags their input is immediately handled by the callback function which goes on to return a list, and it's this list that get's passed to _get_values_as_tuple, which splits them safely

kedro/framework/cli/utils.py Outdated Show resolved Hide resolved
@@ -468,6 +472,11 @@ def _split_params(ctx, param, value):
return conf


def _split_load_versions(ctx, param, value):
lv_tuple = _get_values_as_tuple([value])
Copy link
Contributor

@antonymilne antonymilne Feb 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the [value] here rather than value? Elsewhere we call _get_values_as_tuple just with a single string. (Edit: is that true? Maybe it gets tuple-ified due to multiple=True? But then that wouldn't make sense for the instances you've added 🤔 )

Also do we need the if/else here? Wouldn't _reformat_load_versions work the same just passing the empty value into it directly?

After we've removed the old load_version we should definitely just move _reformat_load_versions code into this instead of having two separate functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AntonyMilneQB

Do we need the [value] here rather than value? Elsewhere we call _get_values_as_tuple just with a single string.

Without wrapping value in a list _get_values_as_tuples will treat each character in the string as a value. The function is never called on a single string as multiple=True tuplifies the values of --node and --tag, and the callback functions for --nodes and --tags both return lists.

Also do we need the if/else here? Wouldn't _reformat_load_versions work the same just passing the empty value into it directly?

The if/else isn't needed as _reformat_load_versions would still work - the idea was to prevent calling another method if unnecessary. Which approach would be considered more pythonic?

After we've removed the old load_version we should definitely just move _reformat_load_versions code into this instead of having two separate functions.

Agreed

Copy link
Contributor

@antonymilne antonymilne Feb 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The if/else isn't needed as _reformat_load_versions would still work - the idea was to prevent calling another method if unnecessary. Which approach would be considered more pythonic?

Overall I think without the if/else. The performance overhead of calling another function here is very small compared to the (also small) additional complexity of branching in the code. It's a small thing, but I'd only skip calling the function here if it was particularly expensive.

@antonymilne antonymilne changed the title Add Add plural flags --nodes, --tags, and --load-versions to kedro run Add plural flags --nodes, --tags, and --load-versions to kedro run Feb 13, 2023
Co-authored-by: Antony Milne <49395058+AntonyMilneQB@users.noreply.github.com>
AhdraMeraliQB and others added 3 commits February 13, 2023 12:18
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
@AhdraMeraliQB AhdraMeraliQB requested review from merelcht and removed request for yetudada February 13, 2023 12:30
@AhdraMeraliQB
Copy link
Contributor Author

@AntonyMilneQB

I'm a little bit confused about _get_values_as_tuple, but that's not really a result of your changes. I wonder if that could be simplified after (or as part of) the ticket to remove the old flags, since it will no longer need to handle anything more complicated than just a single string (?)

It should be able to be removed 🤔 it's functionality then wouldn't extend beyond split_string, if the return type is changed from list to tuple. Definitely worth evaluating in #2245

@antonymilne
Copy link
Contributor

@AhdraMeraliQB

It should be able to be removed 🤔 it's functionality then wouldn't extend beyond split_string, if the return type is changed from list to tuple. Definitely worth evaluating in #2245

This would be a nice simplification. It would also be nice if we could put all this argument-munging into the callbacks rather than needing to do them inside the run function itself. Please could you add any of these proposed simplifications to #2245 (or wherever you think is appropriate)? I think we should try and do them sooner rather than later while they're fresh in our minds.

Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! 👍 Don't forget to update the release notes.

@AhdraMeraliQB
Copy link
Contributor Author

AhdraMeraliQB commented Feb 14, 2023

@merelcht

Great work! 👍 Don't forget to update the release notes.

The changes were combined in the release notes of #2303: "Deprecate (singular flags) in favour of (plural flags)" - should this be elaborated on further?

@merelcht
Copy link
Member

The changes were combined in the release notes of #2303: "Deprecate (singular flags) in favour of (plural flags)" - should this be elaborated on further?

Yes, I would separately add that these flags have been added. That way it's also easier to trace which PR the changes were added in.

Ahdra Merali and others added 2 commits February 14, 2023 10:31
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
@AhdraMeraliQB AhdraMeraliQB merged commit d490bf9 into main Feb 16, 2023
@AhdraMeraliQB AhdraMeraliQB deleted the feat/add-plural-run-flags branch February 16, 2023 17:41
stichbury pushed a commit that referenced this pull request Feb 28, 2023
…2301)

* Add plural flags nodes, tags, load-versions

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Lint

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Lint

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Trim whitespaces from nodes

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Trim whitespaces from load versions

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Write test for nodes

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Write test for tags

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Write test for load versions

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Remove shorthand flag from plural

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Fix tests after merge

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Lint

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Undo merge side-effects

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add tests to check behaviour when both singular and plural flags used

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Lint

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Lint again

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Change deprecation condition

Co-authored-by: Antony Milne <49395058+AntonyMilneQB@users.noreply.github.com>

* Remove unnecessary ifs

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Lint changes from main

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add changes to RELEASE.md

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Try fix CI

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Revert CLI change

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

---------

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Co-authored-by: Antony Milne <49395058+AntonyMilneQB@users.noreply.github.com>
Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add plural flags --nodes, --tags, and --load-versions to kedro run
3 participants