Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POC [KED-3023] - Enable plugins to extend starter-aliases default list #1265

Closed

Conversation

Galileo-Galilei
Copy link
Member

@Galileo-Galilei Galileo-Galilei commented Feb 19, 2022

Description

This PR implements part of the suggestions described in #1040 and creates a fully functional examples on how kedro starter discovery could be modified to enable plugins to created aliased starters.

Development notes

  • I modified the _STARTERS_ALIASES to make it a dict with the format discussed in above issue.
  • I created a kedro.starter entrypoint which can is iterated over to update the _STARTERS_ALIASES constant.
  • I updated the kedro new and kedro starter list commands to use the updated _STARTERS_ALIASES constant (with new aliases discovered in the entrypoints).

As of now, the code is already working. You can test it with :

conda create -n fps python=3.8 -y
conda activate fps
pip install git+https://github.com/Galileo-Galilei/kedro.git@plugins-starters
pip install git+https://github.com/Galileo-Galilei/fake-plugin-starter.git
kedro starter list

and see the following output:

(*) my_fake_starter: http://google.com # my extra starter
astro-airflow-iris: https://github.com/kedro-org/kedro-starters/tree/main/astro-airflow-iris
astro-iris: https://github.com/kedro-org/kedro-starters/tree/main/astro-airflow-iris
mini-kedro: https://github.com/kedro-org/kedro-starters/tree/main/mini-kedro
pandas-iris: https://github.com/kedro-org/kedro-starters/tree/main/pandas-iris
pyspark: https://github.com/kedro-org/kedro-starters/tree/main/pyspark
pyspark-iris: https://github.com/kedro-org/kedro-starters/tree/main/pyspark-iris
spaceflights: https://github.com/kedro-org/kedro-starters/tree/main/spaceflights

You can have a look at the plugin code if needed.

Some questions are still to be answered:

  • is this a desirable implementation?
  • should we add extra conversion / parsing when loading the entrypoint to check the format of the constant (I think we should, this is likely error prone right now)
  • should we enable plugin authors to override "official" starters? My current code prevents this but it may be acceptable.
  • should we make a visual distinction when listing the starters ? i add an extra (*) at the beginning to ensure they are listed first and identifiable at first glance.

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

As of now, I did not add tests and documentation: I'll update them once you will confirm these changes are desirable

Signed-off-by: Yolan Honoré-Rougé <yolan.honore.rouge@gmail.com>
Signed-off-by: Yolan Honoré-Rougé <yolan.honore.rouge@gmail.com>
Copy link
Contributor

@antonymilne antonymilne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @Galileo-Galilei! Thanks very much for your work on this. In general I like this a lot and think that using entrypoints is a good solution - I'm just not sure on some of the details.

My main comment is that, even though different git repos is the most obvious use of this, it's conceivable that I might want to share a starter by some other means (a shared filepath say). kedro new --starter is very flexible and can things like filepaths rather than just git repos, so I think we should make sure that extending starters is equally as generic. Similarly I might want to share two different starters that point to exactly the same place but just have different tags, i.e. different checkout arguments. I think we should allow this too just to keep things generic. So my suggested format for the dictionary would be:

_STARTER_ALIASES = {"name": {"template_path": ..., "directory": ..., "checkout": ...}}

where directory and checkout are optional (directory defaults to None).

As for your other questions...

should we add extra conversion / parsing when loading the entrypoint to check the format of the constant (I think we should, this is likely error prone right now)

No strong feelings here - we don't check other entrypoints very carefully I think, and this is quite an advanced feature where there will just be one simple pattern to copy and paste from a documented example. So I'm not too worried about verifying the input.

should we enable plugin authors to override "official" starters? My current code prevents this but it may be acceptable.

I think best to leave the official ones unaltered - see next answer for why.

should we make a visual distinction when listing the starters ? i add an extra (*) at the beginning to ensure they are listed first and identifiable at first glance.

No strong feelings, but my suggestion would be to separate the output into sections according to the plugin source. Like how we do for global/project commands if you just type kedro:
image
This would work nicely if you have starters coming from multiple plugins (assuming that's possible).

cc @lorenabalan

kedro/framework/cli/starters.py Outdated Show resolved Hide resolved
kedro/framework/cli/starters.py Outdated Show resolved Hide resolved
kedro/framework/cli/starters.py Show resolved Hide resolved
kedro/framework/cli/starters.py Outdated Show resolved Hide resolved
kedro/framework/cli/starters.py Outdated Show resolved Hide resolved
kedro/framework/cli/starters.py Outdated Show resolved Hide resolved
@@ -31,6 +31,7 @@
"line_magic": "kedro.line_magic",
"hooks": "kedro.hooks",
"cli_hooks": "kedro.cli_hooks",
"starter": "kedro.starter"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"starter": "kedro.starter"
"starters": "kedro.starters"

Just seems more natural to me (like hooks).

Not sure whether this should go inside some hierarchy like kedro.constants. as you suggested before (but then what else might go in there?) or whether it belongs at the top level 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hesitated for the "s", let's go for the suggestions. I know we originally discussed about calling it "constants", but I make very strong assumptions on the format when looping on "starters" entrypoint, so This is clearly not generic enough to accept other constants (+ I do not see any other use case as you mention)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I'm happy to have it as a top-level entry point kedro.starters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and I've just remembered why I choose to call it "starter" instead of "starters": it is the name tof the CLI group command (kedro starter list) so I wanted to be consistent. What do you want to do? Keep the "s" or not?

Copy link
Contributor

@antonymilne antonymilne Mar 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah makes sense. Personally I'd go for starters since I wouldn't be too surprised if kedro starter at some point in the future somehow gets subsumed into kedro new, and the inconsistency here makes grammatical sense to me anyway. But let's see what someone else thinks too - @lorenabalan?

@Galileo-Galilei
Copy link
Member Author

Galileo-Galilei commented Feb 21, 2022

Hi @AntonyMilneQB, thanks for the review. The more I think about it, the more I think the entrypoint solution is "pythonic" indeed. I even found a reference doing something similar.

Similarly I might want to share two different starters that point to exactly the same place but just have different tags, i.e. different checkout arguments

I was wondering if setting the checkout argument in the template is really something we want. You do not let this possibility currently, and it may be better to point to main/master by default and make overriding explicit? I do think that it may be confusing for users to just do kedro new --starter=my_awesome_starter and not have the latest version installed in case checkout argument is set to a fixed reference. Specifiying checkout at instantiation time is not enough for you? What would be the use case?

No strong feelings, but my suggestion would be to separate the output into sections according to the plugin source. Like how we do for global/project commands if you just type kedro

I actually had the very same idea at first, but I don't know how to make it work honestly 😅 I want to reuse your load_entry_points function and it just returns a list of the entrypoints, I don't know how I can know from which plugin they come from.

@antonymilne
Copy link
Contributor

I was wondering if setting the checkout argument in the template is really something we want. You o not lety this possibility currently, and it may be better to point to main/master by default and make overriding explicit? I do think that it may be confusing for users to just do kedro new --starter=my_awesome_starter and not have the last version installed in case checkout argument is set to a fixed reference. Specifiying checkout at instantiation time is not enough for you? What would be the use case?

Fair enough. I was thinking that checkout can be used more than just to specify a version of a starter - in theory, someone might want to set up their starters so that there are two different starters defined in the same repo but labelled by their branch (instead of by directory). But I'd forgotten that by default we expect starters to use checkout to correspond to kedro version (rather than main/master), so this scheme for setting up a starters repo doesn't make much sense after all I think. Happy to go with what you think is best here 🙂

I actually had the very same idea at first, but I don't know how to make it work honestly 😅 I want to reuse your load_entry_points function and it just returns a list of the entrypoints, I don't know how I can know from which plugin they come from.

Ah yes I see, that is annoying. The type hint with click.MultiCommand (and exception message) indicates to me that really the load_entry_points function is intended for use with loading CLI commands (even though we also use it for line_magic it seems). Given that extracting the package name seems quite simple (like this for example) I'd probably just forget about using load_entry_points in favour of writing your own thing that does exactly what you want here.

@Galileo-Galilei
Copy link
Member Author

Galileo-Galilei commented Feb 21, 2022

I've just managed to make it work but the code is poorly readable. I do not use click.MultiCommand and brutally parse the text which is not very satisfying. The output looks like this:

image

Edit: yes, I used exactly what you suggested, i found the same code snippet.

But I'd forgotten that by default we expect starters to use checkout to correspond to kedro version (rather than main/master), so this scheme for setting up a starters repo doesn't make much sense after all I think.

Yes that's what I figured out in the code. It makes sense for kedro itself to have a default to the starter matching the kedro version, but i don't think it really does for user defined starters. I implemented it however to match your suggestion.

Copy link
Contributor

@antonymilne antonymilne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes! I left a couple more suggestions but this is looking really good now! Let's try and get a second pair of eyes on it before it becomes too dominated just by my opinion.

Yes that's what I figured out in the code. It makes sense for kedro itself to have a default to the starter matching the kedro version, but i don't think it really does for user defined starters. I implemented it however to match your suggestion.

Yeah, this is something I've wondered before also. Just for the purposes of this PR, I think we should leave the current behaviour as it is (i.e. by default checkout = kedro version, not main) and save possible changes to that for a separate discussion.

Then the only question in my mind then is whether plugin-defined starters should be allowed to define checkout themselves or not. Not a big deal either way - I'm happy just to defer to you on it or see what the third opinion says.

kedro/framework/cli/starters.py Outdated Show resolved Hide resolved
kedro/framework/cli/starters.py Outdated Show resolved Hide resolved
kedro/framework/cli/starters.py Outdated Show resolved Hide resolved
Comment on lines 224 to 245
starters_by_origin = {}
for starter_name, starter_config in starters_aliases.items():
starter_raw_config = {
starter_name: {k: v for k, v in starter_config.items() if k != "origin"}
}
if starter_config["origin"] in starters_by_origin:
starters_by_origin[starter_config["origin"]].append(starter_raw_config)
else:
starters_by_origin[starter_config["origin"]] = [starter_raw_config]
for origin, starter_aliases in starters_by_origin.items():
click.secho(f"Starters from '{origin}'\n", fg="yellow")
output = []
for starter_alias in starter_aliases:
for starter_name, starter_config in starter_alias.items():
starter_name_displayed = (
f"{starter_name} "
f"(template_path={starter_config.get('template_path')}, "
f"directory={starter_config.get('directory')}, "
f"checkout={starter_config.get('checkout')})"
)
output.append(starter_name_displayed)
click.echo(yaml.safe_dump(output))
Copy link
Contributor

@antonymilne antonymilne Feb 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a few possible simplifications here. I think we can rely on Python dictionary ordering, so you can do it without needing to restructure the dictionary at all:

    last_origin = "kedro"
    click.secho(f"Built-in starters\n", fg="yellow")

    for starter_name, starter_config in starters_aliases.items():
        origin = starter_config.pop("origin")
        if origin != last_origin:
            click.secho(f"\nStarters from {origin}\n", fg="yellow")
            last_origin = origin
        click.echo(f"{starter_name}: {starter_config}")

Or if you do want to restructure the dictionary then it's probably possible using something like this.

The above renders as follows:
image

... which is not super beautiful I will admit 😅 It would be nicer to make this a bit more human-presentable, maybe by using yaml.dump(default_flow_style=True) instead of just printing the raw dictionary. But I do like only giving directory and checkout when they're explicitly defined. Overall it would be nice if the readability of the output kedro starter list wasn't significantly degraded compared to how it currently looks in spite of the additional generality.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment, but I wouldn't do pop() because we'd only get a good output the first time user runs kedro starter list. The second time it's run, the key "origin" won't be there anymore.

Copy link
Member Author

@Galileo-Galilei Galileo-Galilei Feb 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AntonyMilneQB I thought dictionnary ordering was only from python>=3.7 and forward and I explicitly avoid relying on dict ordering because I know you still support python=3.6 :) But I was wrong so glad to make it more readable.

@lorenabalan I think pop() should be fine because the _get_starter_aliases() is called at the beginning of the function each time, and it does itself copy the _STARTERS_ALIASES constant. It would be safer to explictly deepcopy the dict in case of future changes though, do you prefer to enforce it?

I am still experimenting to see if we can have something more readable.

Copy link
Contributor

@lorenabalan lorenabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Galileo-Galilei for taking a stab at this! 🙌 I like the general direction of this. :) My main observation is that I'd like us to limit the scope to just "auto-discovery of starters from plugins", and not worry about custom checkout and directory conventions. If we change anything around there it will automatically make this PR a breaking change.

I believe we shouldn't allow natively supported starter aliases to be overwritten, that can cause a lot of confusion. Also some minimal checks on stuff returned from the entrypoint would be nice. We don't have checks on the others because hooks and CLI commands are harder to validate than a plain dictionary. If a user-defined plugin defined a bad starter entrypoint, we should wrap it up in a nice exception message.

Would love to also see some documentation added before merging.

kedro/framework/cli/starters.py Outdated Show resolved Hide resolved
kedro/framework/cli/starters.py Outdated Show resolved Hide resolved
Comment on lines 224 to 245
starters_by_origin = {}
for starter_name, starter_config in starters_aliases.items():
starter_raw_config = {
starter_name: {k: v for k, v in starter_config.items() if k != "origin"}
}
if starter_config["origin"] in starters_by_origin:
starters_by_origin[starter_config["origin"]].append(starter_raw_config)
else:
starters_by_origin[starter_config["origin"]] = [starter_raw_config]
for origin, starter_aliases in starters_by_origin.items():
click.secho(f"Starters from '{origin}'\n", fg="yellow")
output = []
for starter_alias in starter_aliases:
for starter_name, starter_config in starter_alias.items():
starter_name_displayed = (
f"{starter_name} "
f"(template_path={starter_config.get('template_path')}, "
f"directory={starter_config.get('directory')}, "
f"checkout={starter_config.get('checkout')})"
)
output.append(starter_name_displayed)
click.echo(yaml.safe_dump(output))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment, but I wouldn't do pop() because we'd only get a good output the first time user runs kedro starter list. The second time it's run, the key "origin" won't be there anymore.

Comment on lines 237 to 244
for starter_name, starter_config in starter_alias.items():
starter_name_displayed = (
f"{starter_name} "
f"(template_path={starter_config.get('template_path')}, "
f"directory={starter_config.get('directory')}, "
f"checkout={starter_config.get('checkout')})"
)
output.append(starter_name_displayed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to do that much formatting ourselves because yaml.dump can handle Python dictionaries. It's what we do for kedro catalog list as well.

Copy link
Member Author

@Galileo-Galilei Galileo-Galilei Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made another attempt with yaml.dump. i am a bit worried about readibilty though, since it displayed inforamtion on several lines for each starter and there are a lot of them, so it may hurt discoverability. Current implementation looks like this:

image
I can reorder the keys (to have "name" as the first key) and remove unused key (origin or directory when it is None), but I still find it harder to read than above suggestion. What do you think?

kedro/framework/cli/starters.py Outdated Show resolved Hide resolved
@Galileo-Galilei
Copy link
Member Author

Galileo-Galilei commented Feb 28, 2022

Hi @lorenabalan and @AntonyMilneQB, I've made a couple of changes since last review, tell me what you think:

  • I refactored the "configuration" dictionnary one should specify in an entrypoint from this:
    {"astro-airflow-iris": {
        "template_path": _STARTERS_REPO,
        "directory": "astro-airflow-iris",
    }, ...}

to this

[
    {
        "name": "astro-airflow-iris",
        "template_path": _STARTERS_REPO,
        "directory": "astro-airflow-iris",
    }, ...]

Avoiding nested dictionnary make manipulations easier (reordering by "origin" or "name" depending on what is needed) is easier this way, and I think this makes code more readable in general.

  • I've also tried to output result with yaml. dump but it is likely too verbose and less readable (see above coment). I'd likely suggest to go with Antony first suggestion (e.g my initial "custom" formatting with directory key present only when filled)

@merelcht merelcht added the Community Issue/PR opened by the open-source community label Mar 7, 2022
Copy link
Contributor

@antonymilne antonymilne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all your work on this @Galileo-Galilei. As soon as there's some tests and docs feel free to move it from a Draft, and I'm sure I'll be happy to approve 🙂

Your fake plugin repo is a bit out of date given the new dictionary structure for STARTER_ALIASES you're using - here's one that's up to date if you want to play around more: https://github.com/AntonyMilneQB/fake-plugin-starter

import os
import re
import shutil
import stat
import tempfile
from collections import OrderedDict
from pandas import value_counts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is used anywhere, but kedro doesn't have pandas as a dependency so we shouldn't use any of their functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an auto import from my IDE, this is not used indeed.

@@ -23,21 +27,49 @@
_clean_pycache,
_filter_deprecation_warnings,
command_with_verbosity,
# load_entry_points,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# load_entry_points,

def _check_starter_entrypoint_config(module_name: str, config: Dict[str, str]) -> None:
if not isinstance(config, dict):
raise ValueError(
f"The starter configuration loaded from entrypoint should be a 'dict', got '{config}' instead"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this error could be clarified a bit. Currently with your fake starter I get this:

ValueError: The starter configuration loaded from entrypoint should be a 'dict', got 'my_fake_starter' instead
  • can we put module_name in there like in the other error messages?
  • did you mean type(config)?
  • this sounds like a TypeError?

missing_keys = mandatory_keys - set(config.keys())
if missing_keys: # mandatory keys
raise ValueError(
f"Entrypoint kedro.starters from {module_name} must have the following missing keys: '{missing_keys}'"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"Entrypoint kedro.starters from {module_name} must have the following missing keys: '{missing_keys}'"
f"Entrypoint kedro.starters from {module_name} must have the following keys, which are currently missing: '{missing_keys}'"

mandatory_keys = {"name", "template_path"}
optional_keys = {"directory"}

missing_keys = mandatory_keys - set(config.keys())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor suggestion:

Suggested change
missing_keys = mandatory_keys - set(config.keys())
provided_keys = set(config.keys())
missing_keys = mandatory_keys - provided_keys

and then reuse provided_keys on L111

Comment on lines 125 to 127
"astro-airflow-iris": {"template_path": ..., directory=..., "origin": "kedro"},
...,
"my-awesome-starter": {"template_path": ..., directory=..., "origin": "my-awesome-plugin"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this or the type hint are up to date any more?

):
module_name = starter_entry_point.module_name.split(".")[0]
for starter_config in starter_entry_point.load():
_check_starter_entrypoint_config(module_name, starter_config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I have an invalid starter specification then this will raise ValueError and I won't be able to see the built-in starters. I would need to pip uninstall the plugin in order to get kedro starter working again I guess?

This is ok, but I wonder if it would be better to make those exceptions just warnings instead. No strong feelings, up to you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well spotted. I guess it should raise a warning. As a user, I would hate having an error I don't understand here and I cannot correct here.

# see https://www.geeksforgeeks.org/group-list-of-dictionary-data-by-particular-key-in-python/
# this returns a nested dictionary {name1: {template_path: xxx, directory: xxx}, name2: {template_path: xxx, directory: xxx}, ...}
starters_aliases_by_name = {
k: list(v)[0] for k, v in groupby(starters_aliases, key=itemgetter("name"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
k: list(v)[0] for k, v in groupby(starters_aliases, key=itemgetter("name"))
name: list(v)[0] for name, v in groupby(starters_aliases, key=itemgetter("name"))

and some other descriptive name for v if possible? I like how you named variables in starters_aliases_by_origin 🙂

@@ -288,7 +399,7 @@ def _get_cookiecutter_dir(
f" Specified tag {checkout}. The following tags are available: "
+ ", ".join(_get_available_tags(template_path))
)
official_starters = sorted(_STARTER_ALIASES)
official_starters = sorted(_STARTERS_ALIASES)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message that follows this isn't just going to give a list of inbuilt starters any more; it will give the whole _STARTERS_ALIASES dictionary which isn't good.

Given that now we can have starters from other sources maybe we should just recommend the user to run kedro starter list to see what's available.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will return only inbuilt starters: the dictionarry is always deepcopied when _get_starters_aliases is called, so I think the message is still giving the right result. But maybe it would be better to call explicitly _get_starters_aliases() here and modify the message accordingly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry I wasn't clear here. The problem I see isn't so much that this shows only inbuilt starters (though maybe that is a small problem also) but what the exception message will say. Won't this line print out the whole dictionary, with all its subkeys etc., rather than just showing the alias names of the starters?

yaml.safe_dump(official_starters)

Comment on lines 244 to 267
starters_aliases = _get_starters_aliases()
starters_aliases_by_origin = {
# origin: [
# {
# k: v
# for k, v in config.items()
# if k in {"name", "template_path", "directory"} and v is not None
# }
# for config in config_list
# ]
origin: list(configs)
for origin, configs in groupby(
sorted(starters_aliases, key=itemgetter("origin")), key=itemgetter("origin")
)
}

# ensure kedro starters are listed first
kedro_starters = starters_aliases_by_origin.pop("kedro")
click.secho(f"\nBuilt-in starters\n", fg="yellow")
click.echo(yaml.dump(kedro_starters))

for origin, starter_config in starters_aliases_by_origin.items():
click.secho(f"\nStarters from {origin}\n", fg="yellow")
click.echo(yaml.dump(starter_config))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After spending WAAAY too long playing around with this (and for some time unknowingly working on something that just reproduced what you had obviously thought of before and commented out...), here's the best I've come up with:

    starters_aliases_by_origin = groupby(
        sorted(_get_starters_aliases(), key=itemgetter("origin")),
        key=itemgetter("origin"),
    )
    for origin, starter_config in starters_aliases_by_origin:
        click.secho(f"\nStarters from {origin}\n", fg="yellow")
        for starter in starter_config:
            del starter["origin"]
            name = starter.pop("name")
            click.echo(yaml.dump({name: starter}))

image

If we care about ordering so that kedro is always first, you could do the below, but personally I don't think it's worth the extra complexity and would just leave it unordered.

    starters_aliases_by_origin = {k: list(v) for k, v in groupby(
        sorted(_get_starters_aliases(), key=itemgetter("origin")),
        key=itemgetter("origin"),
    )}
    for origin, starter_config in sorted(starters_aliases_by_origin.items(), key=lambda x: x[0] != "kedro"):

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I played a lot with the formatting too and get to the same conclusion as you. I'd personnally prefer showing built-in starters first but it is really not a big deal.

@antonymilne antonymilne changed the title POC [KED-3023] - Enable plugins to extend starter-aliases default list e Apr 4, 2022
@antonymilne antonymilne changed the title e POC [KED-3023] - Enable plugins to extend starter-aliases default lis Apr 4, 2022
@antonymilne antonymilne changed the title POC [KED-3023] - Enable plugins to extend starter-aliases default lis POC [KED-3023] - Enable plugins to extend starter-aliases default list Apr 4, 2022
@antonymilne
Copy link
Contributor

antonymilne commented May 20, 2022

Hello @Galileo-Galilei, I was just wondering if you would mind if we on the kedro team took this over? It feels like a nice feature and I think we're most of the way there so would be nice to get it in 🙂

Am I right in thinking that it's basically just tests and documentation left to do, or did you want to put anything else in here?

@Galileo-Galilei
Copy link
Member Author

Hi @AntonyMilneQB, sorry for the late reply.

There are indeed "only" tests and documentation to write but it's still a couple of hours of work (there are a bunch of edge case scenarii + the test setup is quite complicated to emulate plugin registration). I have very little time for now, so feel free to take over the PR, else I might resume working on it in a couple of weeks.

Thank you very much for the guidance through the code review and sorry for not being as available as I wish I could.

@merelcht merelcht removed the Community Issue/PR opened by the open-source community label Jun 6, 2022
@noklam
Copy link
Contributor

noklam commented Jun 6, 2022

Takeover this in #1592

@noklam noklam self-assigned this Jun 6, 2022
@antonymilne
Copy link
Contributor

Closing this in favour of #1592. Thank you very much for all your initial work on this and the idea in the first place @Galileo-Galilei!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants