Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy load commands from plugins #3901

Closed
wants to merge 6 commits into from
Closed

Lazy load commands from plugins #3901

wants to merge 6 commits into from

Conversation

ankatiyar
Copy link
Contributor

@ankatiyar ankatiyar commented May 30, 2024

Description

Partly resolve #1476

As discussed in #1476, the initialisation of KedroCLI takes up a significant chunk of time. This is especially evident if you have a lot of kedro plugins installed in the environment as KedroCLI which is a CommandCollection, loads up all the commands from the plugins before the command is processed. This PR is to add a lazy way to load the commands from the plugins.

Development notes

  • Add a property called plugin_groups which returns a dict of entry_point_name to the EntryPoint object.
  • This group is passed to the super class CommandCollection which is a custom version of click.CommandCollection used by Kedro and defined in kedro/framework/cli/utils.py

Update to custom CommandCollection:

  • Add a new main function which receives the command to be run as args, eg [ "catalog", "list"] or ["airflow", "create"]
  • if the first element of the args is not in the command, then load commands from plugins
  • run the command by passing to super().main()

Loading of plugins:

I tried to implement a sort of "smart" loading where if the command arg, i.e. airflow or mlflow or viz partially matches the entry point names in the lazy_group dict keys, load the plugin, check if the command now exists and exit.
Note: The entry point names are decided by the plugins, eg. kedro-airflow's ep name is airflow which fully matches the command name too but for kedro-viz and kedro-mlflow the entry point names don't match the commands. eg. Command: viz Entry point name: kedro-viz

Otherwise, load all plugins one by one, exit if the command exists after loading plugins.

TODO

  • Fix code coverage on unit tests, which is tricky since this code is only used when a plugin is installed but we don't have tests like these in our unit tests.

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
@astrojuanlu
Copy link
Member

To note, as @ankatiyar noted internally, "bad" lookups will take a lot of time: for example, kedro infoo (notice typo) took 17 seconds in my computer and tried to import kedro-viz, kedro-mlflow, and more.

Going forward, is there a smarter way we can declare new CLI commands, so that we can look them up in entry points without actually importing anything? (Feel free to open a new issue about this)

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
@ankatiyar ankatiyar marked this pull request as ready for review June 10, 2024 13:52
@ankatiyar ankatiyar requested a review from merelcht as a code owner June 10, 2024 13:52
@ankatiyar
Copy link
Contributor Author

ankatiyar commented Jun 10, 2024

The code coverage is not complete but opening this up for review anyway to gather feedback on the implementation.

@ankatiyar ankatiyar requested a review from DimedS June 10, 2024 13:55
Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some suggestions/questions to the code, but I actually don't think it's a good idea that now the plugin commands aren't loaded when you do kedro --help. I've built a small plugin myself following https://docs.kedro.org/en/stable/extend_kedro/plugins.html#example-of-a-simple-plugin and the problem you get here is that the commands belong to the kedro group, but don't show at all. In this case there's no other top-level group you can call to see what sub-commands exist. On top of that the overwriting behaviour is changed, so if I now create my own micropkg command inside this kedrojson plugin it doesn't overwrite the original kedro micropkg where in the non lazy-loading version it does do the overwriting.

While this solution does lead to performance gain, it also degrades the plugin command discoverability and therefore IMO usability of the plugins as a whole. Features like https://docs.kedro.org/en/stable/extend_kedro/plugins.html#global-and-project-commands will not exist anymore after this change.

All in all, I don't think the performance gains are worth losing the plugin features.

@@ -172,13 +174,19 @@ def main(
click.echo(hint)
sys.exit(exc.code)

@property
def plugin_groups(self) -> dict[str, EntryPoint]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a docstring here to explain what the property is for?

Comment on lines 185 to 187
"""Property which loads all global command groups from plugins and
combines them with the built-in ones (eventually overriding the
built-in ones if they are redefined by plugins).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this needs to be updated, because it's not loading the plugins anymore?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also wondering whether the "overriding" behaviour mentioned here is changed with the lazy loading? Update: I tested this and the behaviour is indeed changed. With the lazy loading it seems like the overwritten version isn't called.

@@ -90,6 +90,13 @@ def wrapit(func: Any) -> Any:
return wrapit


def _partial_match(plugin_names: list[str], command_name: str) -> str | None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a docstring here as well?

@ankatiyar
Copy link
Contributor Author

ankatiyar commented Jun 11, 2024

Fair point @merelcht! 🤔
I was also trying to implement a way where --help does trigger the loading of the plugins.
About overwriting of core Kedro commands eg. micropkg, that might also be possible with lazy loading. Let me explore these functionalities and update the PR (if it works!)!

**extra: Any,
) -> Any:
# Load plugins if the command is not found in the current sources
if args and args[0] not in self.list_commands(None): # type: ignore[arg-type]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does self.list_commands(None) do? From the docs it seems like it expects a ctx.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ankatiyar
Copy link
Contributor Author

Unfortunately, I don't think the overriding behaviour of CLI is possible with lazy loading of plugins - i.e. if a plugin has overriding CLI commands for Kedro core commands such as kedro catalog list etc since the premise of the proposal is to only load plugins if a command does not exist in KedroCLI.
I think it's possible to trigger the loading of plugins and display plugin commands with kedro --help but I would like to get the team's opinions on if we should be going ahead with this at all
cc @astrojuanlu @noklam @merelcht

@astrojuanlu
Copy link
Member

The python CLI has --help and --help-all. Maybe for the next breaking release we could make --help display only the "core" commands, and --help-all display everything.

For retaining the current behavior while attaining much faster load times for all the rest of commands, I'd be 👍🏼 on force-triggering loading of all plugins in kedro --help

@ankatiyar
Copy link
Contributor Author

Moving this back to the drafts, will close it if no other comments come in. I'll focus on getting kedro-org/kedro-viz#1920 and #3883 ready for review first!

@ankatiyar
Copy link
Contributor Author

Closing this in favour of the other solution!

@ankatiyar ankatiyar closed this Jun 21, 2024
@ankatiyar ankatiyar deleted the lazy-load-plugins branch August 21, 2024 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[spike] Improve Kedro CLI startup time
4 participants