Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor CLI with lazy subcommands and deferring imports #1920

Merged
merged 11 commits into from
Aug 29, 2024

Conversation

ankatiyar
Copy link
Contributor

@ankatiyar ankatiyar commented May 22, 2024

Description

Related to kedro-org/kedro#3883 and resolves a follow up action mentioned here

Thanks to @ankatiyar for suggesting the changes and making the POC

Development notes

  • Removed single package/kedro_viz/launchers/cli.py file which handles all Kedro-Viz CLI commands
  • Created multiple files to handle each Kedro-Viz CLI command (like package/kedro_viz/launchers/cli/build.py, run.py etc)
  • Deferred dependencies until they are required
  • Created LazyDefaultGroup class as mentioned here
  • Updated tests

Tip

Read on the docs for lazy-loading-subcommands and learn more about the initial proposal here

QA notes

  • All tests should pass
  • Test Kedro-Viz CLI commands (kedro viz -h, kedro viz, kedro viz run, kedro viz build, kedro viz deploy) with and without options
  • Make observations similar to the ones made below -

The lazy subcommand loading and refactoring of viz CLI helped in kedro init time as viz cli had some expensive imports all in one file cli.py.

Few observations on demo-project with my system running on macOS Sonoma (2.4 GHz 8-Core Intel i9, 64GB) -

( Comparing after system warmup (initial run) as I am not sure how to clear cache or make this a cold start in the CLI. Please let me know if there is a way to make a cold start again )

Before -

Time taken for kedro -h - 4sec

image

Time taken for kedro registry list - 8sec

image

Time taken for kedro viz run to reach the actual run method - 6sec

To run this, you need pyinstrument and speedscope

# Commands used for profiling

pyinstrument -r speedscope -o profile.speedscope.json -m kedro viz run
speedscope profile.speedscope.json
image

After -

Time taken for kedro -h - 2sec

image

Time taken for kedro registry list - 6sec

image

Time taken for kedro viz run to reach the actual run method - 2.5sec

To run this, you need pyinstrument and speedscope

# Commands used for profiling

pyinstrument -r speedscope -o profile.speedscope.json -m kedro viz run
speedscope profile.speedscope.json
image

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
@znfgnu
Copy link
Contributor

znfgnu commented Jun 20, 2024

I did something similar 1.5 years ago: #1196

@rashidakanchwala
Copy link
Contributor

Closing this for now as Ankita can no longer work on this. This issue is tracked here - #1952

@ravi-kumar-pilla ravi-kumar-pilla changed the title try delaying import + lazy subcommands Refactor CLI with lazy subcommands and deferring imports Aug 16, 2024
@ravi-kumar-pilla
Copy link
Contributor

ravi-kumar-pilla commented Aug 16, 2024

Extension : Future possible improvements

Note

This PR will not improve the kedro viz run time after the run command is reached. However, it will indirectly improve the time taken for the CLI (Kedro and Kedro-Viz) commands

To improve kedro viz run time after run command is reached, we can explore and experiment with changes I started here based on the flame graph

Some of the observations made while working on improving performance for Kedro-Viz -

image
  • Module imports related to experiment tracking, strawberry graphQL (2-3sec) - We need to see if we can delay graphql call until someone clicks on ET, remove getVersions call (a middleware which ingests graphQL api app once a request for graphQL is made)
  • Separate api app (main api + additional api) - This will help remove dependencies involved for other apis
  • Module imports related to deployer factory (2-3sec) - Deferring deployer factory imports until a /deploy call is made
  • Separate responses module and try to get the main route working quickly - This needs refactoring and breaking down our responses module.
  • NetworkX takes time for import - Deferring the networkX import until used
  • SQLiteStore initialization also takes time (reason for configure and bootstrap project taking time) - Need to find a way to defer store initialization (likely a North star)
  • Apart from all the above improvements, uvicorn.run takes ~10sec to make the app available. Need to experiment by making a lighter app, reducing api routes and dependencies.

Thank you

@ravi-kumar-pilla
Copy link
Contributor

@ankatiyar I am not able to add you as reviewer, could you please review the PR when free. Thank you

Copy link
Contributor Author

@ankatiyar ankatiyar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't approve my own PR but it looks good to me!

@noklam
Copy link
Contributor

noklam commented Aug 27, 2024

If I understand correctly, before #794 goes in, only the defer import will help but not so much with lazy subcommand?

What should I focus on review or what do you want me to review? The PR is huge in terms of code change, would be great if you can comment on this.

@ravi-kumar-pilla
Copy link
Contributor

If I understand correctly, before #794 goes in, only the defer import will help but not so much with lazy subcommand?

Yes, the refactor and deferred imports help both kedro commands and kedro viz commands.

What should I focus on review or what do you want me to review? The PR is huge in terms of code change, would be great if you can comment on this.

The code change looks big but if you can check the development notes, most of this is -

  1. Splitting the existing code into multiple files - this will help load modules that are needed for the particular command
  2. Deferring imports - Some heavy imports like Azure Exceptions are moved into the functions where they are needed
  3. When the user executes kedro registry list or any subcommand other than viz, only main.py of Kedro-Viz CLI is invoked (this has the entry point). Earlier we had everything in one file and this was also slowing down kedro commands.
  4. Introduce LazyDefaultGroup to support lazy subcommands and default command run for kedro viz (i.e., kedro viz run and kedro viz do the same thing).

As a code reviewer, I am expecting you to -

  1. Test Kedro-Viz CLI commands (kedro viz -h, kedro viz, kedro viz run, kedro viz build, kedro viz deploy) with and without options (atleast check if the intended function is triggered)
  2. If you have time, make observations as mentioned in the QA notes

Thank you

@ravi-kumar-pilla ravi-kumar-pilla requested review from jitu5 and noklam and removed request for merelcht and ravi-kumar-pilla August 28, 2024 14:20
Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! i have test it with the main and lazy-subcommands branch. It's definitely faster when I do kedro run and a couple of other commands.

Big change but I think the new structure is much better, we could have break down the cli into smaller pieces earlier. We found out similar opportunities when I am working on the extension, some module could be made smaller.

Nice work!

Copy link
Contributor

@jitu5 jitu5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ravi-kumar-pilla and @ankatiyar. 🎉

Copy link
Contributor

@rashidakanchwala rashidakanchwala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. thanks @ravi-kumar-pilla , and @ankatiyar

@ravi-kumar-pilla ravi-kumar-pilla merged commit 084d3e7 into main Aug 29, 2024
33 checks passed
@ravi-kumar-pilla ravi-kumar-pilla deleted the lazy-subcommands branch August 29, 2024 21:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants