Lightweight Kedro Viz Experimentation using AST #1966
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
I have only reviewed the high-level approach so far, without diving into the details. Can you tell me how I should test this PR, or how to test the parser separately?
Great work! Really happy to see this is getting to the finish line. This would be useful in many ways and speed up kedro viz a lot!
I approved with a minor comment as I don't want to block the PR.
I gave this a quick test and it works! 💯 Thanks @ravi-kumar-pilla!
Description
Related to #1742
Kedro-Viz has lots of heavy dependencies. At the same time, it needs to import the pipeline code to be able to function, even when doing an initial export with --save-file. This means that sometimes using Kedro-Viz is difficult or impossible if Viz dependencies clash with the project dependencies, which can happen often.
One example of that has been the push for Pydantic v2 support #1603.
Another example, @inigohidalgo says "due to the heavy deps from viz i usually have my dev venv but I create another one just for viz where i just install viz over whatever project I have installed, overriding the project's dependencies with viz's" and asks "do you know if anybody has tested using kedro viz as an "app", so installing it through pipx or smth similar? is that even possible with how viz works?". https://linen-slack.kedro.org/t/16380121/question-regarding-kedro-viz-why-is-there-a-restriction-on-p#38213e99-ba9d-4b60-9001-c0add0e2555b
The acceptance criteria for this is simple - As a user I shouldn't need a full Spark installation to view Kedro-Viz for a project which uses Spark to process data.
Development notes
Added an option `--lite` to the Kedro-Viz CLI. When users execute the command `kedro viz --lite`, it takes the approach described below - Using AST + Mock Imports:
Steps:
1. Parse the Kedro project using AST
2. Locate all the import statements
3. Try importing the located statements
4. Mock the dependencies in case of an import error
5. Patch sys modules with the mocked modules before retrieving the pipelines information from Kedro.
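The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `find_imported_modules` and `mock_missing_modules` are hypothetical, relative imports are skipped for simplicity, and the real parser may handle more cases.

```python
import ast
import importlib
import sys
from unittest.mock import MagicMock, patch


def find_imported_modules(source: str) -> set[str]:
    """Steps 1-2: parse the source with AST and collect absolute import targets."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped here
            modules.add(node.module)
    return modules


def mock_missing_modules(module_names: set[str]) -> dict[str, MagicMock]:
    """Steps 3-4: try each import and mock the ones that fail."""
    mocked: dict[str, MagicMock] = {}
    for name in sorted(module_names):
        try:
            importlib.import_module(name)
        except ImportError:
            # Mock the module and any parent package that is not importable,
            # so that "from pkg.sub import X" can be satisfied.
            parts = name.split(".")
            for i in range(1, len(parts) + 1):
                prefix = ".".join(parts[:i])
                if prefix not in sys.modules:
                    mocked.setdefault(prefix, MagicMock(name=prefix))
    return mocked


def load_with_mocks(source: str, loader):
    """Step 5: patch sys.modules with the mocks while `loader` runs."""
    mocked = mock_missing_modules(find_imported_modules(source))
    with patch.dict(sys.modules, mocked):
        return loader()
```

Here `loader` stands in for whatever retrieves the pipelines from Kedro. `patch.dict` restores `sys.modules` on exit, so the mocks do not leak into the rest of the process.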
Testing:
I have tested basic Kedro projects with -
`spark.driver.host` configured to localhost. The idea was to test Spark initialization both via hooks and outside of hooks.
Observations:
On macOS Sonoma (2.4 GHz 8-Core Intel i9, 64GB) - These observations might differ as my system was a bit slow while running the tests, but they should give a basic idea of performance. All the tests were run using `time <command>`. To summarize, `kedro viz --lite` was faster (~10-15 sec): although `get_mocked_modules()` took ~1-2 sec, initializing `DataCatalogLite` instead of `DataCatalog` saved time.
Note
I have also performed monitoring using `line_profiler`, which gave similar results. This ticket may not improve `kedro viz` performance, but it makes Kedro-Viz run with missing external dependencies. However, we can improve the overall performance once #1920 and #1920 (comment) are implemented.
Limitations:
1. If the datasets are not resolved in the catalog, they default to MemoryDataset.
2. Since MemoryDatasets do not have layer information, the layers will not be shown in the flowchart if the datasets are not resolved.
3. Experiment Tracking will not work if the datasets are not resolved and the prerequisite of having kedro-datasets version 2.1.0 or above is not met.
4. The metadata panel for a data node shows the data node type as MemoryDataset if the dataset is not resolved.
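To make the first, second, and fourth limitations concrete, here is a small stand-alone sketch of the fallback behaviour. It does not use Kedro and is not how `DataCatalogLite` is implemented: `DatasetNode`, `resolve_entry`, and the flat `layer` key are hypothetical names chosen for illustration only.

```python
import importlib
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DatasetNode:
    """Hypothetical summary of what the flowchart knows about a catalog entry."""
    name: str
    type_name: str
    layer: Optional[str]


def resolve_entry(name: str, config: dict[str, Any]) -> DatasetNode:
    """Try to import the configured dataset class; fall back when it is missing."""
    module_path, _, class_name = config["type"].rpartition(".")
    try:
        getattr(importlib.import_module(module_path), class_name)
    except (ImportError, AttributeError, ValueError):
        # Unresolved entry: reported as MemoryDataset, and the layer is lost.
        return DatasetNode(name, "MemoryDataset", None)
    return DatasetNode(name, class_name, config.get("layer"))
```

An entry whose `type` points at an installed class keeps its type name and layer, while an entry that points at a missing package is reported as `MemoryDataset` with no layer, which is what the metadata panel and flowchart would then display.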
Next Steps:
need to confirm with @stephkaiser
--lite
flag. Once this PR is merged and we have the above tasks complete, I will demo this feature in the Coffee chat (Sep 1st or 2nd week).
QA notes
Steps to test -
conda create -n viz-parser-test python=3.11
conda activate viz-parser-test
pip install kedro
git clone https://github.com/kedro-org/kedro-viz.git
cd kedro-viz
git checkout feature/kedro-viz-lite
pip install -e package
kedro new --starter=spaceflights-pandas
cd spaceflights-pandas
kedro viz
kedro viz --lite
Credits:
Checklist
RELEASE.md
file