Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor backend integration with Kedro by replacing bootstrap_project with configure_project #1796

Merged
merged 20 commits into from
Apr 24, 2024

Conversation

rashidakanchwala
Copy link
Contributor

@rashidakanchwala rashidakanchwala commented Mar 7, 2024

Description

  • This ticket refactors backend integration with Kedro Project by replacing bootstrap_project with configure_project

Development notes

Since Kedro Viz CLI commands are part of KedroCLI, starting the backend server on viz does not need to bootstrap the Kedro project. We need to use the configure_project(package_name) instead of bootstrap. Since Kedro Viz server starts on a spawn process, we need to pass the package_name info via arguments to run_server.

  • Add package_name arg to run_server and pass down the argument to configure the kedro project during integration with kedro
  • Retain bootstrap_project when running viz in dev mode/when package_name is None
  • Update launchers cli.py and jupyter.py with package_name arg
  • Update kedro viz deploy and kedro viz build to include package_name arg

QA notes

  • All tests should pass

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

merelcht added 2 commits March 7, 2024 12:51
Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
@rashidakanchwala rashidakanchwala changed the title Refactor - Replace bootstrap(project) with configure Refactor - Replace bootstrap(project) with configure project Mar 7, 2024
@noklam
Copy link
Contributor

noklam commented Mar 13, 2024

Copy the summary of discussion in Slack here:

After some debugging with
@Merel
, we found out what's the problem and have a solution now. The story turns out to be quite complicated:

  • bootstrap_project was run properly, so kedro-viz CLI is part of KedroCLI
  • Multiprocess behaves differently in Mac & Window (spawn process), on Linux it's "fork", so when we are testing things, CI is passing but fails locally.
  • "spawn" process will re-import every library, which means PACKAGE_NAME get reset as None
    The solution is replacing bootstrap_project with configure_project(project_name), the PACKAGE_NAME should be add as an argument to run_server. (Look at ParallelRunner in Kedro to see how this is handle, https://github.com/kedro-org/kedro/blob/da709d4316c141c5a7d6f676a87a5752807b33f4/kedro/runner/parallel_runner.py#L4)

Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
@ravi-kumar-pilla ravi-kumar-pilla changed the title Refactor - Replace bootstrap(project) with configure project Refactor backend integration with Kedro by replacing bootstrap_project with configure_project Apr 4, 2024
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
@ravi-kumar-pilla ravi-kumar-pilla added Enhancement Python Pull requests that update Python code labels Apr 4, 2024
@ravi-kumar-pilla ravi-kumar-pilla marked this pull request as ready for review April 4, 2024 20:13
@ravi-kumar-pilla ravi-kumar-pilla requested review from noklam, merelcht and SajidAlamQB and removed request for NeroOkwa and ravi-kumar-pilla April 4, 2024 20:14
@ravi-kumar-pilla ravi-kumar-pilla marked this pull request as draft April 15, 2024 14:07
@ravi-kumar-pilla
Copy link
Contributor

Hi Team,

To resolve #1761, before we spawn a new process for viz, we need to check if the path/current dir has a valid kedro project (we do not want to call configure_project or bootstrap as they do more than what we want at this point). The other 2 alternatives I saw were -

  1. Either use _get_project_metadata(Path.cwd()) (which raises an exception) -
RuntimeError: Could not find the project configuration file 'pyproject.toml' in /Users/Ravi_Kumar_Pilla/Library/CloudStorage/OneDrive-McKinsey&Company/Documents/Kedro/KedroOrg/kedro-viz. If you have created your project with Kedro version <0.17.0, make sure to update your project template. See https://github.com/kedro-org/kedro/blob/main/RELEASE.md#migration-guide-from-kedro-016-to-kedro-0170 for how to migrate your Kedro project.)
  1. Use _find_kedro_project method from kedro.utils

We can have either of these in kedro_viz.launchers.cli before we start the process

    1. # _get_project_metadata(Path.cwd())
    
    2.  # project_path = _find_kedro_project(Path.cwd())
        # if not project_path:
        #     display_cli_message(
        #         "ERROR: Unable to find Kedro project.",
        #         "red"
        #     )
        #     return

Since both of them are private and internal, I checked with @ankatiyar . She suggested - since the logic of _find_kedro_project method is not complex, we can implement similar logic on Kedro-Viz side to avoid coupling with importing a private function.

My thoughts - Kedro-Viz already uses few private methods of Kedro, so we can actually use either of the above 2 approaches (I would recommend using _find_kedro_project and make viz start with sub-directories). At the same time, we can reduce coupling by implementing logic directly on viz side as Ankita suggested.

cc: @rashidakanchwala @astrojuanlu

@astrojuanlu
Copy link
Member

She suggested - since the logic of _find_kedro_project method is not complex, we can implement similar logic on Kedro-Viz side to avoid coupling with importing a private function.

Any chance to make that function public on Kedro?

we can implement similar logic on Kedro-Viz side to avoid coupling with importing a private function.

Tempting, but I think it should be kedro responsibility to determine what a Kedro project is, and not having Kedro-Viz duplicate logic.

My 2 cents. What do you think @merelcht ?

@merelcht
Copy link
Member

She suggested - since the logic of _find_kedro_project method is not complex, we can implement similar logic on Kedro-Viz side to avoid coupling with importing a private function.

Any chance to make that function public on Kedro?

we can implement similar logic on Kedro-Viz side to avoid coupling with importing a private function.

Tempting, but I think it should be kedro responsibility to determine what a Kedro project is, and not having Kedro-Viz duplicate logic.

My 2 cents. What do you think @merelcht ?

I would actually prefer the opposite 😅 I would not be so quick to just make this method public and increasing the public API and would argue that this logic belongs as much to the framework as it does to Kedro Viz. Kedro Viz currently depends on there being a Kedro project, so I feel like it's not that weird to implement logic on the viz side to detect if a Kedro project is present. Using the private method is indeed the least preferred implementation, because of the coupling it introduces.

@astrojuanlu
Copy link
Member

@merelcht's opinion prevails 😄 I don't have a strong preference

@astrojuanlu
Copy link
Member

@ravi-kumar-pilla
Copy link
Contributor

She suggested - since the logic of _find_kedro_project method is not complex, we can implement similar logic on Kedro-Viz side to avoid coupling with importing a private function.

Any chance to make that function public on Kedro?

we can implement similar logic on Kedro-Viz side to avoid coupling with importing a private function.

Tempting, but I think it should be kedro responsibility to determine what a Kedro project is, and not having Kedro-Viz duplicate logic.
My 2 cents. What do you think @merelcht ?

I would actually prefer the opposite 😅 I would not be so quick to just make this method public and increasing the public API and would argue that this logic belongs as much to the framework as it does to Kedro Viz. Kedro Viz currently depends on there being a Kedro project, so I feel like it's not that weird to implement logic on the viz side to detect if a Kedro project is present. Using the private method is indeed the least preferred implementation, because of the coupling it introduces.

Thank you @merelcht for responding. While I agree the logic is also a responsibility of plugin developer, on maintenance front where in future if the logic of finding a kedro project changes (may be kedro framework team decides on having a new template), the plugin developer should also update the logic which duplicates the code twice.

I do not completely understand why should this util function is not made public. Could you please let me know your thoughts on this. Thank you

@merelcht
Copy link
Member

I do not completely understand why should this util function is not made public. Could you please let me know your thoughts on this. Thank you

In my opinion, these kind of utility functions that hold a lot of logic for internal specifics of Kedro just don't belong in a public API. Making something like this public would mean that if we ever decide to change it we'd have to do a major breaking release on Kedro. It would make it very hard to change internal logic without breaking releases. IMO breaking changes should serve a clear and useful purpose for the majority of our users, who are first and foremost, the Framework users and these users generally don't use or care at all for internal utility functions.

However, I do think we need to make a proper effort of de-coupling Kedro Viz and Kedro where possible and critically review how the Kedro API is used in Viz to assess how we can make sustainable version on the Kedro side to prevent situations like the one here and what we're seeing with the transcoding imports as well.

@rashidakanchwala
Copy link
Contributor Author

Thanks, @merelcht. I also believe these utility functions don't change significantly. Therefore, if we duplicate the above two functions, we could do the same for transcoding functions. Besides these, I don't think there are any other private functions that Kedro-viz imports.

Of course, we still import public classes and use private methods of those classes. But that's a more difficult problem to solve and need to be solved another way.

…periments-merel

Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
…periments-merel

Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
@ravi-kumar-pilla ravi-kumar-pilla marked this pull request as ready for review April 18, 2024 20:08
@rashidakanchwala rashidakanchwala marked this pull request as draft April 22, 2024 11:40
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
Signed-off-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com>
@ravi-kumar-pilla ravi-kumar-pilla marked this pull request as ready for review April 22, 2024 17:01
@noklam
Copy link
Contributor

noklam commented Apr 22, 2024

To resolve #1761, before we spawn a new process for viz, we need to check if the path/current dir has a valid kedro project

Can you explains why is this the case? Reading the PR, it's still using the configure_project way that I suggested. I am getting lost in the discussion since the PR was created so long ago, what's changed? Can you list out the requirements if this solution actually doesn't work?

For the record: kedro-mlflow also importing the private API https://github.com/Galileo-Galilei/kedro-mlflow/pull/539/files#diff-1391ca4513935426e2b4545a1f6f052cc12e0814ea9d3b21c8a16aa2d0de6baa

Why do kedro-mlflow need that? This may have changed since Kedro CLI can be run anywhere inside a repository now.

@ravi-kumar-pilla
Copy link
Contributor

To resolve #1761, before we spawn a new process for viz, we need to check if the path/current dir has a valid kedro project

Can you explains why is this the case? Reading the PR, it's still using the configure_project way that I suggested. I am getting lost in the discussion since the PR was created so long ago, what's changed? Can you list out the requirements if this solution actually doesn't work?

For the record: kedro-mlflow also importing the private API https://github.com/Galileo-Galilei/kedro-mlflow/pull/539/files#diff-1391ca4513935426e2b4545a1f6f052cc12e0814ea9d3b21c8a16aa2d0de6baa

Why do kedro-mlflow need that? This may have changed since Kedro CLI can be run anywhere inside a repository now.

I have updated the description and this PR aims at moving away from bootstrap_project as you suggested @noklam . I have another PR in draft which solves #1761 .

Copy link
Contributor Author

@rashidakanchwala rashidakanchwala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't approve since its my PR but LGTM!

Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. I didn't test it manually as I have already done it initially and looks like the implementation doesn't change much.

@ravi-kumar-pilla ravi-kumar-pilla merged commit d5a8b83 into main Apr 24, 2024
47 checks passed
@ravi-kumar-pilla ravi-kumar-pilla deleted the experiments-merel branch April 24, 2024 16:50
This was referenced May 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Python Pull requests that update Python code
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

5 participants