Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KED-2821] Convert all imports according to packaging specs in pyproject.toml #1304

Closed
lorenabalan opened this issue Mar 1, 2022 · 7 comments

Comments

@lorenabalan
Copy link
Contributor

At kedro pipeline package time, check if there are any packaging specs in pyproject.toml (keys under [tool.kedro.pipeline.package]). If so, refactor all imports according to the aliasing made explicit in the manifest.

[tool.kedro.pipeline.package]
"pipelines.alpha" = {alias = beta}
"pipelines.alpha.utils" = {alias = sigma}
"other_package" = {alias = utilities, destination = "dist/libraries"}

In the example above, utils should be renamed to sigma (and any references inside it to alpha should refer to beta), alpha to beta (and any references inside it to utils should refer to sigma), etc.

❓ How to work out the ideal order?

❓ Would working on a shadow-copy of the whole (or part of the) project help here?

❓ Checkout how nested packages would work, e.g. the alpha and alpha.utils. Consider doing top-level packages first, and then the nested ones.

For more context see comments on the original issue, copied over below.

@lorenabalan
Copy link
Contributor Author

lorenabalan commented Mar 1, 2022

We talked about the experience of packaging/pulling modular pipelines, and how we can actually think of it as micro-packaging any part of the Python code in a Kedro project, at arbitrary depth in the nested folder structure. We can think of pyproject.toml as an additional requirements.txt, which deals with "internal" dependencies.

Worked example detailed in #1304 (comment).

Main ideas & proposed actions:

  • Switch from .whl to sdist (built distribution vs source distribution, which is what we actually need)

Our wheels currently look like two packages (one for tests, one for src), and because of that it wouldn't be guaranteed to work with any other Python package. The less customisation we have the better. sdist seems like the appropriate standard for this (sharing source code), and tests can be specified in sdist's metadata, rather than another package. Users can generate a wheel file from sdist, which makes it more flexible if they want to share wheels or source code with the client.

  • Allow alias to be a Python module path ↔️ can package any subsection of the project
  • Make packaging work with any path relative to src/<package_name>, i.e. not just pipelines. ↔️ can package utilities
  • Aliasing improvements
    • Phase I: Look at import paths: if self-contained to that module, package; else, raise error (upstream importing is not allowed).
    • Phase II (on top of phase I): Look at import paths: if in pyproject.toml, rework imports to new location/name; else, raise error.

Out of scope: When a path is added to the PYTHONPATH and the imports look like 3rd party, when they're actually from the local project, somewhere upstream. We treat it as third party and that's it.

Open question:
❓ How to declare the explicit dependency between micro-packages ❓
(also how do we know if the second micro-package should come as source code or directly as packaged asset?)

@lorenabalan
Copy link
Contributor Author

Translating whiteboard exercise

In a tree like:

<package_name>
|__ pipelines
     |__ clean_pipeline
          |__ nullify
               |__ pipeline.py
               |__ nodes.py
|__ utils
     |__ features
     |__ cleaning
           |__ impute
                |__ main.py
                |__ sth.py
           |__ nullify
                |__ func.py
                |__ sth.py

With a pyproject.toml like:

[tool.kedro.pipeline.package]
"utils.cleaning.impute" = {alias = "banking_impute"}
"utils.cleaning.nullify" = {alias = "banking_nullify"}

[tool.kedro.pipeline.pull]
"banking_impute" = {alias = "pkg.utils.cleaning"}

If we try to package nullify from clean_pipeline (i.e. pipelines.clean_pipeline.nullify), we look at the imports inside:

from pkg.pipelines.clean_pipeline.nulify import nodes  # can package, no problem
from pkg.utils.cleaning.impute import main  # turn into `from banking_impute import main`, then package
from pkg.utils.features.windowing import daily  # can't package, needs to be at least in `pyproject.toml`

As mentioned before, upstream imports are not allowed. In practice: if we try to package utils.cleaning.impute, the following import would make the packaging workflow error out:

from pkg.utils import another_utility

@lorenabalan lorenabalan added the Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation label Mar 2, 2022
daniel-falk pushed a commit to daniel-falk/kedro that referenced this issue Mar 5, 2022
Updated all docs and CI to follow the new github naming convention (main)
@merelcht merelcht added this to the 0.18+ milestone Mar 7, 2022
@noklam noklam moved this to In Progress in Kedro Framework Mar 24, 2022
@noklam noklam removed the status in Kedro Framework Mar 24, 2022
lvijnck pushed a commit to lvijnck/kedro that referenced this issue Apr 7, 2022
Updated all docs and CI to follow the new github naming convention (main)

Signed-off-by: Laurens Vijnck <laurens_vijnck@mckinsey.com>
@AhdraMeraliQB
Copy link
Contributor

Open question: ❓ How to declare the explicit dependency between micro-packages ❓ (also how do we know if the second micro-package should come as source code or directly as packaged asset?)

This question is being addressed separately in #1478

@merelcht merelcht removed this from the 0.18+ milestone Jul 13, 2022
@merelcht
Copy link
Member

merelcht commented Dec 14, 2022

This issue was discussed in Technical Design on 14/12

  • In Fix micropkg manifest section in pyproject.toml #2119 it was discovered that the feature to add micropkg specs in pyproject.toml hasn't worked since March 2022. Nobody has reported this.
  • The team agreed that before implementing more advanced functionality to the micropkg specs in pyproject.toml, we need to investigate if users are using this functionality. Investigate if micropkg spec in pyproject.toml is used #2123
  • It would also be worth talking to the Alloy team and hearing how they do packaging and if they find the micro-packaging functionality in Kedro useful or not.

@astrojuanlu
Copy link
Member

Is this whole import mangling worth the effort? Maybe we could tell users "refactor common code/functions that are not in the pipeline itself as a separate package, then declare it as a dependency".

(Simplistic question meant to spark discussion and, in line with what @merelcht said right above, talk to our users about the micropackaging workflow).

@merelcht
Copy link
Member

I think we need to decide what to do with the micropackaging feature as a whole before introducing any more features like the suggested one here.

@astrojuanlu
Copy link
Member

Closing for now, we can always revisit in the future

@astrojuanlu astrojuanlu closed this as not planned Won't fix, can't repro, duplicate, stale Dec 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants