kedro micropkg pull is slow #3457

Closed
astrojuanlu opened this issue Dec 21, 2023 · 1 comment
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@astrojuanlu
Member

Description

As per the title: `kedro micropkg pull` is slow.

Context

The reason is that, to account for both setup.py and pyproject.toml in a generic way, the current code builds a wheel and then extracts the metadata:

# This is much slower than parsing the requirements
# directly from the metadata files
# because it installs the package in an isolated environment,
# but it's the only reliable way of doing it
# without making assumptions on the project metadata.
library_meta = project_wheel_metadata(project_root_dir)

This can take around 5 seconds, and longer on a slower system.
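To illustrate what "extracts the metadata" amounts to once the wheel has been built: core metadata is stored in an email-header-style format, and the requirements live in repeated `Requires-Dist` fields, which is what the `get_all("Requires-Dist", [])` call further down reads. The following is a minimal sketch using only the standard library; the sample metadata is made up for illustration and is not taken from a real package.

```python
from email.parser import Parser

# Hypothetical core metadata, similar in shape to a wheel's METADATA file.
SAMPLE_METADATA = """\
Metadata-Version: 2.1
Name: my-pipeline
Version: 0.1
Requires-Dist: pandas>=1.0
Requires-Dist: pytest; extra == "test"
"""

# Parse the header-style text into a message object.
metadata = Parser().parsestr(SAMPLE_METADATA)

# Each Requires-Dist field is one requirement string, markers included.
requirements = metadata.get_all("Requires-Dist", [])
print(requirements)
```

Reading these fields is cheap. The slow part is building the wheel in an isolated environment just to produce them.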

This was introduced in #2614

Possible Implementation

This is a proof-of-concept implementation. However, it only works for pyproject.toml projects:

diff --git a/kedro/framework/cli/micropkg.py b/kedro/framework/cli/micropkg.py
index dcdfc5d99e..1bf894e227 100644
--- a/kedro/framework/cli/micropkg.py
+++ b/kedro/framework/cli/micropkg.py
@@ -17,13 +17,13 @@
 from omegaconf import OmegaConf
 from packaging.requirements import InvalidRequirement, Requirement
 from packaging.utils import canonicalize_name
+from pyproject_metadata import StandardMetadata
 from rope.base.project import Project
 from rope.contrib import generate
 from rope.refactor.move import MoveModule
 from rope.refactor.rename import Rename
 from setuptools.discovery import FlatLayoutPackageFinder
 
-from build.util import project_wheel_metadata
 from kedro.framework.cli.pipeline import (
     _assert_pkg_name_ok,
     _check_pipeline_name,
@@ -212,12 +212,8 @@ def _pull_package(  # noqa: PLR0913
             )
         project_root_dir = contents[0]
 
-        # This is much slower than parsing the requirements
-        # directly from the metadata files
-        # because it installs the package in an isolated environment,
-        # but it's the only reliable way of doing it
-        # without making assumptions on the project metadata.
-        library_meta = project_wheel_metadata(project_root_dir)
+        with open(project_root_dir / "pyproject.toml") as fh:
+            library_meta = StandardMetadata.from_pyproject(toml.load(fh))
 
         # Project name will be `my-pipeline` even if `pyproject.toml` says `my_pipeline`
         # because standards mandate normalization of names for comparison,
@@ -967,10 +963,7 @@ def _get_all_library_reqs(metadata):
     """Get all library requirements from metadata, leaving markers intact."""
     # See https://discuss.python.org/t/\
     # programmatically-getting-non-optional-requirements-of-current-directory/26963/2
-    return [
-        str(_EquivalentRequirement(dep_str))
-        for dep_str in metadata.get_all("Requires-Dist", [])
-    ]
+    return [str(_EquivalentRequirement(str(dep))) for dep in metadata.dependencies]
 
 
 def _safe_parse_requirements(
diff --git a/pyproject.toml b/pyproject.toml
index 1f83c98fc7..aa18a99eb0 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -13,7 +13,6 @@ description = "Kedro helps you build production-ready data and analytics pipelin
 requires-python = ">=3.8"
 dependencies = [
     "attrs>=21.3",
-    "build>=0.7.0",
     "cachetools>=4.1",
     "click>=4.0",
     "cookiecutter>=2.1.1,<3.0",
@@ -29,6 +28,7 @@ dependencies = [
     "parse>=1.19.0",
     "pluggy>=1.0",
     "pre-commit-hooks",
+    "pyproject_metadata",
     "PyYAML>=4.2,<7.0",
     "rich>=12.0,<14.0",
     "rope>=0.21,<2.0",  # subject to LGPLv3 license
diff --git a/tests/framework/cli/micropkg/test_micropkg_pull.py b/tests/framework/cli/micropkg/test_micropkg_pull.py
index 13754d9503..0b88dd1e65 100644
--- a/tests/framework/cli/micropkg/test_micropkg_pull.py
+++ b/tests/framework/cli/micropkg/test_micropkg_pull.py
@@ -584,17 +584,6 @@ def test_pull_from_pypi(
             return_value=tmp_path,
         )
 
-        # Mock needed to avoid an error when build.util.project_wheel_metadata
-        # calls tempfile.TemporaryDirectory, which is mocked
-        class _FakeWheelMetadata:
-            def get_all(self, name, failobj=None):
-                return []
-
-        mocker.patch(
-            "kedro.framework.cli.micropkg.project_wheel_metadata",
-            return_value=_FakeWheelMetadata(),
-        )
-
         options = ["-e", env] if env else []
         options += ["--alias", alias] if alias else []

A backwards-compatible approach would need

  • To retain build as a dependency,
  • Conditionally try to locate the PEP-621 metadata in pyproject.toml and otherwise use the slow codepath, and
  • Test both things.
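The conditional lookup described in the list above could be sketched roughly as follows. This is not Kedro's code: `static_dependencies`, `get_library_requirements`, and the `slow_path` callback are hypothetical names, and the sketch assumes `tomllib` (Python 3.11+) or the `tomli` backport is available. In Kedro, `slow_path` would presumably wrap the existing `project_wheel_metadata` call.

```python
from pathlib import Path
from typing import Callable, List, Optional

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: backport installed on older Pythons


def static_dependencies(pyproject: dict) -> Optional[List[str]]:
    """Return the PEP 621 dependencies, or None if they are not static."""
    project = pyproject.get("project")
    if project is None:
        return None  # no [project] table, e.g. metadata lives in setup.py
    if "dependencies" in project.get("dynamic", []):
        return None  # computed by the build backend, so a build is needed
    return list(project.get("dependencies", []))


def get_library_requirements(
    project_root: Path, slow_path: Callable[[Path], List[str]]
) -> List[str]:
    """Try the fast static read first, otherwise use the slow codepath."""
    pyproject_file = project_root / "pyproject.toml"
    if pyproject_file.is_file():
        with pyproject_file.open("rb") as fh:
            deps = static_dependencies(tomllib.load(fh))
        if deps is not None:
            return deps
    # Hypothetical fallback: build the wheel and read Requires-Dist from it.
    return slow_path(project_root)
```

Passing the slow codepath in as a callback keeps both branches easy to test separately, which covers the third point in the list.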

Possible Alternatives

@astrojuanlu added the "Issue: Feature Request" label on Dec 21, 2023
@astrojuanlu
Member Author

We decided in #3750 to deprecate kedro micropkg so we will not do this.

@astrojuanlu closed this as not planned on May 6, 2024