feat: enable local packages and PROJECT_ROOT expansion in requirements files #2517

Open
wants to merge 3 commits into
base: main
from

Conversation

pamaury

@pamaury pamaury commented Dec 20, 2024

The goal of this PR is to make working with local packages easier.

Motivation:

We have a large multilanguage repository with a nontrivial amount of Python code. Unfortunately, most of the Python developers do not like or use Bazel; they prefer to run the code directly and/or would like it to be managed using pip (e.g. pip install -e). On the other hand, some of the Python libraries are also used by Bazel-managed py_binary targets. As a result, the BUILD files frequently become out of date and the Python developers are unhappy because the Python codebase is not managed by pip. To make matters worse, it is very easy to escape the sandbox in Python by importing files based on the __file__ path. As a result, a Python script can work in Bazel even with missing dependencies, leading to subtle caching bugs. The ideal situation would be for the Python codebase to become a local Python package which is picked up by pip and made available to the Bazel codebase like any other package.

Changes:

pip already knows how to install local packages. There are, however, two missing pieces to make it work in Bazel:

  • rules_python should ask bazel to watch the directory so that the wheel gets rebuilt when the content has changed. This PR makes whl_library automatically watch any package specified using a file:... path.
  • pip does not accept relative paths with the file:// protocol, which makes dealing with local packages a bit tricky. Since environment variables are expanded by pip in requirements lock files, the usual approach is to use an environment variable to point to the top of the project. It seems that poetry, pdm and uv have converged on the PROJECT_ROOT variable which is set automatically. This PR adds a new project_root attribute to whl_library and pip.parse which, if set, will automatically set the PROJECT_ROOT env var. By default, the PROJECT_ROOT variable is not set.
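To make the second point concrete, here is a minimal Python sketch of the kind of `${VAR}` expansion pip applies to requirement lines. It is an illustration only, not pip's or rules_python's actual code: the function name `expand_requirement` and the `/work/repo` path are hypothetical, and the pattern assumes pip's documented POSIX-style format (braces around an uppercase name).

```python
import re

# Assumption: only ${NAME} with uppercase letters, digits and underscores
# is expanded, mirroring the format pip documents for requirements files.
_ENV_VAR = re.compile(r"\$\{([A-Z0-9_]+)\}")


def expand_requirement(line: str, env: dict) -> str:
    """Expand ${VAR} references in a requirement line.

    Variables that are not present in `env` are left untouched.
    """

    def _substitute(match):
        name = match.group(1)
        return env.get(name, match.group(0))

    return _ENV_VAR.sub(_substitute, line)


# Hypothetical project root, used only for this illustration.
env = {"PROJECT_ROOT": "/work/repo"}
expanded = expand_requirement("hello @ file://${PROJECT_ROOT}/hello", env)
print(expanded)  # hello @ file:///work/repo/hello
```

Because the variable expands to an absolute path, the resulting URL has an empty host part (`file:///...`), which is the form pip accepts for local packages.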

Examples:
This PR adds an example of a local package installed by pip.parse. The example showcases the project_root attribute as well. To make it more interesting, the local package exports an entry point and depends on another Python package. Two tests are provided in the example.

Issues/limitations/Comments:

There is some ambiguity about how local packages should be specified in requirements files. As far as I understand, the only officially supported way is to use the file:// protocol, which is explicitly supported in the specification. Support for any other non-standard paths is very spotty anyway in the Python ecosystem, and subtly broken in different ways in most tools. This PR does not add any non-standard behaviour and sticks to the file:// protocol.

The requirement lock file was compiled by uv. It seems that modern managers like uv, poetry and pdm are aware of environment variables: they expand them internally when resolving but keep them intact when writing the requirement lock file. This is not the case for pip-compile, which expands them to a full path. I guess this is why rules_python automatically rewrites absolute paths after running pip-compile. Unfortunately this breaks in the presence of file:. Here is an example:

# In pyproject.toml:
file://${PROJECT_ROOT}/hello
# Expanded by pip-compile:
file:///absolute/path/to/hello
# Rewritten by rules_python:
file://hello

But now file://hello is considered invalid by pip: non-local file URIs are not supported on this platform.
As a result, the requirement file cannot be created by the compile_pip_requirements rule. I am not sure how this should be fixed.
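A quick way to see why the rewritten form is rejected is to parse both URLs with Python's standard `urllib.parse`. This is only an illustration of generic URL parsing, not of pip's internal code path: in `file://hello`, the segment after `//` is read as a host name rather than as a path, so the URL looks like a reference to a remote machine.

```python
from urllib.parse import urlsplit

# Three slashes: empty host, absolute path.
absolute = urlsplit("file:///absolute/path/to/hello")
# Two slashes: "hello" is parsed as the host and the path is empty.
relative = urlsplit("file://hello")

for parts in (absolute, relative):
    print(f"host={parts.netloc!r} path={parts.path!r}")
```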

This PR may be incompatible with #2345 but I think it may supersede it in a way: if a wheel file is passed with a file:// URL then it will be watched by Bazel (the code does not care whether it is a file or a directory), so this should produce exactly the same result. I haven't yet confirmed that this is the case.

@pamaury pamaury force-pushed the enable_local_packages branch 2 times, most recently from dd19d39 to 5d9e158 Compare December 20, 2024 13:43
Signed-off-by: Amaury Pouly <amaury.pouly@lowrisc.org>
@pamaury pamaury force-pushed the enable_local_packages branch from 5d9e158 to 8e5751d Compare December 20, 2024 14:08
Collaborator

@aignas aignas left a comment


The description of your situation sounds like the problem is not that rules_python is lacking features but rather that the organization you are in does not like/prefer Bazel as a build system, and I think the solution to that problem is non-technical. I very well understand this pain, but I am not sure rules_python features will ever solve this fully.

In order to make rules_python more maintainable we would like to keep it focused only on standards, and it seems that the PROJECT_ROOT concept is not something that I can easily find when googling. That means that the project would have the burden of teaching users what this new rules_python-specific thing means and how to use it in their setup.

  • uv treats PROJECT_ROOT as an env var, but I don't see any docs in their GH project except for this comment here
  • poetry doesn't have any mentions of that in their GH project
  • pdm seems to have it documented according to their GH project

What is more, I think that staying compatible with #2345 is something that I would prefer.

@pamaury, if you had #2345 available as part of rules_python feature set, would it be useful to you?

Other ways to make the PROJECT_ROOT work would be to extend the envsubst to also expand the requirements sources instead of creating an extra attribute just for a single string value.

if subst_req.startswith("file://"):
    _, path = subst_req.split("file://", 1)
    logger.info(lambda: "watching tree {} for wheel library {}".format(path, rctx.name))
    rctx.watch_tree(path)
Collaborator


I am not sure if this is the correct behaviour - will we start watching more than just the intended python files? What if the path is not a dir? The watch_tree will fail.

Author


Yes, if this is not a directory then the correct behaviour should be to fall back to what #2345 does, or to use watch instead of watch_tree. And yes, it could end up watching more than the intended Python files, but as far as I know pip simply copies all of the files as well, so I think this is the correct thing to do in that sense. This is definitely a trade-off: using a local package means giving up on precise dependency tracking of the files within that package.
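As a rough illustration of the file-versus-directory fallback discussed here, the decision can be sketched in plain Python. This is not the Starlark code from the PR: the helper names `local_path_from_requirement` and `watch_mode` are hypothetical, and the returned strings merely name which of the repository-context calls mentioned in this thread (watch or watch_tree) would apply.

```python
import os
import tempfile


def local_path_from_requirement(requirement: str):
    """Return the path of a '[<name> @] file://<path>' requirement, else None."""
    _, sep, path = requirement.partition("file://")
    return path.strip() if sep else None


def watch_mode(path: str) -> str:
    """Choose how a local requirement path would be watched."""
    if os.path.isdir(path):
        return "watch_tree"  # source directory: watch everything below it
    if os.path.isfile(path):
        return "watch"  # single file such as a wheel: watch only that file
    return "skip"  # missing path: nothing to watch


with tempfile.TemporaryDirectory() as root:
    # Create an empty placeholder file standing in for a wheel.
    wheel = os.path.join(root, "hello-1.0-py3-none-any.whl")
    open(wheel, "wb").close()

    dir_mode = watch_mode(local_path_from_requirement(f"hello @ file://{root}"))
    file_mode = watch_mode(local_path_from_requirement(f"file://{wheel}"))
    print(dir_mode, file_mode)
```

Branching on the path type avoids the failure mode raised above, where a tree watch is requested on something that is not a directory.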

@@ -193,6 +193,10 @@ def _whl_library_impl(rctx):
    # Manually construct the PYTHONPATH since we cannot use the toolchain here
    environment = _create_repository_execution_environment(rctx, python_interpreter, logger = logger)

    # Add a PROJECT_ROOT environment variable
    if rctx.attr.project_root:
        environment["PROJECT_ROOT"] = str(rctx.path(rctx.attr.project_root).dirname)
Collaborator


Why do we need an extra value here instead of adding PROJECT_ROOT into envsubst attr?

# Assume that such packages are imported using a single line requirement
# of the form [<name> @] file://<path>
# We might have to perform some substitutions in the string before searching.
subst_req = envsubst(rctx.attr.requirement, environment.keys(), lambda x, dft: environment[x])
Author

pamaury commented Dec 22, 2024

Thank you for your comments. I understand that PROJECT_ROOT is not standard and that this is not something rules_python would like to have. My understanding is that #2345 only supports wheel files and not directories, which makes it not ideal. If #2345 supported local directories by watching them automatically then I think it would be great.

The issue with PROJECT_ROOT is that it is supposed to point to the project root of a given Python requirement file, so every pyproject.toml is basically its own root and this cannot be handled by a global PROJECT_ROOT env var. An alternative I thought about was to add an attribute like environment but where the right-hand side is a label:

pip.parse(
    hub_name = "pypi",
    # If we want to expand the PROJECT_ROOT variable in the requirement lock file,
    # we need to pass a label to the file defining the project root.
    envlabels = {
        "PROJECT_ROOT": "//:pyproject.toml",
    },
    # We need to use the same version here as in the `python.toolchain` call.
    python_version = "3.9.13",
    requirements_lock = "//:requirements_lock.txt",
)

which has the advantage of being more general, although this is still quite a specific use case. That being said, the environment variable problem is less of an issue because I think I could work around it by creating a custom repository rule for the sole purpose of expanding this variable and then make this expanded version the input to pip.parse.

Author

pamaury commented Jan 1, 2025

@aignas I have thought about the problem a bit more. I think the PROJECT_ROOT env var is orthogonal and I can easily create a custom repository rule to pre-process the python requirement file. However, there is still the issue of having a path to a directory in the requirement file. I understand that watch_tree may be too invasive in general for that. Maybe an alternative solution would be something similar to #2345 except that instead of providing a label of a wheel file, we pass a label to a repository? This way one can manually create a repository for local packages the way they want.

Collaborator

aignas commented Jan 2, 2025

Thanks. The problem is that there is no way to provide a label to a repository. We can only provide a label to something within the repository.

I'm still wondering if this is being solved in the right domain. This sounds a little bit like what gazelle is trying to do: make Python code automatically available as Bazel targets. Having such code in the repository rule or module extension stage is high maintenance and in the long term it doesn't really work. Hence my (and other maintainers') preference to keep the logic to a minimum.

Is using gazelle out of question in your case?
