Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

import rules_python doesn't work under bzlmod #1679

Open
rickeylev opened this issue Jan 10, 2024 · 16 comments
Open

import rules_python doesn't work under bzlmod #1679

rickeylev opened this issue Jan 10, 2024 · 16 comments
Labels

Comments

@rickeylev
Copy link
Contributor

rickeylev commented Jan 10, 2024

🐞 bug report

This is the rules_python issue for bazelbuild/bazel#18128

Affected Rule

Import the runfiles library via the @rules_python//python/runfiles depending as import rules_python.python.runfiles

Is this a regression?

Effectively, yes. The established name for the libraries rules_python provides are under the "rules_python" namespace.

Description

In bzlmod, import rules_python.python.runfiles doesn't work. It works for workspace builds.

This is because, in bzlmod, the directory names of repos in runfiles changes from "foo" to "foo~version" (where version is the module version plus some other parts). Such a name isn't importable because it can't be known in advance, and isn't a valid python package a name.

🔬 Minimal Reproduction

  • Create a py_binary
  • Depend on @rules_python//python/runfiles
  • Do import rules_python

An import error will occur.

Analysis

Fixing this is sort of straightforward -- we need to get a directory onto sys.path that has a rules_python sub-directory.

Some complicating factors are:

  1. We need import python.runfiles to continue working. This is the name we told people to use in the meantime, so we'll need to stick with that.
  2. We want to avoid double imports, e.g. the same file being compiled twice; this is what happens if you do "import rules_python.python.runfiles.runfiles" and "import python.runfiles.runfiles" in a workspace build today (Python keys things by the module name, not the underlying file name)
  3. Repo roots are still on sys.path and explicit init generation is still enabled by default. This means, wherever we create a sub directory, its top-most directory is going to end up on sys.path. This means a standard src-based layout might be prone to breaking things, as dependents might be relying on import src.bla.bla themselves. This came up via slack, where I've seen people recommend using a repo-relative import name like import src.bla.
  4. Pip-based dependencies use import runfiles, so we have to be not interfere with that
  5. The runfiles code uses its own path to help location the runfiles root, so make sure to update that; this is an internal detail; just noting it for myself
  6. Personally, I worry that something is relying on the $repoRoot/python/runfiles/runfiles.py file existing. So I'm hesitant to move that file. I don't know what or why that would be the case though, so perhaps I'm being
@aiuto
Copy link
Contributor

aiuto commented Jan 10, 2024

cc: @aiuto

@matts1
Copy link
Contributor

matts1 commented Jan 10, 2024

FWIW, in ChromeOS we worked around the issue of repo mapping by creating a sitecustomize.py that reads the repo_mapping file to determine what sys.modules should contain.

Mind you, ChromeOS has a few extra hacks in there to make existing non-bazel imports work, but you can see the general idea.

@matts1
Copy link
Contributor

matts1 commented Jan 11, 2024

@rickeylev pointed out that the toolchain itself is relevant here as well, so here's the link to our toolchain. We wrap the interpreter in a script that ensures that the sitecustomize.py is in your PYTHONPATH

@armandomontanez
Copy link
Contributor

Just hit this as well, I noticed the PR that supposedly fixed this is approved but stalled. Any updates?

@tpudlik
Copy link
Contributor

tpudlik commented Jul 22, 2024

Is there a broader bug tracking the issue of Python import paths in bzlmod? rules_python is just a particular special case. Any Bazel project workspace(name = "foo") whose users imported Python libraries from it as import foo.some.module will see these users broken when switching to bzlmod.

As pointed out in bazelbuild/bazel#2636 (comment), this also seems to make it infeasible to ever flip the --experimental_python_import_all_repositories flag. Is the plan to finally give up on that otherwise-desirable migration (in which case we should update and close that issue), or do we hope to come up with a solution?

@Wyverald @meteorcloudy

copybara-service bot pushed a commit to google/pigweed that referenced this issue Jul 23, 2024
Removes --noenable_bzlmod from Pigweed's .bazelrc and applies necessary
changes to get the build working again. Also makes Pigweed usable as a
bzlmod dependency.

Most of the file changes in this CL are Python import statement changes
required because of
bazelbuild/rules_python#1679.

Bug: b/258836641
Change-Id: I8d3e514e74cd6d76e721c5113c44dcbb8ddc9d40
Reviewed-on: https://pigweed-review.googlesource.com/c/pigweed/pigweed/+/211362
Commit-Queue: Ted Pudlik <tpudlik@google.com>
Reviewed-by: Taylor Cramer <cramertj@google.com>
Lint: Lint 🤖 <android-build-ayeaye@system.gserviceaccount.com>
@meteorcloudy
Copy link
Member

meteorcloudy commented Jul 23, 2024

My personal opinion is to abandon this usage since this is Bazel specific usage. I believe for most python projects, they have a top level directory matching its pip package name. But @rickeylev should make the final decision ;)

@groodt
Copy link
Collaborator

groodt commented Jul 23, 2024

The comment from #1679 (comment) is related, but also adjacent to the original issue. The original issue relates to the "runfiles" library specifically. This is a very narrow and special library that is jointly included as part of rules_python, but also published independently as a package on PyPI and that's when we discovered this particular issue that the Python "runfiles" module that comes with rules_python will no longer work with bzlmod under the old import paths.

The original issue does relate to general user space usage of all Python modules with Bazel. Those indeed should not be using workspace name as import prefixes. That's never been a documented or recommended approach.

@matts1
Copy link
Contributor

matts1 commented Jul 24, 2024

Those indeed should not be using workspace name as import prefixes. That's never been a documented or recommended approach.

IIRC (I last looked at this around a year ago), there wasn't any documented or recommended approach, and people used the workspace name because if I import foo.bar.baz python looks for the path foo/bar/baz, and the bazel python launcher added the external directory containing the repository foo, which meant that it just worked.

To the best of my knowledge, there's no import path that will work other than that one, unless you want to screw around with sitecustomize.py like I did (but I also never dug too deep into this, so I may just not be aware of the right way to do it).

@groodt
Copy link
Collaborator

groodt commented Jul 24, 2024

It's probably a documentation issue, that wouldn't surprise me. Our examples are not great, but they do demonstrate the intended usage (and always have, while they've existed):

e.g.

@matts1
Copy link
Contributor

matts1 commented Jul 25, 2024

In that example, libs.my_lib is defined in //libs/my_lib though. What we're asking is how to import a file in @foo//libs/my_lib.

Before repo mapping, I believe that import foo.libs.my_lib would work, but now that the directory is named _main~~foo~foo or something along those lines, it won't work.

@groodt
Copy link
Collaborator

groodt commented Jul 25, 2024

Got it. If you can please raise a different issue for that?

This specific issue is about the runfiles library that previously was encouraged to be used from inside rules_python. Your issue is about general inter-repository code sharing under bzlmod.

@matts1
Copy link
Contributor

matts1 commented Jul 25, 2024

I'll go ahead and do that, but FWIW, I think that workspace name as prefixes is the correct way to go, and that if we fix that, we get this for free.

@groodt
Copy link
Collaborator

groodt commented Jul 25, 2024

I'll go ahead and do that, but FWIW, I think that workspace name as prefixes is the correct way to go, and that if we fix that, we get this for free.

In some ways yes, but in others ways no. The runfiles library is a special case and also has easy alternatives right now. It already has a solution for bzlmod in so far as people can depend on the third-party dependency through the third-party language dependency management rules.

So either through a dependency specifier in that resolves to a locked version in PyPI e.g. runfiles ~=0.30

Or as a direct dependency url specifier such e.g. runfiles @ https://files.pythonhosted.org/packages/de/a4/7d6b131e92aa1e589e97d82b54fa87c1c9162c650c8128faa3800d96b8e2/bazel_runfiles-0.34.0-py3-none-any.whl#sha256=7df2573bfda9d95f4c352907333f016d0bc5e0a3c67101a933eacc4fb001786e

The reason for this issue being created, was because it can be seen as a backwards incompatible change with the integrated runfiles library that used to come with the core language rules of rules_python and was documented usage.

There's never been documented usage for how to use or resolve python source dependencies across bazel respositories to my knowledge.

@rickeylev
Copy link
Contributor Author

Yeah, using the Bazel repo name as the Python top-level import name was always a bad idea. Even prior to bzlmod, because repos names were decided by the end user, you never had a strong, reliable, top-level Python name for a library made available through that mechanism. The best analogy I can draw is with Java: import rules_java.com.google.bazel.blabla; is obviously non-sensical -- any Java person would immediately scoff at having to prefix their Java imports with rules_java (or whatever the Bazel repo name happened to be).

Unfortunately, using the repo name turned into some sort of mild convention. And bzlmod threw a big wrench into it. Doing the Right Thing (rehoming code to another directoy and using the imports attribute) is a bit of a lift for projects relying on that convention.

A lighter weight, more interim/transnational, solution would be nicer. I'm not sure exactly what that solution would look like, though. It's complicated by the fact that the runfiles root and each repo's root is added to sys.path -- both undesirable behaviors. I think the lighter solution looks something like, if you wanted to make import rules_python Just Work again, then it'd be something like:

  • Create $repo/rules_python/__init__.py (where $repo/ is the repo root, e.g. the root of the github source tree). This makes import rules_python work under bzlmod, but only because the repo root is still added to sys.path by default.
  • Make that $repo/rules_python/__init__.py do __path__.append('..'). This makes import rules_python.XXX search the parent directory of import rules_python, and thus find $repo/runfiles as rules_python.runfiles.
  • Make @rules_python//runfiles:runfiles do imports = ['..']. This makes deps=["@rules_python//runfiles:runfiles"] ensure the $repo is added to sys.path, and thus $repo/rules_python is importable as import rules_python. Even if after repo roots are no longer added to sys.path by default.

I think that would mostly work? The issue with playing games like this with sys.path and a package's sub-module search path (__path__) (or the import system in general) is it can break, or otherwise act weird, in edge cases.

Also, I'm pretty sure the above can also be done on the user side. e.g., if you're @foo, you can add a directory named rules_python, futz with __path__ as above, and cause it to import stuff from @rules_python. If multiple repos are doing that, things might get weird.

@matts1
Copy link
Contributor

matts1 commented Jul 26, 2024

Yeah, using the Bazel repo name as the Python top-level import name was always a bad idea. Even prior to bzlmod, because repos names were decided by the end user, you never had a strong, reliable, top-level Python name for a library made available through that mechanism. The best analogy I can draw is with Java: import rules_java.com.google.bazel.blabla; is obviously non-sensical -- any Java person would immediately scoff at having to prefix their Java imports with rules_java (or whatever the Bazel repo name happened to be).

I'm not sure I buy that it was a bad idea. Namespacing is what prevents conflicts. If I depend on the py_library targets @foo//lib and @bar//lib, then if we didn't include the repo name we'd be using import lib for both, which is both a conflict, but also makes it almost impossible to work out which file you're attempting to import.

I, for one, would much prefer writing import rules_python.runfiles over import runfiles, as it adds clarity about where I'm getting this "runfiles" from, in addition to preventing conflicts.

FWIW, in ChromeOS, our main repo was called cros, and every time you wanted to import something in the main repo we'd write import cros.foo.bar to reference @@//foo/bar. It worked very well, and there was never any confusion about where the imports came from. It also meant that if we ever exported our module, it would just work and others would just be able to import our libraries, since cros would always be repo-mapped to our repo from itself.

@armandomontanez
Copy link
Contributor

I feel like part of the problem here is that it feels like rules_python should be exposing runfiles as rules_python.runfiles--regardless of the underlying technical issues at hand. Admittedly maybe that's just due to my naive belief that overly generic names are problematic.

Generally, it's probably wisest for projects to create well-formed python packages, which unfortunately means storing the package in a subdirectory that provides a project/package namespace. In Pigweed, we almost never relied on the import paths provided by rules_python. Even when we did, the only exceptions were for some tests. I reluctantly feel like that's the most robust way to structure a project. Our Python libraries were well-formed python packages before we started embracing Bazel, so this change didn't impact us much (besides the confusion of rules_python.runfiles moving).

I would like to echo, though, that this is a pretty jarring behavioral change that seems to warrant some sort of migration path. It's surprising to me that this issue wasn't immediately closed with "this is intentional behavior, we flipped the default value of X flag with plans to remove the option entirely in version YY"

dpemmons pushed a commit to dpemmons/pigweed-quickstart that referenced this issue Aug 5, 2024
Removes --noenable_bzlmod from Pigweed's .bazelrc and applies necessary
changes to get the build working again. Also makes Pigweed usable as a
bzlmod dependency.

Most of the file changes in this CL are Python import statement changes
required because of
bazelbuild/rules_python#1679.

Original-Bug: b/258836641
Original-Reviewed-on: https://pigweed-review.googlesource.com/c/pigweed/pigweed/+/211362
Lint: Lint 🤖 <android-build-ayeaye@system.gserviceaccount.com>

https://pigweed.googlesource.com/pigweed/pigweed
pigweed, pw_toolchain Rolled-Commits: 4f0412dae17b20a..00d99767eb5ad66
Roller-URL: https://ci.chromium.org/b/8741667154616515185
GitWatcher: ignore
CQ-Do-Not-Cancel-Tryjobs: true
Change-Id: I5437982f041b87873fb4d81e4113584038a3f1cf
Reviewed-on: https://pigweed-review.googlesource.com/c/pigweed/quickstart/bazel/+/225374
Bot-Commit: Pigweed Roller <pigweed-roller@pigweed-service-accounts.iam.gserviceaccount.com>
Commit-Queue: Pigweed Roller <pigweed-roller@pigweed-service-accounts.iam.gserviceaccount.com>
Lint: Lint 🤖 <android-build-ayeaye@system.gserviceaccount.com>
virajbshah added a commit to virajbshah/gematria that referenced this issue Sep 11, 2024
 * `runfiles` is provided both as part of the Bazel `rules_python`
   repository as `@rules_python//python:runfiles` and on PyPI as
   the `bazel-runfiles` pip package.
 * It could be imported from the Bazel repository as
   `rules_python.python.runfiles` when using WORKSPACE, but needed
   to be imported as `python.runfiles` when using Bzlmod.
 * However, the pip package is independent of whether WORKSPACE or
   Bzlmod is being used.
 * See bazelbuild/rules_python#1679 for
   details.
virajbshah added a commit to google/gematria that referenced this issue Sep 11, 2024
* `runfiles` is provided both as part of the Bazel `rules_python`
   repository as `@rules_python//python:runfiles` and on PyPI as
   the `bazel-runfiles` pip package.
 * It could be imported from the Bazel repository as
   `rules_python.python.runfiles` when using WORKSPACE, but needed
   to be imported as `python.runfiles` when using Bzlmod.
 * However, the pip package is independent of whether WORKSPACE or
   Bzlmod is being used.
 * See bazelbuild/rules_python#1679 for
   details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants