Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

using all_requirements causes OSError: [Errno 7] Argument list too long #792

Closed
anuragkanungo opened this issue Aug 16, 2022 · 13 comments
Closed
Labels
Can Close? Will close in 30 days if there is no new activity

Comments

@anuragkanungo
Copy link

🐞 bug report

Affected Rule

The issue is caused by the rule: all_requirements

Is this a regression?

Yes, the previous version in which this bug was not present was: 0.9.0
The bug exists in : 0.10.2 and 0.11.0

Description

When using all_requirements in a rule like

py_binary(
    name = "mybinary",
    srcs = ["main.py"],
    main = "main.py",
    deps = all_requirements,
)

It cause the error

Traceback (most recent call last):
  File "/home/user/.cache/bazel/_bazel_user/bde0c2a7e98d3dd9f99d720839ba72c4/execroot/org/bazel-out/k8-fastbuild/bin/third_party/ipython/ipython", line 392, in <module>
    Main()
  File "/home/user/.cache/bazel/_bazel_user/bde0c2a7e98d3dd9f99d720839ba72c4/execroot/org/bazel-out/k8-fastbuild/bin/third_party/ipython/ipython", line 382, in Main
    os.execv(args[0], args)
OSError: [Errno 7] Argument list too long: '/home/user/.cache/bazel/_bazel_user/bde0c2a7e98d3dd9f99d720839ba72c4/execroot/org/bazel-out/k8-fastbuild/bin/third_party/ipython/ipython.runfiles/python_x86_64-unknown-linux-gnu/bin/python3'

🌍 Your Environment

Operating System: Ubuntu 20.04
Output of bazel version: 5.2.0

@hrfuller
Copy link
Contributor

hrfuller commented Aug 19, 2022

This is an issue with the upstream native bazel rules not rules_python. The problem is that the PYTHONPATH set by the python_stub_template is longer than the allowed system limits. You can try to set --noexperimental_python_import_all_repositories to avoid adding PYTHONPATH entries for all external repositories, but you may encounter issues with rules_docker, rules_k8s and others that rely on implicit repository imports. Another possible path forward would be to use https://github.com/aspect-build/rules_py which uses a site-packages based import system for python binaries IIRC.

@anuragkanungo
Copy link
Author

anuragkanungo commented Aug 19, 2022

@hrfuller, the issue happens doesn't exist with rules_python version 0.9.0 but it does after that, with the same list of packages. I believe something did change in the refactor that happened after 0.9.0.

@el-gee
Copy link

el-gee commented Sep 23, 2022

Our team can confirm this behavior/bug in our environment, although we have yet to make an easily reproduceable and shareable example. rules_python version 0.10.0 throws the error while 0.9.0 does not. As pointed out, the issue isn't with one's python interpreter per say - despite the verbiage of the error - but rather the PYTHONPATH created (python_imports we believe on line ~294 of the generated script). We are not totally clear exactly what's going on, but it feels like both parties are correct here: something did change in rules python, but the thing actually throwing the error is from upstream native bazel.

Additionally, the situation appears even more complicated and subtle as it depends on the host machine configuration (aka some VMs throw the error for some targets while others do not). We think this is related to perhaps different ulimits/max arg limits across machines with different memory amounts and settings. Having said that, our very large CI machines do exhibit the issue while some random developer VMs appear to not, so it's not strictly a "sizing" issue. If it were possible to simply update an OS setting without recompiling the kernel we would be happy with that as a reasonable solution.

Finally, for some background, our team does not use the all_requirements helper, but rather the error comes about naturally from our own unique codebase. We have about 450 external dependencies and maybe 100-200 first party apps/libraries in a monorepo. A representative failing target show roughly 25 first party --output=package dependencies and 200 or so third party external dependencies. These total to a roughly 14,000 character python_imports line which we think is what's causing the issue.

We'd love to help however possible with a fix. There are exciting features after 0.9.0 that we're eager to bring in to our codebase. We really appreciate the all the hard work from and for the community😄

@el-gee
Copy link

el-gee commented Sep 26, 2022

Some further updates.

  • my previous post was incorrect wrt the versions, 0.10.0 has the issue, 0.9.0 does not. Post edited for clarity
  • the offending commit seems to be 09457c7 . This was verified by running our tests against each commit in the 0.10.0 release and seeing when things started to fail.
  • still no easily shareable & reproduceable case sadly

We'll try to dig in more as to exactly what specifically could have caused this from the commit.

@groodt
Copy link
Collaborator

groodt commented Sep 26, 2022

Just for context, all_requirements is considered a bit of an anti-pattern. Is there a way for you to explicitly list your dependencies instead? Does the same error occur?

@el-gee
Copy link

el-gee commented Sep 26, 2022

@groodt I just read your comment #715 (comment) and it seems very similar to our situation, except you get lucky and go just under the limit. As I mentioned in my ugly wall of text 🙃 we do not use the all_requirements helper but rather at least attempt to explicitly list our dependencies (understand there are others in the thread too however).

Really appreciate your time and efforts on this all!

@anuragkanungo
Copy link
Author

@groodt , on the all_requirements question, we use it to provide ipython or jupyter tooling for ML experimentation such that engineers don't have to predefine all the packages needed for experimentation, makes it very useful.

@anuragkanungo
Copy link
Author

Also, using the flag --noexperimental_python_import_all_repositories does fixes it, though I am not sure of associated side effects of that.

@groodt
Copy link
Collaborator

groodt commented Sep 27, 2022

@anuragkanungo Yes, it's still an antipattern though. You can build tooling outside the rules to enumerate things explicitly if you need to.

@el-gee
Copy link

el-gee commented Sep 30, 2022

This appears to be a subtle bug arising from some latent assumptions between rules_python and the upstream bazel native python_stub_template. The change effectively doubles the length of the PYTHONPATH , which is known to be an issue; see https://github.com/bazelbuild/bazel/blob/9d0163002ca63f4cdbaff9420380d72e2a6e38b3/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt#L424 .

This occurs, afaict, b/c the template's Deduplicate function merely puts everything into a set. See: https://github.com/bazelbuild/bazel/blob/9d0163002ca63f4cdbaff9420380d72e2a6e38b3/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt#L232

But this is no longer valid as the modules from GetRepositoriesImports do not match what rules_python passes in; the natively found modules do not have site-packages suffixed. See: https://github.com/bazelbuild/bazel/blob/9d0163002ca63f4cdbaff9420380d72e2a6e38b3/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt#L192

Thus the PYTHONPATH will have effectively 2 versions of the same package: /path/to/foo and /path/to/foo/site-packages . Only one of which is valid in the sense of providing working python imports.

For many users, this may not matter, but it seems to be an unintended side effect and at worst could be viewed as an incompatibility between rules_python and upstream bazel, if we understand things correctly. If this summary is accurate, we're not clear exactly what the best next steps would be.

We've found roughly 4 variables one can tweak to try and sneak under the limit if one finds themselves in the situation.

  • first and foremost, limiting the the number of deps, as each dep gets it’s own entry in the python path
  • limiting the the prefix for the external deps via pip_parses name, as each dep will be prefix’d with it
  • limiting the name of your target, as each dep will also basically be prefixed with that too in the pathing
  • limiting your username / workspace root depth

We haven't been able to make an easy test for this yet, but I imagine if one did the opposite of the suggestions (go larger for each rather than smaller) on a moderately sized package they could reproduce the error.

@rickeylev
Copy link
Collaborator

rickeylev commented Oct 1, 2022

Just to make sure I understand the situation: The basic problem is that, as a side-effect of using pip_parse (or pip_install?), many repos are created, and they end up on PYTHONPATH, and make it too long. These repos are just internal, implementation details. They don't need to be added to PYTHONPATH.

Is the above correct?

Also, using the flag --noexperimental_python_import_all_repositories does fixes it, though I am not sure of associated side effects of that.

The basic side effect is that every immediate-sub directory under the runfiles root will no longer be added to PYTHONPATH. see here

The basic implication here is that Python libraries that don't include their repo name in their import statements will break. Python libraries that do include their repo name in their import statements will be OK.

edit: I should clarify this is the case ignoring when something else adding things to PYTHONPATH, such as e.g. py_library(imports=...)

i.e.,:

# WORKSPACE:
http_archive(name=other_libs_repo, github.com/other-libs-project)
# Source layout 
github.com/other-libs-project
  other_libs/
    __init__.py
    other.py
http_archive(name=more_repo, github.com/more-project)
# source layout
github.com/more-project
  __init__.py
  morelib.py

workspace(name=myproject)
# Source layout
mylib1/
  __init__.py
  mylib1a.py
mylib2/
  __init__.py
  mylib2a.py

# -> results in runfiles layout
foo.runfiles/
  myproject/
    mylib1/
      __init__.py
      mylib1a.py
    mylib2/
     __init__.py
     mylib2a.py
  other_libs_repo/
    other_libs/
      __init__.py
      other.py
  more_repo/
    __init__.py
    morelib.py
# note I've omitted generated init files, which are enabled by default.

Before, "import mylib1" and "import other_libs" would work. After, they will no longer work. Whether this matters or not depends on what the code in a repo expects -- "import more_repo.morelib" works in both cases, for example.

Technically, you can work around this by setting e.g. py_binary(imports=<workspacename>). Each string in the imports attribute is treated as a repo-runfiles-root-relative path see here.

But, understandably, this is not an appealing option if you have a lot of targets. It's also problematic if you have a dependency you don't control.

I have some ideas that might help solve this, but gtg now.

We haven't been able to make an easy test for this yet, but I imagine if one did the opposite of the suggestions (go larger for each rather than smaller) on a moderately sized package they could reproduce the error.

The execve manpage says that the env limits are based on the stack size ulimit. So you can you probably do e.g. ulimit -s <smaller> to lower the limit to more easily reproduce. (When testing in your own shell, I highly recommend running ulimit and your test command in a sub-shell to avoid messing up you your own shell's settings)

We've found roughly 4 variables one can tweak to try and sneak under the limit if one finds themselves in the situation.

Is there something we can change in the rules_python bzl code to also shorten these paths as a mitigation measure that would help?

but it seems to be an unintended side effect and at worst could be viewed as an incompatibility between rules_python and upstream bazel

The real root causes all lie in native.py_*. The repo name should never have been so tightly coupled to the Python import name, the repo roots should never have been always added to PYTHONPATH, and PYTHONPATH should never have had the closure of dependencies appended into it.

As someone said upthread, the real fix for this is a site-packages like layout (or equiv) so that PYTHONPATH and sys.path don't have to scale with the dependency closure size. I've been talking to the Bazel team internally making the point that to do this efficiently, Bazel core needs some sort of feature to allow better control over where things are put within runfiles.

(This might be a bit of cold-comfort insofar as rules_python is concerned, but I still think it's worth calling out to get fixed; I brought this up internally this week, actually)

Ultimately, I think you want to enable that flag to remove the repo roots from path. The question then becomes how to get any imports that rely on the repo name working again. I have a couple ideas that we might be able to do on the rules_python Starlark side, but gtg right now.

@github-actions
Copy link

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days.
Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!

@github-actions github-actions bot added the Can Close? Will close in 30 days if there is no new activity label Mar 31, 2023
@github-actions
Copy link

github-actions bot commented May 1, 2023

This issue was automatically closed because it went 30 days without a reply since it was labeled "Can Close?"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Can Close? Will close in 30 days if there is no new activity
Projects
None yet
Development

No branches or pull requests

5 participants