
Non-determinism in pip_parse or pip_import #571

Closed
alexeagle opened this issue Nov 16, 2021 · 14 comments

Comments

@alexeagle
Collaborator

#551 (comment) reports that different machines running the pip operations produce different outputs, causing cache busts in subsequent actions that depend on Python requirements in their inputs.

@aignas
Collaborator

aignas commented Nov 17, 2021

Funnily enough, I ran into this today and was wondering why the bazel-remote cache was not working as expected in our CI, even though it was working correctly in local setups.

@taylorjohnson

Hi. Is this addressed by #570 or is there additional work needed? Thanks!

@UebelAndre
Contributor

There is still more work to do. At the very least, packages which compile C/C++ code are subject to non-deterministic outputs. There's still one open question on that PR which may mitigate this a bit more (#570 (comment)), but ultimately the pip rules need a way for users to express, in general terms, what should be excluded from the generated targets, and to let them provide the same information from Bazel. Or, that's my own opinion. It'd be great to blanket-ignore all *.so files and somehow annotate pip_parse to take a cc_binary at build time as a replacement. I'm not sure how that would be wired up, though.

@UebelAndre
Contributor

Though that said, if the packages you depend on are pure Python packages (they don't build any extra files when setting up the wheel), then I don't think you'll run into determinism issues. If you're impacted by *.pyc files and __pycache__ directories, though, odds are your packages are doing just that.
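(Editorial note: the check described above can be sketched in a few lines of plain Python. This is an illustrative helper, not part of rules_python; the pattern list is an assumption about which artifacts commonly differ between machines.)

```python
from pathlib import Path

# Artifacts that commonly differ between machines: byte-compiled caches
# and locally built native extensions. Illustrative list, not exhaustive.
NONDETERMINISTIC_PATTERNS = ("*.pyc", "*.so")

def find_suspect_files(root):
    """Return paths under `root` that are likely non-deterministic."""
    root = Path(root)
    suspects = [
        p for p in root.rglob("*")
        if p.name == "__pycache__"
        or any(p.match(pat) for pat in NONDETERMINISTIC_PATTERNS)
    ]
    return sorted(str(p.relative_to(root)) for p in suspects)
```

Running this over an extracted wheel (or the external repository pip_parse generates) would surface the files that bust the remote cache; an empty result suggests a pure-Python package.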

@aignas
Collaborator

aignas commented Dec 6, 2021

How is this related to use cases when users are using binary wheels and, even though we have *.so files present in the wheel, they are not built by a local bazel setup but rather downloaded from PyPI as a known wheel?

@groodt
Collaborator

groodt commented Dec 6, 2021

How is this related to use cases when users are using binary wheels and, even though we have *.so files present in the wheel, they are not built by a local bazel setup but rather downloaded from PyPI as a known wheel?

Are you sure the entire transitive closure consists of wheels?

It's often the case that there is an sdist (aka source distribution) somewhere in the transitive closure, and it builds on every host machine. This can easily introduce non-determinism, since building dependencies in Python essentially means evaluating setup.py.

@aignas
Collaborator

aignas commented Dec 6, 2021

How is this related to use cases when users are using binary wheels and, even though we have *.so files present in the wheel, they are not built by a local bazel setup but rather downloaded from PyPI as a known wheel?

Are you sure the entire transitive closure consists of wheels?

It's often the case that there is an sdist (aka source distribution) somewhere in the transitive closure, and it builds on every host machine. This can easily introduce non-determinism, since building dependencies in Python essentially means evaluating setup.py.

I am not 100% sure, and I would like to understand this better: how would I check this? We have been using matplotlib and other Python wheels which would otherwise require a lot of build-time dependencies in our CI docker images. We chose not to install those dependencies and instead rely on running pip-compile within the same docker image that we use for CI, and that yields the desired result for now.

Before doing that, we needed to have clang present in order to install the regexp package. Now we do not need clang installed, because we are locking the dependencies in a slightly better way (I think).

@UebelAndre
Contributor

How is this related to use cases when users are using binary wheels and, even though we have *.so files present in the wheel, they are not built by a local bazel setup but rather downloaded from PyPI as a known wheel?

In my investigations, I've found packages which compile some C code that may produce different results per build. But I've also seen wheels with embedded .so files. The former case is what seems to cause non-determinism; the latter obviously doesn't cause issues.

@groodt
Collaborator

groodt commented Dec 6, 2021

I am not 100% sure and I would like to understand this better, how would I check this?

You can't easily check this with rules_python itself (perhaps by enabling verbose logging and looking for wheel builds happening).

However, to check if your transitive closure has an sdist, you can take your lockfile and try to pull down only wheels as follows:

cd /tmp
python3 -m pip wheel --requirement requirements.txt --only-binary :all:

That will download wheels from your package index (or PyPI) and will fail if any wheel is missing, i.e. if anything in the closure is only available as an sdist.
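(Editorial note: the distinction the command relies on can also be checked by filename alone, since wheels and sdists use different archive suffixes. A minimal sketch; the helper name is illustrative and not a pip API.)

```python
# Classify a distribution file by its suffix. Wheels end in .whl;
# sdists are typically .tar.gz, .zip, or .tar.bz2 archives.
def classify_dist(filename):
    if filename.endswith(".whl"):
        return "wheel"
    if filename.endswith((".tar.gz", ".zip", ".tar.bz2")):
        return "sdist"
    return "unknown"
```

Scanning the filenames pip reports (or the contents of a local wheelhouse) with something like this makes an sdist in the transitive closure easy to spot.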

@alexeagle
Collaborator Author

I would like to distinguish between non-determinism that has always been present in rules_python and the underlying ecosystem, and what was recently introduced by the pip_import rule. If the latter is just as deterministic as anything else, then we don't need to warn users away from it like in the 0.5.0 release notes.

@UebelAndre
Contributor

I think there's been some confusion here. My comment (#551 (comment)) was saying that the rules gave no indication that they were non-deterministic, which (IMO) goes against the expectation for any Bazel rule. So when I ran into non-deterministic behavior, I thought it warranted re-applying the "experimental" badge to the rules so users don't adopt them without question and later find tons of cache invalidation. To my knowledge, there's no specific change in 0.5.0 introducing non-deterministic behavior. That said, I feel strongly that the rules should have a way to guarantee deterministic results regardless of what package is being set up. Specifically, this would mean flags allowing users to ignore things that would be non-deterministic, flags that allow them to replace non-deterministic files with ones produced by Bazel (in a deterministic way), and flags to explicitly disable executing any code (from a wheel) to set up a wheel. With these things, plus a note that not all packages are deterministic, I think the rules would be safe to consider "non-experimental" or "production ready".

TLDR I agree with @alexeagle 😄

@alexeagle
Collaborator Author

Thanks @UebelAndre I think we can close this then.

The repo description/tagline is "Experimental Bazel Python Rules", and in this case the non-determinism comes from external tools we spawn, so the proper fix belongs there (imagine a pip install --deterministic flag or something). It's the same problem in npm, where temporary files like the Makefile left behind by node-gyp native compilation cause cache busts.

We can and should do what we can to document it, and users need some determinism-checker tool to discover the problems more easily.

@UebelAndre
Contributor

Are there other issues tracking those? If not, I think they should be filed and linked to this ticket.

@UebelAndre
Contributor

I think #589 is what's needed to truly resolve this issue. It allows users to correct non-determinism in their dependencies on a case-by-case basis. I think this is the best that can be done unless Python itself imposes some determinism check.
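(Editorial note: the case-by-case idea can be sketched as a simple glob-based filter over a wheel's file list. This is a plain-Python illustration of the concept, not the actual #589 annotation API; the function and glob names are assumptions.)

```python
import fnmatch

def apply_excludes(files, exclude_globs):
    """Drop files matching any user-provided exclude glob before
    generating targets, so non-deterministic entries never enter
    the action graph."""
    return [
        f for f in files
        if not any(fnmatch.fnmatch(f, g) for g in exclude_globs)
    ]
```

With per-package exclude globs like `["*.so", "*__pycache__*"]`, the non-deterministic build artifacts discussed above would be filtered out while pure-Python sources pass through unchanged.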

bmclarnon added a commit to google-parfait/confidential-federated-compute that referenced this issue Mar 25, 2024
Enabling remote execution requires passing `--config=cfc-remote`, which
will allow us to roll out remote execution slowly -- and roll it back if
needed.

Several targets and tests are incompatible with remote execution and have been tagged as such; these targets will be addressed in subsequent
changes:

  * `oci_image` doesn't work with remote execution, though a newer
    version may.
    bazel-contrib/rules_oci#477
  * `pip_parse` builds source dependencies based on the host platform,
    which (a) introduces non-determinism and (b) can result in binaries
    that cannot run on the remote host (e.g., due to incompatible glibc
    versions). We may be able to use annotations to substitute in
    versions of dependencies built by bazel, not pip:
    bazelbuild/rules_python#571.
    But we also might be able to remove the Python targets...

Bug: 321291571
Change-Id: Id4b2ece75f7945736e0e3c6bba056c85842e9618