Non-deterministic bytecode in recent versions of Python #1419
Comments
You're also pre-compiling
Based on python/cpython#73894 (comment), I disabled multiprocessing and achieved something that gives the same hash every time (at least on ...
Please note that your use-case is not supported by manylinux which only aims at providing images for building wheels. The interpreter is not meant to be portable (it might work or it might not, this is an implementation detail, not a feature).
Apologies if this is not the right place, but I don't know if it's an issue in cpython itself, or if it's coming from the build scripts/configs/dependencies of the manylinux images, so I'm opening this issue here first.
I'm using the manylinux images to build AppImages for my application, which bundle one of the pre-built Python environments (after modifying the `RPATH` of each binary via `patchelf` and making them relative to the AppImage's root directory). While building the AppImages and installing my application in the container via `pip`, bytecode gets compiled, not just for the Python application and its dependencies but also for the stdlib during the execution of `pip`, as no stdlib bytecode is included in any of the manylinux images. Since my intention is to build reproducible AppImages, I expect everything to be deterministic when building and copying the AppImage contents, including the compiled bytecode. The compiled bytecode, however, is not deterministic/reproducible, despite having set the `PYTHONHASHSEED=0` and `SOURCE_DATE_EPOCH=...` environment variables.
The issue seems to have been introduced only recently, and only in some versions of CPython. I've already spent a couple of hours trying to figure out the reason for this and testing things with the `compileall` module, but without success.
That's why I've created this simple Bash script, which compiles the missing bytecode of the stdlib in each included CPython environment, in multiple different manylinux images, and prints a sha256 checksum of all the compiled bytecode so that it can be compared across different runs.
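For anyone who wants to try the same kind of check from Python, here is a minimal sketch of the idea: byte-compile a source tree, then hash every resulting `.pyc`. It is not the Bash script mentioned above. The function names are made up for illustration, and compiling in a single process with hash-based `.pyc` invalidation is an assumption, not something the affected builds are known to require.

```python
import compileall
import hashlib
import pathlib
import py_compile


def compile_tree(root):
    """Byte-compile every .py file under root; return True if all succeeded."""
    return bool(compileall.compile_dir(
        root,
        quiet=1,
        force=True,   # recompile even if a cached .pyc already exists
        workers=1,    # assumption: single process, no multiprocessing
        # assumption: hash-based .pyc files, so no source mtime in the header
        invalidation_mode=py_compile.PycInvalidationMode.UNCHECKED_HASH,
    ))


def bytecode_digests(root):
    """Map each .pyc path (relative to root) to its SHA-256 hex digest."""
    base = pathlib.Path(root)
    return {
        path.relative_to(base).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in base.rglob("*.pyc")
    }


def tree_checksum(digests):
    """Fold the per-file digests into one checksum, independent of dict order."""
    combined = hashlib.sha256()
    for name in sorted(digests):
        combined.update(f"{digests[name]}  {name}\n".encode())
    return combined.hexdigest()
```

Calling `compile_tree(path)` followed by `tree_checksum(bytecode_digests(path))` in two fresh containers, with the same `PYTHONHASHSEED` and `SOURCE_DATE_EPOCH`, should give the same value if compilation is reproducible. Note that the source path is embedded in each code object, so the tree has to live at the same location in both runs.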
As you can see in the following log output, the compiled bytecode was deterministic at some point in some versions, but later builds started producing non-deterministic bytecode. I haven't found the exact point yet where this started. `3.10.x` has been broken the whole time, while `3.7.x` and `3.11.x` appear stable. The first issues (according to the limited number of images tested) occurred in `3.8.12` and `3.9.9`, and after that the results are completely unstable.
Once again, I'm aware that this might be an issue in CPython itself and doesn't belong here, but the number of compiled stdlib modules of the same Python version differs between different manylinux images (see the stable checksums of 3.7.x), so I don't know what this is about, and this suggests that other modifications might have an influence here.
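Because both the set of compiled modules and their contents can vary, it may help to tell the two cases apart. A small sketch, assuming each run has been summarised as a mapping from `.pyc` path to SHA-256 digest (a hypothetical format chosen for illustration, not the output of the script above):

```python
def diff_digests(a, b):
    """Classify .pyc paths by how two {path: digest} mappings disagree."""
    return {
        # modules compiled in only one of the two runs
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        # modules compiled in both runs, but with different bytes
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

A non-empty `only_in_a` or `only_in_b` would point at differences in which modules got imported or compiled, while `changed` would point at non-deterministic output for the same module.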