Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-34033: distutils: byte_compile() sort files #8057

Closed
wants to merge 1 commit into from
Closed

bpo-34033: distutils: byte_compile() sort files #8057

wants to merge 1 commit into from

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Jul 3, 2018

To get reproducible builds, byte_compile() of distutils.util now
sorts filenames.

Patch coming from OpenSUSE: distutils-reproducible-compile.patch.

https://bugs.python.org/issue34033

@vstinner
Copy link
Member Author

vstinner commented Jul 3, 2018

I'm not sure if this change should be documented in What's New in Python 3.8.

IMHO it's fine to backport this change to Python 3.7. But I don't think that it's worth it to backport the change to 2.7 and/or 3.6.

@vstinner
Copy link
Member Author

vstinner commented Jul 3, 2018

I created this PR as a follow-up of https://bugs.python.org/issue34022

Copy link
Member

@methane methane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have some credits on NEWS entry?

@serhiy-storchaka
Copy link
Member

  1. How changing the order of compiling Python files affects the reproducibility of the build?

  2. In what cases the order of files is varying? It seems to me that in most cases byte_compile() is called with a single file or with a list created from a source with the stable order. Wouldn't be better to sort it only at the caller place if the order is not stable?

Perhaps it is worth to open an issue for discussing this and for reference in the NEWS file.

@vstinner
Copy link
Member Author

vstinner commented Jul 3, 2018

How changing the order of compiling Python files affects the reproducibility of the build?

Hum, it changes at least the creation date of the .pyc files, no?

In what cases the order of files is varying? It seems to me that in most cases byte_compile() is called with a single file or with a list created from a source with the stable order.

commands/build_py.py can call byte_compile() with more than 1 filename.

Wouldn't be better to sort it only at the caller place if the order is not stable?

A single sort() seems simpler and more reliable than modifying all calling site, especially because distutils is commonly monkey-patched and custom commands are written in setup.py files.

Perhaps it is worth to open an issue for discussing this

This change is linked to https://bugs.python.org/issue29708 "support reproducible Python builds" which is still open.

@methane
Copy link
Member

methane commented Jul 3, 2018

I found source of this patch. https://bugzilla.opensuse.org/show_bug.cgi?id=1049186

How changing the order of compiling Python files affects the reproducibility of the build?

Currently, marshal uses refcnt to determine using w_ref or not. Some immutable objects
(especially, long and str) can be cached and reused. It may affects refcnt when byte compiling.

I think we should use more deterministic way instead of refcnt. Maybe, count all constants in the module before marshal, like we did in compiling function for co_consts and co_names.
As a bonus, it may reduce resource usage too by merging constants over functions.
(e.g. ('self',) co_varnames and (None,) co_consts)

@mcepl
Copy link
Contributor

mcepl commented Jul 3, 2018

@vstinner I think the very original source of this patch is master...distropatches:pycrb36 (i.e., it was created by @bmwiedemann).

To get reproducible builds, byte_compile() of distutils.util now
sorts filenames.

Patch coming from OpenSUSE: distutils-reproducible-compile.patch.

Co-Authored-By: Bernhard M. Wiedemann <bernhard@zq1.de>
@vstinner
Copy link
Member Author

vstinner commented Jul 3, 2018

I think the very original source of this tiny patch is master...distropatches:pycrb36 (i.e., it was created by @bmwiedemann).

Thank you, I credit him in my commit message and the NEWS entry.

@vstinner
Copy link
Member Author

vstinner commented Jul 3, 2018

Serhiy: How changing the order of compiling Python files affects the reproducibility of the build?
Victor: Hum, it changes at least the creation date of the .pyc files, no?

Oh, https://bugzilla.opensuse.org/show_bug.cgi?id=1049186 contains the full rationale ;-)

@vstinner
Copy link
Member Author

vstinner commented Jul 3, 2018

@serhiy-storchaka: Do you want me to open a new issue, or is it ok to use https://bugs.python.org/issue29708 ?

@vstinner
Copy link
Member Author

vstinner commented Jul 3, 2018

@methane: "I think we should use more deterministic way instead of refcnt. Maybe, count all constants in the module before marshal, like we did in compiling function for co_consts and co_names."

Right, we can enhance marshal, but I don't think that this change is exclusive with this idea. This change is simple, tiny, safe, and cannot make Python less deterministic, no? :-)

@methane
Copy link
Member

methane commented Jul 3, 2018

Yes. We can mitigate the issue before fixing it. That's why I approved this.

@serhiy-storchaka
Copy link
Member

I think we need to understand the issue better before committing changes. When found the source of unstability of file names, we can find other similar sources and make them stable too. For example if the source is listdir() or glob(), we can consider sorting results of all listdir() or glob() in distutils and related methods.

On other side, if the problem is with reference counters in marshal, we can change the marshal module instead.

@vstinner vstinner changed the title bpo-29708: distutils: byte_compile() sort files bpo-34033: distutils: byte_compile() sort files Jul 3, 2018
@vstinner
Copy link
Member Author

vstinner commented Jul 3, 2018

Ok, I created https://bugs.python.org/issue34033 "distutils is not reproducible".

@vstinner
Copy link
Member Author

vstinner commented Jul 4, 2018

I added the scary red DO-NOT-MERGE label since @benjaminp and @serhiy-storchaka want to fix the issue differently: https://bugs.python.org/issue34033#msg320991

I keep the PR open until a different (better) fix is designed.

@vstinner
Copy link
Member Author

I'm now confused by the issue and I'm no longer sure that this PR adds anything. @methane and others are working on better solutions, so I close my PR. At least, it seems like this PR and https://bugs.python.org/issue34033 helped to highlight the issue ;-)

@vstinner vstinner closed this Jul 16, 2018
@vstinner vstinner deleted the distutils_sort branch July 16, 2018 14:50
bmwiedemann pushed a commit to bmwiedemann/openSUSE that referenced this pull request Aug 10, 2022
https://build.opensuse.org/request/show/994298
by user dirkmueller + dimstar_suse
- skip subversion tests, not that relevant to pull in
  dozens of dependencies into small bootstrap

- Add distutils-reproducible-compile.patch to make installed
  files ordered correctly and thus builds reproducible again
  (port of the fix for bpo#29708 and gh#python/cpython#8057).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants