Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-78214: marshal: Stabilize FLAG_REF usage #8226

Merged
merged 5 commits into from
May 4, 2022

Conversation

methane
Copy link
Member

@methane methane commented Jul 10, 2018

marshal.dumps() tests refcnt(obj)==1 to decide use FLAG_REF or not.

But refcnt of interned string is very unstable.
When compiling same source, refcnt of interned string in the output
may be 1 or >1. It makes FLAG_REF usage unstable.

To help reproducible build, use FLAG_REF for interned string even if
refcnt(obj)==1.

https://bugs.python.org/issue34033

@methane methane added type-feature A feature request or enhancement awaiting core review DO-NOT-MERGE and removed awaiting merge labels Jul 10, 2018
@methane methane changed the title bpo-34033: marshal: Use FLAG_REF stably bpo-34093: marshal: Use FLAG_REF stably Jul 11, 2018
@serhiy-storchaka serhiy-storchaka self-requested a review July 12, 2018 05:45
@serhiy-storchaka
Copy link
Member

I'm in process of reviewing.

@methane methane changed the title bpo-34093: marshal: Use FLAG_REF stably bpo-34093: marshal: Stabilize FLAG_REF usage Jul 14, 2018
@serhiy-storchaka
Copy link
Member

Do you mind to provide also your initial PR? I want to try to optimize or simplify it.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't using FLAG_REF for all interned strings slows down unmarshalling and increases the memory consumption for both marshalling and unmarshalling?

@methane
Copy link
Member Author

methane commented Jul 16, 2018

Isn't using FLAG_REF for all interned strings slows down unmarshalling and increases the memory consumption for both marshalling and unmarshalling?

For unmarshaling speed, it's possible. But interned string has cost of interning; create temoporary string and calling PyDict_SetDefault(). Overhead of FLAG_REF (PyList_Append) is much smaller than it.

For marshaling memory overhead, it will increase hashtable size for each interned string with refcnt==1.
For unmarshaling memory overhead, it will increase list length too.

I don't expect it can be pragmatic problem though.

@mcepl
Copy link
Contributor

mcepl commented Jun 16, 2020

@bmwiedemann Do you know about this one?

@bmwiedemann
Copy link
Contributor

I might have seen this patch 2 years ago. Chances are, I didnt try it, because it did not apply cleanly to the python we had in Tumbleweed.

@mcepl
Copy link
Contributor

mcepl commented Aug 6, 2020

@bmwiedemann Yes, you are right, this is basically non-applicable (especially the monstrosity in these two conflicting importlib*.h files).

@methane Do you think, you would be willing and able to port this PR to something more recent (e.g., 3.8 or master), please?

@methane
Copy link
Member Author

methane commented Aug 7, 2020

importlib.h and importlib_external.h are generated files. You can apply 6c8ea7c cleanly and "make regen-importlib".

@mcepl
Copy link
Contributor

mcepl commented Aug 7, 2020

@bmwiedemann
Copy link
Contributor

@mcepl I tried my multibuildrbk script for a python38-base build there and got:

--- old /usr/lib64/python3.8/test/test_json/__pycache__/test_scanstring.cpython-38.pyc (hex)
+++ new /usr/lib64/python3.8/test/test_json/__pycache__/test_scanstring.cpython-38.pyc (hex)
@@ -1,6 +1,6 @@
 000002c0  00 00 54 29 02 f5 06 00  00 00 7a f0 9d 84 a0 78  |..T)......z....x|
 000002d0  e9 05 00 00 00 7a 08 22  5c 75 30 30 37 62 22 29  |.....z."\u007b")|
-000002e0  02 fa 01 7b e9 08 00 00  00 7a 3c 22 41 20 4a 53  |...{.....z<"A JS|
+000002e0  02 da 01 7b e9 08 00 00  00 7a 3c 22 41 20 4a 53  |...{.....z<"A JS|
 000002f0  4f 4e 20 70 61 79 6c 6f  61 64 20 73 68 6f 75 6c  |ON payload shoul|
 00000300  64 20 62 65 20 61 6e 20  6f 62 6a 65 63 74 20 6f  |d be an object o|
 00000310  72 20 61 72 72 61 79 2c  20 6e 6f 74 20 61 20 73  |r array, not a s|

--- old /usr/lib64/python3.8/__pycache__/netrc.cpython-38.pyc (hex)
+++ new /usr/lib64/python3.8/__pycache__/netrc.cpython-38.pyc (hex)
@@ -1,7 +1,7 @@
 000007c0  63 64 65 66 7a 02 20 09  da 01 0a 7a 04 20 09 0d  |cdefz. ....z. ..|
 000007d0  0a 7a 15 62 61 64 20 74  6f 70 6c 65 76 65 6c 20  |.z.bad toplevel |
 000007e0  74 6f 6b 65 6e 20 25 72  3e 04 00 00 00 72 1e 00  |token %r>....r..|
-000007f0  00 00 72 21 00 00 00 72  20 00 00 00 72 22 00 00  |..r!...r ...r"..|
+000007f0  00 00 72 22 00 00 00 72  20 00 00 00 72 21 00 00  |..r"...r ...r!..|
 00000800  00 7a 26 6d 61 6c 66 6f  72 6d 65 64 20 25 73 20  |.z&malformed %s |
 00000810  65 6e 74 72 79 20 25 73  20 74 65 72 6d 69 6e 61  |entry %s termina|
 00000820  74 65 64 20 62 79 20 25  73 da 05 6c 6f 67 69 6e  |ted by %s..login|

--- old /usr/lib64/python3.8/__pycache__/pathlib.cpython-38.pyc (hex)
+++ new /usr/lib64/python3.8/__pycache__/pathlib.cpython-38.pyc (hex)
@@ -1,5 +1,5 @@
 000079c0  64 20 27 2e 2e 27 2e 0a  20 20 20 20 20 20 20 20  |d '..'..        |
-000079d0  3e 02 00 00 00 72 32 00  00 00 72 a9 00 00 00 4e  |>....r2...r....N|
+000079d0  3e 02 00 00 00 72 a9 00  00 00 72 32 00 00 00 4e  |>....r....r2...N|
 000079e0  29 05 72 64 01 00 00 72  69 01 00 00 72 b5 00 00  |).rd...ri...r...|
 000079f0  00 72 d5 00 00 00 72 f7  00 00 00 72 44 01 00 00  |.r....r....rD...|
 00007a00  72 22 00 00 00 72 22 00  00 00 72 23 00 00 00 da  |r"...r"...r#....|

The last 2 could be a different issue of ordering.

@methane methane requested a review from tiran as a code owner May 3, 2022 09:15
@ericsnowcurrently
Copy link
Member

ericsnowcurrently commented May 3, 2022

We started freezing a bunch of stdlib modules last year, which definitely impacts the story here.

Regardless, treating all interned strings as "ref" objects makes sense, so +1 from me.

Copy link
Member

@ericsnowcurrently ericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@methane methane changed the title bpo-34093: marshal: Stabilize FLAG_REF usage gh-78214: marshal: Stabilize FLAG_REF usage May 4, 2022
@methane methane merged commit 6dcfd6c into python:main May 4, 2022
@methane methane deleted the stable-marshal branch May 4, 2022 01:01
@methane
Copy link
Member Author

methane commented May 4, 2022

This is temporal fix and will be reverted after more complete fix is landed.

Millak pushed a commit to Millak/guix that referenced this pull request Jan 22, 2024
While Python build was reproducible on a single machine, once multiple file
systems entered the picture, it was no longer true.  The solution adopted by
the upstream (and Debian) was cherry-picked.

More info: <python/cpython#8226>.

* gnu/packages/python.scm (python-3.10) [source]: Apply reproducibility patch.

Signed-off-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Modified-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Change-Id: I0273dc0f8511a7acdcc2b462a26cc29a9756c801
github-actions bot pushed a commit to guix-ru/guix that referenced this pull request May 7, 2024
While Python build was reproducible on a single machine, once multiple file
systems entered the picture, it was no longer true.  The solution adopted by
the upstream (and Debian) was cherry-picked.

More info: <python/cpython#8226>.

* gnu/packages/python.scm (python-3.10) [source]: Apply reproducibility patch.

Signed-off-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Modified-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Change-Id: I0273dc0f8511a7acdcc2b462a26cc29a9756c801
github-actions bot pushed a commit to guix-ru/guix that referenced this pull request Jun 9, 2024
While Python build was reproducible on a single machine, once multiple file
systems entered the picture, it was no longer true.  The solution adopted by
the upstream (and Debian) was cherry-picked.

More info: <python/cpython#8226>.

* gnu/packages/python.scm (python-3.10) [source]: Apply reproducibility patch.

Signed-off-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Modified-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Change-Id: I0273dc0f8511a7acdcc2b462a26cc29a9756c801
github-actions bot pushed a commit to guix-ru/guix that referenced this pull request Jun 9, 2024
While Python build was reproducible on a single machine, once multiple file
systems entered the picture, it was no longer true.  The solution adopted by
the upstream (and Debian) was cherry-picked.

More info: <python/cpython#8226>.

* gnu/packages/python.scm (python-3.10) [source]: Apply reproducibility patch.

Signed-off-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Modified-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Change-Id: I0273dc0f8511a7acdcc2b462a26cc29a9756c801
github-actions bot pushed a commit to guix-ru/guix that referenced this pull request Jun 11, 2024
While Python build was reproducible on a single machine, once multiple file
systems entered the picture, it was no longer true.  The solution adopted by
the upstream (and Debian) was cherry-picked.

More info: <python/cpython#8226>.

* gnu/packages/python.scm (python-3.10) [source]: Apply reproducibility patch.

Signed-off-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Modified-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Change-Id: I0273dc0f8511a7acdcc2b462a26cc29a9756c801
github-actions bot pushed a commit to guix-ru/guix that referenced this pull request Jun 17, 2024
While Python build was reproducible on a single machine, once multiple file
systems entered the picture, it was no longer true.  The solution adopted by
the upstream (and Debian) was cherry-picked.

More info: <python/cpython#8226>.

* gnu/packages/python.scm (python-3.10) [source]: Apply reproducibility patch.

Signed-off-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Modified-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Change-Id: I0273dc0f8511a7acdcc2b462a26cc29a9756c801
github-actions bot pushed a commit to guix-ru/guix that referenced this pull request Jun 27, 2024
While Python build was reproducible on a single machine, once multiple file
systems entered the picture, it was no longer true.  The solution adopted by
the upstream (and Debian) was cherry-picked.

More info: <python/cpython#8226>.

* gnu/packages/python.scm (python-3.10) [source]: Apply reproducibility patch.

Signed-off-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Modified-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Change-Id: I0273dc0f8511a7acdcc2b462a26cc29a9756c801
github-actions bot pushed a commit to guix-ru/guix that referenced this pull request Jun 29, 2024
While Python build was reproducible on a single machine, once multiple file
systems entered the picture, it was no longer true.  The solution adopted by
the upstream (and Debian) was cherry-picked.

More info: <python/cpython#8226>.

* gnu/packages/python.scm (python-3.10) [source]: Apply reproducibility patch.

Signed-off-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Modified-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Change-Id: I0273dc0f8511a7acdcc2b462a26cc29a9756c801
github-actions bot pushed a commit to guix-ru/guix that referenced this pull request Jul 11, 2024
While Python build was reproducible on a single machine, once multiple file
systems entered the picture, it was no longer true.  The solution adopted by
the upstream (and Debian) was cherry-picked.

More info: <python/cpython#8226>.

* gnu/packages/python.scm (python-3.10) [source]: Apply reproducibility patch.

Signed-off-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Modified-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Change-Id: I0273dc0f8511a7acdcc2b462a26cc29a9756c801
Z572 pushed a commit to Z572/guix that referenced this pull request Jul 22, 2024
While Python build was reproducible on a single machine, once multiple file
systems entered the picture, it was no longer true.  The solution adopted by
the upstream (and Debian) was cherry-picked.

More info: <python/cpython#8226>.

* gnu/packages/python.scm (python-3.10) [source]: Apply reproducibility patch.

Signed-off-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Modified-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Change-Id: I0273dc0f8511a7acdcc2b462a26cc29a9756c801
Z572 pushed a commit to Z572/guix that referenced this pull request Jul 23, 2024
While Python build was reproducible on a single machine, once multiple file
systems entered the picture, it was no longer true.  The solution adopted by
the upstream (and Debian) was cherry-picked.

More info: <python/cpython#8226>.

* gnu/packages/python.scm (python-3.10) [source]: Apply reproducibility patch.

Signed-off-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Modified-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Change-Id: I0273dc0f8511a7acdcc2b462a26cc29a9756c801
github-actions bot pushed a commit to guix-ru/guix that referenced this pull request Aug 10, 2024
While Python build was reproducible on a single machine, once multiple file
systems entered the picture, it was no longer true.  The solution adopted by
the upstream (and Debian) was cherry-picked.

More info: <python/cpython#8226>.

* gnu/packages/python.scm (python-3.10) [source]: Apply reproducibility patch.

Signed-off-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Modified-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Change-Id: I0273dc0f8511a7acdcc2b462a26cc29a9756c801
github-actions bot pushed a commit to guix-ru/guix that referenced this pull request Aug 28, 2024
While Python build was reproducible on a single machine, once multiple file
systems entered the picture, it was no longer true.  The solution adopted by
the upstream (and Debian) was cherry-picked.

More info: <python/cpython#8226>.

* gnu/packages/python.scm (python-3.10) [source]: Apply reproducibility patch.

Signed-off-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Modified-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Change-Id: I0273dc0f8511a7acdcc2b462a26cc29a9756c801
github-actions bot pushed a commit to guix-ru/guix that referenced this pull request Aug 31, 2024
While Python build was reproducible on a single machine, once multiple file
systems entered the picture, it was no longer true.  The solution adopted by
the upstream (and Debian) was cherry-picked.

More info: <python/cpython#8226>.

* gnu/packages/python.scm (python-3.10) [source]: Apply reproducibility patch.

Signed-off-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Modified-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Change-Id: I0273dc0f8511a7acdcc2b462a26cc29a9756c801
SomberNight added a commit to SomberNight/python-for-android that referenced this pull request Oct 15, 2024
…ation

patches from:
python/cpython#27926
python/cpython#8226

Both of these are included in python 3.11 upstream.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants