Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove *_INTERNED opcodes from marshal #75558

Open
benjaminp opened this issue Sep 7, 2017 · 8 comments
Open

remove *_INTERNED opcodes from marshal #75558

benjaminp opened this issue Sep 7, 2017 · 8 comments
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@benjaminp
Copy link
Contributor

BPO 31377
Nosy @benjaminp, @methane, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2017-09-07.04:54:12.645>
labels = ['interpreter-core', 'type-feature', '3.7']
title = 'remove *_INTERNED opcodes from marshal'
updated_at = <Date 2018-07-11.07:58:22.225>
user = 'https://github.com/benjaminp'

bugs.python.org fields:

activity = <Date 2018-07-11.07:58:22.225>
actor = 'methane'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2017-09-07.04:54:12.645>
creator = 'benjamin.peterson'
dependencies = []
files = []
hgrepos = []
issue_num = 31377
keywords = []
message_count = 8.0
messages = ['301569', '301571', '301572', '301576', '301592', '301593', '301594', '321413']
nosy_count = 3.0
nosy_names = ['benjamin.peterson', 'methane', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue31377'
versions = ['Python 3.7']

@benjaminp
Copy link
Contributor Author

The *_INTERN opcodes inform the marsahl reader to intern the encoded string after deserialization. I believe for pycs this is pointless because PyCode_New ends up interning all strings that are interesting to intern. Writing this opcodes makes pycs non-deterministic because the intern state may be inconsistent in the writer. See https://bugzilla.opensuse.org/show_bug.cgi?id=1049186

@benjaminp benjaminp added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Sep 7, 2017
@serhiy-storchaka
Copy link
Member

Marshal is used not only in pyc files. It is used for fast data serialization, faster than pickle, json, etc.

@benjaminp
Copy link
Contributor Author

Used but not really supported. Anyway, I doubt intern round-tripping is a particularly important.

@methane
Copy link
Member

methane commented Sep 7, 2017

w_ref() depends on refcnt already.
I don't think removing *_INTERN opcode makes PYC reproducible.

cpython/Python/marshal.c

Lines 269 to 271 in 1f06a68

/* if it has only one reference, it definitely isn't shared */
if (Py_REFCNT(v) == 1)
return 0;

I think "intern one string, then share it 10 times" is faster than
"share one string 10 times, then intern each of 10 references".

@benjaminp
Copy link
Contributor Author

On Thu, Sep 7, 2017, at 01:17, INADA Naoki wrote:

INADA Naoki added the comment:

w_ref() depends on refcnt already.
I don't think removing *_INTERN opcode makes PYC reproducible.

cpython/Python/marshal.c

Lines 269 to 271 in 1f06a68

/* if it has only one reference, it definitely isn't shared */
if (Py_REFCNT(v) == 1)
return 0;

I know—we're going to have to do something about that, too. In practice,
though, the interning behavior seems to be a bigger reproducibility
problem.

I think "intern one string, then share it 10 times" is faster than
"share one string 10 times, then intern each of 10 references".

We end up interning each reference individually currently.

@methane
Copy link
Member

methane commented Sep 7, 2017

We end up interning each reference individually currently.

But interning interned string is much faster. It only checks flag.
Interning normal string requires dict lookup.

@benjaminp
Copy link
Contributor Author

On Thu, Sep 7, 2017, at 09:46, INADA Naoki wrote:

INADA Naoki added the comment:

> We end up interning each reference individually currently.

But interning interned string is much faster. It only checks flag.
Interning normal string requires dict lookup.

We could makes sure the version in the internal marshal memo is interned
if appropriate.

@methane
Copy link
Member

methane commented Jul 11, 2018

I doubt that interning cause reproduciblity problem.

AFAIK, all strings in code object are interned or not
interned deterministically.

https://bugzilla.opensuse.org/show_bug.cgi?id=1049186
This issue seems be caused by w_ref() based on object refcnt,
not interning.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

3 participants