-
-
Notifications
You must be signed in to change notification settings - Fork 30.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Interned strings are immortal, despite what the documentation says #113993
Comments
I would suggest that using "immortal" here could be confusing. The flagged checked by |
No. That is not the behaviour on Python 3.11.
IMO, the guarantee is that if two strings are interned, they can be compared by identity. The new behaviour can leak strings that are not likely to be used again. |
$ cat test.py
import sys
id1 = id(sys.intern('some string'))
other = sys.intern('other string')
id2 = id(sys.intern('some string'))
assert id1 == id2
$ python3.8 ./test.py
$ python3.9 ./test.py
$ python3.10 ./test.py
$ python3.11 ./test.py
$ python3.12 ./test.py
$ python3.13 ./test.py No assertions triggered. |
The the new string is allowed to reuse the old one's memory; I guess this is build-specific. What platform are you on? |
I'm on macOS using the official build. Strange thing: I can reproduce the behaviour you're seeing in the REPL. |
Nope, cannot reproduce using |
Ah! Right, literal constants are part the code object, which does normally stay around. You'll need something that's not folded to a constant, like: $ cat test.py
import sys
def intern_and_report(string):
string = sys.intern(string)
the_id = id(string)
print(f'{string!r} at {id(string)}: refcount {sys.getrefcount(string):x}')
return the_id
id1 = intern_and_report('a' + __name__)
str_b = sys.intern('b' + __name__)
id2 = intern_and_report('a' + __name__)
assert id1 == id2
|
Yes; thanks! |
See also #113601 |
@encukou just getting to this! You're right, as of now, all interned strings should be immortal, so the As for the leaks, as @erlend-aasland already called out, we have a PR to correctly clean them up in #113601! |
Sorry, I was not very clear. I was describing the behaviour we have with Python 3.12 and we had with older versions of Python (version 2.2 and before, it seems). It looks like this commit changed it: 45ec02a. Since 3.12 changed this, definitely the documentation should be updated. Further, I think we should have had a discussion and explicitly decide if this is desirable behaviour, since it reverts the effect of 45ec02a. Maybe there was one and I missed it? Perhaps free-threading doesn't require that every interned string become immortal but only some of them (e.g. symbols used by code objects etc). That might preserve free-threaded performance (e.g. interned strings shared between threads) while avoid the "leak" like encountered by pathlib (e.g. interned strings only used for a short time and then discarded). |
To be clear, I think this is bad. Some interned strings are temporary, and they now leak. They should be cleaned up sooner that at interpreter shutdown. |
The documentation for Some open source libraries depend on this behavior, like msgpack-python. Changing the behavior of sys.intern will cause problems for some applications. I work on an application that is experiencing out-of-memory errors because of this issue. One solution might be to add a separate function called Restoring the old, documented behavior of |
Does this mean that It seems like that could also be a source of leaks. Maybe the documentation for Alternately, maybe |
I don't see a good reason why interning strings should make them immortal. Why were interned strings changed to be immortal in 3.12? Since interned strings that are tied to code objects are effectively immortal, making those actually immortal seems fine if necessary for performance. But strings interned with |
The immortal objects PR seems to allow interned, but not immortals strings: |
It might also be worth making the distinction between "interpreter immortal" and "process immortal" strings, as the latter can be shared between multiple interpreters. |
The free-threaded build needs every interned string to be immortal, not just symbols, if you don't want to introduce bottlenecks that prevent scaling in multi-threaded programs. The documentation can describe the possible behaviors without promising a specific implementation. Something like: "Depending on the Python version and implementation, interned strings may or may not be immortal; you should keep a reference to the return value of intern() around to benefit from it." |
Why do all interned strings need to be immortal? Are there not strings that are contended and are not interned, or does the free-threading build intern so many strings that all possibly contended strings are interned? |
Interning implies sharing and sharing between threads leads to reference count contention. To take the pathlib example, I'd much rather pathlib did not intern strings, but if it does intern the components, it's important that those strings are immortalized, because otherwise you are likely to have a bunch of strings shared between threads that otherwise would not be shared.
It's definitely possible, but from what I've seen it's not the most common source of reference count contention. If you don't intern a string, it's much less likely to be incidentally shared between threads. |
This seems like a breaking change that is kind of flying under the radar. The documentation explains, "Interned strings are not immortal; you must keep a reference to the return value of intern() around to benefit from it." It looks like it been that way for 20 years I'd be worried that there are many applications that depend on the old behavior. If there's a way to maintain backwards compatibility, while also finding a way to avoid bottlenecks around shared strings, then that might be the best option. |
Can we make it mortal again on --enable-GIL build? |
Right. It looks like free-threading needs this, and the ecosystem (and stdlib) will need to adapt, so the question is how soon. I guess ultimately the RM should decide that. |
…sImmortal Older stable ABI extensions are allowed to make immortal objects mortal. Instead, use `_PyUnicode_STATE` (`interned` and `statically_allocated`).
… issues (pythonGH-120520) * Add an InternalDocs file describing how interning should work and how to use it. * Add internal functions to *explicitly* request what kind of interning is done: - `_PyUnicode_InternMortal` - `_PyUnicode_InternImmortal` - `_PyUnicode_InternStatic` * Switch uses of `PyUnicode_InternInPlace` to those. * Disallow using `_Py_SetImmortal` on strings directly. You should use `_PyUnicode_InternImmortal` instead: - Strings should be interned before immortalization, otherwise you're possibly interning a immortalizing copy. - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in backports, as they are now part of public API and version-specific ABI. * Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery. * Make sure the statically allocated string singletons are unique. This means these sets are now disjoint: - `_Py_ID` - `_Py_STR` (including the empty string) - one-character latin-1 singletons Now, when you intern a singleton, that exact singleton will be interned. * Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic). * Intern `_Py_STR` singletons at startup. * For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup. * Beef up the tests. Cover internal details (marked with `@cpython_only`). * Add lots of assertions Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
…ortal (GH-121358) Older stable ABI extensions are allowed to make immortal objects mortal. Instead, use `_PyUnicode_STATE` (`interned` and `statically_allocated`).
…_IsImmortal (pythonGH-121358) Older stable ABI extensions are allowed to make immortal objects mortal. Instead, use `_PyUnicode_STATE` (`interned` and `statically_allocated`). (cherry picked from commit 956270d) Co-authored-by: Petr Viktorin <encukou@gmail.com>
…lizing in other API (#121364) * Switch PyUnicode_InternInPlace to _PyUnicode_InternMortal, clarify docs * Document immortality in some functions that take `const char *` This is PyUnicode_InternFromString; PyDict_SetItemString, PyObject_SetAttrString; PyObject_DelAttrString; PyUnicode_InternFromString; and the PyModule_Add convenience functions. Always point out a non-immortalizing alternative. * Don't immortalize user-provided attr names in _ctypes
…y_IsImmortal (GH-121358) (GH-121851) gh-113993: For string interning, do not rely on (or assert) _Py_IsImmortal (GH-121358) Older stable ABI extensions are allowed to make immortal objects mortal. Instead, use `_PyUnicode_STATE` (`interned` and `statically_allocated`). (cherry picked from commit 956270d) Co-authored-by: Petr Viktorin <encukou@gmail.com>
… keep immortalizing in other API (pythonGH-121364) * Switch PyUnicode_InternInPlace to _PyUnicode_InternMortal, clarify docs * Document immortality in some functions that take `const char *` This is PyUnicode_InternFromString; PyDict_SetItemString, PyObject_SetAttrString; PyObject_DelAttrString; PyUnicode_InternFromString; and the PyModule_Add convenience functions. Always point out a non-immortalizing alternative. * Don't immortalize user-provided attr names in _ctypes (cherry picked from commit b4aedb2) Co-authored-by: Petr Viktorin <encukou@gmail.com>
… issues (pythonGH-120520) * Add an InternalDocs file describing how interning should work and how to use it. * Add internal functions to *explicitly* request what kind of interning is done: - `_PyUnicode_InternMortal` - `_PyUnicode_InternImmortal` - `_PyUnicode_InternStatic` * Switch uses of `PyUnicode_InternInPlace` to those. * Disallow using `_Py_SetImmortal` on strings directly. You should use `_PyUnicode_InternImmortal` instead: - Strings should be interned before immortalization, otherwise you're possibly interning a immortalizing copy. - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in backports, as they are now part of public API and version-specific ABI. * Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery. * Make sure the statically allocated string singletons are unique. This means these sets are now disjoint: - `_Py_ID` - `_Py_STR` (including the empty string) - one-character latin-1 singletons Now, when you intern a singleton, that exact singleton will be interned. * Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic). * Intern `_Py_STR` singletons at startup. * For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup. * Beef up the tests. Cover internal details (marked with `@cpython_only`). * Add lots of assertions Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
…_IsImmortal (pythonGH-121358) Older stable ABI extensions are allowed to make immortal objects mortal. Instead, use `_PyUnicode_STATE` (`interned` and `statically_allocated`).
…mmortalizing in other API (python#121364) * Switch PyUnicode_InternInPlace to _PyUnicode_InternMortal, clarify docs * Document immortality in some functions that take `const char *` This is PyUnicode_InternFromString; PyDict_SetItemString, PyObject_SetAttrString; PyObject_DelAttrString; PyUnicode_InternFromString; and the PyModule_Add convenience functions. Always point out a non-immortalizing alternative. * Don't immortalize user-provided attr names in _ctypes
…immortalizing in other API (GH-121364) (GH-121854) * Switch PyUnicode_InternInPlace to _PyUnicode_InternMortal, clarify docs * Document immortality in some functions that take `const char *` This is PyUnicode_InternFromString; PyDict_SetItemString, PyObject_SetAttrString; PyObject_DelAttrString; PyUnicode_InternFromString; and the PyModule_Add convenience functions. Always point out a non-immortalizing alternative. * Don't immortalize user-provided attr names in _ctypes (cherry picked from commit b4aedb2)
…related issues (pythonGH-120520) * Add an InternalDocs file describing how interning should work and how to use it. * Add internal functions to *explicitly* request what kind of interning is done: - `_PyUnicode_InternMortal` - `_PyUnicode_InternImmortal` - `_PyUnicode_InternStatic` * Switch uses of `PyUnicode_InternInPlace` to those. * Disallow using `_Py_SetImmortal` on strings directly. You should use `_PyUnicode_InternImmortal` instead: - Strings should be interned before immortalization, otherwise you're possibly interning a immortalizing copy. - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in backports, as they are now part of public API and version-specific ABI. * Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery. * Make sure the statically allocated string singletons are unique. This means these sets are now disjoint: - `_Py_ID` - `_Py_STR` (including the empty string) - one-character latin-1 singletons Now, when you intern a singleton, that exact singleton will be interned. * Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic). * Intern `_Py_STR` singletons at startup. * Beef up the tests. Cover internal details (marked with `@cpython_only`). * Add lots of assertions Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
… keep immortalizing in other API (pythonGH-121364) * Switch PyUnicode_InternInPlace to _PyUnicode_InternMortal, clarify docs * Document immortality in some functions that take `const char *` This is PyUnicode_InternFromString; PyDict_SetItemString, PyObject_SetAttrString; PyObject_DelAttrString; PyUnicode_InternFromString; and the PyModule_Add convenience functions. Always point out a non-immortalizing alternative. * Don't immortalize user-provided attr names in _ctypes (cherry picked from commit b4aedb2)
The 3.12 backport is at #123065. @Yhg1s, re:
Unfortunately it's far from a straightforward revert. |
…H-121903, GH-122303) (#123065) This backports several PRs for gh-113993, making interned strings mortal so they can be garbage-collected when no longer needed. * Allow interned strings to be mortal, and fix related issues (GH-120520) * Add an InternalDocs file describing how interning should work and how to use it. * Add internal functions to *explicitly* request what kind of interning is done: - `_PyUnicode_InternMortal` - `_PyUnicode_InternImmortal` - `_PyUnicode_InternStatic` * Switch uses of `PyUnicode_InternInPlace` to those. * Disallow using `_Py_SetImmortal` on strings directly. You should use `_PyUnicode_InternImmortal` instead: - Strings should be interned before immortalization, otherwise you're possibly interning a immortalizing copy. - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in backports, as they are now part of public API and version-specific ABI. * Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery. Make sure the statically allocated string singletons are unique. This means these sets are now disjoint: - `_Py_ID` - `_Py_STR` (including the empty string) - one-character latin-1 singletons Now, when you intern a singleton, that exact singleton will be interned. * Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic). * Intern `_Py_STR` singletons at startup. * Beef up the tests. Cover internal details (marked with `@cpython_only`). * Add lots of assertions * Don't immortalize in PyUnicode_InternInPlace; keep immortalizing in other API (GH-121364) * Switch PyUnicode_InternInPlace to _PyUnicode_InternMortal, clarify docs * Document immortality in some functions that take `const char *` This is PyUnicode_InternFromString; PyDict_SetItemString, PyObject_SetAttrString; PyObject_DelAttrString; PyUnicode_InternFromString; and the PyModule_Add convenience functions. Always point out a non-immortalizing alternative. * Don't immortalize user-provided attr names in _ctypes * Immortalize names in code objects to avoid crash (GH-121903) * Intern latin-1 one-byte strings at startup (GH-122303) There are some 3.12-specific changes, mainly to allow statically allocated strings in deepfreeze. (In 3.13, deepfreeze switched to the general `_Py_ID`/`_Py_STR`.) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
…rt) _Py_IsImmortal (pythonGH-121358) (pythonGH-121851) pythongh-113993: For string interning, do not rely on (or assert) _Py_IsImmortal (pythonGH-121358) Older stable ABI extensions are allowed to make immortal objects mortal. Instead, use `_PyUnicode_STATE` (`interned` and `statically_allocated`). (cherry picked from commit 956270d) Co-authored-by: Petr Viktorin <encukou@gmail.com>
…y_IsImmortal (GH-121358) (GH-124938) gh-113993: For string interning, do not rely on (or assert) _Py_IsImmortal (GH-121358) Older stable ABI extensions are allowed to make immortal objects mortal. Instead, use `_PyUnicode_STATE` (`interned` and `statically_allocated`). (cherry picked from commit 956270d) Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com>
Bug report
Bug description:
The
sys.intern
documentation explicitly says:However, they were made immortal in #19474 (Implement Immortal Objects -- implementation of PEP 683), without a documentation update. The 3.12 What's New also doesn't mention the change. [edit: it's now clear that the PR author (@eduardo-elizondo) intended it but the reviewer (@markshannon) did not]
PEP-683 itself only mentions the change as a possible optimization.
Meanwhile,
pathlib
interns every path segment it processes, presumably depending on the documented behaviour. In CPython's test suite, that causes names of temporary directories to leak (stealthily, since interned strings are excluded from the total refcount).[edit: Worse,
type_setattro
,PyObject_SetAttr
orPyDict_SetItemString
immortalize keys, so strings used for key/attribute names this way are not reclaimed until interpreter shutdown.]On the other hand, free-threading (PEP-703)
seems to rely[edit: relies] on these being immortal.What's the correct behaviour? [edit: specifically, in 3.12 and non-free-threading 3.13, should this change be reverted or documented?]
@eduardo-elizondo @ericsnowcurrently
Linked PRs
Related fixes:
The text was updated successfully, but these errors were encountered: