bpo-46006: Move the interned strings and identifiers to _PyRuntimeState. #30131
I didn't review the code but this seems like a good way to fix bpo-46006. We can have more time to figure out how to make them per-interpreter (assuming that's the decision).
/* Unicode identifiers (_Py_Identifier): see _PyUnicode_FromId() */
struct _Py_unicode_ids {
    PyThread_type_lock lock;
Isn't this lock unnecessary? The GIL is held whenever an identifier is used, isn't it?
   Another way to look at this is that to say that the actual reference
   count of a string is: s->ob_refcnt + (s->state ? 2 : 0)
*/
PyObject *unicode_interned;
Could you use "string" or "str" rather than "unicode"? Python 2 is history 🙂
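For readers following the refcount comment in the hunk above, here is a minimal sketch of the accounting it describes. It assumes the extra 2 stands for the key and value slots of the interned table, whose references are not included in ob_refcnt. The type toy_str and the function toy_actual_refcount are hypothetical stand-ins, not CPython's real object layout or API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a string object; NOT the real PyUnicodeObject layout. */
typedef struct {
    long ob_refcnt;  /* references counted the usual way */
    int state;       /* non-zero once the string has been interned */
} toy_str;

/* References actually held on the string, under the assumption that an
   interned string has two extra, uncounted references (the key and the
   value in the interned table). */
static long toy_actual_refcount(const toy_str *s)
{
    assert(s != NULL);
    return s->ob_refcnt + (s->state ? 2 : 0);
}
```

With this model, a non-interned string with ob_refcnt 1 has one real reference, while an interned string with the same ob_refcnt has three.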
   count of a string is: s->ob_refcnt + (s->state ? 2 : 0)
*/
PyObject *unicode_interned;
} cached;
Please drop this struct.
@@ -233,6 +233,9 @@ static int unicode_is_singleton(PyObject *unicode);
#endif

#define IDENTIFIERS _Py_SINGLETON(unicode_ids)
Can you drop the IDENTIFIERS and INTERNED macros? They impair readability. You can leave _Py_SINGLETON as it conveys some meaning.
struct _Py_unicode_runtime_ids *rt_ids = &interp->runtime->unicode_ids;

PyThread_acquire_lock(rt_ids->lock, WAIT_LOCK);
PyThread_acquire_lock(IDENTIFIERS.lock, WAIT_LOCK);
Drop this, and assert that the GIL is held?
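A rough sketch of what this suggestion could look like, written as a standalone toy rather than against the real CPython sources. The flag toy_gil_held stands in for "the current thread holds the GIL" (in real code a check such as PyGILState_Check() might serve that role), and toy_ids, next_index and toy_ids_allocate are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for "the current thread holds the GIL". */
static bool toy_gil_held = false;

static void toy_gil_acquire(void) { toy_gil_held = true; }
static void toy_gil_release(void) { toy_gil_held = false; }

/* Toy identifier registry with no lock of its own. */
typedef struct {
    int next_index;  /* hypothetical counter for the next identifier slot */
} toy_ids;

/* Hand out the next identifier index. Instead of taking a dedicated lock,
   the function only asserts that the caller already holds the global one. */
static int toy_ids_allocate(toy_ids *ids)
{
    assert(toy_gil_held);
    return ids->next_index++;
}
```

The trade-off is that correctness then depends on every caller really holding the global lock, which the assertion checks only in debug builds.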
I agree with @nascheme that this is probably the best fix for 3.11a4. Let's hope that we can keep the overhead of immortal objects low enough that we can have parallel sub-interpreters and static global objects.
https://bugs.python.org/issue46006 is about interned strings. Fixing it doesn't require sharing identifiers between all interpreters again. I'm not convinced of the value of moving the structures from PyInterpreterState to _PyRuntimeState. I propose a different PR which only reverts ea25180: PR #30422. It only moves the
I merged my GH-20085 revert.
Currently the interned strings (and strings created for _Py_IDENTIFIER()) are per-interpreter. This is causing some bugs because other objects which may hold a reference to the string are still global. So until we are closer to moving the bulk of the global objects to per-interpreter, the simplest thing is to move the interned strings (and identifiers) to _PyRuntimeState.
https://bugs.python.org/issue46006
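To illustrate the kind of mismatch described above, here is a small self-contained model, not CPython code. It assumes an intern table simply maps string contents to one canonical pointer; toy_intern_table and toy_intern are hypothetical names. With one table per interpreter, the same text gets a different canonical pointer in each interpreter, so a global object holding one of them no longer matches the other. With a single runtime-wide table, both lookups agree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_TABLE_CAP 8

/* Toy intern table: maps string contents to one canonical pointer. */
typedef struct {
    const char *entries[TOY_TABLE_CAP];
    size_t count;
} toy_intern_table;

/* Return the canonical pointer for `s` in `table`, adding `s` if no equal
   string is present yet. Returns NULL if the table is full. */
static const char *toy_intern(toy_intern_table *table, const char *s)
{
    assert(table != NULL && s != NULL);
    for (size_t i = 0; i < table->count; i++) {
        if (strcmp(table->entries[i], s) == 0) {
            return table->entries[i];  /* already interned here */
        }
    }
    if (table->count == TOY_TABLE_CAP) {
        return NULL;
    }
    table->entries[table->count++] = s;
    return s;
}
```

Interning two equal strings in two separate tables yields two distinct pointers, while interning both in one shared table yields the same pointer, which is the property that identity comparisons on interned strings rely on.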