
bpo-46006: Move the interned strings and identifiers to _PyRuntimeState. #30131



@ericsnowcurrently commented Dec 16, 2021

Currently the interned strings (and the strings created for _Py_IDENTIFIER()) are per-interpreter. This causes bugs, because other objects that may hold a reference to one of those strings are still global. So until we are closer to moving the bulk of the global objects to per-interpreter state, the simplest fix is to move the interned strings (and identifiers) to _PyRuntimeState.

https://bugs.python.org/issue46006
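A minimal sketch of the ownership change, using hypothetical, heavily simplified structs (`struct runtime_state`, `struct interp_state`, and `intern()` are illustrative names, not CPython's). Because the table hangs off the runtime rather than each interpreter, a string interned from one interpreter is the same object seen from another, and it is not torn down when a subinterpreter is finalized. It assumes callers are serialized (e.g. by the single GIL), so no locking is shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model; not CPython's actual structs. */
#define MAX_INTERNED 64

struct intern_table {
    size_t count;
    char *strings[MAX_INTERNED];
};

struct runtime_state {
    struct intern_table interned;   /* one table for the whole process */
};

struct interp_state {
    struct runtime_state *runtime;  /* every interpreter points at the same runtime */
};

/* Return the canonical copy of s, adding it on first use.
   Returns NULL if the table is full or allocation fails.
   Assumes the caller already serializes access (e.g. holds the GIL). */
const char *intern(struct interp_state *interp, const char *s)
{
    struct intern_table *t = &interp->runtime->interned;
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->strings[i], s) == 0) {
            return t->strings[i];
        }
    }
    if (t->count == MAX_INTERNED) {
        return NULL;
    }
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL) {
        return NULL;
    }
    strcpy(copy, s);
    t->strings[t->count++] = copy;   /* lives as long as the runtime */
    return copy;
}
```

With two `struct interp_state` values pointing at the same `struct runtime_state`, interning `"spam"` through either one returns the identical pointer.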

@nascheme commented Jan 4, 2022

I didn't review the code, but this seems like a good way to fix bpo-46006. It gives us more time to figure out how to make them per-interpreter (assuming that's the decision).


/* Unicode identifiers (_Py_Identifier): see _PyUnicode_FromId() */
struct _Py_unicode_ids {
PyThread_type_lock lock;

Isn't this lock unnecessary? The GIL is held whenever an identifier is used, isn't it?
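For context, a sketch of the pattern such a lock typically protects: lazily assigning each identifier a process-wide index, with a re-check after taking the lock. All names here (`identifier`, `registry`, `identifier_index()`) are hypothetical stand-ins, not CPython's `_Py_Identifier` or `_PyUnicode_FromId()`; it only illustrates why a lock would matter if callers were not already serialized by one GIL.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a statically declared identifier. */
typedef struct {
    const char *string;
    atomic_long index;          /* -1 until first use */
} identifier;

/* Hypothetical process-wide registry that hands out indices. */
static struct {
    pthread_mutex_t lock;
    long next_index;
} registry = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* Return the identifier's index, assigning one on first use. */
long identifier_index(identifier *id)
{
    long index = atomic_load(&id->index);
    if (index < 0) {
        pthread_mutex_lock(&registry.lock);
        /* Re-check: another thread may have assigned it while we waited. */
        index = atomic_load(&id->index);
        if (index < 0) {
            index = registry.next_index++;
            atomic_store(&id->index, index);
        }
        pthread_mutex_unlock(&registry.lock);
    }
    return index;
}
```

Repeated calls for the same identifier return the same index; distinct identifiers get distinct indices.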

Another way to look at this is that to say that the actual reference
count of a string is: s->ob_refcnt + (s->state ? 2 : 0)
*/
PyObject *unicode_interned;

Could you use "string" or "str" rather than "unicode"? Python 2 is history 🙂
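The code comment quoted in the diff context above gives the rule for an interned string's true reference count. As I understand it, the interned dict maps the string to itself, and those two references (key and value) are deliberately not reflected in `ob_refcnt`. A sketch of that arithmetic, with a hypothetical `str_header` type and `actual_refcount()` helper (not CPython names):

```c
#include <assert.h>

/* Interning states, modeled on the idea of SSTATE_* in unicodeobject.c
   (the names here are hypothetical). */
enum intern_state { NOT_INTERNED = 0, INTERNED_MORTAL = 1, INTERNED_IMMORTAL = 2 };

/* Hypothetical, stripped-down string header. */
typedef struct {
    long ob_refcnt;
    enum intern_state state;
} str_header;

/* Any nonzero state means the interned dict holds two uncounted
   references (key and value), so add them back. */
long actual_refcount(const str_header *s)
{
    return s->ob_refcnt + (s->state ? 2 : 0);
}
```

So a non-interned string with `ob_refcnt == 3` and an interned one with `ob_refcnt == 1` both have an actual count of 3.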

} cached;

Please drop this struct.

@@ -233,6 +233,9 @@ static int unicode_is_singleton(PyObject *unicode);
#endif


#define IDENTIFIERS _Py_SINGLETON(unicode_ids)

Can you drop the IDENTIFIERS and INTERNED macros? They impair readability.
You can leave _Py_SINGLETON, as it conveys some meaning.

struct _Py_unicode_runtime_ids *rt_ids = &interp->runtime->unicode_ids;

PyThread_acquire_lock(rt_ids->lock, WAIT_LOCK);
PyThread_acquire_lock(IDENTIFIERS.lock, WAIT_LOCK);

Drop this, and assert that the GIL is held?
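A sketch of the suggested alternative: no separate lock, just an assertion that documents (and, in debug builds, checks) that the caller holds the GIL. The thread-local flag and `gil_acquire()`/`gil_release()` are hypothetical stand-ins; in CPython the real check would be something like `PyGILState_Check()`. The design assumption is that one GIL covers every interpreter that can touch the shared counter; if interpreters ever get their own GILs, the counter would need a lock again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of "this thread holds the GIL". */
static _Thread_local bool holds_gil = false;

void gil_acquire(void) { holds_gil = true; }
void gil_release(void) { holds_gil = false; }

/* Hypothetical identifier; index is -1 until first use. */
typedef struct {
    const char *string;
    long index;            /* protected by the GIL */
} identifier;

static long next_index = 0;   /* protected by the GIL */

long identifier_index(identifier *id)
{
    /* Replace the lock with a checked assumption. */
    assert(holds_gil);
    if (id->index < 0) {
        id->index = next_index++;
    }
    return id->index;
}
```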

@markshannon

I agree with @nascheme that this is probably the best fix for 3.11a4.
I think in the longer term we will have to go for statically allocated identifiers.
Any other approach is just too error-prone.

Let's hope that we can keep the overhead of immortal objects low enough that we can have parallel sub-interpreters and static global objects.
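A minimal sketch of the idea, with a hypothetical object header and sentinel value (`object`, `IMMORTAL_REFCNT`, `obj_incref()` are illustrative, not CPython's): a statically allocated string whose refcount is never modified, so interpreters running in parallel could share it without racing on the count. The extra branch in incref/decref is the overhead mentioned above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical object header; not CPython's PyObject. */
typedef struct {
    long refcnt;
} object;

/* Hypothetical sentinel marking an object as immortal. */
#define IMMORTAL_REFCNT LONG_MAX

static bool is_immortal(const object *op) { return op->refcnt == IMMORTAL_REFCNT; }

static void obj_incref(object *op)
{
    if (!is_immortal(op)) {     /* the extra branch is the cost */
        op->refcnt++;
    }
}

/* Returns true when the count drops to zero and the object should be freed. */
static bool obj_decref(object *op)
{
    if (is_immortal(op)) {
        return false;           /* never deallocated, never written */
    }
    return --op->refcnt == 0;
}

/* A statically allocated, immortal string: no heap allocation and
   no per-interpreter state. */
typedef struct {
    object ob_base;
    const char *value;
} static_string;

static static_string str_hello = { { IMMORTAL_REFCNT }, "hello" };
```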

@vstinner commented Jan 5, 2022

https://bugs.python.org/issue46006 is about interned strings. Fixing it doesn't require sharing identifiers between all interpreters again. I'm not convinced of the value of moving these structures from PyInterpreterState to _PyRuntimeState.

I propose a different PR, #30422, which only reverts ea25180. It moves only the interned variable from PyInterpreterState back to unicodeobject.c (as a static variable). Identifiers are left unchanged.

@vstinner commented Jan 6, 2022

I merged my GH-20085 revert.

@vstinner closed this Jan 6, 2022