-
-
Notifications
You must be signed in to change notification settings - Fork 31.6k
PyInterpreterState.config.int_max_str_digits Should Not Be Modified #98417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I'm pretty sure I had a version of the code doing exactly this originally and was guided towards making it simpler by not duplicating the field between Do we document the semantics of PyConfig being read-only anywhere? |
I agree about documenting the read-only-after-init constraint somewhere. I was thinking of a comment right above the |
To be clear, I'm operating under the assumption that the cached FWIW, PEP 587 doesn't say anything about read-only-after-init, but PEP 432 (from which 587 was partially derived) says:
The C-API docs don't say anything about config mutability-after-init. That may be fine though, since the fact that we cache a copy of the original config on each interpreter state is an internal implementation detail. |
That comment is more about preserving the original config value vs. updating during init, isn't it? The question of the source of the information during runtime (and where mutable state lives) seems like a separate one. |
Updating the PyConfig struct definition comment to say it should be treated read-only after init makes sense to me. I'd have seen that when making changes. I'm much less likely to go back and read PEPs as they're more often initial design ideas rather than policies and actual implementation. |
What are advantages of having a read-only PyConfig? For me, it's fine to modify it. But if its value is copied somewhere (ex: sys module), it's good to attempt to keep both consistent (whenever possible). Currently, _testinternalcapi.set_config() always copies PyConfig.module_search_paths to sys.path, erasing sys.path change done after early Python init, which makes the function annoying in practice and that's why I don't want to expose it in public. My main motivation to design PyConfig was to put all inputs at the same place. My ideal would be that PyConfig would be the only input to initialize Python. But once Python is initialized, for me, it's convenient to reuse this structure for a few variables. The only case I'm aware where the "initial" value matters is |
We should've (and still can) allow PyConfig to be fully provided by the caller, rather than making them go via our memory management functions. This would be very convenient for some situations, but relies on us not modifying the structure at all. Honestly, we shouldn't even copy it around the way we do. It should exist once, and if we need to refer back to it we can get to it through a pointer (bearing in mind that the host app owns it, not us, and so they might have modified it themselves). But really, it just looks like a set of parameters to |
* main: (40 commits) pythongh-98461: Fix source location in comprehensions bytecode (pythonGH-98464) pythongh-98421: Clean Up PyObject_Print (pythonGH-98422) pythongh-98360: multiprocessing now spawns children on Windows with correct argv[0] in virtual environments (pythonGH-98462) CODEOWNERS: Become a typing code owner (python#98480) [doc] Improve logging cookbook example. (pythonGH-98481) Add more tkinter.Canvas tests (pythonGH-98475) pythongh-95023: Added os.setns and os.unshare functions (python#95046) pythonGH-98363: Presize the list for batched() (pythonGH-98419) pythongh-98374: Suppress ImportError for invalid query for help() command. (pythongh-98450) typing tests: `_overload_dummy` raises `NotImplementedError`, not `RuntimeError` (python#98351) pythongh-98354: Add unicode check for 'name' attribute in _imp_create_builtin (pythonGH-98412) pythongh-98257: Make _PyEval_SetTrace() reentrant (python#98258) pythongh-98414: py.exe launcher does not use defaults for -V:company/ option (pythonGH-98460) pythongh-98417: Store int_max_str_digits on the Interpreter State (pythonGH-98418) Doc: Remove title text from internal links (python#98409) [doc] Refresh the venv introduction documentation, and correct the statement about VIRTUAL_ENV (pythonGH-98350) Docs: Bump sphinx-lint and fix unbalanced inline literal markup (python#98441) pythongh-92886: Replace assertion statements in `handlers.BaseHandler` to support running with optimizations (`-O`) (pythonGH-93231) pythongh-92886: Fix tests that fail when running with optimizations (`-O`) in `_test_multiprocessing.py` (pythonGH-93233) pythongh-92886: Fix tests that fail when running with optimizations (`-O`) in `test_py_compile.py` (pythonGH-93235) ...
Currently, PyConfig sits between two chairs: it's used to initialized Python (as expected), but it also stays during the whole Python lifecycle. I hesitated to build a separated structure to store this configuration in PyInterpreterState: it would avoid storing things which become useless once Python is initialized. The problem is that it caused a lot of code to be duplicated, since we would have two similar but different structures. I chose to put PyConfig simply to keep the code as simple as possible. So yeah, we waste a few bytes in memory, but a single structure has to be handled in the C code. Keeping PyConfig around also makes Py_NewInterpreter() simpler: it just copies the PyConfig from the current interpreter. By the way, _Py_NewInterpreterFromConfig() allows to run an interpreter with a different configuration.
Py_InitializeFromConfig() is a wild beast: it changes the memory allocator with implicit Python pre-initialization. If we do everything in a single PyConfig, the ownership of memory allocated on the stack is not well defined, and it's also unclear which memory allocator should be used to release the memory. Moreover, PyConfig design is to collect all data to initialize Python without modifying Python, and then "write" the configuration: _PyConfig_Write(). This design also allows to override/replace PyConfig at runtime, currently implemented as _testinternalcapi.set_config(). This feature is also used by Yeah, PyConfig API is weird and surprising, but they are reasons for its special design :-) |
Keeping it around is great! I love it! Modifying it after we've initialised is the issue. It needs to be treated as read-only, because it's not our memory. And really, getpath shouldn't modify it either, but should go directly to setting up |
I don't get it: PyInterpreterState.config is a PyConfig structure, not a pointer to somewhere else. What do you call "our memory"?
Well. Python has a long history and the initialization is complicated. You need many things to compute the path configuration, and the sys module is created "in the middle". There is always room for enhancement :-) On my side, I gave up at some point since I already sent like 1 or 2 years to deeply reorganize the code to make it "better" ;-) I'm not strongly against duplicating variables to keep some PyConfig members as read-only, or even try to make the whole PyConfig structure read-only. I'm just trying to get the advantages compared to the current state where some members are copied (ex: to the sys module), and some PyConfig members are read and modified in-place. Anyway, I'm glad that getpath is now implemented in pure Python. In that it was the main motivation for @ncoghlan and @ericsnowcurrently, at least at the beginning, to reorganize the Python initialization. My goal was just to fix my implementation of the Maybe PyConfig should really and only belong to Python initialization. |
I see, it gets deep copied into every interpreter. Even so, I don't think we should blur the line between interpreter state and interpreter config. Right now, that's what the issue is, and the fix is to move Whether we keep a copy of the config around is a separate discussion, probably to be driven by perf or memory needs rather than anything else. |
PyConfig
is for configuring the runtime during initialization, not for storing runtime state. Currently insys.set_int_max_str_digits()
we are modifying the interpreter's PyConfig directly.That field should never be modified outside runtime init. Instead, the config value should be copied to a field on
PyInterpreterState
and that is what should be modified bysys
(and used in longobject.c).In fact, this is mostly what we were doing until a couple weeks ago (and what we still are doing in 3.11 and earlier). This was changed on main (3.12) in gh-96944. The fix is relatively trivial.
CC @gpshead
The text was updated successfully, but these errors were encountered: