Stop manually interning strings in pathlib #119518
Comments
CC @pitrou |
That's a good question. The contention is that, if you keep a lot of related paths in memory, interning the path components would yield significant memory savings. But how useful it is would depend on the use case; and it's probably possible to construct use cases where it would be detrimental. |
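To make the contention above concrete, here is a minimal sketch of what interning does to components shared between related paths. The `split_parts` helper and the sample paths are made up for illustration and are not pathlib's actual parsing code; only `sys.intern` is a real API.

```python
import sys

# Two related paths that share their leading components (sample data).
paths = ["/srv/data/project/a.txt", "/srv/data/project/b.txt"]

def split_parts(path, intern=False):
    """Hypothetical stand-in for a path parser: split on '/' and drop '' and '.'."""
    parts = [p for p in path.split("/") if p and p != "."]
    return [sys.intern(p) for p in parts] if intern else parts

plain = [split_parts(p) for p in paths]
interned = [split_parts(p, intern=True) for p in paths]

# The two variants compare equal; what differs is object identity.
# With interning, equal components from different paths are one shared object.
shared = interned[0][0] is interned[1][0]
print("interned 'srv' shared between paths:", shared)
```

Whether this sharing translates into a meaningful memory saving depends on how many paths are kept alive and how much their components overlap, which is exactly the use-case dependence described above.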
Thank you :)
Surprisingly,
diff --git a/Lib/pathlib/_local.py b/Lib/pathlib/_local.py
index 49d9f813c5..d20512bd9b 100644
--- a/Lib/pathlib/_local.py
+++ b/Lib/pathlib/_local.py
@@ -270,7 +270,7 @@ def _parse_path(cls, path):
             elif len(drv_parts) == 6:
                 # e.g. //?/unc/server/share
                 root = sep
-        parsed = [sys.intern(str(x)) for x in rel.split(sep) if x and x != '.']
+        parsed = [x for x in rel.split(sep) if x and x != '.']
         return drv, root, parsed
 
     @property
|
It can probably be micro-optimized if you really care. For example, I'm not sure the
>>> x = 'foo'
>>> %timeit str(x)
40.3 ns ± 0.169 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> %timeit sys.intern(x)
46.9 ns ± 0.402 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> %timeit sys.intern(str(x))
79.7 ns ± 0.635 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> _intern = sys.intern
>>> %timeit _intern(x)
28.3 ns ± 0.11 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
|
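For anyone wanting to repeat these measurements outside IPython, the following is a rough sketch using the standard-library `timeit` module. The `bench` helper, the loop count, and the best-of-five policy are choices made here for illustration, and absolute numbers will differ by machine and Python version.

```python
import sys
import timeit

x = "foo"

# Names visible to the timed statements; _intern mirrors the alias used above.
namespace = {"sys": sys, "x": x, "_intern": sys.intern}

def bench(stmt, number=200_000, repeat=5):
    """Return the best observed time per call, in nanoseconds."""
    timings = timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat)
    return min(timings) / number * 1e9

statements = ("str(x)", "sys.intern(x)", "sys.intern(str(x))", "_intern(x)")
results = {stmt: bench(stmt) for stmt in statements}

for stmt, ns in results.items():
    print(f"{stmt:<20} {ns:7.1f} ns per call")
```

The comparison between `sys.intern(x)` and `_intern(x)` isolates the cost of the attribute lookup on `sys`, while `sys.intern(str(x))` adds the extra `str()` call on top.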
Hum! That change used to trip up the |
Can pathlib have its own "interned strings" cache with a limit on the cache size? Well, I don't know if it's worth it :-) Pseudo-code with a limit of 3 entries:

    cache = {}

    def get(key):
        # Return the cached string equal to key, if there is one.
        try:
            return cache[key]
        except KeyError:
            pass
        # Cache is full: hand the key back without caching it.
        if len(cache) >= 3:
            return key
        cache[key] = key
        return key

    abc = get(b'abc'.decode())
    abc2 = get(b'abc'.decode())
    assert abc2 is abc
    d = get(b'd'.decode())
    e = get(b'e'.decode())
    e2 = get(b'e'.decode())
    assert e2 is e
    # cache is full, so new keys are no longer added
    f = get(b'f'.decode())
    print("cache size", len(cache))

Can pathlib remove entries from such a cache? |
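On the question of removing entries, one possible approach is least-recently-used eviction. The sketch below is purely hypothetical: `BoundedInternCache` is a name invented here, nothing like it exists in pathlib, and the limit of 3 is kept only to match the pseudo-code above.

```python
from collections import OrderedDict

class BoundedInternCache:
    """Hypothetical size-limited intern table with LRU eviction (not part of pathlib)."""

    def __init__(self, maxsize=3):
        self.maxsize = maxsize
        self._cache = OrderedDict()

    def get(self, key):
        try:
            value = self._cache[key]
        except KeyError:
            # First sighting: the key becomes the canonical object.
            value = self._cache[key] = key
            if len(self._cache) > self.maxsize:
                # Drop the least recently used entry to stay within the limit.
                self._cache.popitem(last=False)
        else:
            # Cache hit: mark the entry as most recently used.
            self._cache.move_to_end(key)
        return value

    def __contains__(self, key):
        return key in self._cache

    def __len__(self):
        return len(self._cache)

cache = BoundedInternCache(maxsize=3)
abc = cache.get(b"abc".decode())
assert cache.get(b"abc".decode()) is abc

# Three newer keys push "abc" out, since it is now the least recently used.
for name in ("d", "e", "f"):
    cache.get(name)
print("cache size", len(cache), "| 'abc' still cached:", "abc" in cache)
```

Unlike the fixed-size version, this keeps accepting new strings after the limit is reached, at the price of extra bookkeeping on every lookup. A weak-reference table is not an option here because built-in `str` objects cannot be weakly referenced.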
Remove `sys.intern(str(x))` calls when normalizing a path in pathlib. This speeds up `str(Path('foo/bar'))` by about 10%.
When parsing and normalizing a path, pathlib calls sys.intern() on the string parts:
cpython/Lib/pathlib/_local.py
Line 273 in 96b392d
I've never been able to establish that this is a worthwhile thing to do. The implementation seems incomplete, because the path normalization only occurs when a user manually initialises a path object, and not in paths generated from path.iterdir(), path.walk(), etc. Drives/roots/anchors aren't interned, despite being the parts most likely to be shared.
Previous discussion: #112856 (comment)