
Path.exists() - Memory leak #122849

Closed
RegisGraptin opened this issue Aug 9, 2024 · 8 comments
Labels
topic-pathlib type-bug An unexpected behavior, bug, or error

Comments

@RegisGraptin

RegisGraptin commented Aug 9, 2024

Bug report

Bug description:

I am having an issue with the Path.exists() method: under intensive use, memory usage keeps growing, which looks like a memory leak.

Here is a Python script that iterates over some items; each time it calls exists(), RAM usage seems to keep growing.

from collections.abc import Iterator
from pathlib import Path


def check_file(filename: Path) -> None:
    if filename.exists():
        return
    return


def iterate() -> Iterator[int]:
    yield from range(10_000_000)


if __name__ == "__main__":
    directory = Path("/")
    for item in iterate():
        filename = directory / str(item)
        check_file(filename)

Here is also a screenshot from the memory_profiler library that illustrates it.

[Screenshot: memory_profiler output (clean_memory_leak)]

I do not know if I am missing something here... Let me know if you need more info to reproduce it. I am using Python 3.12.3 on Ubuntu 22.04.4 LTS.

Thanks

CPython versions tested on:

3.12

Operating systems tested on:

Linux

RegisGraptin added the type-bug (An unexpected behavior, bug, or error) label Aug 9, 2024
@ZeroIntensity
Member

Pathlib interns strings, so that's probably the cause of the inflated memory usage. Duplicate of #119518 (and also take a look at #113993).
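To make the interning point concrete, here is a minimal sketch of the mechanism using `sys.intern`. The helper `intern_names` is hypothetical and is not pathlib's code; it only mimics "intern each path component". Interning keeps one shared copy of each distinct string in an interpreter-wide table, so if every new component name is interned and those entries are never released (our reading of the linked #113993 for 3.12), the table grows with every unique name the loop produces.

```python
import sys


def intern_names(names: list[str]) -> list[str]:
    # Hypothetical helper: intern each name, roughly the way a path parser
    # might intern components so that equal parts share a single object.
    return [sys.intern(name) for name in names]


# Build "abc" at runtime so it starts out as a distinct object.
built = "".join(["ab", "c"])
first = intern_names([built])[0]
second = sys.intern("abc")

# Interning returns the one canonical copy, so both names refer to it.
print(first is second)  # True
```

With unique names such as str(i) for millions of values of i, each call would add a new table entry rather than reuse an old one.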

@Rogdham

Rogdham commented Aug 9, 2024

For what it's worth, the issue seems to be only present in Python 3.12 (not before, nor in 3.13).

Running the following script with the official Docker slim images:

from pathlib import Path
import tracemalloc

directory = Path("/")
found = False
tracemalloc.start()

snapshot1 = tracemalloc.take_snapshot()
for item in range(1_000_000):
    found |= (directory / str(item)).exists()
snapshot2 = tracemalloc.take_snapshot()

top_stats = snapshot2.compare_to(snapshot1, 'lineno')
for stat in top_stats[:10]:
    print(stat)
  • Python 3.9.19
/usr/local/lib/python3.9/pathlib.py:82: size=288 KiB (+288 KiB), count=1 (+1), average=288 KiB
/tmp/docker/m.py:10: size=944 B (+944 B), count=2 (+2), average=472 B
/usr/local/lib/python3.9/tracemalloc.py:423: size=512 B (+512 B), count=3 (+3), average=171 B
/usr/local/lib/python3.9/pathlib.py:976: size=488 B (+488 B), count=1 (+1), average=488 B
/usr/local/lib/python3.9/pathlib.py:738: size=488 B (+488 B), count=1 (+1), average=488 B
/usr/local/lib/python3.9/tracemalloc.py:560: size=472 B (+472 B), count=2 (+2), average=236 B
/usr/local/lib/python3.9/pathlib.py:748: size=464 B (+464 B), count=1 (+1), average=464 B
/usr/local/lib/python3.9/pathlib.py:753: size=456 B (+456 B), count=1 (+1), average=456 B
/usr/local/lib/python3.9/pathlib.py:740: size=456 B (+456 B), count=1 (+1), average=456 B
/usr/local/lib/python3.9/pathlib.py:1426: size=424 B (+424 B), count=1 (+1), average=424 B
  • Python 3.10.14
/usr/local/lib/python3.10/pathlib.py:74: size=288 KiB (+288 KiB), count=1 (+1), average=288 KiB
/tmp/docker/m.py:10: size=928 B (+928 B), count=2 (+2), average=464 B
/usr/local/lib/python3.10/tracemalloc.py:423: size=504 B (+504 B), count=3 (+3), average=168 B
/usr/local/lib/python3.10/pathlib.py:855: size=480 B (+480 B), count=1 (+1), average=480 B
/usr/local/lib/python3.10/pathlib.py:617: size=480 B (+480 B), count=1 (+1), average=480 B
/usr/local/lib/python3.10/tracemalloc.py:560: size=464 B (+464 B), count=2 (+2), average=232 B
/usr/local/lib/python3.10/pathlib.py:627: size=456 B (+456 B), count=1 (+1), average=456 B
/usr/local/lib/python3.10/pathlib.py:632: size=448 B (+448 B), count=1 (+1), average=448 B
/usr/local/lib/python3.10/pathlib.py:619: size=440 B (+440 B), count=1 (+1), average=440 B
/usr/local/lib/python3.10/pathlib.py:1290: size=424 B (+424 B), count=1 (+1), average=424 B
  • Python 3.11.9
/usr/local/lib/python3.11/pathlib.py:74: size=203 KiB (+203 KiB), count=1 (+1), average=203 KiB
/usr/local/lib/python3.11/tracemalloc.py:560: size=320 B (+320 B), count=2 (+2), average=160 B
/usr/local/lib/python3.11/tracemalloc.py:423: size=320 B (+320 B), count=2 (+2), average=160 B
/tmp/docker/m.py:9: size=32 B (+32 B), count=1 (+1), average=32 B
  • Python 3.12.5
<frozen posixpath>:160: size=44.7 MiB (+44.7 MiB), count=999951 (+999951), average=47 B
/usr/local/lib/python3.12/pathlib.py:404: size=29.3 MiB (+29.3 MiB), count=9 (+9), average=3338 KiB
/usr/local/lib/python3.12/tracemalloc.py:560: size=312 B (+312 B), count=2 (+2), average=156 B
/usr/local/lib/python3.12/tracemalloc.py:423: size=312 B (+312 B), count=2 (+2), average=156 B
/tmp/docker/m.py:9: size=32 B (+32 B), count=1 (+1), average=32 B
  • Python 3.13.0 rc1
/usr/local/lib/python3.13/pathlib/_local.py:274: size=405 KiB (+405 KiB), count=1 (+1), average=405 KiB
/usr/local/lib/python3.13/tracemalloc.py:560: size=328 B (+328 B), count=1 (+1), average=328 B
/usr/local/lib/python3.13/tracemalloc.py:423: size=328 B (+328 B), count=1 (+1), average=328 B
/tmp/docker/m.py:9: size=32 B (+32 B), count=1 (+1), average=32 B
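A reusable variant of the measurement above, sketched with the standard tracemalloc module. `net_growth` is a hypothetical helper name: it sums `size_diff` over every file instead of printing the top ten lines. Going by the per-version listings above, we would expect the total to scale with `n` on an affected 3.12 build and to stay roughly flat elsewhere, though the exact numbers will differ between machines.

```python
import tracemalloc
from collections.abc import Callable
from pathlib import Path


def net_growth(work: Callable[[int], object], n: int) -> int:
    """Return the net change in traced memory (bytes) over n calls of work(i)."""
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        for i in range(n):
            work(i)
        after = tracemalloc.take_snapshot()
    finally:
        # Only stop tracing if this helper was the one that started it.
        if not was_tracing:
            tracemalloc.stop()
    # Sum the size differences over every allocating file.
    return sum(stat.size_diff for stat in after.compare_to(before, "filename"))


if __name__ == "__main__":
    directory = Path("/")
    growth = net_growth(lambda i: (directory / str(i)).exists(), 100_000)
    print(f"net growth: {growth / 1024:.1f} KiB")
```

Returning a single number makes it easy to compare interpreters side by side or to assert an upper bound in a regression test.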

@Rogdham

Rogdham commented Aug 9, 2024

This may be a closer duplicate of #121780, whose fix in #120520 seems to have been backported to 3.13 but not to 3.12.

@ZeroIntensity
Member

@encukou, was #120520 supposed to be backported to 3.12?

@encukou
Member

encukou commented Aug 19, 2024

Yes, but it's a big PR and there are several fixes on top. I'm now working on the backport in #123065.

@rafsaf

rafsaf commented Sep 14, 2024

> Possibly closer duplicate of #121780, with patch in #120520 that has been backported to 3.13 but not 3.12 it seems.

Small note: although this is patched in the "standard" 3.13 build, it still happens on the experimental free-threaded build 3.13.0rc2t: memory usage grows indefinitely because pathlib's _parse_path method interns strings. As far as I understand, this is expected there for now and will probably be addressed in pathlib itself in the future, or handled some other way :p
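For code that cannot move to a fixed build yet, one possible workaround (our own assumption, not something proposed in this thread) is to keep hot loops on plain strings and `os.path`, which never constructs a `pathlib.Path` and so never reaches the `_parse_path` step mentioned above. `exists_in` is a hypothetical helper; we have not measured it on every affected build, so confirm the effect with a tracemalloc check before relying on it.

```python
import os
import tempfile


def exists_in(directory: str, name: str) -> bool:
    # Hypothetical helper: join and test with plain strings, so no
    # pathlib.Path object (and no pathlib parsing) is created per call.
    return os.path.exists(os.path.join(directory, name))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "present.txt"), "w").close()
        print(exists_in(tmp, "present.txt"))  # True
        print(exists_in(tmp, "missing.txt"))  # False
```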

@rafsaf

rafsaf commented Oct 10, 2024

@RegisGraptin This was fixed in 3.12.7 release on October 1st.

https://docs.python.org/release/3.12.7/whatsnew/changelog.html#python-3-12-7

The issue can be closed. Thanks!
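Putting the version reports from this thread together, a rough check might look like the sketch below. `pathlib_leak_expected` is a hypothetical helper, and the ranges come only from the comments here: 3.12.0 through 3.12.6 leak, 3.12.7 contains the fix, earlier versions and the default 3.13 build are unaffected, and the free-threaded branch rests on a single 3.13.0rc2t report, so treat that part as a guess. `sysconfig.get_config_var("Py_GIL_DISABLED")` is truthy on free-threaded builds and `None` on interpreters that predate the option.

```python
import sys
import sysconfig


def pathlib_leak_expected(version: tuple[int, ...], free_threaded: bool = False) -> bool:
    """Rough check based on reports in this thread, not an official support matrix."""
    # 3.12.0-3.12.6: leak reported; 3.12.7 ships the backported fix.
    if (3, 12, 0) <= version < (3, 12, 7):
        return True
    # Free-threaded 3.13 was still reported to grow (3.13.0rc2t).
    return free_threaded and version[:2] == (3, 13)


if __name__ == "__main__":
    current = tuple(sys.version_info[:3])
    free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    print(pathlib_leak_expected(current, free_threaded))
```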

@RegisGraptin
Copy link
Author

Thank you for the work 🙏, I double-checked on my side with 3.12.7 using the code from the issue, and no memory leak is noticeable anymore 🎉


6 participants