Steady increase of unfreed memory #4343
Comments
Memory will often not go to 0 because that is not how allocators work. If any object is still alive, the memory before that object cannot be returned. However, over time your memory should saturate to a constant amount if you run things in a loop. I tried to run your script, and also a few other parquet files I have locally, and I cannot reproduce your experience. Could you share a bit more about your machine? How much memory have you got, etc.?
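For reference, one way to check whether memory saturates rather than grows is to print the process RSS on each iteration; a minimal sketch, assuming `psutil` is installed and `data.parquet` is a placeholder for a local file:

```python
import os

import polars as pl
import psutil


def rss_mb() -> float:
    """Resident set size of the current process in MiB."""
    return psutil.Process(os.getpid()).memory_info().rss / 2**20


def main() -> None:
    # data.parquet is a placeholder for any reasonably large local file.
    for i in range(20):
        df = pl.read_parquet("data.parquet")
        del df
        print(f"iteration {i}: {rss_mb():.1f} MiB")


if __name__ == "__main__":
    main()
```

If the allocator is merely holding on to fragmented pages, the printed RSS should level off after a few iterations instead of climbing with every loop.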
Okay, I see your point regarding the memory allocation: it's not defragmenting and consolidating the heap when you free something. Anyway, I managed to run this on my home machine (macOS) and see something similar, though perhaps not as dramatic: it goes from a peak of around 775 to about 1050 (~35% increase). I was sort of expecting it would just go up and down to the same levels each time (modulo a bit of delta from the sampling nature of the profiler). For the machine I'm actually interested in, I have 256 GB RAM running Windows 10 with 32 logical processors.
@CHDev93 On Linux jemalloc is used, while on macOS and Windows mimalloc is used for memory allocation. I think jemalloc is slightly better at preventing heap fragmentation (although this can change in future releases).
@ghuls that's very interesting, thanks for pointing that out. So you don't observe the steadily increasing memory on Linux at all? If it's the allocator implementation, then that is very subtle.
Will go ahead and close this, as it seems to be an allocator-specific detail that isn't likely to change. Thanks for clearing up this behaviour @ritchie46, @ghuls, @cbilot!
What language are you using?
Python
Have you tried the latest version of polars?
What version of polars are you using?
0.13.62
What operating system are you using polars on?
Windows 10
What language version are you using?
python 3.8
Describe your bug.
I'm doing the following in a loop (see the snippet under the repro steps below). I would expect the memory usage "high watermark" to be independent of the number of loops I run. Instead, it seems to steadily increase with each iteration. Perhaps related to this issue.
What are the steps to reproduce the behavior?
The code below should repro the increasing memory when run using `mprof run` (Python's `memory-profiler` library).
What is the actual behavior?
Running `mprof run foo.py` (ideally with the `@profile` decorator added around `main`) and then running `mprof plot`, you'll see each run of `main` start at a slightly higher memory usage even though the dataframe is no longer in scope. I can't upload the image from the environment I'm working in unfortunately, but it should be very easy to reproduce.
What is the expected behavior?
I still don't entirely understand why the memory usage doesn't go back down to ~0 after the dataframe is released (the other issue linked to a reddit post indicating this might be a CPython thing), but I really don't understand why memory usage should be steadily increasing.
For the actual, larger problem being worked on, this causes OOM issues that can only be resolved by doing the loading and preprocessing in a subprocess, which ensures the memory gets released.
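A minimal sketch of that subprocess workaround, assuming the load/preprocess step can be moved into a worker function (`data.parquet`, the `value` column, and the filter step are placeholders):

```python
import multiprocessing as mp

import polars as pl


def load_and_preprocess(path: str, out_path: str) -> None:
    # Runs in a child process, so its allocations are returned to the OS on exit.
    df = pl.read_parquet(path)
    df = df.filter(pl.col("value") > 0)  # placeholder preprocessing step
    df.write_parquet(out_path)


if __name__ == "__main__":
    for _ in range(10):
        p = mp.Process(target=load_and_preprocess, args=("data.parquet", "out.parquet"))
        p.start()
        p.join()
```

Because the allocations live in the child process, they go back to the OS when it exits, regardless of any allocator fragmentation in the parent.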