
How to explain these memory behaviors when using numpy? #31

Closed
Phylliade opened this issue Feb 6, 2020 · 18 comments
Comments

@Phylliade

When running a very simple script based on numpy functions, we can get the following results:

test2.py: % of CPU time = 100.00% out of   3.59s.
  	 |     CPU % |     CPU % | Avg memory  | Memory      |
  Line	 |  (Python) |  (native) | growth (MB) | usage (%)   | [test2.py]
--------------------------------------------------------------------------------
     1	 |           |           |             |             | import numpy as np
     2	 |           |           |             |             |
     3	 |     0.30% |    48.40% |         -80 |       1.03% | x = np.array(range(10**7))
     4	 |     0.59% |    50.72% |           0 |      98.97% | np.array(np.random.uniform(0, 100, size=10**8))
     5	 |           |           |             |             |

How can we get:

  • A negative memory growth for the first line?
  • A null memory growth on the second line?

System info

  • Platform: Mac OS X
  • Python: 3.7 (brew)
  • Numpy: 1.16.4
@emeryberger
Member

emeryberger commented Feb 7, 2020

Thanks for your note and the helpful repro case!

First, note that in your original code there's definitely a fair amount of allocation going on in the second np.array call (line 4) despite the 0 growth, as the 98.97% memory usage reflects. That line accounts for nearly all of the malloc and free activity, but the mallocs and the frees all evened out.

Second, I strongly suspect what's happening is that the first line is allocating a bunch of objects and then freeing them, leaving behind an allocated object that is big, but smaller than the other allocated objects that were freed in the process.

In any event, the results for Scalene (and for memory_profiler, its "competitor" for memory profiling) can be confusing because (a) there's allocation and freeing happening on each line, and (b) there's all the implicit garbage collection that Python is doing. I broke the operations into separate lines (and inserted del to immediately trigger deallocation via reference counting) so we can more clearly see what's going on.

(Note that you have to add the @profile decorator to every function for memory_profiler to do anything; Scalene ignores such directives.)

import numpy as np

@profile
def main():
    r = list(range(10**7))
    x = np.array(r)
    y = np.random.uniform(0, 100, size=10**8)
    a = np.array(y)
    del r
    del x
    del y
    del a

main()

This is with the latest version (you may need to run brew reinstall libscalene and pip install --upgrade --force-reinstall scalene).

% scalene test2.py
test2.py: % of CPU time = 100.00% out of   3.37s.
  	 |     CPU % |     CPU % | Avg memory  | Memory      | 
  Line	 |  (Python) |  (native) | growth (MB) | usage (%)   | [test2.py]
--------------------------------------------------------------------------------
     1	 |           |           |             |             | import numpy as np
     2	 |           |           |             |             | 
     3	 |           |           |             |             | @profile
     4	 |           |           |             |             | def main():
     5	 |     0.35% |    10.83% |          16 |       0.98% |     r = list(range(10**7))
     6	 |     0.35% |    24.35% |          80 |       4.90% |     x = np.array(r)
     7	 |     0.35% |    43.37% |         768 |      47.06% |     y = np.random.uniform(0, 100, size=10**8)
     8	 |     0.35% |    15.63% |         768 |      47.06% |     a = np.array(y)
     9	 |     0.35% |     4.08% |         -96 |             |     del r
    10	 |           |           |         -80 |             |     del x
    11	 |           |           |        -768 |             |     del y
    12	 |           |           |        -768 |             |     del a
    13	 |           |           |             |             | 
    14	 |           |           |             |             | main()

Now you can see the allocations, mostly reasonably accounted for. (There's sampling going on, so everything is counted in multiples of 16MB.)

Running with memory_profiler (which can be over 1000x slower than Scalene, so I don't recommend using it in practice for larger programs) gives you more or less similar results (though line 5 is quite different).

Filename: test2.py

Line #    Mem usage    Increment   Line Contents
================================================
     3   45.195 MiB   45.195 MiB   @profile
     4                             def main():
     5  431.469 MiB  386.273 MiB       r = list(range(10**7))
     6  507.766 MiB   76.297 MiB       x = np.array(r)
     7 1270.730 MiB  762.965 MiB       y = np.random.uniform(0, 100, size=10**8)
     8 2033.672 MiB  762.941 MiB       a = np.array(y)
     9 1723.922 MiB    0.000 MiB       del r
    10 1723.922 MiB    0.000 MiB       del x
    11  960.980 MiB    0.000 MiB       del y
    12  198.039 MiB    0.000 MiB       del a

I don't understand the disparity between what memory_profiler and Scalene show for line 5. Looking into it.

Also, my current thinking was that showing negative growth is more useful in understanding what's going on than just printing 0, but maybe just indicating growth (mallocs) and ignoring free would be clearer and perhaps more useful? (Doing so makes it report 16MB for line 5 and 768MB for line 6.) I'd welcome your feedback.

Anyway, here's what memory_profiler says for your original program:

Filename: test2-original.py

Line #    Mem usage    Increment   Line Contents
================================================
     3   45.062 MiB   45.062 MiB   @profile
     4                             def main():
     5  197.781 MiB  152.719 MiB       x = np.array(range(10**7))
     6  197.805 MiB    0.023 MiB       np.array(np.random.uniform(0, 100, size=10**8))

You can see that memory_profiler also reports no growth for line 6, so I think it is subtracting freed memory.

@emeryberger
Member

Thanks once again for posting this. It led me to discover some issues that I have now resolved in the latest version. The most important problem was that Scalene was missing some samples due to lost signals (something I was already accounting for, but not in the case when native code leads to many signals, which all get coalesced). This fix resolves the difference between memory_profiler and Scalene for test2.py, shown above.

test2.py: % of CPU time = 100.00% out of   3.68s.
  	 |     CPU % |     CPU % | Avg memory  | Memory      | 
  Line	 |  (Python) |  (native) | growth (MB) | usage (%)   | [test2.py]
--------------------------------------------------------------------------------
     1	 |           |           |             |             | import numpy as np
     2	 |           |           |             |             | 
     3	 |           |           |             |             | @profile
     4	 |           |           |             |             | def main():
     5	 |     0.31% |    11.25% |         405 |      22.10% |     r = list(range(10**7))
     6	 |     0.31% |    22.96% |          77 |       3.72% |     x = np.array(r)
     7	 |     0.31% |    42.46% |         763 |      37.09% |     y = np.random.uniform(0, 100, size=10**8)
     8	 |     0.31% |    15.83% |         763 |      37.09% |     a = np.array(y)
     9	 |     0.31% |     5.97% |        -391 |             |     del r
    10	 |           |           |         -76 |             |     del x
    11	 |           |           |        -763 |             |     del y
    12	 |           |           |        -763 |             |     del a
    13	 |           |           |             |             | 
    14	 |           |           |             |             | main()

For the original program you posted, here are the results:

test2-original.py: % of CPU time = 100.00% out of   4.63s.
  	 |     CPU % |     CPU % | Avg memory  | Memory      | 
  Line	 |  (Python) |  (native) | growth (MB) | usage (%)   | [test2-original.py]
--------------------------------------------------------------------------------
     1	 |           |           |             |             | import numpy as np
     2	 |           |           |             |             | 
     3	 |           |           |             |             | @profile
     4	 |           |           |             |             | def main():
     5	 |     0.23% |    53.41% |          90 |      46.24% |     x = np.array(range(10**7))
     6	 |     0.47% |    45.89% |          -0 |      53.76% |     np.array(np.random.uniform(0, 100, size=10**8))
     7	 |           |           |             |             | 
     8	 |           |           |        -839 |             | main()

As you can see, the results for Scalene (for lines 5 and 6) are similar to those from memory_profiler. I actually don't understand why memory_profiler reports a much larger number (152MB). Here I verify the size of the allocated object manually:

>>> import numpy as np
>>> import sys
>>> x = np.array(range(10**7))
>>> sys.getsizeof(x)
80000096
>>> sys.getsizeof(x) / (1024 * 1024)
76.29403686523438

And 90 is a lot closer to 76 than 152. :)

@emeryberger
Member

I am going to close this issue since I believe it has been resolved. I am still considering extending the output to show average mallocs, average frees, and average growth (mallocs - frees). Right now, it only shows the latter, which may be confusing.

@Phylliade
Author

Thanks for the quick support!!

Regarding

Also, my current thinking was that showing negative growth is more useful in understanding what's going on than just printing 0, but maybe just indicating growth (mallocs) and ignoring free would be clearer and perhaps more useful? (Doing so makes it report 16MB for line 5 and 768MB for line 6.)

Having the overall growth is clear to me, but having the option to also see detailed mallocs and frees would be interesting.

About the current issue, I can get perfect results with the following:

test2.py: % of CPU time = 100.00% out of   2.82s.
  	 |     CPU % |     CPU % | Avg memory  | Memory      | 
  Line	 |  (Python) |  (native) | growth (MB) | usage (%)   | [test2.py]
--------------------------------------------------------------------------------
     1	 |           |           |             |             | import numpy as np
     2	 |           |           |             |             | 
     3	 |     0.36% |    42.25% |          88 |      46.33% | x = np.array(range(10**7))
     4	 |     0.36% |    41.10% |         763 |      26.85% | z = np.random.uniform(0, 100, size=(10**8))
     5	 |     0.36% |    15.55% |         763 |      26.83% | y = np.array(z)

90MB for an array of 10**7 doubles (line 3), 763MB for an array of 10**8 doubles (line 4), and the same again for a copy (line 5): that all makes sense.
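(That arithmetic is easy to check: a numpy double takes 8 bytes per element, so the expected buffer sizes follow directly. A quick sketch, not from the thread itself:)

```python
import numpy as np

# Each element of a float64 (or default int64) numpy array takes 8 bytes,
# so the expected payload sizes can be computed directly.
small = 10**7 * 8 / 2**20   # 10**7 doubles, in MiB
large = 10**8 * 8 / 2**20   # 10**8 doubles, in MiB
print(f"10**7 doubles: {small:.1f} MiB")   # 76.3 MiB
print(f"10**8 doubles: {large:.1f} MiB")   # 762.9 MiB

# Confirm against numpy's own accounting of the buffer size:
x = np.zeros(10**7)
print(x.nbytes / 2**20)                    # 76.29...
```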

However, when adjusting the script a bit, results get very strange:

test2.py: % of CPU time = 100.00% out of   2.88s.
  	 |     CPU % |     CPU % | Avg memory  | Memory      | 
  Line	 |  (Python) |  (native) | growth (MB) | usage (%)   | [test2.py]
--------------------------------------------------------------------------------
     1	 |           |           |             |             | import numpy as np
     2	 |           |           |             |             | 
     3	 |     0.36% |    42.53% |          88 |      46.33% | x = np.array(range(10**7))
     4	 |     0.71% |    56.40% |          -0 |      53.67% | y = np.array(np.random.uniform(0, 100, size=(10**8)))

And even stranger:

test2.py: % of CPU time = 100.00% out of   1.60s.
  	 |     CPU % |     CPU % | Avg memory  | Memory      | 
  Line	 |  (Python) |  (native) | growth (MB) | usage (%)   | [test2.py]
--------------------------------------------------------------------------------
     1	 |           |           |             |             | import numpy as np
     2	 |           |           |             |             | 
     3	 |           |           |             |             | #x = np.array(range(10**7))
     4	 |     1.32% |    98.68% |         388 |     100.00% | y = np.array(np.random.uniform(0, 100, size=(10**8)))

When lines 2 and 3 of the first script are combined into a single line:

  • There is zero memory growth if something else runs first (e.g., keeping line 1), even though we saw there really is memory growth (which makes total sense).
  • There is memory growth if nothing else runs first; moreover, that growth does not match the sum of the individual growths.

@emeryberger emeryberger reopened this Feb 16, 2020
@emeryberger
Member

This was super helpful! It revealed a bug. Scalene tracks average memory growth. That means it needs to know how many times each line of code executes. But that's wrong! It really needs to know how many times each bytecode index executes. Using lines of code was leading to inaccuracies when there were lots of allocations or frees on individual lines, as in your example. I have now fixed this (commit coming momentarily). Here's the output for each of your tests.

Test 1:

% DYLD_INSERT_LIBRARIES=./libscalene.dylib PYTHONMALLOC=malloc python3 -m scalene phyllidae/test2-1.py
phyllidae/test2-1.py: % of CPU time = 100.00% out of   4.46s.
  	 |     CPU % |     CPU % | Avg memory  | Memory      | 
  Line	 |  (Python) |  (native) | growth (MB) | usage (%)   | [phyllidae/test2-1.py]
--------------------------------------------------------------------------------
     1	 |           |           |             |             | import numpy as np
     2	 |           |           |             |             | 
     3	 |     0.24% |    57.44% |          90 |      46.28% | x = np.array(range(10**7))
     4	 |     0.24% |    30.51% |         763 |      26.87% | z = np.random.uniform(0, 100, size=(10**8))
     5	 |     0.24% |    11.33% |         763 |      26.85% | y = np.array(z)

Test 2:

% DYLD_INSERT_LIBRARIES=./libscalene.dylib PYTHONMALLOC=malloc python3 -m scalene phyllidae/test2-2.py
phyllidae/test2-2.py: % of CPU time = 100.00% out of   4.36s.
  	 |     CPU % |     CPU % | Avg memory  | Memory      | 
  Line	 |  (Python) |  (native) | growth (MB) | usage (%)   | [phyllidae/test2-2.py]
--------------------------------------------------------------------------------
     1	 |           |           |             |             | import numpy as np
     2	 |           |           |             |             | 
     3	 |     0.25% |    54.74% |          90 |      46.25% | x = np.array(range(10**7))
     4	 |     0.49% |    44.52% |         763 |      53.75% | y = np.array(np.random.uniform(0, 100, size=(10**8)))

Test 3:

% DYLD_INSERT_LIBRARIES=./libscalene.dylib PYTHONMALLOC=malloc python3 -m scalene phyllidae/test2-3.py
phyllidae/test2-3.py: % of CPU time = 100.00% out of   2.08s.
  	 |     CPU % |     CPU % | Avg memory  | Memory      | 
  Line	 |  (Python) |  (native) | growth (MB) | usage (%)   | [phyllidae/test2-3.py]
--------------------------------------------------------------------------------
     1	 |           |           |             |             | import numpy as np
     2	 |           |           |             |             | 
     3	 |           |           |             |             | #x = np.array(range(10**7))
     4	 |     1.11% |    98.89% |         777 |     100.00% | y = np.array(np.random.uniform(0, 100, size=(10**8)))

Thanks again for your help in identifying this problem; please keep me updated on how it works out going forward!

@emeryberger
Member

Fixed with emeryberger@bcaddaf.

@Phylliade
Author

Amazing!

I may have one last question:
In Test 2, we see a memory growth of 763MB for the line y = np.array(np.random.uniform(0, 100, size=(10**8))).
However, when this line is split into two lines in Test 1, we see that the underlying memory growth is twice that: 763MB for np.random.uniform(0, 100, size=(10**8)) (line 4), then another 763MB for the copy (line 5).

My interpretation is that, in Test 2, since the result of the inner np.random.uniform(0, 100, size=(10**8)) is never bound to a Python variable, its allocation shows up as no net growth: the memory is allocated after the start of the bytecode and freed again before its end.
However, as Test 1 shows, there really is an internal allocation of 763MB going on: it's a peak, and it goes untracked because the memory is released before the bytecode finishes.

Do you think there could be a way to track such peaks? When working with big numpy arrays, such internal peaks would be super useful to track, because they can indicate unwanted memory copies. When programming with big numpy arrays (for example in Machine Learning/Deep Learning), these can be a major waste of time and memory.
(I can open a new issue if necessary!)
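(For Python-visible allocations, the standard library's tracemalloc can already report such a peak alongside the current usage. This is a minimal sketch of the idea, not Scalene's mechanism; it assumes a numpy version recent enough to register its buffers with Python's tracing allocator:)

```python
import tracemalloc
import numpy as np

tracemalloc.start()

# Same shape as Test 2: np.random.uniform builds a temporary array,
# np.array copies it, and the temporary is then freed.
y = np.array(np.random.uniform(0, 100, size=10**7))

# get_traced_memory() returns (current, peak) in bytes since start().
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current / 2**20:.0f} MiB")  # roughly one ~76 MiB array
print(f"peak:    {peak / 2**20:.0f} MiB")     # roughly two: the copy plus the temporary
```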

BTW, the new memory usage graph is awesome!

@emeryberger
Member

Scalene intercepts all memory allocations (from Python and native code). What's happening is that np.array is in fact making a copy (adding 763MB to the pre-existing 763MB from invoking np.random.uniform). It then deletes the old array. The result is a net 763MB growth. (There's no need to invoke np.array, since np.random.uniform already returns an array, but numpy is happy to make you a new array anyway.) I verified this by not subtracting deallocations. I am not really sure what the best way is to show this info -- it's not really a peak, but I can see that it's informative to see the actual allocations vs. deallocations, rather than just their difference. I wonder whether getting rid of the usage % column would make sense, replacing it with two fields: "avg allocated" and "avg net growth"?
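(The "np.array makes a redundant copy" point can be checked with numpy itself: np.asarray returns its input unchanged when it is already an ndarray, while np.array copies by default. A small sketch:)

```python
import numpy as np

z = np.random.uniform(0, 100, size=10**6)

copy = np.array(z)      # copies by default, even though z is already an ndarray
alias = np.asarray(z)   # no copy: the input is returned as-is

print(np.shares_memory(z, copy))   # False: a second buffer was allocated
print(np.shares_memory(z, alias))  # True: same buffer, no memory growth
print(alias is z)                  # True
```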

@emeryberger
Member

Another possibility is to work in sparklines for each line of code...

@Phylliade
Author

Having avg allocated and avg net growth seems like a good idea!

@emeryberger
Member

So scrap the usage %? (Just verifying.) Also, would per-line memory usage sparklines help (if that's even possible)?

@Phylliade
Author

Actually, having sparklines for each line of code would be a must.
If that's not possible, the averages would be sufficient. (In any case, the averages are useful as per-line statistics, with or without sparklines.)

@emeryberger
Member

I guess what I am asking is do you find the usage % info of value? I see it as potentially useful as it shows where the most allocation activity is.

@emeryberger
Member

Just committed a version with memory usage as per-line sparklines (I wonder if printing a % beside it would be of value). Take a look!
emeryberger@037dbd2

@emeryberger emeryberger reopened this Mar 3, 2020
@emeryberger
Member

Memory usage: ▁▁▁▁▁▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅ (max: 1542.64MB)
phylliade/test2-3.py: % of CPU time =  89.31% out of   2.42s.
  	     |     CPU % |     CPU % | Avg memory  | Memory      | 
  Line	 |  (Python) |  (native) | growth (MB) | usage       | [phylliade/test2-3.py]
--------------------------------------------------------------------------------
     1	 |           |           |             |             | import numpy as np
     2	 |           |           |             |             | 
     3	 |           |           |             |             | @profile
     4	 |           |           |             |             | def main():
     5	 |           |           |             |             |     #x = np.array(range(10**7))
     6	 |     0.83% |    88.48% |         777 | ▅▁▅         |     y = np.array(np.random.uniform(0, 100, size=(10**8)))
     7	 |           |           |             |             | 
     8	 |           |           |             |             | main()
     9	 |           |           |             |             | 

In this example, you can see the increase and decrease over time, which to me is pretty suggestive that there is copying going on (from an old, reclaimed object -- corresponding to the dip -- into the newly allocated one).

@Phylliade
Author

Amazing!!

For me, the current metrics (growth, % of use) together with the sparklines are enough to interpret what's going on.
I'll start with the %, and if I see something unexpected, I'll look at the sparklines for an explanation.

Regarding the sparkline of the 6th line, shouldn't it show "increase, increase, decrease" (i.e., allocation of the first array, allocation of the second array, deallocation of the first array), instead of "increase, decrease, increase"?

@emeryberger
Member

Great catch. This took a surprisingly long time to unwind.

Memory usage: ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅█▅ (max: 1540.58MB)
phylliade/test2-3.py: % of CPU time =  88.77% out of   2.15s.
       |    CPU % |    CPU % |  Net  | Memory                    | 
  Line | (Python) | (native) |  (MB) | usage                     | [phylliade/test2-3.py]
--------------------------------------------------------------------------------
     1 |          |          |       |                           | import numpy as np
     2 |          |          |       |                           | 
     3 |          |          |       |                           | @profile
     4 |          |          |       |                           | def main():
     5 |          |          |       |                           |     #x = np.array(range(10**7))
     6 |    0.93% |   87.84% |   777 | ▁▁▁▁▁▁▅█▅          (100%) |     y = np.array(np.random.uniform(0, 100, size=(10**8)))
     7 |          |          |       |                           | 
     8 |          |          |       |                           | main()
     9 |          |          |       |                           | 

@Phylliade
Author

Nice, that's really awesome!

Closing.

(A last detail: on my terminal emulator, the sparklines touch each other.)

But this time it's closed on my side ;)
