Memory usage analysis #372

Closed
jpmckinney opened this issue Aug 7, 2024 · 2 comments

jpmckinney commented Aug 7, 2024

Setup

pip install guppy3

In settings.py:

DEBUG = False

exporter worker

The server process was using about 800MB when idle, which seems high. (Restarting it brings RES down to 65MB.)

In exporter.py, import the profiling library, e.g. one of:

from guppy import hpy; heap=hpy()
import tracemalloc; tracemalloc.start()

At the start of the callback() function:

    breakpoint()
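Putting it together, exporter.py would look something like this (the callback signature below is an assumption for illustration; keep the project's actual signature and only add the import and the breakpoint() call):

import tracemalloc; tracemalloc.start()

def callback(state, channel, method, properties, input_message):  # hypothetical signature
    breakpoint()  # drop into pdb on each message, then run the profiling code by hand
    ...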

Run:

env LOG_LEVEL=DEBUG ./manage.py exporter

In RabbitMQ, publish messages with the content_type property set to application/json and a payload like {"collection_id":36,"job_id":1}
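For reference, the messages can be published from the management UI, or with pika along these lines (the exchange and routing key are placeholders; use whatever the exporter actually consumes):

import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.basic_publish(
    exchange='',  # placeholder
    routing_key='exporter_init',  # placeholder
    body=json.dumps({'collection_id': 36, 'job_id': 1}),
    properties=pika.BasicProperties(content_type='application/json'),
)
connection.close()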

At the breakpoint, run the profiling code, e.g. one of:

heap.heap()
for stat in tracemalloc.take_snapshot().statistics('lineno')[:10]: print(stat)
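To see what grows between messages (rather than the absolute totals), tracemalloc snapshots can also be diffed, e.g.:

snapshot1 = tracemalloc.take_snapshot()  # at the first breakpoint
# ... continue, then at a later breakpoint:
snapshot2 = tracemalloc.take_snapshot()
for stat in snapshot2.compare_to(snapshot1, 'lineno')[:10]:
    print(stat)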

On the first message, heap.heap() looks like:

Partition of a set of 348213 objects. Total size = 49102249 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  99584  29 10645651  22  10645651  22 str
     1  22956   7  8758824  18  19404475  40 types.CodeType
     2  78784  23  5682104  12  25086579  51 tuple
     3   3867   1  5231928  11  30318507  62 type
     4  47099  14  4290658   9  34609165  70 bytes
     5  24907   7  3785864   8  38395029  78 function
     6   5653   2  1759504   4  40154533  82 dict (no owner)
     7   3867   1  1377728   3  41532261  85 dict of type
     8   1146   0  1098248   2  42630509  87 dict of module
     9    602   0   520176   1  43150685  88 set

After a few messages, the number of objects stabilizes, like:

Partition of a set of 348825 objects. Total size = 49193686 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  99830  29 10665554  22  10665554  22 str
     1  22979   7  8775152  18  19440706  40 types.CodeType
     2  78887  23  5691032  12  25131738  51 tuple
     3   3869   1  5235304  11  30367042  62 type
     4  47147  14  4300536   9  34667578  70 bytes
     5  24925   7  3788600   8  38456178  78 function
     6   5660   2  1761416   4  40217594  82 dict (no owner)
     7   3869   1  1378464   3  41596058  85 dict of type
     8   1147   0  1099080   2  42695138  87 dict of module
     9    602   0   520176   1  43215314  88 set

Similar behavior from mprof (two different collection IDs were submitted multiple times):

[Screenshot: mprof memory usage plot, 2024-08-07 6:48 PM]
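For reference, a plot like the one above can be produced with memory-profiler's CLI, roughly:

pip install memory-profiler
mprof run ./manage.py exporter
mprof plot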

jpmckinney commented Aug 7, 2024

flattener worker

The server process was using about 400MB when idle, which seems high. (Restarting it brings RES down to 115MB.)

Sent 5 identical messages. Memory seems to increase with each message. Using memory-profiler:

[Screenshot: memory-profiler plot of the flattener worker, 2024-08-07 7:09 PM]

Sent a few more identical messages. It seems like the memory growth might just be garbage collection running at different times. Using psrecord:

[Screenshot: psrecord memory and CPU plot]
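For reference, a plot like this can be produced by attaching psrecord to the running worker, roughly (the PID is a placeholder):

pip install psrecord
psrecord <flattener-worker-pid> --interval 1 --plot plot.png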


jpmckinney commented Aug 8, 2024

Can't replicate in local tests and nothing obvious from reading code, so closing.

Noting that Kingfisher Process' wiper worker also had about 700MB RES before restart (61MB after). Again, not sure where a leak might be, since django.db.connection.cursor shouldn't do any caching.
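For reference, the wiper's queries presumably go through something like the following, where the cursor is opened and closed per call and doesn't hold onto results (a sketch, not the project's actual code; the query is made up):

from django.db import connection

with connection.cursor() as cursor:
    cursor.execute("DELETE FROM some_table WHERE collection_id = %s", [collection_id])  # hypothetical query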

The RES numbers might be due to https://bloomberg.github.io/memray/memory.html#memory-is-not-freed-immediately. They might not be replicated locally, since the server has much more free memory.
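One way to check that would be to compare the process's RSS against the memory tracemalloc attributes to Python allocations; a large gap would point at allocator-held (not leaked) memory. A sketch (psutil is an extra dependency):

import tracemalloc
import psutil

tracemalloc.start()
# ... after processing some messages:
current, peak = tracemalloc.get_traced_memory()  # bytes attributed to live Python allocations
rss = psutil.Process().memory_info().rss  # resident set size, as the OS sees it
print(f"traced={current / 1e6:.1f}MB peak={peak / 1e6:.1f}MB rss={rss / 1e6:.1f}MB")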

Pelican's data_item worker had about 1,400MB RES, but it's probably not worth investigating, in light of #291. It might also not be a leak, but just resident memory that hasn't yet been freed despite heap memory being freed.
