Memory usage analysis #372

Closed
jpmckinney opened this issue Aug 7, 2024 · 2 comments

jpmckinney commented Aug 7, 2024

Setup

pip install guppy3

In settings.py:

DEBUG = False

exporter worker

The server process was using about 800MB when idle, which seems high. (Restarting it brings RES down to 65MB.)

In exporter.py, import the profiling library, e.g. one of:

from guppy import hpy; heap=hpy()
import tracemalloc; tracemalloc.start()

At the start of the callback() function:

    breakpoint()
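Putting it together, exporter.py would look something like this (the callback signature below is an assumption for illustration; keep the project's actual signature and only add the import and the breakpoint() call):

import tracemalloc; tracemalloc.start()

def callback(state, channel, method, properties, input_message):  # hypothetical signature
    breakpoint()  # drop into pdb on each message, then run the profiling code by hand
    ...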

Run:

env LOG_LEVEL=DEBUG ./manage.py exporter

In RabbitMQ, publish messages with the content_type property set to application/json and a payload like {"collection_id":36,"job_id":1}
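For reference, the messages can be published from the management UI, or with pika along these lines (the exchange and routing key are placeholders; use whatever the exporter actually consumes):

import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.basic_publish(
    exchange='',  # placeholder
    routing_key='exporter_init',  # placeholder
    body=json.dumps({'collection_id': 36, 'job_id': 1}),
    properties=pika.BasicProperties(content_type='application/json'),
)
connection.close()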

At the breakpoint, run the profiling code, e.g. one of:

heap.heap()
for stat in tracemalloc.take_snapshot().statistics('lineno')[:10]: print(stat)
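To see what grows between messages (rather than the absolute totals), tracemalloc snapshots can also be diffed, e.g.:

snapshot1 = tracemalloc.take_snapshot()  # at the first breakpoint
# ... continue, then at a later breakpoint:
snapshot2 = tracemalloc.take_snapshot()
for stat in snapshot2.compare_to(snapshot1, 'lineno')[:10]:
    print(stat)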

On the first message, heap.heap() looks like:

Partition of a set of 348213 objects. Total size = 49102249 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  99584  29 10645651  22  10645651  22 str
     1  22956   7  8758824  18  19404475  40 types.CodeType
     2  78784  23  5682104  12  25086579  51 tuple
     3   3867   1  5231928  11  30318507  62 type
     4  47099  14  4290658   9  34609165  70 bytes
     5  24907   7  3785864   8  38395029  78 function
     6   5653   2  1759504   4  40154533  82 dict (no owner)
     7   3867   1  1377728   3  41532261  85 dict of type
     8   1146   0  1098248   2  42630509  87 dict of module
     9    602   0   520176   1  43150685  88 set

After a few messages, the number of objects stabilizes, like:

Partition of a set of 348825 objects. Total size = 49193686 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  99830  29 10665554  22  10665554  22 str
     1  22979   7  8775152  18  19440706  40 types.CodeType
     2  78887  23  5691032  12  25131738  51 tuple
     3   3869   1  5235304  11  30367042  62 type
     4  47147  14  4300536   9  34667578  70 bytes
     5  24925   7  3788600   8  38456178  78 function
     6   5660   2  1761416   4  40217594  82 dict (no owner)
     7   3869   1  1378464   3  41596058  85 dict of type
     8   1147   0  1099080   2  42695138  87 dict of module
     9    602   0   520176   1  43215314  88 set

Similar behavior from mprof (two different collection IDs were submitted multiple times):

[Screenshot: mprof memory usage plot, 2024-08-07 6:48 PM]
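For reference, a plot like the one above can be produced with memory-profiler's CLI, roughly:

pip install memory-profiler
mprof run ./manage.py exporter
mprof plot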

jpmckinney commented Aug 7, 2024

flattener worker

The server process was using about 400MB when idle, which seems high. (Restarting it brings RES down to 115MB.)

Sent 5 identical messages. Memory seems to increase with each message. Using memory-profiler:

[Screenshot: memory-profiler plot of the flattener worker, 2024-08-07 7:09 PM]

Sent a few more identical messages. It seems like the memory growth might just be garbage collection running at different times. Using psrecord:

[Screenshot: psrecord memory and CPU plot]
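For reference, a plot like this can be produced by attaching psrecord to the running worker, roughly (the PID is a placeholder):

pip install psrecord
psrecord <flattener-worker-pid> --interval 1 --plot plot.png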


jpmckinney commented Aug 8, 2024

Can't replicate in local tests and nothing obvious from reading code, so closing.

Noting that Kingfisher Process' wiper worker also had about 700MB RES before restart (61MB after). Again, not sure where a leak might be, since django.db.connection.cursor shouldn't do any caching.
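For reference, the wiper's queries presumably go through something like the following, where the cursor is opened and closed per call and doesn't hold onto results (a sketch, not the project's actual code; the query is made up):

from django.db import connection

with connection.cursor() as cursor:
    cursor.execute("DELETE FROM some_table WHERE collection_id = %s", [collection_id])  # hypothetical query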

The RES numbers might be due to https://bloomberg.github.io/memray/memory.html#memory-is-not-freed-immediately. They might not be replicated locally, since the server has much more free memory.
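One way to check that would be to compare the process's RSS against the memory tracemalloc attributes to Python allocations; a large gap would point at allocator-held (not leaked) memory. A sketch (psutil is an extra dependency):

import tracemalloc
import psutil

tracemalloc.start()
# ... after processing some messages:
current, peak = tracemalloc.get_traced_memory()  # bytes attributed to live Python allocations
rss = psutil.Process().memory_info().rss  # resident set size, as the OS sees it
print(f"traced={current / 1e6:.1f}MB peak={peak / 1e6:.1f}MB rss={rss / 1e6:.1f}MB")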

Pelican's data_item worker had about 1,400MB RES, but it's probably not worth investigating, in light of #291. It might also not be a leak, but just resident memory that hasn't yet been freed despite heap memory being freed.
