
usage of memory and disk grow rapidly #660

Closed
yixiangxx opened this issue Dec 12, 2019 · 6 comments · Fixed by #682

Comments

@yixiangxx

Memory and disk usage grow rapidly when I run HTTP jobs.

I deployed three Dkron servers and added 10 HTTP jobs scheduled @every 1s. Everything was fine at the beginning, but memory and disk usage keep increasing, from 450M to 3G in a few hours. It seems the GC of executions does not work?

Or am I using it the wrong way?
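
For reference, each of the 10 jobs was added with a definition roughly like the following. This is only an illustrative sketch: the agent address, job name and target URL are placeholders, and the executor_config keys should be checked against the HTTP executor documentation.

```go
// Illustrative only: creating one of the HTTP jobs from the scenario above
// through Dkron's REST API. Address, name and URL are placeholders.
package main

import (
	"bytes"
	"log"
	"net/http"
)

func main() {
	job := []byte(`{
		"name": "http-job-1",
		"schedule": "@every 1s",
		"executor": "http",
		"executor_config": {
			"method": "GET",
			"url": "http://example.com/health",
			"expectCode": "200"
		}
	}`)

	// POST /v1/jobs creates or updates a job on the agent.
	resp, err := http.Post("http://localhost:8080/v1/jobs", "application/json", bytes.NewReader(job))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("job created:", resp.Status)
}
```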

@yvanoers
Collaborator

@RoomCat Is the increase from 450M to 3G you're seeing related to memory or to disk?
Disk usage would increase over time, as more and more executions are stored while jobs run. The maximum number of executions stored is capped, though, IIRC at 100 per job. So at some point I'd expect the disk footprint to become steady.

The memory footprint should also level, and I expect that to happen sooner than disk usage.

This might be a memory leak. If so, it should be fairly easy to reproduce in this scenario.

To have all the variables, could you also please tell us the version of Dkron you're using, and on which platform?

@davidgengenbach

I can second that! The memory footprint of Dkron seems to be quite high: we have a memory requirement of 600MiB for three jobs - one executing every minute, two executing every 30 minutes.
The (HTML) outputs from the executions are small. I am really wondering how some HTTP requests and some cluster coordination information can take up that much memory.

I really do not want to come off as ungrateful - I am grateful for Dkron, but also a little worried that this will get worse as we scale up, put it into production and rely on it on a daily basis. Are there any benchmarks or something like that?
For example: job executions per hour plus the maximum memory required during those executions?

@yvanoers
Collaborator

I've been looking into the memory usage. The 550MB+ memory footprint when Dkron starts is largely because BadgerDB allocates some memory to work with (~83MB) and fires up a cache (Ristretto under the hood) that reserves 384MB right off the bat. I haven't looked at whether Ristretto can be configured to require less memory yet, nor whether that would be prudent.
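
For what it's worth, a minimal sketch of what shrinking that reservation could look like, assuming badger v2's WithMaxCacheSize option; the path and size are placeholders rather than Dkron's actual settings, and whether Dkron should expose or hard-code something like this is a separate question.

```go
// Sketch only: open a badger v2 store with a smaller Ristretto cache than
// the default. Path and cache size are placeholders.
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	opts := badger.DefaultOptions("/tmp/dkron-data").
		WithMaxCacheSize(64 << 20) // cap the Ristretto cache at 64MB

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```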

I did find some interesting behavior when a job gets deleted: Badger frees and then reallocates memory, which does get GC'ed but is not released to the OS (at least not immediately). This causes the process to jump up ~83MB in memory use every time a job is deleted (until the GC decides to return the memory to the OS).
I have some code in place that reuses memory instead of freeing-then-reallocating, which prevents that behavior (a rough sketch of the pattern is below). Not sure yet whether this is worthwhile committing upstream.
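
(The general shape of that change, not the actual patch; the names here are made up for illustration.)

```go
// Pattern sketch, not Dkron's actual code: reuse an existing slice instead of
// dropping it and allocating a new one on every delete, so the heap does not
// repeatedly grow while the old allocation waits to be returned to the OS.
package store

var scratch []byte

func onJobDeleted(payload []byte) {
	// Before: scratch = make([]byte, len(payload)) on every call.
	// After: reset and reuse the backing array, growing only when needed.
	scratch = append(scratch[:0], payload...)
	process(scratch)
}

func process(b []byte) { /* ... */ }
```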

With respect to the http executor increasing memory gradually: I haven't looked at that specifically yet, but I am wondering whether the runtime might be too busy handling tasks, causing the GC to not get a chance to free memory and/or release it to the OS.
I will investigate the http executor soon.
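
One cheap way to test that hypothesis (a diagnostic sketch using only the standard library, not something Dkron does today): force a GC and a release to the OS, then compare heap stats before and after. If the numbers drop sharply, the memory was retained by the runtime rather than leaked by the executor.

```go
// Diagnostic sketch, not part of Dkron: compare heap stats around a forced
// scavenge to see how much memory the runtime was merely holding on to.
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

func main() {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	debug.FreeOSMemory() // run a GC and return as much memory to the OS as possible

	runtime.ReadMemStats(&after)
	fmt.Printf("HeapInuse:    %d -> %d bytes\n", before.HeapInuse, after.HeapInuse)
	fmt.Printf("HeapReleased: %d -> %d bytes\n", before.HeapReleased, after.HeapReleased)
}
```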

@vcastellm
Member

vcastellm commented Dec 28, 2019

Do you have some metrics where we can observe the http executor behaviour?

Yes, Badger can consume some memory; it has never been a problem for me, but we can experiment with tuning its memory usage.

@davidgengenbach I did several tests of memory usage versus the number of job executions. Though I did not formalise them, I found no leakage with job counts on the order of > 1000.

Currently > 200 jobs are running here: http://test.dkron.io:8080/dashboard/ with memory usage of ~2.88GB, stable for the last 25 days, though not using the http executor.

@fopina
Contributor

fopina commented Feb 12, 2020

I was using 2.0.0-rc7 in one of my setups (under very low memory conditions), upgraded to 2.0.4 two days ago, and started running out of memory.

Thanks to the Docker releases, it took a minute to pinpoint the release:

CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
47f786d9b8c4        dk-2.0.4            0.65%               424.5MiB / 3.846GiB   10.78%              936B / 0B           8.59MB / 193kB      72
aee1f389a989        dk-2.0.0-rc7        0.70%               35.95MiB / 3.846GiB   0.91%               936B / 0B           17.5MB / 180kB      71
82f990ca504f        dk-2.0.1            2.15%               261.2MiB / 3.846GiB   6.63%               866B / 0B           61.1MB / 180kB      72
ea9307705b41        dk-2.0.2            0.74%               229.6MiB / 3.846GiB   5.83%               866B / 0B           56.3MB / 180kB      71
e280579ce6d6        dk-2.0.3            0.77%               319MiB / 3.846GiB     8.10%               866B / 0B           52.7MB / 180kB      72
2cfa8ad793b8        dk-2.0.0            0.63%               424.1MiB / 3.846GiB   10.77%              796B / 0B           2.02MB / 156kB      66
860c2b225033        dk-2.0.0-rc8        0.71%               432.7MiB / 3.846GiB   10.99%              586B / 0B           807kB / 143kB       65

So rc7 was the last one with a low memory footprint.
Looking at the changelog, it seems likely related to the badger upgrade to v2 (done in 2.0.0-rc8)...

@fopina
Contributor

fopina commented Feb 12, 2020

A quick related question: is the badger v2 upgrade valuable?
I just cloned master (2.0.4), removed /v2 from those imports, Dkron seems to work, and memory is back down to 40M :trollface:
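
(For clarity, the change meant by "removed /v2 from those imports" is roughly the following; the actual import sites in Dkron's storage code are not reproduced here.)

```go
// Illustration of the import-path change, not the actual Dkron source:
// dropping the /v2 suffix makes Go modules resolve badger v1.x again.
package store

import (
	// badger v2, introduced around 2.0.0-rc8:
	// badger "github.com/dgraph-io/badger/v2"

	// badger v1, after removing /v2:
	badger "github.com/dgraph-io/badger"
)

var _ *badger.DB // keep the import referenced
```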
