
usage of memory and disk grow rapidly #660

Closed
yixiangxx opened this issue Dec 12, 2019 · 6 comments · Fixed by #682

Comments

@yixiangxx

Memory and disk usage grow rapidly when I run HTTP jobs.

I deployed three Dkron servers and added 10 HTTP jobs scheduled @every 1s. Everything was fine at the beginning, but memory and disk usage keep increasing, from 450M to 3G in a few hours. It seems the GC of executions does not work?

Or am I using it the wrong way?
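
For reference, each of the 10 jobs was added with a definition roughly like the following. This is only an illustrative sketch: the agent address, job name and target URL are placeholders, and the executor_config keys should be checked against the HTTP executor documentation.

```go
// Illustrative only: creating one of the HTTP jobs from the scenario above
// through Dkron's REST API. Address, name and URL are placeholders.
package main

import (
	"bytes"
	"log"
	"net/http"
)

func main() {
	job := []byte(`{
		"name": "http-job-1",
		"schedule": "@every 1s",
		"executor": "http",
		"executor_config": {
			"method": "GET",
			"url": "http://example.com/health",
			"expectCode": "200"
		}
	}`)

	// POST /v1/jobs creates or updates a job on the agent.
	resp, err := http.Post("http://localhost:8080/v1/jobs", "application/json", bytes.NewReader(job))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("job created:", resp.Status)
}
```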

@yvanoers
Collaborator

@RoomCat Is the increase from 450M to 3G you're seeing related to memory or to disk?
Disk usage would increase over time, as more and more executions are stored while jobs run. The maximum number of executions stored is capped, though, IIRC at 100 per job. So at some point I'd expect the disk footprint to become steady.

The memory footprint should also level, and I expect that to happen sooner than disk usage.

This might be a memory leak. If so, it should be fairly easy to reproduce in this scenario.

To have all the variables, could you also please tell us the version of Dkron you're using, and on which platform?

@davidgengenbach

I can second that! The memory footprint of Dkron seems to be quite high: we have a memory requirement of 600MiB for three jobs - one executing every minute, two executing every 30 minutes.
The (HTML) outputs from the executions are small. I am really wondering how some HTTP requests and some cluster coordination information can take up that much memory.

I really do not want to come off as ungrateful - I am grateful for Dkron, but also a little worried that this will get worse as we scale up, put it into production and rely on it on a daily basis. Are there any benchmarks or something like that?
For example: job executions per hour plus the maximum memory required during those executions?

@yvanoers
Collaborator

I've been looking into the memory usage. The 550MB+ memory footprint when Dkron starts is largely because BadgerDB allocates some memory to work with (~83MB) and fires up a cache (Ristretto under the hood) that reserves 384MB right off the bat. I haven't looked at whether Ristretto can be configured to require less memory yet, nor whether that would be prudent.
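
For what it's worth, a minimal sketch of what shrinking that reservation could look like, assuming badger v2's WithMaxCacheSize option; the path and size are placeholders rather than Dkron's actual settings, and whether Dkron should expose or hard-code something like this is a separate question.

```go
// Sketch only: open a badger v2 store with a smaller Ristretto cache than
// the default. Path and cache size are placeholders.
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	opts := badger.DefaultOptions("/tmp/dkron-data").
		WithMaxCacheSize(64 << 20) // cap the Ristretto cache at 64MB

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```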

I did find some interesting behavior when a job gets deleted: Badger frees and then reallocates memory, which does get GC'ed but is not released to the OS (at least not immediately). This causes the process to jump up ~83MB in memory use every time a job is deleted (until the GC decides to return the memory to the OS).
I have some code in place that reuses memory instead of freeing-then-reallocating, which prevents that behavior (a rough sketch of the pattern is below). Not sure yet whether this is worthwhile committing upstream.
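
(The general shape of that change, not the actual patch; the names here are made up for illustration.)

```go
// Pattern sketch, not Dkron's actual code: reuse an existing slice instead of
// dropping it and allocating a new one on every delete, so the heap does not
// repeatedly grow while the old allocation waits to be returned to the OS.
package store

var scratch []byte

func onJobDeleted(payload []byte) {
	// Before: scratch = make([]byte, len(payload)) on every call.
	// After: reset and reuse the backing array, growing only when needed.
	scratch = append(scratch[:0], payload...)
	process(scratch)
}

func process(b []byte) { /* ... */ }
```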

With respect to the http executor increasing memory gradually: I haven't looked at that specifically yet, but I am wondering whether the runtime might be too busy handling tasks, causing the GC to not get a chance to free memory and/or release it to the OS.
I will investigate the http executor soon.
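
One cheap way to test that hypothesis (a diagnostic sketch using only the standard library, not something Dkron does today): force a GC and a release to the OS, then compare heap stats before and after. If the numbers drop sharply, the memory was retained by the runtime rather than leaked by the executor.

```go
// Diagnostic sketch, not part of Dkron: compare heap stats around a forced
// scavenge to see how much memory the runtime was merely holding on to.
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

func main() {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	debug.FreeOSMemory() // run a GC and return as much memory to the OS as possible

	runtime.ReadMemStats(&after)
	fmt.Printf("HeapInuse:    %d -> %d bytes\n", before.HeapInuse, after.HeapInuse)
	fmt.Printf("HeapReleased: %d -> %d bytes\n", before.HeapReleased, after.HeapReleased)
}
```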

@vcastellm
Member

vcastellm commented Dec 28, 2019

Do you have some metrics where we can observe the http executor behaviour?

Yes, Badger can consume some memory; it has never been a problem for me, but we can experiment with tuning its memory usage.

@davidgengenbach I did several tests of memory usage versus the number of job executions. Though I did not formalise them, I found no leakage with job counts on the order of > 1000.

Currently > 200 jobs are running here: http://test.dkron.io:8080/dashboard/ with memory usage of ~2.88GB, stable for the last 25 days, though not using the http executor.

@fopina
Contributor

fopina commented Feb 12, 2020

I was using 2.0.0-rc7 in one of my setups (under very low memory conditions), upgraded to 2.0.4 two days ago, and started running out of memory.

Thanks to the Docker releases, it took a minute to pinpoint the release:

CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
47f786d9b8c4        dk-2.0.4            0.65%               424.5MiB / 3.846GiB   10.78%              936B / 0B           8.59MB / 193kB      72
aee1f389a989        dk-2.0.0-rc7        0.70%               35.95MiB / 3.846GiB   0.91%               936B / 0B           17.5MB / 180kB      71
82f990ca504f        dk-2.0.1            2.15%               261.2MiB / 3.846GiB   6.63%               866B / 0B           61.1MB / 180kB      72
ea9307705b41        dk-2.0.2            0.74%               229.6MiB / 3.846GiB   5.83%               866B / 0B           56.3MB / 180kB      71
e280579ce6d6        dk-2.0.3            0.77%               319MiB / 3.846GiB     8.10%               866B / 0B           52.7MB / 180kB      72
2cfa8ad793b8        dk-2.0.0            0.63%               424.1MiB / 3.846GiB   10.77%              796B / 0B           2.02MB / 156kB      66
860c2b225033        dk-2.0.0-rc8        0.71%               432.7MiB / 3.846GiB   10.99%              586B / 0B           807kB / 143kB       65

So rc7 was the last one with a low memory footprint.
Looking at the changelog, it seems likely related to the badger upgrade to v2 (done in 2.0.0-rc8)...

@fopina
Contributor

fopina commented Feb 12, 2020

A quick related question: is the badger v2 upgrade valuable?
I just cloned master (2.0.4), removed /v2 from those imports, Dkron seems to work, and memory is back down to 40M :trollface:
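
(For clarity, the change meant by "removed /v2 from those imports" is roughly the following; the actual import sites in Dkron's storage code are not reproduced here.)

```go
// Illustration of the import-path change, not the actual Dkron source:
// dropping the /v2 suffix makes Go modules resolve badger v1.x again.
package store

import (
	// badger v2, introduced around 2.0.0-rc8:
	// badger "github.com/dgraph-io/badger/v2"

	// badger v1, after removing /v2:
	badger "github.com/dgraph-io/badger"
)

var _ *badger.DB // keep the import referenced
```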
