
Remote Memory Leak in rev 1868be43 #167

Closed

r-lindner opened this issue Nov 22, 2022 · 18 comments

@r-lindner

Hello,
I upgraded mod-gearman-module from 4.0.2 to 4.0.4 a few days ago and noticed that swap usage on my gearman-job-server node kept growing until the machine went OOM; it is around 120 MB per hour in my case. I tested the commits between 4.0.2 and 4.0.3:

[OK ] Naemon 1.3.1 + Mod-Gearman 4.0.2 (18. 18:00 - 19. 10:00)
[ERR] Naemon 1.3.1 + Mod-Gearman 4.0.3 (19. 10:00 - 19. 12:40, +100MB/h)
[OK ] Naemon 1.4.0 + Mod-Gearman 4.0.2 (19. 12:40 - 20. 11:00)
[ERR] Naemon 1.4.0 + Mod-Gearman 4.0.3 (20. 11:00 - 20. 18:00, +1000MB/7h)
[ERR] Naemon 1.4.0 + Mod-Gearman 4.0.2+7 1868be43 (21. 07:30 - 21. 10:30, +100MB/h)
[OK ] Naemon 1.4.0 + Mod-Gearman 4.0.2+4 e3fa5795 (21. 10:30 - 21. 13:30)
[OK ] Naemon 1.4.0 + Mod-Gearman 4.0.2+5 efee02a5 (21. 13:30 - 21. 15:10)
[OK ] Naemon 1.4.0 + Mod-Gearman 4.0.2+6 adc47f0c (21. 15:10 - 21. 18:00)
[ERR] Naemon 1.4.0 + Mod-Gearman 4.0.2+7 1868be43 (21. 18:00 - 22. 03:40)
[OK ] Naemon 1.4.0 + Mod-Gearman 4.0.2+6 adc47f0c (22. 03:40 -

I also tried disabling swap (in the first couple of centimetres of the graph below), but then RAM usage goes up instead.
RAM and swap usage on the Naemon node where mod-gearman-module is installed does not change.

[image: RAM/swap usage graph of the gearman-job-server host]

sni (Owner) commented Nov 22, 2022

I'll have a look.

dlware commented Nov 28, 2022

I'm seeing the same thing. After upgrading to mod_gearman-4.0.4, RAM usage grows until it is exhausted. A restart clears it up, but the growth begins again; rinse and repeat.

RHEL7 server.

sni (Owner) commented Nov 28, 2022

It's probably this one: naemon/naemon-core#404

sni (Owner) commented Nov 29, 2022

Could you try the latest nightly Naemon build to see if that helps? I don't see any leaks anymore, regardless of whether mod-gearman is used or not.

r-lindner (Author) commented Jan 2, 2023

Sorry, I was out of office for some time... The nightly Naemon from 2022-12-17 had no problems in the last 2 hours; looks good.
Update: I was wrong, it went OOM again. I had looked at the memory consumption of the wrong server. :-( I am rolling back the module to adc47f0 again.

lamaral commented Jan 4, 2023

We have also been affected by this after upgrading mod-gearman to 4.0.3.
We experienced some OOM kills on our instance and, upon checking, the gearman-job-server was the culprit.

I think the issue in naemon/naemon-core#404 is unrelated to this, as the leak is not in Naemon itself, but in the gearman-job-server process.

@r-lindner (Author)

The current mod-gearman and Naemon still have the memory leak :-(

tested and not working:
mod-gearman 5.0.1 + Naemon 1.4.0
mod-gearman 5.0.1 + Naemon 1.4.0-1 (nightly naemon 2022-12-17)
mod-gearman 5.0.1 + Naemon 1.4.1

My last working version is still mod-gearman 4.0.2 adc47f0, no matter which Naemon version I use.

sni (Owner) commented Feb 10, 2023

I cannot reproduce this so far. Is it the naemon process that is growing?

@r-lindner (Author)

I have gearman-job-server (and nothing else) on a separate server that the other servers (mod-gearman-worker, pnp-gearman, mod-gearman-module) connect to. As soon as I install a mod-gearman-module newer than 4.0.2 adc47f0 and restart the naemon process, RAM+swap usage on the gearman-job-server host keeps going up and up.
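
In other words, the module configuration on the Naemon host simply points at the remote gearmand, roughly like this (excerpt only; the hostname is a placeholder for my dedicated gearman-job-server node):

```
# mod_gearman NEB module configuration on the Naemon host (excerpt);
# the hostname is a placeholder for the dedicated gearman-job-server node
server=gearman-job-server.example:4730
```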

sni (Owner) commented Feb 10, 2023

And which gearmand version is that?

@r-lindner (Author)

I tried 1.1.19.1+ds-2+b2 (Debian 11) and 0.33-8 (Consol).

lamaral commented Feb 10, 2023

I ran into the same issue with 1.1.18+ds-3+b3 on Debian 10.

sni (Owner) commented Jun 19, 2023

I've seen machines with high gearmand memory usage, but if I restart the service, memory usage is stable again (at least for as long as I watched), and I still couldn't reproduce this behaviour in a lab.
Does the memory usage increase linearly, and directly after restarting gearmand?

jframeau (Contributor) commented Jun 19, 2023

Last week:

[image: gearmand memory usage over the past week]

omd reload runs from crontab each day at 1 pm. Without that reload, memory usage rises by up to 2 GB per week.

Focus just after 1 pm:

[image: zoom on memory usage right after the 1 pm reload]

So right after 1 pm, and for about two hours, memory usage stays more or less flat.

gearmand 1.1.20, omd 5.10.

jfr

ghost commented Jun 19, 2023

Same here.
Gearmand: v1.1.19.1
Package: 5.00-labs-edition
OS: Debian 11
For me, it is 10 GB in two days, and the gearmand service starts to become unresponsive until a restart (in most cases the restart does not go through cleanly, so I have to kill it forcefully).

sni (Owner) commented Jun 19, 2023

I see, that's a good point. So I was too impatient...
I did run valgrind massif here and got similar results now:

[image: valgrind massif profile, 2023-06-19]
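
A comparable heap profile of gearmand can be captured roughly like this; the exact invocation isn't shown here, so the commands below are only an assumed example (port and output file name are placeholders):

```
valgrind --tool=massif --massif-out-file=massif.out.%p gearmand --port=4730
ms_print massif.out.<pid>
```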

sni (Owner) commented Jun 21, 2023

Indeed, it seems like 1868be4 introduced this issue.
I guess it's the call to
gearman_client_add_options(client, GEARMAN_CLIENT_NON_BLOCKING|GEARMAN_CLIENT_FREE_TASKS|GEARMAN_CLIENT_UNBUFFERED_RESULT);
which makes gearmand misbehave.

Let's see how this can be solved...
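
For context, here is a minimal standalone libgearman sketch of where that option combination sits in a typical background-submit path. This is not the actual mod-gearman source; the server hostname and the "check_results" queue name are placeholders. Build with -lgearman.

```c
/* Minimal standalone libgearman client sketch (not the actual mod-gearman
 * code). Hostname and the "check_results" queue name are placeholders. */
#include <libgearman/gearman.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    gearman_client_st client;
    if (gearman_client_create(&client) == NULL)
        return 1;

    gearman_client_add_server(&client, "gearman-job-server.example", 4730);

    /* the option combination added in 1868be4 that appears to trigger the
     * memory growth in gearmand */
    gearman_client_add_options(&client,
        GEARMAN_CLIENT_NON_BLOCKING |
        GEARMAN_CLIENT_FREE_TASKS |
        GEARMAN_CLIENT_UNBUFFERED_RESULT);

    const char *payload = "type=passive_check\n";
    gearman_job_handle_t handle;
    gearman_return_t rc = gearman_client_do_background(&client, "check_results",
                                                       NULL, payload,
                                                       strlen(payload), handle);
    /* with GEARMAN_CLIENT_NON_BLOCKING set, the call can also return
     * GEARMAN_IO_WAIT and would have to be retried */
    if (rc != GEARMAN_SUCCESS)
        fprintf(stderr, "submit failed: %s\n", gearman_client_error(&client));

    gearman_client_free(&client);
    return 0;
}
```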

sni closed this as completed in 87e2220 on Jun 21, 2023
sni (Owner) commented Jun 21, 2023

I switched back to blocking I/O. This seems to fix the memory issue in gearmand. I'll run some tests to see if this has any performance impact. So far it looks promising.
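
Roughly, the idea is to no longer set the non-blocking/unbuffered flags on the client, something like the following sketch (illustrative only, not necessarily the exact change in 87e2220):

```c
#include <libgearman/gearman.h>

/* "Switching back to blocking I/O": compared to the earlier sketch, only
 * GEARMAN_CLIENT_FREE_TASKS is set, so the client stays in libgearman's
 * default blocking mode. Illustrative only, not the actual commit. */
static void configure_client_blocking(gearman_client_st *client)
{
    gearman_client_add_options(client, GEARMAN_CLIENT_FREE_TASKS);
}
```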
