
Very high memory usage #390

Closed · FallingSnow opened this issue Sep 12, 2022 · 22 comments
Labels: blocker (Blocks the next release it's targeted for)

@FallingSnow

I'm running inadyn within a Docker container and am seeing some unusually high memory usage. It's configured with only 6 Cloudflare domains (3 using IPv6).

$ ps aux | grep inadyn
root     2532483  0.2  4.8 4022684 997664 ?      Ss   Aug18  78:30 /usr/sbin/inadyn --foreground
$ grep Vm /proc/2532483/status
VmPeak:	 4022004 kB
VmSize:	 4021896 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	 1001644 kB
VmRSS:	  997320 kB
VmData:	 4015544 kB
VmStk:	     132 kB
VmExe:	      48 kB
VmLib:	    2456 kB
VmPTE:	    7916 kB
VmSwap:	 1759296 kB

From within the container:

/ # inadyn --version
2.9.1
/ # uptime
 05:17:50 up 43 days,  7:42,  load average: 0.42, 0.69, 0.90
@FallingSnow (Author)

Before and after stopping inadyn: ~1.3 GB released.

$ free -h
               total        used        free      shared  buff/cache   available
Mem:            19Gi        14Gi       961Mi       247Mi       4.2Gi       4.6Gi
Swap:          9.7Gi       6.1Gi       3.6Gi
$ free -h
               total        used        free      shared  buff/cache   available
Mem:            19Gi        12Gi       2.3Gi       247Mi       4.2Gi       5.9Gi
Swap:          9.7Gi       4.4Gi       5.3Gi

@troglobit (Owner)

Yeah, that doesn't look right ... even though it's sometimes a bit hard to see what's actually being used and what's buffered by the kernel.

Unfortunately I'm not using the Cloudflare plugin myself (I don't even have a login for the automatic tests; anyone able to donate a login would be much appreciated), so I can't test or verify this. Maybe @SimonPilkington can shed some light on this apparent leak?

The Cloudflare plugin is possibly the most advanced one in In-a-Dyn. It does a lot of malloc/free and even juggles JSON to figure things out. With that complexity, a leak in there is not unlikely.

@SimonPilkington (Contributor)

I'm afraid I have no insight to offer. I run inadyn on an embedded device, and after 19 days of uptime its VSZ is less than 4 MiB. Admittedly my setup is much simpler, with only one IPv4 domain, but I have no good way to investigate this further.

@SimonPilkington (Contributor)

@FallingSnow could consider running inadyn outside of the container with the same config under Valgrind, assuming they are technically savvy enough, to see if it finds any leaks.

@FallingSnow (Author)

I don't currently have the time to look into this, but if I do in the future I'll probably do as you suggested: run inadyn under Valgrind for a few days and see if memory usage climbs.

@SimonPilkington are you using the cloudflare plugin?

@troglobit (Owner)

That sounds like a great plan. Someone who has the ability to run and test is IMO best suited to diagnose this with valgrind --leak-check=full inadyn ... (--leak-check=yes is just a synonym for full, so there's no need to pass both).

@FallingSnow yeah Simon is the main author of that plugin :)

@SimonPilkington (Contributor)

Yes, I've been using it basically every day since my initial version made it into the RMerlin firmware. If there were leaks along any common path, I believe I'd have noticed them blowing up my router. I did also run Valgrind with my config and cleaned up everything it found, but that was some time ago. If you do find anything, feel free to report back.

@FallingSnow (Author)

So I just checked, and inadyn was using 180 MB of RAM. I've restarted it under valgrind --leak-check=yes --leak-check=full inadyn --foreground inside the container. Will report back in about 1-2 weeks.

@FallingSnow (Author)

Okay, it's been about 2 weeks.

$ ps aux | grep inadyn
root      505883  2.6  5.6 1455296 1143348 ?     Rs   Sep23 446:26 valgrind --leak-check=yes --leak-check=full inadyn --foreground
$ grep Vm /proc/505883/status 
VmPeak:	 1455328 kB
VmSize:	 1455328 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	 1143612 kB
VmRSS:	 1143600 kB
VmData:	 1446484 kB
VmStk:	     132 kB
VmExe:	    1444 kB
VmLib:	    2628 kB
VmPTE:	    2860 kB
VmSwap:	   50808 kB

And finally, the good stuff.

...
inadyn[1]: STOP command received, exiting.
==1== 
==1== HEAP SUMMARY:
==1==     in use at exit: 1,011,273,497 bytes in 556,964 blocks
==1==   total heap usage: 1,113,636,376 allocs, 1,113,079,412 frees, 231,191,705,460 bytes allocated
==1== 
==1== 184 bytes in 1 blocks are possibly lost in loss record 3 of 36
==1==    at 0x48A6BEE: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1==    by 0x49DE0C9: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x494C181: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x4975657: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x494C4AD: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x493EF27: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x493F301: gnutls_handshake (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x114837: ssl_open (gnutls.c:237)
==1==    by 0x1128B6: http_init (http.c:68)
==1==    by 0x10F7B2: server_transaction (ddns.c:139)
==1==    by 0x10F7B2: get_address_remote.constprop.0 (ddns.c:341)
==1==    by 0x10FD67: get_address_backend.constprop.0 (ddns.c:501)
==1==    by 0x10FFE0: get_address (ddns.c:533)
==1==    by 0x10FFE0: check_address (ddns.c:876)
==1==    by 0x10FFE0: ddns_main_loop (ddns.c:984)
==1== 
==1== 16,895 bytes in 1 blocks are possibly lost in loss record 4 of 36
==1==    at 0x48A26D5: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1==    by 0x4938090: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x493087F: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x4939B9F: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x493BB39: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x493EF80: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x493F301: gnutls_handshake (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x114837: ssl_open (gnutls.c:237)
==1==    by 0x1128B6: http_init (http.c:68)
==1==    by 0x10F7B2: server_transaction (ddns.c:139)
==1==    by 0x10F7B2: get_address_remote.constprop.0 (ddns.c:341)
==1==    by 0x10FA4F: get_address_backend.constprop.0 (ddns.c:481)
==1==    by 0x10FFE0: get_address (ddns.c:533)
==1==    by 0x10FFE0: check_address (ddns.c:876)
==1==    by 0x10FFE0: ddns_main_loop (ddns.c:984)
==1== 
==1== 33,790 bytes in 2 blocks are possibly lost in loss record 5 of 36
==1==    at 0x48A26D5: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1==    by 0x4938090: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x493087F: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x4939B9F: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x493BB39: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x493EF80: ??? (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x493F301: gnutls_handshake (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x114837: ssl_open (gnutls.c:237)
==1==    by 0x1128B6: http_init (http.c:68)
==1==    by 0x10F7B2: server_transaction (ddns.c:139)
==1==    by 0x10F7B2: get_address_remote.constprop.0 (ddns.c:341)
==1==    by 0x10FD67: get_address_backend.constprop.0 (ddns.c:501)
==1==    by 0x10FFE0: get_address (ddns.c:533)
==1==    by 0x10FFE0: check_address (ddns.c:876)
==1==    by 0x10FFE0: ddns_main_loop (ddns.c:984)
==1== 
==1== 502,028,604 (89,152,624 direct, 412,875,980 indirect) bytes in 12,206 blocks are definitely lost in loss record 35 of 36
==1==    at 0x48A6BEE: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1==    by 0x495E460: gnutls_init (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x114725: ssl_open (gnutls.c:205)
==1==    by 0x1128B6: http_init (http.c:68)
==1==    by 0x10F7B2: server_transaction (ddns.c:139)
==1==    by 0x10F7B2: get_address_remote.constprop.0 (ddns.c:341)
==1==    by 0x10FD67: get_address_backend.constprop.0 (ddns.c:501)
==1==    by 0x10FFE0: get_address (ddns.c:533)
==1==    by 0x10FFE0: check_address (ddns.c:876)
==1==    by 0x10FFE0: ddns_main_loop (ddns.c:984)
==1==    by 0x10E133: main (main.c:557)
==1== 
==1== 509,193,951 (89,934,152 direct, 419,259,799 indirect) bytes in 12,313 blocks are definitely lost in loss record 36 of 36
==1==    at 0x48A6BEE: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1==    by 0x495E460: gnutls_init (in /usr/lib/libgnutls.so.30.34.1)
==1==    by 0x114725: ssl_open (gnutls.c:205)
==1==    by 0x1128B6: http_init (http.c:68)
==1==    by 0x10F7B2: server_transaction (ddns.c:139)
==1==    by 0x10F7B2: get_address_remote.constprop.0 (ddns.c:341)
==1==    by 0x10FA4F: get_address_backend.constprop.0 (ddns.c:481)
==1==    by 0x10FFE0: get_address (ddns.c:533)
==1==    by 0x10FFE0: check_address (ddns.c:876)
==1==    by 0x10FFE0: ddns_main_loop (ddns.c:984)
==1==    by 0x10E133: main (main.c:557)
==1== 
==1== LEAK SUMMARY:
==1==    definitely lost: 179,086,776 bytes in 24,519 blocks
==1==    indirectly lost: 832,135,779 bytes in 532,439 blocks
==1==      possibly lost: 50,869 bytes in 4 blocks
==1==    still reachable: 73 bytes in 2 blocks
==1==         suppressed: 0 bytes in 0 blocks
==1== Reachable blocks (those to which a pointer was found) are not shown.
==1== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==1== 
==1== For lists of detected and suppressed errors, rerun with: -s
==1== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 0 from 0)

@troglobit (Owner)

Ah, great stuff thanks!

I see you're using the GnuTLS backend. Maybe @SimonPilkington is using OpenSSL? That could explain why you see such different behavior.

This makes it so much easier to debug and gives a great idea of where the root cause of this lies, awesome!

@SimonPilkington (Contributor)

Indeed. The RMerlin firmware builds inadyn with OpenSSL, and I also used OpenSSL when I was doing testing on x86. Will you be handling this then, @troglobit?

@troglobit (Owner)

Yeah, I'll have a look at it, @SimonPilkington. From the outside it looks like another case of different behavior between the two backends. Shouldn't be impossible to find, but I can't make any promises on timelines.

@troglobit self-assigned this Oct 5, 2022
@troglobit added the blocker label (Blocks the next release it's targeted for) and removed the help-wanted label Oct 5, 2022
@FallingSnow (Author)

@troglobit Just so you know, I'm using https://hub.docker.com/r/troglobit/inadyn in case you want to try to reproduce the issue. Otherwise I can test.

@troglobit (Owner)

@FallingSnow is it the latest tag, and if so from which date, or is it the release tag, v2.9.1?

I've got no login at Cloudflare to test with, so I've done some long runs against FreeDNS with GnuTLS using the latest mainline inadyn. So far, no leaks in sight :-/

@FallingSnow (Author)

FallingSnow commented Oct 8, 2022

The latest tag, created 2022-09-23 10:20:36.

That is the creation date of the container the leak report above is from.

I have this for the image:

Image details
ID 	sha256:a5bde0c73041823d2a4f36326b10be45eb49916d34283e623f9ea280f7142dc2
Size 	12.5 MB
Created 	2022-09-04 15:09:07
Build 	Docker on linux, amd64

@FallingSnow (Author)

FallingSnow commented Oct 8, 2022

Sorry for the close, misclicked.

@FallingSnow reopened this Oct 8, 2022
@troglobit (Owner)

There, I'm pretty sure I've found the root cause of the leak(s). We had a similar issue with the OpenSSL backend a year ago, and like this one it was caused by missing calls to close the TLS session on various error paths, e.g. DDNS server timeouts. A sketch of the pattern follows.
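For reference, here is roughly what that pattern looks like. This is an illustrative sketch of a plain GnuTLS client handshake only; tls_open_sketch() is a hypothetical helper, not inadyn's actual ssl_open().

/* Illustrative sketch, not inadyn's actual code: gnutls_init()
 * allocates a session object, so any error return that skips
 * gnutls_deinit() leaks that session and everything hanging off it,
 * matching the "definitely lost" records rooted at gnutls_init in
 * the Valgrind report above. */
#include <gnutls/gnutls.h>

int tls_open_sketch(int sd)
{
	gnutls_session_t session;
	int rc;

	rc = gnutls_init(&session, GNUTLS_CLIENT);
	if (rc != GNUTLS_E_SUCCESS)
		return rc;              /* nothing allocated yet, safe to bail */

	gnutls_set_default_priority(session);
	gnutls_transport_set_int(session, sd);

	do {
		rc = gnutls_handshake(session);
	} while (rc < 0 && !gnutls_error_is_fatal(rc));

	if (rc < 0) {
		/* The bug: returning here without cleanup leaks the
		 * session on every failed handshake or server timeout.
		 * The fix: tear the session down first. */
		gnutls_deinit(session);
		return rc;
	}

	return 0;
}

Every early return after the session has been allocated needs a matching teardown like this on the error path.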

@troglobit (Owner)

Docker Hub container images updated.

If you have a chance to verify it, @FallingSnow, I'd be very thankful, because I think it's time to cut a new official release.

@FallingSnow (Author)

Testing now.

@troglobit (Owner)

Any updates on the memory usage, @FallingSnow?

@FallingSnow (Author)

Looks like the issue has been solved. Thanks for your work, @troglobit.

$ ps aux | grep inadyn
root     2261324  0.6  0.0  13852  8272 ?        Ss   Oct09 109:24 inadyn --foreground
$ grep Vm /proc/2261324/status
VmPeak:	   14064 kB
VmSize:	   13852 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	   10980 kB
VmRSS:	    8272 kB
VmData:	    7180 kB
VmStk:	     132 kB
VmExe:	      48 kB
VmLib:	    2544 kB
VmPTE:	      64 kB
VmSwap:	     204 kB

@troglobit (Owner)

Awesome, thanks for reporting back! ❤️

Well then it appears it's time for a release! 😎
