High dhcpd memory usage #129
Comments
Should be solved in v1.9.4-rc1, but it needs more testing.
I'm not seeing this dhcpd issue, so I don't think I could verify a fix. As a sidebar - what are you using for host/process monitoring with Burmilla?
Yeah, that is the tricky part: we see it on multiple servers but not on all of them, so we need to run the new RC version for a couple of weeks on some of the problematic ones to be sure.
That picture is from Dynatrace, deployed as a container as described at https://www.dynatrace.com/support/help/setup-and-configuration/setup-on-container-platforms/docker/set-up-dynatrace-oneagent-as-docker-container#run-oneagent-as-a-docker-container
BurmillaOS is not supported by Dynatrace, but it looks to be working fine.
Thanks - that looks similar to how the Elastic Beats and Telegraf agent containers work. I wasn't sure whether something like that should be running as a system service or whether there was a better way to manage those super-privileged containers.
In theory the optimal solution would be running them as system-docker containers, but since system-docker runs inside the initrd, none of the monitoring agents would work without heavy modifications. Also, as we now use the Debian console, it is possible to install services inside of it if needed; for example, iscsid actually needs to run there for those of us who need it. Btw, I just found this issue which might affect our new RC version: moby/moby#43262
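For those who need iscsid, a minimal sketch of installing it inside the Debian console (assuming the stock Debian open-iscsi package; how the console runs services is an assumption here, not something BurmillaOS sets up for you):

# inside the Debian console
sudo apt-get update && sudo apt-get install -y open-iscsi
# start the daemon; without systemd in the console it can also be launched directly
sudo service open-iscsi start || sudo iscsid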
Cool. Both the new Docker v20.10.13 (which, based on the release notes, fixed at least some OOM issues) and the new buildroot LTS version 2022.02 look to have been released today, so I will prepare the 1.9.4 version based on those.
We see more servers appearing where this issue exists. Most probably it has something to do with dhcpcd log size, etc.
I found out that the issue happens on servers where a lot of containers are coming and going. I used this Docker Stack on both v1.9.3 and 1.9.5-rc1:

version: "3.4"
services:
  alpine:
    image: alpine
    command: sleep 30s
    deploy:
      mode: replicated
      replicas: 10

Unfortunately it looks like the issue still happens on 1.9.5-rc1 as well (maybe the situation is a little less bad, but it is still there). However, a new thing I noticed was that if I use more aggressive settings, like a 1s sleep and 100 replicas, then … So I will try the configuration proposed at https://unix.stackexchange.com/a/634852 next.
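For completeness, a minimal sketch of deploying the stack above to reproduce the churn (the file and stack names are illustrative, and the node is assumed to already be a Swarm manager):

# one-time, if the node is not yet part of a swarm
docker swarm init
# deploy 10 short-lived alpine replicas that keep restarting
docker stack deploy -c churn-stack.yml churn
# remove the stack when finished
docker stack rm churn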
Extending the cloud-init config with this one:

rancher:
  services:
    network:
      restart: always
      mem_limit: 20971520
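A minimal sketch of one way to apply such a snippet to a running node (assuming the standard ros config merge behaviour; the file name is illustrative):

# merge the snippet above into the persisted cloud-config
sudo ros config merge -i network-mem-limit.yml
# restart the network system container so the new mem_limit takes effect
sudo system-docker restart network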
We have a hardware host with v1.9.5 where the network container permanently runs out of memory. I have already set … in the network container's /etc/dhcpcd.conf. Is there anything I can do to debug this?
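As a sketch of what could be gathered before changing anything (assuming system-docker exposes the usual Docker subcommands; the inspect fields are the standard ones):

# live memory usage of the network system container
sudo system-docker stats --no-stream network
# restart count and whether the last exit was an OOM kill
sudo system-docker inspect --format '{{.RestartCount}} {{.State.OOMKilled}}' network
# dhcpcd messages from the container log
sudo system-docker logs network 2>&1 | grep -i dhcpcd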
@netsandbox how long does the network container stay running when the memory limit is 30 MB?
Not easily. However, I see that there are quite a few commits in dhcpcd after the 9.4.1 release (NetworkConfiguration/dhcpcd@dhcpcd-9.4.1...master) and at least two of them refer to memory leaks. We get dhcpcd from buildroot: https://github.com/buildroot/buildroot/blob/e644e5df39c4d63ce7ae28ce2d02bfbf2a230cff/package/dhcpcd/dhcpcd.mk#L7 So we should probably try building dhcpcd from the latest version in their repo, and if that looks like it fixes the issue, request that they release a new version and that it gets updated in buildroot.
When I had a look at the host this morning, I saw that there were still network container restarts in the middle of the night. We have planned to upgrade the host from v1.9.5 to v1.9.6 tomorrow. Both versions still use the same dhcpcd version, but maybe the memory problem is related to a kernel library which is used for our network interfaces.
I think that this is actually the same bug as NetworkConfiguration/dhcpcd#157, which is already fixed; the plan looks to be that a new dhcpcd version will be released after NetworkConfiguration/dhcpcd#149 is fixed. However, the os-base build tooling made by Rancher looks to support patches, so I managed to build a new version of dhcpcd with that single patch included: https://github.com/burmilla/os-base/blob/c810a8a2c1818ed36bfe4e8b625c3ad7d497026d/patches/dhcpcd-9.4.1-with-405507a.patch That is now included in the just released v2.0.0-beta6. Additionally, you can update the network container on an existing v1.9.6 installation by running these commands:

sudo system-docker pull burmilla/os-base:v1.9.6-dhcpcd-patched1
sudo ros config set rancher.services.network.image burmilla/os-base:v1.9.6-dhcpcd-patched1

and rebooting. But take a backup/snapshot of the server first and make sure that the image was pulled successfully before running the second command; otherwise the console will not appear on the next boot at all.
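As a pre-flight check before the ros config set step, a small sketch (the grep pattern is only illustrative):

# confirm the patched image is present locally
sudo system-docker images | grep dhcpcd-patched1
# confirm which image the network service currently points at
sudo ros config get rancher.services.network.image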
After setting … Regarding the …
We are seeing very high dhcpcd memory usage in our environment with multiple Burmilla nodes:

Burmilla v1.9.3 uses dhcpcd v9.4.0 and there is a later version, 9.4.1, available. The difference can be seen at NetworkConfiguration/dhcpcd@dhcpcd-9.4.0...dhcpcd-9.4.1; with a quick look, it sounds like the issue may already be fixed by NetworkConfiguration/dhcpcd@ba9f382
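For anyone wanting to confirm the versions and the growth on their own nodes, a minimal sketch (assuming dhcpcd runs inside the network system container and procps is available in the console):

# dhcpcd version shipped in the network system container
sudo system-docker exec network dhcpcd --version
# resident memory (RSS, in kB) and uptime of the dhcpcd process on the host
ps -o pid,rss,etime,args -C dhcpcd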