100% CPU usage and spawns process for host user #79
Comments
I cannot reproduce the issue. Can you provide the log? Also, you do not need two containers for v4 and v6. The second config can be replaced with:
- CF_DNS__DOMAINS_0__ZONE_NAME=${DOMAIN}
- CF_DNS__DOMAINS_0__ZONE_ID=${CF_ZONE_ID:-}
- CF_DNS__DOMAINS_0__NAME=${DDNS_SUBDOMAIN}.${DOMAIN}
- CF_DNS__DOMAINS_0__PROXIED=${DDNS_PROXY}
- CF_DNS__DOMAINS_0__CREATE=true
- CF_DNS__DOMAINS_0__TYPE=A
- CF_DNS__DOMAINS_1__ZONE_NAME=${DOMAIN}
- CF_DNS__DOMAINS_1__ZONE_ID=${CF_ZONE_ID:-}
- CF_DNS__DOMAINS_1__NAME=${DDNS_SUBDOMAIN}.${DOMAIN}
- CF_DNS__DOMAINS_1__PROXIED=${DDNS_PROXY}
- CF_DNS__DOMAINS_1__CREATE=true
- CF_DNS__DOMAINS_1__TYPE=AAAA |
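To make the single-container layout above easier to reason about, here is a minimal sketch of how flat variables like these could map onto one nested config with two domain entries. It assumes the common convention that `__` separates nesting levels and a trailing `_<number>` marks a list index; whether the image parses them exactly this way is not confirmed in this thread. `nest_env` and the `ddns.example.com` value are hypothetical names used only for illustration.

```python
import re

# Assumption: "DOMAINS_0" means entry 0 of a "domains" list.
INDEXED = re.compile(r"^(?P<name>.+)_(?P<index>\d+)$")


def nest_env(env, prefix="CF_DNS__"):
    """Turn flat PREFIX__A_0__B style variables into a nested dict."""
    config = {}
    for key in sorted(env):
        if not key.startswith(prefix):
            continue  # ignore unrelated variables
        node = config
        parts = key[len(prefix):].lower().split("__")
        for part in parts[:-1]:
            match = INDEXED.match(part)
            if match:
                # Indexed segment: descend into name -> index.
                items = node.setdefault(match["name"], {})
                node = items.setdefault(int(match["index"]), {})
            else:
                node = node.setdefault(part, {})
        node[parts[-1]] = env[key]
    return config


# Hypothetical values; only the variable names come from the comment above.
env = {
    "CF_DNS__DOMAINS_0__NAME": "ddns.example.com",
    "CF_DNS__DOMAINS_0__TYPE": "A",
    "CF_DNS__DOMAINS_1__NAME": "ddns.example.com",
    "CF_DNS__DOMAINS_1__TYPE": "AAAA",
    "PATH": "/usr/bin",
}
config = nest_env(env)
record_types = [config["domains"][i]["type"] for i in sorted(config["domains"])]
```

Under that assumption, one container ends up with two entries for the same name, one of type `A` and one of type `AAAA`, which is why the second container should not be needed.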
I will dig into this later. I just opened the issue yesterday because I thought it might be something obvious in the code.
Thanks, this is good to know. Forwarded that information to eth-docker. |
I removed the container yesterday and restarted it today; so far I am not seeing any increase in system CPU usage. I will keep monitoring this. Regarding the process uid, this is likely due to
Maybe it was some strange behavior when updating from 3.0.0 to 3.0.1. |
OK, I am seeing it again. My system CPU spiked around 7pm; here are the logs (the socket hang up looks suspicious)
This is the process spawned for my host user that causes the CPU usage:
> ps -p 4086735 -o pid,user,comm,lstart,etime,args
PID USER COMMAND STARTED ELAPSED COMMAND
4086735 nico npm run start:p Tue Jun 20 16:54:59 2023 02:41:37 npm run start:pretty
---
> ps -p 4086747 -o pid,user,comm,lstart,etime,args
PID USER COMMAND STARTED ELAPSED COMMAND
4086747 nico node Tue Jun 20 16:55:00 2023 02:45:03 node index.mjs
---
> ps -p 4086747 -o %cpu,%mem,cmd
%CPU %MEM CMD
99.8 0.1 node index.mjs |
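For anyone who wants to check several hosts for the same symptom, the manual `ps` check above can be sketched as a small filter. This is only an illustration: `busy_processes` and the 90% threshold are assumptions, and the second sample row is made up to show a non-matching process.

```python
# First data row is taken from the report above; the second is hypothetical.
SAMPLE = """\
%CPU %MEM CMD
99.8  0.1 node index.mjs
 0.0  0.0 npm run start:pretty
"""


def busy_processes(ps_output, threshold=90.0):
    """Return (cpu, command) pairs for rows of `ps -o %cpu,%mem,cmd` output at or above threshold."""
    busy = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        cpu, _mem, cmd = line.split(None, 2)  # the command may contain spaces
        if float(cpu) >= threshold:
            busy.append((float(cpu), cmd))
    return busy


busy = busy_processes(SAMPLE)
```

Note that the `%CPU` column in procps `ps` is CPU time divided by the time the process has been running, so a value near 100 after more than two hours of elapsed time suggests the process has been spinning for most of its life rather than spiking briefly.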
Since I still cannot reproduce the issue, I would like to ask for your help.
You should be able to see something like the following log
or if you want to grep by
{"level":50,"time":1687315899968,"pid":235,"hostname":"5c04695bba47","processId":"78ef9502-75f6-4416-8f7f-df4014ad4f10","method":"updateDnsRecords","msg":"Failed to update"}
I would like to ask you to look at the log during the CPU spike and check the following things:
|
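The log check requested above could be done with a short script like the following. It is a sketch that assumes the container writes one JSON object per line with a numeric `level` (where 50 and above means an error, as in pino-style loggers) and a `time` field in epoch milliseconds, as in the sample line. `errors_in_window` and the window bounds are hypothetical.

```python
import json


def errors_in_window(lines, start_ms, end_ms, min_level=50):
    """Keep JSON log entries at or above min_level whose time falls in the window."""
    hits = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip pretty-printed or otherwise non-JSON lines
        if not isinstance(entry, dict):
            continue
        in_window = start_ms <= entry.get("time", 0) <= end_ms
        if entry.get("level", 0) >= min_level and in_window:
            hits.append(entry)
    return hits


# The first line is the sample from the comment above; the second is filler.
LOG = [
    '{"level":50,"time":1687315899968,"pid":235,"hostname":"5c04695bba47",'
    '"processId":"78ef9502-75f6-4416-8f7f-df4014ad4f10",'
    '"method":"updateDnsRecords","msg":"Failed to update"}',
    "not json",
]
# Hypothetical window around the spike, in epoch milliseconds.
hits = errors_in_window(LOG, 1687315000000, 1687316000000)
```

Grouping the hits by `processId` afterwards would show whether a single run keeps failing or whether each scheduled run fails once.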
So either this only happens because I am running 2 containers (which would still be a bug) or this is related to the socket hangup which makes it hard to reproduce.
Glad to help debug this, thanks for providing the image. I deployed it now with the configs you mentioned,
Let's wait and see, will report back here once issue happens again |
Issue happened again, I don't see anything suspicious in the logs though. Regarding your questions:
These are the processes inside the container
Killing the process (1986) fixes the issue
No, I tested this multiple times, it is fixed for a while but then happens again.
It happens 24/7 until process is killed.
No suspicious log found
Yes
Those answers are probably not that helpful. I will try to find out what the process does using some other tools. |
Does it happen to IPv4 or IPv6? The default for IPv4 are IPv6
|
Happens on both, when I initially reported it there were 2 processes. Since I killed the process it didn't happen again, it has been 24h. |
I have this reproduced with two containers. Restarting with one, let's see how it does. It took one entire core, but no more. "Two services" may not be related. On another box I've been up for 8 days with two services and it doesn't take a full core. |
Collected a bit more information on what the process is doing using perf + inferno (see ChainSafe/lodestar#5604 (comment) for full instructions), but the results are not that helpful either. Would have to add
A few guesses on what could be the issue:
|
@nflaig That function should not run if you did not define any webhook.
docker-cloudflare/packages/app/src/index.ts Lines 28 to 50 in 5b4fd91
If you are using |
You might be right, I haven't looked further into whether this is really the case. eth-docker now just uses 1 container for both ipv4 and ipv6 (traefik-cf.yml#L51) but I am still seeing this issue. Disabled the container for now on my end. I might take another look at it over the weekend if I find the time. |
I'm having the same issue: sudden and continued CPU usage of 300%. I restarted the container and it went back to normal, until it went to 100% again the next day (again for a continued period) until I restarted. No anomalies in the logs that I could find, just an endless repetition of:
|
A new version v3.1.0 has been released. I suspect it may be related to s6-overlay. Please see if there are still any issues. |
Deployed. s6-overlay seems like a good candidate since the process was also spawned for my host user. Not sure when I can confirm that the issue is fixed; I guess I have to wait at least 1 week. |
So far so good on my end. |
It happened again with v3.1.0. The difference now seems to be that the process is executed by root instead of my host user.
> ps -p 1032708 -o pid,user,comm,lstart,etime,args
PID USER COMMAND STARTED ELAPSED COMMAND
1032708 root node Thu Jul 6 23:00:00 2023 09:10:48 node index.mjs
> ps -p 1032708 -o %cpu,%mem,cmd
%CPU %MEM CMD
99.9 0.1 node index.mjs |
How about |
Yes, same result there
|
Idem on my end: once again 100 to 200% CPU usage since yesterday.
|
I have published a Debian based image. The tag have |
Deployed |
Edit: Issue happens with the Debian image as well |
I have a test image Can you try it? |
Deployed |
so far so good on my end, anyone else running this image? |
It has been over a week now and still no problems, if you do another release with fixes from |
Can you check if it is similar to pi-hole/docker-pi-hole#1042 (comment)? See if it is related to the version |
Doesn't use libseccomp from what I can see
It's an Alpine image. If it were to use it, it'd be a recent version.
|
@yorickdowne You should look at the link in the comment. https://docs.linuxserver.io/faq
It is talking about the host. I have checked my host, which uses 2.5.1, and I do not have the problem. |
Oh, wow. Let me see! I'll say "probably not". I have seen this behavior and I am on either Ubuntu 22.04 (libseccomp2 2.5.3) or Debian 12 (libseccomp2 2.5.4). |
This issue is somewhat rare. I have three machines right now where I'm seeing it; one of those is on Debian 12, libseccomp2 2.5.4, docker 24.0.5. I'll leave it in this state. What data can I get you? |
From the issue above, it should only happen in old versions of libseccomp. |
Which means this is not the same issue. Whatever is going on here is not related to the issue that pihole encountered. |
Can you check whether your system is using a 32-bit or 64-bit OS? |
Always 64-bit. These are all pretty simple systems: either bare metal (OVH) or an AWS/GCP instance, usually with Debian 12 and some with Ubuntu 22.04.
Sample Debian uname
Sample Ubuntu uname
|
These overwhelmingly use docker-ce, though a few are on docker.io. docker.io is 20.10.x, and docker-ce is currently 24.0.x. How many systems show the issue is simply a matter of time: 3 days ago it was 3, and I restarted 2 of them; now it is 8. |
Can you test |
@joshuaavalon I would suggest getting another perf profile (#79 (comment)), could you publish an image with |
@nflaig You should be able to add |
@joshuaavalon yeah right, that's a bit simpler than another build. With
This is sadly not much more helpful than the previous flame graph. |
|
I can confirm this behaviour on the current When running htop on the host system, I can see 100% cpu usage for this process:
|
Can you try feat-docker tag and see how it behaves @leomeinel ? That one updates node |
I'll test that tag in the following days, thanks for the suggestion. Are there plans to merge this into latest/main if it resolves this issue? |
I deployed |
I can merge if you don't have any problems. |
Unfortunately this issue continues with |
I can confirm. For me it also behaves the same. |
Issues are closed after 30 days of inactivity. It’s been at least 20 days since the last update here. |
Any further ideas @joshuaavalon ? This has been a bear of an issue to track down. I am surprised you cannot reproduce. If I run this project on a sufficient number of servers, I can always see the high CPU issue crop up within weeks. |
@yorickdowne This Docker image is just a cron job running a Node.js script. If you see my replies above, my best guess is that it is related to the base Docker image's cron. There is really nothing I can change. You can try a different base image to see if anything is different. |
Is there an existing issue for this?
Version
3.0.1
Describe The Bug
Causes 100% CPU utilization 24/7 and spawns process for host user.
CPU usage
User processes
Steps To Reproduce
Run cloudflare-ddns via docker, see https://github.com/eth-educators/eth-docker/blob/63b651d56e83d7b513d9350cd00eca3383eecfc0/traefik-cf.yml#L52
Expected Behavior
Should not spawn process for host user and cause 100% CPU utilization 24/7.
ENV Configuration
JS Configuration
No response
Relevant Output Log
No response