High CPU Usage caused by librdkafka #1858
Unfortunately those stack traces don't make any sense (in the librdkafka parts); it looks like the wrong symbol file may have been used. What exact librdkafka version were you using? (v0.11.5 is not released yet, so that's not a thing.)
@edenhill, I corrected the description by updating the call stacks. Please take a look, and sorry for the confusion. We built the library from the master branch, which is why the version number shows 0.11.5. And I believe we're using librdkafka.redist; I am able to find a couple of things like: This issue seems to be intermittent. I cannot reproduce it so far, but in our Prod environment we still see it happening, and the dump file shows almost the same thing. What else can I check?
@edenhill, I updated the call stack. Please take a look. Thanks!
Do you set `socket.blocking.max.ms`? Btw, if you are installing the librdkafka.redist NuGet package you don't need to build librdkafka yourself; it comes prebuilt.
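(As an aside, not part of the original comment: assuming the NuGet Package Manager console, pulling the prebuilt package looks like this.)

```
PM> Install-Package librdkafka.redist
```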
Are the 4 tracebacks above from the 4 threads with a lot of CPU usage?
@edenhill, socket.blocking.max.ms=1. As for the call stacks of the 4 threads, I don't know their CPU usage; the data I provided above is the time the threads have been running, which is more than 1 hour. The other threads have mostly been running for less than 1 second. Here is the data:
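(An aside not from the original thread: assuming a standard user-mode dump, per-thread running times and stacks like those quoted here are typically obtained in windbg with commands along these lines.)

```
!runaway 7   $$ per-thread user, kernel, and elapsed time
~*k          $$ call stacks for all threads
```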
Are you setting …?
Are any of the connections to the brokers down?
@edenhill, to your question "Are any of the connections to the brokers down": the answer is yes and no. We encountered this issue multiple times:
The reference #1762 you gave above sounds reasonable to me, because socket.blocking.max.ms is set to a very low value, which may cause the 30 threads to keep spinning. But this happens intermittently: we have thousands of machines, and only around 10 of them hit this kind of problem. For the second scenario above, however, socket.blocking.max.ms=1000 and there are only 4 threads, yet that also causes high CPU, again intermittently. It looks like there may be another root cause.
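(A minimal sketch, not from the thread, of how `socket.blocking.max.ms` is set through the librdkafka C API; the value and error handling are illustrative assumptions. A value of "1" lets each broker thread wake up roughly every millisecond, which is consistent with the spinning described above; "1000" lets the threads block for up to a second.)

```c
/* Minimal sketch (not from the thread): setting socket.blocking.max.ms
 * through the librdkafka C API. */
#include <librdkafka/rdkafka.h>
#include <stdio.h>

int main(void) {
    char errstr[512];
    rd_kafka_conf_t *conf = rd_kafka_conf_new();

    /* "1" means broker threads may wake ~1000 times/second each;
     * "1000" lets them block for up to 1s between wakeups. */
    if (rd_kafka_conf_set(conf, "socket.blocking.max.ms", "1000",
                          errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK) {
        fprintf(stderr, "conf error: %s\n", errstr);
        return 1;
    }

    rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_PRODUCER, conf,
                                  errstr, sizeof(errstr));
    if (!rk) {
        fprintf(stderr, "producer error: %s\n", errstr);
        return 1;
    }
    rd_kafka_destroy(rk);
    return 0;
}
```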
This PR aims to improve on this scenario: |
Hi @edenhill, it seems you are working on the CPU usage problem; let me add the problem on my end.

Checklist

Please provide the following information:
librdkafka version (release number or git tag): master (downloaded in Oct 2018)

Note

As you mentioned in #2143, creating producer objects is costly. Still, I am using the same mechanism (create/destroy on each request), and I am checking whether it causes an "increasing CPU usage" problem.

Issue

On top of librdkafka, we have our own threads which pick up requests from a queue and produce messages by calling the librdkafka API.

To generate the issue, I have used rdkafka_simple_producer.c.

Thanks
openVOS is not a supported platform; none of the librdkafka maintainers have ever tried running on openVOS, and it is very likely that subtle differences in threads, libc, POSIX, et al. may have CPU effects. Secondly, it is strongly recommended to reuse the producer object: creating a new producer for each message is typically too costly to be practical, and really does not serve much purpose unless the messaging interval is longer than a couple of minutes between messages.
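(A sketch of the recommended pattern, not from the thread; broker address, topic name, and payloads are illustrative assumptions: create one producer at startup and reuse it for every message, destroying it only at shutdown.)

```c
#include <librdkafka/rdkafka.h>
#include <stdio.h>
#include <string.h>

static rd_kafka_t *rk; /* one long-lived producer, shared by all requests */

static void produce_one(const char *topic, const char *payload) {
    rd_kafka_resp_err_t err = rd_kafka_producev(
            rk,
            RD_KAFKA_V_TOPIC(topic),
            RD_KAFKA_V_VALUE((void *)payload, strlen(payload)),
            RD_KAFKA_V_MSGFLAGS(RD_KAFKA_MSG_F_COPY),
            RD_KAFKA_V_END);
    if (err)
        fprintf(stderr, "produce failed: %s\n", rd_kafka_err2str(err));
    rd_kafka_poll(rk, 0); /* serve delivery report callbacks */
}

int main(void) {
    char errstr[512];
    rd_kafka_conf_t *conf = rd_kafka_conf_new();
    rd_kafka_conf_set(conf, "bootstrap.servers", "localhost:9092",
                      errstr, sizeof(errstr));

    rk = rd_kafka_new(RD_KAFKA_PRODUCER, conf, errstr, sizeof(errstr));
    if (!rk) {
        fprintf(stderr, "%s\n", errstr);
        return 1;
    }

    produce_one("test", "hello"); /* reuse rk for each request, */
    produce_one("test", "world"); /* do not create/destroy each time */

    rd_kafka_flush(rk, 10 * 1000); /* wait for in-flight messages */
    rd_kafka_destroy(rk);          /* destroy once, at shutdown */
    return 0;
}
```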
@edenhill thank you!
Hello, I have a similar case; here is the data. Symptom: 100% CPU utilization.
Consumer diffs:
Note: I can provide an archive on request
Working on this
gcore:
Hello
It seems like this is an infinite loop:
from rdkafka_queue.c:532
printouts:
I can provide a coredump on demand, please help!
Description - updated
The dump file shows 15 threads, 4 of which show very long CPU times. I am posting the call stacks of those 4 threads here; if you like, you can also open the attached file.
The symptom is similar to this issue, which should have been fixed: #1569
And I did see the Kafka brokers go down and come back up at the time this issue happened, but I am unable to reproduce it.
Dump file info, which I got by running the ~*k command in windbg:
...
How to reproduce
Unfortunately I cannot reproduce this issue.
When this issue happened, we had about 40K consumers and 10K producers connected to 10 Kafka brokers.
IMPORTANT: Always try to reproduce the issue on the latest released version (see https://github.com/edenhill/librdkafka/releases); if it can't be reproduced on the latest version, the issue has been fixed.
Checklist
IMPORTANT: We will close issues where the checklist has not been completed.
Please provide the following information: