CPU utilization stays at 100% during message-producing stress test #1553
Comments
Thanks for a great bug report. We've fixed some critical issues since v0.11.0, could you try to reproduce this on the latest git master?
I built the library from the latest git master and ran a stress test just a moment ago, but met the same problem. Version:
okay, thanks!
In gdb for the spinning thread, can you do
The backtrace changed now when I attached gdb to the spinning thread, maybe because I built the new library. But 'p *rkb' did not produce any output. It looks like this: (gdb) bt
The output is: (gdb) bt
Thank you. The poll timeout is limited by
I set queue.buffering.max.ms to 1ms and retried the test, but met the same problem, and the backtrace of the spinning thread has changed now: (gdb) bt
I will try a larger value of queue.buffering.max.ms, like 10ms.
The backtrace did not change. I observed some other spinning threads, and the backtrace is still the same: (gdb) bt
What happens if you set queue.buffering.max.ms and socket.blocking.max.ms to 1000?
I will try it right now, please wait.
There are many more failing requests when both values are set to 1000; I don't think that is a reasonable value.
Right, you will need to increase message.timeout.ms to allow for the increased queueing time.
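For reference, here is a minimal sketch (not from the original thread) of how these settings could be applied through the librdkafka C configuration API. The property names are real librdkafka properties; the values and the helper function are only illustrative:

```c
#include <librdkafka/rdkafka.h>

/* Sketch: apply the timeouts discussed above via the C config API.
 * The values mirror the ones tried in this thread and are not recommendations. */
static rd_kafka_t *create_producer(void) {
    char errstr[512];
    rd_kafka_conf_t *conf = rd_kafka_conf_new();
    rd_kafka_topic_conf_t *tconf = rd_kafka_topic_conf_new();

    /* How long messages may sit in the producer queue before a batch is sent. */
    rd_kafka_conf_set(conf, "queue.buffering.max.ms", "1000", errstr, sizeof(errstr));
    /* Maximum time a socket operation may block. */
    rd_kafka_conf_set(conf, "socket.blocking.max.ms", "1000", errstr, sizeof(errstr));
    /* Topic-level property: must be raised to allow for the longer queueing time. */
    rd_kafka_topic_conf_set(tconf, "message.timeout.ms", "3000", errstr, sizeof(errstr));
    rd_kafka_conf_set_default_topic_conf(conf, tconf);

    return rd_kafka_new(RD_KAFKA_PRODUCER, conf, errstr, sizeof(errstr));
}
```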
I changed the message.timeout.ms value to 3000 and am retrying it now.
A single request now costs about 1 second. That is too long for us, and the longer latency reduces the throughput of the stress test. There were no errors connecting to the broker, so I cannot reproduce the problem; the problem only exists when there are errors connecting to the broker.
root@tsung02:/home/work# time curl -H "host:api.dqd.com" http://127.0.0.1/demo/kafka
real 0m1.009s
Okay. |
As for the CPU usage issue, we'll try to fix it. |
I tried the values 10ms and 100ms for queue.buffering.max.ms, and the problem still exists when it is set to 10ms. I think the CPU usage problem is related to the errors connecting to the broker, not to the value of queue.buffering.max.ms. Thank you very much.
It is socket.blocking.max.ms that is affecting the IO wait timeout when connecting (and otherwise).
I set the value of socket.blocking.max.ms to 5ms before (when the CPU usage problem existed), and the time consumed by a single request was at its minimum with 5ms (I tested all kinds of values and reached that conclusion earlier). Should I set it to 10ms now? Thanks.
@micweaver Is this still an issue? |
The problem does not exist when I set the value of socket.blocking.max.ms to 50ms. Thank you very much.
It is very strange, but I've encountered the same problem with Confluent.Kafka 0.11.3, only on Linux (docker image microsoft/dotnet:2.0.3-runtime). On Windows 8.1 everything is fine.
There have been improvements to producer batching which could affect CPU usage; maybe you could try the RC https://www.nuget.org/packages/librdkafka.redist/0.11.4-RC1B?
You should typically not need to set socket.blocking.max.ms on Linux.
Thank you for the quick response!
After setting socket.blocking.max.ms back to 1, producer performance on Windows improved greatly (messages are sent in under 1ms). It seems that I need to configure the Kafka client conditionally (at runtime: Linux/Windows), but it would be great if the behavior of the Kafka client were the same across runtimes.
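As a rough illustration of the conditional configuration mentioned above, here is a sketch assuming a native C build where the platform split can happen at compile time (with Confluent.Kafka the equivalent check would have to be made at runtime). The helper name and the value 1 are only illustrative:

```c
#include <librdkafka/rdkafka.h>

/* Sketch: choose socket.blocking.max.ms per platform. A low value helped
 * produce latency on Windows in this thread, while on Linux the property
 * typically does not need to be set at all. */
static void apply_platform_tuning(rd_kafka_conf_t *conf) {
    char errstr[512];
#ifdef _WIN32
    rd_kafka_conf_set(conf, "socket.blocking.max.ms", "1", errstr, sizeof(errstr));
#else
    (void)conf;    /* leave the Linux default untouched */
    (void)errstr;
#endif
}
```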
That's good news! We're looking to improve the situation on Windows, but setting
Hi @edenhill, ping response time: Pinging xx.xx.xx.xx with 32 bytes of data:
As discussed above, the behavior seems similar to the Windows-to-Linux latency problem. Are you working on improving the RTT? Is there any workaround for it? Thanks.
Description
We ran a stress test on librdkafka (using php-rdkafka, which is based on this library), producing large numbers of messages to the Kafka cluster, and there were some connection errors, like this:
-192:Local: Message timed out
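The -192 code corresponds to librdkafka's RD_KAFKA_RESP_ERR__MSG_TIMED_OUT. A minimal sketch (not part of the original report) of a delivery report callback that surfaces such failures per message:

```c
#include <stdio.h>
#include <librdkafka/rdkafka.h>

/* Sketch: delivery report callback registered with
 * rd_kafka_conf_set_dr_msg_cb() before the producer is created. */
static void dr_msg_cb(rd_kafka_t *rk, const rd_kafka_message_t *rkmessage, void *opaque) {
    (void)rk;
    (void)opaque;
    if (rkmessage->err == RD_KAFKA_RESP_ERR__MSG_TIMED_OUT)
        fprintf(stderr, "delivery failed: %s\n", rd_kafka_err2str(rkmessage->err));
}
```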
Strangely, after a few minutes the CPU utilization is always at 100%, occupied by the PHP processes.
I observed the processes having the problem like this:
root@tsung02:/home/work# ps -eLf | grep 9698
work 9698 9661 9698 0 4 11:52 ? 00:00:00 php-fpm: pool php7-www
work 9698 9661 60420 0 4 11:53 ? 00:00:00 php-fpm: pool php7-www
work 9698 9661 60426 5 4 11:53 ? 00:00:36 php-fpm: pool php7-www
work 9698 9661 60430 5 4 11:53 ? 00:00:35 php-fpm: pool php7-www
root 99109 9239 99109 0 1 12:04 pts/1 00:00:00 grep --color=auto 9698
root@tsung02:/home/work# strace -p 60420
strace: Process 60420 attached
futex(0x7f9003fff9d0, FUTEX_WAIT, 60426, NULLstrace: Process 60420 detached
<detached ...>
root@tsung02:/home/work# gdb -p 60426
(gdb) bt
#0 0x00007f903ad78cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1 0x00007f9030f42b85 in cnd_timedwait_ms (cnd=cnd@entry=0x168dec8, mtx=mtx@entry=0x168dea0, timeout_ms=timeout_ms@entry=0) at tinycthread.c:501
#2 0x00007f9030f0ee4a in rd_kafka_q_pop_serve (rkq=0x168dea0, timeout_ms=0, version=version@entry=0, cb_type=cb_type@entry=RD_KAFKA_Q_CB_RETURN,
callback=callback@entry=0x0, opaque=opaque@entry=0x0) at rdkafka_queue.c:364
#3 0x00007f9030f0efa0 in rd_kafka_q_pop (rkq=&lt;optimized out&gt;, timeout_ms=&lt;optimized out&gt;, version=version@entry=0) at rdkafka_queue.c:395
#4 0x00007f9030ef8da4 in rd_kafka_broker_serve (rkb=rkb@entry=0x1861630, abs_timeout=abs_timeout@entry=88121366769) at rdkafka_broker.c:2184
#5 0x00007f9030ef9228 in rd_kafka_broker_ua_idle (rkb=rkb@entry=0x1861630, timeout_ms=&lt;optimized out&gt;, timeout_ms@entry=-1) at rdkafka_broker.c:2270
#6 0x00007f9030ef96b8 in rd_kafka_broker_thread_main (arg=arg@entry=0x1861630) at rdkafka_broker.c:3119
#7 0x00007f9030f42927 in _thrd_wrapper_function (aArg=&lt;optimized out&gt;) at tinycthread.c:624
#8 0x00007f903ad74e25 in start_thread () from /usr/lib64/libpthread.so.0
#9 0x00007f903c0a034d in clone () from /usr/lib64/libc.so.6
All the processes having the problem had the same backtrace, and it stayed that way.
How to reproduce
Can be reproduced when producing messages in a stress test (see the sketch below).
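A minimal sketch of the kind of produce loop used for such a stress test; the topic handle, message count, and payload are placeholders, and the original test went through php-rdkafka rather than the C API directly:

```c
#include <string.h>
#include <librdkafka/rdkafka.h>

/* Sketch: tight produce loop against a (possibly unreachable) broker,
 * the scenario in which the 100% CPU spin was observed. */
static void stress_produce(rd_kafka_t *rk, rd_kafka_topic_t *rkt) {
    const char *payload = "stress-test-message";     /* placeholder payload */
    for (int i = 0; i < 1000000; i++) {              /* placeholder count */
        rd_kafka_produce(rkt, RD_KAFKA_PARTITION_UA, RD_KAFKA_MSG_F_COPY,
                         (void *)payload, strlen(payload),
                         NULL, 0, NULL);
        rd_kafka_poll(rk, 0);                        /* serve delivery reports */
    }
    rd_kafka_flush(rk, 10000);                       /* wait for outstanding messages */
}
```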
Checklist
Please provide the following information:
librdkafka version (release number or git tag): librdkafka version (build) => 0.11.0.0
Apache Kafka version: 0.11.0.1
librdkafka client configuration:
Operating system: CentOS Linux 7
Using the legacy Consumer
Using the high-level KafkaConsumer
Provide logs (with debug=.. as necessary) from librdkafka
Provide broker log excerpts
Critical issue