Two questions on multi-threading and rd_kafka_poll #399
Comments
If you don't register any of the following callbacks: dr_cb, dr_msg_cb, error_cb, stats_cb, then you don't need to call rd_kafka_poll(). The downside of that is that you will not get any delivery reports, so there is no way to tell whether your messages were produced or not.
Thanks a lot! When the process using librdkafka drives %CPU up, the call stack of the corresponding thread always looks like this:
#0 0x0000003d6f60ba0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
If you're in China, did DST just kick in? Never mind, I see this is reproducible.
Yes, I'm in China. But there is no DST in China. |
Does anyone have any suggestions for configuring the librdkafka producer? (queue.buffering.max.messages is set to 500000.)
In some threads I call rd_kafka_produce(), and in another thread I call rd_kafka_poll() with timeout_ms=2000. However, %CPU often soars intermittently.
restart_syscall seems to indicate that system calls are being interrupted a lot, which is usually caused by signals. Is your application using any of those calls?
The program depends on nothing but librdkafka and the C++98 STL, and I never call pthread_cancel or the seteuid family of functions. I do call pthread_detach and the sleep functions, though. Do they use these signals?
BTW, I use inotify (man 7 inotify).
Run your program in gdb or strace or similar to see if it is being bombarded with signals. |
I ran my program in gdb while watching per-thread %CPU with top -H -p. gdb never reported any signal interruptions. However, when a single thread was consuming the most CPU (case 2 in the top post), I interrupted the program (with ^C), switched to that thread with thread N, and ran bt. The result is as follows:
Program received signal SIGINT, Interrupt.
The result is the same. Yes, it's always the same. I've also tried debugging with strace and pstack/gstack, with the same results.
I also googled "nanosleep high cpu" and found some related reports, such as:
And here is information about the system I use:
-bash-4.1$ cat /boot/config-
CONFIG_HZ_100 is not set
CONFIG_HZ_250 is not set
CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
So my conclusion is that the culprit is "nanosleep" on some specific platforms. Right?
I'm glad you found the cause of the CPU spikes. |
That's just the preliminary conclusion. Anyway, thank you so so so much. |
According to my tests, when a huge volume of messages is produced while "queue.buffering.max.ms" and "batch.num.messages" are set small (e.g. 5 and 100), CPU utilization tends to be high.
That's expected: small batches force a lot of overhead on high-volume topics.
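A back-of-the-envelope model makes the overhead concrete. The sketch below is a deliberate simplification (an assumption, not librdkafka's actual batching logic): per partition, a batch is sent as soon as batch.num.messages messages have accumulated or queue.buffering.max.ms milliseconds have passed, whichever comes first, and empty batches are never sent. The function name batches_per_second and the message rates used below are hypothetical.

```cpp
#include <algorithm>

// Simplified per-partition model (an assumption, not librdkafka's real scheduler):
// a batch goes out when batch_num_messages messages are queued or linger_ms
// has elapsed, whichever happens first.
double batches_per_second(double msgs_per_sec, double batch_num_messages, double linger_ms)
{
    if (msgs_per_sec <= 0.0) return 0.0;
    double size_triggered = msgs_per_sec / batch_num_messages; // batch fills before the timer fires
    double time_triggered = 1000.0 / linger_ms;                // timer fires before the batch fills
    // Whichever trigger comes first determines the batch rate...
    double batches = std::max(size_triggered, time_triggered);
    // ...but every batch holds at least one message, so never more batches than messages.
    return std::min(batches, msgs_per_sec);
}
```

At a hypothetical 100,000 messages/s, the settings from the question give batches_per_second(100000, 100, 5) == 1000 batches per second per partition, while hypothetical larger settings such as batches_per_second(100000, 10000, 1000) give 10. Each batch carries roughly fixed costs (building and sending a request, thread wakeups, syscalls), so under this model the small settings pay those costs about 100 times as often.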
Updated questions:
I have four topics to produce to, so there are four threads in which rd_kafka_produce() is called. And there is another thread running rd_kafka_poll().
However, %CPU often soars intermittently.
Could anyone explain how to use librdkafka in a multi-threaded program?
When %CPU soared, I took some core dumps (with gcore); the busy thread (not the main thread) looked like this:
#0 0x0000003d6f60ba0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fe27e4b48b7 in pthread_cond_timedwait_ms (cond=0xf71a30, mutex=0xf71a08, timeout_ms=<optimized out>) at rdkafka.c:109
#2 0x00007fe27e4c97e0 in rd_kafka_timers_run (rk=0xf71740, timeout=<optimized out>) at rdkafka_timer.c:154
#3 0x00007fe27e4b502d in rd_kafka_thread_main (arg=0xf71740) at rdkafka.c:1204
#4 0x0000003d6f607a51 in start_thread () from /lib64/libpthread.so.0
#5 0x0000003d6eee893d in clone () from /lib64/libc.so.6
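The backtrace shows one of librdkafka's internal threads (rd_kafka_thread_main calling rd_kafka_timers_run) blocked in pthread_cond_timedwait(). A thread blocked there consumes no CPU by itself; it only costs CPU if the wait keeps returning almost immediately, for example because the computed timeout is zero or very short and the caller loops. To show what a millisecond wrapper of this kind typically does, here is a minimal sketch; it is an illustration, not librdkafka's actual pthread_cond_timedwait_ms from rdkafka.c, and the name cond_timedwait_ms is hypothetical.

```cpp
#include <cerrno>
#include <ctime>
#include <pthread.h>

// Hypothetical helper (not librdkafka's source): waits on `cond` for at most
// `timeout_ms` milliseconds. The caller must already hold `mutex`.
// Returns 0 if woken (possibly spuriously) or ETIMEDOUT once the deadline passes.
int cond_timedwait_ms(pthread_cond_t *cond, pthread_mutex_t *mutex, int timeout_ms)
{
    struct timespec deadline;
    // pthread_cond_timedwait() takes an absolute deadline, measured against
    // CLOCK_REALTIME unless the condvar was created with another clock.
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {  // carry overflowing nanoseconds into seconds
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    return pthread_cond_timedwait(cond, mutex, &deadline);
}
```

Because pthread_cond_timedwait() may also return 0 on a spurious wakeup, real callers re-check their condition in a loop; a loop whose timeout keeps coming out as 0 (cond_timedwait_ms(&c, &m, 0) returns ETIMEDOUT immediately) is exactly the pattern that would show up as a busy thread sitting in this frame.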
And, possibly more helpful, strace -c summaries for the two cases:
Case 1:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.34    3.855998          55     70646           poll
  0.30    0.011476          44       263           sendmsg
  0.24    0.009251           4      2267       932 futex
  0.11    0.004154         593         7           madvise
  0.02    0.000893           2       585       179 recvmsg
  0.00    0.000000           0         1           restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00    3.881772                 73769      1111 total

Case 2:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.15    2.415633      603908         4           nanosleep
  5.61    0.143978      143978         1           restart_syscall
  0.12    0.003122           0     17343           read
  0.09    0.002197         549         4           madvise
  0.03    0.000799           1       927       299 futex
  0.00    0.000000           0         6           write
  0.00    0.000000           0         1           open
  0.00    0.000000           0         1           close
  0.00    0.000000           0         6           stat
  0.00    0.000000           0         1           fstat
  0.00    0.000000           0         1           lseek
  0.00    0.000000           0         1           mmap
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         5           rt_sigaction
  0.00    0.000000           0        10           rt_sigprocmask
------ ----------- ----------- --------- --------- ----------------
100.00    2.565729                 18312       299 total
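For the layout described in the updated question (four producer threads plus one thread calling rd_kafka_poll()), a common shape is a dedicated poller thread that loops on a blocking poll until shutdown. librdkafka's API is documented as thread-safe, so several threads may call rd_kafka_produce() on the same handle, and callbacks such as the delivery-report callback run on whichever thread calls rd_kafka_poll(). The sketch below (C++11; the class name PollerThread is hypothetical) takes the poll step as a function so that it compiles without librdkafka; the comment shows the assumed librdkafka call.

```cpp
#include <atomic>
#include <functional>
#include <thread>

// Runs poll_once() in a loop on its own thread until stop() is called.
// Assumption: with librdkafka, poll_once would be something like
//   [rk] { return rd_kafka_poll(rk, 1000); }
// which blocks for up to 1000 ms and serves callbacks on this thread.
class PollerThread {
public:
    explicit PollerThread(std::function<int()> poll_once)
        : poll_once_(poll_once), stop_(false), served_(0),
          thread_(&PollerThread::run, this) {}

    ~PollerThread() { stop(); }

    // Ask the loop to exit after the current poll_once() returns, then join.
    void stop() {
        stop_.store(true);
        if (thread_.joinable()) thread_.join();
    }

    // Sum of the values returned by poll_once() so far (events served).
    long served() const { return served_.load(); }

private:
    void run() {
        while (!stop_.load()) served_ += poll_once_();
    }

    std::function<int()> poll_once_;
    std::atomic<bool> stop_;
    std::atomic<long> served_;
    std::thread thread_;  // declared last so it starts after the other members are initialized
};
```

The loop only stays cheap if each poll_once() call blocks: rd_kafka_poll() with a non-zero timeout waits for events, whereas a poll step that always returns immediately would spin this thread at 100% CPU.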
Original questions:
First, would anyone be kind enough to provide some multi-threaded examples using librdkafka, e.g. when and where to call rd_kafka_poll() and the like? When I use librdkafka in a multi-threaded program, %CPU soars intermittently.
Second, what happens if I don't call rd_kafka_poll() at all? Is it OK to use librdkafka that way?
Code snippets:
#include <librdkafka/rdkafka.h>

typedef rd_kafka_t kafka_producer_t;

// Theoretically, if outq_size() is less than qs (standing for "Queue Size"),
// rd_kafka_poll() never gets a chance to be called.
// Producer configuration:
// queue.buffering.max.messages is set to 500000
// queue.buffering.max.ms is set to 5
// batch.num.messages is set to 100
int poll_producer(kafka_producer_t *producer, int milli, int how, int qs = 10000)
{
    int n = 0;
    // (the rest of the function body was not included in the post)
    return n;
}
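The comment in poll_producer() describes a threshold policy: call rd_kafka_poll() only while the outbound queue holds at least qs messages. Because the posted function body is incomplete, here is a hedged C++11 sketch of just that policy. drain_until_below is a hypothetical name, and the queue-length and poll steps are passed in as functions so the sketch compiles without librdkafka; in real code they would presumably wrap rd_kafka_outq_len(rk) and rd_kafka_poll(rk, timeout_ms).

```cpp
#include <functional>

// Hypothetical helper mirroring the comment in poll_producer(): poll only
// while the outbound queue holds at least `qs` messages, then stop.
// outq_len: returns the current queue length (e.g. wrapping rd_kafka_outq_len).
// poll:     serves events for up to timeout_ms (e.g. wrapping rd_kafka_poll).
int drain_until_below(const std::function<int()> &outq_len,
                      const std::function<int(int)> &poll,
                      int timeout_ms, int qs)
{
    int served = 0;
    while (outq_len() >= qs)
        served += poll(timeout_ms);  // each call may block up to timeout_ms
    return served;
}
```

One consequence of this policy: while the queue stays below qs nothing is polled, so delivery reports (when a dr_cb or dr_msg_cb is registered) are not served until the threshold is crossed again. If the queue never drains (for example while the broker is unreachable), the loop keeps polling, waiting up to timeout_ms per iteration.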