
CPU utilization increases to 100% when broker is down/unavailable #1569

Closed

suprajhaS opened this issue Nov 29, 2017 · 5 comments

suprajhaS commented Nov 29, 2017

Description

CPU usage climbs to 100% while producing messages to a Kafka cluster whose brokers are unavailable.
We recently had some issues with our Kafka brokers going down temporarily, and CPU usage on the machines producing messages to the brokers spiked to 100% (the top processes were nginx workers, which write to Kafka).

CPU usage remained at 100% even after the brokers were back up and running.

 2285 ubuntu    20   0 2064976 471128 132608 S 139.9  6.2   9659:41 nginx
 2287 ubuntu    20   0 1998352 388580 132916 S  57.1  5.1   1416:16 nginx
 2286 ubuntu    20   0 1998304 390072 132896 S  48.5  5.1   1417:28 nginx

On debugging, we found that the thread consuming 100% of a CPU core was a librdkafka broker thread that kept spinning while trying to connect to a broker (a sketch of this failure mode follows the stack trace below).
This is the stack trace of the thread:

#0  0x00007f98db1124ee in __pthread_mutex_unlock_usercnt (decr=1, mutex=mutex@entry=0x7f98b401af68)
    at pthread_mutex_unlock.c:55
#1  __GI___pthread_mutex_unlock (mutex=mutex@entry=0x7f98b401af68) at pthread_mutex_unlock.c:314
#2  0x00007f98d72bd5e9 in mtx_unlock (mtx=mtx@entry=0x7f98b401af68) at tinycthread.c:284
#3  0x00007f98d72744a7 in rd_refcnt_get (R=0x7f98b401af68) at rd.h:307
#4  rd_kafka_broker_thread_main (arg=arg@entry=0x7f98b401adc0) at rdkafka_broker.c:3070
#5  0x00007f98d72bd4d7 in _thrd_wrapper_function (aArg=<optimized out>) at tinycthread.c:624
#6  0x00007f98db10e6ba in start_thread (arg=0x7f98aeffd700) at pthread_create.c:333
#7  0x00007f98d9f1d3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
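For illustration only (this is not librdkafka's actual broker loop): the failure mode implied by the trace is a connect-retry loop with no back-off, and try_connect() below is a hypothetical stand-in for the failing connect attempt. The fix shape is a bounded sleep, or a blocking poll with a timeout, between attempts.

    #include <unistd.h>

    /* Hypothetical stand-in: always fails while the broker is down. */
    static int try_connect(void) { return -1; }

    int main(void) {
        for (;;) {
            if (try_connect() == -1) {
                /* Without this back-off the loop spins at 100% CPU,
                 * matching the stack trace above. */
                usleep(100 * 1000);  /* 100 ms between retries */
                continue;
            }
            /* ... serve the established broker connection ... */
        }
    }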

How to reproduce

Produce messages to a cluster while the brokers are down, then observe CPU usage after the brokers come back up. A minimal reproduction sketch follows.
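The sketch below is a minimal, self-contained C producer under stated assumptions: the broker address (broker1:9092) and topic name (test-topic) are hypothetical, and the configuration values mirror the checklist below. Run it with the broker stopped, watch per-thread CPU, then bring the broker back up.

    #include <stdio.h>
    #include <string.h>
    #include <librdkafka/rdkafka.h>

    int main(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        /* Hypothetical broker address; point it at a host that is down. */
        rd_kafka_conf_set(conf, "bootstrap.servers", "broker1:9092",
                          errstr, sizeof(errstr));
        /* Configuration from the checklist below. */
        rd_kafka_conf_set(conf, "batch.num.messages", "10000",
                          errstr, sizeof(errstr));
        rd_kafka_conf_set(conf, "queue.buffering.max.messages", "1000000",
                          errstr, sizeof(errstr));
        rd_kafka_conf_set(conf, "log.connection.close", "false",
                          errstr, sizeof(errstr));

        rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_PRODUCER, conf,
                                      errstr, sizeof(errstr));
        if (!rk) {
            fprintf(stderr, "rd_kafka_new failed: %s\n", errstr);
            return 1;
        }

        /* Hypothetical topic name. */
        rd_kafka_topic_t *rkt = rd_kafka_topic_new(rk, "test-topic", NULL);
        const char *msg = "hello";

        /* Keep producing while the broker is down, then bring it back
         * up; the broker thread should not stay pinned at 100% CPU
         * in either state. */
        for (;;) {
            rd_kafka_produce(rkt, RD_KAFKA_PARTITION_UA,
                             RD_KAFKA_MSG_F_COPY,
                             (void *)msg, strlen(msg), NULL, 0, NULL);
            rd_kafka_poll(rk, 100);  /* serve delivery reports */
        }
    }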

Checklist

Please provide the following information:

  • librdkafka version (release number or git tag): v0.10.1.1
  • Apache Kafka version: 0.10.1.1
  • librdkafka client configuration: batch.num.messages=10000;queue.buffering.max.messages=1000000;log.connection.close=false
  • Operating system: Ubuntu 16.04
  • Provide logs (with debug=.. as necessary) from librdkafka
  • Provide broker log excerpts
  • Critical issue: Yes
edenhill (Contributor) commented

I think you are hitting #1397.
Can you try to reproduce on latest master?

suprajhaS (Author) commented

Ok. Will try that. Thanks!

suprajhaS reopened this Nov 29, 2017
edenhill (Contributor) commented Jan 3, 2018

@suprajhaS Any luck with reproducing?

suprajhaS (Author) commented

We've been using the latest release (v0.11.3) for the last couple of weeks and haven't observed this issue so far.

edenhill (Contributor) commented Jan 3, 2018

@suprajhaS Great! I'll close this issue for now, please reopen if the problem resurfaces.
