
CPU utilization increases to 100% when broker is down/unavailable #1569

Closed

suprajhaS opened this issue Nov 29, 2017 · 5 comments

suprajhaS commented Nov 29, 2017

Description

CPU usage climbs to 100% while producing messages to a Kafka cluster whose brokers are unavailable.
We recently had some issues with our Kafka brokers going down temporarily, and CPU usage on the machines producing messages to the brokers spiked to 100% (the top processes were nginx workers, which write to Kafka).

CPU usage remained at 100% even after the brokers were back up and running.

 2285 ubuntu    20   0 2064976 471128 132608 S 139.9  6.2   9659:41 nginx
 2287 ubuntu    20   0 1998352 388580 132916 S  57.1  5.1   1416:16 nginx
 2286 ubuntu    20   0 1998304 390072 132896 S  48.5  5.1   1417:28 nginx

On debugging, we found that the thread consuming 100% of a CPU core was a librdkafka broker thread that kept spinning while trying to connect to a broker (a sketch of this failure mode follows the stack trace below).
This is the stack trace of the thread:

#0  0x00007f98db1124ee in __pthread_mutex_unlock_usercnt (decr=1, mutex=mutex@entry=0x7f98b401af68)
    at pthread_mutex_unlock.c:55
#1  __GI___pthread_mutex_unlock (mutex=mutex@entry=0x7f98b401af68) at pthread_mutex_unlock.c:314
#2  0x00007f98d72bd5e9 in mtx_unlock (mtx=mtx@entry=0x7f98b401af68) at tinycthread.c:284
#3  0x00007f98d72744a7 in rd_refcnt_get (R=0x7f98b401af68) at rd.h:307
#4  rd_kafka_broker_thread_main (arg=arg@entry=0x7f98b401adc0) at rdkafka_broker.c:3070
#5  0x00007f98d72bd4d7 in _thrd_wrapper_function (aArg=<optimized out>) at tinycthread.c:624
#6  0x00007f98db10e6ba in start_thread (arg=0x7f98aeffd700) at pthread_create.c:333
#7  0x00007f98d9f1d3dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
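For illustration only (this is not librdkafka's actual broker loop): the failure mode implied by the trace is a connect-retry loop with no back-off, and try_connect() below is a hypothetical stand-in for the failing connect attempt. The fix shape is a bounded sleep, or a blocking poll with a timeout, between attempts.

    #include <unistd.h>

    /* Hypothetical stand-in: always fails while the broker is down. */
    static int try_connect(void) { return -1; }

    int main(void) {
        for (;;) {
            if (try_connect() == -1) {
                /* Without this back-off the loop spins at 100% CPU,
                 * matching the stack trace above. */
                usleep(100 * 1000);  /* 100 ms between retries */
                continue;
            }
            /* ... serve the established broker connection ... */
        }
    }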

How to reproduce

Produce messages to a cluster while the brokers are down, then observe CPU usage after the brokers come back up. A minimal reproduction sketch follows.
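The sketch below is a minimal, self-contained C producer under stated assumptions: the broker address (broker1:9092) and topic name (test-topic) are hypothetical, and the configuration values mirror the checklist below. Run it with the broker stopped, watch per-thread CPU, then bring the broker back up.

    #include <stdio.h>
    #include <string.h>
    #include <librdkafka/rdkafka.h>

    int main(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        /* Hypothetical broker address; point it at a host that is down. */
        rd_kafka_conf_set(conf, "bootstrap.servers", "broker1:9092",
                          errstr, sizeof(errstr));
        /* Configuration from the checklist below. */
        rd_kafka_conf_set(conf, "batch.num.messages", "10000",
                          errstr, sizeof(errstr));
        rd_kafka_conf_set(conf, "queue.buffering.max.messages", "1000000",
                          errstr, sizeof(errstr));
        rd_kafka_conf_set(conf, "log.connection.close", "false",
                          errstr, sizeof(errstr));

        rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_PRODUCER, conf,
                                      errstr, sizeof(errstr));
        if (!rk) {
            fprintf(stderr, "rd_kafka_new failed: %s\n", errstr);
            return 1;
        }

        /* Hypothetical topic name. */
        rd_kafka_topic_t *rkt = rd_kafka_topic_new(rk, "test-topic", NULL);
        const char *msg = "hello";

        /* Keep producing while the broker is down, then bring it back
         * up; the broker thread should not stay pinned at 100% CPU
         * in either state. */
        for (;;) {
            rd_kafka_produce(rkt, RD_KAFKA_PARTITION_UA,
                             RD_KAFKA_MSG_F_COPY,
                             (void *)msg, strlen(msg), NULL, 0, NULL);
            rd_kafka_poll(rk, 100);  /* serve delivery reports */
        }
    }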

Checklist

Please provide the following information:

  • librdkafka version (release number or git tag): v0.10.1.1
  • Apache Kafka version: 0.10.1.1
  • librdkafka client configuration: batch.num.messages=10000;queue.buffering.max.messages=1000000;log.connection.close=false
  • Operating system: Ubuntu 16.04
  • Provide logs (with debug=.. as necessary) from librdkafka
  • Provide broker log excerpts
  • Critical issue: Yes
edenhill (Contributor) commented

I think you are hitting #1397.
Can you try to reproduce on latest master?

suprajhaS (Author) commented

Ok. Will try that. Thanks!

suprajhaS reopened this Nov 29, 2017
edenhill (Contributor) commented Jan 3, 2018

@suprajhaS Any luck with reproducing?

suprajhaS (Author) commented

We've been using the latest release (v0.11.3) for the last couple of weeks and haven't observed this issue so far.

edenhill (Contributor) commented Jan 3, 2018

@suprajhaS Great! I'll close this issue for now, please reopen if the problem resurfaces.
