rdkafka_partition.c:165:rd_kafka_toppar_remove: assert: rktp->rktp_removed==0 #558

wucanhui · 2016-02-22T07:15:14Z

We are using librdkafka-0.9.0.99, and it crash when keeping on consuming message for hours.
The logs message:
Consume failed: Local: Operation in progress
Consume failed: Local: Operation in progress
Consume failed: Local: Operation in progress
Consume failed: Local: Operation in progress
Consume failed: Local: Operation in progress
*** rdkafka_partition.c:165:rd_kafka_toppar_remove: assert: rktp->rktp_removed==0 ***
Aborted (core dumped)

The gdb info is like this:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff40195000
Core was generated by `./etld etl.properties'.
Program terminated with signal 6, Aborted.
#0 0x00007ff1e34630d5 in raise () from /lib/x86_64-linux-gnu/libc.so.6

(gdb) bt
#0 0x00007ff1e34630d5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ff1e346683b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ff1e28402aa in rd_kafka_crash (file=, line=, function=, rk=0x0,

reason=<optimized out>) at rdkafka.c:2096

#3 0x00007ff1e2870ad5 in rd_kafka_toppar_remove (rktp=0x7ff1bc003020) at rdkafka_partition.c:165
#4 0x00007ff1e2870af2 in rd_kafka_toppar_destroy_final (rktp=0x7ff1bc003020) at rdkafka_partition.c:183
#5 0x00007ff1e28514d8 in rd_kafka_topic_leader_update (rkb=0x7ff1d8002890, partition=1, rkt=0x7ff1bc0011d0,

leader_id=<optimized out>) at rdkafka_topic.c:351

#6 rd_kafka_topic_metadata_update (rkb=0x7ff1b8003f40, mdt=0x7ff1b8001547) at rdkafka_topic.c:735
#7 0x00007ff1e2860d31 in rd_kafka_parse_Metadata (rkb=0x7ff1b8003f40, rkt=0x7ff1bc0011d0, rkbuf=)

at rdkafka_request.c:1657

#8 0x00007ff1e28692b8 in rd_kafka_op_handle_Metadata (rkb=0x7ff1b8003f40, err=RD_KAFKA_RESP_ERR_NO_ERROR, rkbuf=0x2325640,

request=<optimized out>, opaque=0x7ff1bc000970) at rdkafka_request.c:1721

#9 0x00007ff1e2859959 in rd_kafka_buf_callback (rkb=0x7ff1b8003f40, err=RD_KAFKA_RESP_ERR_NO_ERROR, response=0x2325640,

request=0x23257f0) at rdkafka_buf.c:472

#10 0x00007ff1e2838609 in rd_kafka_toppar_q_cb (rk=, rko=, cb_type=,

opaque=<optimized out>) at rdkafka.c:886

#11 0x00007ff1e285bd0f in rd_kafka_q_serve (rkq=0x7ff1d8001670, timeout_ms=, max_cnt=, cb_type=0,

callback=0x7ff1e2838600 <rd_kafka_toppar_q_cb>, opaque=0x0) at rdkafka_queue.c:371

#12 0x00007ff1e2841dbd in rd_kafka_toppars_q_serve (timeout_ms=, rkq=0x7ff1d8001670) at rdkafka.c:892
#13 rd_kafka_thread_main (arg=0x7ff1d80015e0) at rdkafka.c:929
#14 0x00007ff1e28781ff in _thrd_wrapper_function (aArg=) at tinycthread.c:599
#15 0x00007ff1e3d08e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#16 0x00007ff1e352038d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x0000000000000000 in ?? ()

Do you know why it crash like this? Thanks.

The text was updated successfully, but these errors were encountered:

edenhill · 2016-03-09T15:06:09Z

From the backtrace it seems that one of your topics were suddenly missing from the metadata returned from the broker, or that the partition count changed for a topic. Does that sound familiar?

librdkafka still shouldnt fail because of this and I'll try to reproduce it.

alanyin · 2016-05-09T06:20:48Z

I have the exact same issue, the coredump stacks is the same.
Is this issue still open? Any idea when it will be fixed?

edenhill · 2016-05-09T06:23:44Z

Is this still happening on latest master?

alanyin · 2016-05-09T06:29:13Z

I downloaded the source on 9 Apr.
Didn't try the latest one? Should I try latest master?

Alan

From: Magnus Edenhill <notifications@github.com mailto:notifications@github.com>
Reply-To: edenhill/librdkafka <reply@reply.github.com mailto:reply@reply.github.com>
Date: Monday, May 9, 2016 at 4:23 PM
To: edenhill/librdkafka <librdkafka@noreply.github.com mailto:librdkafka@noreply.github.com>
Cc: Alan Yin <ayin@threatmetrix.com mailto:ayin@threatmetrix.com>, Comment <comment@noreply.github.com mailto:comment@noreply.github.com>
Subject: Re: [edenhill/librdkafka] rdkafka_partition.c:165:rd_kafka_toppar_remove: assert: rktp->rktp_removed==0 (#558)

Is this still happening on latest master?

You are receiving this because you commented.
Reply to this email directly or view it on GitHubhttps://github.com//issues/558#issuecomment-217784806

edenhill · 2016-05-09T06:33:55Z

Yes please, try latest master and if problem persists outline exact steps to reproduce. Thanks

edenhill · 2016-05-11T19:00:49Z

This seems to be caused by connecting to two clusters simultaneously where the topic state differs.
This is a problem on the user side but we still shouldn't crash.

jleducq · 2016-08-19T16:12:08Z

This error is still happening with the latest version:
librdkafka version: latest version
kafka version: 0.10.0

My setup:

bin/kafka-topics.sh --zookeeper server1,server2,server3,server4,server5,server6 --describe --topic myTopic
Topic:myTopic PartitionCount:3 ReplicationFactor:2 Configs:
Topic: myTopic Partition: 0 Leader: 1 Replicas: 1,2 Isr: 2,1
Topic: myTopic Partition: 1 Leader: 3 Replicas: 3,4 Isr: 4,3
Topic: myTopic Partition: 2 Leader: 5 Replicas: 5,6 Isr: 5,6

My application is producing on partition 1 and consuming from partition 0 and 2.

Scenario:

While the application is running (producing) I shutdown the Kafka servers 3 and 4 (both replicas for partition 1) to see the application reaction.
I also had this issue without doing anything (all my partitions/brokers were up) and it failed randomly overnight while the application was not producing/consuming. I had to restart every single kafka broker to restart the application. Otherwise it would automatically fail at startup with the below error.

Application log:

INFO - DeliveryReportCallback: Ack of message delivery for (124 bytes): Success
INFO - DeliveryReportCallback: Ack of message delivery for (124 bytes): Success
INFO - DeliveryReportCallback: Ack of message delivery for (124 bytes): Success
INFO - DeliveryReportCallback: Ack of message delivery for (124 bytes): Success
INFO - KafkaEventCallback: Received a log event: server4:9092/4: Receive failed: Disconnected(3, FAIL)
ERROR - KafkaEventCallback: Received an error event: server4:9092/4: Receive failed: Disconnected(Local: Broker transport failure)
INFO - KafkaEventCallback: Received a log event: Topic myTopic partition count changed from 3 to 1(5, PARTCNT)
INFO - KafkaEventCallback: Received a log event: Topic myTopic: partition [1] is unknown (partition_cnt 1)(4, LEADER)
*** rdkafka_partition.c:207:rd_kafka_toppar_remove: assert: rktp->rktp_removed==0 ***
Aborted (core dumped)

GDB back trace:

#0 0x0000003c24432625 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.23-15.el6_6.2.x86_64 cyrus-sasl-plain-2.1.23-15.el6_6.2.x86_64 db4-4.7.25-18.el6_4.x86_64 glibc-2.12-1.149.el6_6.7.x86_64 gperftools-libs-2.0-11.el6.3.x86_64 keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-37.el6_6.x86_64 libcom_err-1.41.12-21.el6.x86_64 libcurl-7.19.7-40.el6_6.4.x86_64 libgcc-4.4.7-11.el6.x86_64 libidn-1.18-2.el6.x86_64 libselinux-2.0.94-5.8.el6.x86_64 libssh2-1.4.2-1.el6_6.1.x86_64 libstdc++-4.4.7-11.el6.x86_64 libunwind-1.1-2.el6.x86_64 libuuid-2.17.2-12.18.el6.x86_64 nspr-4.10.6-1.el6_5.x86_64 nss-3.16.2.3-3.el6_6.x86_64 nss-softokn-freebl-3.14.3-22.el6_6.x86_64 nss-util-3.16.2.3-2.el6_6.x86_64 openssl-1.0.1e-30.el6.8.x86_64 xerces-c-3.0.1-20.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x0000003c24432625 in raise () from /lib64/libc.so.6
#1 0x0000003c24433e05 in abort () from /lib64/libc.so.6
#2 0x00007f4cc612ad33 in rd_kafka_crash (file=file@entry=0x7f4cc617f6a5 "rdkafka_partition.c", line=line@entry=207, function=function@entry=0x7f4cc6180460 <FUNCTION.22025> "rd_kafka_toppar_remove", rk=rk@entry=0x0,
reason=reason@entry=0x7f4cc617f70a "assert: rktp->rktp_removed==0") at rdkafka.c:2395
#3 0x00007f4cc6169b31 in rd_kafka_toppar_remove (rktp=rktp@entry=0x7f4c68000b40) at rdkafka_partition.c:207
#4 0x00007f4cc6169b52 in rd_kafka_toppar_destroy_final (rktp=0x7f4c68000b40) at rdkafka_partition.c:225
#5 0x00007f4cc614dd8c in rd_kafka_op_destroy (rko=0x7f4c680238b0) at rdkafka_op.c:106
#6 0x00007f4cc613b827 in rd_kafka_broker_op_serve (rkb=rkb@entry=0x13275f0, rko=0x7f4c680238b0) at rdkafka_broker.c:3016
#7 0x00007f4cc613c32b in rd_kafka_broker_serve (rkb=rkb@entry=0x13275f0, timeout_ms=timeout_ms@entry=10) at rdkafka_broker.c:3035
#8 0x00007f4cc613c689 in rd_kafka_broker_ua_idle (rkb=rkb@entry=0x13275f0, timeout_ms=timeout_ms@entry=0) at rdkafka_broker.c:3095
#9 0x00007f4cc613cf29 in rd_kafka_broker_thread_main (arg=arg@entry=0x13275f0) at rdkafka_broker.c:4407
#10 0x00007f4cc6171107 in _thrd_wrapper_function (aArg=) at tinycthread.c:611
#11 0x0000003c24c079d1 in start_thread () from /lib64/libpthread.so.0
#12 0x0000003c244e88fd in clone () from /lib64/libc.so.6

Note that I am using the KafkaConsumer, not the legacy Consumer. I also do not have a custom rebalance callback.

Thanks.

edenhill · 2016-08-29T17:40:59Z

It looks like this happens when different brokers report different partition counts for the same topic.

jleducq · 2016-09-15T12:03:20Z

Just saw that this bug was part of 0.9.2 milestone. Do you have any date in mind for the release? Thanks.

edenhill · 2016-09-15T13:14:38Z

Ive reproduced it and will fix it soon.

edenhill · 2016-09-29T08:24:56Z

This is fixed on the partition_changes branch

edenhill · 2016-10-11T15:12:06Z

This issue is now fixed on master, please try to verify the fix in your environment.
Thank you.

edenhill added bug consumer labels Mar 9, 2016

edenhill added this to the 0.9.1 milestone Mar 9, 2016

edenhill modified the milestones: 0.9.2, 0.9.1 Apr 11, 2016

edenhill removed the consumer label Sep 15, 2016

edenhill closed this as completed Oct 11, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rdkafka_partition.c:165:rd_kafka_toppar_remove: assert: rktp->rktp_removed==0 #558

rdkafka_partition.c:165:rd_kafka_toppar_remove: assert: rktp->rktp_removed==0 #558

wucanhui commented Feb 22, 2016

edenhill commented Mar 9, 2016

alanyin commented May 9, 2016

edenhill commented May 9, 2016

alanyin commented May 9, 2016

edenhill commented May 9, 2016

edenhill commented May 11, 2016

jleducq commented Aug 19, 2016

edenhill commented Aug 29, 2016

jleducq commented Sep 15, 2016

edenhill commented Sep 15, 2016

edenhill commented Sep 29, 2016

edenhill commented Oct 11, 2016

rdkafka_partition.c:165:rd_kafka_toppar_remove: assert: rktp->rktp_removed==0 #558

rdkafka_partition.c:165:rd_kafka_toppar_remove: assert: rktp->rktp_removed==0 #558

Comments

wucanhui commented Feb 22, 2016

edenhill commented Mar 9, 2016

alanyin commented May 9, 2016

edenhill commented May 9, 2016

alanyin commented May 9, 2016

edenhill commented May 9, 2016

edenhill commented May 11, 2016

jleducq commented Aug 19, 2016

edenhill commented Aug 29, 2016

jleducq commented Sep 15, 2016

edenhill commented Sep 15, 2016

edenhill commented Sep 29, 2016

edenhill commented Oct 11, 2016