rd_kafka_broker_handle_ApiVersion() should not call rd_kafka_broker_fail within bufq_timeout_scan #2326

Closed

@kenneth-jia (Contributor) commented May 14, 2019

librdkafka sometimes crashes with the following assertion failure:
*** rdkafka_buf.c:201:rd_kafka_bufq_deq: assert: rd_atomic32_get(&rkbufq->rkbq_cnt) > 0 ***

(gdb) bt
#0  0x00007f09f26e2495 in raise () from /lib64/libc.so.6
#1  0x00007f09f26e3c75 in abort () from /lib64/libc.so.6
#2  0x00007f09f3410353 in rd_kafka_crash (file=file@entry=0x7f09f34caf7f "rdkafka_buf.c", line=line@entry=197, function=function@entry=0x7f09f34cb130 <__FUNCTION__.22201> "rd_kafka_bufq_deq", rk=rk@entry=0x0,
    reason=reason@entry=0x7f09f34caff8 "assert: rd_atomic32_get(&rkbufq->rkbq_cnt) > 0") at rdkafka.c:3432
#3  0x00007f09f343b975 in rd_kafka_bufq_deq (rkbufq=rkbufq@entry=0xb32180, rkbuf=rkbuf@entry=0xb36050) at rdkafka_buf.c:197
#4  0x00007f09f342300b in rd_kafka_broker_bufq_timeout_scan (rkb=rkb@entry=0xb31fb0, is_waitresp_q=is_waitresp_q@entry=0, rkbq=rkbq@entry=0xb32180, partial_cntp=partial_cntp@entry=0x7f09ee56125c,
    err=err@entry=RD_KAFKA_RESP_ERR__TIMED_OUT_QUEUE, now=now@entry=215591720139) at rdkafka_broker.c:558
#5  0x00007f09f34291ed in rd_kafka_broker_timeout_scan (now=215591720139, rkb=0xb31fb0) at rdkafka_broker.c:594
#6  rd_kafka_broker_serve (rkb=rkb@entry=0xb31fb0, abs_timeout=abs_timeout@entry=215591720118) at rdkafka_broker.c:2562
#7  0x00007f09f3429597 in rd_kafka_broker_ua_idle (rkb=rkb@entry=0xb31fb0, timeout_ms=<optimized out>, timeout_ms@entry=-1) at rdkafka_broker.c:2617
#8  0x00007f09f3429a24 in rd_kafka_broker_thread_main (arg=arg@entry=0xb31fb0) at rdkafka_broker.c:3552
#9  0x00007f09f34763d7 in _thrd_wrapper_function (aArg=<optimized out>) at tinycthread.c:583
#10 0x00007f09f2a4baa1 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f09f2798bdd in clone () from /lib64/libc.so.6

(gdb) f 4
#4  0x00007f09f342300b in rd_kafka_broker_bufq_timeout_scan (rkb=rkb@entry=0xb31fb0, is_waitresp_q=is_waitresp_q@entry=0, rkbq=rkbq@entry=0xb32180, partial_cntp=partial_cntp@entry=0x7f09ee56125c,
err=err@entry=RD_KAFKA_RESP_ERR__TIMED_OUT_QUEUE, now=now@entry=215591720139) at rdkafka_broker.c:558

(gdb) p rkb->rkb_outbufs
$58 = {rkbq_bufs = {tqh_first = 0x0, tqh_last = 0x7f09ee560fd0}, rkbq_cnt = {val = 0}, rkbq_msg_cnt = {val = 0}}

(gdb) p &rkb->rkb_outbufs
$59 = (rd_kafka_bufq_t *) 0xb32180

(gdb) p cnt
$60 = 1

Here is how it happens:

  1. “rd_kafka_broker_bufq_timeout_scan()” iterates over each buf in “rkb->rkb_outbufs”, dequeuing and processing one buf per iteration.
  2. However, for one kind of buf the callback is “rd_kafka_broker_handle_ApiVersion()”, and that callback calls “rd_kafka_broker_fail()” directly.
  3. Unfortunately, “rd_kafka_broker_fail()” also touches “rkb->rkb_outbufs” (and “rkb_waitresps”): it clears “rkb->rkb_outbufs”.
  4. “rkb_outbufs” is now empty while the “TAILQ_FOREACH_SAFE(…, rkb_outbufs, next)” loop from step 1 is still running. The previously saved “next” pointer may be non-NULL, so the loop continues and calls “rd_kafka_bufq_deq()”, which triggers the assert.

So "rd_kafka_broker_handle_ApiVersion()" should not always call "rd_kafka_broker_fail()". Instead, it should use the error code (RD_KAFKA_RESP_ERR__TIMED_OUT_QUEUE) to determine whether it is being invoked from within the rd_kafka_broker_timeout_scan iteration loop.
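
To make the sequence above easier to follow, here is a minimal standalone sketch of the pattern. It is not the librdkafka code and not the patch in this PR: every type and function name in it (`struct bufq`, `broker_fail`, `cb_unguarded`, `cb_guarded`, `run_scan`, the `enum err` values) is a hypothetical stand-in. It also assumes stack-allocated buffers so that the stale "next" pointer stays readable, and it counts stale visits instead of calling the dequeue function, which is where the real code hits the assert.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Some libc versions (e.g. glibc) lack this macro, so provide a fallback. */
#ifndef TAILQ_FOREACH_SAFE
#define TAILQ_FOREACH_SAFE(var, head, field, tvar)              \
        for ((var) = TAILQ_FIRST(head);                         \
             (var) && ((tvar) = TAILQ_NEXT(var, field), 1);     \
             (var) = (tvar))
#endif

/* Hypothetical stand-ins for the library's error codes. */
enum err { ERR_TRANSPORT, ERR_TIMED_OUT_QUEUE };

struct bufq;
struct buf {
        TAILQ_ENTRY(buf) link;
        /* Response callback, invoked with the error the buf failed with. */
        void (*cb)(struct bufq *q, enum err err);
};
TAILQ_HEAD(buf_head, buf);

struct bufq {
        struct buf_head bufs;
        int cnt; /* must always match the number of queued bufs */
};

static void bufq_enq(struct bufq *q, struct buf *b) {
        TAILQ_INSERT_TAIL(&q->bufs, b, link);
        q->cnt++;
}

static void bufq_deq(struct bufq *q, struct buf *b) {
        assert(q->cnt > 0); /* same invariant as the failing assert */
        TAILQ_REMOVE(&q->bufs, b, link);
        q->cnt--;
}

/* Models the broker-failure path: purges every queued buf. */
static void broker_fail(struct bufq *q) {
        struct buf *b;
        while ((b = TAILQ_FIRST(&q->bufs)) != NULL)
                bufq_deq(q, b);
}

/* Unguarded callback: always fails the broker (the problematic case). */
void cb_unguarded(struct bufq *q, enum err err) {
        (void)err;
        broker_fail(q);
}

/* Guarded callback: skips the broker failure when the error code shows
 * that it was invoked from the timeout scan. */
void cb_guarded(struct bufq *q, enum err err) {
        if (err == ERR_TIMED_OUT_QUEUE)
                return;
        broker_fail(q);
}

static void cb_noop(struct bufq *q, enum err err) {
        (void)q;
        (void)err;
}

/* Times out every buf. Returns how many iterations reached a buf after
 * the queue had already been emptied behind the loop's back. */
static int timeout_scan(struct bufq *q) {
        struct buf *b, *next;
        int stale = 0;

        TAILQ_FOREACH_SAFE(b, &q->bufs, link, next) {
                if (q->cnt == 0) {
                        stale++; /* bufq_deq() here would trip the assert */
                        continue;
                }
                bufq_deq(q, b);
                b->cb(q, ERR_TIMED_OUT_QUEUE);
        }
        return stale;
}

/* Queues one buf using first_cb followed by an ordinary buf, then scans. */
int run_scan(void (*first_cb)(struct bufq *, enum err)) {
        struct buf a = { .cb = first_cb }, b = { .cb = cb_noop };
        struct bufq q;

        TAILQ_INIT(&q.bufs);
        q.cnt = 0;
        bufq_enq(&q, &a);
        bufq_enq(&q, &b);
        return timeout_scan(&q);
}
```

In this sketch, `run_scan(cb_unguarded)` returns 1 because the second buf is visited through the saved "next" pointer after the queue count has dropped to zero, while `run_scan(cb_guarded)` returns 0 because the scan dequeues both bufs itself. In the real library the purged bufs may also have been freed by then, so the stale visit is more dangerous than this model suggests.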


THE FOLLOWING DISCLAIMER APPLIES TO ALL SOFTWARE CODE AND OTHER MATERIALS CONTRIBUTED IN CONNECTION WITH THIS SOFTWARE:

THIS SOFTWARE IS LICENSED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OF NON-INFRINGEMENT, ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. THIS SOFTWARE MAY BE REDISTRIBUTED TO OTHERS ONLY BY EFFECTIVELY USING THIS OR ANOTHER EQUIVALENT DISCLAIMER IN ADDITION TO ANY OTHER REQUIRED LICENSE TERMS.

ONLY THE SOFTWARE CODE AND OTHER MATERIALS CONTRIBUTED IN CONNECTION WITH THIS SOFTWARE, IF ANY, THAT ARE ATTACHED TO (OR OTHERWISE ACCOMPANY) THIS SUBMISSION (AND ORDINARY COURSE CONTRIBUTIONS OF FUTURES PATCHES THERETO) ARE TO BE CONSIDERED A CONTRIBUTION. NO OTHER SOFTWARE CODE OR MATERIALS ARE A CONTRIBUTION.

@kenneth-jia (Contributor Author) commented May 14, 2019

I submitted a similar PR before, but I'd like to use this new one instead. It adds a disclaimer (required by my firm's rules) and changes the previous solution slightly.

@edenhill (Contributor) commented

Thank you for this PR.
I'm not sure what to do with the disclaimer. Any code contributions to librdkafka effectively transfer ownership of the contribution from the contributor to the librdkafka project maintainers.
librdkafka itself is already covered by the 2-clause BSD license and its disclaimer.

@kenneth-jia (Contributor Author) commented

> Thank you for this PR.
> I'm not sure what to do with the disclaimer. Any code contributions to librdkafka effectively transfer ownership of the contribution from the contributor to the librdkafka project maintainers.
> librdkafka itself is already covered by the 2-clause BSD license and its disclaimer.

@edenhill,
Sorry for putting a large disclaimer paragraph at the very beginning and scaring people. I went back to my colleagues who have handled similar issues before, and I have just edited the PR to put the disclaimer at the end, which seems like good practice.
Anyway, this kind of disclaimer does no harm; it just recaps the legal terms. It is simply our firm's rule. (Our developers might not normally notice the different licences used by various OSS projects, so we put the disclaimer everywhere to avoid any trouble.)

@edenhill edenhill added this to the v1.3.0 milestone Aug 31, 2019
@edenhill edenhill modified the milestones: v1.3.0, v1.4.0 Jan 20, 2020
@edenhill edenhill modified the milestones: v1.4.0, v1.5.0 Apr 6, 2020
@edenhill (Contributor) commented May 8, 2020

Fixed in #2877
