Crash observed consistently while calling begin() ( seems to have an invalid/corrupted iterator) on the vector of pointers to partition metadata of one of the topics #1790

Closed
6 of 7 tasks
ameyapg opened this issue May 2, 2018 · 7 comments


ameyapg commented May 2, 2018

Read the FAQ first: https://github.com/edenhill/librdkafka/wiki/FAQ

Description

The application crashes consistently when calling begin() on the vector of pointers to partition metadata for one of the topics. The iterator appears to be invalid or corrupted.

How to reproduce

This version of the method crashes:

bool dmapi::validate_metadata_for_topic_health(const RdKafka::Metadata &metadata, const std::vector<std::string> &topics)
{
    const std::unordered_set<std::string> topics_hashset(topics.begin(), topics.end());

    /* Iterate topics */
    RdKafka::Metadata::TopicMetadataIterator it;
    for (it = metadata.topics()->begin();
         it != metadata.topics()->end();
         ++it) {

        if (topics_hashset.find((*it)->topic().c_str()) != topics_hashset.end()) {

            if ((*it)->err() != RdKafka::ERR_NO_ERROR) {
                return false; // As the leader for the local topic is not ready.
            }

            /* Iterate topic's partitions */
            RdKafka::TopicMetadata::PartitionMetadataIterator ip;
            for (ip = (*it)->partitions()->begin();  // Crash observed while calling begin()
                 ip != (*it)->partitions()->end();
                 ++ip) {

                if ((*ip)->id() == 0 && ((*ip)->leader() == -1 || (*ip)->err() != RdKafka::ERR_NO_ERROR))
                    return false;
            }
        }
    }

    return true;
}

This version of the method does not crash:

bool dmapi::validate_metadata_for_topic_health(const RdKafka::Metadata &metadata, const std::vector<std::string> &topics)
{
    const std::unordered_set<std::string> topics_hashset(topics.begin(),topics.end());
   	
   
    /* Iterate topics */
    RdKafka::Metadata::TopicMetadataIterator it;
    for (it = metadata.topics()->begin();
         it != metadata.topics()->end();
         ++it) {
      

        if (topics_hashset.find((*it)->topic().c_str()) != topics_hashset.end()) {

            if ((*it)->err() != RdKafka::ERR_NO_ERROR) {
                
                return false; 
            }

            /* Iterate topic's partitions */
            RdKafka::TopicMetadata::PartitionMetadataIterator ip;
                        
            UINT64 partitionSize = (*it)->partitions()->size();

            for (UINT64 i = 0; i < partitionSize; ++i) {
                if ((*it)->partitions()->at(i)->id() == 0 && ((*it)->partitions()->at(i)->leader() == -1 ||
                                                              (*it)->partitions()->at(i)->err() !=
                                                              RdKafka::ERR_NO_ERROR)) {  // DOES NOT CRASH
                    return false;
                }
            }
        }
    }

    return true;
}
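
The check itself can be exercised without a broker. The following is a minimal sketch, not librdkafka code: `FakePartition` and `FakeTopic` are hypothetical stand-ins for `RdKafka::PartitionMetadata` and `RdKafka::TopicMetadata`, and `topics_healthy` is a hypothetical name. It applies the same rule as both versions above (a requested topic is unhealthy if it reports an error, or if its partition 0 has no leader or reports an error), so the logic can be tested separately from the crash.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for the librdkafka metadata classes (not the real API).
struct FakePartition {
    int32_t id;
    int32_t leader;  // -1 means "no leader", as in the check above
    int err;         // 0 plays the role of RdKafka::ERR_NO_ERROR
};

struct FakeTopic {
    std::string name;
    int err;
    std::vector<FakePartition> partitions;
};

// Returns false if any requested topic reports an error, or if its
// partition 0 has no leader or reports an error. Topics that were not
// requested are ignored.
bool topics_healthy(const std::vector<FakeTopic> &metadata,
                    const std::vector<std::string> &topics) {
    const std::unordered_set<std::string> wanted(topics.begin(), topics.end());
    for (const FakeTopic &t : metadata) {
        if (wanted.find(t.name) == wanted.end())
            continue;
        if (t.err != 0)
            return false;
        for (const FakePartition &p : t.partitions) {
            if (p.id == 0 && (p.leader == -1 || p.err != 0))
                return false;
        }
    }
    return true;
}
```

Because the stand-in types are plain structs, this sketch says nothing about the crash itself; it only pins down the intended result of the check for a given set of metadata.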

IMPORTANT: Always try to reproduce the issue on the latest released version (see https://github.com/edenhill/librdkafka/releases), if it can't be reproduced on the latest version the issue has been fixed.

Checklist

IMPORTANT: We will close issues where the checklist has not been completed.

Please provide the following information:

  • librdkafka version (release number or git tag): v0.11.4

  • Apache Kafka version: 1.0.1

  • librdkafka client configuration: topic.metadata.refresh.interval.ms: 3000, auto.offset.reset: earliest, statistics.interval.ms: 3000

  • Operating system: CentOS 7 (x64)

  • Provide logs (with debug=.. as necessary) from librdkafka - log snippet added below

Crash log (librdkafka v0.11.4):


2018-05-01T21:40:22.602028+00:00 machine001-red1 TaskC[9239]: Program terminated with signal 11, Segmentation fault.
2018-05-01T21:40:22.602109+00:00 machine001-red1 TaskC[9239]: #0  0x00007fc523825f06 in ?? () from /lib64/libstdc++.so.6
2018-05-01T21:40:22.602183+00:00 machine001-red1 TaskC[9239]: #0  0x00007fc523825f06 in ?? () from /lib64/libstdc++.so.6
2018-05-01T21:40:22.602257+00:00 machine001-red1 TaskC[9239]: #1  0x00007fc523825f50 in ?? () from /lib64/libstdc++.so.6
2018-05-01T21:40:22.602844+00:00 machine001-red1 TaskC[9239]: #2  0x000000000198d122 in __gnu_debug::_Safe_iterator_base::_Safe_iterator_base (this=0x7ffd28b8e1d0, __seq=0xa522870, __constant=true) at /usr/include/c++/4.8.2/debug/safe_base.h:89
2018-05-01T21:40:22.602939+00:00 machine001-red1 TaskC[9239]: #3  0x00000000030c1171 in __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<RdKafka::PartitionMetadata const* const*, std::__cxx1998::vector<RdKafka::PartitionMetadata const*, std::allocator<RdKafka::PartitionMetadata const*> > >, std::__debug::vector<RdKafka::PartitionMetadata const*, std::allocator<RdKafka::PartitionMetadata const*> > >::_Safe_iterator (this=0x7ffd28b8e1d0, __i=, __seq=0xa522858) at /usr/include/c++/4.8.2/debug/safe_iterator.h:152
2018-05-01T21:40:22.603021+00:00 machine001-red1 TaskC[9239]: #4  0x00000000030bc1bb in std::__debug::vector<RdKafka::PartitionMetadata const*, std::allocator<RdKafka::PartitionMetadata const*> >::begin (this=0xa522858) at /usr/include/c++/4.8.2/debug/vector:221
2018-05-01T21:40:22.603098+00:00 machine001-red1 TaskC[9239]: #5  0x00000000030b1cdf in test::validate_metadata_for_topic_health (metadata=..., topics=std::__debug::vector of length 3, capacity 3 = {...}) at /work/test.cpp:249
2018-05-01T21:40:22.603175+00:00 machine001-red1 TaskC[9239]: #6  0x00000000030b46d8 in test::Consumer::initialize (this=0xa502a60, group_id="groupid", brokers="xx.xx.xx.xx", topics=std::__debug::vector of length 3, capacity 3 = {...}, rebalance_cb=0xa496508, event_cb=0xa496510, socket_cb=0x0, consume_cb=0xa496520) at @

  • Provide broker log excerpts - N/A
  • Critical issue - YES

ameyapg commented May 2, 2018

Updated checklist


ameyapg commented May 15, 2018

@edenhill : Any updates about this issue? Thanks.

@edenhill
Contributor

Try to debug the issue in gdb; your output does not tell me what line it is crashing on.


ameyapg commented May 16, 2018

bool dmapi::validate_metadata_for_topic_health(const RdKafka::Metadata &metadata, const std::vector<std::string> &topics)
{
    const std::unordered_set<std::string> topics_hashset(topics.begin(), topics.end());

    /* Iterate topics */
    RdKafka::Metadata::TopicMetadataIterator it;
    for (it = metadata.topics()->begin();
         it != metadata.topics()->end();
         ++it) {

        if (topics_hashset.find((*it)->topic().c_str()) != topics_hashset.end()) {

            if ((*it)->err() != RdKafka::ERR_NO_ERROR) {
                return false; // As the leader for the local topic is not ready.
            }

            auto ip = (*it)->partitions()->begin();  // Crash observed while calling begin()
        }
    }
    return true;
}


Added a simplified version of the method. The crash is observed on calling begin().


ameyapg commented May 16, 2018

@edenhill: Hi, Here's the stack trace from the core file. It crashes on calling begin() on the vector as shown in the previous examples. Stack frame 4. Thanks.

(gdb) bt
#0 0x00007ff64f770f06 in ?? () from /lib64/libstdc++.so.6
#1 0x00007ff64f770f50 in ?? () from /lib64/libstdc++.so.6
#2 0x0000000001a24332 in __gnu_debug::_Safe_iterator_base::_Safe_iterator_base (this=0x7fff4b43a4c0, __seq=0xb3b2840, __constant=true)
    at /usr/include/c++/4.8.2/debug/safe_base.h:89
#3 0x00000000032063df in __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<RdKafka::PartitionMetadata const* const*, std::__cxx1998::vector<RdKafka::PartitionMetadata const*, std::allocator<RdKafka::PartitionMetadata const*> > >, std::__debug::vector<RdKafka::PartitionMetadata const*, std::allocator<RdKafka::PartitionMetadata const*> > >::_Safe_iterator (this=0x7fff4b43a4c0, __i=, __seq=0xb3b2828) at /usr/include/c++/4.8.2/debug/safe_iterator.h:152
#4 0x0000000003201a2b in std::__debug::vector<RdKafka::PartitionMetadata const*, std::allocator<RdKafka::PartitionMetadata const*> >::begin (this=0xb3b2828)
    at /usr/include/c++/4.8.2/debug/vector:221
#5 0x00000000031f7619 in test::validate_metadata_for_topic_health (metadata=..., topics=std::__debug::vector of length 3, capacity 3 = {...})
    at /work/test.cpp:239
#6 0x00000000031fa08a in test::Consumer::initialize (this=0xb38eb80, group_id="groupid", brokers="xx.xx.xx.xx",
    topics=std::__debug::vector of length 3, capacity 3 = {...}, rebalance_cb=0xb322508, event_cb=0xb322510, socket_cb=0x0, consume_cb=0xb322520,
    topic_health_notification_interval=1000) at /work/test.cpp:529
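
The frames in this backtrace name `std::__debug::vector` and `__gnu_debug::_Safe_iterator`, which are libstdc++ debug-mode types; they replace the regular containers when the calling code is compiled with `_GLIBCXX_DEBUG` defined. The libstdc++ manual notes that debug mode changes the size of the standard containers, so a container should only be passed between translation units that were compiled the same way. One possible explanation, not confirmed anywhere in this thread, is that the application was built in debug mode while the librdkafka C++ library was not. That would be consistent with `size()` and `at()` working while `begin()`, which constructs a safe iterator, crashes. The sketch below only reports how the current translation unit is compiled; `libstdcxx_debug_mode_enabled` and `describe_build` are hypothetical helper names.

```cpp
#include <sstream>
#include <string>
#include <vector>

// True when this translation unit is compiled with libstdc++ debug mode
// (-D_GLIBCXX_DEBUG), in which std::vector maps to std::__debug::vector.
constexpr bool libstdcxx_debug_mode_enabled() {
#ifdef _GLIBCXX_DEBUG
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: summarizes the container layout seen by this file.
std::string describe_build() {
    std::ostringstream out;
    out << "_GLIBCXX_DEBUG="
        << (libstdcxx_debug_mode_enabled() ? "on" : "off")
        << " sizeof(std::vector<const void*>)="
        << sizeof(std::vector<const void *>);
    return out.str();
}
```

Logging `describe_build()` from the application, and again from the same file compiled with the flags used to build librdkafka, would show whether the two builds disagree on the vector layout. If they do, compiling both sides with the same setting is the usual remedy; whether that applies here is an assumption until it is checked.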


ameyapg commented May 17, 2018

@edenhill - We've seen sporadic crashes while calling begin() on the topicMetadataVector too.
Thanks.

@edenhill
Contributor

Could you provide a full example program that reproduces this problem?

Thanks
