
Error sending heartbeat: Kafka::RebalanceInProgress #847

Closed
marcoroth opened this issue Jul 1, 2020 · 4 comments · Fixed by #858

Comments

@marcoroth
Contributor

Hey there

We are using racecar to write our Kafka consumers. While running them in staging and production, we noticed this exception appearing repeatedly in our logs over an extended period of time:

{}:] Error sending heartbeat: Kafka::RebalanceInProgress

After debugging a little bit, it seems that the exception is raised here in the ruby-kafka library. All the code for handling the heartbeat also lives here, which is why I opened this issue in this repo.

The RebalanceInProgress exception is caught and logged here:

rescue ConnectionError, UnknownMemberId, RebalanceInProgress, IllegalGeneration => e
@logger.error "Error sending heartbeat: #{e}"
raise HeartbeatError, e

After testing for a while, we figured out that there is no actual issue or misbehaviour when this exception is thrown, but it is logged as an error anyway. The exception is thrown every time a consumer joins or leaves a consumer group, i.e. every time the group rebalances.

My question now is: does this really need to be logged at the error log level, or is there a negative side effect we are not seeing/experiencing that should be handled while a rebalance is in progress?

I haven't seen an option to change the behaviour of the heartbeats. The only option I found is the heartbeat_interval config, but changing its value has no effect; the exceptions are thrown immediately.
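For reference, this is roughly how we tried tuning the interval (a sketch assuming Racecar's `heartbeat_interval` setting and a `config/racecar.rb` file; the exact option name and file location may differ per version):

```ruby
# config/racecar.rb (hypothetical path)
Racecar.configure do |config|
  config.brokers = ["localhost:9092"]

  # Seconds between heartbeats sent to the group coordinator.
  # Raising this did not suppress the RebalanceInProgress log entries.
  config.heartbeat_interval = 10
end
```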

Versions
  • Version of Ruby: 2.6.5
  • Version of Kafka: 2.5.0
  • Version of ruby-kafka: 0.7.10
  • Version of racecar: 1.0.1
Steps to reproduce

I was able to write a script to reproduce the behaviour:

require 'kafka'

kafka = Kafka.new(['localhost:9092'])

topics = ["asset_value", "asset_values"]

topics.each do |topic|
  # Note: the inline form `expr rescue Kafka::TopicAlreadyExists` does NOT
  # rescue only that class; it swallows any StandardError. Use an explicit
  # rescue clause to ignore only the expected errors:
  begin
    kafka.create_topic(topic)
  rescue Kafka::TopicAlreadyExists
    # topic already exists; ignore
  end

  begin
    kafka.create_partitions_for(topic, num_partitions: 32)
  rescue Kafka::InvalidPartitions
    # partitions already created; ignore
  end
end

kafka.consumer(group_id: "asset-values-consumer")
kafka.consumer(group_id: "asset-value-consumer")

The corresponding racecar consumer:

class AssetValuesConsumer < Racecar::Consumer
  subscribes_to "asset_values"

  def process(message)
    puts message
  end
end

The first consumer works fine:

$ bundle exec racecar AssetValuesConsumer
[...]

As soon as the second consumer is running (and causing a rebalance of the group) the error is logged:

$ bundle exec racecar AssetValuesConsumer

[...]

{}:] Error sending heartbeat: Kafka::RebalanceInProgress

The error is logged again after one of the two consumers is shut down.

Expected outcome

Either the Kafka::RebalanceInProgress exception doesn't pop up, or the message isn't logged as an error.

Actual outcome

The Kafka::RebalanceInProgress exception is logged as an error.

@dasch
Contributor

dasch commented Aug 3, 2020

One could argue that warn would suffice here. It is important for debugging purposes to be able to monitor this stuff, though, as consumer groups can get into weird states.

@marcoroth
Contributor Author

Thank you for getting back @dasch!

Is there a specific reason why it's being logged as an error then? I understand that it makes sense to log it for debugging and traceability reasons. But it seems to me that a rather "normal" scenario, which happens quite often, is logged as an error while it isn't really one 😊

Do you see any consequences, or do you have any concerns, if we lowered that log level to warn in this case?

Thanks a lot!

@dasch
Contributor

dasch commented Aug 7, 2020

If you write a PR that changes it to warn I'll merge it 😄

@marcoroth
Contributor Author

Sure! I just put up #858. Let me know if I should change something 😊
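The change discussed above amounts to lowering the log level in the rescue clause from error to warn. A self-contained sketch of the idea, using stand-in error classes and a plain Logger rather than ruby-kafka's actual internals:

```ruby
require "logger"
require "stringio"

# Stand-in error classes; in ruby-kafka these live under the Kafka namespace.
class RebalanceInProgress < StandardError; end
class HeartbeatError < StandardError; end

log_output = StringIO.new
logger = Logger.new(log_output)

begin
  begin
    # Simulate the coordinator rejecting a heartbeat mid-rebalance.
    raise RebalanceInProgress
  rescue RebalanceInProgress => e
    # Logged at warn instead of error: a rebalance is a normal,
    # recurring event in a consumer group, not a failure.
    logger.warn "Error sending heartbeat: #{e}"
    raise HeartbeatError, e
  end
rescue HeartbeatError
  # Callers still receive the HeartbeatError and can trigger a re-join;
  # only the log level of the message changes.
end

puts log_output.string
```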
