
Error sending heartbeat: Kafka::RebalanceInProgress #847

Closed
marcoroth opened this issue Jul 1, 2020 · 4 comments · Fixed by #858

Comments

@marcoroth
Contributor

Hey there

We are using racecar to write our Kafka consumers. While running them in staging and production, we noticed this exception appearing repeatedly in our logs over an extended period of time:

{}:] Error sending heartbeat: Kafka::RebalanceInProgress

After debugging a little bit, it seems that the exception is raised here in the ruby-kafka library. All the code for handling the heartbeat also lives here, which is why I opened this issue in this repo.

The RebalanceInProgress exception is caught and logged here:

rescue ConnectionError, UnknownMemberId, RebalanceInProgress, IllegalGeneration => e
@logger.error "Error sending heartbeat: #{e}"
raise HeartbeatError, e

After testing for a while, we figured out that there is no actual issue or misbehaviour when this exception is thrown, but it is logged as an error anyway. The exception is thrown every time a consumer joins or leaves a consumer group, i.e. every time the group rebalances.

My question now is: does this really need to be logged at the error log level, or is there a negative side effect we are not seeing/experiencing that should be handled while a rebalance is in progress?

I haven't seen an option to change the behaviour of the heartbeats. The only option I found is the heartbeat_interval config, but changing its value has no effect; the exceptions are thrown immediately.
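For reference, this is roughly how we tried tuning the interval (a sketch assuming Racecar's `heartbeat_interval` setting and a `config/racecar.rb` file; the exact option name and file location may differ per version):

```ruby
# config/racecar.rb (hypothetical path)
Racecar.configure do |config|
  config.brokers = ["localhost:9092"]

  # Seconds between heartbeats sent to the group coordinator.
  # Raising this did not suppress the RebalanceInProgress log entries.
  config.heartbeat_interval = 10
end
```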

Versions
  • Version of Ruby: 2.6.5
  • Version of Kafka: 2.5.0
  • Version of ruby-kafka: 0.7.10
  • Version of racecar: 1.0.1
Steps to reproduce

I was able to write a script to reproduce the behaviour:

require 'kafka'

kafka = Kafka.new(['localhost:9092'])

topics = ["asset_value", "asset_values"]

topics.each do |topic|
  # Note: the inline form `expr rescue Kafka::TopicAlreadyExists` does NOT
  # rescue only that class; it swallows any StandardError. Use an explicit
  # rescue clause to ignore only the expected errors:
  begin
    kafka.create_topic(topic)
  rescue Kafka::TopicAlreadyExists
    # topic already exists; ignore
  end

  begin
    kafka.create_partitions_for(topic, num_partitions: 32)
  rescue Kafka::InvalidPartitions
    # partitions already created; ignore
  end
end

kafka.consumer(group_id: "asset-values-consumer")
kafka.consumer(group_id: "asset-value-consumer")

The corresponding racecar consumer:

class AssetValuesConsumer < Racecar::Consumer
  subscribes_to "asset_values"

  def process(message)
    puts message
  end
end

The first consumer works fine:

$ bundle exec racecar AssetValuesConsumer
[...]

As soon as the second consumer is running (and causing a rebalance of the group) the error is logged:

$ bundle exec racecar AssetValuesConsumer

[...]

{}:] Error sending heartbeat: Kafka::RebalanceInProgress

The error is logged again after one of the two consumers is shut down.

Expected outcome

Either the Kafka::RebalanceInProgress exception doesn't pop up, or the message isn't logged as an error.

Actual outcome

The Kafka::RebalanceInProgress exception is logged as an error.

@dasch
Contributor

dasch commented Aug 3, 2020

One could argue that warn would suffice here. It is important for debugging purposes to be able to monitor this stuff, though, as consumer groups can get into weird states.

@marcoroth
Contributor Author

Thank you for getting back @dasch!

Is there a specific reason why it's being logged as an error then? I understand that it makes sense to log it for debugging and traceability reasons. But it seems to me that a rather "normal" scenario, which happens quite often, is logged as an error while it isn't really one 😊

Do you see any consequences, or do you have any concerns, if we lowered that log level to warn in this case?

Thanks a lot!

@dasch
Contributor

dasch commented Aug 7, 2020

If you write a PR that changes it to warn I'll merge it 😄

@marcoroth
Contributor Author

Sure! I just put up #858. Let me know if I should change something 😊
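The change discussed above amounts to lowering the log level in the rescue clause from error to warn. A self-contained sketch of the idea, using stand-in error classes and a plain Logger rather than ruby-kafka's actual internals:

```ruby
require "logger"
require "stringio"

# Stand-in error classes; in ruby-kafka these live under the Kafka namespace.
class RebalanceInProgress < StandardError; end
class HeartbeatError < StandardError; end

log_output = StringIO.new
logger = Logger.new(log_output)

begin
  begin
    # Simulate the coordinator rejecting a heartbeat mid-rebalance.
    raise RebalanceInProgress
  rescue RebalanceInProgress => e
    # Logged at warn instead of error: a rebalance is a normal,
    # recurring event in a consumer group, not a failure.
    logger.warn "Error sending heartbeat: #{e}"
    raise HeartbeatError, e
  end
rescue HeartbeatError
  # Callers still receive the HeartbeatError and can trigger a re-join;
  # only the log level of the message changes.
end

puts log_output.string
```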
