Auto Offset Reset for existing Consumer Group #11
I wonder if there should be 3 options for offset config:
It seems odd that a configuration parameter whose name implies "reset offset" is simply ignored if an offset exists. That forces you to edit ZooKeeper or use other tools (e.g. Burrow or the CLI) to replay a topic for a given consumer group, or to create another group and leave stale data behind (false lag for the former group if it isn't deleted).
@mikesparr I think you might be misunderstanding the meaning of auto.offset.reset. It sets the policy for resetting offsets when there are no committed offsets. The point of tracking offsets in Kafka is that you can track the progress you've made, even if all your consumer instances fail, and pick up where you left off so you don't reprocess data unnecessarily. This is actually quite important -- if you start up a consumer and you find committed offsets for the . Generally you want the reset you're talking about to be explicit.

In fact, it sounds like what you actually want is to delete the group and start fresh (which you could do with Kafka's kafka-consumer-groups.sh command). You're using a reset to the beginning of the topic partitions as a shortcut, although you should be careful about doing that if your topic subscription list is dynamic, since you may leave stale offset data in your offsets topic.

If you want to do this from your application, I would suggest passing a flag to your program and using the on_assign callback to seek to the beginning, but I think we're currently missing a seek method. We'll want to add one, at which point you can do this yourself (this is the same approach the console consumer uses to handle the --from-beginning and --offset arguments).
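The rule described above can be written down as a small model of where a group consumer starts on one partition. This is a simplified sketch of the decision, not the client's actual code: `starting_offset` is a hypothetical helper, and it assumes that a committed offset outside the partition's low/high watermarks counts as unusable, which is the "out of range" case where librdkafka also falls back to `auto.offset.reset`.

```python
from typing import Optional

# Aliases accepted by librdkafka's auto.offset.reset property.
_EARLIEST = {"smallest", "earliest", "beginning"}
_LATEST = {"largest", "latest", "end"}


def starting_offset(committed: Optional[int], low: int, high: int,
                    policy: str = "latest") -> int:
    """Simplified model of a group consumer's start position on one partition.

    committed: the group's committed offset, or None if nothing is committed.
    low, high: the partition's low and high watermarks.
    policy:    the auto.offset.reset value.
    """
    # A usable committed offset always wins; the reset policy is not consulted.
    if committed is not None and low <= committed <= high:
        return committed
    # Only when there is no usable committed offset does the policy apply.
    if policy in _EARLIEST:
        return low
    if policy in _LATEST:
        return high
    if policy == "error":
        raise LookupError("no committed offset and auto.offset.reset=error")
    raise ValueError(f"unknown auto.offset.reset value: {policy!r}")
```

For example, once a group has committed offset 12 on a partition holding offsets 0 to 11, `starting_offset(12, 0, 12, "smallest")` returns 12, so a restart with `smallest` replays nothing. Only a group with no committed offset starts at 0.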
The assign() method actually allows you to set the initial offset for each partition, so try something like this:
There are currently no symbolic names for the special logical offsets (stored, invalid, beginning, end, ...), so you'll need to use their internal values: -2 for beginning and -1 for end. In the above example I set it to -2 to always start consuming from the earliest (beginning) offset, but you can also set it to an absolute offset (e.g. 12352) if you are storing the offsets outside Kafka.
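A minimal sketch of this approach, assuming the confluent-kafka Python client's `Consumer.subscribe(topics, on_assign=...)` hook, whose callback receives the consumer and the list of assigned `TopicPartition` objects. `make_on_assign` and `start_offsets` are hypothetical names. Newer releases of the client export `OFFSET_BEGINNING`; the sketch falls back to the raw value -2 if that import is unavailable.

```python
try:
    # Newer confluent-kafka releases export the logical offsets by name.
    from confluent_kafka import OFFSET_BEGINNING
except ImportError:
    # Fallback: the raw librdkafka value for "beginning" quoted above.
    OFFSET_BEGINNING = -2


def make_on_assign(start_offsets=None):
    """Build an on_assign callback that overrides the group's committed offsets.

    start_offsets: optional dict mapping (topic, partition) -> absolute offset,
    e.g. offsets stored outside Kafka. Partitions missing from the dict are
    rewound to the beginning.
    """
    start_offsets = start_offsets or {}

    def on_assign(consumer, partitions):
        for p in partitions:
            # Use a stored absolute offset if we have one, else "beginning".
            p.offset = start_offsets.get((p.topic, p.partition), OFFSET_BEGINNING)
        # Assigning with explicit offsets replaces the committed start positions.
        consumer.assign(partitions)

    return on_assign
```

Usage would look like `consumer.subscribe(["test"], on_assign=make_on_assign())` to always replay from the beginning, or `make_on_assign({("test", 0): 12352})` to resume partition 0 from an externally stored offset.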
Thanks @ewencp and @edenhill for the detailed response and workaround example. I do understand the thinking behind the offsets, and I have a special case in which I want to "reuse" the group but reset it. My plan is to have several Python scripts that re-consume a source topic and publish to a sink topic. The sink topic is consumed by Logstash and indexed in Elasticsearch. I want the ability to reindex Elasticsearch, so I can start these scripts via Supervisor from time to time; I don't want to have to change the Supervisor config and restart it, because it runs other apps as well, so the goal is just to be able to call a command that starts the script. I'll be looking at Kafka Streams soon to replace the Python app, but I'll still use Python from time to time for quick solutions. Thanks again!
Is there an equivalent to the `--from-beginning` flag in the console consumer?

I'm migrating from the `kafka-python` client to this one, and in some initial tests it's not behaving as expected. I start a consumer with group `testgroup1` consuming topic `test`, which I've populated with a dozen or so messages. I have `auto.offset.reset` set to `smallest` and expect it to replay the topic from the beginning. The first time, it plays all of them. When I stop and restart it, it does not.

I assume that it starts again from the latest offset, but I'm confused about the purpose of the `auto.offset.reset` values. I expect `earliest | smallest` to consume from the beginning, and `largest | latest` to pick up from the last offset.

If I run `kafka-console-consumer` in a shell and add the `--from-beginning` flag, it replays all messages every time. I checked the `librdkafka` configuration documentation to see if there's an extra flag I'm missing, like `offset.store.sync.interval`, thinking it has to reset or something.

What am I missing? Is it possible to restart the script and re-run it without changing the consumer `group.id`?
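One way to get console-consumer-like behavior while keeping `group.id` unchanged is to make the replay an explicit command-line switch that installs a rewind `on_assign` callback. The sketch below is hypothetical: the script's `--from-beginning` and `--group-id` flags, the helper names, and the `localhost:9092` broker address are assumptions, and the -2 value is librdkafka's logical "beginning" offset.

```python
import argparse

# Logical offset librdkafka uses for "beginning of partition".
OFFSET_BEGINNING = -2


def build_config(group_id):
    # Settings from the question; auto.offset.reset only matters when the
    # group has no usable committed offset.
    return {
        "bootstrap.servers": "localhost:9092",  # assumed broker address
        "group.id": group_id,
        "auto.offset.reset": "smallest",
    }


def rewind(consumer, partitions):
    # on_assign callback: start every assigned partition from the beginning.
    for p in partitions:
        p.offset = OFFSET_BEGINNING
    consumer.assign(partitions)


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Replayable group consumer")
    parser.add_argument("--group-id", default="testgroup1")
    parser.add_argument("--from-beginning", action="store_true",
                        help="ignore committed offsets and replay the topic")
    return parser.parse_args(argv)


def subscribe_kwargs(args):
    # Install the rewind callback only when a replay was requested; otherwise
    # the group resumes from its committed offsets as usual.
    return {"on_assign": rewind} if args.from_beginning else {}
```

With the confluent-kafka client this would be wired up as `consumer = Consumer(build_config(args.group_id))` followed by `consumer.subscribe(["test"], **subscribe_kwargs(args))`. Keeping the rewind behind a flag means normal restarts still resume from committed offsets, and a reindex run is just a different command line.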