OffsetOutOfRange Error #263

Closed
jimmyislive opened this issue Nov 6, 2014 · 12 comments

@jimmyislive
I initially posted this to the kafka users group, but they suggested posting it here since I am using the Python client:

I recently received an OffsetOutOfRange error in production after it had been functioning properly for 4-5 days.

In the kafka broker log:

[2014-11-03 21:39:25,658] ERROR [KafkaApi-8] Error when processing fetch request for partition [activity.stream,5] offset 7475239 from consumer with correlation id 69 (kafka.server.KafkaApis)
kafka.common.OffsetOutOfRangeException: Request for offset 7475239 but we only have log segments in the range 8014610 to 10227768.

And on the client side I saw:

Nov 03 21:39:25 INFO kafka consumer.py: Commit offset 10227769 in SimpleConsumer: group=ActivityStream, topic=activity.stream, partition=5
...
...
Nov 03 21:39:26 ERROR demux.consumer_stream consumer_stream.py: OffsetOutOfRangeError(FetchResponse(topic='activity.stream', partition=5, error=1, highwaterMark=-1, messages=<generator object _decode_message_set_iter at 0x3dcf870>),)

Why would the kafka client commit offset 10227769, and then, 1 second later, turn around and ask kafka for the offset 7475239?

We are using a fork of the kafka-python client with some minor modifications.

@wizzat
Collaborator

wizzat commented Nov 6, 2014

Every time I've seen this happen it's been my fault (as an application developer, not library maintainer). Can you share a bit more about the modifications you've made to the client and how your forks happen?

@jimmyislive
Author

The fork is available here:

https://github.com/Livefyre/kafka-python/tree/c246cb74c498b607b14dd3dbff3fbc0f2337393d

This is the version we run in production.

Production still uses a SimpleConsumer. We have made some minor enhancements to some error checking but have not changed any of the offset committing code.

@wizzat
Collaborator

wizzat commented Nov 6, 2014

The fork I was referring to was process forking. Thanks for the link to the GitHub repo, though.

@dpkp
Owner

dpkp commented Nov 6, 2014

What is happening is that your consumer marks an offset, waits for some time -- perhaps a sleep, perhaps a long cron process -- and during that time the kafka server rolls the log data. That means the place in the queue your consumer wants to read from no longer exists (the server expired and deleted it).

The mainline Java consumer supports an auto.offset.reset policy to tell your consumer what to do when that happens. There are really only two options: skip to the head of the current topic/partition, or skip to the tail of the current topic/partition (well, a third option is to crash).

Unfortunately this auto-reset feature is missing from the current kafka-python SimpleConsumer. I have added support for it to the new KafkaConsumer (PR #234), but that has not been merged yet and is also a totally different API.

For your current problem, I suggest wrapping your consumer calls in a try / except. Catch OffsetOutOfRangeException and do a consumer.seek(0,0) [skips to the head] or consumer.seek(0,2) [skips to the tail].
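
A minimal sketch of that pattern, assuming the SimpleConsumer API of this era (exact import paths may vary by version); the broker address and the process() handler are illustrative placeholders, not taken from this thread:

```python
from kafka import KafkaClient, SimpleConsumer
from kafka.common import OffsetOutOfRangeError

client = KafkaClient('localhost:9092')  # placeholder broker address
consumer = SimpleConsumer(client, 'ActivityStream', 'activity.stream')

while True:
    try:
        for message in consumer.get_messages(count=100, block=True, timeout=1):
            process(message)  # hypothetical application handler
        consumer.commit()     # commit the offsets we have consumed
    except OffsetOutOfRangeError:
        # Our offset has fallen outside the broker's retention window.
        # seek(0, 0) jumps to the head (oldest message still on the broker);
        # seek(0, 2) would jump to the tail (newest) instead.
        consumer.seek(0, 0)
```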

@dpkp
Owner

dpkp commented Nov 6, 2014

I realize I may have totally answered the wrong question; if so, hopefully it is at least useful for others who stumble upon this Q!

@jimmyislive
Author

Thanks for the explanation.

We did a seek(0,0) and continued.

A few questions about your explanation:

  1. Where did the offset 7475239 come from? If the kafka logs had rolled off and deleted a bunch of old entries, wouldn't the offset in zookeeper still have been 10227769?
  2. How are the offsets renumbered today when log roll-off happens? Because otherwise every Python client will get an OffsetOutOfRange exception whenever roll-off happens, as its committed offset in ZK will differ from the one on the kafka broker.
  3. Moving to the head (i.e. seek(0,0)) turns out to be a pain, as we found out: it takes a couple of days for the consumer to catch up since we have rewound to the beginning and we have a lot of data. Moving to the tail (i.e. seek(0,2)) means we could possibly lose some messages. I guess it's a tradeoff...

@wizzat
Collaborator

wizzat commented Nov 6, 2014

Have you checked your logs for when that offset was last touched? How, exactly, does your process fork work? Again, I've seen this before, but it's always boiled down to me building my app incorrectly.


@wizzat
Collaborator

wizzat commented Nov 6, 2014

That is: I, as an app developer, unwittingly (via a bug) asked for that offset.


@slotrans

It's also worth noting that if OffsetOutOfRangeError occurs with a MultiProcessConsumer, you can't catch it, and everything just explodes.

In my case this is preventing me from using MultiProcessConsumer at all. Any time I try to use it, even with a never-before-used consumer group identifier (which ought to have no offset to be out of range in the first place), all the worker processes die with OffsetOutOfRangeError as soon as I start iterating. The same problem afflicts SimpleConsumer, but there I can at least do an explicit seek to the tail or head, which fixes things.
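
A rough sketch of that explicit-seek workaround for SimpleConsumer, under the same API assumptions as the earlier example; the broker address, group name, and handle() function are hypothetical placeholders:

```python
from kafka import KafkaClient, SimpleConsumer

client = KafkaClient('localhost:9092')  # placeholder broker address
consumer = SimpleConsumer(client, 'my-new-group', 'activity.stream')

# Jump straight to the tail (or use seek(0, 0) for the head) before iterating,
# so the first fetch starts from a valid offset instead of an out-of-range one.
consumer.seek(0, 2)

for message in consumer:   # SimpleConsumer supports plain iteration
    handle(message)        # hypothetical application handler
```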

@ecanzonieri
Contributor

It would be nice if the offsets fix introduced in the new KafkaConsumer class were also available in the base Consumer class.
If there are no strong opinions against this, I could prepare a PR in the next few days.

@dpkp dpkp added the consumer label Jan 13, 2015
@realmeh

realmeh commented Jan 16, 2015

MultiProcessConsumer really needs the offsets fix for the OffsetOutOfRangeError problem. Hoping for it.

@dpkp
Copy link
Owner

dpkp commented Mar 31, 2015

Not sure what the cause is, but #296 and #356 include some possible fixes to underlying issues. The fork you are using has a lot of tweaks in it -- consumer rebalancing issues could be a cause (I see you are using mahendra's zookeeper patch!). If you can reproduce with a released kafka-python client, please re-open.
