Added support for offsets_for_times, beginning_offsets and end_offsets APIs. #1161

Merged: tvoinarovskyi merged 6 commits into master from issue1036_offset_by_time on Aug 7, 2017

Collaborator

@tvoinarovskyi tvoinarovskyi commented Jul 30, 2017

Still needs to group by nodes and send in parallel.
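
A minimal sketch of what "group by nodes" could look like, assuming the lookup is keyed by partition leader. `group_by_leader` and the `leader_for_partition` callable are hypothetical names for illustration, and `TopicPartition` here is a local stand-in so the snippet runs without kafka-python installed; this is not the code from the PR.

```python
from collections import defaultdict, namedtuple

# Local stand-in for kafka.TopicPartition (assumption: same two fields).
TopicPartition = namedtuple("TopicPartition", ["topic", "partition"])


def group_by_leader(timestamps, leader_for_partition):
    """Split a {TopicPartition: timestamp_ms} map into one map per leader node.

    ``leader_for_partition`` is a hypothetical callable that returns a node id,
    or None when the leader is not known yet.
    """
    by_node = defaultdict(dict)
    unknown = []
    for tp, ts in timestamps.items():
        node_id = leader_for_partition(tp)
        if node_id is None:
            unknown.append(tp)  # a caller would refresh metadata and retry
        else:
            by_node[node_id][tp] = ts
    return dict(by_node), unknown


# Illustrative leader assignment: partitions 0 and 2 on node 1, partition 1 on node 2.
leaders = {
    TopicPartition("events", 0): 1,
    TopicPartition("events", 1): 2,
    TopicPartition("events", 2): 1,
}
grouped, unknown = group_by_leader(
    {tp: 1501372800000 for tp in leaders}, leaders.get)
```

Each per-node map could then be sent as one request, with the requests issued in parallel.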

consumer = self.kafka_consumer()
tp = TopicPartition(self.topic, 0)

with self.assertRaises():
Owner

Need to pass in the exception type expected here
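
For reference, a small self-contained sketch of the pattern being asked for: `assertRaises` takes the expected exception class as its first argument. `lookup_offsets` and the choice of `ValueError` are assumptions made for this illustration, not the call or exception used in the PR's test.

```python
import unittest


def lookup_offsets(timestamps):
    # Hypothetical stand-in for the call under test: rejects negative timestamps.
    for tp, ts in timestamps.items():
        if ts < 0:
            raise ValueError("timestamp for %r must be non-negative" % (tp,))
    return {}


class OffsetsForTimesTest(unittest.TestCase):
    def test_negative_timestamp_rejected(self):
        # The expected exception class is a required argument.
        with self.assertRaises(ValueError):
            lookup_offsets({("topic", 0): -1})


suite = unittest.defaultTestLoader.loadTestsFromTestCase(OffsetsForTimesTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```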

@tvoinarovskyi
Collaborator Author

Fixes #1036

@tvoinarovskyi
Collaborator Author

@dpkp @jeffwidman Can I get another pair of eyes on this, just in case?

Collaborator

@jeffwidman jeffwidman left a comment

Looked it over, looks fine to me, although I'm not super familiar with the timestamps functionality as we have yet to enable that at work.

How closely does this follow the Java implementation?

kafka/conn.py Outdated
((0, 10, 1), MetadataRequest[2])
((0, 10, 1), MetadataRequest[2]),
((0, 10, 2), OffsetFetchRequest[2]),
((0, 11, 0), FetchRequest[5]),
Collaborator

Afraid I don't understand the purpose of this? I thought that from 0.10 onwards, we were just going to query the ApiVersionsRequest to identify the version and rely on that. I can't remember if there's a mapping of supported API calls somewhere, if not, is this just a hack to get around that?

Also, given that it checks in descending order, and returns the first one that works, shouldn't the highest broker version be listed first? Otherwise the comment should get tweaked.

Collaborator Author

Well, it's not really something that was introduced in this PR; I just extended it with new checks. I did propose using the ApiVersion response for per-broker min/max checks on protocol versions in #865. Here we still determine api_version the way we did in earlier versions. The only difference is that for v0.10 and above we don't send additional requests; we only check that the max version of a protocol is supported.
I'll change the comment to be clear on ordering.
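
To make the ordering point concrete, here is a hedged sketch of the idea, assuming the broker's ApiVersions response has already been reduced to a map of API name to max supported version. `VERSION_PROBES`, `infer_broker_version` and the `(0, 10, 0)` fallback are hypothetical; only the three version/request pairs come from the diff above.

```python
# Each entry pairs a broker version with a probe: an (api_name, api_version)
# tuple standing in for request classes such as FetchRequest[5].
# Listed newest first, so the first match is the highest version.
VERSION_PROBES = [
    ((0, 11, 0), ("Fetch", 5)),
    ((0, 10, 2), ("OffsetFetch", 2)),
    ((0, 10, 1), ("Metadata", 2)),
]


def infer_broker_version(max_supported, default=(0, 10, 0)):
    """Return the highest broker version whose probe the broker supports.

    ``max_supported`` maps an API name to the max version the broker
    advertised. The default is an assumption for this sketch.
    """
    for broker_version, (api, version) in VERSION_PROBES:
        if max_supported.get(api, -1) >= version:
            return broker_version
    return default


inferred = infer_broker_version({"Fetch": 3, "OffsetFetch": 2, "Metadata": 2})
```

If the list were kept in ascending order instead, the loop would have to walk it in reverse (or keep the last match) to get the same answer, which is why the comment and the list order need to agree.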

def _offset(self, partition, timestamp):
"""Fetch a single offset before the given timestamp for the partition.
def _retrieve_offsets(self, timestamps, timeout_ms=float("inf")):
""" Fetch offset for each partition passed in ``timestamps`` map.
Collaborator

nit: I more commonly see no space between """ and the first word, as in """Fetch...

remaining_ms = timeout_ms - elapsed_ms

raise Errors.KafkaTimeoutError(
"Failed to get offsets by times in %s ms" % timeout_ms)
Collaborator

What do you think about replacing "times" with "timestamp"?

The word "times" in English has several meanings, normally it's obvious from context, but here there's just a hint of ambiguity for those less familiar with Kafka...

Collaborator Author

Here I don't mind. But the API offsets_for_times should probably be left as is to match offsetsForTimes in Java client

Collaborator

Completely agree to copy Java client.

@@ -861,6 +861,48 @@ def metrics(self, raw=False):
metrics[k.group][k.name] = v.value()
return metrics

def offsets_for_times(self, timestamps):
"""
Look up the offsets for the given partitions by timestamp. The returned
Collaborator

Look up the offsets for the given partitions...

This confuses me because "given" implies a list of partitions is passed in, but that's not present in the method args. If the partitions are attached to the class, then maybe use wording other than "given" to describe the partitions?

Collaborator Author

timestamps is a map {TopicPartition: int}. The wording was taken from https://kafka.apache.org/0110/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#offsetsForTimes(java.util.Map), I think it's OK...
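
A short sketch of building such a map, assuming timestamps are expressed in milliseconds since the Unix epoch as in the Java client. `timestamps_since` is a hypothetical helper, and `TopicPartition` is a local stand-in so the snippet runs without kafka-python installed.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Local stand-in for kafka.TopicPartition (assumption: same two fields).
TopicPartition = namedtuple("TopicPartition", ["topic", "partition"])


def timestamps_since(topic, partitions, when):
    """Build a {TopicPartition: int} map with one epoch-ms timestamp per partition."""
    ts_ms = int(when.timestamp() * 1000)
    return {TopicPartition(topic, p): ts_ms for p in partitions}


query = timestamps_since(
    "events", [0, 1], datetime(2017, 8, 7, tzinfo=timezone.utc))

# With a connected consumer, the map would be passed straight through:
#   offsets = consumer.offsets_for_times(query)
```

So the partitions are "given" as the keys of the map rather than as a separate argument.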

partition.

Note:
Notice that this method may block indefinitely if the partition
Collaborator

"Notice that" is superfluous

brokers = '%s:%d' % (self.server.host, self.server.port)
producer = KafkaProducer(
bootstrap_servers=brokers, **configs)
return producer
Collaborator

I wonder if this producer should just get moved to a pytest fixture. Not something that needs to be handled in this PR, but should probably create an issue saying we should clean it up.

Collaborator Author

Sadly we can't use pytest fixtures here, as it's a TestCase class. We will need to refactor those classes to simple functions to support fixtures.

Collaborator

@jeffwidman jeffwidman Aug 4, 2017

Gotcha. It is possible to apply pytest fixtures at the class level in pytest, but only if it's a generic class, aka designed for pytest and not unittest. I filed #1167 in case someone wants to migrate it at some point.

@tvoinarovskyi
Collaborator Author

tvoinarovskyi commented Aug 5, 2017

How closely does this follow the Java implementation?

Pretty much 1 to 1. I did not port every change (update_fetch_positions is done in bulk in the Java client); I want to do that in a follow-up PR to keep the amount of changes here lower.

@tvoinarovskyi
Collaborator Author

Added beginning_offsets and end_offsets APIs
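
As a rough illustration of how the two results might be combined, assuming both calls return a `{TopicPartition: int}` map: `partition_sizes` is a hypothetical helper, and plain tuples stand in for `TopicPartition` keys so the snippet runs on its own.

```python
def partition_sizes(beginning, end):
    """Return {partition: end - beginning} for partitions present in both maps.

    On a compacted topic this is only an upper bound on the number of
    records, since offsets in the range may no longer exist.
    """
    return {tp: end[tp] - beginning[tp] for tp in beginning if tp in end}


# Sample values standing in for the two lookups; with a connected consumer:
#   beginning = consumer.beginning_offsets(partitions)
#   end = consumer.end_offsets(partitions)
beginning = {("events", 0): 100, ("events", 1): 0}
end = {("events", 0): 250, ("events", 1): 40}
sizes = partition_sizes(beginning, end)
```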

@tvoinarovskyi tvoinarovskyi changed the title Added basic support for offsets_for_times API. Added support for offsets_for_times, beginning_offsets and end_offsets APIs. Aug 6, 2017
@dpkp
Owner

dpkp commented Aug 7, 2017

Sorry... I created a merge conflict :(

@tvoinarovskyi tvoinarovskyi force-pushed the issue1036_offset_by_time branch from 61668aa to 55ded55 Compare August 7, 2017 09:47
@tvoinarovskyi tvoinarovskyi merged commit 8cf4484 into master Aug 7, 2017
@tvoinarovskyi tvoinarovskyi deleted the issue1036_offset_by_time branch August 7, 2017 10:34
@jeffwidman
Collaborator

🍰

@Jiayi-Liao

Thanks ~
I was trying to add the same feature, but found it was already done when I pulled the code from master.
LOL
