Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for on-demand topic metadata fetch #1541

Merged
merged 1 commit into from
Aug 31, 2018
Merged

add support for on-demand topic metadata fetch #1541

merged 1 commit into from
Aug 31, 2018

Conversation

andyxning
Copy link
Contributor

@andyxning andyxning commented Jul 13, 2018

This PR adds support for on-demand topic metadata fetching. For now, a KafkaClient will fetch metadata for all topics in the initialization. This will have some performance problem when run with a large Kafka cluster.

Adding a topics (type set) key-value parameter to KafkaProducer. This should be all the topics that send will be invoked on. For example, if send will be invoked on send('a', ...) and send('b', ...), then the value for topics will be {'a', 'b'}.


This change is Reviewable

@andyxning
Copy link
Contributor Author

@dpkp
Copy link
Owner

dpkp commented Jul 14, 2018

I see the problem you're trying to solve and understand your approach here, but I'm not excited about the additional conceptual complexity here. I think the broker version probing issue could be avoided for now by setting an explicit api_version . And I wonder if we should just stop asking for topic metadata during _bootstrap at all. The java client, for example, "bootstraps" locally by creating fake broker objects derived from the bootstrap.servers list. As I understand the official client, it does not actually make a MetadataRequest until a Consumer or Producer action requires it for a specific topic (or all topics when consuming with a pattern match, etc). So I think it might be better to consider following this path. As a first step we could not request any topics when that is supported (MetadataRequest v1+). And then later we might think about how we could change the bootstrap process entirely to not make a MetadataRequest and just populate internal data structures based on the bootstrap_servers data. What do you think?

@andyxning
Copy link
Contributor Author

andyxning commented Jul 16, 2018

And I wonder if we should just stop asking for topic metadata during _bootstrap at all. The java client, for example, "bootstraps" locally by creating fake broker objects derived from the bootstrap.servers list. As I understand the official client, it does not actually make a MetadataRequest until a Consumer or Producer action requires it for a specific topic (or all topics when consuming with a pattern match, etc). So I think it might be better to consider following this path.

This is a good question. AFAIK, sarama will not request any topic metadata before a real producer send operation. I like the lazy initialization.

I think the broker version probing issue could be avoided for now by setting an explicit api_version

This is not a doable way to resolve this problem. This just works around it. In case we operate many different Kafka version clusters, it is hard to let the users set api_version correctly themselves.

As a first step we could not request any topics when that is supported (MetadataRequest v1+). And then later we might think about how we could change the bootstrap process entirely to not make a MetadataRequest and just populate internal data structures based on the bootstrap_servers data. What do you think?

I agree with this approach. But it may take a long time, imo.

but I'm not excited about the additional conceptual complexity here.

Yes. i also think it is not so grace. But it is implemented optionally and have no backward incompatibility. I think it can resolve the problem quickly and even later we support to no do any metadata prefetching, it should be alright and no backward incompatibility will be made.

@dpkp WDYT?

@andyxning
Copy link
Contributor Author

Ping @dpkp

1 similar comment
@andyxning
Copy link
Contributor Author

Ping @dpkp

@andyxning
Copy link
Contributor Author

ping @dpkp

@dpkp
Copy link
Owner

dpkp commented Jul 20, 2018 via email

@andyxning
Copy link
Contributor Author

@dpkp Big thanks for your answer and waiting for your comments. :)

@andyxning
Copy link
Contributor Author

Ping @dpkp

1 similar comment
@andyxning
Copy link
Contributor Author

Ping @dpkp

@andyxning
Copy link
Contributor Author

Sorry to periodically ping on this PR. @dpkp

@andyxning
Copy link
Contributor Author

Ping @dpkp

2 similar comments
@andyxning
Copy link
Contributor Author

Ping @dpkp

@andyxning
Copy link
Contributor Author

Ping @dpkp

@dpkp
Copy link
Owner

dpkp commented Aug 28, 2018

If we're going to land this I think we should name the config something more specific. Perhaps bootstrap_topics_filter .

@andyxning
Copy link
Contributor Author

@dpkp Done. Please take a look at this.

Copy link
Owner

@dpkp dpkp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 3 files at r1, 2 of 2 files at r2.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@dpkp dpkp merged commit a7d3063 into dpkp:master Aug 31, 2018
@andyxning andyxning deleted the add_support_for_on-demand_topic_metadata_fetch branch September 1, 2018 04:08
@andyxning
Copy link
Contributor Author

@dpkp Big thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants