list_consumer_group_offsets is always 10x slower than the original kafka-consumer-groups.sh --describe #1798

Closed
jdxin0 opened this issue May 5, 2019 · 6 comments

@jdxin0

jdxin0 commented May 5, 2019

list_consumer_group_offsets is always 10x slower than the original kafka-consumer-groups.sh --describe.

bin/kafka-consumer-groups.sh --bootstrap-server 127.0.0.1:9092 --describe --group test
from kafka import KafkaAdminClient
admin_client = KafkaAdminClient(bootstrap_servers=["127.0.0.1:9092"])
admin_client.list_consumer_group_offsets('test')

kafka version 1.1
kafka-python version 1.4.6

@jdxin0 jdxin0 closed this as completed May 5, 2019
@jdxin0 jdxin0 reopened this May 20, 2019
@jeffwidman
Collaborator

If you list multiple consumer groups, this may help: #1807

If you're just doing a single one, then keep in mind that, IIRC, version 1.1 doesn't use the admin APIs but instead uses internal Scala code, and that may short-circuit some extra checks/code in the broker... I'm not sure.

I would normally expect the Java code to be slightly faster than Python, but not by a huge amount for something this trivial.

Can you show timings to illustrate the 10x slowness?

@jdxin0
Author

jdxin0 commented May 22, 2019

I manage almost ten Kafka clusters, and they all behave the same way.

Here are the timings for one 3-node cluster running version 1.1.
This cluster has 24 consumer groups.

list_consumer_groups

kafka-python

code

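import time
from kafka import KafkaAdminClient

# The timer starts after the client is constructed, so only the call itself is measured.
admin_client = KafkaAdminClient(bootstrap_servers='bootstrap_servers')
start = time.time()
print(len(admin_client.list_consumer_groups()))
end = time.time()
print(end-start)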

output

24
60.0907678604

kafka java client

command

time bin/kafka-consumer-groups.sh --bootstrap-server bootstrap_servers --list

output

real	0m1.957s
user	0m1.536s
sys	0m0.108s

describe_consumer_groups

kafka-python

code

import time
from kafka import KafkaAdminClient

admin_client = KafkaAdminClient(bootstrap_servers='bootstrap_servers')
start = time.time()
print(len(admin_client.describe_consumer_groups(['group'])))
end = time.time()
print(end-start)

output

1
30.0456149578

kafka java client

command

time bin/kafka-consumer-groups.sh --bootstrap-server bootstrap_servers --describe --group group

output

real	0m1.948s
user	0m1.780s
sys	0m0.128s

list_consumer_group_offsets

kafka-python

code

import time
from kafka import KafkaAdminClient

admin_client = KafkaAdminClient(bootstrap_servers='bootstrap_servers')
start = time.time()
print(len(admin_client.list_consumer_group_offsets('group')))
end = time.time()
print(end-start)

output

3
30.0500910282
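
For readers comparing the numbers above, here is a small sketch of the arithmetic. It uses only the timings quoted in this comment. The 30-second figure is an assumption introduced for illustration (the Python timings happen to sit very close to multiples of 30 s, which may point to waiting on a timeout rather than to slow processing); nothing in this thread confirms that. Note also that the Java `real` time includes JVM startup, so the ratios are only rough.

```python
# Timings copied from the comment above, in seconds.
python_s = {
    "list_consumer_groups": 60.0907678604,
    "describe_consumer_groups": 30.0456149578,
    "list_consumer_group_offsets": 30.0500910282,
}
# "real" wall-clock time of the Java CLI; no Java timing was given for offsets.
java_real_s = {
    "list_consumer_groups": 1.957,
    "describe_consumer_groups": 1.948,
}

# ASSUMPTION for illustration only: a 30 s wait of some kind per slow request.
ASSUMED_TIMEOUT_S = 30.0


def slowdown(name):
    """Ratio of the kafka-python timing to the Java CLI timing."""
    return python_s[name] / java_real_s[name]


def residual_after_timeouts(seconds, timeout_s=ASSUMED_TIMEOUT_S):
    """Split a timing into (number of whole waits, time left over)."""
    waits = round(seconds / timeout_s)
    return waits, seconds - waits * timeout_s


for name, seconds in python_s.items():
    waits, rest = residual_after_timeouts(seconds)
    ratio = "%.1fx" % slowdown(name) if name in java_real_s else "n/a"
    print("%-28s ratio=%-6s waits=%d leftover=%.3fs" % (name, ratio, waits, rest))
```

By this measure the gap is roughly 31x for listing groups and roughly 15x for describing one, and in each case well under 0.1 s remains once whole 30 s blocks are subtracted.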

@jeffwidman
Collaborator

Hmm... I'm curious whether the slowdown is in the initial bootstrapping of the client (fetching metadata, identifying broker version, etc) or in the actual calls.

Could you try running a different KafkaAdminClient command right after you instantiate it and then start your timing? That way you're measuring a fully-bootstrapped client.

To be clear, I'm not saying this is acceptable: I would expect Python to be a little slower, but not this much slower, and I'm curious where the slowdown comes from.

Also, try setting the api_version param so that KafkaAdminClient doesn't have to probe the cluster to figure it out... see if that speeds things up.
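
A minimal sketch of the measurement suggested above: time client construction, a warm-up call, and the call of interest separately. The helper names (`time_call`, `profile_admin_client`, `make_kafka_client`) are hypothetical, and the `api_version=(1, 1, 0)` tuple is an assumption chosen to match the reporter's broker version; adjust both the address and the version for a real cluster.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def profile_admin_client(make_client, group_id):
    """Time bootstrap, a warm-up call, and the offsets call separately."""
    timings = {}
    # Construction covers connecting and, if api_version is unset, version probing.
    client, timings["bootstrap"] = time_call(make_client)
    # A different admin call first, so the next measurement sees a warmed-up client.
    _, timings["warmup_list_consumer_groups"] = time_call(client.list_consumer_groups)
    _, timings["list_consumer_group_offsets"] = time_call(
        client.list_consumer_group_offsets, group_id
    )
    return timings


def make_kafka_client():
    """Hypothetical factory; needs kafka-python and a reachable broker."""
    # Imported lazily so the helpers above work without kafka-python installed.
    from kafka import KafkaAdminClient

    return KafkaAdminClient(
        bootstrap_servers=["127.0.0.1:9092"],
        api_version=(1, 1, 0),  # ASSUMPTION: skips probing; match your broker version
    )
```

Against a real cluster, `profile_admin_client(make_kafka_client, 'test')` would return a dict of three timings; comparing the run with and without `api_version` should show whether the time goes into bootstrapping or into the calls themselves.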

@jeffwidman
Collaborator

@jdxin0 can you try this on master?

I saw absolutely massive performance increases when fetching consumer offsets after #1823... details in #1823 (comment).

If you're still seeing the issue on latest master, I'm happy to re-open, but I suspect this should be solved for you...

@jdxin0
Author

jdxin0 commented Jun 25, 2019

@jeffwidman
I have tried the latest master version. The problem is solved.
Good work!

@nbommu1

nbommu1 commented Aug 20, 2019

It works fine with a single broker, but list_consumer_groups() is still just as slow when run against a multi-broker cluster.
