metadata refresh is really expensive! #786
Comments
Oh, sorry for duplicating your PR, I'll go ahead and close mine. It would make semantic sense to be able to set None in order to completely disable the refresh, but I can always just set the value to 1000000 for effectively the same result.
@rukai Could you describe the setup of your benchmark in more detail? I'm not able to reproduce the throughput dip around 60s. For my testing, I'm using cql-stress, which uses the Rust driver.
I tested my bencher again and observed that it doesn't happen locally, only when run on AWS. I have a 3-node Cassandra cluster running on 3 AWS m6a.large instances. You should be able to reproduce with:
BIG WARNING THOUGH: this will create Amazon EC2 instances if you have AWS credentials set up on your machine. The throughput drop happens at about 42s into the bench, since the driver is started before benching starts.
I am using scylla-rust-driver as the driver in a cassandra benchmark.
About 60s into the benchmark, throughput roughly halves for a few seconds, and I have tracked this down to the metadata refresh.
I am able to eliminate the loss in throughput by changing this to a very large number: https://github.com/scylladb/scylla-rust-driver/blob/4efc84dfbc7bb204b49a8564378537e35cfe3ad1/scylla/src/transport/cluster.rs#L485
I would like to raise two issues as a result:
issue 1
The metadata refresh should be made more performant.
I did a quick investigation and found it was MetadataReader::read_metadata that was impacting throughput. The atomic swapping of metadata results seems to be working fine, as removing the swap did not improve throughput in any way.
I'm not sure if the cause is Cassandra slowing down, the client slowing down due to running queries, the client slowing down due to processing the results of those queries, a combination of these, or something else entirely.
So I think it would be a good idea for the scylla-rust-driver team to give this a thorough investigation as it seems like this would cause a dip in production performance every 60s.
However for the needs of my project I think all I will need is what I describe in issue 2
issue 2
We need a way to disable and/or change the timing of the metadata refresh.
As I am writing a benchmark in which I can guarantee that no other client is altering the schema or topology, I would like a way to completely disable such background work so I can evaluate the average throughput alone.
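To make the request concrete, here is a rough sketch of the semantics I have in mind. It is not the driver's actual API: `MetadataRefresh` and `refreshes_during` are hypothetical names, and it only models how many periodic refreshes would fire during a benchmark run, assuming the first one fires a full interval after startup. `None` means the refresh is disabled.

```rust
use std::time::Duration;

/// Hypothetical setting (not part of the driver): `None` disables the
/// periodic metadata refresh entirely.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MetadataRefresh {
    pub interval: Option<Duration>,
}

impl MetadataRefresh {
    /// Number of periodic refreshes that would fire during a run of length
    /// `run`, assuming the first refresh fires one full interval after start.
    pub fn refreshes_during(&self, run: Duration) -> u128 {
        match self.interval {
            // Refresh disabled: no background work during the run.
            None => 0,
            // Treat a zero interval as disabled to avoid dividing by zero.
            Some(interval) if interval.is_zero() => 0,
            // Otherwise count how many whole intervals fit into the run.
            Some(interval) => run.as_nanos() / interval.as_nanos(),
        }
    }
}
```

Under this model a 60s interval hits a 180s benchmark three times, while either `None` or a very large interval gives zero refreshes, which is why a huge value works as a stopgap.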