metadata refresh is really expensive! #786

rukai · 2023-08-08T02:49:08Z

I am using scylla-rust-driver as the driver in a cassandra benchmark.
When the benchmark has been running for about 60s, throughput roughly halves for a few seconds, and I have tracked this down to the metadata refresh.
I am able to eliminate the loss in throughput by changing this to a very large number: https://github.com/scylladb/scylla-rust-driver/blob/4efc84dfbc7bb204b49a8564378537e35cfe3ad1/scylla/src/transport/cluster.rs#L485

I would like to raise two issues as a result:

issue 1

The metadata refresh should be made more performant.
I did a quick investigation and found it was MetadataReader::read_metadata that was impacting throughput.
The atomic swapping of metadata results seems to be working fine, as removing the swap did not improve throughput in any way.
I'm not sure whether the cause is Cassandra slowing down, the client slowing down due to running queries, the client slowing down due to processing the results of those queries, a combination of these, or something else entirely.
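
One way to narrow down which of those causes dominates is to time each phase of a refresh separately. Below is a minimal sketch using only the standard library. `timed` and `fake_refresh` are hypothetical helpers made up for illustration, not part of scylla-rust-driver, and the real `read_metadata` is async, so a real measurement would wrap the awaited calls instead of a synchronous closure.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the wall-clock time it took.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Hypothetical stand-in for one phase of a metadata refresh.
/// It only sums a range so that there is something to measure.
fn fake_refresh() -> usize {
    (0..10_000usize).sum()
}

fn main() {
    let (value, took) = timed(fake_refresh);
    println!("phase returned {value} in {took:?}");
}
```

Wrapping the query phase and the result-processing phase in separate `timed` calls would show whether the time is spent waiting on the server or working on the client.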

So I think it would be a good idea for the scylla-rust-driver team to investigate this thoroughly, as it seems like this would cause a dip in production performance every 60s.
However, for the needs of my project I think all I will need is what I describe in issue 2.

issue 2

We need a way to disable and/or change the timing of the metadata refresh.
As I am writing a benchmark in which I can guarantee that no other client is altering the schema or topology, I would like a way to completely disable such background work so I can evaluate the average throughput alone.
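
To make the request concrete, here is a minimal sketch of what a configurable, optionally disabled refresh schedule could look like. `RefreshPolicy` and `next_due` are hypothetical names invented for this example and are not the driver's API. The 60s value mirrors the refresh period described above.

```rust
use std::time::{Duration, Instant};

/// Hypothetical refresh policy: `None` disables the periodic refresh entirely.
#[derive(Debug, Clone, Copy, PartialEq)]
struct RefreshPolicy {
    interval: Option<Duration>,
}

impl RefreshPolicy {
    /// Returns when the next periodic refresh is due, or `None` if disabled.
    fn next_due(&self, last_refresh: Instant) -> Option<Instant> {
        self.interval.map(|every| last_refresh + every)
    }
}

fn main() {
    let now = Instant::now();
    let periodic = RefreshPolicy { interval: Some(Duration::from_secs(60)) };
    let disabled = RefreshPolicy { interval: None };

    // A periodic policy schedules the next refresh one interval later.
    assert_eq!(periodic.next_due(now), Some(now + Duration::from_secs(60)));
    // A disabled policy never schedules background work.
    assert_eq!(disabled.next_due(now), None);
}
```

Modelling the interval as an `Option` lets "never refresh" be expressed directly, without relying on a very large sentinel value.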


rishabharyal · 2023-08-08T11:24:43Z

Hi @rukai, there is an active PR to manually set the topology refresh interval:
#776

However, other changes are not addressed in that pull request.

rukai · 2023-08-08T11:56:24Z

Oh, sorry for duplicating your PR; I'll go ahead and close mine.

It would make semantic sense to be able to set None in order to completely disable the refresh, but I can always just set the value to 1000000 for effectively the same result.

avelanarius · 2023-08-11T11:34:51Z

@rukai Could you describe the setup of your benchmark in more detail?

I'm not able to reproduce the throughput dip around 60s. For my testing, I'm using cql-stress, which uses the Rust Driver: ./target/release/cql-stress-scylla-bench -mode write -workload sequential -nodes "127.0.0.1:9042" --partition-count 1000000. I tried it with both Scylla 5.4.0 (master) and Cassandra 3.11.15, and I don't see a throughput dip:

Logs

$ ./target/release/cql-stress-scylla-bench -mode write -workload sequential -nodes "127.0.0.1:9042" --partition-count 1000000
Configuration
Mode:			 write
Workload:		 sequential
Timeout:		 5.0s
Consistency level:	 quorum
Partition count:	 1000000
Clustering rows:	 100
Clustering row size:	 Fixed(4)
Rows per request:	 1
Page size:		 1000
Concurrency:		 16
Maximum rate:		 unlimited
Client compression:	 true
time        ops/s  rows/s errors    max 99.9th   99th   95th   90th median   mean
1.0s        62790   62790      0 1.60ms  716μs  442μs  354μs  326μs  248μs  254μs
2.0s        70961   70961      0 36.1ms  549μs  322μs  290μs  274μs  214μs  225μs
3.0s        70589   70589      0 30.9ms  424μs  331μs  295μs  278μs  217μs  226μs
4.0s        70173   70173      0 22.7ms  414μs  331μs  297μs  280μs  220μs  227μs
5.0s        71800   71800      0 23.7ms  934μs  396μs  291μs  272μs  212μs  222μs
6.0s        74249   74249      0 1.95ms  412μs  317μs  287μs  271μs  212μs  215μs
7.0s        71489   71489      0 29.4ms  351μs  316μs  288μs  273μs  215μs  223μs
8.0s        75378   75378      0  718μs  349μs  313μs  284μs  268μs  209μs  211μs
9.0s        71557   71557      0 36.0ms  820μs  337μs  287μs  271μs  211μs  223μs
10.0s       72476   72476      0 35.0ms  388μs  315μs  284μs  269μs  210μs  220μs
11.0s       74803   74803      0 1.46ms  358μs  313μs  284μs  269μs  211μs  213μs
12.0s       72413   72413      0 34.3ms  369μs  319μs  287μs  270μs  210μs  220μs
13.0s       74634   74634      0 1.24ms  359μs  314μs  285μs  270μs  212μs  213μs
14.0s       71706   71706      0 35.6ms  370μs  316μs  287μs  272μs  213μs  222μs
15.0s       71063   71063      0 34.0ms  755μs  346μs  290μs  273μs  213μs  224μs
16.0s       73064   73064      0 1.45ms  397μs  323μs  292μs  276μs  216μs  218μs
17.0s       72503   72503      0 33.4ms  368μs  316μs  285μs  270μs  210μs  220μs
18.0s       73485   73485      0 1.34ms  352μs  316μs  287μs  272μs  213μs  214μs
19.0s       71586   71586      0 34.8ms  411μs  332μs  294μs  276μs  215μs  225μs
20.0s       70933   70933      0 34.2ms  547μs  327μs  290μs  274μs  214μs  225μs
21.0s       72102   72102      0  515μs  375μs  325μs  296μs  281μs  219μs  221μs
22.0s       70153   70153      0 32.6ms  880μs  338μs  295μs  278μs  216μs  227μs
23.0s       72741   72741      0 1.35ms  352μs  311μs  284μs  270μs  211μs  213μs
24.0s       71337   71337      0 34.5ms  403μs  329μs  297μs  281μs  220μs  230μs
25.0s       71712   71712      0 35.0ms  378μs  317μs  288μs  272μs  212μs  222μs
26.0s       73984   73984      0 1.85ms  463μs  323μs  290μs  273μs  213μs  215μs
27.0s       72069   72069      0 35.8ms  366μs  316μs  285μs  270μs  212μs  221μs
28.0s       72660   72660      0 34.0ms  354μs  312μs  283μs  268μs  210μs  219μs
29.0s       72374   72374      0 1.59ms  870μs  344μs  292μs  276μs  216μs  220μs
30.0s       72031   72031      0 33.9ms  375μs  319μs  288μs  271μs  212μs  221μs
31.0s       74030   74030      0 1.84ms  488μs  326μs  289μs  273μs  212μs  215μs
32.0s       72400   72400      0 33.3ms  369μs  315μs  285μs  270μs  210μs  220μs
33.0s       71191   71191      0 34.8ms  382μs  322μs  290μs  274μs  214μs  224μs
34.0s       73433   73433      0  573μs  349μs  316μs  289μs  274μs  215μs  217μs
35.0s       70170   70170      0 37.4ms  781μs  347μs  293μs  276μs  215μs  227μs
36.0s       73203   73203      0 1.36ms  356μs  318μs  290μs  275μs  216μs  218μs
37.0s       71726   71726      0 34.5ms  440μs  319μs  286μs  271μs  213μs  222μs
38.0s       70500   70500      0 34.3ms  373μs  324μs  292μs  276μs  217μs  226μs
39.0s       74087   74087      0 1.43ms  367μs  319μs  288μs  272μs  213μs  215μs
40.0s       68335   68335      0 35.3ms  403μs  338μs  302μs  285μs  223μs  233μs
41.0s       68565   68565      0 27.1ms  412μs  336μs  302μs  286μs  225μs  232μs
42.0s       70874   70874      0 22.8ms 1.15ms  482μs  294μs  275μs  212μs  225μs
43.0s       71805   71805      0 28.0ms  471μs  322μs  289μs  273μs  214μs  222μs
44.0s       73421   73421      0 1.39ms  384μs  328μs  293μs  276μs  215μs  217μs
45.0s       72422   72422      0 33.7ms  361μs  314μs  285μs  269μs  211μs  220μs
46.0s       74150   74150      0 3.16ms  384μs  315μs  287μs  272μs  212μs  215μs
47.0s       72288   72288      0 35.0ms  368μs  313μs  284μs  269μs  211μs  220μs
48.0s       71578   71578      0 33.5ms  361μs  317μs  287μs  272μs  214μs  223μs
49.0s       73275   73275      0 2.47ms  880μs  333μs  290μs  273μs  213μs  217μs
50.0s       71695   71695      0 33.4ms  360μs  314μs  286μs  271μs  213μs  222μs
51.0s       74182   74182      0 1.34ms  356μs  316μs  288μs  272μs  213μs  215μs
52.0s       71606   71606      0 34.6ms  369μs  319μs  289μs  273μs  213μs  223μs
53.0s       72331   72331      0 36.0ms  360μs  313μs  284μs  268μs  211μs  220μs
54.0s       73054   73054      0  679μs  405μs  322μs  291μs  276μs  216μs  218μs
55.0s       70644   70644      0 34.3ms  686μs  346μs  292μs  275μs  213μs  226μs
56.0s       71554   71554      0 33.2ms  674μs  319μs  289μs  273μs  213μs  223μs
57.0s       74404   74404      0 1.30ms  351μs  314μs  286μs  271μs  212μs  214μs
58.0s       71700   71700      0 33.1ms  375μs  318μs  288μs  273μs  213μs  222μs
59.0s       72182   72182      0 1.68ms  384μs  326μs  296μs  280μs  219μs  221μs
1m0.0s      71219   71219      0 32.2ms  433μs  323μs  290μs  274μs  214μs  224μs
1m1.0s      71140   71140      0 33.2ms  379μs  319μs  289μs  274μs  215μs  224μs
1m2.0s      72428   72428      0 5.56ms  886μs  342μs  294μs  277μs  216μs  220μs
1m3.0s      71487   71487      0 32.4ms  373μs  322μs  289μs  273μs  214μs  223μs
1m4.0s      72515   72515      0 1.54ms  368μs  323μs  294μs  278μs  218μs  220μs
1m5.0s      71244   71244      0 32.8ms  388μs  327μs  292μs  276μs  214μs  224μs
1m6.0s      71459   71459      0 35.3ms  388μs  319μs  289μs  273μs  213μs  223μs
1m7.0s      73278   73278      0 1.30ms  380μs  322μs  292μs  276μs  215μs  217μs
1m8.0s      71029   71029      0 33.5ms  761μs  338μs  291μs  274μs  213μs  224μs
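
A per-second series like the one above can also be checked for dips programmatically. The following is a rough sketch, assuming a dip means a sample falling below some fraction of the median; `find_dips` and the 0.75 threshold are choices made up for this example and are not part of cql-stress.

```rust
/// Returns the indices of samples that fall below `fraction` of the median.
fn find_dips(ops_per_sec: &[u64], fraction: f64) -> Vec<usize> {
    if ops_per_sec.is_empty() {
        return Vec::new();
    }
    let mut sorted = ops_per_sec.to_vec();
    sorted.sort_unstable();
    // Upper median for even-length input; good enough for a sanity check.
    let median = sorted[sorted.len() / 2] as f64;
    let threshold = median * fraction;
    ops_per_sec
        .iter()
        .enumerate()
        .filter(|&(_, &v)| (v as f64) < threshold)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // First five ops/s samples from the log above.
    let steady = [62790, 70961, 70589, 70173, 71800];
    println!("steady dips: {:?}", find_dips(&steady, 0.75));

    // Synthetic series in which throughput roughly halves for two seconds.
    let halved = [70000, 71000, 35000, 36000, 70500];
    println!("halved dips: {:?}", find_dips(&halved, 0.75));
}
```

With a 0.75 threshold the first series reports no dips, while the synthetic series flags the two halved samples.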

rukai · 2023-08-18T00:13:39Z

I tested my bencher again and observed that it doesn't happen locally, only when run on AWS.
Maybe the smaller node size or the extra latency is the cause?
But I'll do some more investigation myself when I get the chance.

I have a 3-node Cassandra cluster running on 3 AWS m6a.large instances.
I have a bencher running on an AWS m6a.large instance.

You should be able to reproduce with:

git clone https://github.com/shotover/shotover-proxy
cd shotover-proxy
cargo windsock --cloud --name cassandra,compression=none,driver=scylla,operation=write_blob,protocol=v4,shotover=none,topology=cluster3 --bench-length-seconds 60

BIG WARNING THOUGH: this will create Amazon EC2 instances if you have AWS credentials set up on your machine.
It will attempt to clean up after itself, but you should make sure that it succeeds, possibly running cargo windsock --cleanup-cloud-resources to force a cleanup if it panics midway through.
If that sounds scary, fair enough; maybe just set up your own bench manually on cloud infrastructure and see if you can reproduce it that way.

The throughput drop happens at about 42s into the bench, since the driver is started before benching starts.
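
The 42s figure is consistent with a 60s refresh period if the driver was created roughly 18s before measurement began. That 18s lead is inferred from 60 - 42 and was not measured separately. A small sketch of the arithmetic follows, assuming the first refresh fires one full interval after driver start; `refresh_offsets` is a hypothetical helper written for this example.

```rust
use std::time::Duration;

/// Offsets, relative to bench start, at which a periodic refresh fires inside
/// the measurement window, given that the driver started `lead` earlier.
fn refresh_offsets(interval: Duration, lead: Duration, bench_len: Duration) -> Vec<Duration> {
    let mut offsets = Vec::new();
    if interval.is_zero() {
        return offsets; // avoid looping forever on a zero interval
    }
    let mut k = 1u32;
    loop {
        let fire = interval * k; // time since driver start
        if fire >= lead + bench_len {
            break; // past the end of the measurement window
        }
        if fire >= lead {
            offsets.push(fire - lead);
        }
        k += 1;
    }
    offsets
}

fn main() {
    let offsets = refresh_offsets(
        Duration::from_secs(60), // refresh period
        Duration::from_secs(18), // assumed lead time before the bench starts
        Duration::from_secs(60), // --bench-length-seconds 60
    );
    println!("{offsets:?}");
}
```

Under these assumptions exactly one refresh lands inside a 60s bench, at the 42s mark.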

rukai mentioned this issue Aug 8, 2023

add topology_refresh_interval config #787

Closed

8 tasks

Lorak-mmk self-assigned this Nov 15, 2023

wprzytula mentioned this issue Apr 11, 2024

Planned API breaking changes - umbrella issue #979

Open

This was referenced Jun 6, 2024

Metadata: don't refresh periodically by default. #1008

Open

Metadata API changes - umbrella issue #1010

Open

wprzytula added the area/metadata label Jul 9, 2024

Lorak-mmk added this to the 1.x.0 milestone Nov 29, 2024
