Low ops due to high latencies when CL=ALL #98

Open

soyacz opened this issue Jul 4, 2024 · 2 comments

soyacz commented Jul 4, 2024

I tried cql-stress in a performance test. During the cluster preload phase we don't throttle anything, in order to achieve maximum throughput. Example command executed:
cql-stress-cassandra-stress write no-warmup cl=ALL n=162500000 -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3)' -mode cql3 native -rate threads=400 -col 'size=FIXED(128) n=FIXED(8)' -pop seq=1..162500000
This is repeated across 4 loaders on c6i.2xlarge machines.
The outcome is very unsatisfying: the DB load is not saturated, and this preload stage takes much longer than with cassandra-stress (the cluster reaches ~60k ops on average, while it is capable of more than double that).
On the other hand, when running with throttling (again across 4 loaders), we reach the desired ops value:
cassandra-stress write no-warmup cl=QUORUM duration=2850m -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3)' -mode cql3 native -rate 'threads=250 fixed=20332/s' -col 'size=FIXED(128) n=FIXED(8)' -pop 'dist=gauss(1..650000000,325000000,9750000)'
See the graphs below, showing first the preload stage and then the throttled one:
(screenshot: throughput graphs for the preload stage followed by the throttled stage)
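As a rough, hedged back-of-the-envelope check (not from the original report): with client concurrency fixed at 4 loaders × 400 threads, Little's Law (throughput ≈ in-flight requests / mean latency) says the observed ~60k ops/s already implies a mean write latency of roughly 27 ms, i.e. the preload looks latency-bound rather than loader-bound. The sketch below just does that arithmetic; the 120k ops/s figure is only an example based on the "more than double" remark above.

```rust
// Rough Little's Law check: with fixed in-flight concurrency,
// throughput <= concurrency / mean latency.
fn main() {
    let loaders = 4.0_f64;
    let threads_per_loader = 400.0; // from `-rate threads=400`
    let observed_ops = 60_000.0;    // ~60k ops/s seen during preload

    let concurrency = loaders * threads_per_loader; // 1600 in-flight writes
    let implied_latency_ms = concurrency / observed_ops * 1000.0;
    println!("implied mean write latency: ~{:.0} ms", implied_latency_ms); // ~27 ms

    // To reach e.g. 120k ops/s ("more than double") with the same concurrency,
    // mean latency would have to drop to:
    let needed_latency_ms = concurrency / 120_000.0 * 1000.0;
    println!("latency needed for 120k ops/s: ~{:.0} ms", needed_latency_ms); // ~13 ms
}
```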

There is also another possible cause: the seq distribution used in the preload vs. gauss in the throttled stage, or the effect of CL=ALL. Adapt the title if my guess is wrong.
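For context on that hypothesis (a hedged illustration, not from the report; it uses the rand / rand_distr crates): seq=1..162500000 writes each partition key exactly once, in order, while gauss(1..650000000,325000000,9750000) concentrates writes around the mean with a sigma of ~1.5% of the range, so the two stages exercise the cluster quite differently.

```rust
// Hedged sketch: how the two -pop settings above pick partition keys.
use rand_distr::{Distribution, Normal};

fn main() {
    // seq=1..162500000: every key is visited exactly once, in order, so the
    // data set keeps growing and there are no overwrites.
    let seq_keys = 1u64..=162_500_000;
    println!("seq example keys: {:?}", seq_keys.clone().take(3).collect::<Vec<_>>());

    // dist=gauss(1..650000000,325000000,9750000): keys cluster around the mean
    // (sigma is ~1.5% of the range), so a comparatively small hot set of
    // partitions receives most of the writes (overwrites).
    let gauss = Normal::new(325_000_000.0_f64, 9_750_000.0).unwrap();
    let mut rng = rand::thread_rng();
    let key = gauss.sample(&mut rng).round().clamp(1.0, 650_000_000.0) as u64;
    println!("gauss example key: {key}");
}
```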

More details about this run:

Packages

Scylla version: 6.1.0~dev-20240625.c80dc5715668 with build-id bf0032dbaafe5e4d3e01ece0dcb7785d2ec7a098

Kernel Version: 5.15.0-1063-aws

Installation details

Cluster size: 3 nodes (i3en.2xlarge)

Scylla Nodes used in this run:

  • elasticity-test-ubuntu-db-node-ab781f2c-6 (52.48.240.173 | 10.4.2.155) (shards: 7)
  • elasticity-test-ubuntu-db-node-ab781f2c-5 (54.246.192.181 | 10.4.0.33) (shards: 7)
  • elasticity-test-ubuntu-db-node-ab781f2c-4 (54.75.65.230 | 10.4.1.6) (shards: 7)
  • elasticity-test-ubuntu-db-node-ab781f2c-3 (3.254.170.189 | 10.4.3.246) (shards: 7)
  • elasticity-test-ubuntu-db-node-ab781f2c-2 (54.194.233.63 | 10.4.3.3) (shards: 7)
  • elasticity-test-ubuntu-db-node-ab781f2c-1 (3.250.192.245 | 10.4.1.166) (shards: 7)

OS / Image: ami-09006ca344092e50b (aws: undefined_region)

Test: elasticity-test
Test id: ab781f2c-b3fe-4294-b3bc-83fcfe105c2d
Test name: scylla-staging/lukasz/elasticity-test
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor ab781f2c-b3fe-4294-b3bc-83fcfe105c2d
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs ab781f2c-b3fe-4294-b3bc-83fcfe105c2d

Logs:

Jenkins job URL
Argus

soyacz commented Jul 4, 2024

A possible cause is also the high latencies with CL=ALL:
(screenshot: latency graphs for the CL=ALL run)
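To illustrate the mechanism (a hedged sketch with made-up replica latencies, not data from this run): with RF=3, a coordinator writing at CL=ALL has to wait for acks from all three replicas, so every write inherits the latency of the slowest replica, whereas QUORUM waits only for the second-fastest and ONE for the fastest. One slow shard, compaction, or hiccup on any replica therefore shows up directly in client latency, and with fixed concurrency that also caps throughput.

```rust
// Illustrative only: client-observed latency of one write depends on how many
// replica acks the consistency level requires (RF = 3 assumed).
fn client_latency(mut replica_ms: [f64; 3], acks_required: usize) -> f64 {
    replica_ms.sort_by(|a, b| a.partial_cmp(b).unwrap());
    replica_ms[acks_required - 1] // wait for the k-th fastest replica
}

fn main() {
    // Hypothetical replica latencies for one write: two fast replicas and one
    // replica hitting a slow moment.
    let replicas = [2.0_f64, 3.0, 40.0];

    println!("CL=ONE    -> {:>5.1} ms", client_latency(replicas, 1)); //  2.0 ms
    println!("CL=QUORUM -> {:>5.1} ms", client_latency(replicas, 2)); //  3.0 ms
    println!("CL=ALL    -> {:>5.1} ms", client_latency(replicas, 3)); // 40.0 ms
}
```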

soyacz commented Jul 4, 2024

Analogous results for c-s:
(screenshots: corresponding cassandra-stress throughput and latency graphs)
So possibly the issue is common to both c-s and cql-stress, and the high latencies are the cause.

But one more thing: latencies with cql-stress are higher, and we were a bit (a couple of minutes) slower than c-s.

soyacz changed the title from "Low ops without throttling" to "Low ops due to high latencies when CL=ALL" on Jul 4, 2024