Low ops due to high latencies when CL=ALL #98

Open

soyacz opened this issue Jul 4, 2024 · 2 comments

soyacz commented Jul 4, 2024

I tried cql-stress in a performance test. During the cluster preload phase we don't throttle anything, in order to achieve maximum throughput. Example command executed:
cql-stress-cassandra-stress write no-warmup cl=ALL n=162500000 -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3)' -mode cql3 native -rate threads=400 -col 'size=FIXED(128) n=FIXED(8)' -pop seq=1..162500000
This is repeated across 4 loaders on c6i.2xlarge machines.
The outcome is very unsatisfying: the DB load is not saturated, and this preload stage takes much longer than with cassandra-stress (the cluster reaches ~60k ops on average, while it is capable of more than double that).
On the other hand, when running with throttling (again across 4 loaders), we reach the desired ops value:
cassandra-stress write no-warmup cl=QUORUM duration=2850m -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3)' -mode cql3 native -rate 'threads=250 fixed=20332/s' -col 'size=FIXED(128) n=FIXED(8)' -pop 'dist=gauss(1..650000000,325000000,9750000)'
See the graphs below, showing first the preload stage and then the throttled one:
(screenshot: throughput graphs for the preload stage followed by the throttled stage)
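As a rough, hedged back-of-the-envelope check (not from the original report): with client concurrency fixed at 4 loaders × 400 threads, Little's Law (throughput ≈ in-flight requests / mean latency) says the observed ~60k ops/s already implies a mean write latency of roughly 27 ms, i.e. the preload looks latency-bound rather than loader-bound. The sketch below just does that arithmetic; the 120k ops/s figure is only an example based on the "more than double" remark above.

```rust
// Rough Little's Law check: with fixed in-flight concurrency,
// throughput <= concurrency / mean latency.
fn main() {
    let loaders = 4.0_f64;
    let threads_per_loader = 400.0; // from `-rate threads=400`
    let observed_ops = 60_000.0;    // ~60k ops/s seen during preload

    let concurrency = loaders * threads_per_loader; // 1600 in-flight writes
    let implied_latency_ms = concurrency / observed_ops * 1000.0;
    println!("implied mean write latency: ~{:.0} ms", implied_latency_ms); // ~27 ms

    // To reach e.g. 120k ops/s ("more than double") with the same concurrency,
    // mean latency would have to drop to:
    let needed_latency_ms = concurrency / 120_000.0 * 1000.0;
    println!("latency needed for 120k ops/s: ~{:.0} ms", needed_latency_ms); // ~13 ms
}
```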

There is also another possible cause: the seq distribution used in the preload vs. gauss in the throttled stage, or the effect of CL=ALL. Adapt the title if my guess is wrong.
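For context on that hypothesis (a hedged illustration, not from the report; it uses the rand / rand_distr crates): seq=1..162500000 writes each partition key exactly once, in order, while gauss(1..650000000,325000000,9750000) concentrates writes around the mean with a sigma of ~1.5% of the range, so the two stages exercise the cluster quite differently.

```rust
// Hedged sketch: how the two -pop settings above pick partition keys.
use rand_distr::{Distribution, Normal};

fn main() {
    // seq=1..162500000: every key is visited exactly once, in order, so the
    // data set keeps growing and there are no overwrites.
    let seq_keys = 1u64..=162_500_000;
    println!("seq example keys: {:?}", seq_keys.clone().take(3).collect::<Vec<_>>());

    // dist=gauss(1..650000000,325000000,9750000): keys cluster around the mean
    // (sigma is ~1.5% of the range), so a comparatively small hot set of
    // partitions receives most of the writes (overwrites).
    let gauss = Normal::new(325_000_000.0_f64, 9_750_000.0).unwrap();
    let mut rng = rand::thread_rng();
    let key = gauss.sample(&mut rng).round().clamp(1.0, 650_000_000.0) as u64;
    println!("gauss example key: {key}");
}
```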

More details about this run:

Packages

Scylla version: 6.1.0~dev-20240625.c80dc5715668 with build-id bf0032dbaafe5e4d3e01ece0dcb7785d2ec7a098

Kernel Version: 5.15.0-1063-aws

Installation details

Cluster size: 3 nodes (i3en.2xlarge)

Scylla Nodes used in this run:

  • elasticity-test-ubuntu-db-node-ab781f2c-6 (52.48.240.173 | 10.4.2.155) (shards: 7)
  • elasticity-test-ubuntu-db-node-ab781f2c-5 (54.246.192.181 | 10.4.0.33) (shards: 7)
  • elasticity-test-ubuntu-db-node-ab781f2c-4 (54.75.65.230 | 10.4.1.6) (shards: 7)
  • elasticity-test-ubuntu-db-node-ab781f2c-3 (3.254.170.189 | 10.4.3.246) (shards: 7)
  • elasticity-test-ubuntu-db-node-ab781f2c-2 (54.194.233.63 | 10.4.3.3) (shards: 7)
  • elasticity-test-ubuntu-db-node-ab781f2c-1 (3.250.192.245 | 10.4.1.166) (shards: 7)

OS / Image: ami-09006ca344092e50b (aws: undefined_region)

Test: elasticity-test
Test id: ab781f2c-b3fe-4294-b3bc-83fcfe105c2d
Test name: scylla-staging/lukasz/elasticity-test
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor ab781f2c-b3fe-4294-b3bc-83fcfe105c2d
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs ab781f2c-b3fe-4294-b3bc-83fcfe105c2d

Logs:

Jenkins job URL
Argus

soyacz commented Jul 4, 2024

A possible cause is also the high latencies with CL=ALL:
(screenshot: latency graphs for the CL=ALL run)
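To illustrate the mechanism (a hedged sketch with made-up replica latencies, not data from this run): with RF=3, a coordinator writing at CL=ALL has to wait for acks from all three replicas, so every write inherits the latency of the slowest replica, whereas QUORUM waits only for the second-fastest and ONE for the fastest. One slow shard, compaction, or hiccup on any replica therefore shows up directly in client latency, and with fixed concurrency that also caps throughput.

```rust
// Illustrative only: client-observed latency of one write depends on how many
// replica acks the consistency level requires (RF = 3 assumed).
fn client_latency(mut replica_ms: [f64; 3], acks_required: usize) -> f64 {
    replica_ms.sort_by(|a, b| a.partial_cmp(b).unwrap());
    replica_ms[acks_required - 1] // wait for the k-th fastest replica
}

fn main() {
    // Hypothetical replica latencies for one write: two fast replicas and one
    // replica hitting a slow moment.
    let replicas = [2.0_f64, 3.0, 40.0];

    println!("CL=ONE    -> {:>5.1} ms", client_latency(replicas, 1)); //  2.0 ms
    println!("CL=QUORUM -> {:>5.1} ms", client_latency(replicas, 2)); //  3.0 ms
    println!("CL=ALL    -> {:>5.1} ms", client_latency(replicas, 3)); // 40.0 ms
}
```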

soyacz commented Jul 4, 2024

Analogous results for c-s:
(screenshots: corresponding cassandra-stress throughput and latency graphs)
So possibly the issue is common to both c-s and cql-stress, and the high latencies are the cause.

But one more thing: latencies with cql-stress are higher, and we were a bit (a couple of minutes) slower than c-s.

soyacz changed the title from "Low ops without throttling" to "Low ops due to high latencies when CL=ALL" on Jul 4, 2024