Health checks take more than 2 hours running on 60 DB nodes setup after each nemesis, even skipped ones #9547
Comments
Maybe we could introduce a 'fast health check'? Shouldn't it be quick using raft?
I don't know what that means. Not use nodetool? Just examine group0 on one node, or on multiple nodes?
Something like that: just group0, or do it in parallel on all nodes. @aleksbykov can you suggest a fast and reliable way of using raft to quickly verify cluster health?
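To make the "in parallel on all nodes" suggestion concrete, here is a minimal Python sketch. It is not SCT code: `check_node`, `check_cluster_parallel`, and `unhealthy_nodes` are hypothetical names, and the per-node check is a placeholder for whatever real check (nodetool, group0 state, or something else) would be used. It only shows that fanning the checks out over a thread pool makes the total time scale with the slowest node rather than with the node count.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def check_node(node):
    """Hypothetical per-node check; a real one would query the node itself."""
    time.sleep(0.05)  # stand-in for the latency of one remote check
    return True


def check_cluster_parallel(nodes, check=check_node, max_workers=16):
    """Run `check` on every node concurrently and return {node: healthy}."""
    nodes = list(nodes)  # materialize so the node list can be iterated twice
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with nodes
        results = list(pool.map(check, nodes))
    return dict(zip(nodes, results))


def unhealthy_nodes(results):
    """Return the sorted names of nodes whose check reported a failure."""
    return sorted(node for node, ok in results.items() if not ok)
```

With 60 nodes, something like `unhealthy_nodes(check_cluster_parallel(nodes))` would give the list of nodes to look at more closely. Whether a group0-only check on a single node is reliable enough is a separate question that this sketch does not answer.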
Packages
Scylla version:
2024.3.0~dev-20241209.b5f1d87f3e83
with build-id a322e4f0d7b174dd5052eb3992c8e459d1a03b7a
Kernel Version:
6.8.0-1019-aws
Issue description
When running a 5-DC, 60 DB node test, the health checks take more than 2 hours after each nemesis:
Moreover, a redundant health check cycle runs even if the nemesis was skipped.
See Argus screenshot:
Impact
Significant waste of a test run time.
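As an illustration of the second point in the issue description (health checks running after skipped nemeses), a guard along these lines would avoid the redundant cycle. This is a sketch under assumptions, not the actual SCT nemesis code: `run_nemesis_cycle` is a hypothetical helper, and it assumes the nemesis callable returns `True` when it actually disrupted the cluster and `False` when it was skipped.

```python
def run_nemesis_cycle(nemesis, health_check):
    """Run one nemesis and health-check the cluster only if it executed.

    Assumption: `nemesis()` returns True if it ran and False if it was
    skipped. Both callables are hypothetical placeholders.
    """
    executed = nemesis()
    if executed:
        # The cluster state may have changed, so verify it.
        health_check()
    # A skipped nemesis changed nothing, so no health check is run.
    return executed
```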
How frequently does it reproduce?
1/1
Installation details
Cluster size: 60 nodes (i3en.large)
Scylla Nodes used in this run:
OS / Image:
ami-09e71469bd2c21908 ami-0da21ef58bb231de7 ami-0201515e28dca41b1 ami-0fd8175c8145eb79f ami-03ff16ab9428aadda
(aws: eu-central-1, eu-north-1, eu-west-1, eu-west-2, us-east-1)
Test:
vp-longevity-aws-custom-d2-workload1-multidc-big
Test id:
54f56c8f-465d-4f59-8ba8-4829871ccff3
Test name:
scylla-staging/valerii/vp-longevity-aws-custom-d2-workload1-multidc-big
Test method:
longevity_test.LongevityTest.test_custom_time
Test config file(s):
Logs and commands
$ hydra investigate show-monitor 54f56c8f-465d-4f59-8ba8-4829871ccff3
$ hydra investigate show-logs 54f56c8f-465d-4f59-8ba8-4829871ccff3
Logs:
Jenkins job URL
Argus