Mixed shard repair reproducer #8435
base: master
I am not familiar with the SCT code, but the description looks good to me.
master-60-59-58
poc2-60-59-58
decoded:
@Deexie How did you execute the new SCT test introduced in this PR? Did you run it through Jenkins? Could you share the details?
@pehala please review.
I see we have the db_nodes_shards_selection option, which we use in asymmetrical test cases. Would it work here, or could it be extended to work here without so many changes in the core?
With this option, we can have different numbers of shards per node, but they are chosen randomly. Here, we can specify the exact number of shards per node. If we reuse … Originally, it was done for AWS only. Maybe that's a good way.
Please note that "features" tests aren't triggered regularly (and in particular not automatically). Until now, we had the asymmetrical longevities to exercise this path, but that is obviously not enough, since they didn't detect the issue we hit in the field.
@Deexie what is the status of this PR?
@Deexie what is the status here?
I'm getting back to it. Currently, the PR contains a feature that enables setting the shard number for each server, and the test that was used to investigate the mixed-shard issue. I do not see how to achieve what's tested here without the custom shard number feature, nor how to turn it into a regression test that runs periodically.
@pehala please see my response above (#8435 (comment)). Do you think the change can be merged as is? Does it need additional testing? Should I run it with each backend and check whether the number of cores is as specified? I don't think we can go with
@roydahan This test wasn't meant to run periodically. The bug was examined based on metrics and logs. I don't know how to convert this into a longevity test.
Maybe one simple way is to change the current "asymmetric" longevity configurations to use "nodes_smp: [X, Y, Z]" instead of the current random selection, with SMP values that we think will stress this feature the most.
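To make the difference between the two modes discussed above concrete, here is a minimal Python sketch. `choose_node_smp` is a hypothetical helper, not SCT's actual implementation, and the 1..`max_smp` range for the random mode is an assumption. With an explicit `nodes_smp` list, the shard count is looked up by node index, so every run starts the cluster with the same layout; with random selection, a run may never hit the combination that triggers the bug.

```python
import random
from typing import Optional, Sequence


def choose_node_smp(node_index: int,
                    max_smp: int,
                    nodes_smp: Optional[Sequence[int]] = None,
                    rng: Optional[random.Random] = None) -> int:
    """Return the number of shards (SMP) a Scylla node should start with.

    Hypothetical helper for illustration only:
    - explicit mode: nodes_smp is given, so the value is looked up by
      node_index and every run gets the same layout (e.g. [60, 59, 58]);
    - random mode: nodes_smp is None, so a value is drawn from the
      assumed range 1..max_smp (inclusive).
    """
    if nodes_smp is not None:
        return nodes_smp[node_index]
    if rng is None:
        rng = random.Random()
    return rng.randint(1, max_smp)


# Explicit layout, as in the 60-59-58 runs: deterministic per node index.
print([choose_node_smp(i, 64, nodes_smp=[60, 59, 58]) for i in range(3)])  # [60, 59, 58]
```

Pinning the values in the configuration, rather than in the code, keeps the longevity pipelines in charge of which mixed-shard layout gets exercised.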
sdcm/sct_config.py
@@ -499,6 +499,9 @@ class SCTConfiguration(dict):
In case of random option - Scylla will start with different (random) shards on every node of the cluster
"""),

dict(name="nodes_smp", env="SCT_NODES_SMP", type=list,
help="List of shard numbers of nodes in Scylla cluster; list of int, like [4, 5, 3]"),
Suggested change:
help="List of shard numbers of nodes in Scylla cluster; list of int, like [4, 5, 3]"),
help="List of shard number to set per node in Scylla cluster; list of int, like [4, 5, 3]"),
I wonder how it would work with multi-DC cases:
region_name: 'eu-west-1 us-east-1'
n_db_nodes: '2 1'
nodes_smp: [12, 12, 15]
The number is selected by node_index, so I think it does not depend on the DC.
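A minimal sketch of the multi-DC case above, assuming (as the reply suggests) that node_index is global across DCs and that nodes are numbered in region_name order. `smp_per_node` is hypothetical and only illustrates the lookup; it is not how SCT builds its node list.

```python
from typing import List, Sequence, Tuple


def smp_per_node(n_db_nodes: str, region_names: str,
                 nodes_smp: Sequence[int]) -> List[Tuple[int, str, int]]:
    """Pair every node with its SMP value using one global node_index.

    n_db_nodes and region_names are space-separated, as in the YAML
    example ('2 1' and 'eu-west-1 us-east-1'). Assumption: nodes are
    numbered across DCs in region order, so nodes_smp is indexed
    globally and the DC itself plays no role in the lookup.
    """
    counts = [int(n) for n in n_db_nodes.split()]
    regions = region_names.split()
    if len(counts) != len(regions):
        raise ValueError("n_db_nodes and region_name must have the same number of entries")
    if len(nodes_smp) != sum(counts):
        raise ValueError(f"nodes_smp has {len(nodes_smp)} entries, expected {sum(counts)}")

    result = []
    node_index = 0
    for region, count in zip(regions, counts):
        for _ in range(count):
            result.append((node_index, region, nodes_smp[node_index]))
            node_index += 1
    return result


print(smp_per_node("2 1", "eu-west-1 us-east-1", [12, 12, 15]))
# [(0, 'eu-west-1', 12), (1, 'eu-west-1', 12), (2, 'us-east-1', 15)]
```

If node_index instead restarted at 0 in each DC, the us-east-1 node would get 12 rather than 15, which is exactly the situation the multi-DC question is probing.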
LGTM
- we might be able to find a better name for the configuration option
- arguments shouldn't be mutable
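Assuming the second point refers to mutable default argument values (the affected code isn't shown in this thread), here is a minimal Python sketch of the pitfall and the usual `None`-sentinel fix. Both function names are hypothetical.

```python
from typing import List, Optional


def add_node_smp_bad(smp: int, nodes_smp: List[int] = []) -> List[int]:
    # The default list is created once, when the function is defined,
    # so every call that omits nodes_smp appends to the same list.
    nodes_smp.append(smp)
    return nodes_smp


def add_node_smp(smp: int, nodes_smp: Optional[List[int]] = None) -> List[int]:
    # Use None as the sentinel and build a fresh list on every call.
    if nodes_smp is None:
        nodes_smp = []
    nodes_smp.append(smp)
    return nodes_smp


print(add_node_smp_bad(4), add_node_smp_bad(5))  # [4, 5] [4, 5]
print(add_node_smp(4), add_node_smp(5))          # [4] [5]
```

If the default is only ever read, an immutable default such as a tuple works as well.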
@Deexie new branch
Add custom shard number config for Scylla clusters.
…es with custom shard number
Copy the asymmetric Jenkins longevity pipelines and set a custom shard number for them.
Reproducer for mixed-shard repair, used to choose the best solution for scylladb/scylladb#18269.
Sets up a 3-node cluster on AWS with 1 TB of data and runs repair.
It will be run with Jenkins using the following configurations: