fix(test-case): update 5000 tables test case configuration #9843
List of changes:

- Disable per-table metrics due to their significant performance impact.
- Enable cluster health checks, which work fine with this case.
- Decrease the nemesis interval from 60 minutes to 3, keeping in mind that health checks take some time too.
- Reduce the stress time for each of the 5000 commands. With 20 minutes per command, test runs take about 1.5 days instead of 2.5 days.
- Reduce the number of loaders from 5 to 3 to use resources more efficiently. In the current case the bottleneck is RAM.

Note that this scenario hits the following bug if the 'destroy_data_then_repair' nemesis gets triggered against the setup of this scenario:

- scylladb/scylla-enterprise#5093
Is this with tablets or with vnodes? (We probably need to run it with both and compare them.)
LGTM
The test can run with both, and the issue mentioned above happened with both. You can see the summary here: but it doesn't seem like anyone is attending to the issue raised by @vponomaryov.
The 'test_longevity.py::test_test_user_batch_custom_time' unit test uses the 'test-cases/scale/longevity-5000-tables.yaml' config file to run a short longevity test which triggers a nemesis.

If health checks are enabled, the "nemesis call" runs much longer because health checks are completed twice: before and after the nemesis. While it is ongoing, the nemesis lock is held, and the problem is that it keeps running even after this test finishes. So any other unit test which tries to get the nemesis lock will stumble upon a held lock for 10+ minutes.

It started happening after the merge of the PR [1] which enabled health checks in the mentioned config file. The affected test is the following:

- test_nemesis.py::test_list_nemesis_of_added_disrupt_methods

Alphabetically, it runs after the 'test_longevity.py::test_test_user_batch_custom_time' one.

So, disable health checks to avoid side effects and redundant work which was not planned when the test was written.

[1] scylladb#9843
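The lock contention described in this commit message can be reproduced in miniature. The sketch below is only an illustration under stated assumptions: `NEMESIS_LOCK`, `run_nemesis`, `first_test` and `second_test_can_lock` are hypothetical names, not SCT code, and `time.sleep` stands in for the health checks and the disruption. It shows how a background "nemesis" that outlives the test which started it keeps a shared lock held, so a later test fails to acquire it until the health checks finish.

```python
import threading
import time

# Hypothetical stand-in for the shared nemesis lock (not the real SCT object).
NEMESIS_LOCK = threading.Lock()


def run_nemesis(health_check_seconds: float, started: threading.Event) -> None:
    """Hold the lock across: health check, disruption, health check."""
    with NEMESIS_LOCK:
        started.set()                     # signal that the lock is now held
        time.sleep(health_check_seconds)  # health check before the nemesis
        time.sleep(0.05)                  # the disruption itself
        time.sleep(health_check_seconds)  # health check after the nemesis


def first_test(health_check_seconds: float) -> threading.Thread:
    """Start a nemesis in the background and return without waiting for it."""
    started = threading.Event()
    thread = threading.Thread(
        target=run_nemesis, args=(health_check_seconds, started), daemon=True
    )
    thread.start()
    started.wait()  # make sure the nemesis thread already owns the lock
    return thread


def second_test_can_lock(timeout: float) -> bool:
    """Try to take the nemesis lock, as a later unit test would."""
    acquired = NEMESIS_LOCK.acquire(timeout=timeout)
    if acquired:
        NEMESIS_LOCK.release()
    return acquired
```

With long health checks, `second_test_can_lock` returns `False` while the first test's nemesis thread is still running. Once that thread is joined, or when the health checks take no time, the lock is free again, which is the effect disabling health checks is meant to achieve in the unit test.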
The 'test_longevity.py::test_test_user_batch_custom_time' unit test uses the 'test-cases/scale/longevity-5000-tables.yaml' config file to run a short longevity test which triggers a nemesis.

If health checks are enabled, the "nemesis call" runs much longer because health checks are completed twice: before and after the nemesis. While it is ongoing, the nemesis lock is held, and the problem is that it keeps running even after this test finishes. So any other unit test which tries to get the nemesis lock will stumble upon a held lock for 10+ minutes.

It started happening after the merge of the PR [1] which enabled health checks in the mentioned config file. The affected test is the following:

- test_nemesis.py::test_list_nemesis_of_added_disrupt_methods

Alphabetically, it runs after the 'test_longevity.py::test_test_user_batch_custom_time' one.

So, disable health checks to avoid side effects and redundant work which was not planned when the test was written.

[1] scylladb#9843

(cherry picked from commit a78d65f)