Investigate Cassandra tests flakiness #620
Comments
The #629 PR changed the Cassandra version in CI from 4.1 (latest) to 4.0.7 to avoid the flakiness problem; however, the root cause of the flakiness is still unknown. I looked at the Cassandra logs after a failed CI run and found an issue that is probably the root cause, but this still needs to be confirmed.
I opened a new issue about failing tests with Cassandra 4.1: #633
Could you explain in more detail what you mean? The link you provided points to this issue. Do you know the real root cause of the flakiness? If so, then let's continue the discussion on the new issue.
Sorry, that was a copy-paste mistake; I have updated the link to point to the Cassandra issue. Unfortunately, I cannot confirm that the patch resolving the mentioned issue is the root cause of the problem, but the issue with the shared logs may help in further investigations.
In January the tests started failing on Cassandra `4.1`, so we pinned the Cassandra version to `4.0.7`, which didn't have these problems. `4.0.7` is the version that we have been using in CI since then.

In the meantime there have been new Cassandra releases; at the moment the latest is `4.1.3`. Let's try going back to the latest version; maybe the problems were fixed during the last nine months.

Refs: scylladb#620
Refs: scylladb#629
Fixes: scylladb#633
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Recently we started to get random failures in test runs with Cassandra. Some examples of failed runs:
The above runs are for PRs, but the failures don't seem related in any way to the changes they introduce. I can see timeouts, queries not returning all expected data, `Unknown CF` errors...

This should be investigated, but my initial suspicion is that we are stressing the Cassandra cluster too much with parallel tests (which would explain the timeouts) and may be using a consistency level that is too low (which would explain missing data in reads). This might also be related in some way to the fact that we switched to a 3-node Cassandra cluster.
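
To make the consistency suspicion concrete, here is a minimal sketch of the usual overlap rule: a read is guaranteed to reach at least one replica that acknowledged the latest write only when the read and write replica counts together exceed the replication factor. The helper names below are hypothetical and are not taken from the test suite; the replication factor of 3 is an assumption based on the 3-node cluster mentioned above, and the actual keyspace settings used by the tests may differ.

```python
# Hypothetical helpers illustrating the R + W > RF overlap rule.
# They do not talk to a cluster; they only do the arithmetic.

def replicas_required(level: str, replication_factor: int) -> int:
    """Return how many replicas must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "TWO":
        return 2
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_overlaps_write(write_level: str, read_level: str, replication_factor: int) -> bool:
    """True when every read is guaranteed to hit a replica that acknowledged the write."""
    writes = replicas_required(write_level, replication_factor)
    reads = replicas_required(read_level, replication_factor)
    return writes + reads > replication_factor


if __name__ == "__main__":
    rf = 3  # assumed replication factor for a 3-node test cluster
    for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(write_level, read_level, read_overlaps_write(write_level, read_level, rf))
```

If the tests write and read at `ONE` against a keyspace replicated three ways, a read immediately after a write may land on a replica that has not yet received the data, which would be consistent with the "missing data" symptom. This is only a possible explanation and does not cover the timeouts or the `Unknown CF` errors.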