Investigate Cassandra tests flakiness #620

Closed
piodul opened this issue Jan 5, 2023 · 3 comments · Fixed by #629

@piodul
Collaborator

piodul commented Jan 5, 2023

Recently we started to get random failures in test runs with Cassandra. Some examples of failed runs:

The above runs are for PRs, but the failures don't seem related in any way to the changes they introduce. I can see timeouts, queries not returning all expected data, Unknown CF errors...

This should be investigated, but my initial suspicion is that we are stressing the Cassandra cluster too much with parallel tests (which would explain the timeouts) and may be using a consistency level that is too low (which would explain missing data in reads). This might also be related in some way to our switch to a 3-node Cassandra cluster.
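
To make the consistency suspicion concrete, here is a minimal sketch of the usual replica-overlap rule: a read is only guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The replication factor of 3 is an assumption (matching the 3-node cluster), and the function names are hypothetical helpers, not part of the driver.

```rust
/// Number of replicas that must respond at QUORUM for a given
/// replication factor: floor(rf / 2) + 1.
fn quorum(rf: u32) -> u32 {
    rf / 2 + 1
}

/// A read is guaranteed to overlap with the latest acknowledged write
/// only when read_replicas + write_replicas > replication_factor.
fn read_sees_latest_write(read_replicas: u32, write_replicas: u32, rf: u32) -> bool {
    read_replicas + write_replicas > rf
}

fn main() {
    // Assumption: the test keyspaces use a replication factor of 3.
    let rf = 3;
    let one = 1;
    let q = quorum(rf);

    // Writing and reading at ONE touches 1 + 1 = 2 replicas, which is not
    // more than 3, so a read may hit a replica that has not seen the write.
    println!("ONE/ONE overlap: {}", read_sees_latest_write(one, one, rf));

    // Writing and reading at QUORUM touches 2 + 2 = 4 replicas, which is
    // more than 3, so at least one replica in every read has the write.
    println!("QUORUM/QUORUM overlap: {}", read_sees_latest_write(q, q, rf));
}
```

If the tests write and read at a level like ONE against a multi-node cluster, missing rows in reads would be consistent with this rule. It does not by itself explain the timeouts or the Unknown CF errors.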

@Gor027
Contributor

Gor027 commented Jan 26, 2023

PR #629 changed the Cassandra version in CI from 4.1 (latest) to 4.0.7 to avoid the flakiness problem; however, the root cause of the flakiness is still unknown. I looked at the Cassandra logs after a failed CI run and found an issue that is probably the root cause of the flakiness but it needs to be somehow approved.

@piodul
Collaborator Author

piodul commented Jan 26, 2023

I opened a new issue about failing tests with Cassandra 4.1: #633

> found an issue that is probably the root cause of the flakiness but it needs to be somehow approved.

Could you explain in more detail what you mean? The link you provided points to this issue. Do you know the real root cause of the flakiness? If so, then let's continue the discussion on the new issue.

@Gor027
Contributor

Gor027 commented Jan 26, 2023

Sorry, that was a copy-paste mistake; I have updated the link to point to the Cassandra issue. Unfortunately, I cannot confirm that the patch resolving the mentioned issue is the root cause of the problem, but the issue with the shared logs may help in further investigations.

cvybhu added a commit to cvybhu/scylla-rust-driver that referenced this issue Oct 19, 2023
In January the tests started failing on Cassandra `4.1`, so we pinned the Cassandra
version to `4.0.7`, which didn't have these problems. `4.0.7` is the version that we've
been using in the CI since then.

In the meantime there have been new releases of Cassandra; at the moment the latest is `4.1.3`.
Let's try to go back to the latest version; maybe the problems fixed themselves during the last nine months.

Refs: scylladb#620
Refs: scylladb#629

Fixes: scylladb#633

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>