Investigate Cassandra tests flakiness #620

piodul · 2023-01-05T12:00:07Z

Recently we started to get random failures in test runs with Cassandra. Some examples of failed runs:

The above runs are for PRs, but the failures don't seem related in any way to the changes they introduce. I can see timeouts, queries not returning all expected data, Unknown CF errors...

This should be investigated, but my initial suspicion is that we are stressing the Cassandra cluster too much with parallel tests (would explain timeouts) and may use consistency that is too low (would explain missing data in reads). This might also be related in some way to the fact that we switched to a 3-node Cassandra cluster.
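The "consistency that is too low" suspicion can be illustrated with the usual replica-overlap rule: a read is only guaranteed to see the latest acknowledged write when the replicas required for the read plus those required for the write exceed the replication factor. The sketch below is illustrative only. The `Consistency` enum and both helper functions are hypothetical names written for this example (they are not the driver's API), and a replication factor of 3 is an assumption based on the 3-node cluster mentioned above.

```rust
/// Hypothetical, simplified model of a few consistency levels.
/// This is NOT the driver's type; it exists only for this illustration.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Consistency {
    One,
    Quorum,
    All,
}

/// Number of replicas that must acknowledge an operation at the given level.
fn required_replicas(cl: Consistency, replication_factor: u32) -> u32 {
    match cl {
        Consistency::One => 1,
        // Quorum is a strict majority of the replicas.
        Consistency::Quorum => replication_factor / 2 + 1,
        Consistency::All => replication_factor,
    }
}

/// True when every read replica set must overlap every write replica set,
/// i.e. R + W > RF.
fn read_sees_latest_write(write: Consistency, read: Consistency, replication_factor: u32) -> bool {
    required_replicas(write, replication_factor) + required_replicas(read, replication_factor)
        > replication_factor
}

fn main() {
    // Assumption: replication factor 3, matching a 3-node test cluster.
    let rf = 3;
    let combos = [
        (Consistency::One, Consistency::One),
        (Consistency::Quorum, Consistency::Quorum),
        (Consistency::All, Consistency::One),
    ];
    for (write, read) in combos {
        println!(
            "write={:?} read={:?} rf={} -> overlap guaranteed: {}",
            write,
            read,
            rf,
            read_sees_latest_write(write, read, rf)
        );
    }
}
```

Under these assumptions, writing and reading at `One` gives no overlap guarantee on a 3-replica keyspace, which would be consistent with reads occasionally missing data; whether the tests actually run at such a level is still to be checked.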

Gor027 · 2023-01-26T13:31:27Z

PR #629 changed the Cassandra version in CI from 4.1 (latest) to 4.0.7 to avoid the flakiness problem; however, the root cause of the flakiness is still unknown. I have tried to look at the Cassandra logs after a failed CI run and found an issue that is probably the root cause of the flakiness but it needs to be somehow approved.

piodul · 2023-01-26T14:30:06Z

I opened a new issue about failing tests with Cassandra 4.1: #633

> found an issue that is probably the root cause of the flakiness but it needs to be somehow approved.

Could you explain in more detail what you mean? The link you provided points to this issue. Do you know the real root cause of the flakiness? If so, then let's continue the discussion on the new issue.

Gor027 · 2023-01-26T14:41:59Z

Sorry, that was a copy-paste issue; I have updated the link to the Cassandra issue. Unfortunately, I cannot confirm that the patch resolving the mentioned issue is the root cause of the problem, but the issue with the shared logs may help in further investigations.

In January the tests started failing on Cassandra `4.1`, so we pinned the Cassandra version to `4.0.7`, which didn't have these problems. `4.0.7` is the version that we've been using in the CI since then.

In the meantime there have been new Cassandra releases; at the moment the latest is `4.1.3`. Let's try to go back to the latest version; maybe the problems fixed themselves during the last nine months.

Refs: scylladb#620
Refs: scylladb#629
Fixes: scylladb#633
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

piodul mentioned this issue Jan 5, 2023

Add serialize and deserialize support for DateTime<Utc> and secret #619

Merged

6 tasks

avelanarius assigned Gor027 Jan 9, 2023

Gor027 mentioned this issue Jan 13, 2023

Downgrade Cassandra version for CI #629

Merged

6 tasks

cvybhu closed this as completed in #629 Jan 13, 2023

piodul mentioned this issue Jan 26, 2023

Investigate why tests fail with the newest Cassandra and bring it back in the CI #633

Closed

cvybhu mentioned this issue Oct 19, 2023

cassandra/docker-compose: go back to using the latest cassandra version #850

Merged

8 tasks
