Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix and test token aware routing #854

Merged
merged 19 commits into from
Oct 24, 2022
Merged

Conversation

conorbros
Copy link
Member

Pre-req:

Fixes a bug in token aware routing and tests against the Scylla driver.

@conorbros conorbros force-pushed the token-routing branch 2 times, most recently from b233bb9 to 5293e6e Compare October 12, 2022 05:33
This was referenced Oct 12, 2022
@shotover shotover deleted a comment from rukai Oct 12, 2022
@conorbros conorbros force-pushed the token-routing branch 3 times, most recently from a18e19a to 9f315d9 Compare October 19, 2022 04:26
@conorbros conorbros force-pushed the token-routing branch 2 times, most recently from bae8e25 to 4b967a0 Compare October 21, 2022 04:03
@conorbros conorbros marked this pull request as ready for review October 21, 2022 04:23
@rukai
Copy link
Member

rukai commented Oct 24, 2022

I removed the bug fix and confirmed that causes the new test case to fail.

I also added some debug logs to verify that we were indeed routing to different nodes each time:

2022-10-23T23:56:02.855725Z ERROR lib::cassandra_int_tests::routing: delete hit: 172.16.1.3
2022-10-23T23:56:03.065194Z ERROR lib::cassandra_int_tests::routing: delete hit: 172.16.1.4
2022-10-23T23:56:03.274862Z ERROR lib::cassandra_int_tests::routing: delete hit: 172.16.1.3
2022-10-23T23:56:03.483010Z ERROR lib::cassandra_int_tests::routing: delete hit: 172.16.1.2
2022-10-23T23:56:03.691256Z ERROR lib::cassandra_int_tests::routing: delete hit: 172.16.1.2

The new routing.rs test case takes 84s to run on my machine.
Thats quite a hit to our total test runtime (+168s) and to the runtime of our standard cluster test case test_cluster_single_rack_v4 (84s)

With only 800 prepared queries that is 100ms per query, that seemed pretty bad to me?
But even when I skip shotover and query cassandra directly it still takes 83s, so seems cassandra is just slow 💤

Can we lower the loop to something like 0..10, yes it opens us up to a likelier chance of a false negative but that is not as likely as it first sounds since that still leaves us with 10 * 4 * 2 = 80 asserts and that number will grow as we expand our test suite to run against more scenarios.
(would also be down for 0..20 if 10 is pushing it)
We would also pretty quickly catch such cases as its bound to fail the next time we run tests.

Copy link
Member

@rukai rukai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job on coming up with a way to test that our routing and then figuring out what was wrong with our implementation.
Using tracing to ensure our routing matches that of a known good implementation is really clever!

Copy link
Member

@benbromhead benbromhead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one - clever!!

@rukai rukai merged commit fccf27c into shotover:main Oct 24, 2022
@conorbros conorbros deleted the token-routing branch October 24, 2022 23:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants