LBP: Return Option<Shard> instead of Shard #969

Merged
merged 4 commits into scylladb:main from fix_shard_load_balancing on Mar 28, 2024

Conversation

Lorak-mmk
Collaborator

@Lorak-mmk Lorak-mmk commented Mar 22, 2024

This was already documented as such, but due to an oversight the code was in disagreement with the documentation. The documented approach is better, because the previously implemented one prevented deduplication in Plan from working correctly.
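
For context, a minimal self-contained sketch of the signature change (the type definitions below are stand-ins for the driver's real types; the real pick() also takes the statement's routing information, and fallback()'s plan iterator changes analogously):

    // Stand-in types so this sketch compiles on its own;
    // in the driver these come from the scylla crate.
    type Shard = u32;
    struct ClusterData;
    struct Node;
    type NodeRef<'a> = &'a Node;

    // Before this PR: a concrete shard was always returned.
    trait LoadBalancingPolicyBefore {
        fn pick<'a>(&'a self, cluster: &'a ClusterData) -> Option<(NodeRef<'a>, Shard)>;
    }

    // After this PR (matching the existing documentation): `None` means
    // "no shard preference", and a shard for the node is then chosen at a
    // later stage (e.g. randomly).
    trait LoadBalancingPolicyAfter {
        fn pick<'a>(&'a self, cluster: &'a ClusterData) -> Option<(NodeRef<'a>, Option<Shard>)>;
    }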

I didn't change deduplication in DefaultPolicy, so currently for unprepared statements everything is deduplicated, and for prepared statements:

  • First, the plan contains all replicas (node + shard pairs), properly deduplicated
  • Then come the nodes, also deduplicated
  • There is, however, no deduplication between replicas and nodes, so it's possible for a prepared statement's plan to first contain a replica (node X, Some(n)) and later the same node as (node X, None)

This makes sense to me: if none of the replicas responded, then something is very wrong. We can try different nodes, but we might as well try different shards on replica nodes. I don't see any big benefits from either approach, but I might be wrong.

In the DefaultPolicy code there are many places where an impl Fn... is passed as a way to filter nodes. I changed some of those types (a sketch follows after this list) so that:

  • The signature fits the purpose of the function. For example, if a function operates on replicas, its filter should accept Shard instead of Option<Shard>. Conversely, if a function operates only on nodes, its filter now accepts just NodeRef.
  • Where possible I got rid of &NodeRef: NodeRef is already just a reference and is Copy, so this removes unnecessary indirection.
  • Where possible I made the functions accept two arguments instead of a tuple - the code is imho simpler and easier to read.
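
To make these points concrete, a hedged sketch with made-up helper names (pick_replica and pick_node are hypothetical; only the filter shapes mirror the changes described above):

    type Shard = u32;
    struct Node;
    type NodeRef<'a> = &'a Node;

    // Operates on replicas, where the shard is always known: the filter
    // takes (NodeRef, Shard), as two arguments rather than a tuple.
    fn pick_replica<'a>(
        mut replicas: impl Iterator<Item = (NodeRef<'a>, Shard)>,
        predicate: impl Fn(NodeRef<'a>, Shard) -> bool,
    ) -> Option<(NodeRef<'a>, Shard)> {
        replicas.find(|&(node, shard)| predicate(node, shard))
    }

    // Operates on plain nodes, so no shard appears in the filter at all,
    // and NodeRef is passed by value (it is already a Copy reference).
    fn pick_node<'a>(
        mut nodes: impl Iterator<Item = NodeRef<'a>>,
        predicate: impl Fn(NodeRef<'a>) -> bool,
    ) -> Option<NodeRef<'a>> {
        nodes.find(|&node| predicate(node))
    }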

When reviewing I'd suggest focusing on

  • Are the types used properly, or does it make sense to change some of the occurrences? Shard vs Option<Shard> vs absence of a shard, &NodeRef vs NodeRef, &(NodeRef, Shard) vs NodeRef, Shard, etc.
  • Correctness with regard to the produced plans, deduplication, and the API

Fixes: #967

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass tests.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@Lorak-mmk Lorak-mmk self-assigned this Mar 22, 2024

github-actions bot commented Mar 22, 2024

cargo semver-checks found no API-breaking changes in this PR! 🎉🥳
Checked commit: e90f102

@Lorak-mmk
Collaborator Author

It looks like tests are not too happy with my changes...

@Lorak-mmk Lorak-mmk marked this pull request as draft March 22, 2024 14:45
@Lorak-mmk
Collaborator Author

Ok, I see what changed: previously deduplication worked by accident, because the tests set the shard count to 1 (so shards were always equal).
I'll change the policy so the returned results are like before.
Then, for the sake of better testing, I'll change the shard count in the tests.

I'm also thinking about moving shard randomization to another layer so that Plan is easier to use in tests.

@Lorak-mmk
Collaborator Author

Lorak-mmk commented Mar 22, 2024

Using .unique_by() with the following struct:

    // Imports needed to make this snippet self-contained
    // (`Shard` is the driver's shard-id alias, a u32).
    use std::hash::{Hash, Hasher};
    use uuid::Uuid;
    use scylla::routing::Shard;

    // Key type for Itertools::unique_by(): identifies a plan target for deduplication.
    struct DefaultPolicyTargetComparator {
        host_id: Uuid,
        shard: Option<Shard>,
    }

    impl PartialEq for DefaultPolicyTargetComparator {
        fn eq(&self, other: &Self) -> bool {
            match (self.shard, other.shard) {
                // A target without a shard matches any target on the same node.
                (_, None) | (None, _) => self.host_id.eq(&other.host_id),
                // Two sharded targets are equal only if both node and shard match.
                (Some(shard_left), Some(shard_right)) => {
                    self.host_id.eq(&other.host_id) && shard_left.eq(&shard_right)
                }
            }
        }
    }

    impl Eq for DefaultPolicyTargetComparator {}

    impl Hash for DefaultPolicyTargetComparator {
        fn hash<H: Hasher>(&self, state: &mut H) {
            // Hash only the host_id, so targets differing only in shard hash the same.
            self.host_id.hash(state);
        }
    }

This fixes the test. It implements the previous semantics: sharded targets are deduplicated if they are fully equal (node and shard), and a non-sharded target is deduplicated if its node equals the node of one of the previous targets.
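
A hedged usage sketch of how such a comparator plugs into unique_by() (illustrative only, not the PR's actual code; it assumes the snippet above, with its imports, is in scope plus the itertools crate, and uses a stand-in Target struct in place of the driver's (NodeRef, Option<Shard>) tuples):

    use itertools::Itertools;

    // Illustrative stand-in for a plan target.
    struct Target {
        host_id: Uuid,
        shard: Option<Shard>,
    }

    // Keep only the first occurrence of each target, as judged by the comparator.
    fn dedup_targets(targets: impl Iterator<Item = Target>) -> impl Iterator<Item = Target> {
        targets.unique_by(|target| DefaultPolicyTargetComparator {
            host_id: target.host_id,
            shard: target.shard,
        })
    }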

My question is, just to make sure: is this the semantics we want? I guess it makes sense for a larger cluster, but for e.g. a 3-node cluster with RF=3 we may want to query other shards on the replica nodes; right now the plan will just have 3 elements.
OTOH, if there are other nodes to query, we should probably query them. WDYT @wprzytula @piodul?

@Lorak-mmk
Collaborator Author

Lorak-mmk commented Mar 25, 2024

For now I added the aforementioned comparator; this should bring back the previous semantics, if I'm not mistaken. I'd really like some feedback on that.

@Lorak-mmk Lorak-mmk force-pushed the fix_shard_load_balancing branch from 80d648f to 2589fa7 Compare March 25, 2024 16:21
@Lorak-mmk Lorak-mmk requested a review from piodul March 25, 2024 17:06
@Lorak-mmk Lorak-mmk marked this pull request as ready for review March 25, 2024 17:06
@wprzytula
Collaborator

My question is, just to make sure: is this the semantics we want? I guess it makes sense for a larger cluster, but for e.g. a 3-node cluster with RF=3 we may want to query other shards on the replica nodes; right now the plan will just have 3 elements.

Now that I think of it, RetryPolicy comes to mind. After all, it's the main actor in the field of the query plan.

RetryPolicy returns RetryDecision, which currently enables retrying either on the same node or on the next node. Now that Plan consists of targets instead of nodes, this semantics has changed to retrying either on the same target or on the next target.

I think we should consider extending RetryDecision with a variant RetryAnotherShard, which would keep the same node but change the shard. I can't yet see a practical application of this, but the idea is related to your doubts.
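
A hypothetical sketch of such an extension (variant names and the missing payloads are illustrative; the driver's actual RetryDecision also carries e.g. an optional consistency override):

    // Illustrative only, not the driver's actual definition.
    pub enum RetryDecision {
        RetrySameNode,
        RetryNextNode,
        // Suggested addition: keep the current node, but retry on a
        // different (e.g. randomly chosen) shard of that node.
        RetryAnotherShard,
        DontRetry,
    }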

OTOH if there are other nodes to query, we should probably query them.

You meant other targets, didn't you? If so, I think I agree. The fallback plan should give plenty of choices (targets) to try, provided the user has a relatively big number of allowed retries set.

Another point of this discussion is then: how many targets?

  • For every node, all its shards? This would make our plans just too large, considering a probable scenario of a 3-node cluster with 8 shards on each node.
  • For every node, up to n shards, where n is a reasonably selected constant? Let's say n := 2. Then we could try to mitigate the case when the exact one shard is overloaded. (A sketch follows after this list.)
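
A hypothetical sketch of the second option, capping the plan at n shards per node (plain integers stand in for nodes; a real policy would likely choose the shards more carefully, e.g. starting from the preferred or a random one):

    type Shard = u32;

    // Yield up to `cap` (node, shard) targets per node.
    fn capped_targets(
        node_ids: impl Iterator<Item = u32>,
        shards_per_node: u32,
        cap: u32,
    ) -> impl Iterator<Item = (u32, Shard)> {
        node_ids.flat_map(move |node| {
            (0..shards_per_node.min(cap)).map(move |shard| (node, shard))
        })
    }

With n := 2, a 3-node cluster with 8 shards per node would yield 6 targets instead of 24.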

@wprzytula
Collaborator

OTOH, nobody ever complained about our plans containing too few targets.
Perhaps it's just because once a production cluster got overloaded, no one was eager to blame the driver for misbehaviour in such harsh conditions.
Anyway, I think that the above considerations should be prioritised rather low; let's stick to the previous semantics, e.g. using the comparator you proposed.

@Lorak-mmk
Collaborator Author

Lorak-mmk commented Mar 26, 2024

My question is, just to make sure: is this the semantics we want? I guess it makes sense for a larger cluster, but for e.g. a 3-node cluster with RF=3 we may want to query other shards on the replica nodes; right now the plan will just have 3 elements.

Now that I think of it, RetryPolicy comes to mind. After all, it's the main actor in the field of the query plan.

RetryPolicy returns RetryDecision, which currently enables retrying either on the same node or on the next node. Now that Plan consists of targets instead of nodes, this semantics has changed to retrying either on the same target or on the next target.

Right, we should rename those things and adjust the documentation. I'll open an issue about it.

I think we should consider extending RetryDecision with a variant RetryAnotherShard, which would keep the same node but change the shard. I can't yet see a practical application of this, but the idea is related to your doubts.

I'm not sure how you would want this to work. Plan is a list of targets. If RetryPolicy returns RetryAnotherShard, do you want to skip plan elements until you find the same node with a different shard? That doesn't seem like a good idea.
Re-use the previous target but use a new random shard? That makes more sense. I'm not completely sure it will be useful.

OTOH if there are other nodes to query, we should probably query them.

You meant other targets, didn't you? If so, I think I agree. The fallback plan should give plenty of choices (targets) to try, provided the user has a relatively big number of allowed retries set.

No, I meant nodes. If you have a cluster with 10 nodes and RF=3, then after trying the 3 initial targets you probably want to try one of the other nodes first, before going to other shards on the 3 already-queried nodes.

Another point of this discussion is then: how many targets?

* For every node, all its shards? This would make our plans just too large, considering a probable scenario of a 3-node cluster with 8 shards on each node.

Is the size a problem? Plan is an iterator anyway; is there any scenario where we need to materialize the whole plan?

* For every node, up to `n` shards, where `n` is a reasonably selected constant? Let's say `n := 2`. Then we could try to mitigate the case when the exact one shard is overloaded.

Makes sense.

OTOH, nobody ever complained about our plans containing too few targets. Perhaps it's just because once a production cluster got overloaded, no one was eager to blame the driver for misbehaviour in such harsh conditions. Anyway, I think that the above considerations should be prioritised rather low; let's stick to the previous semantics, e.g. using the comparator you proposed.

Well, nobody will complain if the driver is too resilient to harsh conditions.
I agree that we shouldn't solve it in this PR - I think we can merge it as is (after a proper review ofc) and discuss semantics later.
Note: In the context of #968 we might want to specify that plans coming from DefaultPolicy are not subject to semver, meaning they can change between minor updates.

@wprzytula
Collaborator

Right, we should rename those things and adjust the documentation. I'll open an issue about it.

Don't forget about SpeculativeExecutionPolicy, too. Interestingly, yet another term is used there (clearly inherited from the Java driver): host.

Re-use the previous target but use a new random shard? That makes more sense.

This was the idea I had.

No, I meant nodes. If you have a cluster with 10 nodes and RF=3, then after trying the 3 initial targets you probably want to try one of the other nodes first, before going to other shards on the 3 already-queried nodes.

Ah, right. I fully agree.

Is the size a problem? Plan is an iterator anyway; is there any scenario where we need to materialize the whole plan?

Plan is indeed an iterator, and furthermore one created purely on the stack IIUC, so allocations are not a big deal here.
I'm still reluctant to make plans so large; no one would be so stubborn as to make that many attempts. DefaultPolicy especially should not exhibit such unwavering determination no matter the cost (of querying so many times).

Note: In the context of #968 we might want to specify that plans coming from DefaultPolicy are not subject to semver, meaning they can change between minor updates.

Yes, please.

scylla/src/transport/load_balancing/plan.rs (review thread: outdated, resolved)
scylla/src/transport/load_balancing/plan.rs (review thread: resolved)
examples/custom_load_balancing_policy.rs (review thread: resolved)
Comment on lines 53 to +60
     cluster: &'a ClusterData,
    -) -> Option<(NodeRef<'a>, Shard)> {
    +) -> Option<(NodeRef<'a>, Option<Shard>)> {
     self.report_node(Report::LoadBalancing);
    -cluster.get_nodes_info().iter().next().map(|node| (node, 0))
    +cluster
    +    .get_nodes_info()
    +    .iter()
    +    .next()
    +    .map(|node| (node, None))
Collaborator


The test semantics are changed here: from deterministic to random. This, however, does not break the tests that use this struct, because they only check which node was queried (the exact shard is irrelevant for them).

scylla/tests/integration/utils.rs (review thread: outdated, resolved)
scylla/src/transport/load_balancing/default.rs (review thread: outdated, resolved)
scylla/src/transport/load_balancing/default.rs (review thread: outdated, resolved)
@Lorak-mmk Lorak-mmk force-pushed the fix_shard_load_balancing branch 3 times, most recently from c877af8 to 2b565ab Compare March 26, 2024 21:21
@Lorak-mmk
Collaborator Author

  • Fixed fmt / clippy issues in @wprzytula's suggestions
  • Addressed one comment, responded to the rest

@Lorak-mmk Lorak-mmk force-pushed the fix_shard_load_balancing branch from 2b565ab to 5a8adce Compare March 26, 2024 21:51
@Lorak-mmk Lorak-mmk requested a review from wprzytula March 27, 2024 07:04
Lorak-mmk and others added 4 commits March 27, 2024 11:35
This was already documented as such, but due to an oversight the code
was in disagreement with the documentation. The documented approach is
better, because the previously implemented one prevented deduplication
in Plan from working correctly.
Co-authored-by: Wojciech Przytuła <wojciech.przytula@scylladb.com>

As there is a brilliant `std::iter::from_fn()` function that creates a
new iterator based on a closure, it can be used instead of the verbose
boilerplate incurred by introducing IteratorWithSkippedNodes.

`with_pseudorandom_shard` and `FixedOrderLoadBalancer` are no longer used.
No need to keep them around.
@Lorak-mmk Lorak-mmk force-pushed the fix_shard_load_balancing branch from 5a8adce to e90f102 Compare March 27, 2024 10:39
@Lorak-mmk Lorak-mmk requested a review from wprzytula March 27, 2024 10:53
@wprzytula wprzytula merged commit 1fcadc9 into scylladb:main Mar 28, 2024
11 checks passed
@Lorak-mmk Lorak-mmk mentioned this pull request May 9, 2024