LBP: Return Option<Shard> instead of Shard #969

Merged
merged 4 commits into scylladb:main from fix_shard_load_balancing on Mar 28, 2024

Conversation

Lorak-mmk
Collaborator

@Lorak-mmk Lorak-mmk commented Mar 22, 2024

This was already documented as such, but due to an oversight the code was in disagreement with the documentation. The documented approach is better, because the previously implemented one prevented deduplication in Plan from working correctly.
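
For context, a minimal self-contained sketch of the signature change (the type definitions below are stand-ins for the driver's real types; the real pick() also takes the statement's routing information, and fallback()'s plan iterator changes analogously):

    // Stand-in types so this sketch compiles on its own;
    // in the driver these come from the scylla crate.
    type Shard = u32;
    struct ClusterData;
    struct Node;
    type NodeRef<'a> = &'a Node;

    // Before this PR: a concrete shard was always returned.
    trait LoadBalancingPolicyBefore {
        fn pick<'a>(&'a self, cluster: &'a ClusterData) -> Option<(NodeRef<'a>, Shard)>;
    }

    // After this PR (matching the existing documentation): `None` means
    // "no shard preference", and a shard for the node is then chosen at a
    // later stage (e.g. randomly).
    trait LoadBalancingPolicyAfter {
        fn pick<'a>(&'a self, cluster: &'a ClusterData) -> Option<(NodeRef<'a>, Option<Shard>)>;
    }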

I didn't change deduplication in DefaultPolicy, so currently for unprepared statements everything is deduplicated, and for prepared statements:

  • First, the plan contains all replicas (node + shard pairs), properly deduplicated
  • Then come the nodes, also deduplicated
  • There is, however, no deduplication between replicas and nodes, so it's possible for a prepared statement's plan to first contain a replica (node X, Some(n)) and later the same node as (node X, None)

This makes sense to me: if none of the replicas responded, then something is very wrong. We can try different nodes, but we might as well try different shards on replica nodes. I don't see any big benefits from either approach, but I might be wrong.

In the DefaultPolicy code there are many places where an impl Fn... is passed as a way to filter nodes. I changed some of those types (a sketch follows after this list) so that:

  • The signature fits the purpose of the function. For example, if a function operates on replicas, its filter should accept Shard instead of Option<Shard>. Conversely, if a function operates only on nodes, its filter now accepts just NodeRef.
  • Where possible I got rid of &NodeRef: NodeRef is already just a reference and is Copy, so this removes unnecessary indirection.
  • Where possible I made the functions accept two arguments instead of a tuple - the code is imho simpler and easier to read.
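
To make these points concrete, a hedged sketch with made-up helper names (pick_replica and pick_node are hypothetical; only the filter shapes mirror the changes described above):

    type Shard = u32;
    struct Node;
    type NodeRef<'a> = &'a Node;

    // Operates on replicas, where the shard is always known: the filter
    // takes (NodeRef, Shard), as two arguments rather than a tuple.
    fn pick_replica<'a>(
        mut replicas: impl Iterator<Item = (NodeRef<'a>, Shard)>,
        predicate: impl Fn(NodeRef<'a>, Shard) -> bool,
    ) -> Option<(NodeRef<'a>, Shard)> {
        replicas.find(|&(node, shard)| predicate(node, shard))
    }

    // Operates on plain nodes, so no shard appears in the filter at all,
    // and NodeRef is passed by value (it is already a Copy reference).
    fn pick_node<'a>(
        mut nodes: impl Iterator<Item = NodeRef<'a>>,
        predicate: impl Fn(NodeRef<'a>) -> bool,
    ) -> Option<NodeRef<'a>> {
        nodes.find(|&node| predicate(node))
    }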

When reviewing I'd suggest focusing on

  • Are the types used properly, or does it make sense to change some of the occurrences? Shard vs Option<Shard> vs absence of a shard, &NodeRef vs NodeRef, &(NodeRef, Shard) vs NodeRef, Shard, etc.
  • Correctness with regard to the produced plans, deduplication, and the API

Fixes: #967

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass tests.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@Lorak-mmk Lorak-mmk self-assigned this Mar 22, 2024

github-actions bot commented Mar 22, 2024

cargo semver-checks found no API-breaking changes in this PR! 🎉🥳
Checked commit: e90f102

@Lorak-mmk
Collaborator Author

It looks like tests are not too happy with my changes...

@Lorak-mmk Lorak-mmk marked this pull request as draft March 22, 2024 14:45
@Lorak-mmk
Collaborator Author

Ok, I see what changed: previously deduplication worked by accident, because the tests set the shard count to 1 (so shards were always equal).
I'll change the policy so the returned results are like before.
Then, for the sake of better testing, I'll change the shard count in the tests.

I'm also thinking about moving shard randomization to another layer so that Plan is easier to use in tests.

@Lorak-mmk
Collaborator Author

Lorak-mmk commented Mar 22, 2024

Using .unique_by() with the following struct:

    // Imports needed to make this snippet self-contained
    // (`Shard` is the driver's shard-id alias, a u32).
    use std::hash::{Hash, Hasher};
    use uuid::Uuid;
    use scylla::routing::Shard;

    // Key type for Itertools::unique_by(): identifies a plan target for deduplication.
    struct DefaultPolicyTargetComparator {
        host_id: Uuid,
        shard: Option<Shard>,
    }

    impl PartialEq for DefaultPolicyTargetComparator {
        fn eq(&self, other: &Self) -> bool {
            match (self.shard, other.shard) {
                // A target without a shard matches any target on the same node.
                (_, None) | (None, _) => self.host_id.eq(&other.host_id),
                // Two sharded targets are equal only if both node and shard match.
                (Some(shard_left), Some(shard_right)) => {
                    self.host_id.eq(&other.host_id) && shard_left.eq(&shard_right)
                }
            }
        }
    }

    impl Eq for DefaultPolicyTargetComparator {}

    impl Hash for DefaultPolicyTargetComparator {
        fn hash<H: Hasher>(&self, state: &mut H) {
            // Hash only the host_id, so targets differing only in shard hash the same.
            self.host_id.hash(state);
        }
    }

This fixes the test. It implements the previous semantics: sharded targets are deduplicated if they are fully equal (node and shard), and a non-sharded target is deduplicated if its node equals the node of one of the previous targets.
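
A hedged usage sketch of how such a comparator plugs into unique_by() (illustrative only, not the PR's actual code; it assumes the snippet above, with its imports, is in scope plus the itertools crate, and uses a stand-in Target struct in place of the driver's (NodeRef, Option<Shard>) tuples):

    use itertools::Itertools;

    // Illustrative stand-in for a plan target.
    struct Target {
        host_id: Uuid,
        shard: Option<Shard>,
    }

    // Keep only the first occurrence of each target, as judged by the comparator.
    fn dedup_targets(targets: impl Iterator<Item = Target>) -> impl Iterator<Item = Target> {
        targets.unique_by(|target| DefaultPolicyTargetComparator {
            host_id: target.host_id,
            shard: target.shard,
        })
    }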

My question is, just to make sure: is this the semantics we want? I guess it makes sense for a larger cluster, but for e.g. a 3-node cluster with RF=3 we may want to query other shards on the replica nodes; right now the plan will just have 3 elements.
OTOH, if there are other nodes to query, we should probably query them. WDYT @wprzytula @piodul?

@Lorak-mmk
Collaborator Author

Lorak-mmk commented Mar 25, 2024

For now I added the aforementioned comparator; this should bring back the previous semantics, if I'm not mistaken. I'd really like some feedback on that.

@Lorak-mmk Lorak-mmk force-pushed the fix_shard_load_balancing branch from 80d648f to 2589fa7 Compare March 25, 2024 16:21
@Lorak-mmk Lorak-mmk requested a review from piodul March 25, 2024 17:06
@Lorak-mmk Lorak-mmk marked this pull request as ready for review March 25, 2024 17:06
@wprzytula
Collaborator

My question is, just to make sure: is this the semantics we want? I guess it makes sense for a larger cluster, but for e.g. a 3-node cluster with RF=3 we may want to query other shards on the replica nodes; right now the plan will just have 3 elements.

Now that I think of it, RetryPolicy comes to mind. After all, it's the main actor in the field of the query plan.

RetryPolicy returns RetryDecision, which currently enables retrying either on the same node or on the next node. Now that Plan consists of targets instead of nodes, this semantics has changed to retrying either on the same target or on the next target.

I think we should consider extending RetryDecision with a variant RetryAnotherShard, which would keep the same node but change the shard. I can't yet see a practical application of this, but the idea is related to your doubts.
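
A hypothetical sketch of such an extension (variant names and the missing payloads are illustrative; the driver's actual RetryDecision also carries e.g. an optional consistency override):

    // Illustrative only, not the driver's actual definition.
    pub enum RetryDecision {
        RetrySameNode,
        RetryNextNode,
        // Suggested addition: keep the current node, but retry on a
        // different (e.g. randomly chosen) shard of that node.
        RetryAnotherShard,
        DontRetry,
    }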

OTOH if there are other nodes to query, we should probably query them.

You meant other targets, didn't you? If so, I think I agree. The fallback plan should give plenty of choices (targets) to try, provided the user has a relatively big number of allowed retries set.

Another point of this discussion is then: how many targets?

  • For every node, all its shards? This would make our plans just too large, considering a probable scenario of a 3-node cluster with 8 shards on each node.
  • For every node, up to n shards, where n is a reasonably selected constant? Let's say n := 2. Then we could try to mitigate the case when the exact one shard is overloaded. (A sketch follows after this list.)
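
A hypothetical sketch of the second option, capping the plan at n shards per node (plain integers stand in for nodes; a real policy would likely choose the shards more carefully, e.g. starting from the preferred or a random one):

    type Shard = u32;

    // Yield up to `cap` (node, shard) targets per node.
    fn capped_targets(
        node_ids: impl Iterator<Item = u32>,
        shards_per_node: u32,
        cap: u32,
    ) -> impl Iterator<Item = (u32, Shard)> {
        node_ids.flat_map(move |node| {
            (0..shards_per_node.min(cap)).map(move |shard| (node, shard))
        })
    }

With n := 2, a 3-node cluster with 8 shards per node would yield 6 targets instead of 24.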

@wprzytula
Collaborator

OTOH, nobody ever complained about our plans containing too few targets.
Perhaps it's just because once a production cluster got overloaded, no one was eager to blame the driver for misbehaviour in such harsh conditions.
Anyway, I think that the above considerations should be prioritised rather low; let's stick to the previous semantics, e.g. using the comparator you proposed.

@Lorak-mmk
Collaborator Author

Lorak-mmk commented Mar 26, 2024

My question is, just to make sure: is this the semantics we want? I guess it makes sense for a larger cluster, but for e.g. a 3-node cluster with RF=3 we may want to query other shards on the replica nodes; right now the plan will just have 3 elements.

Now that I think of it, RetryPolicy comes to mind. After all, it's the main actor in the field of the query plan.

RetryPolicy returns RetryDecision, which currently enables retrying either on the same node or on the next node. Now that Plan consists of targets instead of nodes, this semantics has changed to retrying either on the same target or on the next target.

Right, we should rename those things and adjust the documentation. I'll open an issue about it.

I think we should consider extending RetryDecision with a variant RetryAnotherShard, which would keep the same node but change the shard. I can't yet see a practical application of this, but the idea is related to your doubts.

I'm not sure how you would want this to work. Plan is a list of targets. If RetryPolicy returns RetryAnotherShard, do you want to skip plan elements until you find the same node with a different shard? That doesn't seem like a good idea.
Re-use the previous target but use a new random shard? That makes more sense. I'm not completely sure it will be useful.

OTOH if there are other nodes to query, we should probably query them.

You meant other targets, didn't you? If so, I think I agree. The fallback plan should give plenty of choices (targets) to try, provided the user has a relatively big number of allowed retries set.

No, I meant nodes. If you have a cluster with 10 nodes and RF=3, then after trying the 3 initial targets you probably want to try one of the other nodes first, before going to other shards on the 3 already-queried nodes.

Another point of this discussion is then: how many targets?

* For every node, all its shards? This would make our plans just too large, considering a probable scenario of a 3-node cluster with 8 shards on each node.

Is the size a problem? Plan is an iterator anyway; is there any scenario where we need to materialize the whole plan?

* For every node, up to `n` shards, where `n` is a reasonably selected constant? Let's say `n := 2`. Then we could try to mitigate the case when the exact one shard is overloaded.

Makes sense.

OTOH, nobody ever complained about our plans containing too few targets. Perhaps it's just because once a production cluster got overloaded, no one was eager to blame the driver for misbehaviour in such harsh conditions. Anyway, I think that the above considerations should be prioritised rather low; let's stick to the previous semantics, e.g. using the comparator you proposed.

Well, nobody will complain if the driver is too resilient to harsh conditions.
I agree that we shouldn't solve it in this PR - I think we can merge it as is (after a proper review ofc) and discuss semantics later.
Note: In the context of #968 we might want to specify that plans coming from DefaultPolicy are not subject to semver, meaning they can change between minor updates.

@wprzytula
Collaborator

Right, we should rename those things and adjust the documentation. I'll open an issue about it.

Don't forget about SpeculativeExecutionPolicy, too. Interestingly, yet another term is used there (clearly inherited from the Java driver): host.

Re-use the previous target but use a new random shard? That makes more sense.

This was the idea I had.

No, I meant nodes. If you have a cluster with 10 nodes and RF=3, then after trying the 3 initial targets you probably want to try one of the other nodes first, before going to other shards on the 3 already-queried nodes.

Ah, right. I fully agree.

Is the size a problem? Plan is an iterator anyway; is there any scenario where we need to materialize the whole plan?

Plan is indeed an iterator, and furthermore one created purely on the stack IIUC, so allocations are not a big deal here.
I'm still reluctant to make plans so large; no one would be so stubborn as to make that many attempts. DefaultPolicy especially should not exhibit such unwavering determination no matter the cost (of querying so many times).

Note: In the context of #968 we might want to specify that plans coming from DefaultPolicy are not subject to semver, meaning they can change between minor updates.

Yes, please.

scylla/src/transport/load_balancing/plan.rs (review thread: outdated, resolved)
scylla/src/transport/load_balancing/plan.rs (review thread: resolved)
examples/custom_load_balancing_policy.rs (review thread: resolved)
Comment on lines 53 to +60
     cluster: &'a ClusterData,
    -) -> Option<(NodeRef<'a>, Shard)> {
    +) -> Option<(NodeRef<'a>, Option<Shard>)> {
     self.report_node(Report::LoadBalancing);
    -cluster.get_nodes_info().iter().next().map(|node| (node, 0))
    +cluster
    +    .get_nodes_info()
    +    .iter()
    +    .next()
    +    .map(|node| (node, None))
Collaborator


The test semantics are changed here: from deterministic to random. This, however, does not break the tests that use this struct, because they only check which node was queried (the exact shard is irrelevant for them).

scylla/tests/integration/utils.rs (review thread: outdated, resolved)
scylla/src/transport/load_balancing/default.rs (review thread: outdated, resolved)
scylla/src/transport/load_balancing/default.rs (review thread: outdated, resolved)
@Lorak-mmk Lorak-mmk force-pushed the fix_shard_load_balancing branch 3 times, most recently from c877af8 to 2b565ab Compare March 26, 2024 21:21
@Lorak-mmk
Collaborator Author

  • Fixed fmt / clippy issues in @wprzytula's suggestions
  • Addressed one comment, responded to the rest

@Lorak-mmk Lorak-mmk force-pushed the fix_shard_load_balancing branch from 2b565ab to 5a8adce Compare March 26, 2024 21:51
@Lorak-mmk Lorak-mmk requested a review from wprzytula March 27, 2024 07:04
Lorak-mmk and others added 4 commits March 27, 2024 11:35
This was already documented as such, but due to an oversight the code
was in disagreement with the documentation. The documented approach is
better, because the previously implemented one prevented deduplication
in Plan from working correctly.
Co-authored-by: Wojciech Przytuła <wojciech.przytula@scylladb.com>

As there is a brilliant `std::iter::from_fn()` function that creates a
new iterator based on a closure, it can be used instead of the verbose
boilerplate incurred by introducing IteratorWithSkippedNodes.

`with_pseudorandom_shard` and `FixedOrderLoadBalancer` are no longer used.
No need to keep them around.
@Lorak-mmk Lorak-mmk force-pushed the fix_shard_load_balancing branch from 5a8adce to e90f102 Compare March 27, 2024 10:39
@Lorak-mmk Lorak-mmk requested a review from wprzytula March 27, 2024 10:53
@wprzytula wprzytula merged commit 1fcadc9 into scylladb:main Mar 28, 2024
11 checks passed
@Lorak-mmk Lorak-mmk mentioned this pull request May 9, 2024