Add LOAD_BALANCING_POLICY_SLOW_AVOIDANCE functionality #168
Conversation
I am not sure how to test this; I am open to any suggestions.
Can you please provide a reference for this functionality in the Java driver?
I added a link to the code in the description.
policies.go
Outdated
@@ -424,6 +434,8 @@ type clusterMeta struct {
	tokenRing *tokenRing
}

const MAX_IN_FLIGHT_THRESHOLD int = 10
Please make this configurable.
I'm suspicious of this policy; it can induce instability in the cluster.
Any action taken has to be strongly dampened to avoid overreacting. This is especially important when there are many client processes making independent but similar decisions.
v2:
The process of testing this change is still in progress. I am trying to observe the behavior in use by setting up a 3-node cluster with one "slow" node (with some sleeps), aiming to show that replicas on this host are unhealthy. However, so far, I have not been able to observe higher number of in-use streams. |
There was a nice addition of "power of two choices" in one of our drivers (Java?) - I think this might be a good candidate here as well.
@avikivity I guess it is working fine for the java-driver, but at the same time this behavior is not tested there, so I am curious how often it is actually used and how often a node is actually slow. Especially because MAX_IN_FLIGHT_THRESHOLD is not configurable in the java-driver.
I don't have concrete evidence that it fails, just suspicions. The policy originated[1] with Cassandra, where there is an actual source of node slowness: full garbage collection. We don't have this source, but we have others. I heard that the "dynamic snitch" (which performs similar functionality for the coordinator->replica hop) performs poorly; see https://thelastpickle.com/blog/2019/01/30/new-cluster-recommendations.html (point 5).

MAX_IN_FLIGHT_THRESHOLD changes its meaning between shard-aware and shard-unaware modes, and is very workload dependent. In a cache-hit-intensive workload you'd see a small in-flight request count; in a cache-miss-intensive workload you'd see a high in-flight request count. A node could have high latency because one of the replicas it is accessing is slow (mostly for small clusters, where a single replica is 1/3 of the replicas contacted by the coordinator; for large clusters the slow replica would be averaged out).

We could add it for parity with the Java driver, but we have to be careful about recommending it.

[1] I'm just guessing
scylla.go
Outdated
func (p *scyllaConnPicker) InFlight() int {
	result := 0
	for i := range p.conns {
		idx := int(atomic.AddUint64(&p.pos, 1))
idx := int(atomic.AddUint64(&p.pos, 1)) should be calculated once before the loop.
I changed that to just iterating through p.conns for simplicity.
The java driver has a feature to automatically avoid slow replicas by doing simple heuristics. This is one of the key features for stable latency. This commit adds an additional field in tokenAwareHostPolicy to control whether the feature is enabled and what the maximum in-flight threshold is. If the feature is enabled, the driver sorts the replicas to first try those with fewer than the specified maximum of in-flight connections. Fixes: scylladb#154
I managed to observe the behavior in use by setting up a 3-node cluster with one "slow" node (with some sleeps). Extra logs showed a larger number of in-use streams, and the slow node was considered unhealthy. Now I will repeat the process to see the behavior on metrics.
I used a 3-node cluster with one slow node (with some sleeps). It is not obvious how to observe avoiding slow replicas on metrics. If replica shuffling is disabled (the default behavior), or the query happens to be LWT, there is little to no difference between gocql with or without slow replica avoidance. The difference can only be seen if the slow node happens to not be first on the replica list. On this graph we see the number of requests for the different situations:

If the driver does shuffle the replica list, enabling slow replica avoidance gives a positive outcome. We can see the test execution is faster and more queries are directed to the fast nodes. (In this test there were 5 concurrent scenarios with 10 INSERT queries with RF=3 and 50 SELECT queries with CL=1.)

With a higher number of concurrent scenarios, the version without slow replica avoidance times out, while the version with it works just fine:
During testing of the slow replica avoidance functionality I realized that simple queries, e.g.
@avikivity I added some metrics to show the impact on performance; what do you think about merging this?
#168 (comment) is not an objection to merging, since it's adding functionality already present in the Java driver (opt-in, I hope). I'll go over your measurements regardless.
btw - @michoecho this can feed into our discussion re slow shards. |
Your measurement results are unsurprising - for sure, if one node really is slow, then avoiding it gives better results. My worry is that we'll misdetect a node as slow, thereby diverting requests to another node, which then becomes slow, starting a feedback loop where we cause nodes to be slow. If the detection threshold is sufficiently high, perhaps this doesn't happen.
Hello there, just commenting to give a little context. It happened that we had a very slow node, especially during the upgrade process or during hardware issues. With the earlier version of the driver, this slowed everything down, causing timeouts, write/read issues, and angry customers. This PR should definitely improve those situations.
The java driver has a feature to automatically avoid slow replicas by doing simple heuristics (https://github.com/scylladb/java-driver/blob/scylla-4.x/core/src/main/java/com/datastax/oss/driver/internal/core/loadbalancing/DefaultLoadBalancingPolicy.java#L104). This is one of the key features for stable latency.
This PR adds an additional field in tokenAwareHostPolicy to control whether the feature is enabled and what the maximum in-flight threshold is.
If the feature is enabled, the driver sorts the replicas to first try those with fewer than the specified maximum of in-flight requests.
Fixes: #154