There are two potentially curious aspects of repair behaviour with AAE full-sync. This may be a misreading of the code, but:
If two clusters are found to be out of sync, with a large number of objects being more up-to-date on Cluster A than on Cluster B, bi-directional full-syncing between A and B will lead to B sending all its out-of-date objects to A (perhaps before A sends all the up-to-date objects to B).
If there is a significant number of differences between the two clusters, the source side will try to resolve this by building a bloom filter of mismatched keys and performing a full object fold over the vnode, using the bloom filter as an input. However, it only starts this fold after it has already sent 5% of the keyspace through random reads.
For the first part, this appears to be a consequence of storing only the hashes in the AAE store, not the clocks. So AAE has no way of determining which side is up-to-date. Resolving this may require significant change, so it is a design issue rather than an implementation issue.
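To illustrate why hashes alone cannot pick a direction, here is a minimal Python sketch (not the riak_repl Erlang code). It assumes a simplified vector clock modelled as a dict of actor to counter, which is not Riak's actual vclock representation; `object_hash`, `descends` and `compare_clocks` are hypothetical helper names.

```python
import hashlib


def object_hash(value: bytes) -> str:
    """Stand-in for the per-object hash kept in an AAE tree (assumption)."""
    return hashlib.sha1(value).hexdigest()


def descends(clock_a: dict, clock_b: dict) -> bool:
    """True if clock_a has seen every event recorded in clock_b."""
    return all(clock_a.get(actor, 0) >= count for actor, count in clock_b.items())


def compare_clocks(clock_a: dict, clock_b: dict) -> str:
    """Classify two simplified vector clocks."""
    a_covers_b = descends(clock_a, clock_b)
    b_covers_a = descends(clock_b, clock_a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a_newer"
    if b_covers_a:
        return "b_newer"
    return "concurrent"


# Hashes only say "different"; the comparison is symmetric.
hash_a = object_hash(b"value-v2")
hash_b = object_hash(b"value-v1")
hashes_differ = hash_a != hash_b

# Clocks say which side should send: here only A needs to send to B.
direction = compare_clocks({"node1": 2}, {"node1": 1})
```

With only `hashes_differ` available, both sides conclude that the object needs repair and both send. With the clocks available, B could skip sending objects that A's copy already dominates.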
For the second part, the bloom is generated here: https://github.com/basho/riak_repl/blob/develop/src/riak_repl_aae_source.erl#L379-L386. The 5% limit is defined here: https://github.com/basho/riak_repl/blob/develop/src/riak_repl_aae_source.erl#L292.
The actual transition between using random reads and a fold is defined here: https://github.com/basho/riak_repl/blob/develop/src/riak_repl_aae_source.erl#L543-L582
So if you have 1M keys in the vnode and 50,001 differences, I think it will fix 50K differences through random reads, and resolve the last difference by creating a bloom and folding over all the objects. As you would expect the differences to be randomly distributed across the segments of the AAE tree, it seems plausible that the decision could be made earlier (perhaps after a sample of 1000 random reads) that the 5% limit is likely to be breached, and the bloom approach invoked at that point.
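The arithmetic above, and one possible shape for an earlier decision, can be sketched in Python as follows. This is a model of the behaviour as I read it, not the riak_repl implementation. All function names are hypothetical, and the projection assumes that the source knows what fraction of tree segments it has compared so far, which may not match how the exchange actually proceeds.

```python
def direct_repair_limit(total_keys: int, pct_limit: int = 5) -> int:
    """Number of differences repaired by random reads before the bloom is used."""
    return total_keys * pct_limit // 100


def split_repairs(total_keys: int, differences: int, pct_limit: int = 5) -> tuple:
    """Return (repaired by random reads, repaired by bloom + fold)."""
    limit = direct_repair_limit(total_keys, pct_limit)
    direct = min(differences, limit)
    return direct, differences - direct


def should_switch_early(segments_compared: int, total_segments: int,
                        diffs_found: int, total_keys: int,
                        pct_limit: int = 5) -> bool:
    """Project the total difference count from a sample, assuming differences
    are spread uniformly across segments, and compare it with the limit."""
    if segments_compared == 0:
        return False
    projected = diffs_found * total_segments / segments_compared
    return projected > direct_repair_limit(total_keys, pct_limit)


# The case from the text: 1M keys, 50,001 differences.
direct, via_bloom = split_repairs(1_000_000, 50_001)

# Hypothetical sample: 1000 differences found after 19,000 of 1,000,000 segments.
switch_now = should_switch_early(19_000, 1_000_000, 1000, 1_000_000)
```

In the first case 50,000 objects go through random reads and a full fold is still paid for the one remaining key. In the second, the projection of roughly 52,600 differences already exceeds the 50,000 limit after 1000 reads, so the bloom and fold could be started then.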