
Typed Bucket Replication Support #490

Closed · wants to merge 9 commits

Conversation

bowrocker
Contributor

Overview

Riak 2.0 brings with it typed buckets -- the ability to create a type and associate a bucket with that type:

basho/riak#362

In order to successfully replicate a typed object from one cluster to another, the type definition must exist and be equal on both clusters. MDC replication must handle the possibility that the type of a given object exists on the replication source cluster, but not on the sink cluster (repl 2 terminology).

At this time, there is no facility to automatically create types across clusters -- for example, to create a type on the sink cluster when it exists only on the source -- so that replication happens seamlessly. This could be an avenue to explore.

Additionally, replication must be extended to properly handle typed buckets. This work is assumed to come along with support for typed-bucket replication. "Default" type buckets (legacy buckets in Riak) continue to be automatically replicated as before.

This document discusses 2 possible options for implementing this in replication:

https://gist.github.com/bowrocker/7f0d9d6879493f1ac0e9

This PR implements the second option.

Type Checking on the Sink

This option allows typed buckets to be replicated to the sink cluster without any checking on the source. During do_repl_put on the sink, the bucket is checked. If it has a type, that type is checked for existence; if the type does not exist, the object is dropped, an error message is logged, and preferably an alarm of some sort is raised.

This option assumes that types should exist on both source and sink clusters, and that the non-existence of a type is an operational error that the user or CSE should take care of. Once the type is created, objects of that type will replicate correctly.
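The sink-side check described above could look roughly like the following. This is an illustrative sketch, not the PR's actual code -- the function name maybe_do_repl_put and the exact error handling are assumptions; riak_core_bucket_type:get/1 returning undefined for an unknown type matches the behavior discussed later in this thread.

```erlang
%% Hypothetical sketch of the sink-side type check. Names are
%% illustrative, not the PR's actual code.
maybe_do_repl_put(Obj) ->
    case riak_object:bucket(Obj) of
        {Type, _Bucket} ->
            case riak_core_bucket_type:get(Type) of
                undefined ->
                    %% Type not provisioned on the sink: drop the
                    %% object and surface the error to the operator.
                    lager:error("dropped replicated object: bucket type ~p "
                                "does not exist on the sink", [Type]),
                    {error, bucket_type_not_found};
                _Props ->
                    do_repl_put(Obj)
            end;
        _Bucket ->
            %% "default"-type (legacy) buckets replicate as before.
            do_repl_put(Obj)
    end.
```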

Pros
  • This is the simplest solution. Very few moving parts, very low impact on operations and performance.
Cons
  • Bandwidth, RTQ space, and processing are wasted when an object whose type does not exist on the sink cluster is replicated.
  • It assumes that all types should exist on both sides.

Configuration

No new user configuration is introduced. If the sink is not configured with the type of an object being replicated, an error message is printed. It is the responsibility of the user to provision this type on the sink.

Mixed version replication

A new replication protocol version is introduced for bucket type support: {2,1}. The following is supported for each version:

Source   Sink            Result
{3,0}    {3,0}           "default" and user-defined typed buckets replicated
{3,0}    {2,0} or less   only "default" buckets replicated
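The gate implied by the table above could be sketched as follows. This is a hypothetical illustration -- the function and variable names are not from the PR -- showing only the decision of whether a typed object should be forwarded to a given peer:

```erlang
%% Illustrative sketch of the mixed-version gate: typed buckets are
%% only forwarded when the negotiated peer protocol supports them.
should_send_typed(Obj, {Major, _Minor} = _PeerProto) ->
    case riak_object:bucket(Obj) of
        {_Type, _B} when Major < 3 ->
            %% Peer predates bucket-type support: only "default"
            %% buckets replicate, so skip this typed object.
            false;
        _ ->
            %% Default buckets always go; typed buckets go when the
            %% peer speaks {3,0} or newer.
            true
    end.
```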

Dependencies

This PR depends on code in the following branches being merged; otherwise the conditional post-commit hook will not work:

https://github.com/basho/riak_kv/tree/jdb-conditional-postcommit
https://github.com/basho/riak_repl/tree/jdb-conditional-postcommit

Previous PR

#486

Riak Test

basho/riak_test#477

{Type, _B} ->
PropsHash = riak_repl_util:get_bucket_props_hash(riak_core_bucket_type:get(Type)),
lager:debug("typed bucket, setting meta data: Type:~p, HashProps:~p", [Type, PropsHash]),
M1 = orddict:store(typed_bucket, true, M),
Contributor

it might be nice to define typed_bucket as a constant.
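The suggestion could be as simple as a module-level macro; this is just an illustrative sketch, not the PR's code:

```erlang
%% Define the metadata key once so every module spells it the same way.
-define(TYPED_BUCKET, typed_bucket).

%% ...and at the call site quoted above:
M1 = orddict:store(?TYPED_BUCKET, true, M),
```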

%% in the new format as obtained from riak_object:to_binary(v1, RObj).
new_w1(B, K, BinObj) when is_binary(B), is_binary(K), is_binary(BinObj) ->
Contributor


I feel that the wire version must be increased to w2 and the version increased to denote the new encoding of the Bucket Type in the object. That way, any code that encounters it can either work or safely ignore it and log it. Perhaps even better would be add the bucket type as a property item in a list and say that for wire version 2, we now support meta data along with the bucket and key. This might avoid future wire changes and make the systems more flexible. Let me know if you want help with what that would look like in our encoded binary.
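One possible shape for this suggestion -- entirely hypothetical, offered only to make the idea concrete -- is a w2 frame that carries an arbitrary metadata proplist (which could include the bucket type) ahead of the bucket, key, and encoded object:

```erlang
%% Hypothetical w2 framing: a length-prefixed metadata proplist before
%% the bucket, key, and encoded object. Not the PR's actual format.
to_wire_w2(Meta, B, K, BinObj)
  when is_list(Meta), is_binary(B), is_binary(K), is_binary(BinObj) ->
    MetaBin = term_to_binary(Meta),
    <<2:8/integer,                                  %% wire version tag: w2
      (byte_size(MetaBin)):32/integer, MetaBin/binary,
      (byte_size(B)):32/integer, B/binary,
      (byte_size(K)):32/integer, K/binary,
      BinObj/binary>>.
```

Because the metadata is an open-ended proplist, later additions would not require another wire-version bump, which is the flexibility the comment argues for.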

@@ -53,7 +57,11 @@
%% Also see analogous binary_version() in riak_object, which is carried
%% inside the wire format in "BinObj". w0 implies v0. w1 implies v1.
-type(wire_version() :: w0 %% simple term_to_binary() legacy encoding
Contributor


can you add a comment on what w0/w1 + w2 are?

@jtuple
Contributor

jtuple commented Dec 19, 2013

Pull-request that adds necessary support to K/V: basho/riak_kv#771

@lordnull
Contributor

Ran riak_test against this, and passes. Code looks good. 👍 .

M = orddict:new(),
case riak_object:bucket(Obj) of
{Type, _B} ->
PropsHash = riak_repl_util:get_bucket_props_hash(riak_core_bucket_type:get(Type)),

just verifying: is it ok to assume here that the type always exists? get/1 may return undefined and get_bucket_props_hash/1 does not account for that case -- it will crash on the get_value/2 call here:

PB = [{Prop, proplists:get_value(Prop, Props)} || Prop <- ?BUCKET_TYPES_PROPS],
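A defensive variant of the helper, sketched here as an assumption (the undefined clause is not in the PR; it assumes the existing ?BUCKET_TYPES_PROPS macro is in scope), would return a sentinel instead of crashing:

```erlang
%% Sketch only: tolerate an undefined type lookup rather than crashing
%% in proplists:get_value/2. The undefined clause is hypothetical.
get_bucket_props_hash(undefined) ->
    undefined;
get_bucket_props_hash(Props) ->
    PB = [{Prop, proplists:get_value(Prop, Props)} || Prop <- ?BUCKET_TYPES_PROPS],
    erlang:phash2(PB).
```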


I guess it is safe to assume, because to have the riak_object at all, the type must have existed and been activated when the object was written.

Contributor Author


That was my assumption.

@@ -52,8 +56,14 @@
%% the to_wire() and from_wire() operations, see riak_repl_util.erl.
%% Also see analogous binary_version() in riak_object, which is carried
%% inside the wire format in "BinObj". w0 implies v0. w1 implies v1.
%% w2 still uses v1, but supports encoding the type information
%% for bucket types.


What happens on a cluster configured with v0 objects that is using bucket types?

Contributor


Is that even possible? A cluster that supports bucket types will be using v1 objects. But it is possible that we'll be sending v0 objects to a cluster because it doesn't support binary riak objects. Would we ever be sending bucket types to such a cluster?


I think bucket types and object version 1 are orthogonal features. If they aren't, then this is not a problem.


I just ran the bucket_types riak_test with object_format set to v0, and the test passed. This means that it is a valid permutation and must be handled by this PR. I will try to run the riak_test for this feature with v0 objects at some point, possibly not until Monday.

This should prevent merging until it is dealt with.

Contributor Author


I do not fully understand this scenario. What does it mean that a cluster is "configured with v0 objects"?

riak_repl2_rt:postcommit/1 hard-codes the encoding to w1, which appears to encode to v1. I guess what you mean is that the to_wire with w1 can handle v0 objects, but w2 cannot? Is that correct?

@Vagabond

I'm generally OK with this PR, after reading it. My main concern is with all the lager:debugs in the critical path, for every single write. lager:debug is cheap but not free, and it will completely spam the logs if anyone ever turns on global debugging.

I have not run the code, nor do I have time to until next week.

…itical path; made version check forward compatible; renamed riak_repl2_rtsource_conn:write_object -> maybe_write_object
@Vagabond

I'm out of time on this for now, but I'm against this feature merging right now. I'm not satisfied that the v0-objects-with-bucket-types edge case is covered, and the riak_test has some serious flaws and omissions.

I will try to continue working with the test next week.

case make_donefun(BinObjsMeta, Me, Ref, Seq) of
{DoneFun, BinObjs, Meta} ->
case maybe_write_object(Meta) of
true ->


I think indenting here is a little off.

@Vagabond

So for posterity, the v0/v1 issue was a non-issue because the addition of bucket type metadata is independent of the wire encoding. v0 objects have their type extracted and placed in the metadata and are thus subject to all the normal rules.

With that, and the addressing of issues in the test, I hereby provide my +1 to merge.

@bowrocker
Contributor Author

Changes were squashed and merged:

c51c365

@bowrocker bowrocker closed this Dec 23, 2013
@seancribbs seancribbs deleted the feature/jra/repl_bucket_types_2 branch April 1, 2015 23:47
8 participants