Typed Bucket Replication Support #490
Conversation
{Type, _B} ->
    PropsHash = riak_repl_util:get_bucket_props_hash(riak_core_bucket_type:get(Type)),
    lager:debug("typed bucket, setting meta data: Type:~p, HashProps:~p", [Type, PropsHash]),
    M1 = orddict:store(typed_bucket, true, M),
it might be nice to define typed_bucket
as a constant.
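A minimal sketch of what that might look like; the macro and module names here are assumptions for illustration, not code from the PR:

```erlang
%% Illustrative only: pull the bare atom into a shared macro so the
%% producer and any consumer of the metadata key cannot drift apart.
-module(typed_bucket_const).
-export([store_typed_flag/1]).

%% Hypothetical constant; in the real code this would live in a
%% shared .hrl include.
-define(TYPED_BUCKET, typed_bucket).

store_typed_flag(Meta) ->
    orddict:store(?TYPED_BUCKET, true, Meta).
```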
%% in the new format as obtained from riak_object:to_binary(v1, RObj).
new_w1(B, K, BinObj) when is_binary(B), is_binary(K), is_binary(BinObj) ->
I feel that the wire version must be increased to w2 and the version increased to denote the new encoding of the Bucket Type in the object. That way, any code that encounters it can either work or safely ignore it and log it. Perhaps even better would be to add the bucket type as a property item in a list and say that for wire version 2, we now support metadata along with the bucket and key. This might avoid future wire changes and make the systems more flexible. Let me know if you want help with what that would look like in our encoded binary.
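To make the suggestion concrete, a w2-style frame carrying bucket, key, and an open-ended metadata proplist could be sketched like this; the module name, tag byte, and exact layout are all assumptions, not the PR's actual encoding:

```erlang
-module(wire_v2_sketch).
-export([encode/3, decode/1]).

%% Frame: one-byte wire version, then length-prefixed bucket, key, and
%% a term_to_binary'd metadata proplist. New fields (e.g. bucket type)
%% go into the proplist, so the framing never has to change again.
encode(B, K, Meta) when is_binary(B), is_binary(K), is_list(Meta) ->
    MetaBin = term_to_binary(Meta),
    <<2:8, (byte_size(B)):32, B/binary,
           (byte_size(K)):32, K/binary,
           (byte_size(MetaBin)):32, MetaBin/binary>>.

%% A decoder that only knows older versions can match the tag byte,
%% log the frame, and safely ignore it.
decode(<<2:8, BL:32, B:BL/binary, KL:32, K:KL/binary,
         ML:32, MetaBin:ML/binary>>) ->
    {ok, B, K, binary_to_term(MetaBin)};
decode(<<V:8, _Rest/binary>>) ->
    {error, {unknown_wire_version, V}}.
```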
…or bucket type handling
@@ -53,7 +57,11 @@
%% Also see analogous binary_version() in riak_object, which is carried
%% inside the wire format in "BinObj". w0 implies v0. w1 implies v1.
-type(wire_version() :: w0 %% simple term_to_binary() legacy encoding
can you add a comment on what w0/w1 + w2 are?
Pull-request that adds necessary support to K/V: basho/riak_kv#771
Ran riak_test against this, and it passes. Code looks good. 👍
M = orddict:new(),
case riak_object:bucket(Obj) of
    {Type, _B} ->
        PropsHash = riak_repl_util:get_bucket_props_hash(riak_core_bucket_type:get(Type)),
Just verifying: is it OK to assume here that the type always exists? get/1 may return undefined, and get_bucket_props_hash/1 does not account for that case -- it will crash on the proplists:get_value/2 call here (riak_repl/src/riak_repl_util.erl, line 961 at 2dfb083):

    PB = [{Prop, proplists:get_value(Prop, Props)} || Prop <- ?BUCKET_TYPES_PROPS],
I guess it is safe to assume, because in order to have the riak_object at all, the type must have existed/been activated in order to write it.
That was my assumption.
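Even if the assumption holds, a defensive wrapper would turn the failure mode into a tagged error rather than a crash in get_value/2. This is only a sketch, with the lookup function passed in to stand for riak_core_bucket_type:get/1:

```erlang
-module(props_hash_guard).
-export([safe_props_hash/2]).

%% Only hash the properties when the type actually resolves; return a
%% tagged error (which the caller can log) instead of crashing when
%% the lookup yields undefined.
safe_props_hash(Type, LookupFun) ->
    case LookupFun(Type) of
        undefined ->
            {error, {undefined_bucket_type, Type}};
        Props when is_list(Props) ->
            {ok, erlang:phash2(lists:sort(Props))}
    end.
```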
@@ -52,8 +56,14 @@
%% the to_wire() and from_wire() operations, see riak_repl_util.erl.
%% Also see analogous binary_version() in riak_object, which is carried
%% inside the wire format in "BinObj". w0 implies v0. w1 implies v1.
%% w2 still uses v1, but supports encoding the type information
%% for bucket types.
What happens on a cluster configured with v0 objects that is using bucket types?
Is that even possible? A cluster that supports bucket types will be using v1 objects. But it is possible that we'll be sending v0 objects to a cluster because it doesn't support binary riak objects. Would we ever be sending bucket types to such a cluster?
I think bucket types and object version 1 are orthogonal features. If they aren't, then this is not a problem.
I just ran the bucket_types riak_test with object_format set to v0, and the test passed. This means that it is a valid permutation and must be handled by this PR. I will try to run the riak_test for this feature with v0 objects at some point, possibly not until Monday.
This should prevent merging until it is dealt with.
I do not fully understand this scenario. What does it mean that a cluster is "configured with v0 objects"?
riak_repl2_rt:postcommit/1 hard-codes the encoding to w1, which appears to encode to v1. I guess what you mean is that the to_wire with w1 can handle v0 objects, but w2 cannot? Is that correct?
I'm generally OK with this PR, after reading it. My main concern is with all the lager:debugs in the critical path, for every single write. lager:debug is cheap but not free, and it will also completely spam the logs if anyone ever turns on global debugging. I have not run the code, nor do I have time to until next week.
…itical path; made version check forward compatible; renamed riak_repl2_rtsource_conn:write_object -> maybe_write_object
I'm out of time on this for now, but I'm against this feature merging right now. I'm not satisfied that the v0-objects-with-bucket-types edge case is covered, and the riak_test has some serious flaws and omissions. I will try to continue working with the test next week.
case make_donefun(BinObjsMeta, Me, Ref, Seq) of
    {DoneFun, BinObjs, Meta} ->
        case maybe_write_object(Meta) of
            true ->
I think indenting here is a little off.
So for posterity, the v0/v1 issue was a non-issue because the addition of bucket type metadata is independent of the wire encoding. v0 objects have their type extracted and placed in the metadata and are thus subject to all the normal rules. With that, and the addressing of issues in the test, I hereby provide my +1 to merge.
Changes were squashed and merged:
Overview
Riak 2.0 brings with it typed buckets -- the ability to create a type and associate a bucket with that type:
basho/riak#362
In order to successfully replicate a typed object from one cluster to another, the type definition must exist and be equal on both clusters. MDC replication must handle the possibility that the type of a given object exists on the replication source cluster, but not on the sink cluster (repl 2 terminology).
At this time, there is no facility to automatically create types across clusters -- for example, to create a type that does not exist on the sink cluster -- so that replication can happen seamlessly. This could be an avenue to explore.
Additionally, replication must be extended to properly handle typed buckets. This work is assumed to come along with support for typed-bucket replication. "Default" type buckets (legacy buckets in Riak) continue to be automatically replicated as before.
This document discusses 2 possible options for implementing this in replication:
https://gist.github.com/bowrocker/7f0d9d6879493f1ac0e9
This PR implements the second option.
Type Checking on the Sink
This option allows typed buckets to be replicated to the sink cluster without any checking on the source. During the do_repl_put on the sink, the bucket is checked. If it has a type, that type is checked for existence; if it does not exist, the object is dropped, an error message is printed, and preferably an alarm of some sort is sent.
This option assumes that types should exist on both source and sink clusters, and that the non-existence of a type is an operational error that the user or CSE should take care of. Once the type is created, objects of that type will replicate correctly.
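The sink-side check described above might be sketched roughly like this; the module, function names, and the injected existence check are assumptions for illustration -- only the {Type, Bucket} shape comes from the code under review:

```erlang
-module(sink_type_check).
-export([maybe_put/2]).

%% Typed bucket: only accept the object if the type exists locally;
%% otherwise drop it and surface an error for logging/alarming.
maybe_put({Type, _Bucket} = B, TypeExistsFun) ->
    case TypeExistsFun(Type) of
        true  -> {ok, B};
        false -> {error, {bucket_type_not_found, Type}}
    end;
%% Default (untyped) buckets replicate as before.
maybe_put(Bucket, _TypeExistsFun) when is_binary(Bucket) ->
    {ok, Bucket}.
```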
Pros
Cons
Configuration
No new user configuration is introduced. If the sink is not configured with the type of an object being replicated, an error message is printed. It is the responsibility of the user to provision this type on the sink.
Mixed version replication
A new replication protocol version is introduced for bucket type support: {2,1}. The following is supported for each version:
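One simple way to gate typed-bucket traffic on the negotiated version might look like this (a sketch; the gate function is an assumption, and Erlang's term ordering on same-size tuples makes the comparison forward compatible with later versions):

```erlang
-module(repl_proto_gate).
-export([supports_typed_buckets/1]).

%% Typed-bucket objects are only forwarded to peers that negotiated
%% protocol {2,1} or later; older peers keep receiving default-bucket
%% traffic only. Tuple comparison handles {2,2}, {3,0}, etc.
supports_typed_buckets({Major, Minor}) ->
    {Major, Minor} >= {2, 1}.
```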
Dependencies
This PR depends upon code in the following riak_kv and riak_repl branches being merged, or the conditional post-commit hook present will not work:
https://github.com/basho/riak_kv/tree/jdb-conditional-postcommit
https://github.com/basho/riak_repl/tree/jdb-conditional-postcommit
Previous PR
#486
Riak Test
basho/riak_test#477