
Typed Bucket Replication Support #490

Closed · wants to merge 9 commits

Conversation

bowrocker
Contributor

Overview

Riak 2.0 brings with it typed buckets -- the ability to create a type and associate a bucket with that type:

basho/riak#362

In order to successfully replicate a typed object from one cluster to another, the type definition must exist and be equal on both clusters. MDC replication must handle the possibility that the type of a given object exists on the replication source cluster, but not on the sink cluster (repl 2 terminology).

At this time, there is no facility to automatically create types across clusters -- for example, to create a type on the sink cluster when it exists only on the source -- so that replication happens seamlessly. This could be an avenue to explore.

Additionally, replication must be extended to properly handle typed buckets. This work is assumed to come along with support for typed-bucket replication. "Default" type buckets (legacy buckets in Riak) continue to be automatically replicated as before.

This document discusses 2 possible options for implementing this in replication:

https://gist.github.com/bowrocker/7f0d9d6879493f1ac0e9

This PR implements the second option.

Type Checking on the Sink

This option allows typed buckets to be replicated to the sink cluster without any checking on the source. During do_repl_put on the sink, the bucket is checked. If it has a type, that type is checked for existence; if the type does not exist, the object is dropped, an error message is logged, and preferably an alarm of some sort is raised.

This option assumes that types should exist on both source and sink clusters, and that the non-existence of a type is an operational error that the user or CSE should take care of. Once the type is created, objects of that type will replicate correctly.
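The sink-side check described above could look roughly like the following. This is an illustrative sketch, not the PR's actual code -- the function name maybe_do_repl_put and the exact error handling are assumptions; riak_core_bucket_type:get/1 returning undefined for an unknown type matches the behavior discussed later in this thread.

```erlang
%% Hypothetical sketch of the sink-side type check. Names are
%% illustrative, not the PR's actual code.
maybe_do_repl_put(Obj) ->
    case riak_object:bucket(Obj) of
        {Type, _Bucket} ->
            case riak_core_bucket_type:get(Type) of
                undefined ->
                    %% Type not provisioned on the sink: drop the
                    %% object and surface the error to the operator.
                    lager:error("dropped replicated object: bucket type ~p "
                                "does not exist on the sink", [Type]),
                    {error, bucket_type_not_found};
                _Props ->
                    do_repl_put(Obj)
            end;
        _Bucket ->
            %% "default"-type (legacy) buckets replicate as before.
            do_repl_put(Obj)
    end.
```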

Pros
  • This is the simplest solution. Very few moving parts, very low impact on operations and performance.
Cons
  • Bandwidth, RTQ space, and processing are wasted when an object whose type does not exist on the sink cluster is replicated.
  • It assumes that all types should exist on both sides.

Configuration

No new user configuration is introduced. If the sink is not configured with the type of an object being replicated, an error message is printed. It is the responsibility of the user to provision this type on the sink.

Mixed version replication

A new replication protocol version is introduced for bucket type support: {2,1}. The following is supported for each version:

Source   Sink            Result
{3,0}    {3,0}           "default" and user-defined typed buckets replicated
{3,0}    {2,0} or less   only "default" buckets replicated
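The gate implied by the table above could be sketched as follows. This is a hypothetical illustration -- the function and variable names are not from the PR -- showing only the decision of whether a typed object should be forwarded to a given peer:

```erlang
%% Illustrative sketch of the mixed-version gate: typed buckets are
%% only forwarded when the negotiated peer protocol supports them.
should_send_typed(Obj, {Major, _Minor} = _PeerProto) ->
    case riak_object:bucket(Obj) of
        {_Type, _B} when Major < 3 ->
            %% Peer predates bucket-type support: only "default"
            %% buckets replicate, so skip this typed object.
            false;
        _ ->
            %% Default buckets always go; typed buckets go when the
            %% peer speaks {3,0} or newer.
            true
    end.
```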

Dependencies

This PR depends on code in the following branches being merged; otherwise the conditional post-commit hook will not work:

https://github.com/basho/riak_kv/tree/jdb-conditional-postcommit
https://github.com/basho/riak_repl/tree/jdb-conditional-postcommit

Previous PR

#486

Riak Test

basho/riak_test#477

{Type, _B} ->
PropsHash = riak_repl_util:get_bucket_props_hash(riak_core_bucket_type:get(Type)),
lager:debug("typed bucket, setting meta data: Type:~p, HashProps:~p", [Type, PropsHash]),
M1 = orddict:store(typed_bucket, true, M),
Contributor

it might be nice to define typed_bucket as a constant.
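The suggestion could be as simple as a module-level macro; this is just an illustrative sketch, not the PR's code:

```erlang
%% Define the metadata key once so every module spells it the same way.
-define(TYPED_BUCKET, typed_bucket).

%% ...and at the call site quoted above:
M1 = orddict:store(?TYPED_BUCKET, true, M),
```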

%% in the new format as obtained from riak_object:to_binary(v1, RObj).
new_w1(B, K, BinObj) when is_binary(B), is_binary(K), is_binary(BinObj) ->
Contributor


I feel that the wire version must be increased to w2 and the version increased to denote the new encoding of the Bucket Type in the object. That way, any code that encounters it can either work or safely ignore it and log it. Perhaps even better would be add the bucket type as a property item in a list and say that for wire version 2, we now support meta data along with the bucket and key. This might avoid future wire changes and make the systems more flexible. Let me know if you want help with what that would look like in our encoded binary.
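One possible shape for this suggestion -- entirely hypothetical, offered only to make the idea concrete -- is a w2 frame that carries an arbitrary metadata proplist (which could include the bucket type) ahead of the bucket, key, and encoded object:

```erlang
%% Hypothetical w2 framing: a length-prefixed metadata proplist before
%% the bucket, key, and encoded object. Not the PR's actual format.
to_wire_w2(Meta, B, K, BinObj)
  when is_list(Meta), is_binary(B), is_binary(K), is_binary(BinObj) ->
    MetaBin = term_to_binary(Meta),
    <<2:8/integer,                                  %% wire version tag: w2
      (byte_size(MetaBin)):32/integer, MetaBin/binary,
      (byte_size(B)):32/integer, B/binary,
      (byte_size(K)):32/integer, K/binary,
      BinObj/binary>>.
```

Because the metadata is an open-ended proplist, later additions would not require another wire-version bump, which is the flexibility the comment argues for.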

@@ -53,7 +57,11 @@
%% Also see analogous binary_version() in riak_object, which is carried
%% inside the wire format in "BinObj". w0 implies v0. w1 implies v1.
-type(wire_version() :: w0 %% simple term_to_binary() legacy encoding
Contributor


can you add a comment on what w0/w1 + w2 are?

@jtuple
Contributor

jtuple commented Dec 19, 2013

Pull-request that adds necessary support to K/V: basho/riak_kv#771

@lordnull
Contributor

Ran riak_test against this, and passes. Code looks good. 👍 .

M = orddict:new(),
case riak_object:bucket(Obj) of
{Type, _B} ->
PropsHash = riak_repl_util:get_bucket_props_hash(riak_core_bucket_type:get(Type)),

just verifying: is it ok to assume here that the type always exists? get/1 may return undefined and get_bucket_props_hash/1 does not account for that case -- it will crash on the get_value/2 call here:

PB = [{Prop, proplists:get_value(Prop, Props)} || Prop <- ?BUCKET_TYPES_PROPS],
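A defensive variant of the helper, sketched here as an assumption (the undefined clause is not in the PR; it assumes the existing ?BUCKET_TYPES_PROPS macro is in scope), would return a sentinel instead of crashing:

```erlang
%% Sketch only: tolerate an undefined type lookup rather than crashing
%% in proplists:get_value/2. The undefined clause is hypothetical.
get_bucket_props_hash(undefined) ->
    undefined;
get_bucket_props_hash(Props) ->
    PB = [{Prop, proplists:get_value(Prop, Props)} || Prop <- ?BUCKET_TYPES_PROPS],
    erlang:phash2(PB).
```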


I guess it is safe to assume, because to have the riak_object at all, the type must have existed and been activated when the object was written.

Contributor Author


That was my assumption.

@@ -52,8 +56,14 @@
%% the to_wire() and from_wire() operations, see riak_repl_util.erl.
%% Also see analogous binary_version() in riak_object, which is carried
%% inside the wire format in "BinObj". w0 implies v0. w1 implies v1.
%% w2 still uses v1, but supports encoding the type information
%% for bucket types.


What happens on a cluster configured with v0 objects that is using bucket types?

Contributor


Is that even possible? A cluster that supports bucket types will be using v1 objects. But it is possible that we'll be sending v0 objects to a cluster because it doesn't support binary riak objects. Would we ever be sending bucket types to such a cluster?


I think bucket types and object version 1 are orthogonal features. If they aren't, then this is not a problem.


I just ran the bucket_types riak_test with object_format set to v0, and the test passed. This means that it is a valid permutation and must be handled by this PR. I will try to run the riak_test for this feature with v0 objects at some point, possibly not until Monday.

This should prevent merging until it is dealt with.

Contributor Author


I do not fully understand this scenario. What does it mean that a cluster is "configured with v0 objects"?

riak_repl2_rt:postcommit/1 hard-codes the encoding to w1, which appears to encode to v1. I guess what you mean is that the to_wire with w1 can handle v0 objects, but w2 cannot? Is that correct?

@Vagabond

I'm generally OK with this PR, after reading it. My main concern is with all the lager:debugs in the critical path, for every single write. lager:debug is cheap but not free, and it will completely spam the logs if anyone ever turns on global debugging.

I have not run the code, nor do I have time to until next week.

…itical path; made version check forward compatible; renamed riak_repl2_rtsource_conn:write_object -> maybe_write_object
@Vagabond

I'm out of time on this for now, but I'm against this feature merging right now. I'm not satisfied that the v0-objects-with-bucket-types edge case is covered, and the riak_test has some serious flaws and omissions.

I will try to continue working with the test next week.

case make_donefun(BinObjsMeta, Me, Ref, Seq) of
{DoneFun, BinObjs, Meta} ->
case maybe_write_object(Meta) of
true ->


I think indenting here is a little off.

@Vagabond

So for posterity, the v0/v1 issue was a non-issue because the addition of bucket type metadata is independent of the wire encoding. v0 objects have their type extracted and placed in the metadata and are thus subject to all the normal rules.

With that, and the addressing of issues in the test, I hereby provide my +1 to merge.

@bowrocker
Contributor Author

Changes were squashed and merged:

c51c365

@bowrocker bowrocker closed this Dec 23, 2013
@seancribbs seancribbs deleted the feature/jra/repl_bucket_types_2 branch April 1, 2015 23:47
8 participants