Typed Bucket Replication Support #486
for 2.0
changed bucket type checking to use meta data to send Bucket type properties and compare
fixed compile error
added backwardly compatible meta-data handling
took out commented line
case riak_object:bucket(Obj) of
    {Type, _B} ->
        AllProps = riak_core_bucket_type:get(Type),
        PropsHash = erlang:phash2(proplists:delete(claimant, AllProps)),
This is probably why, long term, it's better to have a "whitelist" of properties we include in the hash rather than a "blacklist" of ones we don't.
Agreed -- we should discuss once we have an idea of what those properties are, then we can do it that way. I'd greatly prefer that.
we sort of need to discuss soon don't we :)
Yeah, but we can probably use what's here for now, then open an issue to fix it...
Well, I guess it depends. It's getting kind of late for the open-an-issue thing (that basically means it's a bug for 2.0 and a fix in 2.0.1 or 2.1). One immediate problem I see is if a user has properties that are only meaningful to them -- bucket (type) properties support arbitrary key/value pairs. By not limiting this list we open ourselves to all sorts of weirdness there.
I don't think it would be too bad to gather up a list of things that we should check. Previously repl did no checking iirc, so it's kind of hard for us to get it wrong, isn't it? Off the top of my head the ones that probably matter are:
consistent
datatype
n_val
allow_mult
last_write_wins
Those are the ones I can think of that affect how data is stored. We shouldn't require that users have the same default w or r values on both sides, etc.
I also wonder if we should expose the hashes of the properties to the user, so that when things aren't replicating they can check whether the hashes in fact match on both sides.
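A minimal sketch of the whitelist approach discussed in this thread, assuming the five properties listed above are the ones that matter. The module name, the HASHED_PROPS macro, and props_hash/1 are hypothetical and not part of riak_repl; only erlang:phash2/1 and the lists functions are standard.

```erlang
-module(repl_props_sketch).
-export([props_hash/1]).

%% Hypothetical whitelist: the properties named in the thread above.
-define(HASHED_PROPS,
        [consistent, datatype, n_val, allow_mult, last_write_wins]).

%% Hash only whitelisted properties, so that user-defined key/value
%% pairs and per-cluster defaults such as r or w cannot cause a mismatch.
%% Entries that are not {Key, Value} tuples are skipped by the generator.
props_hash(AllProps) ->
    Kept = [{K, V} || {K, V} <- AllProps, lists:member(K, ?HASHED_PROPS)],
    %% Sort so the hash does not depend on property order.
    erlang:phash2(lists:usort(Kept)).
```

With this shape, two clusters whose types differ only in r, w, or custom properties would produce the same hash, while a difference in n_val or allow_mult would not.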
Closing -- new PR here:
Overview
Riak 2.0 brings with it typed buckets -- the ability to create a type and associate a bucket with that type:
basho/riak#362
In order to successfully replicate a typed object from one cluster to another, the type definition must exist and be equal on both clusters. MDC replication must handle the possibility that the type of a given object exists on the replication source cluster, but not on the sink cluster (repl 2 terminology).
At this time, there is no facility to automatically create types across clusters -- for example, to create a type that does not exist on the sink cluster -- so that replication happens seamlessly. This could be an avenue to explore.
Additionally, replication must be extended to properly handle typed buckets. This work is assumed to come along with support for typed-bucket replication. "Default" type buckets (legacy buckets in Riak) continue to be automatically replicated as before.
This document discusses 2 possible options for implementing this in replication:
https://gist.github.com/bowrocker/7f0d9d6879493f1ac0e9
This PR implements the second option.
Type Checking on the Sink
This option allows typed buckets to be replicated to the sink cluster without any checking on the source. During the do_repl_put on the sink, the bucket is checked. If the bucket has a type and that type does not exist on the sink, the object is dropped and an error message is printed; preferably an alarm of some sort is also sent.
This option assumes that types should exist on both source and sink clusters, and that the nonexistence of a type is an operational error that the user or CSE should take care of. Once the type is created, objects of that type will replicate correctly.
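The sink-side check described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the module and function names are hypothetical, and the type lookup is passed in as a fun standing in for riak_core_bucket_type:get/1, which is assumed here to return undefined for a type that does not exist.

```erlang
-module(repl_sink_sketch).
-export([check_type/2]).

%% Decide whether an incoming object may be written on the sink.
%% GetType is a stand-in for the bucket type lookup (an assumption,
%% injected so the sketch is self-contained).
check_type({Type, _Bucket}, GetType) ->
    case GetType(Type) of
        undefined ->
            %% Type is missing on the sink: the caller should drop the
            %% object and log an error.
            {drop, {missing_type, Type}};
        _Props ->
            ok
    end;
check_type(_UntypedBucket, _GetType) ->
    %% "Default" type (legacy) buckets replicate as before.
    ok.
```

A caller in the put path would proceed with the write on ok and log the reason on {drop, Reason}.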
Pros
Cons
Configuration
No new user configuration is introduced. If the sink is not configured with the type of an object being replicated, an error message is printed. It is the responsibility of the user to provision this type on the sink.
Mixed version replication
A new replication protocol version is introduced for bucket type support: {2,1}. The following is supported for each version:
Dependencies
This PR depends on code in the following riak_kv branch being merged; without it, the conditional post-commit hook will not work:
https://github.com/basho/riak_repl/tree/jdb-conditional-postcommit
Original PR
#477