Add Bucket Types to Riak #362
Comments
Very good addition IMHO. Next to easier bucket configuration, an extra dimension really helps in structuring equally named buckets for different scenarios (instead of prefixing them yourself). I don't see any large problems in client implementation though... :-)
Is there any particular reason why link walking will be restricted to the `default` bucket type?
I don't see any large problems in client implementation, either. I'm going to guess that the .proto implementation will include an optional field for the bucket type. Barring a sudden introduction of new PBC methods, this looks good to me.
@gideondk @peschkaj great! Glad to hear you guys won't find it to be too much of a pain. @brunogirin There is no technical reason why it cannot be added. We may do so in a future release depending on demand, but for the upcoming release it probably won't make it in. For existing link walking users nothing will break, which I think is the most important thing. As usual an early community contribution is always welcome and I'll be happy to squeeze in a review where I can.
These don't really feel like types to me - more like "bucket groups". Separately, there seem to be some optional settings you can put in for all buckets that are in one group. I don't know if I'm just bikeshedding here - but I feel like you could just allow for some "/" characters in a bucket - a sort of simple bucket hierarchy - and get most of what you need. Buckets that aren't in any group (normal-looking Riak URLs) would maybe stay the same; and grouped buckets would just have another slash in the bucket part of the URL. e.g.
would be a GET for an object in a classic 'default' bucket
would be a GET for an object in a grouped one. I'm just looking at this from the point of view of someone who doesn't know the internals of Riak at all - so there could be all kinds of terrible things wrong with my counterproposal. And ultimately I don't think it matters too much. But I do like the fact that the URLs I'm fetching things from stay looking pretty much the same, even if I use the new features.
@uberbrady I think it's more constructive to think of them outside the context of the HTTP API. It clouds the issue.
@uberbrady one thing I didn't realize upon first read is that your version used the already "old" API and the RFC uses the "newer" APIs. The APIs you are referring to are not yet deprecated within Riak but they are also not commonly extended. The effect will be the same and the API you refer to will continue to work at least for buckets in the default type (it may be extended as well depending on time and further discussion).
We have hit one of the problems described with the current design, which is: we have some data where "last write wins" is fine and other data where `allow_mult: true` is the right way to handle things. So I support this proposal because it sounds like we can easily share the Riak infrastructure even though we might want mixed behaviors for concurrent writes within the same cluster.
@jrwest Thanks for the answer. If there is no technical blocker, that's brilliant, and it makes complete sense to me that you would want to limit the scope of a first delivery. From the point of view of what it would solve for me, it would make the multi backend concept more usable by being able to configure one backend for one bucket type and a different backend for another bucket type. I also like @mrallen1's use case.
I've been doing some minor testing and have some questions:
Awesome to see you taking things for a spin! My comments are below.
As mentioned elsewhere,
Good catch. We are aware not all validation is completely implemented,
You should be able to store your own properties. The validation does not
@jrwest the custom attribute is limited by
@lafka ah right. There is nothing stopping you from using internal APIs but the command-line parsing code will prevent it. Although we may not stay with JSON [1] for the

[1] #424
Integrate the bucket types functionality. basho/riak#362

Overall
-------
Bucket types are the future of namespacing and property creation in Riak. They allow efficient storage of "bucket properties" outside of the Ring and 2-level namespacing of `Type` and `Name`. Essentially the bucket type can now be either a lone `binary()` (legacy) or a 2-tuple of `{Type :: binary(), Name :: binary()}`. Internally, when the legacy version is encountered it is considered to live under the `default` bucket type. For example the bucket `<<"my_bucket">>` would become `{<<"default">>, <<"my_bucket">>}`.

Up until this point Yokozuna has used the bucket property `yz_index` to determine where to index data. This commit changes that in some ways. Legacy users will have existing data in buckets. Those buckets, in 2.0, will be considered to live under the default type as described above. For legacy buckets (the default type) Yokozuna will NOT respect the `yz_index` property. Rather it will act like Riak Search and use an index named the same as the bucket AS LONG AS the `search` property is set to true. Once users upgrade to 2.0 they should start opting into non-default bucket types since they are more efficient and newer features require their use. For these buckets Yokozuna will still use the `yz_index` property. This property will typically be set at the type level but can also be overridden per name under a type. Yokozuna doesn't care. If that `{Type, Name}` has a `yz_index` property then it will be indexed.

In summary:

* Legacy buckets (default type) will act like Riak Search. The index used must have the same name as the bucket and the `search` property must be true. This is to ease migration for users coming from Riak Search.
* All new users MUST use new-style buckets made of Type + Name. In most cases the `yz_index` property will be set on the type and thus inherited by all names under it (many buckets to one index). The index DOES NOT have to have the same name.

Handoff
-------
Another important change revolves around handoff. Since Yokozuna leeches off the KV vnode it doesn't have control over handoff like it would if it were a true vnode. When a node joins, KV can start shipping data before the bucket type data has been shipped over. In that case there will be no `yz_index` property and indexes will be missing. AAE would eventually catch this, but it is poor form that a node join would cause a degradation in harvest, especially in a quiescent cluster. To fix this Yokozuna needs more control over the lifecycle of the KV vnode. Yokozuna needs to hook into the `handoff_starting` stage and verify that the bucket types data is shipped before data handoff begins. This is accomplished by adding the `yz_kv:should_handoff` hook, which is hard-coded in the KV vnode for now.

This is important for removing the hack around index creation as well. Currently Yokozuna has a pretty horrible busy-wait hack in its index hook to make sure indexes are created on joining nodes before doing the first write of a handoff. This busy-wait blocks the KV vnode and is dangerous for vnode latency. In a future commit this busy-wait will be replaced with a check in this new handoff hook.

Removal of Automatic AAE Tree Clearing
--------------------------------------
Remove all functionality around automatic clearing of trees when adding or removing the `yz_index` property on a bucket with data. This was referred to as `sync_data` in the `yz_events` module, and also called "flags", harking back to when Yokozuna had a one-to-one bucket-to-index mapping. The original intention was that adding an index to a bucket with data should clear the AAE trees so that exchanges would start repairing missing indexes. If setting the index property to the tombstone value (removal) then a) data for that bucket should be purged from the index and b) AAE trees should be cleared. After much thought I think this implicit behavior hurts more than helps.

Actions like clearing all AAE trees can be very expensive. It will not be obvious to all users that adding or changing `yz_index` could cause expensive operations to occur. For example, after clearing the AAE trees for a database with billions or trillions of objects, the trees will be expensive to rebuild. Rather than relying on AAE, a more direct operation could be offered that allows the user to re-index a bucket or a subset of data. When removing an index it makes more sense to let the user delete the index entirely rather than do an implicit delete-by-query, which does a bunch of extra work for an index that is going to be deleted anyway.

Misc Changes
------------
* Update all tests to work with bucket types.
* Update Basho Bench driver to work with bucket types.
* Make map-reduce extraction more efficient. This is the ugly hack found in `yokozuna:positions`.
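The index-selection policy the commit message describes can be sketched as follows. This is an illustrative Python model of the stated rules, not Yokozuna's actual Erlang implementation; the function name `index_for` is hypothetical.

```python
def index_for(bucket, props):
    """Decide which search index (if any) an object belongs to.

    `bucket` is either a legacy name or a (type, name) 2-tuple;
    `props` is that bucket's property dict. A sketch of the policy
    described above, not Yokozuna's real code.
    """
    # Legacy buckets are treated as living under the "default" type.
    if not isinstance(bucket, tuple):
        bucket = ("default", bucket)
    btype, name = bucket

    if btype == "default":
        # Legacy path: act like Riak Search -- the index shares the
        # bucket's name, but only when the `search` property is true.
        return name if props.get("search") else None

    # Typed buckets: index only when `yz_index` is set (typically
    # inherited from the type, possibly overridden per bucket name).
    return props.get("yz_index")
```

For example, `index_for("my_bucket", {"search": True})` follows the Riak Search path, while `index_for(("logs", "app1"), {"yz_index": "logs_idx"})` uses the inherited `yz_index` property.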
As with buckets, we must teach core about bucket types because a few subsystems use them. Bucket Types provide a method for grouping buckets logically (see basho/riak#362).
Update locked deps and fix reltool.config for 2.1.2 rc6
Today, keys in Riak are made up of two parts: the bucket they belong to and a unique identifier within that bucket. Buckets act as a namespace and allow for similar keys to be grouped. In addition, they provide a means of configuring how Riak treats that data.
In the next release of Riak several new features will take advantage of this namespacing. In discussing these features, it has become apparent that to properly support them a change to Riak's key structure must be made. The change is to introduce another layer of namespacing called the "Bucket Type". With Bucket Types each key is now made up of three parts: the bucket type it belongs to, a unique bucket within that type, and a unique identifier within that bucket.
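The two-part versus three-part key structure can be pictured with plain tuples. This is a sketch only; the tuples and the `upgrade` helper are illustrative, not Riak API calls.

```python
# Old two-part key vs. new three-part key (illustrative tuples).
legacy_key = ("users", "user123")                 # (bucket, key)
typed_key = ("session_data", "users", "user123")  # (type, bucket, key)

def upgrade(bucket, key):
    """Existing data is addressed under the "default" type
    once Bucket Types exist (hypothetical helper)."""
    return ("default", bucket, key)

print(upgrade(*legacy_key))  # ('default', 'users', 'user123')
```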
Why?
The primary driver for adding Bucket Types is the introduction of features like security and strong consistency which may wish to deal with groups of buckets (see the Security RFC's wildcard proposal). Without Bucket Types these features must rely on special prefixes in Bucket names. Since Bucket names have never been restricted, this has undesirable edge cases. By introducing a new layer of namespacing, these features can refer to groups of Buckets by Bucket Type.
There is an additional user-facing benefit of Bucket Types. Currently in Riak, users have the choice of either: a) conforming all buckets to the default bucket properties (stored in app.config) or b) setting many custom properties for buckets, which is known to scale poorly. This is especially frustrating if the user has one set of buckets that fit the default properties and another set that all use a different set of properties. Using Bucket Types, users will be able to define groups of buckets that share the same properties, storing information only about each Bucket Type instead of about individual buckets. Combined with internal changes made in Riak to address this issue, this will scale much better.
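The property resolution this implies (cluster defaults, overlaid by the type's shared properties, overlaid by any per-bucket overrides) can be sketched as below. The merge order and default values here are assumptions for illustration, not Riak's actual implementation.

```python
# Assumed cluster-wide defaults, for illustration only.
DEFAULT_PROPS = {"n_val": 3, "allow_mult": False}

def effective_props(type_props, bucket_props=None):
    """Hypothetical sketch of property resolution under Bucket Types."""
    props = dict(DEFAULT_PROPS)       # cluster-wide defaults
    props.update(type_props)          # shared by every bucket of the type
    props.update(bucket_props or {})  # optional per-bucket override
    return props

# One type definition configures every bucket under it:
print(effective_props({"allow_mult": True}))
# {'n_val': 3, 'allow_mult': True}
```

The point is that only the type's property set needs to be stored and gossiped, rather than one property set per bucket.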
Goals and Requirements
The Default Bucket Type
Internally, Riak will be changed to be aware of Bucket Types as necessary. However, they are still an opt-in feature and as such existing data (and new data written to existing buckets) must be handled appropriately.
To allow existing data to live in the world of Bucket Types, all existing buckets are assigned to the `default` type. Riak's existing APIs will inject and strip the type information where necessary. To opt in, a user must write data in a non-default bucket type using new APIs. Unfortunately, this means additional work for client developers, but since Riak will continue to happily work with the old APIs it is not necessary for clients to be updated before the release.
APIs
As mentioned previously, no changes will be made to existing APIs and, in the case of protocol buffers, all fields added will be optional.
Create/Update/Read Bucket Types
HTTP endpoints and PB messages will be added to create, update and fetch the properties associated with a Bucket Type. In HTTP, this might look like:
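The RFC's inline HTTP example did not survive extraction. Purely as an illustration (the `/types/<type>/props` path shape and port are assumptions, not the settled API), the property endpoints could be shaped like this:

```python
# Hypothetical URL shape for bucket-type property endpoints.
BASE = "http://localhost:8098"  # assumed Riak HTTP listener

def type_props_url(bucket_type):
    # GET to read, PUT to create/update the type's properties
    # (path shape is an assumption for illustration only).
    return f"{BASE}/types/{bucket_type}/props"

print(type_props_url("session_data"))
# http://localhost:8098/types/session_data/props
```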
Object/Key API
The HTTP Fetch/Store/Delete API will be mirrored to take the Bucket Type. This might look like:
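The example here was also lost in extraction. As a hedged sketch, "mirroring" the existing API could mean prefixing the classic `/buckets/<bucket>/keys/<key>` path with the type; the exact path shape is an assumption for illustration.

```python
BASE = "http://localhost:8098"  # assumed Riak HTTP listener

def object_url(bucket_type, bucket, key):
    # Hypothetical typed path mirroring the classic object URL.
    return f"{BASE}/types/{bucket_type}/buckets/{bucket}/keys/{key}"

def legacy_object_url(bucket, key):
    # The existing API keeps working for default-type data.
    return f"{BASE}/buckets/{bucket}/keys/{key}"

print(object_url("session_data", "users", "user123"))
```

Note the legacy form stays valid, which is what lets clients upgrade at their own pace.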
The corresponding protocol buffers messages will have an optional Bucket Type field added.

2i
The HTTP 2i API will be mirrored to take the Bucket Type. This might look like:
The corresponding protocol buffers messages will have an optional Bucket Type field added.

Link Walking
Link walking will only be supported for data stored within the `default` bucket type. No API changes will be made.

MapReduce
MapReduce will be extended to take inputs and queries in buckets other than the default type.
Datatypes API
How the counters API will be extended is still undecided since work is ongoing to extend the API to support other datatypes. Of course the existing API will continue to work as promised by this RFC.
Upgrading and Downgrading
Upgrading is handled naturally by the treatment of all existing data as a member of the `default` type.

Downgrading, however, is a bit more complicated. If a user has not opted in by using the new APIs (or has only written to the `default` bucket type), then downgrading stands as it does for all versions of Riak. For users that have opted in to the new features, downgrading becomes more painful because older versions of Riak will not understand data stored in Bucket Types other than `default`. A user should only opt in after upgrading to the new release and using the existing APIs until satisfied that the upgrade does not harm the application. Rolling back to a version that does not support bucket types will require removing (or moving aside) all data stored in non-default types.