
Add Bucket Types to Riak #362

Closed
jrwest opened this issue Aug 12, 2013 · 14 comments

@jrwest
Contributor

jrwest commented Aug 12, 2013

Today, keys in Riak are made up of two parts: the bucket they belong to and a unique identifier within that bucket. Buckets act as a namespace and allow for similar keys to be grouped. In addition, they provide a means of configuring how Riak treats that data.

In the next release of Riak several new features will take advantage of this namespacing. In discussing these features, it has become apparent that to properly support them a change to Riak's key structure must be made. The change is to introduce another layer of namespacing called the "Bucket Type". With Bucket Types each key is now made up of three parts: the bucket type it belongs to, a unique bucket within that type, and a unique identifier within that bucket.

Why?

The primary driver for adding Bucket Types is the introduction of features like security and strong consistency, which may wish to deal with groups of buckets (see the Security RFC's wildcard proposal). Without Bucket Types, these features must rely on special prefixes in Bucket names. Since Bucket names have never been restricted, this has undesirable edge cases. By introducing a new layer of namespacing, these features can refer to groups of Buckets by Bucket Type.

There is an additional user-facing benefit of Bucket Types. Currently, Riak users have the choice of either: a) conforming all buckets to the default bucket properties (stored in app.config), or b) setting many custom properties per bucket, which is known to scale poorly. This is especially frustrating if the user has some buckets that fit the default properties and another set that all use a different set of properties. Using Bucket Types, users will be able to define groups of buckets that share the same properties, storing that information once per Bucket Type instead of per bucket. Combined with internal changes made in Riak to help this issue, this will scale much better.

Goals and Requirements

  • Opt-In: All changes must not affect old APIs. For existing buckets (with or without custom properties) and the data stored within them, access and storage will remain the same.
  • Zero data migration on upgrade (or downgrade if the user has not stored any data in a Bucket Type other than the default type).
  • Properties can be set on Bucket Types. Buckets within that type inherit those properties; any properties set on the Bucket override those set on the Bucket Type.
  • Unlike Buckets, Bucket Types must be explicitly created. If a Bucket Type does not exist requests for data stored in that type will be rejected. This makes listing types quick and painless (a common gripe with Buckets).
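The property inheritance described above can be sketched as a simple precedence merge (a sketch only, with illustrative property names; this is not Riak's actual Erlang implementation):

```python
# Cluster-wide defaults (illustrative values, as would come from app.config).
DEFAULT_PROPS = {"n_val": 3, "allow_mult": False}

def effective_props(type_props, bucket_props):
    """Merge properties with increasing precedence:
    defaults < bucket type < bucket."""
    merged = dict(DEFAULT_PROPS)
    merged.update(type_props)    # type-level settings override defaults
    merged.update(bucket_props)  # bucket-level settings override the type
    return merged

# A type that enables siblings; one bucket within it overrides n_val.
props = effective_props({"allow_mult": True}, {"n_val": 5})
```

The key point is that a bucket only needs stored properties when it deviates from its type, which is why grouping buckets under a type scales better than per-bucket custom properties.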

The Default Bucket Type

Internally, Riak will be changed to be aware of Bucket Types as necessary. However, they are still an opt-in feature and as such existing data (and new data written to existing buckets) must be handled appropriately.

To allow existing data to live in the world of Bucket Types, all existing buckets are assigned to the default type. Riak's existing APIs will inject and strip the type information where necessary.

To opt in, a user must write data to a non-default bucket type using the new APIs. Unfortunately, this means additional work for client developers, but since Riak will continue to work happily with the old APIs, it is not necessary for clients to be updated before the release.
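The default-type mapping can be sketched as follows (an illustrative sketch, not Riak's internals, which do this in Erlang with `{<<"default">>, Bucket}` tuples):

```python
DEFAULT_TYPE = "default"

def to_typed(bucket):
    """Inject the type: a legacy bucket name becomes a (type, bucket) pair."""
    if isinstance(bucket, tuple):
        return bucket
    return (DEFAULT_TYPE, bucket)

def to_legacy(typed_bucket):
    """Strip the type for the old APIs; only default-type data is reachable."""
    bucket_type, bucket = typed_bucket
    if bucket_type != DEFAULT_TYPE:
        raise ValueError("non-default types are not visible via old APIs")
    return bucket
```

This is why the upgrade requires zero data migration: existing keys are simply reinterpreted as living under the default type.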

APIs

As mentioned previously, no changes will be made to existing APIs, and in the case of protocol buffers all added fields will be optional.

Create/Update/Read Bucket Types

HTTP endpoints and PB messages will be added to create, update and fetch the properties associated with a Bucket Type. In HTTP, this might look like:

POST /types/<type>/props
PUT  /types/<type>/props
GET  /types/<type>/props
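For illustration, the body of a create/update request might carry the properties as JSON (the `"props"` wrapper and field names here are assumptions for the sketch, not confirmed by the RFC):

```python
import json

# Hypothetical payload for PUT /types/<type>/props. The property names
# mirror existing bucket properties; the exact wire format is undecided.
body = json.dumps({"props": {"n_val": 3, "allow_mult": True}})

# A client would send this with Content-Type: application/json.
decoded = json.loads(body)
```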

Object/Key API

The HTTP Fetch/Store/Delete API will be mirrored to take the Bucket Type. This might look like:

GET /types/<type>/buckets/<bucket>/keys/<key>
PUT /types/<type>/buckets/<bucket>/keys/<key>
..etc..

The corresponding protocol buffers messages will have an optional Bucket Type field added.
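A client-side sketch of how the typed path nests the existing one (function name and base-URL handling are illustrative, not a real client API):

```python
def object_url(base, bucket, key, bucket_type=None):
    """Build an object URL; omitting bucket_type falls back to the
    existing (default-type) API, preserving the opt-in guarantee."""
    if bucket_type is None:
        return f"{base}/buckets/{bucket}/keys/{key}"
    return f"{base}/types/{bucket_type}/buckets/{bucket}/keys/{key}"
```

Because the old form remains valid, existing client code needs no changes until it opts in to a non-default type.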

2i

The HTTP 2i API will be mirrored to take the Bucket Type. This might look like:

GET /types/<type>/buckets/<bucket>/index/<index>/...

The corresponding protocol buffers messages will have an optional Bucket Type field added.

Link Walking

Link walking will only be supported for data stored within the default bucket type. No API changes will be made.

MapReduce

MapReduce will be extended to take inputs and queries in buckets other than the default type.

Datatypes API

How the counters API will be extended is still undecided, since work is ongoing to extend the API to support other datatypes. Of course, the existing API will continue to work as promised by this RFC.

Upgrading and Downgrading

Upgrading is handled naturally by the treatment of all existing data as a member of the default type.

Downgrading, however, is a bit more complicated. If a user has not opted in by using the new APIs (or has only written to the default bucket type), then downgrading works as it does for all versions of Riak.

For users that have opted in to the new features, downgrading becomes more painful because older versions of Riak will not understand data stored in Bucket Types other than default. A user should only opt in after upgrading to the new release and using the existing APIs until satisfied that the upgrade does not harm the application. Rolling back to a version that does not support bucket types will require removing (or moving aside) all data stored in non-default types.

@gideondk

Very good addition IMHO. Next to easier bucket configuration, an extra dimension really helps in structuring equally named buckets for different scenarios (instead of prefixing them yourself).

Don't see any large problems in client implementation though... :-)

@brunogirin

Is there any particular reason why link walking will be restricted to the default bucket type?

@peschkaj

I don't see any large problems in client implementation, either. I'm going to guess that the .proto implementation will include an optional field for type which means sane protocol buffers clients shouldn't have to make any changes or do any version detection dance.

Barring a sudden introduction of new PBC methods, this looks good to me.

@jrwest
Contributor Author

jrwest commented Aug 12, 2013

@gideondk @peschkaj great! Glad to hear you guys won't find it to be too much of a pain.

@brunogirin There is no technical reason why it cannot be added. We may do so in a future release depending on demand but for the upcoming release it probably won't make it in. For existing link walking users nothing will break, which I think is the most important thing. As usual an early community contribution is always welcome and I'll be happy to squeeze in a review where I can.

@uberbrady

These don't really feel like types to me - more like "bucket groups". Separately, there seem to be some optional settings you can put in for all buckets that are in one group.

I don't know if I'm just bikeshedding here - but I feel like you could just allow for some "/" characters in a bucket - a sort of simple bucket hierarchy - and get most of what you need.

Buckets that aren't in any group - (normal-looking Riak URL's) would maybe stay the same; and grouped buckets would just have another slash in the bucket part of the URL.

e.g.

GET /riak/bucket/key

would be a GET for an object in a classic 'default' bucket

GET /riak/mygroup/bucket/key

would be a GET for an object in a grouped one.

I'm just looking at this from the point of someone who doesn't know the internals of Riak at all - so there could be all kinds of terrible things wrong with my counterproposal.

And ultimately I don't think it matters too much. But I do like the fact that the URL's I'm fetching things from stay looking pretty much the same, even if I use the new features.

@seancribbs
Contributor

@uberbrady I think it's more constructive to think of them outside the context of the HTTP API. It clouds the issue.

@jrwest
Contributor Author

jrwest commented Aug 13, 2013

These don't really feel like types to me - more like "bucket groups".
Separately, there seem to be some optional settings you can put in for all
buckets that are in one group.

We considered "group" as well, and ultimately the decision was between
that and "type". In the end we chose "type" because more of us in the room
felt it was a better name. One fear was that from "group" one may or may
not infer that data may be grouped (either on disk or within nodes of the
cluster), which isn't the case. Ultimately, I think "type" does a good job
of conveying that these buckets have a set of settings in common. We
also considered "family" (which has too much resemblance to a similar name
in another data model) and "class" (which has so many parallels to
object-oriented programming that we decided against it).

I don't know if I'm just bikeshedding here - but I feel like you could
just allow for some "/" characters in a bucket - a sort of simple bucket
hierarchy - and get most of what you need.

This is the problem Bucket Types set out to solve. Since there are no
restrictions on bucket names if we decide to bless "prefix/..." and an
application already uses it we have a problem. The proposed security
feature's wildcard support is another area where this clashes. It would be
possible to provide a way to audit and migrate these buckets but that would
probably be slow and not ops-friendly.

Buckets that aren't in any group - (normal-looking Riak URL's) would maybe
stay the same; and grouped buckets would just have another slash in the
bucket part of the URL.

e.g.

GET /riak/bucket/key

would be a GET for an object in a classic 'default' bucket

GET /riak/mygroup/bucket/key

would be a GET for an object in a grouped one.

This is exactly what we plan to do with bucket types.

I'm just looking at this from the point of someone who doesn't know the
internals of Riak at all - so there could be all kinds of terrible things
wrong with my counterproposal.

And ultimately I don't think it matters too much. But I do like the fact
that the URL's I'm fetching things from stay looking pretty much the same,
even if I use the new features.



@jrwest
Contributor Author

jrwest commented Aug 13, 2013

@uberbrady one thing I didn't realize upon first read is that your version uses the already "old" API while the RFC uses the "newer" APIs. The APIs you are referring to are not yet deprecated within Riak, but they are also not commonly extended (for example). The effect will be the same, and the API you refer to will continue to work for at least buckets in the default type (it may be extended as well, depending on time and further discussion).

@jadeallenx

We have hit one of the problems described with the current design which is: we have some data where "last write wins" is fine and other data where "allow_multi: true" is the right way to handle things. So I support this proposal because it sounds like we can easily share the riak infrastructure even though we might want mixed behaviors about concurrent writes within the same cluster.

@brunogirin

@jrwest Thanks for the answer. If there is no technical blocker, that's brilliant and it makes complete sense to me that you would want to limit the scope of a first delivery.

From the point of view of what it would solve for me, it would make the multi backend concept more usable by being able to configure one backend for one bucket type and a different backend for another. I also like @mrallen1's use case.

@lafka

lafka commented Oct 26, 2013

I've been doing some minor testing and have some questions:

  • Currently there is no way to delete a bucket type, is the only way around this to delete the datadir?
  • Calling riak_core_bucket_type:reset/1 will only set the default values for some keys; for instance, yz_index is not altered. Should reset be equivalent to calling
  • When using riak-admin, should some bucket properties be read only? (like active should not be changeable through bucket-type update). It's already there for datatype.
  • Is there a way to store custom attributes in the bucket properties? i.e. I would like to store information about the structure of the keys in this bucket.

@jrwest
Contributor Author

jrwest commented Oct 26, 2013

Awesome to see you taking things for a spin! My comments are below.

I've been doing some minor testing and have some questions:

  • Currently there is no way to delete a bucket type, is the only way
    around this to delete the datadir?

    There is not currently a way to delete a bucket type similar to buckets in
    Riak. This is not something we plan to address for Riak 2.0 (but we may in
    the future).

  • Calling riak_core_bucket_type:reset/1 will only set the default
    values for some keys; for instance, yz_index is not altered. Should
    reset be equivalent to calling

As mentioned elsewhere, reset is not ready for use and the example you
found is one reason why. This will be addressed or reset will be removed
before Riak 2.0.

  • When using riak-admin, should some bucket properties be read only?
    (like active should not be changeable through bucket-type update).
    It's already there for datatype.

Good catch. We are aware not all validation is completely implemented,
however there was no issue tracking these specific cases so I opened one:
basho/riak_core#442

  • Is there a way to store custom attributes in the bucket
    properties? i.e. I would like to store information about the structure of
    the keys in this bucket.

You should be able to store your own properties. The validation does not
take into account custom properties and should not reject them. However,
you may have problems using the same property names Riak uses (e.g.
datatype or n_val).

@lafka

lafka commented Oct 26, 2013

@jrwest the custom attribute is limited by list_to_existing_atom/1 in riak_kv_wm_utils:erlify_bucket_prop. The same conversion is used in HTTP API, so might not be feasible to use list_to_atom/1.

@jrwest
Contributor Author

jrwest commented Oct 26, 2013

@lafka ah right. There is nothing stopping you from using internal APIs but the command-line parsing code will prevent it. Although we may not stay w/ JSON [1] for the riak-admin bucket-type commands I'm not sure if this restriction will be lifted. I haven't run into this issue myself as a user of Riak but I imagine there are existing workarounds (loading a module w/ your atoms?) since buckets have had the same restriction when setting them via the API.

[1] #424

rzezeski added a commit to basho/yokozuna that referenced this issue Nov 11, 2013
Integrate the bucket types functionality.

basho/riak#362

Overall
-------

Bucket types are the future of namespacing and property creation in
Riak. They allow efficient storage of "bucket properties" outside of
the Ring and 2-level namespacing of `Type` and `Name`.

Essentially the bucket type can now be either a lone `binary()`
(legacy) or a 2-tuple of `{Type :: binary(), Name ::
binary()}`. Internally, when the legacy version is encountered it is
considered to live under the `default` bucket type. For example the
bucket `<<"my_bucket">>` would become `{<<"default">>, <<"my_bucket">>}`.

Up until this point Yokozuna has used the bucket property `yz_index`
to determine where to index data. This commit changes that in some
ways. Legacy users will have existing data in buckets. Those buckets,
in 2.0, will be considered to live under the default type as described
above. For legacy buckets (the default type) Yokozuna will NOT respect
the `yz_index` property. Rather it will act like Riak Search and use
an index named the same as the bucket AS LONG AS the `search` property
is set to true. Once users upgrade to 2.0 they should start opting
into non-default bucket types since it is more efficient and newer
features require the use of non-default type. For these types of
buckets Yokozuna will still use the `yz_index` property. This property
will typically be set at the type level but can also be overridden per
name under a type. Yokozuna doesn't care. If that `{Type, Name}` has a
`yz_index` property then it will be indexed.

In summary:

* Legacy buckets (default type) will act like Riak Search. The index
  used must have the same name as the bucket and the `search` property
  must be true. This is to aid users migrating from Riak Search.

* All new users MUST use new style buckets made of Type + Name. In
  most cases the `yz_index` property will be set on the type and thus
  inherited by all names under it (many buckets to one index). The
  index DOES NOT have to have the same name.

Handoff
-------

Another important change revolves around handoff. Since Yokozuna
leeches off the KV vnode, it doesn't have control over handoff like it
would if it were a true vnode. When a node joins, KV can start shipping
data before the bucket type data has been shipped over. In that case
there will be no `yz_index` property and indexes will be missing. AAE
would eventually catch this, but it is poor form that a node join would
cause a degradation in harvest, especially in a quiescent cluster.

To fix this Yokozuna needs more control over the lifecycle of the KV
vnode. Yokozuna needs to hook into the `handoff_starting` stage and
verify that the bucket types data is shipped before data handoff
begins. This is accomplished by adding the `yz_kv:should_handoff` hook
which is hard-coded in the KV vnode for now.

This is important for removing the hack around index creation as
well. Currently Yokozuna has a pretty horrible busy-wait hack in its
index hook to make sure indexes are created on joining nodes before
doing the first write of a handoff. This busy-wait blocks the KV vnode
and is dangerous for vnode latency. In a future commit this busy-wait
will be replaced with a check in this new handoff hook.

Removal of Automatic AAE Tree Clearing
--------------------------------------

Remove all functionality around automatic clearing of trees when
adding or removing the `yz_index` property on a bucket with data. This
was referred to as `sync_data` in the `yz_events` module. Also called
"flags" harking back to when Yokozuna had a one-to-one bucket-to-index
mapping.

The original intention was that adding an index to a bucket with data
should clear the AAE trees so that exchanges would start repairing
missing indexes. If setting the index property to the tombstone value
(removal) then a) data for that bucket should be purged from the index
and b) AAE trees should be cleared. After much thought I think this
implicit behavior hurts more than helps.

Actions like clearing all AAE trees can be very expensive. It will not
be obvious to all users that adding or changing `yz_index` could cause
expensive operations to occur. For example, clearing the AAE trees for
a database with billions or trillions of objects will be expensive to
rebuild. Rather than relying on AAE a more direct operation could be
offered that allows the user to re-index a bucket or subset of
data. When removing an index, it makes more sense to let the user
delete the index entirely rather than do an implicit delete-by-query,
which is doing a bunch of extra work for an index that is going to be
deleted anyway.

Misc Changes
------------

* Update all tests to work with bucket types.

* Update Basho Bench driver to work with bucket types.

* Make map-reduce extraction more efficient. This is the ugly hack
  found in `yokozuna:positions`.
rzezeski added a commit to basho/yokozuna that referenced this issue Nov 12, 2013
Integrate the bucket types functionality.

basho/riak#362
Licenser pushed a commit to Kyorai/riak_core that referenced this issue Nov 15, 2013
As with buckets, we must teach core about Bucket Types because a few
subsystems use them. Bucket Types provide a method for grouping buckets
logically (see basho/riak#362).
@jaredmorrow jaredmorrow added this to the 2.0 milestone Mar 24, 2014
@rzezeski rzezeski modified the milestones: 2.0-beta, 2.0 Mar 25, 2014
@ghost ghost mentioned this issue Jan 27, 2015
hmmr pushed a commit that referenced this issue Nov 8, 2016
Update locked deps and fix reltool.config for 2.1.2 rc6