Skip to content
This repository has been archived by the owner on Mar 9, 2019. It is now read-only.

Nested Keys #56

Closed
3 tasks
benbjohnson opened this issue Feb 28, 2014 · 5 comments · Fixed by #127
Closed
3 tasks

Nested Keys #56

benbjohnson opened this issue Feb 28, 2014 · 5 comments · Fixed by #127
Assignees
Milestone

Comments

@benbjohnson
Copy link
Member

Overview

Add the ability to nest key/value data inside of keys. This is useful for building indexes and for certain types of nested data.

API

We'll need a function for retrieving a cursor to iterate over the nested values:

func (c *Cursor).Subcursor(key string) (*Cursor, error)

We'll also need functions to get/put/delete. These subkeys are essentially the same thing as buckets. Maybe we just return a Bucket:

func (b *Bucket) Sub(key string) *Bucket

Otherwise we'll have to create a bunch of methods on Bucket:

func (b *Bucket) SubGet(key []byte) []byte
func (b *Bucket) SubPut(key []byte, []byte) 
func (b *Bucket) SubDelete(key []byte, []byte) 

That's pretty ugly. I'm leaning toward just returning a sub-bucket.

Changes

  • Add a flag on the leaf node element to distinguish it as having nested data.
  • Add Subcursor() function to the Cursor().
  • Delete subpages for when deleting the parent key.

Caveats

Only support single-level subcursors. Not only will it make it less confusing but it means that Cursor's can reuse a single subcursor during iteration.

Subbuckets introduce a new error condition where you could try to do a normal Bucket.Get() on a nested key. Currently Get() doesn't return an error which I'd like to keep. I will probably just have it return nil. The other option is to panic.

@benbjohnson benbjohnson added this to the v0.2.0 milestone Feb 28, 2014
@benbjohnson benbjohnson self-assigned this Feb 28, 2014
@tv42
Copy link
Contributor

tv42 commented Mar 13, 2014

I'm not sure if this means fully hierarchical keys, or just moving from bucket+key to potentially bucket+key+subkey. Everything before "Caveats" seems to imply the former, but the "Only support single-level" phrase conflicts with that. Personally, I don't see a single, fixed, extra level of depth worth the effort. Complex uses will end up having to construct a compound key in a []byte anyway.

Fully hierarchical keys would be nice, assuming the API and internals can still remain clean. What if it's just Buckets all the way down?

// variadic so you can go multiple levels deep in one call.
// returns nil if bucket does not exist.
func (db *DB) Bucket(keys ...[]byte) *Bucket
func (db *DB) CreateBucket(key []byte) error
func (db *DB) CreateBucketIfNotExists(key []byte) error

// just like DB.Bucket, goes deeper, returns nil for non-existing buckets.
// calling Bucket on nil receivers is valid and returns nil.
func (b *Bucket) Bucket(keys ...[]byte) *Bucket
// gives error if b==nil.
// buckets and keys directly under this bucket are in separate namespaces; that is,
// a bucket can have both a sub-bucket and a data item with the same key.
func (b *Bucket) CreateBucket(key []byte) error
func (b *Bucket) CreateBucketIfNotExists(key []byte) error

// get a key inside this bucket; returns nil if b==nil or key is not found
func (b *Bucket) Get(key []byte) []byte
// walk the bucket; returns nil if b==nil
func (b *Bucket) Cursor() *Cursor

The variadic versions make it easier to use, and guaranteeing behavior for b==nil helps write error checks just at the end. For example, if I have a compound key <timestamp, userid, itemid, mumble>, these two would be interchangeable:

a := db.Bucket([]byte("widgets")).Bucket(timestamp).Bucket(userid).Bucket(itemid).Get(mumble)
b := db.Bucket([]byte("widgets"), timestamp, userid, itemid).Get(mumble)

So .Bucket() becomes a lot like cd, it's a reference deeper into the tree, and using it avoids repeating & re-walking the path there. I don't know much about the internals, if this is feasible at all.

And, well, then you could also support direct Get and Put with hierarchical keys. Using a Bucket is means of 1. avoiding the walking cost 2. often, staying within one transaction.

func (db *DB) Get(keys ...[]byte) []byte
func (b *Bucket) Get(keys ...[]byte) []byte

Also, I switched bucket keys to []byte in the examples. I feel like using both string and []byte for keys in one API misleading.

Anyway, appreciate all the work and please be careful with api growth.

P.S. I've been looking for a way to encode sql-style compound keys in a []byte while preserving sort order. Everything I've been able to find assumes that all variable length fields are nul-terminated, and thus do not contain all possible binary values; and quoting nuls is hard if you're trying to preserve sort order. Hierarchical keys would eliminate the need.

@benbjohnson
Copy link
Member Author

@tv42 I think Buckets all the way down is probably the way to go. I like the idea of variadic args too.

One of my concerns with infinitely-nestable buckets is cursor allocations. I wanted to be able to allocate a main cursor and a subcursor initially and just reuse them. Although as long as we can reuse the same cursor at each depth in the hierarchy then I'm fine with n-levels.

You bring up a good point about bucket names being bytes instead of strings. I'll probably have to change that.

@kardianos
Copy link
Contributor

I think []byte is more appropriate for keys due to the cost of copying []byte to string.

@tv42
Copy link
Contributor

tv42 commented Apr 7, 2014

I wrote a library that might be relevant: https://github.com/tv42/compound
not as in BoltDB using it, but because of the underlying trick.

It can combine multiple []byte into a single key, in a way that does not confuse "foo"+"bar" with "foob"+"ar", while preserving sort order.

The trick is: quote 0x00 as 0x00 0xFF, separate items with 0x00 0xNN where NN is any value other than FF. I use that space to store the data type, for extra safety.

So for Bolt, you could make the bucket (or key) space be nested, and join the []byte names with this logic. The API could look like the varargs stuff above.

It seems lmdb internals have a big separation between buckets and keys, not sure if blowing up the number of buckets would hurt too much. Basically, logically you have hierarchical keys:

val := tx.Get(a, b, c, d, e)

but where do you draw the boundary between "buckets" and "keys"? In my mind, the bucket separation is largely an artifice of not having this hierarchy in the first place, so it might not make sense to keep that separation, if the API has this power.. but then again, I don't know much that would change the implementation details.

@benbjohnson
Copy link
Member Author

@tv42 Thanks for the link to the library. I'll check it out tomorrow. Just as a heads up, I'm going to be implementing nested keys this week because we're looking to do Bolt integration and production-level testing next week and that's the last missing piece.

My plan right now is to utilize the leafPageElement.flags field to create two new leaf element types: an embedded bucket type and a regular bucket type. (The names of those types will probably change)

The embedded bucket will be used if the subbucket contents are small enough to fit inline on the leaf page. The regular bucket type will be used if the subbucket contents are large and then a root page pointer will be used just like the current bucket type.

I'll add some more details tomorrow. I gotta run.

benbjohnson added a commit to benbjohnson/bolt that referenced this issue Apr 7, 2014
This commit adds the ability to create buckets inside of other buckets.
It also replaces the buckets page with a root bucket.

Fixes boltdb#56.
benbjohnson added a commit to benbjohnson/bolt that referenced this issue Apr 8, 2014
This commit adds the ability to create buckets inside of other buckets.
It also replaces the buckets page with a root bucket.

Fixes boltdb#56.
benbjohnson added a commit to benbjohnson/bolt that referenced this issue Apr 9, 2014
This commit adds the ability to create buckets inside of other buckets.
It also replaces the buckets page with a root bucket.

Fixes boltdb#56.
benbjohnson added a commit to benbjohnson/bolt that referenced this issue Apr 9, 2014
This commit adds the ability to create buckets inside of other buckets.
It also replaces the buckets page with a root bucket.

Fixes boltdb#56.
benbjohnson added a commit to benbjohnson/bolt that referenced this issue Apr 10, 2014
This commit adds the ability to create buckets inside of other buckets.
It also replaces the buckets page with a root bucket.

Fixes boltdb#56.
benbjohnson added a commit to benbjohnson/bolt that referenced this issue Apr 10, 2014
This commit adds the ability to create buckets inside of other buckets.
It also replaces the buckets page with a root bucket.

Fixes boltdb#56.
benbjohnson added a commit to benbjohnson/bolt that referenced this issue Apr 10, 2014
This commit adds the ability to create buckets inside of other buckets.
It also replaces the buckets page with a root bucket.

Fixes boltdb#56.
benbjohnson added a commit to benbjohnson/bolt that referenced this issue Apr 10, 2014
This commit adds the ability to create buckets inside of other buckets.
It also replaces the buckets page with a root bucket.

Fixes boltdb#56.
benbjohnson added a commit to benbjohnson/bolt that referenced this issue Apr 10, 2014
This commit adds the ability to create buckets inside of other buckets.
It also replaces the buckets page with a root bucket.

Fixes boltdb#56.
benbjohnson added a commit to benbjohnson/bolt that referenced this issue Apr 10, 2014
This commit adds the ability to create buckets inside of other buckets.
It also replaces the buckets page with a root bucket.

Fixes boltdb#56.
benbjohnson added a commit to benbjohnson/bolt that referenced this issue Apr 11, 2014
This commit adds the ability to create buckets inside of other buckets.
It also replaces the buckets page with a root bucket.

Fixes boltdb#56.
benbjohnson added a commit to benbjohnson/bolt that referenced this issue Apr 11, 2014
This commit adds the ability to create buckets inside of other buckets.
It also replaces the buckets page with a root bucket.

Fixes boltdb#56.
heyitsanthony pushed a commit to heyitsanthony/bolt that referenced this issue Nov 16, 2017
Trivial. Removed 'moribund' from README.md
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants