Plugin Pattern Discussion
I've built a bunch of modules for extending levelup, supporting job queues, map-reduce, live-streams (think `tail -f`), and scuttlebutt.
I've taken care to make things as flexible as possible, and not to abuse levelup. However, sometimes I have gone slightly further than a polite module user might.
Yes, I am talking about monkey-patching.
This was done in the spirit of experimentation, being much faster than making a pull request, and I did not want to include my experiments in the main project, in case they got stuck there but turned out not to be such a good idea.
Each module is installed by passing a database instance to it.
```js
var hooks = require('level-hooks')
hooks(db)
```
- The plugin may add a property to `db` with the same name as the plugin (that is how the user accesses the plugin's functionality).
- The plugin must check whether it has already been added, and not add itself twice.
- The plugin may add any other plugins it depends on (which will themselves check that they are not already there).
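For example, a minimal sketch of what a plugin's install function might look like under these rules (`myplugin` and `doSomething` are hypothetical names, not a real module):

```js
// minimal sketch of the install pattern described above.
// 'myplugin' and 'doSomething' are hypothetical names, not a real module.
var hooks = require('level-hooks')

module.exports = function install (db) {
  if (db.myplugin) return db.myplugin   // must not add itself twice
  hooks(db)                             // add plugins it depends on
                                        // (they check they are not already there)
  db.myplugin = {
    doSomething: function (key, value, cb) {
      // insert data under the plugin's own prefix (prefixes are discussed below)
      db.put('MYPLUGIN:' + key, value, cb)
    }
  }
  return db.myplugin
}
```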
I would never normally just patch someone else's module like this. The reason I did it like this is that most of these modules insert data with a prefix. Each module uses its own prefix; if two tried to use the same prefix, things would probably break. If there was a cleaner way to separate the data (see the namespaces suggestion, below) then it could work like this:
```js
var hooks = require('level-hooks')(db)
```
`level-hooks` is a little bit more naughty than that, as I will describe shortly.
How all my plugins fit together.
All the features are based on 3 important modules.

level-hooks allows you to intercept `put`, `del`, and `batch`.
With `pre` you get the transaction before it goes to the database, and you have the chance to change it - say, adding something to the batch. The transaction is always converted into an array, as accepted by `batch`:
```js
db.hooks.pre(function (batch) {
  batch.push({type: 'put', key: 'log:' + Date.now(), value: Date.now().toString()})
  return batch
})
```
```js
db.hooks.post(function (item) {
  console.log(item) //{key: key, value: value}
})
```
`range-bucket` is not a levelup plugin, it's just a module on its own. It's used to generate keys that will have certain sorting properties.
`levelup.readStream({start: s, end: e})` streams all keys between `s` and `e` inclusive.
Keys are sorted lexicographically. leveldb does support a custom sort, but it must be configured into the database at compile time, and written in C. For all intents and purposes, keys in levelup are lexicographically sorted, and there is no way to change that.
However, it's very useful to be able to partition some set of documents into a particular range.
Basically, "lexicographically sorted" means sorted by the initial bytes in the key. Two keys are compared from the start, until one has a byte that is different from the other's byte at that position.
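As a tiny illustration (plain JavaScript string sorting, not leveldb itself, but it gives the same relative order for these keys):

```js
// plain JS string sort is also byte-wise for these keys, so it shows the
// same relative order leveldb would use. Illustration only.
console.log(['b', 'a', 'ab', '\x00z', '\xFFa'].sort())
// => [ '\x00z', 'a', 'ab', 'b', '\xFFa' ]
```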
So, there are two particularly important byte values here: null `'\x00'`, and "LATIN SMALL LETTER Y WITH DIAERESIS" `'\xff'` (see http://www.utf8-chartable.de/).
If a key looks like this: `<0xFF 'a' 'b' 'c'>` then it will always sort AFTER anything else, including the key `'\xFF'` itself. So, if you never insert a document under the key `'\xFF'`, you can always retrieve all the regular documents with `db.readStream({start: '', end: '\xFF'})`.
All my plugins that insert data use range-bucket, prefixing anything they insert with `'\xFF' + $PREFIX`, where `$PREFIX` is a customizable prefix. Some modules use more than one prefix.
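So a plugin can read back just the records it wrote by streaming its own prefix range, something like this (a sketch; `JOBS` is just an example prefix, not what any particular module uses):

```js
// sketch: read back only what a plugin stored under its own prefix.
// 'JOBS' is just an example prefix, not what any particular module uses.
db.readStream({ start: '\xFFJOBS\xFF', end: '\xFFJOBS\xFF\xFF' })
  .on('data', function (item) {
    console.log(item.key, item.value)
  })
```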
`range-bucket` also has a way to handle arrays as keys, but that is only used by level-map and level-reduce.
I think now that, since all the modules use range-bucket, it should be handled differently - maybe it should be built into the way that plugins work - but more on that later.
The map-reduce modules (level-map, level-reduce, and level-view-stream) all need to understand nested prefixes, which are used to implement group-by style aggregation queries.
For example, see the street-food example in map-reduce. The map stage stores values with prefixes like `[country, region, city]`, and these are aggregated via reduce down to smaller groups: `[country, region]`, `[country]`, and `[]`.
To do a reduce, I need to query a group of data. To calculate the aggregation for a particular country, I feed all the regions into a reduce and then save the result into a higher level group, triggering the next reduce stage (like `reduce(readRange([country, *])) -> put([country])`). It is necessary to be able to request `[country, *]` and get `[country, north]`, `[country, south]`, ... but NOT get levels under a particular region (like `[country, north, smalltown]`).
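Roughly, one re-reduce step looks something like this (a sketch only; `readRange`, `reduce`, and `saveGroup` are stand-in names, not the real level-reduce API):

```js
// rough sketch of a single re-reduce step; readRange, reduce and saveGroup
// are stand-ins for illustration, not the actual level-reduce API.
function rereduce (group, done) {            // e.g. group = ['newzealand']
  readRange(group.concat('*'), function (err, values) {  // all regions under it
    if (err) return done(err)
    var aggregated = reduce(values)
    saveGroup(group, aggregated, done)       // this put triggers the next stage
  })
}
```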
There may be other ways to implement this, but I generate keys like this:

```js
var key = [length].concat(keyArray).join('\x00')
```

which creates a null-separated key. (This does not support null chars within a key - escaping breaks ordering, so this is better than escaping.)
This makes it possible to query keys with multiple wildcards too, like `[country, *, *]`. It's just like:
```js
db.readStream({
  start: '\xFFMAPPED\xFF3\x00newzealand\x00',
  end: '\xFFMAPPED\xFF3\x00newzealand\x00\xFF'
})
// not EXACTLY how map-reduce works, consider this pseudocode.
```
This will get everything 3 levels deep under New Zealand, in the MAPPED prefix, for example.
This is only used by the map-reduce modules, so could probably be refactored out.
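To make that concrete, here is a sketch of the kind of key encoding and range generation described above (`makeKey` and `rangeFor` are illustrative names, not the real range-bucket API):

```js
// sketch of the length-prefixed, null-separated key encoding described above.
// makeKey/rangeFor are illustrative names, not the real range-bucket API.
function makeKey (prefix, keyArray) {
  return '\xFF' + prefix + '\xFF' +
    [keyArray.length].concat(keyArray).join('\x00')
}

function rangeFor (prefix, partial, depth) {
  // all keys 'depth' levels deep that start with the given partial key
  var start = '\xFF' + prefix + '\xFF' +
    [depth].concat(partial).join('\x00') + '\x00'
  return { start: start, end: start + '\xFF' }
}

makeKey('MAPPED', ['newzealand', 'north', 'auckland'])
// => '\xFFMAPPED\xFF3\x00newzealand\x00north\x00auckland'

rangeFor('MAPPED', ['newzealand'], 3)
// => { start: '\xFFMAPPED\xFF3\x00newzealand\x00',
//      end:   '\xFFMAPPED\xFF3\x00newzealand\x00\xFF' }
```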
The basic feature set provided by leveldb is great, but to build applications we need more than just get/set. If you have a map-reduce or something, you want it to reliably update when new data is inserted, without having to manually trigger the map-reduce on every `put`.
Say every time we insert a doc within a certain range, we want to generate transformations of that doc and save them elsewhere in the db (for easy retrieval, as part of a range for a particular query). This operation is probably asynchronous - so, what if the process crashes when it's half done? If the source data is saved, but the transformed data is not, we need to know that the transformed data will eventually be saved - else our data will be inconsistent: some data won't have its transformation.
So, there is a very simple way to do this: use `db.hooks.pre` to intercept a `put`, and then use `db.queue` to add a job for the transformation. The put is turned into a batch that inserts both the original put and the job to transform it. After the put & job are successfully saved, the job is started; when the job is finished, the job is deleted.
If the process crashes before the job & put are saved, then neither will be saved, so the database is consistent. If the process crashes before the job is finished, then the job will still be in the database, and it will be restarted when the process restarts, and then deleted eventually.
In some situations, it's possible that the same job is run twice (like if the job completes successfully, but the process crashes before the job is deleted, and it is then rerun). It is necessary that the job is idempotent, but most jobs you might need to do (like map-reduce) are idempotent.
If there is a bug, and the job causes a crash, then the job will run over and over. Make sure jobs do not crash; use `try {...} catch (err) {...}` etc. - jobs need to be as liberal as possible. Unlike couchdb/mongo, it's possible to console.log from inside a map-reduce, so you can easily find errors and debug.
`level-queue` and `level-hooks` are combined into `level-trigger`, which allows you to conveniently trigger a job from an insert.
Essentially, it just adds a record into the database under a safe prefix (with some content - which may be the originating key, or something), then checks if there is anything lingering in that prefix when the db reloads, and re-executes the job.
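Put together, the shape of the whole thing is roughly this (a sketch of the pattern, not the actual level-trigger API; `needsTransform` and `runJob` are stand-ins for whatever work you want done, and `JOBS` is just an example prefix):

```js
// sketch of the trigger pattern described above, not the real level-trigger API.
// needsTransform and runJob are stand-ins for whatever work you want done.
db.hooks.pre(function (batch) {
  batch.forEach(function (op) {
    if (op.type === 'put' && needsTransform(op.key))
      // save a job record atomically with the original put
      batch.push({ type: 'put', key: '\xFFJOBS\xFF' + op.key, value: op.key })
  })
  return batch
})

db.hooks.post(function (item) {
  if (item.key.indexOf('\xFFJOBS\xFF') === 0)
    runJob(item.value, function (err) {
      if (!err) db.del(item.key)   // job finished, remove the job record
    })
})

// on startup, re-run anything left lingering in the job prefix by a crash
db.readStream({ start: '\xFFJOBS\xFF', end: '\xFFJOBS\xFF\xFF' })
  .on('data', function (item) {
    runJob(item.value, function (err) {
      if (!err) db.del(item.key)
    })
  })
```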
Plugins I want to write but haven't written yet.
I want a plugin that intercepts a put, reads the current value, and compares it with the new value:
```js
var merge = require('level-merge')

merge(db, function (value /*new value*/, _value /*old value*/) {
  //there was no old value, insert.
  //(could also check if current user is allowed to insert this here)
  if(!_value) return value
  if(isConcurrent(_value.version, value.timestamp))
    throw new Error('concurrent update') //enforce arbitrary versioning systems
  //merge the values? (mergeValues is whatever merge function you need)
  return mergeValues(value, _value)
})
```
This could be used to build all sorts of cool features: enforce arbitrary versioning patterns (which would be necessary for master-master replication), create arbitrary permission systems, etc.
This would require an async pre hook. (my current pre hook is only sync)
This stuff all basically works, but I don't really think it's ideal; here are some ideas.
- Build prefixing in somehow; maybe you go `_db = db.namespace('prefix')` and `_db` has the exact same api, but everything will be saved with a prefix. This will make getting the prefix stuff right much easier, and should mean that you don't need to patch the db object: you can create a namespaced db and give it to the plugin, which could then use it without prefixing. This could be great, because it would mean you could keep some stuff - like the map-reduce instance - in an entirely separate db, which could be handy for scaling, etc. The tricky part is that I would need it to support atomically inserting into multiple namespaces. (See the sketch after this list.)
- Probably need to add hooks into levelup itself; they can be used in a whole bunch of ways, and for a whole bunch of things - trying to do this different ways could cause problems.
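A rough sketch of what that namespace idea might look like (`db.namespace` does not exist in levelup; this is just a prefixing wrapper for illustration, and it does not solve the atomic multi-namespace batch problem):

```js
// rough sketch of the namespace idea; db.namespace is not a real levelup API.
function namespace (db, prefix) {
  var p = '\xFF' + prefix + '\xFF'
  return {
    put: function (key, value, cb) { db.put(p + key, value, cb) },
    get: function (key, cb) { db.get(p + key, cb) },
    del: function (key, cb) { db.del(p + key, cb) },
    readStream: function (opts) {
      opts = opts || {}
      return db.readStream({
        start: p + (opts.start || ''),
        end: p + (opts.end || '\xFF')
      })
    }
    // a batch that spans namespaces is the tricky part - it would need
    // access to the underlying db to stay atomic.
  }
}

var _db = namespace(db, 'prefix')  // _db has (roughly) the same api, but prefixed
```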
Thanks!