Support for 'implicit snapshot iterator' #10
More info about abstract-leveldown (where the test suite comes from): https://github.com/rvagg/abstract-leveldown#abstract-leveldown- and also the test in question: https://github.com/rvagg/abstract-leveldown/blob/master/abstract/iterator-test.js#L458

@hmalphettes I think your description is slightly wrong. It should be: "Redisdown fails the iterator test 'iterator create snapshot correctly', where an iterator is created, a record is removed, and the iterator still successfully finds the removed record."

The idea of snapshots is that when you open an iterator, it also makes a snapshot of the db state, so the iterator iterates over the rows as they were when the snapshot was taken. Meanwhile, outside of the iterator, other operations can change rows at the same time, but the iterator will match the correct versions.
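The snapshot semantics described above can be sketched with a minimal in-memory model in plain JavaScript. This is illustrative only (`SnapshotDB` is a made-up class, not redisdown or LevelDB code): the iterator copies the db state at creation time, so writes made afterwards are invisible to it.

```javascript
// In-memory model of LevelDB-style snapshot iterators.
class SnapshotDB {
  constructor() { this.data = new Map(); }
  put(key, value) { this.data.set(key, value); }
  del(key) { this.data.delete(key); }
  iterator() {
    // The "snapshot": copy the entries as they exist right now,
    // sorted by key, so later writes cannot affect this iterator.
    const entries = [...this.data.entries()]
      .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
    let i = 0;
    return { next: () => (i < entries.length ? entries[i++] : null) };
  }
}

const db = new SnapshotDB();
db.put('foo', 'bar');
const it = db.iterator();
db.del('foo');           // record removed after the iterator was opened
const first = it.next(); // still ['foo', 'bar']: the snapshot wins
```

This is exactly the behavior the failing abstract-leveldown test checks for.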
is this done to a single record or the whole db?
@NHQ leveldb snapshots are for the entire state of the db at the time of the snapshot.
I believe redis has to be configured to make snapshots: http://redis.io/topics/persistence
@NHQ As far as I understand, we can't take advantage of this: it is limited to how often one wants to dump the redis content to the file system in case of a catastrophic failure.
you can tell redis to snapshot on the fly: http://redis.io/commands/bgsave
@NHQ, oh ok. Is there a way to make queries against such a snapshot?
my guess is to start a new redis process configured to read the new .rdb file and serve clients on another port
@NHQ that would not work against a standard rediscloud instance, for example.
To put things in context, it is not clear that this feature is a core requirement of an abstract-level implementation: Level/abstract-leveldown#38 (comment). It might remain a leveldown-only feature. The agreement at this point is to document the difference in behavior from leveldown when it is not supported. I am closing this issue for now. We can re-open it if we find a fantastic way to support this.
If I understand it correctly, this library uses a zset as the backing store for the level data in Redis, correct? If so, you could just use zunionstore to create a "snapshot".
@brycebaril correct, it is a zset. Great, I have never used zunionstore before. Let me see if I can figure that out.
@brycebaril we are using a zset to store the keys and an hset to store the values.
Ahh, so the index and data are split into two different locations; that complicates things. If it was only the ZSET, then ZUNIONSTORE alone could create a copy of the ZSET as the snapshot. Given the split storage, the snapshot index (zset) could point to data that has been changed or removed from the data backing (hash).

You could do the "snapshot" locally to the node process (ship the entire zset and hash) or partially (zunionstore and ship only the hash), but that results in some memory overhead and result processing for data you may not need.

I'm assuming the O(log(N)) lookup for a single value (e.g. for GET) is why you didn't go with putting all of the data in the ZSET? Other than that I think it would work. For a read-stream-heavy use case this would likely improve performance, since you would not have to go to the hash for each record.
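The problem with the split layout can be modeled in a few lines of plain JavaScript. This is a toy model, not redisdown internals: a `Set` stands in for the ZSET of keys and a `Map` for the HSET of values, and copying only the index (what ZUNIONSTORE would copy) leaves the "snapshot" pointing at mutable value storage.

```javascript
// Split storage: a key index plus a separate key->value map.
const index = new Set();   // stands in for the ZSET of keys
const values = new Map();  // stands in for the HSET of values

const put = (k, v) => { index.add(k); values.set(k, v); };
const del = (k) => { index.delete(k); values.delete(k); };

put('a', 1);
const indexSnapshot = new Set(index); // snapshot of the index only
del('a');

// The snapshot still lists 'a', but its value is gone from the hash:
const dangling = indexSnapshot.has('a') && !values.has('a'); // true
```

So a ZSET-only snapshot would need either the data co-located in the ZSET members, or a second copy of the hash taken at the same moment.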
Thanks a lot for the review! Regarding performance of read-heavy streams, iterators use lua to fetch batches of values in a single redis call: https://github.com/hmalphettes/redisdown/tree/master/luascripts

I did not put all the data in the ZSET mostly because I did not think it was possible. That implementation would be a lot simpler, at the cost of O(log(N)) lookups of values by key. If complexity and size of the collection are not an issue, we could support a hybrid. I am not familiar with enough levelup use cases to decide what the best implementation really is. What do you think?
The other problem I just thought of with concatenating the key/value into the ZSET is that it incurs a removal cost for updates: in the current setup an update to an existing key is just an HSET and a no-op ZADD, but in a ZSET-only world you'd need to remove the ZSET member that contains the old data. The other general problem with "snapshotting" on Redis is that you're (at least) doubling your memory footprint in Redis whenever you perform it.

In terms of Level use cases, it depends on how strongly you want to guarantee that read-streams read from snapshot data. I will try to find some time to read through iterator.js in detail to see whether this is being violated -- at first glance it looks like the stream may be virtual, in that you're fetching all of the data at once and buffering it locally vs. using some sort of cursor and re-fetching per batch...

At this point, I don't think that snapshotting beyond generating consistent read-streams is a commonly used (or used at all) feature of levelup. If the streams here are consistent (even if they are non-streaming from Redis and locally buffered as streams), I don't think you really need to support snapshotting.

To do this truly natively streaming, you'd need a lexicographic version of ZSCAN in conjunction with ZUNIONSTORE for snapshotting. I don't know offhand if ZSCAN reliably replies in lexicographic order -- it seems to, but most likely it isn't guaranteed. The best one could probably do, when the dataset becomes so large that you can't buffer locally, is to emulate it with ZUNIONSTORE and a cursor on the Node.js side using ZRANGEBYLEX with a LIMIT that re-queries the snapshot for each batch. E.g. see https://www.npmjs.org/package/redis-scanstreams for inspiration.

Both of these would work if the backing store was the single ZSET, though you might be able to do basically the same thing if it was easy enough to write a performant HASH-copy LUA command. Then, in a single "snapshot" command, you could ZUNIONSTORE to copy the ZSET, then do the HASH copy to get an in-redis-memory snapshot of the data to stream from.
OK @brycebaril, thanks again for all the ideas. iterator.js fetches the values from redis little by little: not everything at once. The ZSCAN documentation says there is no guarantee that elements won't be returned multiple times, and it keeps silent regarding the order. So snapshotting support in redis would require cloning the data: prohibitive in many situations. What remains is to guarantee that even if we add or remove objects during an iteration, we still see each value only once and in the right order: #11.
Another alternative to snapshots would be a mutex lock on top of redis.
@hmalphettes ok, I gave it a good read-through and I see how you're batching. Unfortunately, without a snapshotting solution this is very problematic. It doesn't provide a consistent read as from a snapshot (or as in LevelDB), which is the problem we discussed above: since records can be removed or inserted between index reads, old results can repeat, old records can be skipped, and new records can be returned inconsistently depending on the current offset. Example (let's say the HWM is artificially set to 1): the stream terminates at D, but we saw E even though it was inserted later, and Y is missed even though it existed prior to the first streamed value.

Back to alternative solutions: one other way is to harness Redis' replication for this -- while you'd limit the number of snapshots you could have at any time, you could simply "snapshot" by reading from a replica, disabling replication until you are done with the snapshot.
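The inconsistency described above can be reproduced with a toy offset-based pager in plain JavaScript (batch size 1, standing in for the high-water mark; this is a simplification for illustration, not redisdown's actual Lua batching code):

```javascript
// Offset-based batching over a mutable sorted index. Deleting a key that
// sits before the current offset shifts later keys down, so one is skipped.
const page = (keys, offset, limit) => keys.slice(offset, offset + limit);

let keys = ['A', 'D', 'Y'];
const seen = [];
seen.push(...page(keys, 0, 1)); // reads 'A'
keys = ['D', 'Y'];              // 'A' is deleted concurrently
seen.push(...page(keys, 1, 1)); // offset 1 now points at 'Y'; 'D' is skipped
seen.push(...page(keys, 2, 1)); // past the end: the stream terminates
// seen is ['A', 'Y'] even though 'D' existed for the whole iteration
```

An insertion before the current offset causes the mirror-image bug: the same record is returned twice.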
@brycebaril yes, understood. So the fix to avoid this very inconsistent read-stream consists of not using an offset between batches. This won't give us a snapshot, but it will guarantee that we don't skip values or get them multiple times.
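That fix amounts to keyset pagination: remember the last key returned and fetch strictly greater keys, which in Redis terms would be ZRANGEBYLEX with an exclusive `(lastKey` lower bound and a LIMIT. A sketch of the idea in plain JavaScript (the `pageAfter` helper is a hypothetical in-memory model, not redisdown code):

```javascript
// Keyset pagination: page by "keys greater than the last one seen"
// instead of by numeric offset, so concurrent deletes cannot shift pages.
const pageAfter = (keys, lastKey, limit) =>
  keys.filter(k => lastKey === null || k > lastKey).slice(0, limit);

let keys = ['A', 'D', 'Y'];
const seen = [];
let last = null;
let batch;
while ((batch = pageAfter(keys, last, 1)).length > 0) {
  seen.push(...batch);
  last = batch[batch.length - 1];
  if (last === 'A') keys = ['D', 'Y']; // 'A' deleted mid-iteration
}
// seen is ['A', 'D', 'Y']: nothing skipped or repeated, despite the delete
```

This still is not a snapshot (keys inserted ahead of the cursor would show up), but it removes the skip/repeat anomalies.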
That is the same short-term fix I came up with. When it comes down to it, not all back-ends can provide the exact same feature sets; I think this is something that the level ecosystem should not only accept but welcome. Pick the right tool for the job -- if all the back-ends were exactly the same feature-wise, it wouldn't make sense to have more than one.

👍
Thanks a lot for the review; very much appreciated (sorry I keep repeating myself).
"Redisdown fails the iterator test 'iterator create snapshot correctly' where an iterator is created, a record removed and the iterator still successfully finds the removed record." (thanks @maxogden for the edit)
I don't know how to support this behavior with redis at the moment.