RFE: Sharding #6
Comments
Totally, in fact it's one of the goals of the new version that's sitting in my local repository. |
Excellent! I'll keep my eyes peeled. Also willing to help if needed. |
If you're not already, I highly, highly recommend looking at http://pypi.python.org/pypi/hash_ring/1.2 for an implementation of consistent hashing. |
Two other posts about it: http://amix.dk/blog/viewEntry/19367 and http://www.lexemetech.com/2007/11/consistent-hashing.html Thanks, BTW--just started with Redis today, but the python library seems to be doing its job well. |
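For readers who have not met consistent hashing before, here is a minimal sketch of the idea described in the links above. It is not the `hash_ring` package's code; the class name `HashRing`, the `replicas` value, the MD5-based hash, and the node names are all assumptions chosen for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring (illustrative; not the hash_ring package)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node (assumed value)
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # Any well-distributed hash works; MD5 is used here for convenience.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Place several virtual points per node to even out the distribution.
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def get_node(self, key):
        """Return the node owning the first point at or after the key's hash."""
        if not self._points:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_left(self._points, self._hash(key))
        if idx == len(self._points):
            idx = 0  # wrap around to the start of the ring
        return self._owner[self._points[idx]]


NODES = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]  # hypothetical hosts
ring = HashRing(NODES)
node = ring.get_node("user:42")
```

The same key always maps to the same node as long as the node list is unchanged, which is what lets independent clients agree on where a key lives.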
I thought about this quite a bit while refactoring the client. Adding sharding would explicitly forbid the usage of some Redis commands that take multiple keys as arguments. This might be "OK", but it was enough of a red flag that I wanted to get some additional feedback before committing to an implementation. Looking at the command list, I've identified these commands as dangerous to use w/ sharding. There may be a few more that I missed:
So, one idea would be to have these commands simply raise exceptions if the client is configured for sharding. Before implementing this, does anyone else have a better idea? |
It's great to see you are targeting sharding now. Raising an exception for the above commands is OK with me. |
pretty straightforward port of ketama hashing in python (used by most memcached clients): |
Hey @andymccurdy -- for sharded instances, have you seen what antirez has suggested with "key tags"? I think they're in the Ruby client. Then, if all of the keys for a command are on the same server, you could still run them. (If not, the awesomest thing might be a wrapper on redis-py that would do things like merge a SORT with MGET client-side, but that's a much bigger project). |
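A rough sketch of how key tags could combine with the raise-an-exception idea from earlier in the thread: hash only the tagged part of each key, run a multi-key command when every key lands on one shard, and raise otherwise. The `{tag}` brace syntax, the function names, `CrossShardError`, and the modulo mapping are assumptions for illustration, not the Ruby client's or redis-py's actual behaviour.

```python
import re
import zlib

TAG_RE = re.compile(r"\{(.+?)\}")  # assumed tag syntax: "{tag}" inside the key


class CrossShardError(Exception):
    """Raised when a multi-key command would span more than one shard."""


def hash_part(key):
    """Return the tag inside braces if present, otherwise the whole key."""
    match = TAG_RE.search(key)
    return match.group(1) if match else key


def shard_for(key, shards):
    """Map a key to a shard; plain modulo hashing keeps the sketch short."""
    return shards[zlib.crc32(hash_part(key).encode("utf-8")) % len(shards)]


def shard_for_command(keys, shards):
    """Return the single shard holding every key, or raise CrossShardError."""
    owners = {shard_for(key, shards) for key in keys}
    if len(owners) != 1:
        raise CrossShardError(f"keys span {len(owners)} shards: {sorted(keys)}")
    return owners.pop()


SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical shard names
# Both keys share the tag "user:1", so they hash to the same shard.
target = shard_for_command(["{user:1}:followers", "{user:1}:following"], SHARDS)
```

With this shape, a set intersection across `{user:1}:followers` and `{user:1}:following` can be sent to `target`, while the same command over untagged keys on different shards fails loudly instead of silently returning partial results.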
Antirez has said that his next project after Hashes is redis-cluster. Clients talk to a single redis-cluster, which in turn talks to a list of Redis servers. This sounds like a much better solution than individual clients implementing hashing, consistent or not. |
I guess the real question for me is how long it is going to take for redis-cluster to become available in a beta state. Until then, client-based sharding is somewhat of a necessity. We could wrap redis-py in-house to do it, but it would be great if the client did it, so that it became something of a Redis sharding standard for Python libraries that use redis-py. |
So what happens when you add a new server to your cluster? Even with consistent hashing, some keys are going to have to move around. Are you using Redis only as a cache, where the data can just get regenerated on the new server? If not, how do you deal with data no longer residing where it should? |
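To put rough numbers on the question above, the following sketch compares how many keys change owner when a fourth node is added, once with a consistent-hash ring and once with plain `hash % N`. The node names, key names, replica count, and MD5-based hash are assumptions made for the example.

```python
import bisect
import hashlib


def h(value):
    """Hash a string to a large integer (MD5 chosen for convenience)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def build_ring(nodes, replicas=100):
    """Return sorted (point, node) pairs, with `replicas` points per node."""
    return sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(replicas))


def ring_lookup(ring, key):
    """Find the first ring point at or after the key's hash, wrapping around."""
    idx = bisect.bisect_left(ring, (h(key), ""))
    return ring[idx % len(ring)][1]


def modulo_lookup(nodes, key):
    """Naive sharding: hash modulo the number of nodes."""
    return nodes[h(key) % len(nodes)]


old_nodes = ["redis-a", "redis-b", "redis-c"]  # hypothetical nodes
new_nodes = old_nodes + ["redis-d"]
keys = [f"key:{i}" for i in range(2000)]

old_ring, new_ring = build_ring(old_nodes), build_ring(new_nodes)
ring_moved = [k for k in keys
              if ring_lookup(old_ring, k) != ring_lookup(new_ring, k)]
mod_moved = [k for k in keys
             if modulo_lookup(old_nodes, k) != modulo_lookup(new_nodes, k)]

print(f"ring:   {len(ring_moved) / len(keys):.0%} of keys moved")
print(f"modulo: {len(mod_moved) / len(keys):.0%} of keys moved")
```

On the ring, every key that moves lands on the new node, and the moved share should be on the order of one quarter when going from three nodes to four; with modulo hashing most keys are remapped. Either way, the moved keys are simply absent from their new home until something repopulates them, which is why this is painless for a cache and a real migration problem otherwise.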
We're using it only as a cache, so it'll get regenerated. We're also using timeouts on the data, so that plan works fairly well. |
From what I saw, antirez seems to want people to use client-side sharding where possible. Redis-cluster will only be for certain more complex configurations. But it might be worth trying to get the "official" position on this. |
Has anyone taken a look at the hashing implementation in the Ruby client? |
I took a quick look at the Ruby client. As far as I can tell, the commands that Andy referenced above would not work properly if sharding is used. The only exception I can see is for a multiget, which is custom coded to find the right shard for each supplied key. It might be worth posting a message to the Redis mailing list for any other ideas.
Consistent hashing as a sharding technique will work when the database is used as a cache. Otherwise, sharding is typically done with very application-specific code. In some cases, depending on the sharding algorithm, you can add nodes without requiring any keys to be moved around. Consistent hashing is another option, but you would need to manually move data around (automating this piece is asking for trouble, IMO). One thing the client library could do is expose an API that allows the user to specify a list of hosts, and the function that should be used to map any given key to the appropriate node, with the default routine (when sharding is used) being consistent hashing. |
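A minimal sketch of the API shape suggested above: the caller supplies the per-node clients and, optionally, the function that maps a key to a node. It also shows a multiget that looks up the right shard for each key, along the lines of what the Ruby client is described as doing. `ShardedClient`, `FakeRedis`, `default_key_to_node`, and `by_prefix` are hypothetical names; none of this is redis-py's API, and the default mapping here is a simple modulo placeholder where a consistent-hash lookup would normally go.

```python
import zlib


class FakeRedis:
    """Stand-in for a per-node client so the sketch runs without a server."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def mget(self, keys):
        return [self.data.get(k) for k in keys]


def default_key_to_node(key, nodes):
    """Placeholder default; a consistent-hash ring lookup would go here."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


class ShardedClient:
    def __init__(self, clients, key_to_node=default_key_to_node):
        self.clients = clients            # dict: node name -> client object
        self.nodes = sorted(clients)      # stable node order for the mapper
        self.key_to_node = key_to_node    # user-supplied routing function

    def set(self, key, value):
        self.clients[self.key_to_node(key, self.nodes)].set(key, value)

    def mget(self, keys):
        """Group keys by node, issue one MGET per node, keep caller order."""
        by_node = {}
        for pos, key in enumerate(keys):
            node = self.key_to_node(key, self.nodes)
            by_node.setdefault(node, []).append((pos, key))
        results = [None] * len(keys)
        for node, items in by_node.items():
            values = self.clients[node].mget([k for _, k in items])
            for (pos, _), value in zip(items, values):
                results[pos] = value
        return results


def by_prefix(key, nodes):
    """Application-specific routing: session keys get a dedicated node."""
    return nodes[-1] if key.startswith("session:") else nodes[0]


client = ShardedClient({"a": FakeRedis(), "b": FakeRedis(), "c": FakeRedis()})
for i in range(10):
    client.set(f"key:{i}", i)
values = client.mget(["key:3", "missing", "key:7"])

custom = ShardedClient({"a": FakeRedis(), "b": FakeRedis()}, key_to_node=by_prefix)
custom.set("session:abc", "token")
```

Keeping the routing function pluggable lets an application use a scheme that never relocates existing keys (for example, routing by key prefix or ID range) while others fall back to the default.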
Just committed a first stab at consistent hashing in Redis. It's in the branch aptly named "consistent_hashing". I'd like to get some feedback on this before I merge it to master. |
It seems redis-cluster is now antirez's priority. It will no doubt be superior to any functionality a client library can provide, so I'm closing this issue. If you require client-side sharding, take a look at the consistent_hashing branch. |
Hello, where do I find the consistent_hashing branch, does it still exist? |
Would it be possible to implement automatic sharding similar to the redis Ruby library?