RFE: Sharding #6

Closed
williamsjj opened this issue Jan 20, 2010 · 18 comments

Comments

@williamsjj

Would it be possible to implement automatic sharding similar to the redis Ruby library?

@andymccurdy
Contributor

Totally, in fact it's one of the goals of the new version that's sitting in my local repository.

@williamsjj
Author

Excellent! I'll keep my eyes peeled. Also willing to help if needed.

@dan-g

dan-g commented Feb 4, 2010

If you're not already, I highly, highly recommend looking at http://pypi.python.org/pypi/hash_ring/1.2 for an implementation of consistent hashing.

@dan-g

dan-g commented Feb 4, 2010

Two other posts about it: http://amix.dk/blog/viewEntry/19367 and http://www.lexemetech.com/2007/11/consistent-hashing.html

Thanks, BTW--just started with Redis today, but the python library seems to be doing its job well.
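For readers new to the technique, here is a minimal sketch of a consistent hash ring. It is not the hash_ring package's code or API: the `HashRing` class, the `replicas` parameter, and the node names are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a large integer using MD5 (stable across runs)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical minimal consistent hash ring (illustrative only)."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring `replicas` times to even out load.
        self._ring = sorted(
            (_hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def get_node(self, key: str) -> str:
        """Return the owner of the first ring point after the key's hash."""
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


# Hypothetical server names.
ring = HashRing(["redis-a:6379", "redis-b:6379", "redis-c:6379"])
node = ring.get_node("user:42")  # the same key always maps to the same node
```

Placing each node at many points on the ring is what keeps the load roughly even; with a single point per node, one server can end up owning most of the key space.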

@andymccurdy
Contributor

I thought about this quite a bit while refactoring the client. Adding sharding would explicitly forbid the usage of some Redis commands that take multiple keys as arguments. This might be "OK", but it was enough of a red flag that I wanted to get some additional feedback before committing to an implementation.

Looking at the command list, I've identified these commands as dangerous to use w/ sharding. There may be a few more that I missed:

  • SORT with GET or BY options
  • SDIFFSTORE
  • SINTERSTORE
  • SUNIONSTORE
  • MSETNX
  • SMOVE
  • RENAME
  • RENAMENX
  • BLPOP
  • BRPOP
  • RPOPLPUSH

So, one idea would be to have these commands simply raise exceptions if the client is configured for sharding.

Before implementing this, does anyone else have a better idea?
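The "raise an exception" idea could look roughly like the sketch below. This is a hypothetical guard, not redis-py's API: `ShardingError`, `UNSHARDABLE_COMMANDS`, and `check_command` are names invented here, and the command set is simply the list above.

```python
class ShardingError(Exception):
    """Raised when a command cannot be routed to a single shard."""


# Commands from the list above whose keys may live on different servers.
UNSHARDABLE_COMMANDS = frozenset({
    "SDIFFSTORE", "SINTERSTORE", "SUNIONSTORE", "MSETNX", "SMOVE",
    "RENAME", "RENAMENX", "BLPOP", "BRPOP", "RPOPLPUSH",
})


def check_command(name, *args, sharded=True):
    """Raise ShardingError if `name` is unsafe on a sharded client."""
    if not sharded:
        return
    command = name.upper()
    if command in UNSHARDABLE_COMMANDS:
        raise ShardingError(f"{command} is not supported with sharding")
    # SORT is only a problem when it dereferences other keys via BY or GET.
    # This is a crude check: it skips the key (first argument) and looks
    # for the option names among the remaining arguments.
    if command == "SORT":
        options = {str(arg).upper() for arg in args[1:]}
        if options & {"BY", "GET"}:
            raise ShardingError("SORT with BY or GET is not supported with sharding")


check_command("GET", "foo")                       # fine: single key
check_command("SORT", "mylist", "LIMIT", 0, 10)   # fine: no BY or GET
check_command("RENAME", "a", "b", sharded=False)  # fine: sharding disabled
```

A call such as `check_command("RENAME", "a", "b")` raises `ShardingError`, so the failure is explicit rather than a silent operation against the wrong server.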

@fabware

fabware commented Feb 21, 2010

It's great to see you are targeting sharding now. Raising an exception for the above commands is OK with me.

@toymachine

pretty straightforward port of ketama hashing in python (used by most memcached clients):
http://github.com/toymachine/concurrence/blob/master/lib/concurrence/memcache/ketama.py

@dan-g

dan-g commented Mar 10, 2010

Hey @andymccurdy -- for sharded instances, have you seen what antirez has suggested with "key tags"? I think they're in the Ruby client. Then, if all of the keys for a command are on the same server, you could still run them.

(If not, the awesomest thing might be a wrapper on redis-py that would do things like merge a SORT with MGET client-side, but that's a much bigger project).
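A rough sketch of the key-tag idea follows. It assumes the convention that the text between the first `{` and the next `}` in a key is the tag, and that only the tag is hashed. The function names are hypothetical, and a simple modulo is used in place of a hash ring to keep the example short.

```python
import hashlib


def key_tag(key: str) -> str:
    """Return the text inside the first {...} pair, or the whole key."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # ignore a missing brace or an empty tag "{}"
            return key[start + 1:end]
    return key


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard by hashing only the key's tag."""
    digest = hashlib.md5(key_tag(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def same_shard(keys, shard_count: int) -> bool:
    """True if a multi-key command over `keys` could run on one server."""
    return len({shard_for(key, shard_count) for key in keys}) == 1


# Both keys share the tag "user:1", so they land on the same shard.
ok = same_shard(["{user:1}:followers", "{user:1}:following"], shard_count=4)
```

With a check like `same_shard`, a client could allow commands such as SINTERSTORE when every key involved carries the same tag, and refuse them otherwise.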

@andymccurdy
Contributor

Antirez has said that his next project after Hashes is redis-cluster. Clients talk to a single redis-cluster, which in turn talks to a list of Redis servers. This sounds like a much better solution than individual clients implementing hashing, consistent or not.

@williamsjj
Author

I guess the real question for me is how long it is going to take for redis-cluster to become available in a beta state. Until then, client-based sharding is somewhat of a necessity. We could wrap redis-py in-house to do it, but it would be great if the client did it so that it became a bit of a Redis sharding standard for Python libraries that use redis-py.

@andymccurdy
Contributor

So what happens when you add a new server to your cluster? Even with consistent hashing, some keys are going to have to move around. Are you using Redis only as a cache, where the data can just get regenerated on the new server? If not, how do you deal with data no longer residing where it should?
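To make the concern concrete, the sketch below counts how many keys change servers when a fourth server is added, under plain modulo hashing and under a consistent hash ring. The server names, key names, and the choice of 100 ring points per server are assumptions for illustration only.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Stable integer hash of a string."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def modulo_node(key, nodes):
    """Naive sharding: hash modulo the number of servers."""
    return nodes[h(key) % len(nodes)]


def build_ring(nodes, replicas=100):
    """Return (sorted points, owners) with `replicas` points per node."""
    ring = sorted((h(f"{n}:{i}"), n) for n in nodes for i in range(replicas))
    return [point for point, _ in ring], [node for _, node in ring]


def ring_node(key, ring):
    """Owner of the first ring point after the key's hash."""
    points, owners = ring
    return owners[bisect.bisect(points, h(key)) % len(points)]


old_nodes = ["cache-1", "cache-2", "cache-3"]
new_nodes = old_nodes + ["cache-4"]
keys = [f"key:{i}" for i in range(10_000)]

old_ring, new_ring = build_ring(old_nodes), build_ring(new_nodes)
moved_modulo = sum(modulo_node(k, old_nodes) != modulo_node(k, new_nodes) for k in keys)
moved_ring = sum(ring_node(k, old_ring) != ring_node(k, new_ring) for k in keys)

print(f"modulo: {moved_modulo / len(keys):.0%} of keys moved")
print(f"ring:   {moved_ring / len(keys):.0%} of keys moved")
```

In expectation, modulo hashing remaps roughly three quarters of the keys when going from three to four servers, while the ring remaps roughly one quarter, and every key the ring moves goes to the new server. Those moved keys are exactly the data that "no longer resides where it should" unless it is regenerated or migrated.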

@williamsjj
Author

We're using it only as a cache, so it'll get regenerated. We're also using timeouts on the data, so that plan works fairly well.

@dan-g

dan-g commented Mar 14, 2010

From what I saw, antirez seems to want people to use client-side sharding where possible. Redis-cluster will only be for certain more complex configurations. But it might be worth trying to get the "official" position on this.

@mjrusso

mjrusso commented Mar 22, 2010

Has anyone taken a look at the hashing implementation in the Ruby client?

@mjrusso

mjrusso commented Mar 23, 2010

I took a quick look at the Ruby client. As far as I can tell, the commands that Andy referenced above would not work properly if sharding is used. The only exception I can see is for a multiget, which is custom coded to find the right shard for each supplied key.
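The per-key routing for a multiget can be sketched as follows: group the keys by shard, fetch each group, then reassemble the values in the order requested. This is not the Ruby client's code. To keep the example self-contained, each shard is a plain dict standing in for a Redis connection, and `shard_index` and `sharded_mget` are hypothetical names.

```python
import hashlib
from collections import defaultdict


def shard_index(key: str, shard_count: int) -> int:
    """Pick a shard for a key (simple modulo hashing for brevity)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def sharded_mget(shards, keys):
    """Fetch `keys` from the right shards and return values in key order."""
    # Group the positions of the requested keys by the shard that owns them.
    positions_by_shard = defaultdict(list)
    for position, key in enumerate(keys):
        positions_by_shard[shard_index(key, len(shards))].append(position)

    results = [None] * len(keys)
    for index, positions in positions_by_shard.items():
        shard = shards[index]  # a real client would issue one MGET per shard
        for position in positions:
            results[position] = shard.get(keys[position])
    return results


# Populate three stand-in shards, then read across them.
shards = [{}, {}, {}]
for key, value in [("a", "1"), ("b", "2"), ("c", "3")]:
    shards[shard_index(key, len(shards))][key] = value

values = sharded_mget(shards, ["c", "missing", "a"])  # ["3", None, "1"]
```

Read-only commands like this are easy to fan out because each key is independent; the commands listed earlier are harder because they need several keys on one server at the same time.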

It might be worth posting a message to the Redis mailing list for any other ideas.

> So what happens when you add a new server to your cluster? Even with consistent hashing, some keys are going to have to move around. Are you using Redis only as a cache, where the data can just get regenerated on the new server? If not, how do you deal with data no longer residing where it should?

Consistent hashing as a sharding technique will work when the database is used as a cache. Otherwise, sharding is typically done with very application-specific code. In some cases, depending on the sharding algorithm, you can add nodes without requiring any keys to be moved around. Consistent hashing is another option, but you would need to manually move data around (automating this piece is asking for trouble, IMO).

One thing the client library could do is expose an API that allows the user to specify a list of hosts, and the function that should be used to map any given key to the appropriate node, with the default routine (when sharding is used) being consistent hashing.
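One possible shape for such an API is sketched below. Everything here is hypothetical rather than anything redis-py provides: `ShardedRouter`, the `mapper` parameter, `consistent_hash_mapper`, and the host addresses are names chosen for the example.

```python
import bisect
import hashlib
from typing import Callable, Sequence

# A mapper takes a key and the host list and returns one of the hosts.
KeyMapper = Callable[[str, Sequence[str]], str]


def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def consistent_hash_mapper(key: str, hosts: Sequence[str]) -> str:
    """Default mapper: a hash ring with 100 points per host."""
    # Rebuilt on every call to keep the sketch short; a real client
    # would build the ring once and cache it.
    ring = sorted((_hash(f"{host}:{i}"), host) for host in hosts for i in range(100))
    index = bisect.bisect(ring, (_hash(key), "")) % len(ring)
    return ring[index][1]


class ShardedRouter:
    """Hypothetical router: a host list plus a user-supplied key mapper."""

    def __init__(self, hosts: Sequence[str], mapper: KeyMapper = consistent_hash_mapper):
        self.hosts = list(hosts)
        self.mapper = mapper

    def host_for(self, key: str) -> str:
        host = self.mapper(key, self.hosts)
        if host not in self.hosts:
            raise ValueError(f"mapper returned unknown host {host!r}")
        return host


def by_user_id(key: str, hosts: Sequence[str]) -> str:
    """Application-specific mapping for keys shaped like 'user:<id>:...'."""
    user_id = int(key.split(":")[1])
    return hosts[user_id % len(hosts)]


hosts = ["10.0.0.1:6379", "10.0.0.2:6379"]
default_router = ShardedRouter(hosts)
custom_router = ShardedRouter(hosts, mapper=by_user_id)
```

Keeping the mapper a plain function means application-specific schemes, such as the `by_user_id` example, can be plugged in without subclassing the client.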

@andymccurdy
Contributor

Just committed a first stab at consistent hashing in Redis. It's in the branch aptly named "consistent_hashing". I'd like to get some feedback on this before I merge it to master.

@andymccurdy
Contributor

Seems like redis-cluster is now antirez's priority. It will no doubt be superior to any functionality that a client lib can provide, so I'm closing this issue. If you require client-side sharding, take a look at the consistent_hashing branch.

@yusufk

yusufk commented May 9, 2012

Hello, where do I find the consistent_hashing branch? Does it still exist?

bellatoris pushed a commit to bellatoris/redis-py that referenced this issue Jul 13, 2023
This issue was closed.