Skip to content

Random MOVED errors under concurrent load when ClusterClient configured with port-only addrs #771

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
sqaxomonophonen opened this issue May 23, 2018 · 3 comments

Comments

@sqaxomonophonen
Copy link

Please refer to my test program here: #753 (comment)

In trying to figure out why your "cluster races" test cases succeed while my program fails, I discovered that your cluster client is configured with full addresses like "127.0.0.1:8220", while my program uses a port-only address (":7000"), and indeed, my program stops failing when I instead connect to "127.0.0.1:7000". I'm also seeing the exact same behavior on the Redis Cluster booted by your test cases (by inserting a select{} // block forever somewhere so I could run my program against it).

I'm seeing the same problem when using addresses like "0.0.0.0:8220" and "localhost:8220". Only "127.0.0.1:8220" works (or actually, the entire 127.0.0.0/8 block seems to work)

I'd like send a test case patch to replicate the problem, but it's non-trivial because there's an internal consistency check (clusterClient.GetState().IsConsistent() in cluster_test.go) which fails for these problematic addresses, so it seems like the tests know that these addresses are bad, while there's no indication I'm doing anything wrong (except for cryptic random MOVED errors) when using go-redis as a package.

Also, I haven't figured out why the errors occur. But it seems like a go-redis bug (because it only happens under concurrent load), or: if I'm doing something wrong with my addresses, at least go-redis should give me a better error message?

Anyway, I know how to work around the problem now, but I can spare some time to write up a fix if you give some pointers on how to approach it.

@vmihailenco
Copy link
Collaborator

I think this was reported before and the problem is with Redis Cluster configuration. Specifically:

  • Redis cluster reports something like "127.0.0.1:7000"
  • But user can only connect to ":7000"

AFAIR this is somehow related to Docker, e.g. "127.0.0.1:7000" is local to Docker container and ":7000" is available to the world. I can be wrong, because I did not try to replicate this myself. See #701 (comment) for more info.

Are you using Docker? Can you check what addresses are returned by https://redis.io/commands/cluster-nodes and if you can connect to those address using redis-cli?

@sqaxomonophonen
Copy link
Author

I'm using Docker, but I also verified that the problem is the same against the Redis Cluster booted by your test cases (I don't know the exact details, but it looks like it compiles Redis directly from source and runs it without any sandboxing/namespacing). For both clusters, CLUSTER NODES shows 127.0.0.1:$PORT addresses, and redis-cli works fine.

To replicate, compile this:

package main

import "github.com/go-redis/redis"
import "math/rand"
import "fmt"
import "os"

func main() {
	client := redis.NewClusterClient(&redis.ClusterOptions{ Addrs: []string{os.Args[1]} })
	const concurrency = 10
	for i := 0; i < concurrency; i++ {
		go func() {
			for {
				rnd := ""
				for i := 0; i < 10; i++ {
					rnd += string(rune(int(rune('a')) + rand.Intn(int(rune('z')-rune('a')))))
				}
				fmt.Printf("GET %s\n", rnd)
				err := client.Get(rnd).Err()
				if err != nil && err != redis.Nil {
					panic(err)
				}
			}
		}()
	}
	select {} // sleep
}

Then to test it against the Redis Cluster booted by your test cases, apply this diff:

diff --git a/race_test.go b/race_test.go
index 04effc4..3ef7456 100644
--- a/race_test.go
+++ b/race_test.go
@@ -318,6 +318,7 @@ var _ = Describe("cluster races", func() {
 
 	It("should get", func() {
 		perform(C, func(id int) {
+			select{} // block forever
 			for i := 0; i < N; i++ {
 				key := fmt.Sprintf("key_%d_%d", id, i)
 				_, err := client.Get(key).Result()

and run with go test ./... --ginkgo.focus="cluster races" -- it should intentionally block indefinitely so you can run my program against it. Lets say the executable name is rediserr, then these succeeds:
$ ./rediserr 127.0.0.1:8220
$ ./rediserr 127.42.24.64:8220
While these fails:
$ ./rediserr :8220
$ ./rediserr localhost:8220
$ ./rediserr 0.0.0.0:8220

To test against the Docker Redis Cluster I'm using, you can try this:
$ docker run -p 7000:7000 -p 7001:7001 -p 7002:7002 -p 7003:7003 -p 7004:7004 -p 7005:7005 -p 7006:7006 -p 7007:7007 -i -t -d grokzen/redis-cluster:3.2.11

Replace 8220 with 7000 in the above rediserr executions, and you should see the same cases succeed and fail respectively.

Also, none of the cases fail if (in my program) you change const concurrency = 10 to const concurrency = 1

@vmihailenco
Copy link
Collaborator

vmihailenco commented May 31, 2018

Thanks for the reproducer - I think I've fixed this in master branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants