Unstable redistest.StartCluster due to redis "address already in use" #7

Closed
xor-gate opened this issue Jan 4, 2018 · 3 comments

@xor-gate

xor-gate commented Jan 4, 2018

We use the redistest cluster implementation in our tests and often see unstable starts where redis-server exits prematurely with "address already in use" after the free port has been "acquired" by the getFreePort function. Go probably closes the underlying OS file descriptor of the connection at a later time (see the go1.10beta1 blog post on blog.gopheracademy.com and search for Close). See also the golang/go issue: golang/go#21856
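
For readers unfamiliar with the pattern, here is a minimal sketch of how a free port is commonly "acquired" in Go tests. This is an assumption about the general technique, not the actual getFreePort code from redistest, and the name `freePort` is made up for illustration. The point is that the port is only reserved while the listener is open: once it is closed, nothing guarantees the port is still available when redis-server tries to bind it.

```go
package main

import (
	"fmt"
	"net"
)

// freePort (hypothetical helper) asks the kernel for an unused TCP port by
// listening on port 0, then closes the listener and returns the port number.
// The port is NOT reserved after Close returns, so a process started later
// can still fail to bind it.
func freePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, fmt.Errorf("listen: %w", err)
	}
	// The listener's address carries the port the kernel picked.
	port := l.Addr().(*net.TCPAddr).Port
	if err := l.Close(); err != nil {
		return 0, fmt.Errorf("close: %w", err)
	}
	return port, nil
}
```

Calling `freePort()` and then launching redis-server with the returned port leaves a window between Close and the server's bind, which is where the failure described above can occur.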

We currently see more stable starts (retry on failure) with this change: dualinventive@ca95cd2

Have you also seen this behaviour?
We are running Debian 8 AMD64 (kernel 3.16.0-4-amd64), if that helps.

Kind regards,
Jerry

@mna
Owner

mna commented Jan 5, 2018

Hello Jerry,

I did see this error from time to time indeed, though I haven't investigated the issue seriously. I wonder if that means the error should not present itself with Go 1.10? I'm pretty sure I've seen this error on macOS too, though it's been a while since I last spent some time working on redisc.

I'm definitely interested in any pointers/fixes regarding this!

Thanks,
Martin

@xor-gate
Author

xor-gate commented Jan 5, 2018

We tried to see whether it still happens with Go 1.10, and it can still result in "address already in use", even though the Go runtime now guarantees that Close waits until the underlying OS file descriptor is closed. It is still possible that the kernel keeps the socket in the TIME_WAIT state (e.g. on Linux). The workaround in our fork's branch is simply to retry a few times, which seems to be enough to get the tests passing on our CI platform.

It is indeed a very nasty problem which is hard to resolve.

@xor-gate
Author

I will close this, as we have moved to a mocked redis interface which talks to a single instance. It is still an issue, but when implementing tests, the redis cluster should be persistent to avoid this startup problem. Feel free to comment if you have anything to add.
