Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #1165 by having threads wait for any outstanding connect to finish. #1170

Merged
merged 5 commits into from
Aug 13, 2015

Conversation

slackpad
Copy link
Contributor

This creates a map of channels (per-address) that throttles requests for connections to one at a time (per-address). The other threads will then quickly grab the newly-created connection one at a time and use it.

If the connection they were waiting for fails, they will all try to make a new connection, and the race condition logic for that case was preserved. This seems like the best thing to do in order to move forward, since they all could have been hanging around waiting for that one connection to go. They essentially revert to the behavior we have today, but only in the case where there's network trouble.

I was trying to find a simpler way to handle the case where the connection they are waiting on fails - we could fail all the others immediately but I think the code would be a little more complicated. Thundering a herd at a dead guy seems pretty low-impact.

@slackpad
Copy link
Contributor Author

Also noticed that we didn't bump the ref count in the race condition case, so those might have behaved strangely in some rare cases. Pushed a fix for that.

@slackpad
Copy link
Contributor Author

That last change makes it so only one thread attempts the connect and everybody else goes based on that. I'll let that bake for a little while (it's more tricky, imho, but not too bad).

}()

c, err := p.getNewConn(dc, addr, version)
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super minor, but can we move the code in the defer to after getNewConn and delete the limiter entry and add the pool entry inside the same critical section? Just to save a defer and another round of locking

var wait chan struct{}
var ok bool
if wait, ok = p.limiter[addr.String()]; !ok {
wait = make(chan struct{}, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't need to buffer it, since we just close

@armon
Copy link
Member

armon commented Aug 13, 2015

LGTM!

slackpad added a commit that referenced this pull request Aug 13, 2015
Fixes #1165 by having threads wait for any outstanding connect to finish.
@slackpad slackpad merged commit 009f0fb into master Aug 13, 2015
@slackpad slackpad deleted the b-connection-spam branch August 13, 2015 18:38
slackpad added a commit that referenced this pull request Aug 13, 2015
Fixes #1165 by having threads wait for any outstanding connect to finish.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants