-
Notifications
You must be signed in to change notification settings - Fork 43
Conversation
I don't understand the QUIC codebase enough to be able to give this a meaningful review. Requesting @marten-seemann to please take a look. |
5bc579f
to
ae38828
Compare
transport.go
Outdated
if bytes.Compare([]byte(t.localPeer), []byte(p)) < 0 { | ||
err = t.holePunch(network, addr) | ||
if err == nil { | ||
err = errors.New("hole punching attempted; no active dial") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is the tricky part.
Ideally we would return the connection accepted by listen, but this is problematic:
- we need to hook into the listen and intercept that
- the connection would be returned to the swarm twice, which is very undesirable for obvious reasons.
- We also need to signal to the swarm that we are done trying to connect, and propagate this to the caller.
I don't think it is safe to return nil, nil
here to the swarm, as it will probably crash if it doesn't get a connection and no error.
So I couldn't find anything better to do, than just return an error and let the caller (who actually supplied the context option after all) to check for the incoming connection.
Any ideas for improving this situation are welcome!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Granted, this should probably be a named error (if we decide to keep this logic) so that the caller knows to wait for the incoming connection and not have to guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, now that I think about it, if the incoming connection succeeds the Connect
call will not return an error to the caller.
I still think we need to return an error to the swarm (so that it doesn't expect a connection) though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you point me to where this ctx
is configured and Dial
is called? I tried grepping through the libp2p workspace, but the only occurrence for WithSimultaneousConnect
is in go-libp2p-core, so I assume it's only in a PR at the moment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's brand new -- it's gonna come from this pr: libp2p/go-libp2p#1057
I don't think aarsh has added yet, but he will do very soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with returning a named error here. Hooking into Listen
and catching the connection there might be another (more complicated) solution, but I'm not sure if we're going to run into problems if both peers think that they're a client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I am not sure either but I think that the swarm is not going to like it.
I'll make a named error for this, maybe it should live in core however?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added named error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, now that I think about it, if the incoming connection succeeds the Connect
call will not return an error to the caller.
Is there some magic in Swarm that does NOT return an error for the Connect
call even if the dial fails but an incoming succeeds ? Please can you point me to it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM. Thank you @vyzo!
Once we have a test case this should be good to merge.
what's this stuff aarsh? I am going to drop these commits when we are to merge. |
@vyzo It's all debugging. We'll drop it all. DW |
5bb6967
to
71724d9
Compare
listener.go
Outdated
holePunch, ok := l.transport.holePunching[hpkey] | ||
l.transport.holePunchingMx.Unlock() | ||
if ok { | ||
select { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a race condition here: If we receive 2 connections before the other go routine has deleted this entry from the holePunching
map, we'll block here until the context is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- It's fixed now, right?
- Why not delete from the hole-punching map when we first access it? That should fix all the problems here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, if we one-buffer the channel and atomically delete from the map, we should be able to do everything under the same lock:
l.transport.holePunchingMx.Lock()
holePunch, ok := l.transport.holePunching[key]
if ok {
holePunch.connCh <- conn
delete(l.transport.holePunching, key)
}
l.transport.holePunchingMx.Unlock()
Probably not necessary, but may make things easier to reason about.
71724d9
to
d1ea15f
Compare
d1ea15f
to
e14c3ed
Compare
@Stebalien Can I ask you for a review here? I think I fixed a race condition that could lead to a deadlock when we'd two connection from the same address, but I'd like to get a second set of eyes before merging this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly: make the handoff more atomic.
listener.go
Outdated
holePunch, ok := l.transport.holePunching[hpkey] | ||
l.transport.holePunchingMx.Unlock() | ||
if ok { | ||
select { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- It's fixed now, right?
- Why not delete from the hole-punching map when we first access it? That should fix all the problems here, right?
listener.go
Outdated
holePunch, ok := l.transport.holePunching[hpkey] | ||
l.transport.holePunchingMx.Unlock() | ||
if ok { | ||
select { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, if we one-buffer the channel and atomically delete from the map, we should be able to do everything under the same lock:
l.transport.holePunchingMx.Lock()
holePunch, ok := l.transport.holePunching[key]
if ok {
holePunch.connCh <- conn
delete(l.transport.holePunching, key)
}
l.transport.holePunchingMx.Unlock()
Probably not necessary, but may make things easier to reason about.
(also happy to walk over this in-person as well) |
afc0382
to
91bf7c5
Compare
f046b6b
to
62696b5
Compare
|
6125e62
to
fd54c8c
Compare
transport.go
Outdated
@@ -245,3 +344,7 @@ func (t *transport) Protocols() []int { | |||
func (t *transport) String() string { | |||
return "QUIC" | |||
} | |||
|
|||
func holePunchKey(addr net.Addr, id peer.ID) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we can also just use a custom type holePunchKey{addr string, id peer.ID}
. Not critical, but that would be slightly easier to reason about.
Needs testing, but feel free to review.
See libp2p/go-libp2p#1048