Skip to content
This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

crypto errors can stall udp receiver loop #564

Closed
rade opened this issue Apr 15, 2015 · 3 comments · Fixed by #617
Closed

crypto errors can stall udp receiver loop #564

rade opened this issue Apr 15, 2015 · 3 comments · Fixed by #617
Assignees
Labels
Milestone

Comments

@rade
Copy link
Member

rade commented Apr 15, 2015

Receiving a stray udp packet can cause the decryptor to wait for a new nonce which will never arrive. This stalls the entire udp receiver loop for 30 seconds, until the offending connection gets shut down by heartbeat timeouts. That of course will cause all other connections to miss heartbeats too and shut down too. Not great.

@rade rade added the bug label Apr 15, 2015
@rade
Copy link
Member Author

rade commented Apr 15, 2015

A quick hack fix would introduce a timeout. 1 second? 3? Timeouts have a habit of, err, timing out when they shouldn't, so this may well replace one problem with another. Though arguably less severe one. If the packet was genuine then we will drop it. and the next packet along will once again cause us to wait for a nonce. No big deal.

However, if we have a whole stream of such stray packets we could still end up slowing the entire UDP receiver loop to a crawl.

An alternative would be to introduce per-connection goroutines for inbound udp packet processing. just like we have on the outbound side. This is costly in performance terms, and also poses some problems for packet injection - iirc the injection handle isn't thread safe. We could create separate handles, I suppose.

@rade
Copy link
Member Author

rade commented Apr 16, 2015

Another option would be not to wait for a nonce at all, i.e. drop the packet if we cannot get hold of a new nonce. I've experimented with that and it actually works very well.

Note that new nonces are sent when the sender is half-way through the offset range, so they should arrive at the remote well in advance of any packets using the new nonces. The main exception is the first packet. In fact odds of the nonce not making it through before that packet arrives are quite high.

@rade
Copy link
Member Author

rade commented Apr 16, 2015

Idea from @bboreham only wait (with a timeout) once, and only for the first nonce. That would cure the common case of the first nonce arriving fractionally later than the first packet.

Another related idea: replace the source peer name in our udp encapsulation with the connection uid. The former is just a lookup away from the latter, and looking up connections in the udp reader by connection uid instead of peer name ensures that packets get processed by the correct decryptor.

@dpw dpw closed this as completed in #617 Apr 30, 2015
dpw added a commit that referenced this issue Apr 30, 2015
replace random nonces with message sequence numbers

Fixes #564.
@rade rade modified the milestone: 0.11.0 May 12, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant