-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
conn, device, tun: implement vectorized I/O on Linux #64
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,6 +15,21 @@ import ( | |
func init() { | ||
controlFns = append(controlFns, | ||
|
||
// Attempt to set the socket buffer size beyond net.core.{r,w}mem_max by | ||
// using SO_*BUFFORCE. This requires CAP_NET_ADMIN, and is allowed here to | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Where do you envision tun being available but not SO_*BUFFORCE? Android where it's passed over a unix domain socket? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, for better and worse it's becoming a bit chaotic out there what feature sets overlap, largely due to security policy constraints. The two big aspects are Android and container runtimes, with there being much larger variation in configurations in the latter. In Android this call fails due to lack of CAP_NET_ADMIN, though I would have expected it to also be caught by the selinux configuration (as sadly the offload is) 🤷🏼 We might consider reporting the outcome, but to some extent I'm also waiting to see what happens here with the defaults once userspace QUIC deployments become more common. This will come up there once those folks also get through optimizations, perhaps it's time to change the defaults or some of the constraints. I'd be happy to write some notes about this in a doc somewhere, if we have a suitable doc in a suitable place. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The right place for those types of implementation notes is commit messages. It seems like this snippet can be separated out into a standalone thing. If you think there's interesting context and such, just put it in its own commit and stick that in a commit message. (But in case if it's not that interesting and doesn't matter, don't bother - up to you.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The socket options for UDP snd/rcv buffer size have been broken out into their own commit, with these details included in the commit message aeed2ba. |
||
// fail silently - the result of failure is lower performance on very fast | ||
// links or high latency links. | ||
func(network, address string, c syscall.RawConn) error { | ||
return c.Control(func(fd uintptr) { | ||
// Set up to *mem_max | ||
_ = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_RCVBUF, socketBufferSize) | ||
_ = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_SNDBUF, socketBufferSize) | ||
// Set beyond *mem_max if CAP_NET_ADMIN | ||
_ = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_RCVBUFFORCE, socketBufferSize) | ||
_ = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_SNDBUFFORCE, socketBufferSize) | ||
}) | ||
}, | ||
|
||
// Enable receiving of the packet information (IP_PKTINFO for IPv4, | ||
// IPV6_PKTINFO for IPv6) that is used to implement sticky socket support. | ||
func(network, address string, c syscall.RawConn) error { | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
/* SPDX-License-Identifier: MIT | ||
* | ||
* Copyright (C) 2017-2023 WireGuard LLC. All Rights Reserved. | ||
*/ | ||
|
||
package conn | ||
|
||
import ( | ||
"syscall" | ||
|
||
"golang.org/x/sys/windows" | ||
) | ||
|
||
func init() { | ||
controlFns = append(controlFns, | ||
func(network, address string, c syscall.RawConn) error { | ||
return c.Control(func(fd uintptr) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does this actually do anything on Windows, given that Windows uses WinRIO? It might, I just don't know myself. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, at least for API users who use StdNetBind, such as Tailscale. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
(It can stay in this PR, but just have a separate commit with a message I can read to understand this.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The socket options for UDP snd/rcv buffer size have been broken out into their own commit, with these details included in the commit message aeed2ba. |
||
_ = windows.SetsockoptInt(windows.Handle(fd), windows.SOL_SOCKET, windows.SO_RCVBUF, socketBufferSize) | ||
_ = windows.SetsockoptInt(windows.Handle(fd), windows.SOL_SOCKET, windows.SO_SNDBUF, socketBufferSize) | ||
}) | ||
}, | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be platform defined? Seems odd to justify the use of 7MBs on the grounds of macOS and then apply it everywhere. Or is it the case where every other platform clamps below 7MB and never above, so 7MB represents a useful upperbound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be platform defined anytime if we decide that's a good idea, if we demonstrate a scenario where it is worthwhile.
This value would ideally actually be tunable, it is essentially the main tuning knob around which you can address BDP. What we observed in testing is that we need a value substantially larger than the rather conservative default of hundreds of kilobytes in scenarios where there is either significant latency between peers, or significant bandwidth available.
The 7mb value while a bit strange courtesy of macOS limit, is not strongly worse than say, splitting this out and picking 8mb for Linux. We'd like to see more feedback from users regarding use cases here, in our testing this makes 100ms latency links perform much better (once whatever is inside also warms up, such as window expansion), as well as removes a significant bottleneck in reaching speeds in the 1-10gbps range. It is possible it is not sufficient for some 25gbps+ use cases, however other bottlenecks in the system need to be resolved first, so adding potentially substantially more memory usage now would be premature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just worry that we'll want to change it at some point and then inadvertently break macOS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Break because we make the value too large and miss the comment? Would adding a darwin test checking the value be sufficient in this case? Happy to break it out per-platform, but maybe one test file is simpler? Or we could do both.