Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: sending UDP datagrams back and forth fails with "sendmsg: operation not permitted" #63322

Open
marten-seemann opened this issue Oct 1, 2023 · 10 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@marten-seemann
Copy link
Contributor

What version of Go are you using (go version)?

$ go version
go1.21.0 linux/arm64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/home/parallels/.cache/go-build'
GOENV='/home/parallels/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/parallels/src/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/parallels/src/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/parallels/bin/go1.21ex'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/parallels/bin/go1.21ex/pkg/tool/linux_arm64'
GOVCS=''
GOVERSION='go1.21.0'
GCCGO='gccgo'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/parallels/src/go/src/github.com/libp2p/go-libp2p/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build618110419=/tmp/go-build -gno-record-gcc-switches'

What did you do?

I created two UDP sockets, and send UDP datagrams from one to the other, and vice versa:

func newUDPConn(t *testing.T) *net.UDPConn {
	addr, err := net.ResolveUDPAddr("udp", "localhost:0")
	if err != nil {
		t.Fatal(err)
	}
	conn, err := net.ListenUDP("udp", addr)
	if err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() { conn.Close() })
	return conn
}

func TestUDPSending(t *testing.T) {
	c1 := newUDPConn(t)
	c2 := newUDPConn(t)

	const num = 1000
	done := make(chan struct{}, 2)
	var failed1, failed2 bool
	go func() {
		defer func() { done <- struct{}{} }()
		for i := 0; i < num; i++ {
			_, _, err := c1.WriteMsgUDP(make([]byte, 1000), nil, c2.LocalAddr().(*net.UDPAddr))
			if err != nil {
				failed1 = true
				t.Logf("c1 send failed %d: %v\n", i, err)
			}
			time.Sleep(200 * time.Microsecond)
		}
	}()

	go func() {
		defer func() { done <- struct{}{} }()
		for i := 0; i < num; i++ {
			_, _, err := c2.WriteMsgUDP(make([]byte, 1000), nil, c1.LocalAddr().(*net.UDPAddr))
			if err != nil {
				failed2 = true
				t.Logf("c2 send failed %d: %v\n", i, err)
			}
			time.Sleep(200 * time.Microsecond)
		}
	}()

	<-done
	<-done

	if failed1 || failed2 {
		t.Fail()
	}
}

What did you expect to see?

I expected UDP datagrams to be sent on both sockets.

What did you see instead?

More often than not, this test fails on Linux:

c1 send failed 0: write udp 127.0.0.1:47716->127.0.0.1:34183: sendmsg: operation not permitted

It's either the first or the second connection that fails, but if there's a failure, it's always only the first call that fails. All subsequent calls complete successfully.

On macOS, this code doesn't produce any errors.

@qiulaidongfeng
Copy link
Member

qiulaidongfeng commented Oct 2, 2023

I cannot reproduce failures in both wsl1 and github code spaces using the TestUDPSending you provided.

@rittneje
Copy link

rittneje commented Oct 2, 2023

I am able to reproduce (on Linux), but it is very hit-or-miss. Strangely, it only seems to happen on the host - I cannot reproduce in a Docker container.

From looking around, it seems like this may be a Linux "feature" where sendmsg sometimes returns EPERM if the firewall buffer is full, or something like that. It's not very clear though.

Strangely, EPERM is not defined as one of the error values from sendmsg, so no matter what the kernel isn't even abiding by its own documentation.

@dennis-tra
Copy link

dennis-tra commented Oct 2, 2023

I've run the snippet 100 times on a Linux machine and on MacOS but couldn't reproduce. Interestingly, on Linux a test run takes ~1.2s, and on MacOS ~0.35s.

Linux: Ubuntu 22.04.1 LTS (GNU/Linux 5.15.0-56-generic x86_64) (go1.20.1)
MacOS: Ventura 13.5.2. (go1.20.0)

Snippet to run the test 100 times
package main

import (
	"net"
	"testing"
	"time"
)

func newUDPConn(t *testing.T) *net.UDPConn {
	addr, err := net.ResolveUDPAddr("udp", "localhost:0")
	if err != nil {
		t.Fatal(err)
	}
	conn, err := net.ListenUDP("udp", addr)
	if err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() { conn.Close() })
	return conn
}

func TestUDPSending(t *testing.T) {
	c1 := newUDPConn(t)
	c2 := newUDPConn(t)
	for j := 0; j < 100; j++ {
		t.Run("", func(t *testing.T) {
			const num = 1000
			done := make(chan struct{}, 2)
			var failed1, failed2 bool
			go func() {
				defer func() { done <- struct{}{} }()
				for i := 0; i < num; i++ {
					_, _, err := c1.WriteMsgUDP(make([]byte, 1000), nil, c2.LocalAddr().(*net.UDPAddr))
					if err != nil {
						failed1 = true
						t.Logf("c1 send failed %d: %v\n", i, err)
					}
					time.Sleep(200 * time.Microsecond)
				}
			}()

			go func() {
				defer func() { done <- struct{}{} }()
				for i := 0; i < num; i++ {
					_, _, err := c2.WriteMsgUDP(make([]byte, 1000), nil, c1.LocalAddr().(*net.UDPAddr))
					if err != nil {
						failed2 = true
						t.Logf("c2 send failed %d: %v\n", i, err)
					}
					time.Sleep(200 * time.Microsecond)
				}
			}()

			<-done
			<-done

			if failed1 || failed2 {
				t.Fail()
			}
		})
	}
}

@sukunrt
Copy link

sukunrt commented Oct 2, 2023

@dennis-tra You need to make new udp conns on each run. The error concerns the initial packets exchanged on the connection. This variant fails quite frequently for me on linux.

package main

import (
	"net"
	"testing"
	"time"
)

func newUDPConn(t *testing.T) *net.UDPConn {
	addr, err := net.ResolveUDPAddr("udp", "localhost:0")
	if err != nil {
		t.Fatal(err)
	}
	conn, err := net.ListenUDP("udp", addr)
	if err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() { conn.Close() })
	return conn
}

func TestUDPSending(t *testing.T) {
	
	for j := 0; j < 100; j++ {
		t.Run("", func(t *testing.T) {
            c1 := newUDPConn(t)
	        c2 := newUDPConn(t)
			const num = 1000
			done := make(chan struct{}, 2)
			var failed1, failed2 bool
			go func() {
				defer func() { done <- struct{}{} }()
				for i := 0; i < num; i++ {
					_, _, err := c1.WriteMsgUDP(make([]byte, 1000), nil, c2.LocalAddr().(*net.UDPAddr))
					if err != nil {
						failed1 = true
						t.Logf("c1 send failed %d: %v\n", i, err)
					}
					time.Sleep(200 * time.Microsecond)
				}
			}()

			go func() {
				defer func() { done <- struct{}{} }()
				for i := 0; i < num; i++ {
					_, _, err := c2.WriteMsgUDP(make([]byte, 1000), nil, c1.LocalAddr().(*net.UDPAddr))
					if err != nil {
						failed2 = true
						t.Logf("c2 send failed %d: %v\n", i, err)
					}
					time.Sleep(200 * time.Microsecond)
				}
			}()

			<-done
			<-done

			if failed1 || failed2 {
				t.Fail()
			}
		})
	}
}

@qiulaidongfeng
Copy link
Member

qiulaidongfeng commented Oct 2, 2023

Use the following code

package main

import (
	"net"
	"testing"
	"time"
)

func newUDPConn(t *testing.T) *net.UDPConn {
	addr, err := net.ResolveUDPAddr("udp", "localhost:0")
	if err != nil {
		t.Fatal(err)
	}
	conn, err := net.ListenUDP("udp", addr)
	if err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() { conn.Close() })
	return conn
}

func TestUDPSending(t *testing.T) {

	for j := 0; j < 100; j++ {
		t.Run("", func(t *testing.T) {
			t.Parallel()
			c1 := newUDPConn(t)
			c2 := newUDPConn(t)
			const num = 1000
			done := make(chan struct{}, 2)
			var failed1, failed2 bool
			go func() {
				defer func() { done <- struct{}{} }()
				for i := 0; i < num; i++ {
					_, _, err := c1.WriteMsgUDP(make([]byte, 1000), nil, c2.LocalAddr().(*net.UDPAddr))
					if err != nil {
						failed1 = true
						t.Logf("c1 send failed %d: %v\n", i, err)
					}
					time.Sleep(200 * time.Microsecond)
				}
			}()

			go func() {
				defer func() { done <- struct{}{} }()
				for i := 0; i < num; i++ {
					_, _, err := c2.WriteMsgUDP(make([]byte, 1000), nil, c1.LocalAddr().(*net.UDPAddr))
					if err != nil {
						failed2 = true
						t.Logf("c2 send failed %d: %v\n", i, err)
					}
					time.Sleep(200 * time.Microsecond)
				}
			}()

			<-done
			<-done

			if failed1 || failed2 {
				t.Fail()
			}
		})
	}
}

Editor [Sorry for accidentally confusing the platform]: Successfully reproduced the failure on the Linux platform in Github code space, while wsl1 failed to reproduce the failure.

@dennis-tra
Copy link

Thanks for correcting, and sorry for the noise. However, even with both of your corrected versions, I cannot reproduce it on my Linux machine.

@prattmic prattmic added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 3, 2023
@prattmic
Copy link
Member

prattmic commented Oct 3, 2023

cc @neild @ianlancetaylor

@bcmills
Copy link
Contributor

bcmills commented Oct 13, 2023

Strangely, EPERM is not defined as one of the error values from sendmsg, so no matter what the kernel isn't even abiding by its own documentation.

POSIX does not seem to allow EPERM from sendmsg either:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/sendmsg.html
It seems like if the firewall buffer is full it ought to return ENOBUFS instead. 🤷‍♂️

At any rate, this looks like maybe a Linux kernel bug rather than a Go bug, and (especially given that it is UDP) it's not clear to me that Go should necessarily work around that bug itself.

@ArtemGr
Copy link

ArtemGr commented Nov 8, 2023

FYI, I've seen this “sendmsg: operation not permitted” happen with Syncthing (QUIC) due to UDPFLOOD protection, configured automatically by DirectAdmin (iptables -vL UDPFLOOD showing "limit: avg 100/sec burst 500" and "limit: avg 30/min burst 5").

@bcmills
Copy link
Contributor

bcmills commented Nov 8, 2023

@ArtemGr, that still sounds consistent with a this being a Linux kernel bug. (Perhaps enabling UDPFLOOD limits causes the kernel to return an error that is not consistent with its documentation?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

9 participants