Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky tests and/or goroutine leaks #270

Closed
mengelbart opened this issue Mar 31, 2023 · 8 comments · Fixed by #292
Closed

Flaky tests and/or goroutine leaks #270

mengelbart opened this issue Mar 31, 2023 · 8 comments · Fixed by #292
Assignees

Comments

@mengelbart
Copy link
Contributor

Your environment.

What did you do?

I run go test -v -count=5 ./... on the master-branch.

What did you expect?

I expected all tests to pass all the time.

What happened?

I frequently see tests fail with either Unexpected routines on test startup or goroutine is leaked.

@stv0g
Copy link
Member

stv0g commented May 19, 2023

At least the test TestAssociation_Shutdown appears to be failing regularily:

https://github.com/pion/sctp/actions/runs/5021866113/jobs/9004672764

@enobufs
Copy link
Member

enobufs commented Dec 29, 2023

The above log has expired. When I ran locally, I saw this error at high occurrance:

% go test -v -count 1 -race ./...
 :
 
 === RUN   TestAssocStressDuplex
    util.go:44: Unexpected routines on test startup:
        goroutine 10534 [chan send, 1 minutes]:
        github.com/pion/sctp.TestAssocReset.func2.2()
        	/Users/ytakeda/Projects/pion/bugfix/sctp-270/sctp/association_test.go:2225 +0xdc
        created by github.com/pion/sctp.TestAssocReset.func2 in goroutine 10344
        	/Users/ytakeda/Projects/pion/bugfix/sctp-270/sctp/association_test.go:2221 +0xad4

        goroutine 14423 [chan send]:
        github.com/pion/sctp.TestAssocReset.func2.2()
        	/Users/ytakeda/Projects/pion/bugfix/sctp-270/sctp/association_test.go:2225 +0xdc
        created by github.com/pion/sctp.TestAssocReset.func2 in goroutine 14437
        	/Users/ytakeda/Projects/pion/bugfix/sctp-270/sctp/association_test.go:2221 +0xad4

        goroutine 2602 [chan send, 2 minutes]:
        github.com/pion/sctp.TestAssocReset.func2.2()
        	/Users/ytakeda/Projects/pion/bugfix/sctp-270/sctp/association_test.go:2225 +0xdc
        created by github.com/pion/sctp.TestAssocReset.func2 in goroutine 2629
        	/Users/ytakeda/Projects/pion/bugfix/sctp-270/sctp/association_test.go:2221 +0xad4

        goroutine 6555 [chan send, 1 minutes]:
        github.com/pion/sctp.TestAssocReset.func2.2()
        	/Users/ytakeda/Projects/pion/bugfix/sctp-270/sctp/association_test.go:2225 +0xdc
        created by github.com/pion/sctp.TestAssocReset.func2 in goroutine 6569
        	/Users/ytakeda/Projects/pion/bugfix/sctp-270/sctp/association_test.go:2221 +0xad4
--- FAIL: TestAssocStressDuplex (10.20s)

What failed was not TestAssociation_Shutdown but the cause was Unexpected routines on test startup as the original report. I'd assume I am seeing the same one. I will try to fix the goroutine leak from TestAssocReset.

@enobufs
Copy link
Member

enobufs commented Dec 29, 2023

Now I see the ci test error:

TestAssociation_Shutdown (2s)
      association_test.go:2604: 
          	Error Trace:	/home/runner/work/sctp/sctp/association_test.go:2604
          	            				/opt/hostedtoolcache/go/1.19.13/x64/src/runtime/asm_amd64.s:1594
          	Error:      	Received unexpected error:
          	            	association closed before connecting
          	Test:       	TestAssociation_Shutdown
      association_test.go:2622: 
          	Error Trace:	/home/runner/work/sctp/sctp/association_test.go:2622
          	            				/home/runner/work/sctp/sctp/association_test.go:2643
          	Error:      	timed out waiting for a1
          	Test:       	TestAssociation_Shutdown
      association_test.go:2628: 
          	Error Trace:	/home/runner/work/sctp/sctp/association_test.go:2628
          	            				/home/runner/work/sctp/sctp/association_test.go:2643
          	Error:      	timed out waiting for a2
          	Test:       	TestAssociation_Shutdown
      association_test.go:2640: 
          	Error Trace:	/home/runner/work/sctp/sctp/association_test.go:2640
          	            				/opt/hostedtoolcache/go/1.19.13/x64/src/runtime/panic.go:890
          	            				/opt/hostedtoolcache/go/1.19.13/x64/src/runtime/panic.go:260
          	            				/opt/hostedtoolcache/go/1.19.13/x64/src/runtime/signal_unix.go:[83](https://github.com/pion/sctp/actions/runs/7353192438/job/20018854151?pr=292#step:9:135)9
          	            				/home/runner/work/sctp/sctp/association.go:1380
          	            				/home/runner/work/sctp/sctp/association_test.go:2645
          	Error:      	Not equal: 
          	            	expected: 2
          	            	actual  : 6
          	Test:       	TestAssociation_Shutdown
          	Messages:   	goroutine is leaked
  panic: runtime error: invalid memory address or nil pointer dereference [recovered]
  	panic: runtime error: invalid memory address or nil pointer dereference
  [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x6c430d]
  
  goroutine 2616 [running]:
  testing.tRunner.func1.2({0x774b40, 0x9cb[86](https://github.com/pion/sctp/actions/runs/7353192438/job/20018854151?pr=292#step:9:138)0})
  	/opt/hostedtoolcache/go/1.19.13/x64/src/testing/testing.go:1396 +0x372
  testing.tRunner.func1()
  	/opt/hostedtoolcache/go/1.19.13/x64/src/testing/testing.go:1399 +0x5f0
  panic({0x774b40, 0x9cb860})
  	/opt/hostedtoolcache/go/1.19.13/x64/src/runtime/panic.go:[89](https://github.com/pion/sctp/actions/runs/7353192438/job/20018854151?pr=292#step:9:141)0 +0x262
  github.com/pion/sctp.(*Association).OpenStream(0x0, 0x0?, 0x0?)
  	/home/runner/work/sctp/sctp/association.go:1380 +0x6d
  github.com/pion/sctp.TestAssociation_Shutdown(0xc0005e2680)
  	/home/runner/work/sctp/sctp/association_test.go:2645 +0xd9
  testing.tRunner(0xc0005e2680, 0x7e6038)
  	/opt/hostedtoolcache/go/1.19.13/x64/src/testing/testing.go:1446 +0x217
  created by testing.(*T).Run
  	/opt/hostedtoolcache/go/1.19.13/x64/src/testing/testing.go:14[93](https://github.com/pion/sctp/actions/runs/7353192438/job/20018854151?pr=292#step:9:145) +0x75e
  exit status 2

enobufs added a commit that referenced this issue Dec 30, 2023
enobufs added a commit that referenced this issue Dec 30, 2023
enobufs added a commit that referenced this issue Dec 30, 2023
enobufs added a commit that referenced this issue Dec 30, 2023
Added createConnectedUDPConnPair().
Relates to #270
enobufs added a commit that referenced this issue Dec 30, 2023
@enobufs
Copy link
Member

enobufs commented Dec 30, 2023

I have been trying different things and I finally got some clue why TestAssociation_Shutdown fails. Write operation on a net.Conn (a connected UDP using loopback interface) fails with some permission error:

https://github.com/pion/sctp/actions/runs/7363064418/job/20042145678?pr=292

❌ TestAssociation_Shutdown (2s)
  sctp DEBUG: 08:31:53.073542 association.go:1002: [0xc0009fa000] state change: 'Closed' => 'CookieWait'
  sctp DEBUG: 08:31:53.073558 association.go:1002: [0xc000248000] state change: 'Closed' => 'CookieWait'
  sctp DEBUG: 08:31:53.073574 association.go:377: [0xc0009fa000] sending INIT
  sctp DEBUG: 08:31:53.073589 association.go:377: [0xc000248000] sending INIT
  sctp DEBUG: 08:31:53.073620 association.go:538: [0xc0009fa000] readLoop entered
  sctp DEBUG: 08:31:53.073640 association.go:564: [0xc000248000] writeLoop entered
  sctp DEBUG: 08:31:53.073658 association.go:564: [0xc0009fa000] writeLoop entered
  sctp DEBUG: 08:31:53.073656 association.go:538: [0xc000248000] readLoop entered
  sctp WARNING: 2023/12/30 08:31:53 [0xc0009fa000] failed to write packets on netConn: write udp 127.0.0.1:41866->127.0.0.1:58766: write: operation not permitted
  sctp DEBUG: 08:31:53.073939 association.go:577: [0xc0009fa000] writeLoop ended
  sctp DEBUG: 08:31:53.074058 association.go:1002: [0xc0009fa000] state change: 'CookieWait' => 'Closed'
  sctp DEBUG: 08:31:53.073963 association.go:1066: [0xc0009fa000] chunkInit received in state 'CookieWait'
  sctp DEBUG: 08:31:53.074192 association.go:1099: [0xc0009fa000] use ForwardTSN (on init)
  sctp DEBUG: 08:31:53.074[163](https://github.com/pion/sctp/actions/runs/7362909083/job/20041824067?pr=292#step:9:232) association.go:600: [0xc0009fa000] writeLoop exited
  sctp ERROR: 2023/12/30 08:31:55 [0xc0009fa000] client handshake canceled: state=Closed
  sctp ERROR: 2023/12/30 08:31:55 [0xc000248000] client handshake canceled: state=CookieWait
      association_test.go:2686: 
          	Error Trace:	/home/runner/work/sctp/sctp/association_test.go:2686
          	Error:      	Received unexpected error:
          	            	context deadline exceeded
          	Test:       	TestAssociation_Shutdown
      association_test.go:2682: 
          	Error Trace:	/home/runner/work/sctp/sctp/association_test.go:2682
          	            				/opt/hostedtoolcache/go/1.20.12/x64/src/runtime/panic.go:540
          	            				/opt/hostedtoolcache/go/1.20.12/x64/src/testing/testing.go:980
          	            				/home/runner/work/sctp/sctp/association_test.go:2686
          	Error:      	Not equal: 
          	            	expected: 2
          	            	actual  : 7
          	Test:       	TestAssociation_Shutdown
          	Messages:   	goroutine is leaked

Does any one have any idea what could cause this logs message? sctp WARNING: 2023/12/30 08:31:53 [0xc0009fa000] failed to write packets on netConn: write udp 127.0.0.1:41866->127.0.0.1:58766: write: operation not permitted
@stv0g @Sean-Der

@enobufs
Copy link
Member

enobufs commented Dec 30, 2023

This might be related: golang/go#63322

Also see quick-go's workaround:quic-go/quic-go@86de97c

@enobufs
Copy link
Member

enobufs commented Dec 30, 2023

FYI: goroutine is leaked was also caused by false-positives by how goroutine leak was detected. I have improved it in the WIP-PR #292.

enobufs added a commit that referenced this issue Dec 30, 2023
enobufs added a commit that referenced this issue Dec 31, 2023
enobufs added a commit that referenced this issue Dec 31, 2023
@enobufs
Copy link
Member

enobufs commented Dec 31, 2023

I have tried quick-go's workaround but now it causes a connection refuse error on the reader side which really does not make any sense.

At this point, I am leaning toward not using the mutually connected UDPConn pair in the test, and use a mocked net.Conn, or vnet instead. pion/sctp does not actually use the mutually connected UDPConn anyway.

❌ TestAssociation_Shutdown (1s)
  sctp DEBUG: 07:03:37.660217 association.go:334: [0xc0009714a0] retryFirstWriteError=true
  sctp DEBUG: 07:03:37.660316 association.go:1030: [0xc0009714a0] state change: 'Closed' => 'CookieWait'
  sctp DEBUG: 07:03:37.660331 association.go:334: [0xc000938960] retryFirstWriteError=true
  sctp DEBUG: 07:03:37.660415 association.go:1030: [0xc000938960] state change: 'Closed' => 'CookieWait'
  sctp DEBUG: 07:03:37.660431 association.go:545: [0xc000938960] readLoop entered
  sctp DEBUG: 07:03:37.660345 association.go:545: [0xc0009714a0] readLoop entered
  sctp DEBUG: 07:03:37.660453 association.go:571: [0xc000938960] writeLoop entered
  sctp DEBUG: 07:03:37.660347 association.go:384: [0xc0009714a0] sending INIT
  sctp DEBUG: 07:03:37.660439 association.go:384: [0xc000938960] sending INIT
  sctp DEBUG: 07:03:37.660361 association.go:571: [0xc0009714a0] writeLoop entered
  sctp DEBUG: 07:03:37.660614 association.go:624: [0xc000938960] initial write succeeded
  sctp DEBUG: 07:03:37.660636 association.go:624: [0xc0009714a0] initial write succeeded
  sctp DEBUG: 07:03:37.660700 association.go:567: [0xc0009714a0] readLoop exited read udp 127.0.0.1:44500->127.0.0.1:39991: read: connection refused
  sctp DEBUG: 07:03:37.660741 association.go:537: [0xc0009714a0] association closed
  sctp DEBUG: 07:03:37.660761 association.go:538: [0xc0009714a0] stats nDATAs (in) : 0
  sctp DEBUG: 07:03:37.660779 association.go:539: [0xc0009714a0] stats nSACKs (in) : 0
  sctp DEBUG: 07:03:37.660795 association.go:540: [0xc0009714a0] stats nT3Timeouts : 0
  sctp DEBUG: 07:03:37.660809 association.go:541: [0xc0009714a0] stats nAckTimeouts: 0
  sctp DEBUG: 07:03:37.660830 association.go:542: [0xc0009714a0] stats nFastRetrans: 0
  sctp DEBUG: 07:03:37.660856 association.go:1030: [0xc0009714a0] state change: 'CookieWait' => 'Closed'
  sctp DEBUG: 07:03:37.660896 association.go:607: [0xc0009714a0] writeLoop exited
  sctp ERROR: 2023/12/31 07:03:37 [0xc000938960] client handshake canceled: state=CookieWait
      association_test.go:2675: 
          	Error Trace:	/home/runner/work/sctp/sctp/association_test.go:2675
          	Error:      	Received unexpected error:
          	            	association closed before connecting
          	Test:       	TestAssociation_Shutdown
      association_test.go:2111: leaked 4 goroutines

enobufs added a commit that referenced this issue Dec 31, 2023
enobufs added a commit that referenced this issue Dec 31, 2023
enobufs added a commit that referenced this issue Dec 31, 2023
enobufs added a commit that referenced this issue Dec 31, 2023
enobufs added a commit that referenced this issue Jan 1, 2024
This is avoid the issue with unexpected errors occur on Ubuntu linux.
Relates to #270
enobufs added a commit that referenced this issue Jan 1, 2024
enobufs added a commit that referenced this issue Jan 1, 2024
This is avoid the issue with unexpected errors occur on Ubuntu linux.
Relates to #270
@enobufs enobufs self-assigned this Jan 1, 2024
@enobufs
Copy link
Member

enobufs commented Jan 1, 2024

At this point, I am leaning toward not using the mutually connected UDPConn pair in the test ...

Now I am using a mock'd connection using "dumbConn2". Now, all the flaky tests are stable and ready for review.

Sean-Der pushed a commit that referenced this issue Jan 6, 2024
Sean-Der pushed a commit that referenced this issue Jan 6, 2024
Sean-Der pushed a commit that referenced this issue Jan 6, 2024
This is avoid the issue with unexpected errors occur on Ubuntu linux.
Relates to #270
Sean-Der pushed a commit that referenced this issue Jan 6, 2024
Sean-Der pushed a commit that referenced this issue Jan 6, 2024
Sean-Der pushed a commit that referenced this issue Jan 6, 2024
This is avoid the issue with unexpected errors occur on Ubuntu linux.
Relates to #270
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants