Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Panics in SendRequestNoWait are triggered  when reconnecting consumer to broker #859

Open
countryroadscn opened this issue Oct 11, 2022 · 2 comments · May be fixed by #860
Open

Comments

@countryroadscn
Copy link

countryroadscn commented Oct 11, 2022

Expected behavior

Panics should not occur when the client reconnects

Actual behavior

Panic is triggered in pulsar and cannot be captured by the client code.

The backtrace of core dump:
0 0x0000000000479261 in runtime.raise
at /data/goroot/go/src/runtime/sys_linux_amd64.s:168
1 0x000000000045adc5 in runtime.dieFromSignal
at /data/goroot/go/src/runtime/signal_unix.go:852
2 0x000000000045b796 in runtime.sigfwdgo
at /data/goroot/go/src/runtime/signal_unix.go:1066
3 0x0000000000459b27 in runtime.sigtrampgo
at /data/goroot/go/src/runtime/signal_unix.go:430
4 0x000000000047a0ae in runtime.sigtrampgo
at :1
5 0x000000000047955d in runtime.sigtramp
at /data/goroot/go/src/runtime/sys_linux_amd64.s:361
6 0x00007f9041b19390 in ???
at ?:-1
7 0x0000000000444e49 in runtime.crash
at /data/goroot/go/src/runtime/signal_unix.go:944
8 0x0000000000444e49 in runtime.fatalpanic
at /data/goroot/go/src/runtime/panic.go:1092
9 0x0000000000444617 in runtime.gopanic
at /data/goroot/go/src/runtime/panic.go:941
10 0x0000000000490994 in sync.(*WaitGroup).Add
at /data/goroot/go/src/sync/waitgroup.go:94
11 0x0000000000490a05 in sync.(*WaitGroup).Done
at /data/goroot/go/src/sync/waitgroup.go:105
12 0x0000000000fec866 in github.com/apache/pulsar-client-go/pulsar/internal.(*connection).SendRequestNoWait
at /data/goroot/gopath/pkg/mod/github.com/apache/pulsar-client-go@v0.6.0/pulsar/internal/connection.go:606

截屏2022-10-11 15 36 42
截屏2022-10-11 15 36 59
截屏2022-10-11 15 37 08

Steps to reproduce

This issue may not reproduce every time, but it must be related to sync.WaitGroup.

System configuration

Pulsar version: 2.9
pulsar-client-go version: v0.6.0

@zzzming
Copy link
Contributor

zzzming commented Oct 12, 2022

Thank you for reporting this. Although it's 0.6.0, the same logic exists in the current master.

The problem is basically WaitGroup does not allow Wait() and Add() to run concurrently. (Add() itself can run concurrently). This is clearly explained in the Go WaitGroup source code at https://github.com/golang/go/blob/master/src/sync/waitgroup.go#L30

// Add adds delta, which may be negative, to the WaitGroup counter.
// If the counter becomes zero, all goroutines blocked on Wait are released.
// If the counter goes negative, Add panics.
//
// Note that calls with a positive delta that occur when the counter is zero
// must happen before a Wait. Calls with a negative delta, or calls with a
// positive delta that start when the counter is greater than zero, may happen
// at any time.
// Typically this means the calls to Add should execute before the statement
// creating the goroutine or other event to be waited for.

The WaitGroup, incomingRequestsWG, in the connection.go waits all the sent requests before close the main run goroutine. The design is for run() wait for any requests to be completed if there is any. So both Wait() and Add() are in this concurrent manner. The code should preemptively Add(1) before Wait() is called in the closeCh triggered.

@countryroadscn
Copy link
Author

Thank you for reporting this. Although it's 0.6.0, the same logic exists in the current master.

The problem is basically WaitGroup does not allow Wait() and Add() to run concurrently. (Add() itself can run concurrently). This is clearly explained in the Go WaitGroup source code at https://github.com/golang/go/blob/master/src/sync/waitgroup.go#L30

// Add adds delta, which may be negative, to the WaitGroup counter.
// If the counter becomes zero, all goroutines blocked on Wait are released.
// If the counter goes negative, Add panics.
//
// Note that calls with a positive delta that occur when the counter is zero
// must happen before a Wait. Calls with a negative delta, or calls with a
// positive delta that start when the counter is greater than zero, may happen
// at any time.
// Typically this means the calls to Add should execute before the statement
// creating the goroutine or other event to be waited for.

The WaitGroup, incomingRequestsWG, in the connection.go waits all the sent requests before close the main run goroutine. The design is for run() wait for any requests to be completed if there is any. So both Wait() and Add() are in this concurrent manner. The code should preemptively Add(1) before Wait() is called in the closeCh triggered.

What time can this pr be merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants