UnsafeRange panicking during shutdown #17223
Comments
UPDATED: When the server closes the backend, the backend stops the background commit goroutine and resets the read transaction. etcd/server/storage/backend/read_tx.go Lines 133 to 138 in c8b4b16
For etcd/server/storage/backend/batch_tx.go Lines 367 to 376 in c8b4b16
The |
Another case with etcd-io/bbolt#715 |
@fuweid would you be able to propose a fix? The issue showed up in robustness tests, which I would prefer to keep flake free. |
Hi @serathius Sure. Will file pull request later. |
Hi @ahrtr @serathius sorry for taking so long on this issue.
This issue has been fixed at the gRPC layer by grpc/grpc-go@61eab37 (released in v1.61.0 to fix a regression): All the requests are tracked by
We call gracefulstop when we receive the SIGTERM signal, except in cmux mode. Lines 474 to 493 in a7f5d4b
We don't need to set up a timeout for draining things because etcd/server/etcdserver/server.go Lines 816 to 827 in a7f5d4b
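To illustrate why draining in-flight requests before closing the backend avoids the panic, here is a minimal sketch using only the standard library. It is not grpc-go or etcd code: `toyServer`, `Handle`, `GracefulStop`, `isClosed`, and `errStopping` are hypothetical names, and the `closed` flag merely stands in for the backend being closed.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errStopping is a hypothetical error returned once shutdown has begun.
var errStopping = errors.New("server is stopping")

// toyServer is a hypothetical model of a server that tracks in-flight requests.
type toyServer struct {
	mu       sync.Mutex
	stopping bool
	inflight sync.WaitGroup
	closed   bool // stands in for the backend being closed
}

// Handle runs fn as one request, rejecting it if shutdown has begun.
func (s *toyServer) Handle(fn func()) error {
	s.mu.Lock()
	if s.stopping {
		s.mu.Unlock()
		return errStopping
	}
	s.inflight.Add(1) // registered under the lock, so GracefulStop cannot miss it
	s.mu.Unlock()
	defer s.inflight.Done()
	fn()
	return nil
}

// GracefulStop rejects new requests, waits for in-flight ones,
// and only then marks the backend as closed.
func (s *toyServer) GracefulStop() {
	s.mu.Lock()
	s.stopping = true
	s.mu.Unlock()
	s.inflight.Wait()
	s.mu.Lock()
	s.closed = true
	s.mu.Unlock()
}

// isClosed reports whether the (toy) backend has been closed.
func (s *toyServer) isClosed() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.closed
}

func main() {
	s := &toyServer{}
	started := make(chan struct{})
	release := make(chan struct{})
	go s.Handle(func() {
		close(started)
		<-release // simulate a slow request, e.g. a snapshot
	})
	<-started
	stopped := make(chan struct{})
	go func() { s.GracefulStop(); close(stopped) }()
	close(release)
	<-stopped
	fmt.Println("backend closed:", s.isClosed())
	fmt.Println("late request:", s.Handle(func() {}))
}
```

In this model a request can never observe a closed backend, because the close happens only after every registered request has returned.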
And #17757 is also an enhancement for the failpoint test. PTAL. Thanks
Side note: I was using an old version (1.60.1), so the previous approach was to introduce
If the server layer can track active RPCs, it will be better. So I revisited the gRPC code and found that
type txRef struct {
sync.RWMutex
wg sync.WaitGroup
}
type Backend interface {
ReadTx() (ReadTx, TxRefReleaseFunc, error)
ConcurrentReadTx() (ReadTx, TxRefReleaseFunc, error)
BatchTx() (BatchTx, TxRefReleaseFunc, error)
}
tx, txPut, err := ReadTx() // ConcurrentReadTx() / BatchTx()
if err != nil {
return err
}
defer txPut()
.. |
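The `txRef` idea above can be fleshed out as a small reference-counting helper. This is only a sketch under assumptions: the `closed` field, the `acquire` and `shutdown` methods, and `errBackendClosed` are hypothetical and do not appear in the comment or in etcd; only the `txRef` fields `sync.RWMutex` and `wg` and the `TxRefReleaseFunc` name come from the snippet.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errBackendClosed is a hypothetical error for acquisitions after shutdown.
var errBackendClosed = errors.New("backend closed")

// TxRefReleaseFunc releases a reference taken by acquire.
type TxRefReleaseFunc func()

// txRef mirrors the struct from the comment; closed is an added assumption.
type txRef struct {
	sync.RWMutex
	wg     sync.WaitGroup
	closed bool
}

// acquire takes a reference, or fails once shutdown has started.
func (r *txRef) acquire() (TxRefReleaseFunc, error) {
	r.RLock()
	defer r.RUnlock()
	if r.closed {
		return nil, errBackendClosed
	}
	r.wg.Add(1)
	var once sync.Once // makes the release function safe to call twice
	return func() { once.Do(r.wg.Done) }, nil
}

// shutdown blocks new references and waits for outstanding ones.
func (r *txRef) shutdown() {
	r.Lock()
	r.closed = true
	r.Unlock()
	r.wg.Wait()
}

func main() {
	r := &txRef{}
	release, err := r.acquire()
	if err != nil {
		fmt.Println("acquire failed:", err)
		return
	}
	// ... use the transaction here ...
	release()

	r.shutdown()
	_, err = r.acquire()
	fmt.Println("acquire after shutdown:", err)
}
```

Callers would pair every successful acquisition with `defer release()`, as in the `defer txPut()` usage above, so that shutdown cannot reset the transaction while a reader still holds it.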
Bug report criteria
What happened?
Test case TestMaintenanceSnapshotCancel failed with a panic. Refer to https://github.com/etcd-io/etcd/actions/runs/7463174417/job/20307221683?pr=17220
Based on the log, the likely cause is that the backend had already been closed (the member was being stopped) before the snapshot operation; see etcd/server/storage/backend/backend.go, line 331 in a2eb17c.
What did you expect to happen?
No panic while processing any client request.
How can we reproduce it (as minimally and precisely as possible)?
Write an integration test that stops a member before calling the snapshot API.
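The ordering that triggers the failure can be modeled without etcd at all. The following is a toy sketch, not an etcd integration test: `toyBackend`, `Close`, `Snapshot`, and `snapshotAfterStop` are hypothetical names, and the nil map only imitates a backend whose state has been torn down.

```go
package main

import "fmt"

// toyBackend is a hypothetical stand-in for a storage backend; it is not etcd's type.
type toyBackend struct {
	data map[string]string // set to nil on Close, imitating torn-down state
}

// Close tears down the backend state.
func (b *toyBackend) Close() { b.data = nil }

// Snapshot copies the data and panics if the backend is already closed.
func (b *toyBackend) Snapshot() map[string]string {
	if b.data == nil {
		panic("backend already closed")
	}
	out := make(map[string]string, len(b.data))
	for k, v := range b.data {
		out[k] = v
	}
	return out
}

// snapshotAfterStop closes the backend first, then calls Snapshot,
// and reports the recovered panic value (nil if nothing panicked).
func snapshotAfterStop(b *toyBackend) (recovered any) {
	defer func() { recovered = recover() }()
	b.Close()
	b.Snapshot()
	return nil
}

func main() {
	b := &toyBackend{data: map[string]string{"k": "v"}}
	fmt.Println("recovered:", snapshotAfterStop(b))
}
```

A real reproduction would need the same ordering against an actual member: stop it, then issue the snapshot request.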
Anything else we need to know?
No response
Relevant log output
No response