quotaKVServer blocks Txn-compare-get-delete (kube-apiserver delete) while allowing DeleteRange #13429

Closed
chaochn47 opened this issue Oct 20, 2021 · 1 comment · Fixed by #13435

chaochn47 commented Oct 20, 2021

When the boltdb backend has exceeded its quota, a Kubernetes delete (a Txn with a compare, a get, and a delete) is blocked by quotaKVServer:

func (s *quotaKVServer) Txn(ctx context.Context, r *pb.TxnRequest) (*pb.TxnResponse, error) {
	if err := s.qa.check(ctx, r); err != nil {
		return nil, err
	}
	return s.KVServer.Txn(ctx, r)
}

// check whether request satisfies the quota. If there is not enough space,
// ignore request and raise the free space alarm.
func (qa *quotaAlarmer) check(ctx context.Context, r interface{}) error {
	if qa.q.Available(r) {
		return nil
	}
	req := &pb.AlarmRequest{
		MemberID: uint64(qa.id),
		Action:   pb.AlarmRequest_ACTIVATE,
		Alarm:    pb.AlarmType_NOSPACE,
	}
	qa.a.Alarm(ctx, req)
	return rpctypes.ErrGRPCNoSpace
}

func (b *BackendQuota) Available(v interface{}) bool {
	// TODO: maybe optimize Backend.Size()
	return b.be.Size()+int64(b.Cost(v)) < b.maxBackendBytes
}

In contrast, a plain etcd DeleteRange is still served, because quotaKVServer does not guard it.

Here is an example:

# etcd logging
backendSize is 84111360, cost is 0, maxBackendBytes is 83886080
./bin/etcdctl txn --interactive
compares:
value("foo") = "bar"

success requests (get, put, del):
del foo

failure requests (get, put, del):

{"level":"warn","ts":"2021-10-20T07:13:38.594Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00000a1e0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = ResourceExhausted desc = etcdserver: mvcc: database space exceeded"}
Error: etcdserver: mvcc: database space exceeded
./bin/etcdctl del foo
1

Does it make sense to let a Txn with zero cost through instead of blocking it?
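
One way to picture the proposal is a quota check that lets zero-cost requests through even when the backend is already over quota. The sketch below is a simplified stand-in, not etcd's code, and not necessarily what the fix in #13435 does: `quota`, `available`, and `availableAllowZeroCost` are hypothetical names, and the sizes are the ones from the log line in the example above.

```go
package main

import "fmt"

// quota is a hypothetical stand-in for etcd's BackendQuota. It holds plain
// numbers instead of a real backend.
type quota struct {
	backendSize     int64 // current backend size in bytes
	maxBackendBytes int64 // configured quota in bytes
}

// available mirrors the check quoted in this issue: a request fits only if
// the current size plus the request's cost stays under the quota.
func (q quota) available(cost int) bool {
	return q.backendSize+int64(cost) < q.maxBackendBytes
}

// availableAllowZeroCost is the assumed variant: a request that adds no
// bytes is never rejected, everything else goes through the usual check.
func (q quota) availableAllowZeroCost(cost int) bool {
	if cost == 0 {
		return true
	}
	return q.available(cost)
}

func main() {
	// Sizes taken from the logging shown in the example.
	q := quota{backendSize: 84111360, maxBackendBytes: 83886080}

	fmt.Println(q.available(0))              // false: the zero-cost Txn is blocked today
	fmt.Println(q.availableAllowZeroCost(0)) // true: the zero-cost Txn would pass
	fmt.Println(q.availableAllowZeroCost(1)) // false: anything that adds bytes is still blocked
}
```

With a check like this, a compare-get-delete Txn could free space while the NOSPACE condition holds, whereas any request that adds bytes would still be rejected.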

@hexfusion hexfusion self-assigned this Oct 20, 2021
hexfusion commented

@chaochn47 thanks for the detailed issue. Looking at this further, your proposal makes sense. costTxn computes the cost in bytes as the larger of the summed Put costs of the success and failure branches. If the Txn contains no Put, the cost is 0. This aligns well with the recent change allowing health checks to exclude alarms [1]. The cluster should be able to resolve a NOSPACE alarm without admin interaction when it is safe to do so. 👍

func costTxn(r *pb.TxnRequest) int {
	sizeSuccess := 0
	for _, u := range r.Success {
		sizeSuccess += costTxnReq(u)
	}
	sizeFailure := 0
	for _, u := range r.Failure {
		sizeFailure += costTxnReq(u)
	}
	if sizeFailure > sizeSuccess {
		return sizeFailure
	}
	return sizeSuccess
}

func costTxnReq(u *pb.RequestOp) int {
	r := u.GetRequestPut()
	if r == nil {
		return 0
	}
	return costPut(r)
}
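
To see why the Txn from the report costs 0, the sketch below applies the same logic to simplified stand-in types. `op`, `txn`, `costOp`, and the byte count used in `costPut` are assumptions made for illustration; the real code works on `pb.TxnRequest` and `pb.RequestOp`, and etcd's actual `costPut` is not shown in this thread.

```go
package main

import "fmt"

// op is a hypothetical stand-in for pb.RequestOp.
type op struct {
	kind  string // "put", "get", or "del"
	key   string
	value string // only meaningful for puts
}

// txn is a hypothetical stand-in for pb.TxnRequest (compares omitted).
type txn struct {
	success []op
	failure []op
}

// costPut is an assumed cost: the key and value bytes only.
func costPut(o op) int {
	return len(o.key) + len(o.value)
}

// costOp plays the role of costTxnReq: only puts contribute to the cost.
func costOp(o op) int {
	if o.kind != "put" {
		return 0
	}
	return costPut(o)
}

// costTxn sums the put costs per branch and returns the larger sum.
func costTxn(t txn) int {
	sizeSuccess := 0
	for _, o := range t.success {
		sizeSuccess += costOp(o)
	}
	sizeFailure := 0
	for _, o := range t.failure {
		sizeFailure += costOp(o)
	}
	if sizeFailure > sizeSuccess {
		return sizeFailure
	}
	return sizeSuccess
}

func main() {
	// The Txn from the report: delete on success, nothing on failure.
	del := txn{success: []op{{kind: "del", key: "foo"}}}
	// A Txn that writes on success and only reads on failure.
	put := txn{
		success: []op{{kind: "put", key: "foo", value: "bar"}},
		failure: []op{{kind: "get", key: "foo"}},
	}

	fmt.Println(costTxn(del)) // 0
	fmt.Println(costTxn(put)) // 6
}
```

Gets and deletes never add to the cost here, so a Txn made up only of those is the case the proposal would let through.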

[1] #12880
