-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
etcdserver: Add response byte size and range response count to took too long warning #9826
Conversation
@@ -114,7 +114,7 @@ func (a *applierV2store) Sync(r *RequestV2) Response { | |||
// applyV2Request interprets r as a call to v2store.X | |||
// and returns a Response interpreted from v2store.Event | |||
func (s *EtcdServer) applyV2Request(r *RequestV2) Response { | |||
defer warnOfExpensiveRequest(s.getLogger(), time.Now(), r) | |||
defer warnOfExpensiveRequest(s.getLogger(), time.Now(), r, nil, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is v3 only acceptable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*RequestV2
also satisfies proto.Message
interface, but I don't mind just passing nil for v2 API (since this is a new feature, no need to worry about v2 behavior).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks!
Marking as WIP until we figure out CI failures. |
Reviewing a problem with
It appears that somehow |
ref. golang/protobuf#631 Also, later we can replace https://github.com/coreos/etcd/blob/61ef343166eabab25deebf4e83b6e647aa4402fe/etcdserver/quota.go#L126-L137 with |
I've put a workaround for the Integration tests are now passing. I'll wait for CI. |
etcdserver/util.go
Outdated
if r := recover(); r != nil { | ||
size = "proto_size_failure" | ||
} | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this work?
import "github.com/golang/protobuf/proto"
ms, ok := msg.(proto.Marshaler)
if ms == nil || !ok {
return "proto_size_failure"
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately no, I just tried it and got the same error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Then, let's use recover
for now, and revert it back to upstream proto.Size
later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Too bad we can't have a clean PR for this improvement. I'll track the upstream issue and propose a PR if needed.
Just waiting on a clean semaphoreci run now. |
@jpbetz I noticed that In // safeSize attempt to calculate size using proto.Size, but recovers from panics and returns "proto_size_failure" in those cases.
// TODO: Remove once https://github.com/golang/protobuf/issues/631 is resolved.
func safeSize(msg proto.Message) (size string) {
+ fmt.Println("safeSize:", msg == nil, reflect.ValueOf(msg).IsNil()) This will print out So, it will panic in func (m *CompactionResponse) Marshal() (dAtA []byte, err error) {
size := m.Size() |
@gyuho Thanks for hunting this down. I'd been warned before of this gotcha with interfaces! Fixing now. |
Very useful information. LGTM. |
adf989f
to
df47105
Compare
|
etcdserver/util.go
Outdated
func warnOfExpensiveRequest(lg *zap.Logger, now time.Time, reqStringer fmt.Stringer, respMsg proto.Message, err error) { | ||
var resp string | ||
if !isNil(respMsg) { | ||
resp = fmt.Sprintf("size:%s", proto.Size(respMsg)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
%d
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thanks. The safeSize function I used before returned a string. Fixing.
etcdserver/util.go
Outdated
func warnOfExpensiveReadOnlyRangeRequest(lg *zap.Logger, now time.Time, reqStringer fmt.Stringer, rangeResponse *pb.RangeResponse, err error) { | ||
var resp string | ||
if !isNil(rangeResponse) { | ||
resp = fmt.Sprintf("range_response_count:%d size:%s", len(rangeResponse.Kvs), proto.Size(rangeResponse)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"range_response_count:%d size:%d"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
etcdserver/util.go
Outdated
// only range responses should be in a read only txn request | ||
} | ||
} | ||
resp = fmt.Sprintf("responses:<%s> size:%s", strings.Join(resps, " "), proto.Size(txnResponse)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"responses:<%s> size:%d"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks!
Large range reads can result in "took too long" warnings. This adds range response counts as well as total proto response message size (in bytes) to the warnings to help debug situations where knowing the volume of data returned can help diagnose an issue.
put:
txn:
cc @gyuho @wenjiaswe @mml