Perf observability: Encoder data throughput and blob size breakdown #716
Conversation
disperser/encoder/metrics.go
Outdated
@@ -42,7 +42,7 @@ func NewMetrics(httpPort string, logger logging.Logger) *Metrics {
 			Name: "request_total",
 			Help: "the number and size of total encode blob request at server side per state",
 		},
-		[]string{"state"}, // state is either success, ratelimited, canceled, or failure
+		[]string{"type", "state"}, // type is either number or size; state is either success, ratelimited, canceled, or failure
Can we have a separate metric instead of using a label?
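The trade-off being discussed can be sketched as follows. This is a hypothetical, stdlib-only stand-in for the Prometheus CounterVec the real metrics.go uses; `labeledCounter` and the flattened label strings are illustrative only:

```go
package main

import "fmt"

// labeledCounter is a hypothetical stand-in for a Prometheus CounterVec,
// keyed by a flattened label string. The real code uses the Prometheus
// client library instead.
type labeledCounter map[string]float64

func (c labeledCounter) inc(labels string, v float64) { c[labels] += v }

func main() {
	// Option A (this diff): one metric; a "type" label distinguishes the
	// request count from the request bytes. The two series share a metric
	// name but carry different units.
	requestTotal := labeledCounter{}
	requestTotal.inc("type=number,state=success", 1)
	requestTotal.inc("type=size,state=success", 4096)

	// Option B (reviewer's suggestion): two separate metrics, each with a
	// single well-defined unit, keeping only the "state" label.
	requestCount := labeledCounter{}
	requestBytes := labeledCounter{}
	requestCount.inc("state=success", 1)
	requestBytes.inc("state=success", 4096)

	fmt.Println(requestTotal["type=size,state=success"]) // 4096
	fmt.Println(requestBytes["state=success"])           // 4096
}
```

Option B is the more conventional Prometheus layout, since a single metric is expected to have one unit across all of its label combinations.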
disperser/encoder/server.go
Outdated
@@ -126,7 +126,7 @@ func (s *Server) handleEncoding(ctx context.Context, req *pb.EncodeBlobRequest)
 	}

 	totalTime := time.Since(begin)
-	s.metrics.TakeLatency(encodingTime, totalTime)
+	s.metrics.TakeLatency(len(req.GetData()), encodingTime, totalTime)
Since encoding time actually depends on the size of the encoded blob, which is a function of the input blob size and the current stake distribution, should we measure against the coded size instead?
i.e. ChunkLength * NumChunks @dmanc
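A minimal sketch of the quantity being proposed. `codedSizeSymbols` is a hypothetical helper name, not code from this PR; the point is that the same input blob size can map to different coded sizes under different stake distributions:

```go
package main

import "fmt"

// codedSizeSymbols is a hypothetical helper for the size of the encoded
// blob, i.e. ChunkLength * NumChunks. NumChunks depends on the current
// stake distribution, so the coded size -- and hence the encoding time --
// can change even when the input blob size does not.
func codedSizeSymbols(chunkLength, numChunks int) int {
	return chunkLength * numChunks
}

func main() {
	// Same chunk length, two stake distributions: a more fragmented
	// distribution yields more chunks and a larger coded size.
	fmt.Println(codedSizeSymbols(512, 200))  // 102400
	fmt.Println(codedSizeSymbols(512, 2000)) // 1024000
}
```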
Why not add ChunkLength and NumChunks as labels too?
We are not trying to add every factor that affects latency as a metric label.
Also, for a given operator set, every blob is encoded under the same conditions.
Blobs have the same conditions only at the same reference block number; the conditions can change with the stake distribution. I guess the real question is what we want to get out of this metric. My understanding is that we want to know the encoding speed. Here is my concern: suppose under one stake distribution the min stake is 0.01, and under another it is 0.001. Although both blobs are the same size, the proof generation time would differ, because the encoded sizes are different. We might misinterpret that as performance degradation, when in reality it is just a change in the stake distribution.
Ideally we want to track a variable from the user's perspective, like "I have a blob of size X, and it takes Y ms to process".
I'm fine with dropping this label if blobSize is a poor variable to monitor.
Actually I should have kept it: we are not using blobSize to determine latency here, and the blobSize label will help detect significant changes in the stake distribution.
E.g. if the same blob size starts showing a much larger encoding latency, we will know there is a fundamental change in the stake distribution.
Without this label we cannot tell whether a latency change is due to a stake distribution change or something else, since all blob sizes, which can have very different latencies, would be mixed together.
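One way to keep such a label usable is to bucket the raw byte count, so the label cardinality stays bounded while still separating blobs of very different sizes. This is a hedged sketch; `blobSizeBucket` and its thresholds are illustrative, not the PR's actual implementation:

```go
package main

import "fmt"

// blobSizeBucket maps a raw blob size in bytes to a coarse label value.
// Bucketing keeps the label cardinality bounded while still separating
// latencies of blobs with very different sizes. The thresholds here are
// illustrative only.
func blobSizeBucket(n int) string {
	switch {
	case n <= 32*1024:
		return "<=32KiB"
	case n <= 256*1024:
		return "<=256KiB"
	case n <= 1024*1024:
		return "<=1MiB"
	default:
		return ">1MiB"
	}
}

func main() {
	fmt.Println(blobSizeBucket(4 * 1024))        // <=32KiB
	fmt.Println(blobSizeBucket(2 * 1024 * 1024)) // >1MiB
}
```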