Perf observability: Encoder data throughput and blob size breakdown #716
Conversation
disperser/encoder/metrics.go
Outdated
@@ -42,7 +42,7 @@ func NewMetrics(httpPort string, logger logging.Logger) *Metrics {
 			Name: "request_total",
 			Help: "the number and size of total encode blob request at server side per state",
 		},
-		[]string{"state"}, // state is either success, ratelimited, canceled, or failure
+		[]string{"type", "state"}, // type is either number or size; state is either success, ratelimited, canceled, or failure
Can we have a separate metric instead of using a label?
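The trade-off being discussed can be sketched as follows. This is a hypothetical, stdlib-only stand-in for the Prometheus CounterVec the real metrics.go uses; `labeledCounter` and the flattened label strings are illustrative only:

```go
package main

import "fmt"

// labeledCounter is a hypothetical stand-in for a Prometheus CounterVec,
// keyed by a flattened label string. The real code uses the Prometheus
// client library instead.
type labeledCounter map[string]float64

func (c labeledCounter) inc(labels string, v float64) { c[labels] += v }

func main() {
	// Option A (this diff): one metric; a "type" label distinguishes the
	// request count from the request bytes. The two series share a metric
	// name but carry different units.
	requestTotal := labeledCounter{}
	requestTotal.inc("type=number,state=success", 1)
	requestTotal.inc("type=size,state=success", 4096)

	// Option B (reviewer's suggestion): two separate metrics, each with a
	// single well-defined unit, keeping only the "state" label.
	requestCount := labeledCounter{}
	requestBytes := labeledCounter{}
	requestCount.inc("state=success", 1)
	requestBytes.inc("state=success", 4096)

	fmt.Println(requestTotal["type=size,state=success"]) // 4096
	fmt.Println(requestBytes["state=success"])           // 4096
}
```

Option B is the more conventional Prometheus layout, since a single metric is expected to have one unit across all of its label combinations.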
disperser/encoder/server.go
Outdated
@@ -126,7 +126,7 @@ func (s *Server) handleEncoding(ctx context.Context, req *pb.EncodeBlobRequest)
 	}

 	totalTime := time.Since(begin)
-	s.metrics.TakeLatency(encodingTime, totalTime)
+	s.metrics.TakeLatency(len(req.GetData()), encodingTime, totalTime)
Since encoding time actually depends on the size of the encoded blob, which is a function of the input blob size and the current stake distribution, should we measure against the coded size instead?
i.e. ChunkLength * NumChunks @dmanc
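A minimal sketch of the quantity being proposed. `codedSizeSymbols` is a hypothetical helper name, not code from this PR; the point is that the same input blob size can map to different coded sizes under different stake distributions:

```go
package main

import "fmt"

// codedSizeSymbols is a hypothetical helper for the size of the encoded
// blob, i.e. ChunkLength * NumChunks. NumChunks depends on the current
// stake distribution, so the coded size -- and hence the encoding time --
// can change even when the input blob size does not.
func codedSizeSymbols(chunkLength, numChunks int) int {
	return chunkLength * numChunks
}

func main() {
	// Same chunk length, two stake distributions: a more fragmented
	// distribution yields more chunks and a larger coded size.
	fmt.Println(codedSizeSymbols(512, 200))  // 102400
	fmt.Println(codedSizeSymbols(512, 2000)) // 1024000
}
```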
Why not add ChunkLength and NumChunks as labels too?
We are not trying to add every factor that affects latency as a metric label.
Also, for a given operator set, every blob is encoded under the same conditions.
Blobs have the same conditions only at the same reference block number; the conditions can change with the stake distribution. I guess the real question is what we want to get out of this metric. My understanding is that we want to know the encoding speed. Here is my concern: suppose under one stake distribution the min stake is 0.01, and under another it is 0.001. Although both blobs are the same size, the proof generation time would differ, because the encoded sizes are different. We might misinterpret that as performance degradation, when in reality it is just a change in the stake distribution.
Ideally we want to track a variable from the user's perspective, like "I have a blob of size X, and it takes Y ms to process".
I'm fine with dropping this label if blobSize is a poor variable to monitor.
Actually I should have kept it: we are not using blobSize to determine latency here, and the blobSize label will help detect significant changes in the stake distribution.
E.g. if the same blob size starts showing a much larger encoding latency, we will know there is a fundamental change in the stake distribution.
Without this label we cannot tell whether a latency change is due to a stake distribution change or something else, since all blob sizes, which can have very different latencies, would be mixed together.
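One way to keep such a label usable is to bucket the raw byte count, so the label cardinality stays bounded while still separating blobs of very different sizes. This is a hedged sketch; `blobSizeBucket` and its thresholds are illustrative, not the PR's actual implementation:

```go
package main

import "fmt"

// blobSizeBucket maps a raw blob size in bytes to a coarse label value.
// Bucketing keeps the label cardinality bounded while still separating
// latencies of blobs with very different sizes. The thresholds here are
// illustrative only.
func blobSizeBucket(n int) string {
	switch {
	case n <= 32*1024:
		return "<=32KiB"
	case n <= 256*1024:
		return "<=256KiB"
	case n <= 1024*1024:
		return "<=1MiB"
	default:
		return ">1MiB"
	}
}

func main() {
	fmt.Println(blobSizeBucket(4 * 1024))        // <=32KiB
	fmt.Println(blobSizeBucket(2 * 1024 * 1024)) // >1MiB
}
```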