segment writer service #3498

Merged
korniltsev merged 9 commits into main from korniltsev/segmentwriter on Aug 19, 2024
Conversation

korniltsev (Collaborator):

Bring back segment writer service.
Add push protobuf api for segment writer.

The service is still detached: nobody is pushing to it yet.
The service is not as optimized as in the POC.

This will be addressed in follow-ups.

korniltsev requested review from a team as code owners on August 18, 2024 at 19:28
kolesnikovae (Collaborator) left a comment:

LGTM – I've left a few notes, but those are just my thoughts / topics for discussion. Please feel free to ignore them; we'll figure those out along the way.

Comment on lines +43 to +44
f.DurationVar(&cfg.SegmentDuration, prefix+"segment.duration", 500*time.Millisecond, "Timeout when flushing segments to bucket.")
f.BoolVar(&cfg.Async, prefix+"async", false, "Enable async mode for segment writer.")
kolesnikovae (Collaborator):

I think these should be tenant options (limits) rather than global ones
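
To make the suggestion concrete, here is a minimal sketch of how these settings could be resolved per tenant instead of from global flags. The `Limits` interface and method names below are illustrative assumptions, not the existing Pyroscope limits API:

```go
package segmentwriter

import "time"

// Limits is a hypothetical per-tenant overrides interface; the real
// interface and method names in the limits package may differ.
type Limits interface {
    SegmentDuration(tenantID string) time.Duration
    AsyncIngestion(tenantID string) bool
}

// segmentDurationFor picks the per-tenant override when one is set and
// falls back to the global flag default otherwise.
func segmentDurationFor(limits Limits, tenantID string, flagDefault time.Duration) time.Duration {
    if d := limits.SegmentDuration(tenantID); d > 0 {
        return d
    }
    return flagDefault
}
```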

Comment on lines 188 to 191
func ContextWithHeadMetrics(ctx context.Context, reg prometheus.Registerer, prefix string) context.Context {
    return contextWithHeadMetrics(ctx, newHeadMetrics2(reg, prefix))
}

kolesnikovae (Collaborator):

Not for this PR: I saw your attempt to make the dependency on metrics explicit 👍🏻 I really hope we won't pass it via the context
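
For comparison, a rough sketch of the explicit alternative: the head receives its metrics through the constructor rather than through context.Context. The `headMetrics` struct and `NewHead` signature here are stand-ins, not the actual phlaredb API:

```go
package phlaredb

import "github.com/prometheus/client_golang/prometheus"

// headMetrics stands in for the real metrics struct; collectors omitted.
type headMetrics struct{}

func newHeadMetrics(reg prometheus.Registerer, prefix string) *headMetrics {
    // Real code would create and register collectors against reg here.
    return &headMetrics{}
}

// Head takes its metrics as an explicit dependency instead of pulling
// them out of a context value.
type Head struct {
    metrics *headMetrics
}

func NewHead(reg prometheus.Registerer, prefix string) *Head {
    return &Head{metrics: newHeadMetrics(reg, prefix)}
}
```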

pkg/phlaredb/metrics.go (outdated comment, resolved)
Comment on lines 194 to 210
err = pprof.FromBytes(sample.RawProfile, func(p *profilev1.Profile, size int) error {
    if err = segment.ingest(ctx, tenantID, p, id, series.Labels); err != nil {
        reason := validation.ReasonOf(err)
        if reason != validation.Unknown {
            validation.DiscardedProfiles.WithLabelValues(string(reason), tenantID).Add(float64(1))
            validation.DiscardedBytes.WithLabelValues(string(reason), tenantID).Add(float64(size))
            switch validation.ReasonOf(err) {
            case validation.SeriesLimit:
                return connect.NewError(connect.CodeResourceExhausted, err)
            }
        }
    }
    return nil
})
if err != nil {
    return err
}
kolesnikovae (Collaborator):

As we won't have SeriesLimit in segment writer, we can simplify this piece
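
For illustration, the fragment above could shrink to something like the following once the SeriesLimit case is gone: discarded profiles and bytes are still counted, but no error needs to be propagated to the caller. This is a sketch of the same fragment, not final code:

```go
err = pprof.FromBytes(sample.RawProfile, func(p *profilev1.Profile, size int) error {
    if err := segment.ingest(ctx, tenantID, p, id, series.Labels); err != nil {
        // Without a series limit there is nothing to surface to the caller;
        // just account for the discarded data.
        if reason := validation.ReasonOf(err); reason != validation.Unknown {
            validation.DiscardedProfiles.WithLabelValues(string(reason), tenantID).Add(1)
            validation.DiscardedBytes.WithLabelValues(string(reason), tenantID).Add(float64(size))
        }
    }
    return nil
})
if err != nil {
    return err
}
```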

Comment on lines 177 to 179
i.segmentWriter.metrics.segmentFlushTimeouts.WithLabelValues(tenantID).Inc()
i.segmentWriter.metrics.segmentFlushWaitDuration.WithLabelValues(tenantID).Observe(time.Since(t1).Seconds())
level.Error(i.logger).Log("msg", "flush timeout", "err", err)
kolesnikovae (Collaborator):

We assume that the error indicates a timeout. We probably want to check the error type here (or context.Err())
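
A sketch of the kind of check meant here, building on the snippet above and assuming the flush wait surfaces the context error on timeout, so that only genuine deadline expirations are counted as flush timeouts (requires the standard errors, context, and time packages):

```go
if errors.Is(err, context.DeadlineExceeded) || errors.Is(ctx.Err(), context.DeadlineExceeded) {
    i.segmentWriter.metrics.segmentFlushTimeouts.WithLabelValues(tenantID).Inc()
    i.segmentWriter.metrics.segmentFlushWaitDuration.WithLabelValues(tenantID).Observe(time.Since(t1).Seconds())
    level.Error(i.logger).Log("msg", "flush timeout", "err", err)
} else {
    // Not a timeout: report the failure without touching the timeout metrics.
    level.Error(i.logger).Log("msg", "flush failed", "err", err)
}
```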

Comment on lines 160 to 170
var waits = make(map[segmentWaitFlushed]struct{}, len(req.Msg.Series))
for _, series := range req.Msg.Series {
    var shard = shardKey(series.Shard)
    wait, err := i.segmentWriter.ingest(shard, func(segment segmentIngest) error {
        return i.ingestToSegment(ctx, segment, series, tenantID)
    })
    if err != nil {
        return nil, err
    }
    waits[wait] = struct{}{}
}
kolesnikovae (Collaborator):

NB: If we moved the pprof split from distributors to segment writers and restricted requests to a single profile, we would not need to wait for multiple segments to flush (which may result in up to 2 * segment_duration latency)
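
To illustrate, with a single profile per request the handler above would touch exactly one shard and wait on a single flush. In this sketch, `req.Msg.Series` is assumed to become a single series rather than a repeated field, and `waitFlushed` is an assumption about the wait handle's interface, not the actual method name:

```go
// One profile per request: one shard, one segment flush to wait for.
wait, err := i.segmentWriter.ingest(shardKey(req.Msg.Series.Shard), func(segment segmentIngest) error {
    return i.ingestToSegment(ctx, segment, req.Msg.Series, tenantID)
})
if err != nil {
    return nil, err
}
// waitFlushed stands in for however segmentWaitFlushed is awaited.
if err := wait.waitFlushed(ctx); err != nil {
    return nil, err
}
```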

Comment on lines 14 to 17
message PushRequest {
  // series is a set of raw pprof profiles and accompanying labels
  repeated RawProfileSeries series = 1;
}
kolesnikovae (Collaborator):

We discussed this internally at some point, and I recall the consensus was that batching does not benefit us here. On the contrary, it introduces several issues:

  1. Callers have to wait for all the affected shard segment writers to flush, which badly impacts latency and may also impact resource usage on the distributor side.
  2. It complicates error handling. I'm not 100% sure that partial success is handled properly.
  3. It complicates retries on the distributor end.

I hope we'll amend the API and implementation accordingly in follow-up PRs.

korniltsev (Collaborator, Author):

I removed repeated series, but kept repeated samples

korniltsev merged commit 987f743 into main on Aug 19, 2024
18 checks passed
korniltsev deleted the korniltsev/segmentwriter branch on August 19, 2024 at 08:35