
kafka replay speed: add bytes limit for inflight fetch requests #9892

Merged

Conversation

dimitarvdimitrov (Contributor) opened this pull request:

Problem

OOMs happen because the product of concurrency and records-per-fetch is too large: we hold too many records in memory and don't consume them as fast as we fetch them. The goal is to have a setting that controls the memory consumption of the process. In the future it could be scaled with the ingester's memory request, for example.

Proposal

Instead of configuring records-per-fetch, we should configure the maximum memory for all fetchers. We can work backwards to derive the number of records per fetchWant from the number of concurrent fetchers.

I propose to remove these flags

-ingest-storage.kafka.ongoing-records-per-fetch=30
-ingest-storage.kafka.startup-records-per-fetch=512

and replace them with a single flag

-ingest-storage.kafka.max-buffered-bytes=100000000 # 100MB

Each fetchWant still needs a well-defined startOffset and endOffset. The number of records in each fetchWant will be ($max_inflight_bytes / $startup_concurrency) / $bytes_per_record. We already track the average bytes per record over the last few fetches, but that value can be volatile. Maybe we can also invest in better tracking of the average record size.
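
As a rough sketch (illustrative Go; maxInflightBytes, startupConcurrency, and avgBytesPerRecord are stand-in names, not actual config fields), the per-fetchWant record count could be derived like this:

package main

import "fmt"

// recordsPerFetchWant derives how many records a single fetchWant should request,
// given a global inflight-bytes budget, the number of concurrent fetchers, and the
// observed average record size. All names here are illustrative.
func recordsPerFetchWant(maxInflightBytes, startupConcurrency, avgBytesPerRecord int64) int64 {
	if startupConcurrency <= 0 || avgBytesPerRecord <= 0 {
		return 1
	}
	records := (maxInflightBytes / startupConcurrency) / avgBytesPerRecord
	if records < 1 {
		records = 1
	}
	return records
}

func main() {
	// 100MB budget, 10 concurrent fetchers, ~2KB average records => 5000 records per fetchWant.
	fmt.Println(recordsPerFetchWant(100_000_000, 10, 2_000))
}

A smoothed estimate (for example an exponentially weighted moving average) of bytes per record would make this sizing less volatile than relying only on the last few fetches.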

What this PR does

This PR is the first step towards the above: adding a limit on the inflight bytes.

-ingest-storage.kafka.max-buffered-bytes

The limit works by controlling the fetchWants: we don't dispatch a fetchWant if its MaxBytes would push us over the limit. The idea is that the MaxBytes of a request is usually a good estimate of how much data we will actually fetch.

One caveat: if that estimate is bad, we'd keep dispatching and still risk OOMing.
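
For illustration only, the gating could look roughly like the sketch below. fetchWant, MaxBytes, and the limiter around them are simplified stand-ins for the fetcher internals, not the code in this PR.

package ingestsketch

// fetchWant describes a range of offsets to fetch; MaxBytes is the estimated
// size of the response. The names mirror the PR description but are simplified.
type fetchWant struct {
	startOffset, endOffset int64
	MaxBytes               int64
}

// inflightLimiter tracks how many estimated bytes are currently in flight,
// bounded by -ingest-storage.kafka.max-buffered-bytes.
type inflightLimiter struct {
	limit    int64
	inflight int64
}

// tryDispatch reports whether w fits in the remaining budget and, if so,
// reserves its MaxBytes. A false return means: wait until earlier fetches
// have been consumed before dispatching this fetchWant.
func (l *inflightLimiter) tryDispatch(w fetchWant) bool {
	if l.inflight+w.MaxBytes > l.limit {
		return false
	}
	l.inflight += w.MaxBytes
	return true
}

// release returns the budget once the fetched records have been consumed.
func (l *inflightLimiter) release(w fetchWant) {
	l.inflight -= w.MaxBytes
}

If the data actually fetched diverges from the MaxBytes estimate, the reserved budget diverges too, which is the caveat mentioned above.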

I didn't add two separate limits for startup and ongoing because we're currently setting them both to the same value and the short/medium-term plan is to only have a single config option.

Next steps

The next step would be to remove the records per fetch config options altogether and only rely on max bytes.

Note to reviewers

This is based on #9891 because that PR fixes flaky tests. The changes aren't otherwise dependent.

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@dimitarvdimitrov dimitarvdimitrov requested review from a team and tacole02 as code owners November 13, 2024 17:16
@tacole02 (Contributor) left a comment:

Thanks so much, @dimitarvdimitrov!

Resolved review threads: pkg/storage/ingest/config.go (outdated), pkg/storage/ingest/reader.go, and pkg/storage/ingest/fetcher.go (five threads, all outdated).
@@ -1043,6 +1084,12 @@ func newReaderMetrics(partitionID int32, reg prometheus.Registerer, bufferedReco
		lastConsumedOffset: lastConsumedOffset,
		kprom:              NewKafkaReaderClientMetrics(component, reg),
	}

	m.Service = services.NewTimerService(100*time.Millisecond, nil, func(context.Context) error {
A collaborator commented:

I can't see anyone starting/stopping this service.

dimitarvdimitrov (Contributor, Author) replied:

It's started here:

r.dependencies, err = services.NewManager(r.committer, r.offsetReader, r.consumedOffsetWatcher, startOffsetReader, r.metrics)
if err != nil {
	return errors.Wrap(err, "creating service manager")
}
// Use context.Background() because we want to stop all dependencies when the PartitionReader stops
// instead of stopping them when ctx is cancelled and while the PartitionReader is still running.
err = services.StartManagerAndAwaitHealthy(context.Background(), r.dependencies)
if err != nil {
	return errors.Wrap(err, "starting service manager")
}

and then stopped here:

func (r *PartitionReader) stopDependencies() error {
	if r.dependencies != nil {
		if err := services.StopManagerAndAwaitStopped(context.Background(), r.dependencies); err != nil {
			return errors.Wrap(err, "stopping service manager")
		}
	}

@pracucci (Collaborator) left a comment:

Nice work and nice tests! Pre-approving, but please remember to start/stop the reader metrics service.

	nextFetch = nextFetch.Next(recordsPerFetch)

case result, moreLeft := <-refillBufferedResult:
	if !moreLeft {
		if pendingResults.Len() > 0 {
A collaborator commented:

We've lost this if check (removeNextResult assumes the list is non-empty). Is it a problem?

To my understanding it's not a problem because, with the new logic, we have the guarantee that if refillBufferedResult is set, there's at least 1 item in the list (the fetch we're currently reading from, which is what refillBufferedResult is set to). Is my understanding correct?

dimitarvdimitrov (Contributor, Author) replied:

That's right. Now, instead of keeping nextResult as state, we compute it on every iteration. The invariants from before still hold.
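
A minimal, illustrative sketch of the "recompute the head every iteration" pattern is below, assuming a pendingResults list and a per-fetch records channel; it is not the actual fetcher code. The receive case is disabled via a nil channel whenever the list is empty, which is what makes the non-empty invariant hold.

package ingestsketch

import (
	"container/list"
	"context"
)

// fetchResult is a hypothetical stand-in for a buffered fetch whose records are
// delivered on a channel.
type fetchResult struct {
	records chan []byte
}

// consumeLoop recomputes the head of the pending list on every iteration. Receiving
// from a nil channel blocks forever, so the refill case can only fire when the list
// is non-empty.
func consumeLoop(ctx context.Context, pending *list.List, consume func([]byte)) {
	for {
		var refill chan []byte // nil => the refill case below is disabled
		if pending.Len() > 0 {
			refill = pending.Front().Value.(*fetchResult).records
		}
		select {
		case recs, moreLeft := <-refill:
			consume(recs)
			if !moreLeft {
				pending.Remove(pending.Front()) // safe: refill was non-nil, so the list is non-empty
			}
		case <-ctx.Done():
			return
		}
	}
}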

Further resolved review threads: pkg/storage/ingest/fetcher.go and pkg/storage/ingest/fetcher_test.go (two threads, one outdated).
@dimitarvdimitrov dimitarvdimitrov changed the base branch from dimitar/ingest/replay-speed/avoid-double-fetching to main November 14, 2024 14:52
dimitarvdimitrov and others added 18 commits November 14, 2024 16:52
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

Conflicts:
	pkg/storage/ingest/fetcher.go
	pkg/storage/ingest/fetcher_test.go
	pkg/storage/ingest/reader.go

Co-authored-by: Marco Pracucci <marco@pracucci.com>
@dimitarvdimitrov dimitarvdimitrov force-pushed the dimitar/ingest/replay-speed/inflight-fetch-bytes-limit branch from 07ed98b to 65b3769 on November 14, 2024 14:53
@dimitarvdimitrov dimitarvdimitrov enabled auto-merge (squash) November 14, 2024 14:53
@dimitarvdimitrov dimitarvdimitrov merged commit cacf8c5 into main Nov 14, 2024
30 checks passed
@dimitarvdimitrov dimitarvdimitrov deleted the dimitar/ingest/replay-speed/inflight-fetch-bytes-limit branch November 14, 2024 16:09