
Report minimum storage threshold targets #10710

Merged
merged 11 commits from space-management into redpanda-data:dev on May 26, 2023

Conversation

@dotnwat dotnwat (Member) commented May 12, 2023

The disk space management policy needs to be able to juggle space between the disk cache and log storage. To do this, some thresholds will be useful, such as the minimum amount of space that log storage needs for normal functionality, and the minimum amount it needs to meet basic policies like data retention goals.

This PR adds reporting for the following targets (see the sketch below the note):

  • minimum needed capacity
  • minimum capacity needed for retention.bytes policies
  • minimum capacity needed for retention.ms policies

(Note: I realize retention.{bytes,ms} refers to something different than local retention. In this PR we use the normalized version, which for cloud storage will be local retention; I believe I've been using this terminology rather loosely.)
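
For reference, a minimal sketch of what these reported targets could look like as a data structure. This is an editor's illustration only: the field names are hypothetical, and the actual types under src/v/storage may differ.

```cpp
#include <cstddef>

// hypothetical sketch; real type and field names in src/v/storage may differ
struct storage_space_targets {
    size_t min_capacity{0};       // bare-minimum capacity log storage needs
    size_t min_capacity_bytes{0}; // capacity needed for retention.bytes policies
    size_t min_capacity_ms{0};    // capacity needed for retention.ms policies
};
```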

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

  • none

@dotnwat dotnwat requested review from jcsp, andrwng and VladLazar May 12, 2023 06:20
@dotnwat dotnwat requested a review from abhijat May 12, 2023 06:20
@dotnwat dotnwat marked this pull request as draft May 17, 2023 03:10
@dotnwat dotnwat force-pushed the space-management branch 4 times, most recently from 223645c to 91f59f8 on May 19, 2023 06:51
@dotnwat dotnwat marked this pull request as ready for review May 19, 2023 06:51
src/v/storage/disk_log_impl.cc (outdated, resolved)
src/v/storage/disk_log_impl.cc (resolved)
tests/rptest/tests/full_disk_test.py (outdated, resolved)
src/v/storage/disk_log_impl.cc (resolved)
Comment on lines +1950 to +1955
* otherwise, we fall through and evaluate the space wanted metric using
* any configured retention policies _without_ overriding based on cloud
* retention settings. the assumption here is that retention and not
* compaction is what will limit on disk space. this heuristic should be
* updated if we decide to also make "nice to have" estimates based on
* expected compaction ratios.
*/
Contributor:

I'm not following why we're not applying the cloud overrides here. Aren't they honored when applying retention, even on a compacted topic? (i could be wrong, would appreciate a pointer)

dotnwat (Member, Author):

Let me explain, and if it makes sense to you, I'll update this comment to clarify the case in the code:

Compacted topics aren't subject to local retention (they always remain whole on local storage), so the only way that retention comes into play for a compacted topic is if the policy is compact,delete. However, the override_retention_config helper doesn't take compaction into account. So if we were to apply the cloud overrides here for a compacted topic, then the retention policy we used in calculating "nice to have" wouldn't reflect what would actually happen when housekeeping ran.

Contributor:

I think I understand: it wasn't clicking that retention settings applied to a compact,delete topic followed the overarching retention settings rather than the local settings for tiered storage. If that's the case, I think this policy makes sense.

[](storage::usage acc, storage::usage u) { return acc + u; });

// extrapolate out for the missing period of time in the retention period
auto missing_bytes = (usage.total() * missing.value()) / duration.value();
Contributor:

I think this is saying "I expect to see the same throughput that ingested the latest segments for the entire remainder of the retention period". This seems reasonable, but it makes me wonder about bursty workloads, where a large number of records are written in the span of a few dozen seconds and then stop. In such a case, we could end up with a small duration and a large missing, and end up with a very large value for missing_bytes.

Maybe it makes sense to compute the expected throughput with now() - start_timestamp instead of end_timestamp - start_timestamp, though given these are user-input timestamps, it's probably best to avoid using now().

I suppose that regardless, it might be better to overestimate the wanted bytes anyway, given the goal of avoiding out-of-space issues.
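
To make the burst concern concrete with made-up numbers (not from the PR): suppose a 10 GiB burst is the only data in a 30 second observation window and retention.ms is 7 days (604800 s). The formula above extrapolates

$$\text{missing\_bytes} = \frac{10\ \text{GiB} \times (604800\ \text{s} - 30\ \text{s})}{30\ \text{s}} \approx 197\ \text{TiB},$$

far more than a bursty workload would actually produce over the retention period.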

dotnwat (Member, Author):

I attempted to capture this issue with a heuristic like not making any estimate unless we have, say, 10 seconds' worth of data. But I'm afraid this is just going to be a game of whack-a-mole. Perhaps we would do well to add a cap here too, such as double or triple the amount of data currently on disk. This may work well since the goal isn't necessarily to observe the system once and know how much space we need; rather, this monitoring will run periodically. So capping would let the estimate continue to reflect the growth requirement while also smoothing out huge bursts?
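
A rough sketch of that capping idea follows. The function and parameter names are hypothetical and the 3x multiplier is an arbitrary placeholder; this is not what the PR implements.

```cpp
#include <algorithm>
#include <cstdint>

// hypothetical sketch of a capped extrapolation, not the PR's implementation
uint64_t capped_missing_bytes(
  uint64_t observed_bytes, // usage.total() over the observed window
  uint64_t observed_ms,    // length of the observed window
  uint64_t missing_ms) {   // remainder of the retention period
    if (observed_ms == 0) {
        return 0; // nothing observed yet, so make no estimate
    }
    const uint64_t raw = (observed_bytes * missing_ms) / observed_ms;
    // cap at a small multiple of what is already on disk so that a short
    // burst cannot blow up the estimate; since this runs periodically, the
    // estimate can still grow if the higher rate is sustained
    return std::min(raw, 3 * observed_bytes);
}
```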

Contributor:

Yeah, picking something that's good in all cases is hard. I'd lean harder away from this if this significantly impacted workloads (e.g. if we were to base rejection of produce messages on this). But given the ultimate goal here is to just apply retention more aggressively, this is probably fine.

Might be worth calling out in a header comment that this is an estimate and is subject to being way off, so don't use it too aggressively (like rejecting writes based on it)

@dotnwat dotnwat force-pushed the space-management branch from 9fac2ea to 11421ce on May 23, 2023 20:24
@dotnwat dotnwat commented May 23, 2023

Force-pushed to resolve merge conflicts in admin_server.

@dotnwat dotnwat requested review from abhijat and andrwng May 23, 2023 20:24
@dotnwat dotnwat force-pushed the space-management branch from 11421ce to f160442 on May 25, 2023 04:42
@dotnwat dotnwat commented May 25, 2023

Force-push:

  1. Added some extra trace logging
  2. Updated the test to be more stable. Before, I was using fixed produces with sleeps to try to achieve predictable throughput, but this was very noisy. Now the test asks the kafka producer tool to write at a specific rate.

@dotnwat dotnwat force-pushed the space-management branch from f160442 to 53d8ddb on May 25, 2023 04:53
@dotnwat dotnwat commented May 25, 2023

Force-push: fix Python formatting

@dotnwat dotnwat commented May 25, 2023

ping @andrwng when you get a chance

@dotnwat dotnwat force-pushed the space-management branch from 53d8ddb to bf8dd1d on May 25, 2023 17:30
@dotnwat dotnwat commented May 25, 2023

Force-push: increased the fuzz threshold on release builds; it seems we write a bit faster than on debug builds, which I was using for calibration.

dotnwat added 5 commits May 25, 2023 10:47
The state computed and reported by the public disk_usage function is
going to grow, and so split this function up so that it doesn't become
too unwieldy.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
When collecting a storage report, we add a new section that describes target sizes with different meanings. This commit adds a minimum capacity target, which is the minimum amount of capacity that log storage needs in order to function at a bare minimum.

The minimum is computed as the max segment size (potentially different
for each partition) * the minimum reserved number of segments
(configurable) * the total number of partitions.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
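
Read literally, the computation described in the commit message above looks roughly like the following sketch. The names are hypothetical, and the real implementation may aggregate differently (for example, multiplying a single max segment size by the partition count rather than summing per-partition sizes).

```cpp
#include <cstdint>
#include <vector>

// hypothetical sketch of the minimum-capacity target described above
uint64_t min_capacity_target(
  const std::vector<uint64_t>& partition_max_segment_sizes,
  uint64_t min_reserved_segments) { // configurable reservation per partition
    uint64_t total = 0;
    for (auto segment_size : partition_max_segment_sizes) {
        // each partition needs room for at least the reserved number of
        // segments at its (potentially topic-specific) max segment size
        total += segment_size * min_reserved_segments;
    }
    return total;
}
```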
Useful for making sure we aren't missing any call sites--a possible
scenario when using the {.x=, .y=} form of construction.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Checks cases with multiple partitions and topics, and with settings for both the minimum reserved number of segments and the topic segment size.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
It is useful to examine retention configurations without cloud-storage-related overrides when estimating the amount of disk capacity needed. Used in later commits.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
dotnwat added 6 commits May 25, 2023 10:47
Reports how much capacity a partition (or all of raft) would like to have. This commit only considers size-based retention requirements. Time-based retention will be added in a later commit.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Examines data recently written to a partition and tries to extrapolate how much capacity will be needed based on the apparent rate of data production.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
@dotnwat dotnwat force-pushed the space-management branch from bf8dd1d to b240a85 on May 25, 2023 19:04
@dotnwat dotnwat commented May 25, 2023

Force-push: fix conflicts with clang-format16 changes

@andrwng andrwng (Contributor) left a comment

Thanks for the ping, and for clarifying. I think this looks like a good starting point as far as heuristics go. Remaining comments/clarifications aren't blocking.

@dotnwat dotnwat merged commit c8210e8 into redpanda-data:dev May 26, 2023