
Report minimum storage threshold targets #10710

Merged
merged 11 commits from space-management into redpanda-data:dev on May 26, 2023

Conversation

@dotnwat dotnwat (Member) commented May 12, 2023

The disk space management policy needs to be able to juggle space between the disk cache and log storage. To do this, some thresholds will be useful, such as the minimum amount of space that log storage needs for normal functionality, and the minimum amount it needs to meet basic policies like data retention goals.

This PR adds reporting for the following targets (see the sketch below the note):

  • minimum needed capacity
  • minimum capacity needed for retention.bytes policies
  • minimum capacity needed for retention.ms policies

(Note: I realize retention.{bytes,ms} refers to something different than local retention. In this PR we use the normalized version, which for cloud storage will be local retention; I believe I've been using this terminology rather loosely.)
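
For reference, a minimal sketch of what these reported targets could look like as a data structure. This is an editor's illustration only: the field names are hypothetical, and the actual types under src/v/storage may differ.

```cpp
#include <cstddef>

// hypothetical sketch; real type and field names in src/v/storage may differ
struct storage_space_targets {
    size_t min_capacity{0};       // bare-minimum capacity log storage needs
    size_t min_capacity_bytes{0}; // capacity needed for retention.bytes policies
    size_t min_capacity_ms{0};    // capacity needed for retention.ms policies
};
```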

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

  • none

@dotnwat dotnwat requested review from jcsp, andrwng and VladLazar May 12, 2023 06:20
@dotnwat dotnwat requested a review from abhijat May 12, 2023 06:20
@dotnwat dotnwat marked this pull request as draft May 17, 2023 03:10
@dotnwat dotnwat force-pushed the space-management branch 4 times, most recently from 223645c to 91f59f8 on May 19, 2023 06:51
@dotnwat dotnwat marked this pull request as ready for review May 19, 2023 06:51
src/v/storage/disk_log_impl.cc (outdated, resolved)
src/v/storage/disk_log_impl.cc (resolved)
tests/rptest/tests/full_disk_test.py (outdated, resolved)
src/v/storage/disk_log_impl.cc (resolved)
Comment on lines +1950 to +1955
* otherwise, we fall through and evaluate the space wanted metric using
* any configured retention policies _without_ overriding based on cloud
* retention settings. the assumption here is that retention and not
* compaction is what will limit on disk space. this heuristic should be
* updated if we decide to also make "nice to have" estimates based on
* expected compaction ratios.
*/
Contributor:

I'm not following why we're not applying the cloud overrides here. Aren't they honored when applying retention, even on a compacted topic? (i could be wrong, would appreciate a pointer)

dotnwat (Member, Author):

Let me explain, and if it makes sense to you, I'll update this comment to clarify the case in the code:

Compacted topics aren't subject to local retention (they always remain whole on local storage), so the only way that retention comes into play for a compacted topic is if the policy is compact,delete. However, the override_retention_config helper doesn't take compaction into account. So if we were to apply the cloud overrides here for a compacted topic, then the retention policy we used in calculating "nice to have" wouldn't reflect what would actually happen when housekeeping ran.

Contributor:

I think I understand: it wasn't clicking that retention settings applied to a compact,delete topic followed the overarching retention settings rather than the local settings for tiered storage. If that's the case, I think this policy makes sense.

[](storage::usage acc, storage::usage u) { return acc + u; });

// extrapolate out for the missing period of time in the retention period
auto missing_bytes = (usage.total() * missing.value()) / duration.value();
Contributor:

I think this is saying "I expect to see the same throughput that ingested the latest segments for the entire remainder of the retention period". This seems reasonable, but it makes me wonder about bursty workloads, where a large number of records are written in the span of a few dozen seconds and then stop. In such a case, we could end up with a small duration and a large missing, and end up with a very large value for missing_bytes.

Maybe it makes sense to compute the expected throughput with now() - start_timestamp instead of end_timestamp - start_timestamp, though given these are user-input timestamps, it's probably best to avoid using now().

I suppose that regardless, it might be better to overestimate the wanted bytes anyway, given the goal of avoiding out-of-space issues.
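
To make the burst concern concrete with made-up numbers (not from the PR): suppose a 10 GiB burst is the only data in a 30 second observation window and retention.ms is 7 days (604800 s). The formula above extrapolates

$$\text{missing\_bytes} = \frac{10\ \text{GiB} \times (604800\ \text{s} - 30\ \text{s})}{30\ \text{s}} \approx 197\ \text{TiB},$$

far more than a bursty workload would actually produce over the retention period.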

dotnwat (Member, Author):

I attempted to capture this issue with a heuristic like not making any estimate unless we have, say, 10 seconds' worth of data. But I'm afraid this is just going to be a game of whack-a-mole. Perhaps we would do well to add a cap here too, such as double or triple the amount of data currently on disk. This may work well since the goal isn't necessarily to observe the system once and know how much space we need; rather, this monitoring will run periodically. So capping would let the estimate continue to reflect the growth requirement while also smoothing out huge bursts?
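
A rough sketch of that capping idea follows. The function and parameter names are hypothetical and the 3x multiplier is an arbitrary placeholder; this is not what the PR implements.

```cpp
#include <algorithm>
#include <cstdint>

// hypothetical sketch of a capped extrapolation, not the PR's implementation
uint64_t capped_missing_bytes(
  uint64_t observed_bytes, // usage.total() over the observed window
  uint64_t observed_ms,    // length of the observed window
  uint64_t missing_ms) {   // remainder of the retention period
    if (observed_ms == 0) {
        return 0; // nothing observed yet, so make no estimate
    }
    const uint64_t raw = (observed_bytes * missing_ms) / observed_ms;
    // cap at a small multiple of what is already on disk so that a short
    // burst cannot blow up the estimate; since this runs periodically, the
    // estimate can still grow if the higher rate is sustained
    return std::min(raw, 3 * observed_bytes);
}
```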

Contributor:

Yeah, picking something that's good in all cases is hard. I'd lean harder away from this if this significantly impacted workloads (e.g. if we were to base rejection of produce messages on this). But given the ultimate goal here is to just apply retention more aggressively, this is probably fine.

Might be worth calling out in a header comment that this is an estimate and is subject to being way off, so don't use it too aggressively (like rejecting writes based on it)

@dotnwat dotnwat force-pushed the space-management branch from 9fac2ea to 11421ce on May 23, 2023 20:24
@dotnwat dotnwat commented May 23, 2023

Force-pushed to resolve merge conflicts in admin_server.

@dotnwat dotnwat requested review from abhijat and andrwng May 23, 2023 20:24
@dotnwat dotnwat force-pushed the space-management branch from 11421ce to f160442 on May 25, 2023 04:42
@dotnwat dotnwat commented May 25, 2023

Force-push:

  1. Added some extra trace logging
  2. Updated the test to be more stable. Before, I was using fixed produces with sleeps to try to achieve predictable throughput, but this was very noisy. Now the test asks the kafka producer tool to write at a specific rate.

@dotnwat dotnwat force-pushed the space-management branch from f160442 to 53d8ddb on May 25, 2023 04:53
@dotnwat dotnwat commented May 25, 2023

Force-push: fix Python formatting

@dotnwat dotnwat commented May 25, 2023

ping @andrwng when you get a chance

@dotnwat dotnwat force-pushed the space-management branch from 53d8ddb to bf8dd1d on May 25, 2023 17:30
@dotnwat dotnwat commented May 25, 2023

Force-push: increased the fuzz threshold on release builds; it seems we write a bit faster than on debug builds, which I was using for calibration.

dotnwat added 5 commits May 25, 2023 10:47
The state computed and reported by the public disk_usage function is
going to grow, and so split this function up so that it doesn't become
too unwieldy.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
When collecting a storage report, we add a new section that describes target sizes with different meanings. This commit adds a minimum capacity target, which is the minimum amount of capacity that log storage needs in order to function at a bare minimum.

The minimum is computed as the max segment size (potentially different
for each partition) * the minimum reserved number of segments
(configurable) * the total number of partitions.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
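
Read literally, the computation described in the commit message above looks roughly like the following sketch. The names are hypothetical, and the real implementation may aggregate differently (for example, multiplying a single max segment size by the partition count rather than summing per-partition sizes).

```cpp
#include <cstdint>
#include <vector>

// hypothetical sketch of the minimum-capacity target described above
uint64_t min_capacity_target(
  const std::vector<uint64_t>& partition_max_segment_sizes,
  uint64_t min_reserved_segments) { // configurable reservation per partition
    uint64_t total = 0;
    for (auto segment_size : partition_max_segment_sizes) {
        // each partition needs room for at least the reserved number of
        // segments at its (potentially topic-specific) max segment size
        total += segment_size * min_reserved_segments;
    }
    return total;
}
```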
Useful for making sure we aren't missing any call sites--a possible
scenario when using the {.x=, .y=} form of construction.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Checks cases with multiple partitions and topics, and with settings for both the minimum reserved number of segments and the topic segment size.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
It is useful to examine retention configurations without cloud-storage-related overrides when estimating the amount of disk capacity needed. Used in later commits.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
dotnwat added 6 commits May 25, 2023 10:47
Reports how much capacity a partition (or all of raft) would like to have. This commit only considers size-based retention requirements. Time-based retention will be added in a later commit.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Examines data recently written to a partition and tries to extrapolate how much capacity will be needed based on the apparent rate of data production.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
@dotnwat dotnwat force-pushed the space-management branch from bf8dd1d to b240a85 on May 25, 2023 19:04
@dotnwat dotnwat commented May 25, 2023

Force-push: fix conflicts with clang-format16 changes

@andrwng andrwng (Contributor) left a comment

Thanks for the ping, and for clarifying. I think this looks like a good starting point as far as heuristics go. Remaining comments/clarifications aren't blocking.

@dotnwat dotnwat merged commit c8210e8 into redpanda-data:dev May 26, 2023