Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cloud_storage: Fix download throttling ducktape test #17455

Merged
merged 3 commits into from
Apr 1, 2024

Conversation

Lazin
Copy link
Contributor

@Lazin Lazin commented Mar 28, 2024

Fix the throttling ducktape test. The test was failing because additional restart reset the metric.

Additionally, the PR improves the way the metric is collected and updates the token bucket implementation to improve observability.

Fixes #14139

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x

Release Notes

  • None

@Lazin Lazin requested a review from abhijat March 28, 2024 11:06
@Lazin Lazin changed the title Fix/throttling e2e test cloud_storage: Fix download throttling ducktape test Mar 28, 2024
abhijat
abhijat previously approved these changes Mar 28, 2024
@Lazin Lazin force-pushed the fix/throttling-e2e-test branch from c55ef87 to 7c939d8 Compare March 28, 2024 19:20
@Lazin Lazin requested a review from abhijat March 28, 2024 19:20
@@ -62,7 +73,21 @@ class token_bucket {
_refresh_timer.arm(refresh_interval - elapsed);
}

return _sem.wait(as, size);
auto deficiency = size - _sem.available_units();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean available_units here vs one of the other semaphore interfaces for querying units? i ask because this one can become negative depending on how the semaphore is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this context it should go negative but still, I think that the deficiency value could be incorrect in case of concurrency, I'll update this

Copy link
Member

@dotnwat dotnwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please submit other changes in a different pr. we need the fix for the ci failure, and there are other failures in the ci run for this pr that are probably unrelated to the other changes in the pr--they look fine to me--but we can do that separately.

@dotnwat
Copy link
Member

dotnwat commented Mar 28, 2024

separate fix #17474

Lazin added 3 commits March 29, 2024 05:17
Implement 'maybe_throttle' method. The method works the same way as
'throttle'. It returns an optional value which is set to nullopt if
there was no throttling. If the throttling was applied it returns the
deficiency value (number of units we had to wait for) and the sleep
duration.
Previously, the throttling metric was computed before calling
'throttle'. If number of available units was lower than the number of
bytes to read we caclulated deficiency value as a difference between the
two.

In case of concurrency this could lead to incorrect value of the
metric. If two fibers will do this check and find out the available
untis are too low.
@Lazin Lazin force-pushed the fix/throttling-e2e-test branch from 7c939d8 to 285c8e0 Compare March 29, 2024 09:17
@Lazin
Copy link
Contributor Author

Lazin commented Mar 29, 2024

Updated the token_bucket implementation. Now it returns only the amount of time that the bucket was waiting for units.
Also, the metric now contains number of milliseconds that all downloads were throttled (updated the description).

@Lazin Lazin requested a review from dotnwat March 29, 2024 09:19
Comment on lines -712 to +718
_parent._read_path_probe.download_throttled(
buffer_size - available);
auto throttled = co_await _parent._throughput_limit.maybe_throttle(
buffer_size, _as);
if (throttled) {
// This path is taken only when the download is throttled. In case
// of concurrency every `maybe_throttle` call will return the
// precise value for deficiency and sleep time.
auto duration
= std::chrono::duration_cast<std::chrono::milliseconds>(
throttled.value());
_parent._read_path_probe.download_throttled(duration.count());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the units for the parameter passed to download_throttled? It looks like this switches from bytes to milliseconds. That's probably worth explaining in the commit message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the description of the metric. It's not possible to reliably capture the metric in bytes (and such metric doesn't make much sense tbh).

@Lazin Lazin merged commit 0004233 into redpanda-data:dev Apr 1, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants