Make `TransferredBytes` be the top of the list in `BinLabel` #3871
Looks good! One non-blocking request: would it be possible to create a test that replicates the reproduction from awslabs/aws-sdk-rust#1202? It might be too difficult, since we would need some kind of mocked server to talk to. In that case, I would prioritize getting this out, since it is currently causing customer issues.
I think it would, using test utilities John originally wrote (it's in integration tests of
Since I've verified the customer repro and added relevant unit tests, let's get this into customers' hands first. I'll work on adding an integration test in the next PR.
This commit addresses #3871 (review)
…t#1202 (#3874)

## Motivation and Context
A follow-up on #3871, responding to [the review feedback](#3871 (review))

## Testing
- Also confirmed that reverting the change in the above PR (so that `BinLabel::Pending` becomes the top of the list) failed the integration test added to this PR, as expected.
```
2024-10-10T19:06:56.417686Z TRACE aws_smithy_runtime::client::http::body::minimum_throughput::http_body_0_4_x: received poll pending
2024-10-10T19:06:56.417694Z DEBUG aws_smithy_runtime::client::http::body::minimum_throughput::http_body_0_4_x: current throughput: 0 B/s is below minimum: 1 B/s
thread 'user_polls_pending_followed_by_data_for_every_bin_in_throughput_logs' panicked at aws-smithy-runtime/tests/stalled_stream_download.rs:252:10:
response MUST NOT timeout: ThroughputBelowMinimum { expected: Throughput { bytes_read: 1, per_time_elapsed: 1s }, actual: Throughput { bytes_read: 0, per_time_elapsed: 1s } }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

failures:
    user_polls_pending_followed_by_data_for_every_bin_in_throughput_logs
```

----
_By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice._
Motivation and Context
awslabs/aws-sdk-rust#1202
Description
The issue above demonstrated the incorrect BinLabel ordering in LogBuffer, the underlying data structure we use for stall stream protection.
The following trace logs are generated from executing the reproduction steps in the issue above. In the file labeled "no_sleep," we have commented out `std::thread::sleep(std::time::Duration::from_millis(120));` from the reproducer so the updated code can be tested as the happy path.
- s3_throughput_min_repro_no_sleep.log
- s3_throughput_min_repro_with_sleep.log
In both files, it's important to note that `Bin`s assigned `TransferredBytes` can be overwritten by `Pending` due to `ThroughputLogs::push`. Once a `Bin` is labeled as `Pending`, it cannot be re-labeled.

When this occurs, the only way to avoid the stall stream protection check going into the grace period is for time to advance beyond the current `Bin`'s resolution, for the `LogBuffer` to push a new `Bin` during `catch_up`, and for this new `Bin` to be assigned `TransferredBytes`. However, this new `Bin` could also be overwritten by `Pending` in a subsequent call to `MinimumThroughputDownloadBody::poll_data`, which can trigger the grace period if the overall `LogBuffer` looks like it has violated the stall stream protection check.

The reproducer without sleep does not fail the stall stream protection check, partly because the execution completes well before the grace period ends, but more importantly because the execution periodically assigns new `TransferredBytes` `Bin`s in the throughput logs. This effectively resets the grace period for the stall stream protection (search for `throughput recovered; exiting grace period` in `s3_throughput_min_repro_no_sleep.log`). However, with sleep, `Bin`s labeled as `TransferredBytes` are frequently (and almost immediately) overwritten by `Pending`. This results in the execution being unable to exit the grace period, ultimately leading to a stall stream protection error.

To resolve this, we make `TransferredBytes` the top priority in `BinLabel`. This means that once a new `Bin` has earned `TransferredBytes`, it's green for that time resolution, and `Pending` should not revoke it by overwriting it to make it look like no bytes were transferred during that time.

Testing
`BinLabel` ordering and for `ThroughputLogs`
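
As a rough illustration of the property such an ordering test checks, here is a minimal sketch. It is not the code in this PR: the `Empty` variant, the `bytes` field, and the `merge` and `replay` helpers are hypothetical stand-ins, and the sketch assumes label priority comes from a derived `Ord` on the enum's declaration order.

```rust
// Sketch only: simplified stand-ins for the real `BinLabel` and `Bin` types.
// `Empty`, `bytes`, `merge`, and `replay` are assumptions for illustration.

/// Variants are declared from lowest to highest priority, so the derived
/// `Ord` makes `TransferredBytes` compare greater than `Pending`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum BinLabel {
    Empty,
    Pending,
    TransferredBytes,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Bin {
    pub label: BinLabel,
    pub bytes: u64,
}

impl Bin {
    pub fn new(label: BinLabel, bytes: u64) -> Self {
        Bin { label, bytes }
    }

    /// Folds another observation for the same time slice into this bin.
    /// The higher-priority label wins; byte counts accumulate.
    pub fn merge(&mut self, other: Bin) {
        self.label = self.label.max(other.label);
        self.bytes += other.bytes;
    }
}

/// Replays a sequence of observations into a single bin.
pub fn replay(observations: &[Bin]) -> Bin {
    let mut bin = Bin::new(BinLabel::Empty, 0);
    for obs in observations {
        bin.merge(*obs);
    }
    bin
}
```

Under these assumptions, replaying a `TransferredBytes` observation followed by a `Pending` one leaves the bin labeled `TransferredBytes`, which is the behavior the description above asks for. If `Pending` were declared last instead, the same replay would end up labeled `Pending`.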
Checklist
`.changelog` directory, specifying "client," "server," or both in the `applies_to` key.
`.changelog` directory, specifying "aws-sdk-rust" in the `applies_to` key.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.