HADOOP-19221. S3A: Unable to recover from failure of multipart block upload attempt (#6938) #7044
Merged: steveloughran merged 1 commit into apache:branch-3.4 from steveloughran:s3/HADOOP-19221-multipart-put-failures-branch-3.4 on Sep 16, 2024
🎊 +1 overall
This message was automatically generated.
steveloughran added a commit to steveloughran/hadoop that referenced this pull request on Oct 2, 2024:
…upload attempt (apache#6938) (apache#7044)
steveloughran added a commit that referenced this pull request on Oct 3, 2024:
…upload attempt (#6938) (#7044) (#7094)
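This is a major change which handles the 400 error responses seen when uploading large files from memory (heap arrays or byte buffers) or through the staging committer, after the remote S3 store returns a 500 response to the upload of a block in a multipart upload.
The SDK's own streaming code seems unable to fully replay the upload; it attempts to, but then blocks, and the S3 store returns a 400 response: "Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: S3, Status Code: 400...)"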
There is an option to control whether the S3A client itself retries on a 5xx error other than a 503 throttling event (503 responses are still processed independently, as before).
Option: fs.s3a.retry.http.5xx.errors
Default: true
500 errors are very rare from standard AWS S3, which has a five-nines SLA. They may be more common against S3 Express, which has lower guarantees.
Third-party stores have unknown guarantees, and a 500 error from one may indicate a bad server configuration. Consider setting fs.s3a.retry.http.5xx.errors to false when working with such stores.
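A minimal sketch of the retry decision described above, assuming the rule is simply: a 503 goes down the existing throttling path, any other 5xx is retried only when fs.s3a.retry.http.5xx.errors is true (its default), and every other status fails. Only the option name and default come from this PR; the class, enum and method names are hypothetical, a plain Map stands in for the Hadoop Configuration, and this is not the S3A retry policy itself.

```java
import java.util.Map;

/** Hypothetical sketch of the 5xx retry decision; not the real S3A retry policy. */
final class RetryDecisionSketch {

  /** Option name and default (true) as given in the PR description. */
  static final String RETRY_5XX_ERRORS = "fs.s3a.retry.http.5xx.errors";

  enum Action { RETRY_AS_THROTTLED, RETRY, FAIL }

  private RetryDecisionSketch() {
  }

  static Action classify(int httpStatus, Map<String, String> conf) {
    if (httpStatus == 503) {
      // Throttling is processed independently of the new option.
      return Action.RETRY_AS_THROTTLED;
    }
    if (httpStatus >= 500 && httpStatus <= 599) {
      // An unset option falls back to the default of true.
      boolean retry5xx = Boolean.parseBoolean(
          conf.getOrDefault(RETRY_5XX_ERRORS, "true"));
      return retry5xx ? Action.RETRY : Action.FAIL;
    }
    // Other statuses are out of scope for this sketch and simply fail.
    return Action.FAIL;
  }

  public static void main(String[] args) {
    // A third-party store where 500s probably mean misconfiguration.
    Map<String, String> thirdParty = Map.of(RETRY_5XX_ERRORS, "false");
    System.out.println(classify(500, Map.of()));     // RETRY
    System.out.println(classify(500, thirdParty));   // FAIL
    System.out.println(classify(503, thirdParty));   // RETRY_AS_THROTTLED
  }
}
```

In a real deployment the option is set like any other S3A option, for example in core-site.xml.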
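Significant code changes:
There is now a custom set of implementations of software.amazon.awssdk.http.ContentStreamProvider in the class org.apache.hadoop.fs.s3a.impl.UploadContentProviders.
These:
* Restart on failures.
* Do not copy buffers/byte buffers into new private byte arrays, so they avoid exacerbating memory problems.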
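The two properties listed above (restart on failure, no private copy of the buffer) can be illustrated with a small sketch. It assumes only that the SDK asks its ContentStreamProvider for a fresh stream through newStream() on each attempt; the nested StreamProvider interface is a local stand-in for that SDK type, and the other class names are hypothetical, not the ones in UploadContentProviders. Each new stream reads from a duplicate() view of the same ByteBuffer, so a replay starts again from the beginning of the block and no byte array copy of the block is made.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

/** Hypothetical sketch of a replayable, copy-free upload content provider. */
final class ReplayableUploadSketch {

  /** Local stand-in for software.amazon.awssdk.http.ContentStreamProvider. */
  interface StreamProvider {
    InputStream newStream();
  }

  /** Reads straight from a ByteBuffer view; the block is never copied into a private byte[]. */
  static final class ByteBufferInputStream extends InputStream {
    private final ByteBuffer view;

    ByteBufferInputStream(ByteBuffer view) {
      this.view = view;
    }

    @Override
    public int read() {
      return view.hasRemaining() ? (view.get() & 0xFF) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) {
      Objects.checkFromIndexSize(off, len, b.length);
      if (len == 0) {
        return 0;
      }
      if (!view.hasRemaining()) {
        return -1;
      }
      int n = Math.min(len, view.remaining());
      view.get(b, off, n);
      return n;
    }

    @Override
    public int available() {
      return view.remaining();
    }
  }

  /** Each newStream() call restarts from the start of the block. */
  static final class ByteBufferContentProvider implements StreamProvider {
    private final ByteBuffer block;
    private InputStream current;
    private int streamsCreated;

    ByteBufferContentProvider(ByteBuffer source) {
      // Read-only view; this class never moves the caller's buffer position or limit.
      this.block = source.asReadOnlyBuffer();
    }

    @Override
    public synchronized InputStream newStream() {
      if (current != null) {
        try {
          current.close(); // release the stream from the failed attempt
        } catch (IOException ignored) {
          // closing an in-memory stream is best effort
        }
      }
      streamsCreated++;
      // duplicate() shares the bytes but has its own position, starting where the block starts.
      current = new ByteBufferInputStream(block.duplicate());
      return current;
    }

    synchronized int streamsCreated() {
      return streamsCreated;
    }
  }

  public static void main(String[] args) throws IOException {
    ByteBuffer block = ByteBuffer.wrap("block-1 data".getBytes(StandardCharsets.UTF_8));
    ByteBufferContentProvider provider = new ByteBufferContentProvider(block);
    // First attempt reads part of the block, then "fails".
    provider.newStream().read(new byte[4]);
    // The retry gets a fresh stream that starts from the beginning again.
    byte[] replayed = provider.newStream().readAllBytes();
    System.out.println(new String(replayed, StandardCharsets.UTF_8)); // block-1 data
  }
}
```

The design choice being illustrated is that replay costs nothing extra in memory: a new view over the same buffer replaces the old stream instead of a buffered copy of the whole block.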
There are new IOStatistics for specific HTTP error codes; these are collected even when all recovery is performed within the SDK.
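A sketch of per-status-code counting, under the assumption that some callback sees every HTTP response, including attempts the SDK later retries internally; that is what lets a counter record a 500 even when the overall request succeeds. The class is hypothetical, and counter keys such as http_500 are made up for illustration; they are not the IOStatistics names the patch uses.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: count HTTP error responses per status code, attempt by attempt. */
final class HttpStatusStatisticsSketch {

  private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

  /** Assumed to be invoked for every HTTP response, including attempts the SDK retries. */
  void onResponse(int httpStatus) {
    if (httpStatus >= 400) {
      // Key format "http_<code>" is illustrative only.
      counters.computeIfAbsent("http_" + httpStatus, k -> new AtomicLong())
          .incrementAndGet();
    }
  }

  long get(String key) {
    AtomicLong counter = counters.get(key);
    return counter == null ? 0L : counter.get();
  }

  /** Sorted, point-in-time copy for reporting. */
  Map<String, Long> snapshot() {
    Map<String, Long> copy = new TreeMap<>();
    counters.forEach((k, v) -> copy.put(k, v.get()));
    return copy;
  }

  public static void main(String[] args) {
    HttpStatusStatisticsSketch stats = new HttpStatusStatisticsSketch();
    // A block upload gets a 500, the SDK retries it, and the retry succeeds:
    // the caller never sees an error, but the counter still records it.
    for (int status : new int[] {500, 200}) {
      stats.onResponse(status);
    }
    System.out.println(stats.snapshot()); // {http_500=1}
  }
}
```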
S3ABlockOutputStream has major changes, including handling of Thread.interrupt() on the main thread, which now triggers and briefly awaits cancellation of any ongoing uploads.
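The cancellation behavior can be sketched with standard java.util.concurrent types. This assumes block uploads run as Futures on an ExecutorService; awaitUploads() and its graceMillis parameter are hypothetical names, not S3ABlockOutputStream code. An interrupt while waiting cancels every outstanding upload, waits briefly for the workers to stop, then rethrows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: an interrupt while waiting cancels in-flight block uploads. */
final class CancelOnInterruptSketch {

  private CancelOnInterruptSketch() {
  }

  /**
   * Waits for every upload to finish. If the waiting thread is interrupted,
   * cancels all uploads, gives the pool up to graceMillis to stop, then
   * rethrows the InterruptedException.
   */
  static void awaitUploads(List<Future<?>> uploads, ExecutorService pool, long graceMillis)
      throws InterruptedException, ExecutionException {
    try {
      for (Future<?> upload : uploads) {
        upload.get();
      }
    } catch (InterruptedException e) {
      for (Future<?> upload : uploads) {
        upload.cancel(true); // interrupts the upload's worker thread if it is running
      }
      pool.shutdownNow();
      // Brief wait for cancellation to take effect; the interrupt flag was
      // cleared when get() threw, so this wait is not cut short by it.
      pool.awaitTermination(graceMillis, TimeUnit.MILLISECONDS);
      throw e;
    }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    CountDownLatch neverReleased = new CountDownLatch(1);
    List<Future<?>> uploads = new ArrayList<>();
    for (int i = 0; i < 2; i++) {
      // Simulated block upload that only ends when its thread is interrupted.
      uploads.add(pool.submit(() -> {
        neverReleased.await();
        return null;
      }));
    }
    Thread.currentThread().interrupt(); // simulate Thread.interrupt() on the writing thread
    try {
      awaitUploads(uploads, pool, 500);
    } catch (InterruptedException expected) {
      boolean allCancelled = uploads.stream().allMatch(Future::isCancelled);
      System.out.println("all uploads cancelled: " + allCancelled); // true
    }
  }
}
```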
If the writing thread is interrupted in close(), the interrupt is mapped to an InterruptedIOException. Applications such as Hive and Spark must catch these exceptions after cancelling a worker thread.
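From the application side, the pattern is to treat an InterruptedIOException from close() as cancellation rather than as an ordinary I/O failure. In this hypothetical sketch, BlockingCloseStream stands in for a stream whose close() waits on pending uploads, and writeAndClose() plays the role of an application worker task; neither is a Hadoop, Hive or Spark API.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of catching InterruptedIOException from close() in a cancelled worker. */
final class InterruptedCloseSketch {

  private InterruptedCloseSketch() {
  }

  /** Stand-in for a stream whose close() blocks until pending uploads finish. */
  static final class BlockingCloseStream implements AutoCloseable {
    private final CountDownLatch uploadsDone = new CountDownLatch(1); // never released here

    @Override
    public void close() throws IOException {
      try {
        uploadsDone.await();
      } catch (InterruptedException e) {
        // Map the interrupt to an IOException subclass, keeping the interrupt flag set.
        Thread.currentThread().interrupt();
        InterruptedIOException iioe = new InterruptedIOException("interrupted in close()");
        iioe.initCause(e);
        throw iioe;
      }
    }
  }

  /** Application worker code: cancellation is not reported as a write failure. */
  static String writeAndClose(BlockingCloseStream out) {
    try {
      // ... write the data, then close to complete the upload ...
      out.close();
      return "completed";
    } catch (InterruptedIOException e) {
      // Must come before IOException, its superclass.
      return "cancelled";
    } catch (IOException e) {
      return "failed: " + e.getMessage();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicReference<String> outcome = new AtomicReference<>();
    Thread worker = new Thread(() -> outcome.set(writeAndClose(new BlockingCloseStream())));
    worker.start();
    worker.interrupt(); // the application cancels the worker thread
    worker.join();
    System.out.println(outcome.get()); // cancelled
  }
}
```

Because InterruptedIOException extends IOException, code that only catches IOException would report a deliberate cancellation as a write failure; the more specific catch clause keeps the two cases apart.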
Contributed by Steve Loughran
How was this patch tested?
in progress