Add Bulk Delete Api to BlobStore #40322
Conversation
Pinging @elastic/es-distributed
Jenkins run elasticsearch-ci/bwc
Thanks for the PR!
I've added a few comments. Also, I wonder if we need to add some kind of test for the deleteBlobs method?
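For illustration, a minimal sketch of what such a test could look like. BlobContainer is the interface touched by this PR; newBlobContainer() and writeBlob(...) are hypothetical stand-ins for whatever helpers the real test suite provides, not actual repository code:

    import static org.junit.Assert.assertTrue;

    import java.io.IOException;
    import java.util.Arrays;
    import java.util.List;

    import org.elasticsearch.common.blobstore.BlobContainer;

    // Sketch only: write a few blobs, bulk-delete them, and verify nothing is left.
    public abstract class DeleteBlobsTestSketch {

        // stand-ins for the real test infrastructure (hypothetical)
        protected abstract BlobContainer newBlobContainer();

        protected abstract void writeBlob(BlobContainer container, String name, byte[] data) throws IOException;

        public void testDeleteBlobs() throws IOException {
            final BlobContainer container = newBlobContainer();
            final List<String> blobNames = Arrays.asList("blob-1", "blob-2", "blob-3");
            for (String blobName : blobNames) {
                writeBlob(container, blobName, "some data".getBytes());
            }
            container.deleteBlobs(blobNames);             // the bulk API added in this PR
            assertTrue(container.listBlobs().isEmpty());  // all three blobs should be gone
        }
    }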
plugins/repository-s3/src/main/java/org/elasticsearch/repositories/s3/S3BlobContainer.java (resolved review comments)
});
};
handlers.insert(nonAuthPath(HttpPost.METHOD_NAME, "/"), bulkDeleteHandler);
handlers.insert(nonAuthPath(HttpPost.METHOD_NAME, "/{bucket}"), bulkDeleteHandler);
Will this implementation work for both cases? It seems that {bucket} is not even used in the implementation.
Yeah, the unfortunate reality here is that the current fixture's logic for bulk delete doesn't even care about the bucket and simply tries to find the given blobs in any bucket.
I wouldn't really put any effort into this tbh. I think we should probably rather look into just removing the fixture now that we have the Docker-based Minio tests and third-party tests.
The fixture seems completely redundant now ...
server/src/main/java/org/elasticsearch/common/blobstore/BlobContainer.java (resolved review comments)
server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java (resolved review comments)
}
});
} catch (final AmazonClientException e) {
    throw new IOException("Exception when deleting blobs [" + blobNames + "]", e);
If there is an IOException we do not proceed, even if we have more DeleteRequests to be sent.
Previously, when performing deletes, if one delete failed we still proceeded with the next delete requests.
Right, fixed this for S3 as well as the generic case now by catching and aggregating exceptions in the loop :)
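A minimal sketch of that catch-and-aggregate pattern (the names here are illustrative, not the actual repository code): every batch is still attempted after an earlier one fails, and all failures are surfaced together at the end.

    import java.io.IOException;
    import java.util.List;

    import org.elasticsearch.common.blobstore.BlobContainer;

    // Illustrative sketch: keep looping over the remaining batches and aggregate failures.
    final class AggregatingDeleteSketch {
        static void deleteAllBatches(BlobContainer container, List<List<String>> batches) throws IOException {
            IOException aggregate = null;
            for (List<String> batch : batches) {
                try {
                    container.deleteBlobs(batch);           // attempt every batch regardless of earlier failures
                } catch (IOException e) {
                    if (aggregate == null) {
                        aggregate = new IOException("Exception when deleting blobs", e);
                    } else {
                        aggregate.addSuppressed(e);          // keep the details of every failed batch
                    }
                }
            }
            if (aggregate != null) {
                throw aggregate;                             // surface all collected failures at once
            }
        }
    }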
logger.warn(() ->
    new ParameterizedMessage(
        "[{}] indices [{}] are no longer part of any snapshots in the repository, " +
        "but failed to clean up their index folders.", metadata.name(), indicesToCleanUp), ioe);
We no longer know which particular indices were not removed. We just log all indices, including those that were deleted successfully.
The same thing applies to the deleteBlobs usage below.
Probably we can add this kind of information to the IOException thrown by deleteBlobs?
Done in 9a13dadd0c1 :) We now aggregate all the exceptions. The IndexIds that are printed here contain the name and the index snapshot id, so with the information in the aggregate exception we can work out which index failed to clean up.
Do you know how AWS S3 behaves when you send a bulk request of 1000 entries and entry number 500 fails? Will it stop at entry 500 or will it try to delete all entries from the bulk?
Will S3 include information about each failed entry in the exception (or just the first one)?
@andrershov it will try to delete all of the entries and give errors for all the ones that failed :)
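For reference, a hedged sketch of how this looks with the AWS SDK for Java v1: deleteObjects attempts the whole batch, and any keys that could not be deleted are listed in the MultiObjectDeleteException. The bucket and key names below are made up for illustration.

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.model.DeleteObjectsRequest;
    import com.amazonaws.services.s3.model.MultiObjectDeleteException;

    // Sketch: S3 processes the whole batch and reports every failed key, not just the first.
    final class S3BulkDeleteErrorSketch {
        static void bulkDelete(AmazonS3 client, String bucket, String... keys) {
            try {
                client.deleteObjects(new DeleteObjectsRequest(bucket).withKeys(keys));
            } catch (MultiObjectDeleteException e) {
                for (MultiObjectDeleteException.DeleteError error : e.getErrors()) {
                    // one entry per key that failed; keys that were deleted are in e.getDeletedObjects()
                    System.err.println("failed to delete [" + error.getKey() + "]: " + error.getMessage());
                }
            }
        }
    }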
@andrershov thanks for taking a look! All points addressed and tests added.
@original-brownbear mostly looks good. But before approving I want to understand how S3 deals with bulk requests in terms of exceptions when several deletions have failed.
I want to be sure that S3 will try to delete all the elements in the bulk even if some of them have problems, and that information about all failed elements will be included in the exception.
* Adds Bulk delete API to blob container
* Implement bulk delete API for S3
* Adjust S3Fixture to accept both path styles for bulk deletes since the S3 SDK uses both during our ITs
* Closes elastic#40250

fd16f66 to d6423d7
@andrershov fixed the docs.
LGTM
@andrershov thanks!
Thanks @original-brownbear. Looking very good already. I've left some minor points.
@@ -56,6 +57,11 @@

class S3BlobContainer extends AbstractBlobContainer {

    /**
     * Maximum number of deletes in a {@link DeleteObjectsRequest}.
perhaps link to AWS docs here
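For context, S3's DeleteObjects call accepts at most 1000 keys per request, which is presumably what this constant guards. A hedged sketch of splitting a larger delete into batches of that size (the names are illustrative, not the PR's actual code):

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative batching: never put more than the documented S3 per-request limit of keys in one request.
    final class BulkDeleteBatchingSketch {
        static final int MAX_BULK_DELETES = 1000; // documented limit per S3 DeleteObjects request

        static List<List<String>> partition(List<String> blobNames) {
            final List<List<String>> batches = new ArrayList<>();
            for (int i = 0; i < blobNames.size(); i += MAX_BULK_DELETES) {
                batches.add(blobNames.subList(i, Math.min(i + MAX_BULK_DELETES, blobNames.size())));
            }
            return batches;
        }
    }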
@@ -118,6 +124,51 @@ public void deleteBlob(String blobName) throws IOException {
        deleteBlobIgnoringIfNotExists(blobName);
    }

    @Override
    public void deleteBlobs(List<String> blobNames) throws IOException {
I think we should call this deleteBlobsIgnoringIfNotExists?
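For illustration, the kind of default such an interface method could provide, falling back to the existing single-blob delete. This is a sketch under that assumption, not the actual BlobContainer code:

    import java.io.IOException;
    import java.util.List;

    // Sketch of a possible interface default: delete blob by blob, ignoring missing blobs,
    // and aggregate other failures so every name is still attempted.
    interface BlobContainerSketch {
        void deleteBlobIgnoringIfNotExists(String blobName) throws IOException;

        default void deleteBlobsIgnoringIfNotExists(List<String> blobNames) throws IOException {
            IOException aggregate = null;
            for (String blobName : blobNames) {
                try {
                    deleteBlobIgnoringIfNotExists(blobName);
                } catch (IOException e) {
                    if (aggregate == null) {
                        aggregate = e;
                    } else {
                        aggregate.addSuppressed(e);
                    }
                }
            }
            if (aggregate != null) {
                throw aggregate;
            }
        }
    }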
"but failed to clean up its index folder.", metadata.name(), indexId), ioe); | ||
logger.warn(() -> | ||
new ParameterizedMessage( | ||
"[{}] indices [{}] are no longer part of any snapshots in the repository, " + |
The second placeholder can be just {} instead of [{}], as it's a list that will render with the brackets anyway. Same issue in other places in this PR.
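A small sketch to make the point concrete: a java.util.List already renders with brackets via its toString, so wrapping the placeholder in brackets doubles them. The Log4j classes match the snippet above; the exact output simply follows List.toString.

    import java.util.Arrays;
    import java.util.List;

    import org.apache.logging.log4j.message.ParameterizedMessage;

    // Demonstrates why "[{}]" doubles the brackets for list arguments.
    final class PlaceholderSketch {
        public static void main(String[] args) {
            final List<String> indices = Arrays.asList("index-1", "index-2");
            // prints: indices [index-1, index-2]   -- the list supplies its own brackets
            System.out.println(new ParameterizedMessage("indices {}", indices).getFormattedMessage());
            // prints: indices [[index-1, index-2]] -- doubled brackets
            System.out.println(new ParameterizedMessage("indices [{}]", indices).getFormattedMessage());
        }
    }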
                snapshotId, shardId, blobName), e);
        }
    }
    final List<String> staleBlobs = blobs.keySet().stream()
perhaps call this orphanedBlobs?
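A hypothetical sketch of what such an orphanedBlobs computation might look like, assuming the set of blob names still referenced by surviving snapshots is known. The naming and the survivor set are assumptions for illustration, not the PR's actual logic:

    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.stream.Collectors;

    // Sketch: blobs present in the container but referenced by no surviving snapshot are orphaned.
    final class OrphanedBlobsSketch {
        static List<String> orphanedBlobs(Map<String, ?> blobs, Set<String> referencedBlobNames) {
            return blobs.keySet().stream()
                .filter(blobName -> referencedBlobNames.contains(blobName) == false)
                .collect(Collectors.toList());
        }
    }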
…leniency

* elastic/master:
  SQL: Fix deserialisation issue of TimeProcessor (elastic#40776)
  Improve GCS docs for using keystore (elastic#40605)
  Add Restore Operation to SnapshotResiliencyTests (elastic#40634)
  Small refactorings to analysis components (elastic#40745)
  SQL: Fix display size for DATE/DATETIME (elastic#40669)
  add HLRC protocol tests for transform state and stats (elastic#40766)
  Inline TransportReplAction#registerRequestHandlers (elastic#40762)
  remove experimental label from search_as_you_type documentation (elastic#40744)
  Remove some abstractions from `TransportReplicationAction` (elastic#40706)
  Upgrade to latest build scan plugin (elastic#40702)
  Use default memory lock setting in testing (elastic#40730)
  Add Bulk Delete Api to BlobStore (elastic#40322)
  Remove yaml skips older than 7.0 (elastic#40183)
  Docs: Move id in the java-api (elastic#40748)
* Implement Bulk Deletes for GCS Repository (#41368)
  * Just like #40322 for AWS
  * We already had a bulk delete API but weren't using it from the blob container implementation, now we are using it
  * Made the bulk delete API also compliant with our interface that only suppresses errors about non existent blobs by stat-ing failed deletes (I didn't use any bulk stat action here since having to stat here should be the exception anyway and it would make error handling a lot more complex)
  * Fixed bulk delete API to limit its batch size to 100 in line with GCS recommendations

  Backport of #41368
For S3 this should give us close to a 1000x speedup for deletes of huge indices/shards, since each bulk request can delete up to 1000 blobs that previously each needed their own request.