-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bulk] Add _index, _id, status to ERROR object #10015
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Aswath <it.aswath@gmail.com> This is to reduce the bulk response size with filter_path on items.index.error and capture failed documents
Gradle Check (Jenkins) Run Completed with:
|
Compatibility status:Checks if related components are compatible with change cbc0a90 Incompatible componentsSkipped componentsCompatible componentsCompatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/reporting.git] |
This PR is stalled because it has been open for 30 days with no activity. |
Hi @aswath86, the PR is stalled. Is this being worked upon? Feel free to reach out to maintainers for further reviews. |
This PR is stalled because it has been open for 30 days with no activity. |
This PR is stalled because it has been open for 30 days with no activity. |
@@ -94,6 +94,9 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws | |||
builder.field(_ID, failure.getId()); | |||
builder.field(STATUS, failure.getStatus().getStatus()); | |||
builder.startObject(ERROR); | |||
builder.field(_INDEX, failure.getIndex()); | |||
builder.field(_ID, failure.getId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this always generated when the error is passed? What if the error was encountered even before the document id could be generated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then the behaviour would be the same as in for builder.field(_ID, failure.getId());
that is above line builder.startObject(ERROR);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, I see that users who are not providing filter_path, they will get the _id
and _index
field twice in case of errors and this adds additional payload by default. Wondering if there is a better way to solve this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if filter_path only contains error, how are users able to determine which document actually failed since successful docs won't return any response, and with auto-generated id, it becomes difficult for clients to know which document failed. (Applicable only for auto generated ids)
❌ Gradle check result for 693cea5: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 140d25d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 620165f: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
@@ -94,6 +94,9 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws | |||
builder.field(_ID, failure.getId()); | |||
builder.field(STATUS, failure.getStatus().getStatus()); | |||
builder.startObject(ERROR); | |||
builder.field(_INDEX, failure.getIndex()); | |||
builder.field(_ID, failure.getId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, I see that users who are not providing filter_path, they will get the _id
and _index
field twice in case of errors and this adds additional payload by default. Wondering if there is a better way to solve this
@@ -96,6 +96,9 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws | |||
builder.field(_ID, failure.getId()); | |||
builder.field(STATUS, failure.getStatus().getStatus()); | |||
builder.startObject(ERROR); | |||
builder.field(_INDEX, failure.getIndex()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add tests for this?
@@ -94,6 +94,9 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws | |||
builder.field(_ID, failure.getId()); | |||
builder.field(STATUS, failure.getStatus().getStatus()); | |||
builder.startObject(ERROR); | |||
builder.field(_INDEX, failure.getIndex()); | |||
builder.field(_ID, failure.getId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if filter_path only contains error, how are users able to determine which document actually failed since successful docs won't return any response, and with auto-generated id, it becomes difficult for clients to know which document failed. (Applicable only for auto generated ids)
@@ -96,6 +96,9 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws | |||
builder.field(_ID, failure.getId()); | |||
builder.field(STATUS, failure.getStatus().getStatus()); | |||
builder.startObject(ERROR); | |||
builder.field(_INDEX, failure.getIndex()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding these new fields will add to the b/w usage for existing users who are not using filter path. I would suggest that we rather include this in the error reason in a way that this is backward compatible. You can also consider adding a new field that can be controlled by query parameter similar to what we have in _cat/nodes
api where we can control which fields are returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default we should not be increasing the response size and it should be controlled by the user that they need the additional information that you have added here.
@aswath86 Are you planning to continue on this change? |
❌ Gradle check result for 0a71128: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Description
One of the Bulk API best practices is to reduce the response size using filter_path. AWS OpenSearch document says this,
Also, often times, response code for a Bulk request cannot be trusted since document level failures are not known but are only known in the bulk response.
For example, consider the below failed document
filter_path such as
filter_path=items.index.error
will give the below, leaving no clue about which document on what index failed.One cannot reduce the response size as well as capture failed documents. The idea is to add the _index, _id and status to the
error
object too so it gives us this,_index, _id and status would be repeated for those responses that end in an error. Are we ok with that?
May not be super useful when _id is auto-generated but useful when _id is client-generated
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.