[Bulk] Add _index, _id, status to ERROR object #10015

aswath86 · 2023-09-13T11:29:08Z

Description

One of the Bulk API best practices is to reduce the response size using filter_path. The AWS OpenSearch documentation says:

This response size might seem minimal, but if you index 1,000,000 documents per day—approximately 11.5 documents per second—339 bytes per response works out to 10.17 GB of download traffic per month.
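The arithmetic behind the quoted figure can be checked with a short sketch. It assumes a 30-day month, decimal gigabytes (10^9 bytes), and one 339-byte response per document, which are the assumptions under which the quoted numbers line up; the class and method names are illustrative only.

```java
// Back-of-the-envelope check of the quoted traffic figures.
// Assumptions (not stated in the quote): 30-day month, 1 GB = 10^9 bytes.
final class BulkTrafficEstimate {

    /** Total response bytes downloaded in one month. */
    static long monthlyBytes(long docsPerDay, long bytesPerResponse, int daysPerMonth) {
        return docsPerDay * bytesPerResponse * daysPerMonth;
    }

    /** Average indexing rate implied by a daily document count. */
    static double docsPerSecond(long docsPerDay) {
        return docsPerDay / 86_400.0; // 86,400 seconds in a day
    }

    public static void main(String[] args) {
        long bytes = monthlyBytes(1_000_000L, 339L, 30);
        System.out.println("bytes per month: " + bytes);
        System.out.println("GB per month:    " + bytes / 1e9);
        System.out.println("docs per second: " + docsPerSecond(1_000_000L));
    }
}
```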

Also, the HTTP response code of a Bulk request often cannot be trusted on its own, since document-level failures are not reflected in it and are only visible in the bulk response body.
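To make this concrete: a bulk request can succeed at the HTTP level while individual items fail, so a client has to look at the top-level errors flag and then at each item. The following is a minimal sketch, assuming the response JSON has already been parsed into nested java.util.Map and List values; the parsing step and the BulkFailureScan name are illustrative assumptions, not part of any OpenSearch client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class BulkFailureScan {

    /** Returns "index/id: errorType" for every item that carries an "error" object. */
    @SuppressWarnings("unchecked")
    static List<String> failedItems(Map<String, Object> bulkResponse) {
        List<String> failures = new ArrayList<>();
        // The top-level "errors" flag says whether any item failed at all.
        if (!Boolean.TRUE.equals(bulkResponse.get("errors"))) {
            return failures;
        }
        List<Map<String, Object>> items = (List<Map<String, Object>>) bulkResponse.get("items");
        for (Map<String, Object> item : items) {
            // Each item has a single key naming the action (index, create, update, delete).
            for (Object value : item.values()) {
                Map<String, Object> result = (Map<String, Object>) value;
                Map<String, Object> error = (Map<String, Object>) result.get("error");
                if (error != null) {
                    failures.add(result.get("_index") + "/" + result.get("_id") + ": " + error.get("type"));
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        Map<String, Object> ok = Map.of("index",
            Map.of("_index", "bulk_response", "_id", "1", "status", 201));
        Map<String, Object> failed = Map.of("index",
            Map.of("_index", "bulk_response", "_id", "2", "status", 400,
                   "error", Map.of("type", "strict_dynamic_mapping_exception")));
        Map<String, Object> response = Map.of("errors", true, "items", List.of(ok, failed));
        System.out.println(failedItems(response));
    }
}
```

Note that this only works on an unfiltered response, because it reads _index and _id from the item itself; that is exactly the information an error-only filter_path drops.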

For example, consider the failed document below:

  {
    "index": {
      "_index": "bulk_response",
      "_id": "2",
      "status": 400,
      "error": {
        "type": "strict_dynamic_mapping_exception",
        "reason": "mapping set to strict, dynamic introduction of [field2x] within [_doc] is not allowed"
      }
    }
  }

A filter_path such as filter_path=items.index.error returns only the following, leaving no clue about which document in which index failed.

  {
    "index": {
      "error": {
        "type": "strict_dynamic_mapping_exception",
        "reason": "mapping set to strict, dynamic introduction of [field2x] within [_doc] is not allowed"
      }
    }
  }

One cannot both reduce the response size and capture the failed documents. The idea is to also add _index, _id and status to the error object, so that the filtered response becomes:

  {
    "index" : {
      "error" : {
        "_index" : "bulk_response",
        "_id" : "3",
        "status" : 400,
        "type" : "strict_dynamic_mapping_exception",
        "reason" : "mapping set to strict, dynamic introduction of [field2x] within [_doc] is not allowed"
      }
    }
  }
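The difference between the current and the proposed shape under an error-only filter can be sketched with plain maps. This is an illustration, not the PR's code: the real change is made with XContentBuilder in BulkItemResponse.toXContent, and keepOnlyError below is a toy stand-in for filter_path=items.index.error applied to a single item, not the actual filtering implementation. All class and method names are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ErrorOnlyFilterSketch {

    /** Builds one failed "index" item; repeatIdentityInError=true models the proposed shape. */
    static Map<String, Object> failedIndexItem(String index, String id, int status,
                                               String type, String reason,
                                               boolean repeatIdentityInError) {
        Map<String, Object> error = new LinkedHashMap<>();
        if (repeatIdentityInError) {
            // Proposed: repeat the identifying fields inside the error object.
            error.put("_index", index);
            error.put("_id", id);
            error.put("status", status);
        }
        error.put("type", type);
        error.put("reason", reason);

        Map<String, Object> result = new LinkedHashMap<>();
        result.put("_index", index);
        result.put("_id", id);
        result.put("status", status);
        result.put("error", error);

        Map<String, Object> item = new LinkedHashMap<>();
        item.put("index", result);
        return item;
    }

    /** Toy stand-in for filter_path=items.index.error: keeps only the "error" sub-object. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> keepOnlyError(Map<String, Object> item) {
        Map<String, Object> result = (Map<String, Object>) item.get("index");
        Map<String, Object> filteredResult = new LinkedHashMap<>();
        filteredResult.put("error", result.get("error"));
        Map<String, Object> filteredItem = new LinkedHashMap<>();
        filteredItem.put("index", filteredResult);
        return filteredItem;
    }

    public static void main(String[] args) {
        String type = "strict_dynamic_mapping_exception";
        String reason = "mapping set to strict, dynamic introduction of [field2x] within [_doc] is not allowed";
        // Current shape: the filtered item no longer says which document failed.
        System.out.println(keepOnlyError(failedIndexItem("bulk_response", "3", 400, type, reason, false)));
        // Proposed shape: _index, _id and status survive the filter.
        System.out.println(keepOnlyError(failedIndexItem("bulk_response", "3", 400, type, reason, true)));
    }
}
```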

_index, _id and status would be repeated in the response for every item that ends in an error. Are we OK with that?

This may not be very useful when _id is auto-generated, but it is useful when _id is client-generated.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

- New functionality includes testing.
  - All tests pass
- New functionality has been documented.
  - New functionality has javadoc added
- Commits are signed per the DCO using --signoff
- Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Aswath <it.aswath@gmail.com> This is to reduce the bulk response size with filter_path on items.index.error and capture failed documents

github-actions · 2023-09-13T11:58:49Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/25479/
CommitID: cbc0a90
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?

github-actions · 2023-09-13T11:59:38Z

Compatibility status:

Checks if related components are compatible with change cbc0a90

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/reporting.git]

opensearch-trigger-bot · 2023-10-13T15:20:22Z

This PR is stalled because it has been open for 30 days with no activity.

ticheng-aws · 2024-01-06T00:19:54Z

Hi @aswath86, the PR is stalled. Is this still being worked on? Feel free to reach out to maintainers for further reviews.

opensearch-trigger-bot · 2024-02-12T15:19:55Z

This PR is stalled because it has been open for 30 days with no activity.

opensearch-trigger-bot · 2024-03-24T15:19:35Z

This PR is stalled because it has been open for 30 days with no activity.

mgodwan · 2024-04-23T08:24:07Z

server/src/main/java/org/opensearch/action/bulk/BulkItemResponse.java

@@ -94,6 +94,9 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
            builder.field(_ID, failure.getId());
            builder.field(STATUS, failure.getStatus().getStatus());
            builder.startObject(ERROR);
+            builder.field(_INDEX, failure.getIndex());
+            builder.field(_ID, failure.getId());


Is the _id always generated when the error is passed? What if the error was encountered even before the document ID could be generated?

aswath86 · Jul 1, 2024

Then the behaviour would be the same as for the existing builder.field(_ID, failure.getId()); call on the line above builder.startObject(ERROR);.

mgodwan · 2024-07-03T06:50:40Z

In that case, users who do not provide filter_path will get the _id and _index fields twice for failed items, which adds payload by default. I wonder if there is a better way to solve this.

mgodwan · 2024-07-03T06:57:59Z

Also, if filter_path only contains error, how can users determine which document actually failed? Successful docs won't return any response, so with auto-generated IDs it becomes difficult for clients to know which document failed. (Applicable only to auto-generated IDs.)

github-actions · 2024-06-28T15:31:58Z

❌ Gradle check result for 693cea5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2024-07-01T08:00:25Z

❌ Gradle check result for 140d25d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2024-07-01T12:45:29Z

❌ Gradle check result for 620165f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

mgodwan · 2024-07-03T06:53:28Z

server/src/main/java/org/opensearch/action/bulk/BulkItemResponse.java

@@ -96,6 +96,9 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
            builder.field(_ID, failure.getId());
            builder.field(STATUS, failure.getStatus().getStatus());
            builder.startObject(ERROR);
+            builder.field(_INDEX, failure.getIndex());


Can you add tests for this?

ashking94 · 2024-07-03T07:17:09Z

server/src/main/java/org/opensearch/action/bulk/BulkItemResponse.java

@@ -96,6 +96,9 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
            builder.field(_ID, failure.getId());
            builder.field(STATUS, failure.getStatus().getStatus());
            builder.startObject(ERROR);
+            builder.field(_INDEX, failure.getIndex());


Adding these new fields will increase bandwidth usage for existing users who are not using filter_path. I would suggest that we instead include this information in the error reason, in a way that is backward compatible. You could also consider adding a new field controlled by a query parameter, similar to the _cat/nodes API, where we can control which fields are returned.

By default we should not increase the response size; users should be able to opt in when they need the additional information added here.
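The opt-in approach suggested here could look roughly like the sketch below. It is only an illustration under stated assumptions: the parameter name error_identity is invented for this example and is not an existing OpenSearch query parameter, and request parameters are modelled as a plain Map rather than the ToXContent.Params object the real serialization code receives.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OptInErrorIdentity {

    // Hypothetical query parameter name, made up for this sketch.
    static final String PARAM = "error_identity";

    /** Builds the error object; identity fields are added only when the caller opts in. */
    static Map<String, Object> errorObject(Map<String, String> requestParams,
                                           String index, String id, int status,
                                           String type, String reason) {
        Map<String, Object> error = new LinkedHashMap<>();
        // Defaults to false, so existing clients see an unchanged response size.
        if (Boolean.parseBoolean(requestParams.getOrDefault(PARAM, "false"))) {
            error.put("_index", index);
            error.put("_id", id);
            error.put("status", status);
        }
        error.put("type", type);
        error.put("reason", reason);
        return error;
    }

    public static void main(String[] args) {
        String type = "strict_dynamic_mapping_exception";
        String reason = "mapping set to strict, dynamic introduction of [field2x] within [_doc] is not allowed";
        // Without the parameter: same error object as today.
        System.out.println(errorObject(Map.of(), "bulk_response", "3", 400, type, reason));
        // With the parameter: identity fields are repeated inside the error object.
        System.out.println(errorObject(Map.of(PARAM, "true"), "bulk_response", "3", 400, type, reason));
    }
}
```

With a switch like this, the duplicated fields raised in the earlier review comments would only be paid for by clients that explicitly ask for them.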

mgodwan · 2024-07-22T15:34:17Z

@aswath86 Are you planning to continue on this change?

github-actions · 2024-10-28T09:34:06Z

❌ Gradle check result for 0a71128: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

aswath86 added 2 commits September 13, 2023 13:02

[Bulk] Add _index, _id, status to ERROR object

3c46ea4

Signed-off-by: Aswath <it.aswath@gmail.com> This is to reduce the bulk response size with filter_path on items.index.error and capture failed documents

Merge remote-tracking branch 'origin/main' into main

cbc0a90

opensearch-trigger-bot added the stalled label (Issues that have stalled) on Oct 13, 2023

opensearch-trigger-bot removed the stalled label on Jan 9, 2024

opensearch-trigger-bot added the stalled label on Feb 12, 2024

sohami added the enhancement (Enhancement or improvement to existing feature or request), Indexing (Indexing, Bulk Indexing and anything related to indexing), and API (Issues with external APIs) labels on Feb 14, 2024

opensearch-trigger-bot removed the stalled label on Feb 17, 2024

opensearch-trigger-bot added the stalled label on Mar 24, 2024

mgodwan reviewed Apr 23, 2024

opensearch-trigger-bot removed the stalled label on Apr 28, 2024

ankitkala assigned aswath86 Jun 24, 2024

Merge branch 'opensearch-project:main' into main

693cea5

aswath86 requested a review from ashking94 as a code owner June 28, 2024 14:55

Merge branch 'opensearch-project:main' into main

140d25d

Merge branch 'opensearch-project:main' into main

620165f

mgodwan reviewed Jul 3, 2024

ashking94 reviewed Jul 3, 2024

This was referenced Sep 9, 2024

[AUTOCUT] Gradle Check Flaky Test Report for RemoteClustersIT #14599

Closed

[AUTOCUT] Gradle Check Flaky Test Report for RemoteSegmentTransferTrackerTests #14325

Open

Merge branch 'opensearch-project:main' into main

0a71128

aswath86 requested review from jainankitk and linuxpi as code owners October 28, 2024 08:57
