Introduce no_items_on_success option to the bulk API #55088

Open
wants to merge 17 commits into
base: main

Mpdreamz
Member

This allows the _bulk API to return an empty items array when the user explicitly requests it.

It also introduces an extra items_omitted property, so that the response states very loudly whether the items were omitted.

POST /_bulk?no_items_on_success
{ "index": { "_index" : "a" } }
{}
{ "index": { "_index" : "a" } }
{}
{ "index": { "_index" : "a" } }
{}

{
    "took": 530,
    "errors": false,
    "items_omitted": true,
    "items": []
}

If no_items_on_success is set but the bulk request has failures, items still returns all the items and items_omitted is false.

{ "index": { "_index" : "a" } }
{}
{ "index": { "_index" : "a" } }
{ "i": 2 }
{ "index": { "_index" : "a" } }
{ "i": "string" }

{
    "took": 56,
    "errors": true,
    "items_omitted": false,
    "items": [
        ...
        {
            "index": {
                "_index": "a",
                "_id": "64vOZXEBvXhNefbuM4g2",
                "status": 400,
                "error": {
                    ...
                }
            }
        }
    ]
}
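
The two responses above follow a single rule: items are dropped only when the option is set and nothing failed. A minimal sketch of that rule, assuming hypothetical stand-in types (`BulkShapeSketch`, `Item`, `Shape`) that are not part of Elasticsearch or of this PR:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in types for illustration; these are NOT the Elasticsearch classes.
final class BulkShapeSketch {

    /** Outcome of one bulk operation: its status code and whether it failed. */
    static final class Item {
        final int status;
        final boolean failed;

        Item(int status, boolean failed) {
            this.status = status;
            this.failed = failed;
        }
    }

    /** The three response fields that the option affects. */
    static final class Shape {
        final boolean errors;
        final boolean itemsOmitted;
        final List<Item> items;

        Shape(boolean errors, boolean itemsOmitted, List<Item> items) {
            this.errors = errors;
            this.itemsOmitted = itemsOmitted;
            this.items = items;
        }
    }

    /** Items are dropped only when the caller asked for it AND nothing failed. */
    static Shape shape(List<Item> results, boolean noItemsOnSuccess) {
        boolean errors = false;
        for (Item item : results) {
            if (item.failed) {
                errors = true;
                break;
            }
        }
        boolean omit = noItemsOnSuccess && errors == false;
        List<Item> items = omit ? Collections.<Item>emptyList() : new ArrayList<Item>(results);
        return new Shape(errors, omit, items);
    }
}
```

With the option set, an all-successful batch yields `itemsOmitted == true` and an empty list, while a batch with one failure yields `itemsOmitted == false` and the full list.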


@Mpdreamz Mpdreamz requested review from dliappis and removed request for dliappis April 10, 2020 20:39
@jtibshirani jtibshirani added :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >enhancement labels Apr 10, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/CRUD)

builder.startArray(ITEMS);
for (BulkItemResponse item : this) {
item.toXContent(builder, params);
if (noFailuresAnNoItemsRequested == false) {
Member

Would be great if the decision on whether to include the items could be made earlier.
Maybe in TransportBulkAction in the finishHim() method around line 541?
That would save creating the bulk item response array.

It would be even better to make this decision earlier still, so that no shard-level response items
are collected. If there are failures, we could then serialize the successful items
as null elements? But I'm afraid that would break the bulk response format.

Member Author

Yeah I could do the first, that would definitely help 👍

Not sure the second warrants the additional complexity? The goal of this PR is to shave off serialization time on the consumer. Perhaps something to consider once this lands?

Member Author

@martijnvg revisiting this, I think it's best not to introduce the complexity in the transport layer and to expose this only on the REST layer.

Doing it in the transport layer would introduce a set of new problems:

  • Shards would need to send back how many items they would have returned.
  • Reducing all the shard information becomes more complex, because the empty shard responses have to be expanded.
  • We would need to find a new representation for successful bulk operations in case of failures.

The PR as it stands does not preclude taking this on as follow-up work, but it is already useful as is.

Keen to hear your thoughts!

Member

@Mpdreamz I agree, this PR is useful in its current form.

However, I think it makes sense not to include the items in the bulk response if there are no failures and no_items_on_success is set. This can be done simply by iterating over the response items to see whether any of them has an error, and then adding a constructor to the bulk response that accepts no response items and takes noFailuresAnNoItemsRequested (or always sets it to true).

@Mpdreamz Mpdreamz force-pushed the feature/master/bulk-empty-items-on-success branch from 2b7ab7d to c9f3cb0 Compare May 4, 2020 07:57
@rjernst rjernst added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label May 4, 2020
@Mpdreamz Mpdreamz force-pushed the feature/master/bulk-empty-items-on-success branch from a4a4c7d to 99654a5 Compare August 5, 2020 14:49
@Mpdreamz Mpdreamz force-pushed the feature/master/bulk-empty-items-on-success branch from 99654a5 to 2d393f4 Compare December 7, 2020 12:26
@@ -153,9 +167,14 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
builder.field(INGEST_TOOK, ingestTookInMillis);
}
builder.field(ERRORS, hasFailures());
boolean noItems = noItemsOnSuccess != null && noItemsOnSuccess;
boolean noFailuresAndNoItemsRequested = noItems && this.hasFailures() == false;
builder.field(ITEMS_OMITTED, noFailuresAndNoItemsRequested);
Member

Maybe we can return items with a null value instead of returning the items_omitted field?
It looks like we currently never return null, so we could use that.

Member Author

The third state for items_omitted adds more complexity to the users of the API IMO. Knowing the property is always there makes checking for it all the easier.
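
To make the trade-off concrete, here is a sketch of the consumer-side check that each design implies. It assumes the response body has already been parsed into a `Map`; the class, enum, and method names are hypothetical and do not come from any Elasticsearch client.

```java
import java.util.Map;

// Hypothetical consumer-side sketch; the Map stands in for an already-parsed JSON body.
final class BulkConsumerSketch {

    enum Outcome { SUCCESS_ITEMS_OMITTED, SUCCESS_WITH_ITEMS, HAS_FAILURES }

    /** Design in this PR: "errors" and "items_omitted" are always-present booleans. */
    static Outcome classifyWithFlag(Map<String, Object> body) {
        if (Boolean.TRUE.equals(body.get("errors"))) {
            return Outcome.HAS_FAILURES;
        }
        return Boolean.TRUE.equals(body.get("items_omitted"))
            ? Outcome.SUCCESS_ITEMS_OMITTED
            : Outcome.SUCCESS_WITH_ITEMS;
    }

    /** Suggested alternative: no extra field, "items" is null when omitted. */
    static Outcome classifyWithNullItems(Map<String, Object> body) {
        if (Boolean.TRUE.equals(body.get("errors"))) {
            return Outcome.HAS_FAILURES;
        }
        // The null check is the only signal, so "items" becomes nullable for every consumer.
        return body.get("items") == null
            ? Outcome.SUCCESS_ITEMS_OMITTED
            : Outcome.SUCCESS_WITH_ITEMS;
    }
}
```

Both variants are short; the difference is that the second one makes items nullable for every consumer, including those that never set the option.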

@Mpdreamz Mpdreamz force-pushed the feature/master/bulk-empty-items-on-success branch from af86291 to e623aa6 Compare December 10, 2020 10:29
@Mpdreamz Mpdreamz requested a review from martijnvg December 14, 2020 11:27
Member

@martijnvg martijnvg left a comment

I left a few more comments.

@@ -4,7 +4,7 @@
<titleabbrev>Bulk</titleabbrev>
++++

Performs multiple indexing or delete operations in a single API call.
Performs multiple indexing or delete operations in a single API call.
Member

maybe undo these whitespace changes?


public static final long NO_INGEST_TOOK = -1L;

private final BulkItemResponse[] responses;
private final long tookInMillis;
private final long ingestTookInMillis;
@Nullable
private final Boolean noItemsOnSuccess;
Member

can this be a primitive boolean? I think we default to false.
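
A small sketch of what the suggestion changes in practice. A boxed `Boolean` can be null, so every use needs the null guard seen in the diff (unboxing a null `Boolean` throws a `NullPointerException`), whereas a primitive `boolean` field simply defaults to false. The class and method names below are hypothetical and only mirror the expressions in this PR:

```java
// Hypothetical helper for illustration; it mirrors the expressions in the diff, not the actual class.
final class BulkFlagSketch {

    /** Boxed variant: the null guard is required before the value can be unboxed. */
    static boolean omitWithBoxed(Boolean noItemsOnSuccess, boolean hasFailures) {
        return noItemsOnSuccess != null && noItemsOnSuccess && hasFailures == false;
    }

    /** Primitive variant: no null state exists, so the guard disappears. */
    static boolean omitWithPrimitive(boolean noItemsOnSuccess, boolean hasFailures) {
        return noItemsOnSuccess && hasFailures == false;
    }
}
```

The two methods agree for every non-null input; the boxed version only adds a third "unset" state that is treated the same as false.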


public BulkResponse(BulkItemResponse[] responses, long tookInMillis, long ingestTookInMillis, Boolean noItemsOnSuccess) {
final boolean hasError = responses != null && Arrays.stream(responses).anyMatch(r-> r != null && r.isFailed());
this.responses = noItemsOnSuccess != null && noItemsOnSuccess && hasError == false
Member

Maybe add a comment explaining that the bulk items are dropped here because they are no longer needed when noItemsOnSuccess is true, which allows the GC to clean them up before the response has been serialized over the network.

@@ -49,28 +50,42 @@
private static final String ERRORS = "errors";
private static final String TOOK = "took";
private static final String INGEST_TOOK = "ingest_took";
private static final String ITEMS_OMITTED = "items_omitted";
Member

Maybe rename to no_items? Then it is more in line with the field name noItemsOnSuccess.

Labels
:Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
5 participants