Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Warning Http Header when one ore more documents fails in bulk #76216

Open
snakefoot opened this issue Aug 7, 2021 · 10 comments
Open

Add Warning Http Header when one ore more documents fails in bulk #76216

snakefoot opened this issue Aug 7, 2021 · 10 comments
Labels
:Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

Comments

@snakefoot
Copy link

snakefoot commented Aug 7, 2021

Have this BytesResponse from a bulk-operation, where Success = true:

Successful (200) low level call on POST: /_bulk
{
"took":0,
"errors":true,
"items":[{"index":{"_index":"logstash-2021.07.12","_type":"_doc","_id":"OTgmmnoBq3Eup03vE24t","status":403,"error":{"type":"security_exception","reason":"action [indices:data/write/bulk[s]] is unauthorized for API key id [NzgjmnoBq3Eup03vg25H] of user [elastic] on indices [logstash-2021.07.12], this action is granted by the index privileges [create_doc,create,delete,index,write,all]"}

And when checking Http-Headers then there is no "Warning" header:

responseMessage.Headers.TryGetValues("Warning", out warnings);

Please add "Warning"-header to also signal that "One ore more items with errors", when no other "deprecated warnings". Then one can react on this and deserialize the entire response to diagnose the issue (Avoid spending time on deserializing succesful responses, when no warnings).

See also: elastic/elastic-transport-net#30

@snakefoot snakefoot added >enhancement needs:triage Requires assignment of a team area label labels Aug 7, 2021
@nik9000 nik9000 added :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. and removed needs:triage Requires assignment of a team area label labels Aug 9, 2021
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Aug 9, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Copy link
Contributor

IMO the question of whether to add a header in this case is much more general than just for POST /_bulk: many APIs will return 200 OK to indicate that the request was coordinated successfully even if it encountered one or more failures after coordination, and we could reasonably indicate whether we were completely successful or not in the headers returned from all of them. It's also a question for the client folks whether/how users could indicate via the client APIs that they only care about the response body if it contains this sort of failure.

As it stands today we deliberately include status info within the first few bytes (e.g. "errors":true or the number of failed nodes or shards) so a streaming parser can start to read the body and bail out quickly if it decides it doesn't care any more. I've no idea if any of the official clients actually use this fact but I've certainly seen folks doing this with a low-level HTTP client.

As such I'd like the clients team to discuss this idea instead.

@DaveCTurner
Copy link
Contributor

An alternative idea, specifically for the _bulk API, would be to add a parameter that suppresses all the document-level responses if there's no errors. I see a lot of clients that check the errors field and if it's false they simply discard the rest; if we had a way for the client to tell us it's going to do that we could save ourselves the bother of even sending all that junk over the wire.

@Mpdreamz
Copy link
Member

@DaveCTurner #55088 😄

@snakefoot
Copy link
Author

I'm mostly interested in having the feature in the LowLevel Client, so one can use BytesResponse, and only deserialize when the reply-payload is worth deserializing.

@DaveCTurner
Copy link
Contributor

There's no need to change anything in Elasticsearch to solve this in the Java low-level REST client, you can just start to parse the body and stop at the errors field if it's false. It's the 2nd or 3rd field, you shouldn't need to look at more than ~60 bytes of body.

@snakefoot
Copy link
Author

snakefoot commented Aug 10, 2021

I guess if the BytesResponse in the low-level-client received a nice method that could do that quick-scan without loading up memory-stream, stream-readers with decoding etc. And then quickly return whether the reply-payload should be fully deserialized to reveal the actual warnings.

@DaveCTurner
Copy link
Contributor

DaveCTurner commented Aug 11, 2021

We (the @elastic/es-distributed team) discussed this today and want to propose an alternative mode(*) in which we only return the errors in all cases, because this would actually let us add some nice optimisations (data-transfer reductions) within Elasticsearch itself, and would save data transfer and client-side effort in the very common case that you only care about the errors. The main question is how to correlate the errors with the original docs, we could either pad the items array with null or replace it with a map keyed by the index of the doc that had the error (numbers in strings woo) add another key to each per-item map containing its original index. Both should work but neither sounds super-neat. I've asked the clients folks whether they have a preference, but I like the extra-key idea.

(*) edit to add that the default behaviour would remain as it is today, clients would need to opt-in to this alternative mode

@snakefoot
Copy link
Author

I'm happy if the "alterntive mode" becomes available through the DotNet LowLevel-client. And I would be really happy if BytesResponse provided a userfriendly (and fast) way to check for any errors.

@DaveCTurner
Copy link
Contributor

The alternative mode would certainly be available in the low-level client (you can do anything there) but I expect we'd add it to the high-level client too. I also expect that the method for checking for any errors would remain the same as it is today, but it'd be much faster since in the no-errors case the response would be tiny and therefore trivial to parse.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
Projects
None yet
Development

No branches or pull requests

5 participants