
Conversation

@jasontedor
Member

Today we wrap exceptions that occur while executing an ingest processor in an ElasticsearchException. However, ExceptionsHelper#unwrapCause only unwraps causes for exceptions that implement ElasticsearchWrapperException, which the top-level ElasticsearchException does not. Ultimately, this means that any exception thrown during processor execution does not have its cause unwrapped, so its status is treated as a blanket 500. Consequently, while executing a bulk request with an ingest pipeline, document-level failures that occur in a processor cause the status for that document to be reported as 500. Since that gives the client no indication that it made a mistake, some clients will enter infinite retries, thinking there is a server-side problem that merely needs to clear. This commit addresses the issue by introducing a dedicated ingest processor exception, so that its causes can be unwrapped. While we could consider a broader change to unwrap causes for more than just ElasticsearchWrapperExceptions, that is a broad change with unclear implications. Since reporting 500s on client errors is a user-facing bug, we take the conservative approach for now, and we can revisit the unwrapping in a future change.
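For context, here is a simplified, self-contained sketch of the unwrap logic in ExceptionsHelper#unwrapCause. The loop mirrors the behavior described above, but the marker interface is stubbed locally so the snippet compiles on its own; treat it as illustrative rather than the exact source.

```java
class UnwrapSketch {

    // Marker interface: an exception implementing it is a pure wrapper,
    // and status resolution should look through it to the cause.
    interface ElasticsearchWrapperException {
        Throwable getCause();
    }

    static Throwable unwrapCause(Throwable t) {
        int counter = 0;
        Throwable result = t;
        // Only wrappers are unwrapped. A plain ElasticsearchException does
        // not implement the marker, so the loop never runs for it and the
        // processor's real failure (for example an IllegalArgumentException)
        // is never consulted; the status falls back to the generic 500.
        while (result instanceof ElasticsearchWrapperException) {
            if (result.getCause() == null || result.getCause() == result) {
                return result;
            }
            if (counter++ > 10) {
                return result; // guard against pathological cause chains
            }
            result = result.getCause();
        }
        return result;
    }
}
```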

Closes #48803
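
The fix itself is small. Below is a minimal model of it: the class name IngestProcessorException and its shape follow this PR, but ElasticsearchException is stubbed here so the snippet stands alone, so read the details as a sketch rather than the exact merged code.

```java
// Stub so the snippet compiles standalone; the real class lives in the
// Elasticsearch codebase and carries serialization logic.
class ElasticsearchException extends RuntimeException {
    ElasticsearchException(Throwable cause) {
        super(cause);
    }
}

// Stub of the marker interface; Throwable#getCause already satisfies it.
interface ElasticsearchWrapperException {
    Throwable getCause();
}

// The dedicated exception: still an ElasticsearchException, but the marker
// interface lets ExceptionsHelper#unwrapCause see through it to the real
// cause, so a client error such as IllegalArgumentException resolves to a
// 400 instead of the blanket 500.
class IngestProcessorException extends ElasticsearchException
        implements ElasticsearchWrapperException {
    IngestProcessorException(final Exception cause) {
        super(cause);
    }
}
```

With this in place, a processor failure is wrapped in IngestProcessorException instead of a plain ElasticsearchException, unwrapCause surfaces the underlying exception, and the per-document status in the bulk response reflects the actual error, for example 400 for an IllegalArgumentException.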

@jasontedor jasontedor added >bug, :Data Management/Ingest Node, v8.0.0, v7.5.0, v7.6.0 labels Nov 1, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Ingest)

@jasontedor jasontedor marked this pull request as ready for review November 1, 2019 19:27
@jasontedor jasontedor requested a review from martijnvg November 1, 2019 19:27
@martijnvg martijnvg (Member) left a comment

Thanks for fixing this @jasontedor and clearly describing the problem here.

@jimczi jimczi removed the v7.5.0 label Nov 12, 2019
@jasontedor jasontedor merged commit d4e82d4 into elastic:master Nov 14, 2019
jasontedor added a commit that referenced this pull request Nov 14, 2019
@jasontedor jasontedor deleted the ingest-processor-exception branch November 14, 2019 16:05
jasontedor added a commit that referenced this pull request Nov 15, 2019

Labels

>bug, :Data Management/Ingest Node, v7.5.0, v7.6.0, v8.0.0-alpha1

Development

Successfully merging this pull request may close these issues.

Receiving status code 500 on pipeline failures for bulk ingestion requests

5 participants