Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Requests with ReactiveElasticsearchRepository methods doesn't fail if it can't connect with Elasticsearch #1712

Closed
josemalonsom opened this issue Feb 28, 2021 · 9 comments · Fixed by #1713
Labels
type: bug A general bug

Comments

@josemalonsom
Copy link

Spring Boot: 2.3.9.RELEASE
Spring Data Elastic Search: 4.0.7.RELEASE

Problem

When doing a search request with a ReactiveElasticsearchRepository method, as for example findAllById(), if there's no connection with Elasticsearch, for example the cluster is stopped just before the request is made, the connection error is silenced and the request finish correctly but without returning values.

How to reproduce it:

  1. With an Elasticsearch repository prepare a request like:
        List<TestDocument> documents = repository.findAllById(Arrays.asList(ID1, ID2))
                .collectList()
                .block();
  1. Set a breakpoint just before executing the search.
  2. Start the service in debug mode with Elasticsearch running.
  3. Once it stops in the breakpoint stop Elasticsearch.
  4. With Elasticsearch stopped resume the execution.

Result: The request won't fail and an empty list will be returned.

I wasn't able to prepare a test as I couldn't pinpoint exactly where the problem occurs, but I prepared a project with this example TestBug.test(), it's a little clumsy, it requires to stop the execution with a breakpoint, sorry for that, but is the only way I found to reproduce it, I hope it helps.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Feb 28, 2021
@sothawo
Copy link
Collaborator

sothawo commented Mar 1, 2021

I can reproduce this with you demo like you described. I still have to investigate why the dropped connection is not reported.
When changing your demo to use Spring Boot 2.4.3 (with Spring Data Elasticsearch 4.1.5), I get an error:

2021-03-01 08:07:01.398  WARN 11460 --- [ctor-http-nio-2] r.netty.http.client.HttpClientConnect    : [id: 0x2391c6d5, L:/127.0.0.1:57551 - R:localhost/127.0.0.1:9200] The connection observed an error

io.netty.handler.timeout.ReadTimeoutException: null

2021-03-01 08:07:01.409 ERROR 11460 --- [ctor-http-nio-2] reactor.core.publisher.Operators         : Operator called default onErrorDropped

reactor.core.Exceptions$ErrorCallbackNotImplemented: org.springframework.data.elasticsearch.client.NoReachableHostException: Host 'localhost:9200' not reachable. Cluster state is offline.
Caused by: org.springframework.data.elasticsearch.client.NoReachableHostException: Host 'localhost:9200' not reachable. Cluster state is offline.
	at org.springframework.data.elasticsearch.client.reactive.SingleNodeHostProvider.lambda$lookupActiveHost$4(SingleNodeHostProvider.java:108) ~[spring-data-elasticsearch-4.1.5.jar:4.1.5]

Is it an option to upgrade your application to the actual Spring Boot version?

I do not know currently if it is a change in Spring Data Elasticsearch or in Spring webflux that now shows the correct behaviour, I'll have to investigate that.

@sothawo sothawo added the status: waiting-for-feedback We need additional information before we can continue label Mar 1, 2021
@josemalonsom
Copy link
Author

Thanks @sothawo. Yes, it seems with Spring Boot 2.4.3 an error is thrown.

About the possibility of upgrading, I'm not sure if it's a possibility for us, I'm starting to investigate which changes we'd require for doing so, the service in which we found the issue is in production and uses 2.3.9.RELEASE and it seems there're a lot of changes between the different dependencies in sprint 2.4.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Mar 1, 2021
@sothawo
Copy link
Collaborator

sothawo commented Mar 1, 2021

thanks for the feedback; I'll dig into this after work today.

@josemalonsom
Copy link
Author

Thanks @sothawo.

@sothawo
Copy link
Collaborator

sothawo commented Mar 1, 2021

Debugging reactive code is ... well, not nice.

What I found out is, that the problem comes from a function deep in the Spring webflux code, namely org.springframework.web.reactive.function.client.ExchangeFunctions.DefaultExchangeFunctions class

This is from version 5.2.13.RELEASE, pulled in by Spring Boot 2.4.3:

public Mono<ClientResponse> exchange(ClientRequest clientRequest) {
    Assert.notNull(clientRequest, "ClientRequest must not be null");
    HttpMethod httpMethod = clientRequest.method();
    URI url = clientRequest.url();
    String logPrefix = clientRequest.logPrefix();

    return this.connector
            .connect(httpMethod, url, httpRequest -> clientRequest.writeTo(httpRequest, this.strategies))
            .doOnRequest(n -> logRequest(clientRequest))
            .doOnCancel(() -> logger.debug(logPrefix + "Cancel signal (to close connection)"))
            .map(httpResponse -> {
                logResponse(httpResponse, logPrefix);
                return new DefaultClientResponse(
                        httpResponse, this.strategies, logPrefix, httpMethod.name() + " " + url,
                        () -> createRequest(clientRequest));
            });
}

There is no error handling here, the closed connection is not handled, and an empty Mono is returned.

In version 5.3.0.RC1 this was changed to

@Override
public Mono<ClientResponse> exchange(ClientRequest clientRequest) {
	Assert.notNull(clientRequest, "ClientRequest must not be null");
	HttpMethod httpMethod = clientRequest.method();
	URI url = clientRequest.url();
	String logPrefix = clientRequest.logPrefix();
	return this.connector
			.connect(httpMethod, url, httpRequest -> clientRequest.writeTo(httpRequest, this.strategies))
			.doOnRequest(n -> logRequest(clientRequest))
			.doOnCancel(() -> logger.debug(logPrefix + "Cancel signal (to close connection)"))
			.onErrorResume(WebClientUtils::shouldWrapException, t -> wrapException(t, clientRequest))
			.map(httpResponse -> {
				logResponse(httpResponse, logPrefix);
				return new DefaultClientResponse(
						httpResponse, this.strategies, logPrefix, httpMethod.name() + " " + url,
						() -> createRequest(clientRequest));
			});
}

Here now the error is wrapped in an exception. Since 5.3.2 this is changed to

@Override
public Mono<ClientResponse> exchange(ClientRequest clientRequest) {
	Assert.notNull(clientRequest, "ClientRequest must not be null");
	HttpMethod httpMethod = clientRequest.method();
	URI url = clientRequest.url();
	String logPrefix = clientRequest.logPrefix();

	return this.connector
			.connect(httpMethod, url, httpRequest -> clientRequest.writeTo(httpRequest, this.strategies))
			.doOnRequest(n -> logRequest(clientRequest))
			.doOnCancel(() -> logger.debug(logPrefix + "Cancel signal (to close connection)"))
			.onErrorResume(WebClientUtils.WRAP_EXCEPTION_PREDICATE, t -> wrapException(t, clientRequest))
			.map(httpResponse -> {
				logResponse(httpResponse, logPrefix);
				return new DefaultClientResponse(
						httpResponse, this.strategies, logPrefix, httpMethod.name() + " " + url,
						() -> createRequest(clientRequest));
			});
}

So Im afraid there is nothing in Spring Data Elasticsearch I can do to handle this error as I don't get any information about it from the underlying Spring WebFlux framework.

@josemalonsom
Copy link
Author

Thanks @sothawo for looking into this, I really appreciate it.

Following your insight I did some more debug trying to understand the problem. This is what I found:

It seems the error reaches the method DefaultReactiveElasticsearchClient.handleServerError(), here the error is received and statusCode=503 and status=SERVICE_UNAVAILABLE.

Also setting logging.level.org.springframework.data.elasticsearch.client.WIRE: TRACE I can see the following message:

2021-03-02 08:23:45.076 TRACE 27364 --- [ctor-http-nio-2] o.s.data.elasticsearch.client.WIRE       : [2b400bd0] Received raw response: 503 SERVICE_UNAVAILABLE

after that I can't understand what's happening, it seems the error is lost somehow.

Most likely I'm wrong, but it seems as if the error reaches the Spring Data Elasticsearch layer but somehow it's not propagated, what do you think? does this make sense?.

@sothawo
Copy link
Collaborator

sothawo commented Mar 2, 2021

have to check that as well, especially where this 503 comes from. Normally a 503 comes from the server with an error message, here there is an empty body as well. And this code does not come from the server, as there is no server available anymore.

But you're right, at least this error - no matter where it comes from - should come up at the client call.

@sothawo sothawo added type: bug A general bug and removed status: feedback-provided Feedback has been provided status: waiting-for-triage An issue we've not yet triaged labels Mar 2, 2021
@sothawo
Copy link
Collaborator

sothawo commented Mar 2, 2021

I will change the handleServerError() method to deal with the case of an empty body as well, I will do this in the current master and backport it to the 4.1 and 4.0 branches, as an empty body is currently not handled.

In your case this will then be translated to an UncategorizedElasticsearchException which will be thrown in the case that the server is not available anymore. Although a 503 is not really correct in this case. But you won't have the problem with the empty list and no errors.

In the current version this comes up as a network error anyway.

sothawo added a commit that referenced this issue Mar 3, 2021
@sothawo sothawo added this to the 4.2 M5 (2021.0.0) milestone Mar 3, 2021
sothawo added a commit that referenced this issue Mar 3, 2021
Original Pull Request #1713
Closes #1712

(cherry picked from commit 6634d00)
sothawo added a commit that referenced this issue Mar 3, 2021
Original Pull Request #1713
Closes #1712

(cherry picked from commit 6634d00)
Test adapted
@josemalonsom
Copy link
Author

@sothawo Awesome, I just tested it with our service and I can confirm you it solves our issue. Thank you very much 😄.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug A general bug
Projects
None yet
3 participants