Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Record WebFlux server metrics for client disconnection #23606

Closed
cdmatta opened this issue Oct 7, 2020 · 3 comments
Closed

Record WebFlux server metrics for client disconnection #23606

cdmatta opened this issue Oct 7, 2020 · 3 comments
Assignees
Labels
type: enhancement A general enhancement
Milestone

Comments

@cdmatta
Copy link

cdmatta commented Oct 7, 2020

Version: Spring Boot 2.3.4.RELEASE with Webflux & Metrics

Scenario:

  1. Client aborts the connection before server sends a response.
  2. Server request processing takes forever (server configured with read/write timeout). Server closes the connection

In both the above cases, one can see a log generated due to connection abort (client/server). However, no metrics are generated.

Project dependencies

<dependencies>
	<dependency>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-webflux</artifactId>
	</dependency>
	<dependency>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-actuator</artifactId>
	</dependency>
	<dependency>
		<groupId>io.micrometer</groupId>
		<artifactId>micrometer-registry-prometheus</artifactId>
	</dependency>
</dependencies>
@SpringBootApplication
@RestController
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

    @GetMapping("/five")
    public Mono<String> five() {
        LocalDateTime start = LocalDateTime.now();
        return Mono.just("start")
                .delayElement(Duration.ofSeconds(5))
                .then(Mono.defer(() -> Mono.just("Call took " + Duration.between(start, LocalDateTime.now()).getSeconds() + " seconds")));
    }

    @GetMapping("/twenty")
    public Mono<String> forever() {
        LocalDateTime start = LocalDateTime.now();
        return Mono.just("start")
                .delayElement(Duration.ofSeconds(20))
                .then(Mono.defer(() -> Mono.just("Call took " + Duration.between(start, LocalDateTime.now()).getSeconds() + " seconds")));
    }

    @GetMapping("/fifty")
    public Mono<String> loopy() {
        LocalDateTime start = LocalDateTime.now();
        return Mono.fromRunnable(() -> {
                    for (int i = 0; i < 10; i++) {
                        System.out.println("Iteration " + i);
                        try {
                            Thread.sleep(5_000);
                        } catch (InterruptedException e) {
                            e.printStackTrace();
                            System.out.println("Thread was interrupted");
                        }
                    }
                }
        ).then(Mono.defer(() -> Mono.just("Call took " + Duration.between(start, LocalDateTime.now()).getSeconds() + " seconds")));
    }

    @Bean
    public WebServerFactoryCustomizer serverFactoryCustomizer() {
        return new NettyTimeoutCustomizer();
    }

    class NettyTimeoutCustomizer implements WebServerFactoryCustomizer<NettyReactiveWebServerFactory> {
        @Override
        public void customize(NettyReactiveWebServerFactory factory) {
            int timeout = 10_000;
            factory.addServerCustomizers(server -> server.tcpConfiguration(tcp ->
                    tcp.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, timeout)
                            .doOnConnection(connection ->
                                    connection
                                            .addHandlerLast(new WriteTimeoutHandler(timeout, MILLISECONDS))
                                            .addHandlerLast(new ReadTimeoutHandler(timeout, MILLISECONDS))
                            )));
        }
    }
}

Use JMeter for the tests below

  1. Call the /five endpoint. Abort the call before 5 seconds.
    Result -
    No metrics are generated

  2. Call the /twenty endpoint.
    Result -
    After 10 seconds, Client receives connection closed - org.apache.http.NoHttpResponseException: 127.0.0.1:8080 failed to respond.
    No metrics are generated.

  3. Call the /fifty endpoint
    Result -
    Strangely enough, after 50 seconds, the call succeeds with response "Call took 50 seconds"
    Metric is generated.
    But why is the connection not interrupted by the server ? Read/Write timeout is configured for 10 seconds.

  4. Call the /fifty endpoint. Abort the call in 10 seconds.
    Result -
    The iterating/sleeping thread does not get interrupted, interal loop succeeds.
    Exception is noticed in log after the entire loop.
    Metrics look like http_server_requests_seconds_max{exception="IOException",method="GET",outcome="SUCCESS",status="200",uri="/fifty",} 50.157466

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Oct 7, 2020
@philwebb
Copy link
Member

philwebb commented Oct 7, 2020

@cdmatta Are you able to share this code as a complete project that we can run? Either a link to a GitHub project or an attached zip would be ideal.

@philwebb philwebb added the status: waiting-for-feedback We need additional information before we can continue label Oct 7, 2020
cdmatta added a commit to cdmatta/sb-conn-interruption-issues that referenced this issue Oct 7, 2020
@cdmatta
Copy link
Author

cdmatta commented Oct 7, 2020

Thanks @philwebb . I added sample code here https://github.com/cdmatta/sb-conn-interruption-issues

Added 2 modules mvc and reactive to allow comparison + JMeter script.
For one test case when client closes the connection before server can reply, one needs to start the jmeter request and "Stop" the execution.

Compared against mvc stack.

When client interrupts the call

  1. Reactive stack - metrics are missing
  2. MVC Stack
    http_server_requests_seconds_sum{exception="ClientAbortException",method="GET",outcome="SUCCESS",status="200",uri="/five",} 5.057541428
    

When server interrupts the request (processing took too long)

  1. Reactive Stack - when netty read/write timeout is configured. Request is processing for too long - Metrics are missing
  2. Reactive Stack - when using timeout on the pipeline. for ex
    Mono.just("result").delayElement(Duration.ofSeconds(20)).timeout(Duration.ofSeconds(5));
     http_server_requests_seconds_sum{exception="TimeoutException",method="GET",outcome="SERVER_ERROR",status="500",uri="/twenty",} 5.127515163
    
  3. MVC Stack - with DeferredResult
    http_server_requests_seconds_sum{exception="AsyncRequestTimeoutException",method="GET",outcome="SERVER_ERROR",status="503",uri="/twenty",} 11.04647488
    

In the Reactive stack when I add a WebFilter, I can see the cancel signal.
This is not caught in the org.springframework.boot.actuate.metrics.web.reactive.server.MetricsWebFilter. Thus the metric is missing.
Though, even if one registers for that signal, one does not know if it was the client cancel or server read/write timeout.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Oct 7, 2020
@wilkinsona
Copy link
Member

See #18444 which improved things on the client side. @bclozel Do we need to make some similar changes on the server side?

@wilkinsona wilkinsona added the for: team-attention An issue we'd like other members of the team to review label Jan 15, 2021
@bclozel bclozel removed the for: team-attention An issue we'd like other members of the team to review label Feb 23, 2021
@bclozel bclozel self-assigned this Mar 19, 2021
@bclozel bclozel added type: enhancement A general enhancement and removed status: feedback-provided Feedback has been provided status: waiting-for-triage An issue we've not yet triaged labels Mar 29, 2021
@bclozel bclozel added this to the 2.5.0-RC1 milestone Mar 29, 2021
@bclozel bclozel changed the title Metrics missing on connection interruption Client/Server Record WebFlux server metrics for client disconnection Mar 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement A general enhancement
Projects
None yet
Development

No branches or pull requests

5 participants