Skip to content

Hang with no activity in certain high-concurrency situations #3559

Open
@abizjak

Description

@abizjak

Version

[[package]]
name = "hyper"
version = "0.14.28"

[[package]]
name = "h2"
version = "0.3.24"

[[package]]
name = "tonic"
version = "0.10.2"

Platform

Linux ****** 6.6.13-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Jan 20 18:03:28 UTC 2024 x86_64 GNU/Linux

and also

Linux ****** 6.5.11-300.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Nov  8 22:37:57 UTC 2023 x86_64 GNU/Linux

Description

In case of a lot of concurrent queries being requested by the client, and with a limit on the number of concurrent queries by the server (but with no load shedding) both the client and server eventually seem to just hang, neither of then doing anything (i.e, there is no CPU usage, and the server, if queried by a different client responds to the new requests from that client).

The exact reproducibility is hard to nail down, but it seems to have something to do with the amount of data being transferred, since if I make messages that are being transferred very small I cannot reproduce.

During attempts I have once also witnessed this

thread 'tokio-runtime-worker' panicked at 'assertion failed: self.window_size.0 >= sz as i32', /home/abizjak/.cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.24/src/proto/streams/flow_control.rs:181:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

but unfortunately I was not running with RUST_BACKTRACE, and I have been unable to reproduce it again.

One way to structure the description:

The server code is this

use tonic::{transport::Server, Request, Response, Status};

use hello_world::greeter_server::{Greeter, GreeterServer};
use hello_world::{HelloReply, HelloRequest};

pub mod hello_world {
    tonic::include_proto!("helloworld");
}

#[derive(Default)]
pub struct MyGreeter {}

#[tonic::async_trait]
impl Greeter for MyGreeter {
    async fn say_hello(
        &self,
        request: Request<HelloRequest>,
    ) -> Result<Response<HelloReply>, Status> {
        let reply = hello_world::HelloReply {
            message: format!(
                "{:?}, {}",
                [0; 10000], // I have been unable to reproduce with size 1000 here.
                request.into_inner().name,
            ),
        };
        Ok(Response::new(reply))
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let addr = "[::1]:50051".parse().unwrap();
    let greeter = MyGreeter::default();

    Server::builder()
        .concurrency_limit_per_connection(100)
        .add_service(GreeterServer::new(greeter))
        .serve(addr)
        .await?;

    Ok(())
}

And the client code is

use futures::stream::FuturesUnordered;
use futures::TryStreamExt;
use hello_world::greeter_client::GreeterClient;
use hello_world::HelloRequest;
use tracing_subscriber::filter::LevelFilter;

pub mod hello_world {
    tonic::include_proto!("helloworld");
}

// 300 -> Hang

// 150 -> Hang

// 100 -> Hang

// 50 -> No hang
const N: usize = 80;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // It seems a lot harder to reproduce with logging. I have been unable to do so with trace logging.
    // So consider commenting out.
    {
        use tracing_subscriber::prelude::*;
        let log_filter = tracing_subscriber::filter::Targets::new()
            .with_target("h2", LevelFilter::DEBUG)
            .with_target("hyper", LevelFilter::DEBUG);
        tracing_subscriber::registry()
            .with(tracing_subscriber::fmt::layer())
            .with(log_filter)
            .init();
    }

    let client = GreeterClient::connect("http://[::1]:50051").await?;

    let futs = FuturesUnordered::new();
    let mut message = String::with_capacity(N);
    for _ in 0..N {
        message.push('1');
    }
    // The goal is to make concurrent requests.
    for i in 0..100_000 {
        let request = tonic::Request::new(HelloRequest { name: message.clone() });
        let mut client = client.clone();
        futs.push(async move {
            client.say_hello(request).await.map(|r| (r, i))
        });
    }
    futs.try_for_each_concurrent(None, move |(r, i)| async move {
        let msg = r.into_inner().message;
        let len = msg.len();
        eprintln!("{len}, {i}",);
        Ok(())
    })
    .await?;

    Ok(())
}

I am attaching a complete Rust project with lock files that can be run as follows

  1. Start server
cargo run --release --bin server
  1. Run client
cargo run --release --bin client

Omitting --release also seems to reproduce it.

I am attaching two things

Metadata

Metadata

Assignees

Labels

A-http2Area: HTTP/2 specific.C-bugCategory: bug. Something is wrong. This is bad!

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions