Hang with no activity in certain high-concurrency situations

**Version**

```
[[package]]
name = "hyper"
version = "0.14.28"

[[package]]
name = "h2"
version = "0.3.24"

[[package]]
name = "tonic"
version = "0.10.2"
```

**Platform**

```
Linux ****** 6.6.13-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Jan 20 18:03:28 UTC 2024 x86_64 GNU/Linux
```
and also
```
Linux ****** 6.5.11-300.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Nov  8 22:37:57 UTC 2023 x86_64 GNU/Linux
```

**Description**

In case of a lot of concurrent queries being requested by the client, **and with a limit on the number of concurrent queries by the server** (but with no load shedding) both the client and server eventually seem to just hang, neither of then doing anything (i.e, there is no CPU usage, and the server, if queried by a different client responds to the new requests from that client).

The exact reproducibility is hard to nail down, but it seems to have something to do with the amount of data being transferred, since if I make messages that are being transferred very small I cannot reproduce.

During attempts I have once also witnessed this
```
thread 'tokio-runtime-worker' panicked at 'assertion failed: self.window_size.0 >= sz as i32', /home/abizjak/.cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.24/src/proto/streams/flow_control.rs:181:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
but unfortunately I was not running with RUST_BACKTRACE, and I have been unable to reproduce it again.

One way to structure the description:

The server code is this
```rust
use tonic::{transport::Server, Request, Response, Status};

use hello_world::greeter_server::{Greeter, GreeterServer};
use hello_world::{HelloReply, HelloRequest};

pub mod hello_world {
    tonic::include_proto!("helloworld");
}

#[derive(Default)]
pub struct MyGreeter {}

#[tonic::async_trait]
impl Greeter for MyGreeter {
    async fn say_hello(
        &self,
        request: Request<HelloRequest>,
    ) -> Result<Response<HelloReply>, Status> {
        let reply = hello_world::HelloReply {
            message: format!(
                "{:?}, {}",
                [0; 10000], // I have been unable to reproduce with size 1000 here.
                request.into_inner().name,
            ),
        };
        Ok(Response::new(reply))
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let addr = "[::1]:50051".parse().unwrap();
    let greeter = MyGreeter::default();

    Server::builder()
        .concurrency_limit_per_connection(100)
        .add_service(GreeterServer::new(greeter))
        .serve(addr)
        .await?;

    Ok(())
}
```

And the client code is
```rust
use futures::stream::FuturesUnordered;
use futures::TryStreamExt;
use hello_world::greeter_client::GreeterClient;
use hello_world::HelloRequest;
use tracing_subscriber::filter::LevelFilter;

pub mod hello_world {
    tonic::include_proto!("helloworld");
}

// 300 -> Hang

// 150 -> Hang

// 100 -> Hang

// 50 -> No hang
const N: usize = 80;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // It seems a lot harder to reproduce with logging. I have been unable to do so with trace logging.
    // So consider commenting out.
    {
        use tracing_subscriber::prelude::*;
        let log_filter = tracing_subscriber::filter::Targets::new()
            .with_target("h2", LevelFilter::DEBUG)
            .with_target("hyper", LevelFilter::DEBUG);
        tracing_subscriber::registry()
            .with(tracing_subscriber::fmt::layer())
            .with(log_filter)
            .init();
    }

    let client = GreeterClient::connect("http://[::1]:50051").await?;

    let futs = FuturesUnordered::new();
    let mut message = String::with_capacity(N);
    for _ in 0..N {
        message.push('1');
    }
    // The goal is to make concurrent requests.
    for i in 0..100_000 {
        let request = tonic::Request::new(HelloRequest { name: message.clone() });
        let mut client = client.clone();
        futs.push(async move {
            client.say_hello(request).await.map(|r| (r, i))
        });
    }
    futs.try_for_each_concurrent(None, move |(r, i)| async move {
        let msg = r.into_inner().message;
        let len = msg.len();
        eprintln!("{len}, {i}",);
        Ok(())
    })
    .await?;

    Ok(())
}
```

I am attaching a complete Rust project with lock files that can be run as follows

1. Start server
```
cargo run --release --bin server
```

2. Run client

```
cargo run --release --bin client
```

Omitting `--release` also seems to reproduce it.

I am attaching two things
- the project that can be run [example.tar.gz](https://github.com/hyperium/hyper/files/14154282/example.tar.gz)
- a log (of the client) of one of the runs which ended in a hang [log.txt.gz](https://github.com/hyperium/hyper/files/14154283/log.txt.gz)



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Hang with no activity in certain high-concurrency situations #3559

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Hang with no activity in certain high-concurrency situations #3559

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions