
grpc server memory leak #797

Closed
whfuyn opened this issue Oct 19, 2021 · 10 comments
@whfuyn

whfuyn commented Oct 19, 2021

Bug Report

Version

└── tonic v0.5.2
└── tonic-build v0.5.2

Platform

Linux XXX 5.11.0-37-generic #41-Ubuntu SMP Mon Sep 20 16:39:20 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Description

The tonic gRPC server consumes a lot of memory under high load and ends up being killed by the kernel (OOM).

Memory usage didn't drop even after the client stopped.

I made a simple demo with tonic-related code only:
https://github.com/whfuyn/grpc-test

top -p `pgrep -d ',' 'grpc-'`

PID     USER  PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
_       _      20   0 3341.6m   2.5g   3.7m S   0.0   7.9   1:27.75 grpc-server

Other info:

  1. Didn't occur on Windows 10.
  2. Didn't occur with #[tokio::main(flavor = "current_thread")] and a single client instance, but still occurred with multiple clients (2 in my test).
  3. Mitigated when max_concurrent_streams is set, but then some client requests time out (see the sketch after this list).
  4. Memory usage dropped after the client stopped when running under valgrind.
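
A minimal sketch of the mitigation from item 3, assuming tonic 0.5's server builder and reusing the demo server shown below (the limit of 32 is an arbitrary value for illustration):

// Cap concurrent HTTP/2 streams per connection. Lowering the cap bounds
// per-connection memory at the cost of the request timeouts noted above.
tonic::transport::Server::builder()
    .max_concurrent_streams(32)
    .add_service(TestServiceServer::new(Service))
    .serve(addr)
    .await
    .unwrap();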

Also posting the demo code here:

proto file

syntax = "proto3";
package test;

message Req {
    bytes payload = 1;
}

message Resp {
    bytes payload = 1;
}

service TestService {
    rpc call(Req) returns (Resp);
}
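
For completeness, a sketch of the build.rs that would compile this proto with tonic-build 0.5, assuming the file is saved as proto/test.proto (the path is a guess):

// build.rs: generate the `test` module used by the server and client below.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    tonic_build::compile_protos("proto/test.proto")?;
    Ok(())
}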

server

// Generated from the proto file above by tonic-build.
pub mod test {
    tonic::include_proto!("test");
}

use test::test_service_server::{TestService, TestServiceServer};
use test::{Req, Resp};
use tonic::{Request, Response, Status};

struct Service;

#[tonic::async_trait]
impl TestService for Service {
    async fn call(&self, _req: Request<Req>) -> Result<Response<Resp>, Status> {
        Ok(Response::new(Resp { payload: vec![] }))
    }
}

#[tokio::main]
async fn main() {
    let hostname = "0.0.0.0";
    let port = 6666;

    let addr = format!("{}:{}", hostname, port).parse().unwrap();
    tonic::transport::Server::builder()
        .add_service(TestServiceServer::new(Service))
        .serve(addr)
        .await
        .unwrap();
}

client, run multiple instances to reproduce this quickly.

let hostname = "localhost";
let port = 6666;

let client = {
    let uri = format!("http://{}:{}", hostname, port);
    let channel = Endpoint::from_shared(uri)
        .unwrap()
        .connect_lazy()
        .unwrap();
    TestServiceClient::new(channel)
};

let hs: Vec<tokio::task::JoinHandle<()>> = (0..200_000).map(|_| {
    let mut client = client.clone();
    tokio::spawn(async move {
        client.call(Req{ payload: vec![6; 1000]}).await.unwrap();
    })
})
.collect();

for h in hs {
    h.await.unwrap();
}
@whfuyn
Author

whfuyn commented Oct 19, 2021

It seems to be caused by hyper.
hyperium/hyper#1790
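
Observation 4 in the report (memory drops when running under valgrind, which replaces the system allocator) is also consistent with allocator fragmentation rather than a true leak. A minimal sketch of one way to test that hypothesis, assuming the tikv-jemallocator crate is added as a dependency:

// Hypothetical experiment: swap the global allocator, re-run the load test,
// and check whether resident memory returns to the OS after the clients stop.
use tikv_jemallocator::Jemalloc;

#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;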

@aleksuss

It seems I'm stuck with the same issue as well. I tried to debug your example with console-subscriber and found a remarkably increasing number of tasks after some time (~30 secs). After the client finishes, the task count doesn't decrease.

Before running a client:
[Screenshot 2021-12-21 at 17 03 15]

After running a client:
[Screenshot 2021-12-21 at 17 06 04]

@whfuyn
Author

whfuyn commented Dec 23, 2021

It's a good idea to use console. I'll try to dig into it later.

@LucioFranco
Member

@seanmonstar ^

@LucioFranco
Member

This is probably a better issue for hyper, so it might be good to create a minimal reproduction.

@nashley
Contributor

nashley commented Jun 30, 2022

@LucioFranco by minimal reproduction, do you mean of hyper code or of tonic code?
This behavior (of tasks leaking) is present even in the helloworld-server example.
Here's a screenshot from tokio-console after a single invocation of the helloworld-client:
[Screenshot: 20220630_15h32m45s_grim]
And here's a screenshot from tokio-console after 79 more invocations:
[Screenshot: 20220630_15h40m58s_grim]
edit: I had missed the active (running+idle) task count and was instead looking at the total task count, which includes tasks that have completed.

In order for tokio-console to work, I added only the following to the helloworld-server example (in addition to adding console-subscriber to the Cargo.toml):

diff --git a/examples/src/helloworld/server.rs b/examples/src/helloworld/server.rs
index c6398bb..6502e61 100644
--- a/examples/src/helloworld/server.rs
+++ b/examples/src/helloworld/server.rs
@@ -27,6 +27,7 @@ impl Greeter for MyGreeter {

 #[tokio::main]
 async fn main() -> Result<(), Box<dyn std::error::Error>> {
+    console_subscriber::init();
     let addr = "[::1]:50051".parse().unwrap();
     let greeter = MyGreeter::default();

The memory usage (and slowdown from so many tasks) grows slowly, but it grows indefinitely.
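
For anyone reproducing this setup: console-subscriber only works when tokio's unstable instrumentation is compiled in, so the example also has to be built with the tokio_unstable cfg. A sketch, assuming console-subscriber 0.1:

# Cargo.toml
console-subscriber = "0.1"

# build and run with tokio's task instrumentation enabled
RUSTFLAGS="--cfg tokio_unstable" cargo run --bin helloworld-server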

@nashley
Contributor

nashley commented Jun 30, 2022

Actually, I think I just forgot how to read tokio-console and ignored the fact that the tasks are no longer running (they are visibly removed with tokio-console --retain-for 1us).

My apologies.

The memory usage does still seem to grow over time even with the simple helloworld-server, but I'll do some more digging first to better understand what's happening, since tasks don't appear to be leaking.

@nashley
Contributor

nashley commented Jun 30, 2022

Not sure if this applies to OP, but I think my issue at least was actually due to expected behavior within console-subscriber itself (i.e., keeping relevant information around for an hour after a task completes by default): tokio-rs/console#184 (comment)
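
For reference, a sketch of shortening that retention window, assuming console-subscriber's builder API (the 60-second value is arbitrary):

use std::time::Duration;

// Keep data about completed tasks for 60 seconds instead of the default hour,
// so finished tasks drop out of the console (and the subscriber's memory) sooner.
console_subscriber::ConsoleLayer::builder()
    .retention(Duration::from_secs(60))
    .init();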

Running the example as-is (i.e., without console-subscriber) does not appear to cause any memory usage growth.

@gabrik

gabrik commented Oct 25, 2024

Is there any solution/workaround to this?

@LucioFranco
Member

Unless someone can actually create a minimal reproducer, I don't think there is a memory leak here that needs to be fixed. I am going to close this issue; feel free to post a reproducer and we can reopen.
