Memory leak on high number of concurrent connections #1790
Thanks for the detailed writeup! I have some follow-up questions:
Oh, this specific one is just HTTP/1 (since I'm just spinning up a basic server without TLS, which defaults to HTTP/1). But it happens with HTTP/2 as well - I remember first noticing the problem running it on warp with TLS. Let me verify this again on HTTP/2 soon.
Unfortunately, this is the only version I've tested so far. I'll update in case I'm able to run some more tests soon.
Thanks for the extra info! I've looked around, and so far nothing jumps out at me. It seems everything is dropped correctly when the connection is closed, which makes me suspect it's just the memory allocator holding onto the blocks instead of eagerly releasing them back to the OS. I'll try some other log configs to see what else I can learn.
@seanmonstar I think those Massif profiles indicate that the memory is held by some …
One quick way to rule this out would be to test this with a raw tokio TCP socket. However...
I think @lnicola is right. While it's a reasonably accurate indication that it is the …
Apologies, I was unfamiliar with Massif; I thought it was just saying where memory had been allocated before. If it's saying that memory isn't being de-allocated, then it seems the task owning that memory is just hanging around in the executor. This could either be due to a flaw in the executor (less likely), or a specific case causing hyper's connection task to stop being polled even though the connection closed (more likely). I've been using this patch to monitor whether the tasks are dropped, and so far with hyper's hello-world example, it always drops back to 0. There may be something where a connection hangs up while hyper is processing a request body, or maybe while sitting in idle mode, but I've so far not found how to trigger it.
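The patch itself isn't included in this thread; an illustrative sketch of that kind of instrumentation (names are hypothetical) is a guard that bumps an atomic counter while a connection task is alive and decrements it on drop, so the count should return to 0 once all connections finish:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Number of connection tasks currently alive (illustrative name).
static LIVE_CONN_TASKS: AtomicUsize = AtomicUsize::new(0);

struct TaskGuard;

impl TaskGuard {
    fn new() -> TaskGuard {
        LIVE_CONN_TASKS.fetch_add(1, Ordering::SeqCst);
        TaskGuard
    }
}

impl Drop for TaskGuard {
    fn drop(&mut self) {
        // Runs when the connection task is dropped, however it ends.
        LIVE_CONN_TASKS.fetch_sub(1, Ordering::SeqCst);
    }
}

// In the accept loop, move a guard into each spawned connection task:
//
//     tokio::spawn(async move {
//         let _guard = TaskGuard::new();
//         // ...serve the connection...
//     });
//
// After load stops, LIVE_CONN_TASKS should read 0 again.
```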
Could it be related to half-closed connections? Or does that not apply to HTTP/2?
It could; by default hyper allows half-closed connections from the client. If there's a response to write, though, that should either finish the connection and close, or notice the …
@seanmonstar I don't think you're completely off on how Massif works. AFAIK, it primarily relies on collecting allocations over a time period. The reason I think it's a reasonably accurate indication is correlation: this pathway seems to be the only one that generates remotely enough allocations, and it matches up with the overall memory usage. What I pasted into the issue is just one small part of the profile that I thought was relevant. If you look at the entire profile and run it with either …
I just tried this. Same result. Drops back to 0, but memory usage seems to increase as usual.
Just thinking aloud looking at this - could there be any pooled memory that's being reused that's increasing its capacity under load but not trimming itself again? More precisely …
The buffer inside … Since the memory was reaching a large size and not being freed, but not growing bigger if the benchmark is run again against the same running server, I assumed that something is just reusing memory; and since the task is dropped and its memory is being freed, I started to suspect the allocator. But if it's only happening in hyper and not with a similar buffered TCP server in Rust, then it is likely a bug somewhere in hyper. I just don't yet know where.
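To illustrate the hypothesis in the last two comments (a generic demonstration, not hyper's internals): a reused buffer keeps its peak capacity after being drained unless it is explicitly shrunk, which looks like "leaked" RSS even though nothing is lost.

```rust
fn main() {
    let mut buf: Vec<u8> = Vec::new();

    // Simulate a burst of large writes (e.g. big request bodies under load).
    buf.extend(std::iter::repeat(0u8).take(8 * 1024 * 1024));
    println!("capacity after burst: {}", buf.capacity());

    // Draining frees nothing; capacity stays at the high-water mark.
    buf.clear();
    println!("capacity after clear: {}", buf.capacity());

    // Only an explicit trim returns the memory.
    buf.shrink_to_fit();
    println!("capacity after shrink_to_fit: {}", buf.capacity());
}
```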
@seanmonstar Any update on this issue?
@ccgogo123 the latest status is all there. Are you experiencing this? It could be helpful to dig more. If not, I was actually thinking of closing this, as there doesn't seem to be much more to do (I don't experience a leak myself).
@seanmonstar What I'm experiencing is that the rusoto_s3 client doesn't release the memory under high concurrency. I happened to see this issue and suspect the root cause might be this.
@ccgogo123 - It would be helpful to see if you can eliminate the tokio threadpool as the cause. Are you able to provide any insight? @seanmonstar - Could you hold off until Jan 2020? I should be able to get some free time mid-Jan, and would like to take another stab at digging in, and also check the newer std-futures-based versions.
Hi, I am experiencing this as well in https://github.com/gauteh/dars. This is streaming netCDF files using the DAP protocol. In this case memory keeps increasing with sequential requests as well, without ever going down. I am using hyper directly with tokio. I have not been able to isolate the leak in the netCDF reading library, so I suspect I am seeing this issue.
@seanmonstar Any further update on this issue? I'm still facing this. I have tried this with warp 0.2 and tide 0.8.0, and under high concurrent load the memory keeps rising but is not released even hours after the load drops. FYI - I used the following for the load test:
The tests are pretty basic. For example, this is the hello-world code for warp:
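The actual snippet wasn't captured above; as a reference point, a warp 0.2 hello-world of the kind being described looks roughly like this (route and port are placeholders, not the commenter's exact code):

```rust
use warp::Filter;

#[tokio::main]
async fn main() {
    // Respond to any request with a static body.
    let hello = warp::any().map(|| "Hello, world!");

    warp::serve(hello)
        .run(([0, 0, 0, 0], 3030))
        .await;
}
```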
Tests run on macOS Mojave.
As stated before, all updates are in the issue. I have not been able to reproduce it myself. It's difficult to pinpoint that this is actually leaked memory; it can also be a case of the allocator not releasing memory back to the OS. Does the behavior exist if you use jemalloc instead of the system allocator? I'm inclined to close this (really, this time) unless someone would like to dig into it further themselves, as I can't trigger it.
I also have this problem. I'm concerned because if I put my private app into our production system (our peak is 1M+ simultaneous connections), I think I'm guaranteed to have a memory leak. I've created a repo that may help with reproducing the problem: https://github.com/brianbruggeman/warp-mem-leak-test Getting started:
Environment variables
Highlights:
Hi @brianbruggeman, have you read #1790 (comment) and tried with jemalloc instead of the system allocator?
Edit: Assuming I correctly set the allocator, I still see the same problems. For my jemalloc test, I added jemallocator-global. I also created a branch specifically for this test: https://github.com/brianbruggeman/warp-mem-leak-test/tree/jemalloc
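For reference (not taken from the branch above), the usual manual way to switch a binary to jemalloc is via the jemallocator crate; as I understand it, jemallocator-global is a convenience wrapper that does roughly the same thing just by being added as a dependency:

```rust
// Assumes `jemallocator = "0.3"` in Cargo.toml.
use jemallocator::Jemalloc;

// Route all heap allocations through jemalloc instead of the system allocator.
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;
```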
Let me jump in here - I can confirm @brianbruggeman's observation on allocators. I had switched allocators in the past and don't recall it making any difference. (Sorry for not following up before - other life priorities got in the way, unfortunately.) @seanmonstar would you mind stating your platform, OS, memory, etc.? I'm curious whether that can help dig in further, since interestingly you don't seem to be running into this (is that still the case?)
It seems related to the allocator. After I switched to jemalloc, memory decreases when the client sends fewer requests (triggered by sending new, but fewer, requests; if no new requests are sent, memory won't decrease).
@whfuyn I switched to …
Yes, maybe it's kind of an edge case, and maybe people are okay with restarting their server once in a while...
I also think that it's more of a … issue:

```rust
pub async fn get_data_route(Extension(client): Extension<Arc<MyClient>>) -> impl IntoResponse {
    // fetching huge payload
    let res = client
        .body(...)
        .send()
        .await
        .unwrap();
    // without this part the memory seems to be stable
    let data = res.json::<Foo>().await.unwrap();
    (StatusCode::OK, Json("hello world"))
}
```

or it could be an issue with … It could also be that there's a general problem with spawning tasks; I'm wondering how it works with …
I'll just share some of the data we have, as it may be helpful in narrowing down a smaller repro. I have some load tests for … We have a load test instrumented like so:
When running the client with steadily increasing (and then resetting) load, we see memory use grow steadily, basically in lock-step with the number of server connections. Memory is not released when the load resets. On the server side, however, memory usage remains basically steady. The server uses far fewer TCP connections, but it also creates far fewer hyper servers. I can instrument this test to run with pure TCP connections and no hyper servers/clients; I'll report back when I have some data points there. At the moment, it appears as if our memory costs are correlated with the instantiation of per-connection hyper servers, though... That might be something to focus on as we try to get a narrower reproduction.
I've changed the test scenario so that we use a single proxy:
And in this scenario:
We can see in both cases that the behavior is roughly similar: memory usage never decreases. The same load profile is used in both cases, so the proxy is transporting the same amount of raw bytes. In the HTTP case, though, we use dramatically more memory, presumably due to more allocation. In this case, we're using … Looking at the …
I'll try to get this running through heaptrack, though in the past it has shown wildly different results (seemingly underreporting used memory).
Here's where things get weird. Using the system allocator and running under heaptrack, we see the 'leaky' behavior as usual. But heaptrack reports a very different picture: it claims there are only 5.4MB of 'leaked' allocations remaining at the end of the run, while the … I'm assuming this indicates that the memory is actually freed, but the operating system doesn't recognize that? I suppose I'll test jemalloc for completeness...
High-performance memory allocators are built to allocate memory quickly, not to keep RSS at the minimum possible value. They generally operate under the assumption that if you have allocated a bunch of memory recently, you will want to allocate a bunch again in the future, and so they hold onto the pages for some period of time. If you actually want to look for leaks, you should ask the allocator how many bytes it has actually allocated to the program. With jemalloc, this can be done via the jemalloc-ctl crate: https://docs.rs/jemalloc-ctl/0.3.3/jemalloc_ctl/stats/struct.allocated.html
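A minimal sketch of that approach, based on the jemalloc-ctl 0.3 docs linked above (it assumes jemalloc is installed as the global allocator via the jemallocator crate; the 10-second reporting interval is arbitrary):

```rust
use jemalloc_ctl::{epoch, stats};
use std::{thread, time::Duration};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

fn main() {
    loop {
        // jemalloc's statistics are cached; advancing the epoch refreshes them.
        epoch::advance().unwrap();

        // Bytes the program has actually allocated vs. physical memory jemalloc holds.
        let allocated = stats::allocated::read().unwrap();
        let resident = stats::resident::read().unwrap();
        println!("{} bytes allocated / {} bytes resident", allocated, resident);

        thread::sleep(Duration::from_secs(10));
    }
}
```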
If it does appear that the application is explicitly holding onto allocated memory for too long, jemalloc also has profiling features you can use to see which sections of code are performing those allocations: https://github.com/jemalloc/jemalloc/wiki/Use-Case%3A-Leak-Checking
When I tried …
@seanmonstar I ran into this issue with Tonic 0.7.2 using hyper 0.14.19 under tokio 1.19.2. The behavior was exactly the same with the system allocator and with jemalloc: the more load (requests/second) you put on the server, the more memory it allocates, and that memory is not freed. Decreased load is handled without further allocation, but once the load increases above the former level, additional memory gets allocated.
Did anyone ever resolve this?
Was there any mitigation for this?
I encountered the same problem as @hseeberger. Switching to another allocator didn't help either.
I think the leak is pretty well established by this point - it's just that nobody (including myself) has had enough time to dig in and find the exact source, rather than settling for outer workarounds. I think @olix0r made an interesting point above - whether this can be reproduced with just tokio. I'm not sure anyone has tried that. It would also be interesting to see this across different fixed worker-thread-pool sizes; that could help narrow down the problem domain.
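A rough sketch (not from anyone in this thread) of the kind of raw-tokio comparison being suggested: accept connections and answer each with a canned HTTP/1.1 response, bypassing hyper entirely, then run the same load profile and watch RSS. Address and response body are placeholders.

```rust
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpListener;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:3030").await?;
    loop {
        let (mut socket, _peer) = listener.accept().await?;
        tokio::spawn(async move {
            // Read (and discard) whatever the client sent; ignore errors.
            let mut buf = [0u8; 4096];
            let _ = socket.read(&mut buf).await;

            // Write a fixed HTTP/1.1 response and close the connection.
            let resp = b"HTTP/1.1 200 OK\r\nContent-Length: 13\r\nConnection: close\r\n\r\nHello, world!";
            let _ = socket.write_all(resp).await;
        });
    }
}
```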
I don't think the issue should be closed without a solution.
I was having the same issue and switched to the MiMalloc allocator. It releases memory instantly - from 2GB under load back to 50MB when idle, in contrast to the default allocator, which peaks at 3GB and sits at 1.7GB when idle. MiMalloc not only consumes less memory but also performs better in my case, benchmarked both on my Linux machine and in a Docker container limited to 1 CPU and 1GB. I think I'm going to stick with MiMalloc for now.
@wonrax this is interesting. Was there any specific config you used for MiMalloc? I recall seeing these earlier comments by @olix0r: #1790 (comment), where mimalloc doesn't seem to have helped, and many more comments related to jemalloc. Curious whether this was a specific config, or whether mimalloc itself has had some tweaks that give it better behavior.
I didn't set or tweak any config; the code only imported and set mimalloc as the global allocator.
I tested this in a VMware virtual machine, with the environment being … Rust is the latest version. All terminals are set to the maximum open-file limit with … Direct HTTP access within the same environment. For the server side I wrote a most basic 'hello' with axum and tested the three memory allocation modes mentioned in the previous posts: the default memory allocator, MiMalloc, and JeMalloc. First, I tested the performance with … The plan is to use reqwest as a client to hit axum with 1000/10000 QPS within a second, using tokio's interval to fire once every second, continuously for more than a minute. The code is the official example, all async tokio with no synchronous blocking code, and the report prints once a second, which basically does not affect concurrency performance.
At startup, the MiMalloc build uses 400+ KB of memory, while the default allocator and JeMalloc builds use less than 300 KB. With the client sustained at 1000 QPS, after a few seconds the MiMalloc build settles around 30+ MB, and the others around 50~60 MB. With the client sustained at 10000 QPS, after a few seconds MiMalloc is around 300+ MB, and the others around 500~600 MB. When the client stress test ends, the server keeps its high memory usage under all three allocators and does not release memory in the short term if there are no new requests. However, sending requests at a low frequency (intermittently, 1~100 QPS) causes memory to be gradually released.
First of all, it must be admitted that axum/hyper does have a memory allocation problem: after multiple rounds of requests it does not return to a memory state close to the initial one. Under the default allocator, using reqwest to hit axum at 1000 QPS for two hours (720,000 requests), the memory is still around 55~60 MB. The failure to reclaim memory appears to be related to the number of concurrent requests and has little to do with the total number of requests. MiMalloc and JeMalloc both release memory better than the default allocator; although their final memory states still differ, both can be considered within a bearable range.
Because the client and server run in the same environment, CPU overload and port congestion affect both at the same time. Although reqwest used tokio's semaphore to control concurrency, if the instantaneous concurrency is too large the CPU maxes out and the client side generates a large number of send errors; the specific reason is unknown, but I speculate the TCP ports are exhausted. In this environment, reqwest can send about 28,000 requests within a second without errors, but after a few seconds a large number of send errors appear, which blocks the system's TCP channel, and even if the server is not down it cannot provide service. Ultimately I decided to test at a maximum of 10,000 QPS, which can be handled within a second, completing all transactions in about 200~300 ms. As long as the CPU is not overloaded, there is basically no big difference in server-side throughput between the memory allocators. But even at 10,000 QPS, after running for a while, the CPU can still saturate and requests pile up. Under sustained 10,000 QPS: the default allocator saturates the CPU after about ten seconds and starts reporting send errors; MiMalloc after about thirty seconds; JeMalloc after about eighty seconds. When the CPU is saturated, the axum server becomes unreachable; after the client stops sending requests, the server returns to normal within a few seconds.
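A rough sketch (not the poster's actual code) of the load generator described above: reqwest firing a burst on a one-second tokio interval, with a Semaphore capping in-flight requests. URL, burst size, and concurrency limit are placeholders.

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Semaphore;

#[tokio::main]
async fn main() {
    let client = reqwest::Client::new();
    // Cap the number of requests in flight at any moment.
    let permits = Arc::new(Semaphore::new(1000));
    let mut tick = tokio::time::interval(Duration::from_secs(1));

    loop {
        tick.tick().await;
        // Fire one burst per second; the semaphore throttles actual sends.
        for _ in 0..10_000 {
            let client = client.clone();
            let permits = permits.clone();
            tokio::spawn(async move {
                let _permit = permits.acquire_owned().await.unwrap();
                let _ = client.get("http://127.0.0.1:3000/").send().await;
            });
        }
    }
}
```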
Based on this test process, even if there are memory allocation problems, memory does not seem to grow indefinitely. MiMalloc performed better on memory allocation and release during the test; JeMalloc performed better on CPU occupancy while still releasing memory reasonably well.
@prasannavl, modifying the memory allocator isn't complicated in itself, but I'm unsure about the feasibility of integrating it into the hyper framework. Cargo.toml:
src/main.rs:
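The original snippets weren't captured above; a minimal sketch of what they would typically contain, assuming the mimalloc crate, is:

```rust
// Cargo.toml (assumed): add `mimalloc = "0.1"` under [dependencies].
use mimalloc::MiMalloc;

// Replace the system allocator with mimalloc for the whole binary.
#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;

fn main() {
    // ...the rest of the application is unchanged.
}
```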
I have the same issue.
Strictly speaking, this is not a bug or a memory leak; it's the system's memory-management strategy. Memory allocated with malloc is, after free, coalesced into larger free blocks and not necessarily returned to the operating system right away (see the free implementation for details); malloc_trim returns those free blocks to the OS.

```rust
unsafe {
    libc::malloc_trim(0);
}
```

Switching allocators also works.
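As an illustration of how that call might be wired into a long-running server (an assumption, not code from the comment above, and only applicable to glibc's malloc on Linux): a background thread that periodically asks glibc to return free pages.

```rust
use std::{thread, time::Duration};

fn spawn_malloc_trim_loop() {
    thread::spawn(|| loop {
        thread::sleep(Duration::from_secs(30));
        // Ask glibc to return unused heap pages to the OS.
        unsafe {
            libc::malloc_trim(0);
        }
    });
}
```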
Hyper version: 0.12.25
Context
Server
(For brevity, I've used the warp app, but a basic hyper sample works as well. Here's a repo with simple hello-world servers for several hyper-based frameworks (tide, warp, raw hyper, etc.) that I used to test: https://github.com/quixon-labs/playground-rs)
Boom! (Stress tests)
Memory observation
It doesn't go beyond 5 threads. Note: when tokio blocking functions are used, this can rise all the way up to 100 threads by default (which don't go back down yet), causing a significant memory bump. So I used a simpler example to rule that out as the cause.
Heap profiles
Here's the relevant bit:
(I'm not entirely sure what exactly is causing this. My first guess was some thread_local cache per connection, but I couldn't really see any evidence to correlate that yet.) I'm also not sure whether tokio or hyper is the cause, but it is consistently noticeable in all hyper-based servers. Here's the full massif profile: massif.out.zip
Note: the above massif profile is from a cargo build without release optimisations, which kept the memory levels much lower. Running release builds at least doubles the usage compared to the debug builds.
Edit:
I'm attaching another massif profile here (release build) that's probably more interesting:
Full massif profile [release build]:
massif.release.out.zip