
Memory surge during load testing #2527

Open · jinzaz opened this issue Jun 21, 2024 · 10 comments
Labels: Type: Bug (Something isn't working)

jinzaz commented Jun 21, 2024

Describe the bug

When I created an empty Web API project, added only the reverse proxy service, and then ran a load test with JMeter, I saw a surge in memory. Below is the stack information captured with dotnet dump and dotnet counters. I suspected that JMeter was using keep-alive, but even after adding app.UseRequestTimeouts() the behavior persists, and I don't know what causes it.
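
For reference, a minimal sketch of the kind of setup described above; the exact code wasn't shared, so the config-based YARP registration and the request-timeout wiring are assumptions:

// Program.cs - hypothetical minimal repro: an empty Web API project with only the reverse proxy registered
var builder = WebApplication.CreateBuilder(args);

// Load routes and clusters from the "ReverseProxy" configuration section (shown later in this thread)
builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));

// Required for app.UseRequestTimeouts(), added while investigating the keep-alive suspicion
builder.Services.AddRequestTimeouts();

var app = builder.Build();

app.UseRequestTimeouts();
app.MapReverseProxy();

app.Run();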
[2 screenshots attached: dotnet dump / dotnet counters output]

Further technical details

  • yarp.reverse-proxy version 2.2.0-preview.1.24266.1
  • Linux Docker container
  • .NET 8.0
@jinzaz jinzaz added the Type: Bug Something isn't working label Jun 21, 2024
MihaZupan (Member) commented Jun 27, 2024

Was the memory dump screenshot you shared taken after all the requests had already completed?

Also, just in case, can you share how you're using YARP (e.g. the config file) and whether/how you're using IHttpForwarder directly?

@karelz karelz added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jun 27, 2024
jinzaz (Author) commented Jun 28, 2024

Yes, this dump was taken after all the requests had already completed. This is my YARP config:

  "ReverseProxy": {
    "Routes": {
      "basic-apiservice": {
        "Match": {
          "Methods": null,
          "Hosts": null,
          "Path": "/xxx/xxxx/{**catch-all}",
          "QueryParameters": null,
          "Headers": null
        },
        "ClusterId": "basic-apiservice",
        "AuthorizationPolicy": null,
        "RateLimiterPolicy": null,
        "Timeout": "00:00:20",
        "CorsPolicy": null,
        "Metadata": null,
        "Transforms": [
          {
            "PathRemovePrefix": "/xxxx/xxxx"
          }
        ]
      }
    },
    "Clusters": {
      "basic-apiservice": {
        "LoadBalancingPolicy": null,
        "SessionAffinity": null,
        "HealthCheck": null,
        "HttpClient": {
          "EnableMultipleHttp2Connections": true
        },
        "HttpRequest": {
          "AllowResponseBuffering": "false"
        },
        "Destinations": {
          "basic-apiservice/destination1": {
            "Address": "http://xxxx",
            "Health": "http://xxxxxx",
            "Metadata": null,
            "Host": null
          }
        },
        "Metadata": null
      }
    }
  }

I seem to have found the cause of the problem. It is partly due to ASP.NET Core's PinnedBlockMemoryPool when the container memory is not restricted, which is similar to dotnet/aspnetcore#24958, and partly due to the low collection frequency of the Server GC mode. For now I'm working around it by using GarbageCollectionAdaptationMode.
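
For anyone hitting the same thing, a minimal sketch of how GarbageCollectionAdaptationMode (the .NET 8 adaptive/DATAS GC mode) can be enabled; the project-file placement is an assumption, since the project file wasn't shared:

<!-- .csproj excerpt: enable the adaptive (DATAS) mode on top of the server GC -->
<PropertyGroup>
  <ServerGarbageCollection>true</ServerGarbageCollection>
  <GarbageCollectionAdaptationMode>1</GarbageCollectionAdaptationMode>
</PropertyGroup>

The same switch can also be flipped without rebuilding via the DOTNET_GCDynamicAdaptationMode=1 environment variable.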

jinzaz (Author) commented Jun 28, 2024

However, I found that YARP's CPU usage is very high. I'm not sure whether this is normal, or whether you can recommend optimization methods.

@MihaZupan MihaZupan removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Jul 9, 2024
adityamandaleeka (Member) commented

Can you give more detail about the high CPU usage? Is it high under load relative to when idle? That is expected. In general, CPU traces are a useful way to determine what a process is doing when it is using a significant amount of CPU time.
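
For example, a CPU-sampling trace can be collected with the dotnet-trace global tool (the specific tool is a suggestion here, not something the comment above prescribes):

# collect a CPU-sampling trace from the running proxy process; open the resulting .nettrace file in Visual Studio or PerfView
dotnet-trace collect --process-id <pid>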

jinzaz (Author) commented Aug 2, 2024

Under actual load the CPU usage is relatively high. The upstream of my load also passes through nginx, and nginx's CPU usage is much lower, so I want to know whether this is normal.

doddgu commented Aug 12, 2024

I also ran into high CPU usage; see #2427.
This is a very serious problem under heavy traffic.

@MihaZupan MihaZupan self-assigned this Aug 20, 2024
MihaZupan (Member) commented

Has using the GC adaptation mode resolved your memory usage concerns?
The feature is built for scenarios like this, where you want to reclaim memory that was only needed during a temporary surge in traffic. Note also that it will be enabled by default as of .NET 9 with the server GC.

Re: CPU consumption, are your request latencies impacted by it? We're aware of scenarios where CPU usage may stay high as we're expecting to deal with more traffic. See e.g. dotnet/runtime#72153 (comment)
Do you see a positive change if you set DOTNET_ThreadPool_UnfairSemaphoreSpinLimit=0?

doddgu commented Aug 29, 2024

Hi @MihaZupan, thanks for your reply.

Based on our current observations, increased backend-service latency drives up both the number of thread-pool threads and the CPU usage. The number of in-flight requests also rises significantly, so the end-to-end latency is inevitably affected. From a surface-level analysis, I think this might be caused by a large number of threads being created temporarily. We will try your suggestion and report back once we have results.

zhenlei520 commented

We tried to optimize CPU usage by configuring the following environment variables:
DOTNET_ThreadPool_ThreadsToKeepAlive=500, DOTNET_ThreadPool_UnfairSemaphoreSpinLimit=0, DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS=1
After the change, the thread accumulation improved, and setting DOTNET_ThreadPool_UnfairSemaphoreSpinLimit helped CPU utilization: compared with before, CPU usage dropped by about 30%. There were still sudden spikes in CPU usage, during which the downstream response time did not fluctuate significantly and the total number of requests did not change much, but processing slowed down. The good news is that it dropped back quickly. It will take some time to determine the specific cause.

[4 screenshots attached]
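
For context, a sketch of how these variables might be set in the Linux container mentioned earlier; the Dockerfile wasn't shared, so the base image, paths, and assembly name are placeholders, and DOTNET_ThreadPool_ThreadsToKeepAlive is omitted given the follow-up finding below:

# Hypothetical Dockerfile excerpt
FROM mcr.microsoft.com/dotnet/aspnet:8.0
WORKDIR /app
COPY ./publish .

# Thread-pool spin-limit tuning tried above; the inline-completions setting only takes effect on Linux
ENV DOTNET_ThreadPool_UnfairSemaphoreSpinLimit=0 \
    DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS=1

ENTRYPOINT ["dotnet", "MyProxy.dll"]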

zhenlei520 commented

After further testing, we found that modifying DOTNET_ThreadPool_ThreadsToKeepAlive caused larger CPU fluctuations; it is better to let the runtime manage the thread pool itself than to control it manually.

@MihaZupan MihaZupan added this to the Backlog milestone Sep 24, 2024