
YARP has a higher cpu usage than Nginx #2427

Open · doddgu opened this issue Mar 4, 2024 · 21 comments

Labels: needs-author-action (An issue or pull request that requires more info or actions from the author), Type: Bug (Something isn't working)
Milestone: Backlog

Comments
doddgu commented Mar 4, 2024

Sorry, I don't know whether this is a bug.

Describe the bug

I deployed 3 Nginx instances in Hong Kong and 3 YARP instances in Hangzhou.

Client -> Nginx -> Yarp -> Service

Nginx forwards traffic for several services, and YARP forwards one of them.

Nginx CPU [screenshot]

YARP CPU [screenshot]

YARP other metrics [screenshot]

Htop (Cat.Service.dll is based on YARP) [screenshot]

I tried to analyze the CPU usage in Visual Studio.
Top function [screenshot]

Module View [screenshot]

To Reproduce

No exception.

Further technical details

  • Include the version of the packages you are using
    2.1.0
  • The platform (Linux/macOS/Windows)
    Linux

All machines are 4-core / 8 GB. YARP runs on Ubuntu 22.04, Nginx on CentOS.
YARP 2.1.0 runs on .NET 8.

@doddgu doddgu added the Type: Bug Something isn't working label Mar 4, 2024
Tratcher (Member) commented Mar 4, 2024

How does the load / RPS compare?

doddgu (Author) commented Mar 5, 2024

How does the load / RPS compare?

Each YARP instance handles almost 4,000 RPS. [screenshot]

@doddgu doddgu changed the title Yapr has a higher cpu usage than Nginx Yarp has a higher cpu usage than Nginx Mar 5, 2024
@doddgu doddgu changed the title Yarp has a higher cpu usage than Nginx YARP has a higher cpu usage than Nginx Mar 5, 2024
doddgu (Author) commented Mar 5, 2024

I loaded the PDBs. I found that the threads are in the WorkerThreadStart method, and Thread.CurrentThread.SetThreadPoolWorkerThreadName() takes up a lot of CPU.

I don't know why WorkerThreadStart has to be called so many times.

[screenshots]

doddgu (Author) commented Mar 6, 2024

I analyzed using the YARP source code and found that YARP itself does not have high CPU usage.

[screenshot]

doddgu (Author) commented Mar 8, 2024

Hi @MihaZupan, any news?

doddgu (Author) commented Mar 11, 2024

Is this related to dotnet/runtime#70098?
I see there's a PR to fix it.

@MihaZupan MihaZupan added this to the Backlog milestone Apr 9, 2024
doddgu (Author) commented Aug 12, 2024

@MihaZupan hi, is there any news?
In my case, I have a service with 120,000 QPS. It only needs 3 Nginx instances but uses 40 YARP instances. This troubles me.
I tried .NET 9 and found a performance improvement of about 20%, but that's still a big gap.
Are there any temporary workarounds I could try? I'm happy to test them.

zhenlei520 commented:
The performance gap is so obvious; is there any room for improvement?

zhenlei520 commented:
How does the load / RPS compare?

Is there any news about this issue?
From observations over the past few days, we found that when the response time of downstream services fluctuates, the proxy comes under heavy pressure. Simply put, requests that originally required 100 threads now require more threads because of the downstream fluctuation. Requests pile up, and the thread pool quickly starts more threads to handle them. This rapid change in thread count over a short period causes obvious CPU fluctuations, and once the downstream stabilizes, threads that have been idle for a while are destroyed. In this way, downstream fluctuations have a large impact on the proxy. Although we set the minimum number of threads, that does not prevent the thread pool from retiring threads later; it only allows more threads to start quickly. We want to keep these threads alive all the time, rather than have frequent thread start-ups cause large CPU swings.

 ThreadPool.SetMinThreads(500, 500);
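To make the churn described above visible without a profiler, here is a minimal sketch (assuming a .NET 6+ host; the class name and interval are illustrative, not part of the original report) that logs the built-in thread pool counters alongside the SetMinThreads call:

using System;
using System.Threading;
using System.Threading.Tasks;

class ThreadPoolMonitor
{
    // Periodically log thread pool counters so thread creation/destruction and
    // work item pile-ups show up in the service logs.
    public static void Start(TimeSpan interval) => _ = Task.Run(async () =>
    {
        while (true)
        {
            Console.WriteLine(
                $"{DateTime.UtcNow:O} threads={ThreadPool.ThreadCount} " +
                $"pending={ThreadPool.PendingWorkItemCount} " +
                $"completed={ThreadPool.CompletedWorkItemCount}");
            await Task.Delay(interval);
        }
    });
}

// e.g. at startup:
// ThreadPool.SetMinThreads(500, 500);
// ThreadPoolMonitor.Start(TimeSpan.FromSeconds(5));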

@Tratcher @MihaZupan

zhenlei520 commented: [three screenshots]

doddgu (Author) commented Sep 10, 2024

We upgraded from .NET 8 to a .NET 9 preview and set some environment variables.

The most obvious improvement in .NET 9 is that memory usage is cut in half.

DOTNET_SYSTEM_NET_SOCKETS_THREAD_COUNT = 500
DOTNET_ThreadPool_UnfairSemaphoreSpinLimit = 0
DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS = 1
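Since these DOTNET_* variables are read by the runtime only at process start, a minimal startup check (illustrative, not part of the original report) can confirm the deployment actually passed them through to the YARP process:

using System;

string[] names =
{
    "DOTNET_SYSTEM_NET_SOCKETS_THREAD_COUNT",
    "DOTNET_ThreadPool_UnfairSemaphoreSpinLimit",
    "DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS"
};

foreach (var name in names)
{
    // Logs "<not set>" if the variable never reached the process environment.
    Console.WriteLine($"{name}={Environment.GetEnvironmentVariable(name) ?? "<not set>"}");
}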

Overall, it indeed consumes less CPU (around 30% less), and there are no longer minute-long stalls causing widespread timeouts when Current Requests suddenly increase. However, there is still a small probability of request timeouts, and CPU fluctuations have become very frequent. We confirmed that the downstream service responds quickly, yet requests occasionally time out inside YARP; because the QPS is relatively high, these timeouts are not visible on the dashboard. We have another upstream service that is particularly sensitive to failed requests, and from that upstream we can see that these low-probability timeouts occur quite often.

First, let's look at the performance of YARP, which has indeed improved. [screenshot]

These are abnormal requests detected upstream, all of which are SocketExceptions. [screenshot]

In summary: Setting thread-related parameters can reduce CPU usage but will introduce more instability, and there is still a significant gap compared to Nginx.

zhenlei520 commented:
Later we made some adjustments to the configuration

<PropertyGroup>
  <TargetFramework>net9.0</TargetFramework>
  <GarbageCollectionAdaptationMode>0</GarbageCollectionAdaptationMode>
</PropertyGroup>

Environment variables

DOTNET_ThreadPool_UnfairSemaphoreSpinLimit=0

After turning off spinning, CPU performance improved by nearly 40%, which is indeed a big improvement. According to the data it may affect QPS; however, we have not yet added link monitoring (tracing), so the impact on QPS is not yet known. From the perspective of upstream requests, the average response time is not greatly affected.

[screenshot]

However, compared with Nginx, YARP still has a lot of room for improvement. We hope to use it instead of other reverse proxy products.

doddgu (Author) commented Sep 18, 2024

@Tratcher @MihaZupan
Sorry, I have to ask for your help again. Because the CPU usage still cannot meet our expectations, we may end up choosing another reverse proxy. This is a tough decision, as we are all .NET developers and had high hopes for YARP. We don't strictly require YARP to match Nginx's performance numbers. However, if we only need three Nginx servers to handle all the traffic stably, I cannot convince our team to choose YARP when it requires over 40 servers to run stably. I hope the .NET team can see this message and respond to us. Our biggest difficulty right now is not knowing when this will be resolved; even just prioritizing it would be very helpful. Thank you.

bxjg1987 commented Oct 7, 2024

We are also choosing between Nginx and YARP. As .NET developers, we prefer to use YARP. Has there been any progress on this issue?

karelz (Member) commented Oct 9, 2024

We do not see such huge differences between YARP and NGINX in our perf lab. The ratio is currently about 2:3, I believe. @MihaZupan can link our public perf dashboard.
That said, we are interested in learning why you see such a different ratio. However, it will require some deep digging and collaboration. Are you willing to help us understand the root cause and potentially help us improve YARP?

MihaZupan (Member) commented Oct 9, 2024

Is this related to dotnet/runtime#70098?

Only in the sense that this PR is improving HttpClient (and therefore also YARP) performance.
The change is about lowering contention in the connection pool, which shouldn't be a huge factor at the several thousand RPS/machine that you're looking at. So while it may save you some CPU cycles, I wouldn't expect it to make a meaningful difference in this case.

requests that originally required 100 threads to process require more threads to process these requests due to downstream fluctuations

This is a surprisingly high number of threads to see on a 4-core machine if everything is fully async (you're not doing "sync over async").
Were you seeing these numbers before you started modifying ThreadPool settings?
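For clarity, "sync over async" here means blocking a thread pool thread on an asynchronous call. A minimal illustration (the backend URL is hypothetical):

using System.Net.Http;
using System.Threading.Tasks;

static class Example
{
    private static readonly HttpClient Client = new();

    // "Sync over async": the calling thread pool thread is held for the whole round-trip,
    // so a burst of slow downstream responses pins many threads at once.
    public static string GetBlocking() =>
        Client.GetStringAsync("http://backend.example/health").Result;

    // Fully async: the thread is returned to the pool while the request is in flight,
    // so slow responses show up as pending work items rather than extra threads.
    public static Task<string> GetNonBlocking() =>
        Client.GetStringAsync("http://backend.example/health");
}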

ThreadPool.SetMinThreads(500, 500);
DOTNET_SYSTEM_NET_SOCKETS_THREAD_COUNT = 500

Settings like these seem excessive for the machine size and are more likely to hurt performance than improve it. I'd recommend removing them unless you have real evidence that they're improving things.
The thread pool should be able to adjust the number of threads to adapt to different load levels.

<GarbageCollectionAdaptationMode>0</GarbageCollectionAdaptationMode>

How come you're disabling this? I was under the impression that you were worried about the memory footprint (#2527) after load spikes without this functionality.

It only need 3 nginx, but used 40 yarp services

How did you arrive at the 40 number?
What happens if you e.g. use 10 instead? Is request latency meaningfully impacted?

DOTNET_ThreadPool_UnfairSemaphoreSpinLimit

While this may reduce the CPU usage reported for a process, it may negatively impact throughput while under load.
If you experiment with reducing the number of YARP instances, such that the per-instance load is higher, does removing this environment variable (leaving the default behavior) make a difference?


There may be other factors impacting the performance between Nginx and YARP.
As Karel mentioned, we're aware of the performance differences, but the numbers we're seeing in the automated performance runs (select the "Proxies" tab on the bottom) are much closer than what you're seeing, nowhere near the 3:40 ratio.

Are both proxies using the same HTTP protocol version? Both between client-proxy and proxy-backend (e.g. YARP will default to HTTP/2 if the backend supports it)?
I saw you've posted previous questions in the repo about e.g. injecting Connection: Close headers. Can you share what your YARP configuration looks like, and how it differs from the Nginx setup?
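For reference, if protocol parity with Nginx's proxy_http_version 1.1 turns out to matter, YARP's outbound version can be pinned per cluster. A minimal sketch using code-based configuration (cluster and destination names are illustrative; the equivalent appsettings keys live under the cluster's "HttpRequest" section as "Version" and "VersionPolicy"):

using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using Yarp.ReverseProxy.Configuration;
using Yarp.ReverseProxy.Forwarder;

static class Http11ClusterExample
{
    public static ClusterConfig Create() => new()
    {
        ClusterId = "XXXProjectCluster",
        HttpRequest = new ForwarderRequestConfig
        {
            Version = HttpVersion.Version11,                       // always send HTTP/1.1 to the backend
            VersionPolicy = HttpVersionPolicy.RequestVersionExact  // do not negotiate up to HTTP/2
        },
        Destinations = new Dictionary<string, DestinationConfig>
        {
            ["d1"] = new DestinationConfig { Address = "http://192.168.1.1:5000" }
        }
    };
}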

zhenlei520 commented Oct 23, 2024

@MihaZupan

Glad to receive your reply. After confirming with the business side, the ratio is 3:11, close to 1:4. No special optimization has been done on Nginx so far. Only two customizations have been made on the YARP side:

  1. Weight-based load balancing across destinations
public class WeightingRoundLoadBalancingPolicy : ILoadBalancingPolicy
{
    private ILogger<WeightingRoundLoadBalancingPolicy> _logger;

    public string Name => "WeightingRound";

    public WeightingRoundLoadBalancingPolicy(ILogger<WeightingRoundLoadBalancingPolicy> logger)
    {
        _logger = logger;
    }

    public DestinationState? PickDestination(HttpContext context, ClusterState cluster, IReadOnlyList<DestinationState> availableDestinations)
    {
        if (Weighting.WeightedClusterWeights.TryGetValue(cluster.ClusterId, out var weightedWeights))
        {
            if (weightedWeights is null)
            {
                _logger.LogInformation($"PickDestination Error: Can not get [{cluster.ClusterId}] cluster weightedWeights");
                return null;
            }

            if (weightedWeights.DestinationIds is null)
            {
                _logger.LogInformation($"PickDestination Error: Can not get [{cluster.ClusterId}] destination, DestinationIds is null");
                return null;
            }

            var destinationId = weightedWeights.DestinationIds[WeightingHelper.GetIndexByRandomWeight(weightedWeights.DestinationWeightedWeights, weightedWeights.DestinationWeights, weightedWeights.TotalWeights ?? 1D)];

            return availableDestinations.FirstOrDefault(destination => destination.DestinationId == destinationId);
        }

        _logger.LogInformation($"PickDestination Error: Can not get [{cluster.ClusterId}] cluster");
        return null;
    }
}

public class WeightingHelper
{
    public static (double[]? Weights, double? TotalWeight) GetWeightedWeights(double[] weights)
    {
        if (weights.Length == 0) return (null, null);
        else if (weights.Length == 1) return ([.. weights], weights[0]);

        var totalWeight = 0D;
        Span<double> newWeights = stackalloc double[weights.Length];

        for (int i = 0; i < weights.Length; i++)
        {
            totalWeight += weights[i];
            newWeights[i] = totalWeight;
        }

        return ([.. newWeights], totalWeight);
    }

    public static int GetIndexByRandomWeight(Span<double> weightedWeights, Span<double> weights, double totalWeight)
    {
        // Ignore weight when only one server
        if (weightedWeights.Length == 1) return 0;

        var randomWeight = Random.Shared.NextDouble() * totalWeight;
        var index = weightedWeights.BinarySearch(randomWeight);

        if (index < 0)
        {
            // No exact match: BinarySearch returns the bitwise complement of the next larger element.
            index = ~index;

            // randomWeight fell beyond the last cumulative weight (e.g. the number of servers decreased);
            // clamp to the last destination instead of indexing out of range.
            if (index >= weightedWeights.Length)
                index = weightedWeights.Length - 1;
        }

        if (weights[index] != 0D)
            return index;
        else
            // The weight of the server is 0, pick again
            return GetIndexByRandomWeight(weightedWeights, weights, totalWeight);
    }
}
  2. Add request log monitoring
public class LoggingMiddleware
{
    private readonly RequestDelegate _next;
    private readonly ILogFormatter _logFormatter;
    private static readonly Channel<byte[]> _logChannel = Channel.CreateUnbounded<byte[]>();
    private static int _batchSize = 500;
    private static DateTime _lastWriteTime;

    public LoggingMiddleware(RequestDelegate next, ILogFormatter logFormatter)
    {
        _next = next;
        _logFormatter = logFormatter;

        // A single background consumer drains the log channel.
        Task.Run(ProcessLogsAsync);
    }

    public async Task InvokeAsync(HttpContext context)
    {
        Stopwatch sw = new();
        sw.Start();

        try
        {
            await _next(context);
        }
        finally
        {
            sw.Stop();

            var logEntry = _logFormatter.Format(context, new LogFormatterAdditionalInfo(sw.ElapsedMilliseconds));
            await _logChannel.Writer.WriteAsync(logEntry);
        }
    }

    private static async Task ProcessLogsAsync()
    {
        var batch = new List<byte[]>();

        await foreach (var logEntry in _logChannel.Reader.ReadAllAsync())
        {
            batch.Add(logEntry);

            if (batch.Count >= _batchSize || (DateTime.Now - _lastWriteTime).TotalSeconds >= 1)
            {
                var fs = LoggingHelper.FileStream;
                if (fs is null)
                {
                    continue;
                }
                _lastWriteTime = DateTime.Now;
                await WriteBatchAsync(fs, batch);
                batch.Clear();
            }
        }
    }

    private static async Task WriteBatchAsync(FileStream fs, List<byte[]> batch)
    {
        foreach (var logEntry in batch)
        {
            await fs.WriteAsync(logEntry);
        }
        await fs.FlushAsync();
    }
}

Our scenario is that responses take relatively long, so the number of concurrent (current) requests is high.
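For context, a minimal sketch of how these two pieces could be wired into the host, assuming the stock LoadFromConfig provider and the "Cat" section name from the configuration shown in the next comment (the real project likely loads configuration differently, given the custom ClusterConfigTemplate/DestinationAddressTemplate keys):

using Yarp.ReverseProxy.LoadBalancing;

var builder = WebApplication.CreateBuilder(args);

// Custom load-balancing policies are discovered from DI by their Name ("WeightingRound").
builder.Services.AddSingleton<ILoadBalancingPolicy, WeightingRoundLoadBalancingPolicy>();
// builder.Services.AddSingleton<ILogFormatter, ...>();  // project-specific formatter omitted

builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("Cat"));

var app = builder.Build();

app.UseMiddleware<LoggingMiddleware>();  // request/latency logging wraps the proxy pipeline
app.MapReverseProxy();
app.Run();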

zhenlei520 commented Oct 23, 2024

nginx

worker_processes 4;
worker_rlimit_nofile 655350;

events {
  worker_connections 655350;
}


http {
  include mime.types;

  default_type application/octet-stream;

  log_format main '$remote_addr'
  '|$http_x_forwarded_for'
  '|$time_local'
  '|$request_method'
  '|$uri'
  '|$status'
  '|$upstream_status'
  '|$body_bytes_sent'
  '|$request_time'
  '|$upstream_response_time'
  '|$upstream_http_RequestType'
  '|$upstream_http_ClientID'
  '|$upstream_http_SessionID'
  '|$upstream_http_ise'
  '|$upstream_http_sid'
  '|$upstream_http_ist'
  '|$upstream_http_Accept_Encoding'
  '|$upstream_http_errocde'
  '|$upstream_http_biztype';


  sendfile on;
  
  keepalive_timeout 300;

  #gzip  on;

  upstream XXXProjectA-Intranet {
    server xxx.com;  # custom internal domain
    keepalive 32;
  }


  server {
    listen 80;
    server_name XXXProjectA.com;
	http2_max_concurrent_streams 256;

    location / {
      client_max_body_size 30m; 
      client_body_buffer_size 128k; 

      proxy_pass http://XXXProjectA-Intranet;
      proxy_set_header XXX-Real-IP $remote_addr;
      proxy_set_header XXX-X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header Host $host;
      proxy_set_header XXX-X-IsSSL "true";

      proxy_connect_timeout 300;
      proxy_send_timeout 300;
      proxy_read_timeout 300;
      proxy_ignore_client_abort on;
      proxy_buffer_size 64k;
      proxy_buffers 4 128k;
      proxy_busy_buffers_size 256k;
      proxy_temp_file_write_size 256k;

      keepalive_requests 1000;
      proxy_http_version 1.1;
      proxy_set_header Connection "";
    }
  }
}

yarp

{
  "Logging": {
    "LogLevel": {
      "Default": "Warning",
      "Microsoft.AspNetCore": "Warning"
    }
  },
  "AllowedHosts": "*",
  "Cat": {
    "ListenUrls": [
      "http://*:80"
    ],
    "Routes": [
      {
        "RouteId": "XXXProjectRoute",
        "Match": {
          "Path": "{**catch-all}"
        },
        "ClusterId": "XXXProjectCluster",
        "Transforms": [
          {
            "RequestHeaderOriginalHost": "true"
          },
          {
            "X-Forwarded": "Set",
            "For": "Off"
          },
          {
            "ResponseHeader": "Connection",
            "Append": "close"
          }
        ]
      }
    ],
    "Clusters": [
      {
        "ClusterId": "XXXProjectCluster",
        "ClusterConfigTemplate": {
          "LoadBalancingPolicy": "WeightingRound",
          "HttpRequest": {
            "ActivityTimeout": "00:03:00"
          }
        },
        "DestinationAddressTemplate": "http://{0}:5000",
        "Destinations": [
          {
            "IPAddress": "192.168.1.1",
            "Weight": 100
          },
          {
            "IPAddress": "192.168.1.2",
            "Weight": 100
          }
        ]
      }
    ]
  }
}

nginx version: nginx/1.20.1
Yarp.ReverseProxy version: 2.1.0

MihaZupan (Member) commented:
"ResponseHeader": "Connection",
"Append": "close"

Why are you forcing connections from the client to the proxy to never be reused?
I don't see you doing so with Nginx, which could explain the massive perf differences.

zhenlei520 commented Oct 24, 2024

"ResponseHeader": "Connection",
"Append": "close"

Why are you forcing connections from the client to the proxy to never be reused? I don't see you doing so with Nginx, which could explain the massive perf differences.

Thank you very much. It has been too long and we are unable to track down why those two configuration entries were added. We need some time to run verification tests.

zhenlei520 commented:
"ResponseHeader": "Connection",
"Append": "close"

Why are you forcing connections from the client to the proxy to never be reused? I don't see you doing so with Nginx, which could explain the massive perf differences.

After adjusting the configuration, the ratio between Nginx and YARP is about 1:2.5. The Connection: close transform has been deleted. After testing, turning off spinning still gives lower CPU usage than the default.
