How to Optimize the Performance of My fasthttp Client in a Production Environment #1793

Open
luoyucumt opened this issue Jun 18, 2024 · 1 comment


@luoyucumt

In my production environment, I use fasthttp to make requests to third-party services. During peak traffic, the fasthttp client shows elevated latency, with some requests taking several seconds. To investigate, I ran a stress test and found that latency grows as the number of connections increases.

Fasthttp version: v1.55.0

Stress test environment:

      Model Name: MacBook Pro
      Model Identifier: MacBookPro18,3
      Model Number: MKGP3CH/A
      Chip: Apple M1 Pro
      Total Number of Cores: 8 (6 performance and 2 efficiency)
      Memory: 16 GB

Simulating a Third-Party Service with Code:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof"
	"time"

	"github.com/valyala/fasthttp"
)

var (
	strContentType = []byte("Content-Type")
	strApplication = []byte("application/json")
	body           = []byte("{\"message\": \"Hello, world!\"}")
)

func main() {
	// Debug endpoint: expose the net/http default mux (pprof) on :7001.
	go func() {
		if err := http.ListenAndServe("localhost:7001", nil); err != nil {
			log.Fatalf("Error in ListenAndServe: %v", err)
		}
	}()

	// Mock third-party service on :8001.
	if err := fasthttp.ListenAndServe("localhost:8001", handler); err != nil {
		log.Fatalf("Error in ListenAndServe: %v", err)
	}
}

func handler(ctx *fasthttp.RequestCtx) {
	begin := time.Now()

	// handle request
	{
		ctx.Response.Header.SetCanonical(strContentType, strApplication)
		ctx.Response.SetStatusCode(fasthttp.StatusOK)
		ctx.Response.SetBody(body)
	}

	log.Printf("%v | %s %s %v %v",
		ctx.RemoteAddr(),
		ctx.Method(),
		ctx.RequestURI(),
		ctx.Response.Header.StatusCode(),
		time.Since(begin),
	)
}
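
For reference, assuming the snippet above is saved as server.go (the file name and the curl check are only for illustration), the mock service can be started and smoke-tested with:

go run server.go
curl -i http://localhost:8001/

Note that the handler's log line times only the in-handler work (time.Since(begin)), so it reflects pure handler latency rather than time spent queueing in the kernel or in fasthttp's connection handling.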

Code for the service under test, which calls the simulated third-party service through a fasthttp.HostClient:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof"
	"time"

	"github.com/valyala/fasthttp"
)

var (
	client *fasthttp.HostClient
)

const (
	readTimeout  = 3 * time.Second
	writeTimeout = 3 * time.Second

	maxConnsPerHost     = 2048
	maxIdleConnDuration = 3 * time.Minute
)

func main() {
	client = &fasthttp.HostClient{
		Addr:                          "localhost:8001",
		MaxConns:                      maxConnsPerHost,
		ReadTimeout:                   readTimeout,
		WriteTimeout:                  writeTimeout,
		MaxIdleConnDuration:           maxIdleConnDuration,
		NoDefaultUserAgentHeader:      true,
		DisableHeaderNamesNormalizing: true,
		DisablePathNormalizing:        true,
		MaxIdemponentCallAttempts:     1,
	}

	// pprof endpoint on :7002 (net/http default mux) for profiling this process.
	go func() {
		if err := http.ListenAndServe("localhost:7002", nil); err != nil {
			log.Fatalf("Error in ListenAndServe: %v", err)
		}
	}()

	if err := fasthttp.ListenAndServe("localhost:8002", handler); err != nil {
		log.Fatalf("Error in ListenAndServe: %v", err)
	}
}

// api forwards a GET request to the mock third-party service on :8001 via the shared HostClient.
func api(ctx *fasthttp.RequestCtx) error {
	begin := time.Now()
	defer func() {
		log.Printf("%v | %s %s %v %d",
			ctx.RemoteAddr(),
			ctx.Method(),
			ctx.RequestURI(),
			time.Since(begin),
			client.ConnsCount(),
		)
	}()

	req := fasthttp.AcquireRequest()
	defer fasthttp.ReleaseRequest(req)

	req.SetRequestURI("http://localhost:8001")
	req.Header.SetMethod(fasthttp.MethodGet)

	resp := fasthttp.AcquireResponse()
	defer fasthttp.ReleaseResponse(resp)

	return client.Do(req, resp)
}

func handler(ctx *fasthttp.RequestCtx) {
	if err := api(ctx); err != nil {
		ctx.SetStatusCode(fasthttp.StatusInternalServerError)
	} else {
		ctx.SetStatusCode(fasthttp.StatusOK)
	}
}
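
Since the original concern is per-request delays of several seconds, one option worth noting (not part of the snippet above, just a sketch; the helper name and the 100 ms budget are arbitrary) is to cap how long each upstream call may take with HostClient.DoTimeout:

// apiWithDeadline is a variant of api that puts a hard upper bound on the upstream call.
func apiWithDeadline(ctx *fasthttp.RequestCtx) error {
	req := fasthttp.AcquireRequest()
	defer fasthttp.ReleaseRequest(req)

	req.SetRequestURI("http://localhost:8001")
	req.Header.SetMethod(fasthttp.MethodGet)

	resp := fasthttp.AcquireResponse()
	defer fasthttp.ReleaseResponse(resp)

	// DoTimeout returns fasthttp.ErrTimeout if the call does not complete within the budget.
	return client.DoTimeout(req, resp, 100*time.Millisecond)
}

This bounds what callers of :8002 observe, although it does not by itself remove the underlying queueing.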

Results obtained with the load-testing tool (wrk):

1 connection:

➜  ~ wrk -t1 -c1 -d10s http://localhost:8002 --latency
Running 10s test @ http://localhost:8002
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   160.03us  802.36us  14.51ms   97.87%
    Req/Sec    16.41k     2.30k   18.29k    90.10%
  Latency Distribution
     50%   52.00us
     75%   65.00us
     90%   90.00us
     99%    4.04ms
  164890 requests in 10.10s, 14.62MB read
Requests/sec:  16326.54
Transfer/sec:      1.45MB

10 connections:

➜  ~ wrk -t1 -c10 -d10s http://localhost:8002 --latency
Running 10s test @ http://localhost:8002
  1 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   622.15us    2.21ms  43.48ms   97.30%
    Req/Sec    30.30k     4.38k   39.26k    74.00%
  Latency Distribution
     50%  279.00us
     75%  427.00us
     90%  611.00us
     99%   10.97ms
  301272 requests in 10.00s, 26.72MB read
Requests/sec:  30121.96
Transfer/sec:      2.67MB

50 connections:

➜  ~ wrk -t1 -c50 -d10s http://localhost:8002 --latency 
Running 10s test @ http://localhost:8002
  1 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.69ms    1.95ms  36.35ms   96.91%
    Req/Sec    32.71k     4.71k   42.19k    73.00%
  Latency Distribution
     50%    1.46ms
     75%    1.83ms
     90%    2.28ms
     99%   11.05ms
  325559 requests in 10.01s, 28.87MB read
Requests/sec:  32526.90
Transfer/sec:      2.88MB

100 connections:

➜  ~ wrk -t1 -c100 -d10s http://localhost:8002 --latency
Running 10s test @ http://localhost:8002
  1 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.70ms    4.27ms  78.96ms   96.63%
    Req/Sec    30.67k     5.77k   43.51k    76.00%
  Latency Distribution
     50%    3.08ms
     75%    3.88ms
     90%    4.82ms
     99%   26.20ms
  305183 requests in 10.01s, 27.07MB read
Requests/sec:  30499.69
Transfer/sec:      2.71MB

500 connections:

➜  ~ wrk -t1 -c500 -d10s http://localhost:8002 --latency
Running 10s test @ http://localhost:8002
  1 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    14.75ms    5.69ms  78.57ms   85.19%
    Req/Sec    34.38k     5.13k   46.05k    72.00%
  Latency Distribution
     50%   14.21ms
     75%   17.06ms
     90%   19.83ms
     99%   39.96ms
  342024 requests in 10.02s, 30.33MB read
  Socket errors: connect 0, read 637, write 0, timeout 0
Requests/sec:  34131.79
Transfer/sec:      3.03MB

1000 connections:

➜  ~ wrk -t1 -c1000 -d10s http://localhost:8002 --latency
Running 10s test @ http://localhost:8002
  1 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    30.61ms   10.12ms 110.69ms   77.67%
    Req/Sec    32.04k     7.53k   47.23k    76.00%
  Latency Distribution
     50%   29.75ms
     75%   35.21ms
     90%   41.99ms
     99%   68.50ms
  318908 requests in 10.03s, 28.28MB read
  Socket errors: connect 0, read 3541, write 0, timeout 0
Requests/sec:  31807.34
Transfer/sec:      2.82MB

1500 connections:

➜  ~ wrk -t1 -c1500 -d10s http://localhost:8002 --latency
Running 10s test @ http://localhost:8002
  1 threads and 1500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    44.64ms   16.50ms 212.65ms   87.08%
    Req/Sec    33.34k     7.91k   48.98k    78.00%
  Latency Distribution
     50%   42.72ms
     75%   49.30ms
     90%   58.31ms
     99%  110.18ms
  332420 requests in 10.09s, 29.48MB read
  Socket errors: connect 0, read 3383, write 469, timeout 0
Requests/sec:  32950.19
Transfer/sec:      2.92MB

2000 connections:

➜  ~ wrk -t1 -c2000 -d10s http://localhost:8002 --latency
Running 10s test @ http://localhost:8002
  1 threads and 2000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    59.99ms   29.49ms 411.31ms   92.16%
    Req/Sec    29.86k    13.71k   46.66k    76.04%
  Latency Distribution
     50%   55.47ms
     75%   64.43ms
     90%   74.44ms
     99%  201.06ms
  285246 requests in 10.09s, 25.30MB read
  Socket errors: connect 0, read 16081, write 642, timeout 0
Requests/sec:  28261.07
Transfer/sec:      2.51MB

As the number of connections increases, latency rises, even though the third-party service itself still responds quickly; in this example its handler time is measured in microseconds (µs).

[flame graph screenshot]

I used flame graphs to help with the analysis. It appears that most of the time is spent in system calls. What can I do to reduce response latency in this situation?
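
For reference, the CPU profile behind such a flame graph can be captured from the pprof endpoint that the client snippet exposes on :7002, for example (the -http port below is arbitrary):

go tool pprof -http=:9000 "http://localhost:7002/debug/pprof/profile?seconds=10"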

@erikdubbelboer (Collaborator)

At 2000 connections I still see a 99% latency of 201.06ms. Is that not good? It makes sense that as the number of connections grows the latency increases, as both wrk and fasthttp start to take up more CPU. Did you expect anything else here?
