HTTP transactions performance
In this test a load generator establishes a new TCP connection with an HTTP server for each HTTP request, i.e. all the HTTP requests carry the header
Connection: close
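For reference, a single non-keepalive request can be issued manually, e.g. with curl (the SUT address here is just an example taken from the tests below):
# send one request over a fresh TCP connection and close it
curl -v -H 'Connection: close' http://192.168.200.80/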
The main purpose of the test is to cause significant load onto the Linux TCP/IP stack and check how the stack and Tempesta FW scale. In this test case we run Tempesta FW and Nginx inside a virtual machine.
While the Linux TCP/IP stack does scale, it's a real issue to get an appropriate hardware setup, which can deliver small enough overhead for the small packets workload.
A single step of the benchmark involves processing only 2 HTTP messages (a request and a response, respectively), while, generally speaking, there are 3 TCP connection handshake segments, 4 connection closing segments, and 2 acknowledgments (ACKs) for the data segments (the TCP/IP stack coalesces some of the segments though). The main property of the benchmark is processing many small packets.
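The per-transaction segments can be observed with tcpdump on the SUT; a minimal sketch, assuming the ens2 interface name that appears later on this page:
# capture the segments of a few non-keepalive transactions on port 80
tcpdump -ni ens2 -c 30 'tcp port 80'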
While the most basic VM setup can easily deliver 10Gbps throughput, many small packets are a well-known problem for modern virtualization solutions. See the Hardware virtualization performance wiki page for recommendations on how to efficiently set up a virtual machine for such workloads.
There is a common misbelief that the Linux kernel TCP/IP stack does not scale, sometimes claimed to break down at as few as 4 CPUs. For example, watch this video from F5 or see the F-Stack benchmarks, which claim that Nginx scales by only 40% when going from 1 CPU to 12!
These benchmarks are discussed in the following issues:
Our main concern about these benchmarks was the absence of data about the test environment and, generally, the inability to reproduce them. On this page we do our best to provide as much data as possible so that the results are reproducible. We would appreciate it if you file an issue in case you are unable to reproduce the results.
We used two servers. Server 1:
- Intel Xeon CPU E3-1240v5 (4 cores, 8 hyperthreads)
- 32GB RAM
- Mellanox ConnectX-2 Ethernet 10Gbps network adapter
- Debian 9.12 (Linux 4.19.0-0.bpo.8-amd64)
Server 2:
- Intel Xeon CPU E3-1240v5 (4 cores, 8 hyperthreads)
- 32GB RAM
- Mellanox ConnectX-2 Ethernet 10Gbps network adapter
- Ubuntu 16.04.6 LTS (Linux 4.4.0-97-generic)
In all the tests we used the same sysctl settings for the SUT (system under test). We used hardware and VM-based load generators, which also used similar sysctl settings.
sysctl -w net.ipv4.tcp_max_tw_buckets=32
sysctl -w net.ipv4.tcp_max_orphans=32
sysctl -w net.ipv4.tcp_tw_reuse=1
sysctl -w net.ipv4.tcp_fin_timeout=1
Since the hardware load generator has Linux 4.4, we also used
sysctl -w net.ipv4.tcp_tw_recycle=1
on it. These settings are required to release sockets faster. Otherwise too many TIME-WAIT sockets are produced and the system (either the server or the client) spends a lot of time looping in __inet_check_established() (see more in the bug report Poor __inet_check_established() implementation).
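The number of TIME-WAIT sockets can be monitored during a test run, for example:
# count the sockets stuck in TIME-WAIT state
ss -tan state time-wait | wc -l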
In addition to these settings, Tempesta FW's start script applies the following sysctl settings:
# Tempesta builds socket buffers by itself, don't cork TCP segments.
sysctl -w net.ipv4.tcp_autocorking=0 >/dev/null
# Softirqs are doing more work, so increase the input queues.
sysctl -w net.core.netdev_max_backlog=10000 >/dev/null
sysctl -w net.core.somaxconn=131072 >/dev/null
sysctl -w net.ipv4.tcp_max_syn_backlog=131072 >/dev/null
Since the settings are system-wide, Nginx also benefits from them.
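The effective values can be verified after Tempesta FW is started:
# print the current values of the tuned parameters
sysctl net.ipv4.tcp_autocorking net.core.netdev_max_backlog \
       net.core.somaxconn net.ipv4.tcp_max_syn_backlog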
In all the tests we used the same SUT VM running on the 1st server with Debian 9.12 and the Tempesta kernel 4.14.32-tfw (we also tried Nginx with the native Debian kernel and it didn't show any performance difference).
The load was generated either by a 4 vCPU VM with Debian 9.12 (Linux kernel 4.19.0) running on the same server 1 (VM-to-VM tests), or by the separate hardware server 2 with Ubuntu 16.04.6 LTS (Linux 4.4.0-97-generic) (Hardware-to-VM tests).
The Apache HTTP server benchmarking tool, ab, is still a single-threaded tool, which isn't suitable for benchmarking high-performance multi-process/multi-threaded HTTP servers. Moreover, ab -n 100000 -c 10000 can't efficiently handle 10K and more connections, so Nginx and Tempesta FW are underutilized.
For the non-keepalive test we used wrk with the -H 'Connection: close' command line option.
In all the tests we used the same Nginx 1.18.0 configuration. This is essentially the default Nginx configuration with several well-known performance tuning options, and it was verified by the Nginx development team:
user www-data;
worker_processes auto;
worker_cpu_affinity auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;
events {
worker_connections 65536;
use epoll;
multi_accept on;
accept_mutex off;
}
worker_rlimit_nofile 1000000;
http {
keepalive_timeout 600;
keepalive_requests 10000000;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
open_file_cache max=1000 inactive=3600s;
open_file_cache_valid 3600s;
open_file_cache_min_uses 2;
open_file_cache_errors off;
error_log /dev/null emerg;
access_log off;
server {
listen 9090 backlog=131072 deferred reuseport fastopen=4096;
location / {
root /var/www/html;
}
}
}
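The configuration can be sanity checked and applied to a running instance with the standard Nginx commands:
# verify the configuration syntax
nginx -t
# reload a running instance after changing the configuration
nginx -s reload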
The data files:
# ls -l /var/www/html/
total 4
-rw-r--r-- 1 root root 600 Jun 10 15:54 index.html
We use the same file size of 600 bytes as F-Stack uses in their most impressive test against Nginx and the Linux TCP/IP stack.
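A file of that size can be generated in any convenient way, since its content doesn't matter for the test, for example:
# create a 600-byte index.html with arbitrary content
head -c 600 /dev/urandom | base64 | head -c 600 > /var/www/html/index.html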
The current master version (commit f6946bdefc016944216e297d94e216baab84bf98, with the HTTP/2 performance regression, which makes it about 20% slower than a normal Tempesta FW build).
Configuration file:
listen 80;
srv_group default {
server 127.0.0.1:9090;
}
vhost default {
proxy_pass default;
}
cache 1;
cache_fulfill * *;
http_chain {
-> default;
}
Tempesta FW fetches the data file from Nginx and stores it in its cache.
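Both endpoints can be quickly checked before a benchmark run, e.g. from the load generator (the addresses are the ones used in the VM-to-VM tests below):
# Nginx directly
curl -si http://192.168.200.80:9090/ | head -n 1
# through Tempesta FW (served from the cache after the first request)
curl -si http://192.168.200.80/ | head -n 1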
Two virtual KVM machines were deployed on the 1st server.
+----------------------------------------------------+
| [Server 1] |
| v |
| +--------------+ i |
| | [SUT VM] | r +----------------------+ |
| | | t | [Load generation VM] | |
| | Nginx -----*-- i | | |
| | ^ | o <=== * wrk | |
| | | | - | | |
| | Tempesta FW -*-- n +----------------------+ |
| +--------------+ e |
| t |
+----------------------------------------------------+
The number of virtual CPUs for them was changed between the tests using the libvirt interface, e.g. for 4 CPUs:
<cputune>
<vcpupin vcpu='0' cpuset='0'/>
<vcpupin vcpu='1' cpuset='1'/>
<vcpupin vcpu='2' cpuset='2'/>
<vcpupin vcpu='3' cpuset='3'/>
</cputune>
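The same pinning can also be applied at runtime via virsh; a sketch, assuming the TempestaPerfTest domain name used below:
# pin vCPU N to host CPU N for the SUT VM
for i in 0 1 2 3; do virsh vcpupin TempestaPerfTest $i $i; done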
Both the VMs use virtio-net
NICs:
# ethtool -i ens2|grep driver
driver: virtio_net
The libvirt configuration:
<interface type='network'>
<mac address='52:54:00:ea:4b:97'/>
<source network='routed'/>
<model type='virtio'/>
<driver name='vhost' queues='4'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</interface>
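The multi-queue setup can be verified inside the guest with ethtool:
# show the channel (queue) configuration of the virtio-net NIC
ethtool -l ens2
# enable all 4 queues explicitly if fewer are active
ethtool -L ens2 combined 4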
To be more precise, the machines run with the following options (Debian9 is the load generator and TempestaPerfTest is the system under test):
# ps -waef|grep 'Debian9\|TempestaPerfTest'
libvirt+ 3545 1 13 17:07 ? 00:42:02 qemu-system-x86_64 -enable-kvm -name guest=Debian9,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-12-Debian9/master-key.aes -machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu Skylake-Client,ss=on,hypervisor=on,tsc_adjust=on,clflushopt=on,umip=on,xsaves=on,pdpe1gb=on -m 2048 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid bca6ec61-316d-464e-8fc4-b67054bb26ba -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=29,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x3.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x3 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x3.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x3.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/var/lib/libvirt/images/tempesta-perf-test-clone.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -fsdev local,security_model=passthrough,id=fsdev-fs0,path=/opt/tempesta/tempesta-vm -device virtio-9p-pci,id=fs0,fsdev=fsdev-fs0,mount_tag=tempesta,bus=pci.0,addr=0x7 -netdev tap,fds=31:32:33:34,id=hostnet0,vhost=on,vhostfds=35:36:37:38 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet0,id=net0,mac=52:54:00:ea:4b:97,bus=pci.0,addr=0x2 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -add-fd set=9,fd=40 -chardev file,id=charserial1,path=/dev/fdset/9,append=on -device isa-serial,chardev=charserial1,id=serial1 -chardev socket,id=charchannel0,fd=39,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
libvirt+ 4131 1 18 20:13 ? 00:24:17 qemu-system-x86_64 -enable-kvm -name guest=TempestaPerfTest,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-14-TempestaPerfTest/master-key.aes -machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu Skylake-Client,ss=on,hypervisor=on,tsc_adjust=on,clflushopt=on,umip=on,xsaves=on,pdpe1gb=on -m 2048 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid ead2372f-af65-47d2-9d1a-e83e286c159f -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=28,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x3.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x3 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x3.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x3.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/var/lib/libvirt/images/tempesta-perf-test.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -fsdev local,security_model=passthrough,id=fsdev-fs0,path=/opt/tempesta/tempesta-vm -device virtio-9p-pci,id=fs0,fsdev=fsdev-fs0,mount_tag=tempesta,bus=pci.0,addr=0x7 -netdev tap,fds=31:32:33:34,id=hostnet0,vhost=on,vhostfds=35:36:37:38 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet0,id=net0,mac=52:54:00:07:88:51,bus=pci.0,addr=0x2 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -add-fd set=9,fd=40 -chardev file,id=charserial1,path=/dev/fdset/9,append=on -device isa-serial,chardev=charserial1,id=serial1 -chardev socket,id=charchannel0,fd=39,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
Nginx:
# wrk --latency -H 'Connection: close' -c 8192 -d 30 -t 8 http://192.168.200.80:9090/
Running 30s test @ http://192.168.200.80:9090/
8 threads and 8192 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 49.46ms 113.73ms 1.88s 89.24%
Req/Sec 1.70k 726.97 6.68k 71.25%
Latency Distribution
50% 17.98ms
75% 20.20ms
90% 234.95ms
99% 465.98ms
405901 requests in 30.09s, 322.45MB read
Socket errors: connect 0, read 0, write 0, timeout 579
Requests/sec: 13489.36
Transfer/sec: 10.72MB
Tempesta FW:
# wrk --latency -H 'Connection: close' -c 8192 -d 30 -t 8 http://192.168.200.80/
Running 30s test @ http://192.168.200.80/
8 threads and 8192 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 37.40ms 88.27ms 1.81s 91.36%
Req/Sec 1.76k 823.02 9.36k 71.88%
Latency Distribution
50% 14.31ms
75% 15.73ms
90% 19.14ms
99% 446.43ms
419715 requests in 30.09s, 356.24MB read
Socket errors: connect 0, read 0, write 0, timeout 1
Requests/sec: 13947.89
Transfer/sec: 11.84MB
The load generation VM has 4 virtual CPUs and the SUT VM has 2 virtual CPUs.
Nginx:
# wrk --latency -H 'Connection: close' -c 8192 -d 30 -t 8 http://192.168.200.80:9090/
Running 30s test @ http://192.168.200.80:9090/
8 threads and 8192 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 63.85ms 162.82ms 1.98s 91.65%
Req/Sec 2.37k 584.55 5.34k 68.52%
Latency Distribution
50% 24.56ms
75% 28.57ms
90% 39.13ms
99% 861.46ms
563342 requests in 30.08s, 447.52MB read
Socket errors: connect 0, read 1, write 0, timeout 2215
Requests/sec: 18727.89
Transfer/sec: 14.88MB
Tempesta FW:
# wrk --latency -H 'Connection: close' -c 8192 -d 30 -t 8 http://192.168.200.80/
Running 30s test @ http://192.168.200.80/
8 threads and 8192 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 8.37ms 17.84ms 465.51ms 99.42%
Req/Sec 6.01k 4.49k 14.92k 46.67%
Latency Distribution
50% 6.80ms
75% 11.09ms
90% 14.04ms
99% 19.59ms
1256004 requests in 30.07s, 1.04GB read
Socket errors: connect 7179, read 60, write 0, timeout 0
Requests/sec: 41775.59
Transfer/sec: 35.54MB
The 4 vCPU setup looks very similar to the F5 and F-Stack tests.
The results for Nginx are:
# wrk --latency -H 'Connection: close' -c 8192 -d 30 -t 8 http://192.168.200.80:9090/
Running 30s test @ http://192.168.200.80/
8 threads and 8192 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 11.02ms 48.66ms 1.74s 99.12%
Req/Sec 7.69k 2.05k 16.31k 71.06%
Latency Distribution
50% 6.16ms
75% 10.63ms
90% 16.31ms
99% 49.30ms
1832094 requests in 30.10s, 1.52GB read
Socket errors: connect 0, read 13133, write 0, timeout 13
Requests/sec: 60865.96
Transfer/sec: 51.72MB
Tempesta FW:
# wrk --latency -H 'Connection: close' -c 8192 -d 30 -t 8 http://192.168.200.80/
Running 30s test @ http://192.168.200.80:9090/
8 threads and 8192 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 10.89ms 15.06ms 924.32ms 95.50%
Req/Sec 8.48k 2.07k 18.37k 71.04%
Latency Distribution
50% 8.38ms
75% 13.60ms
90% 20.25ms
99% 40.00ms
2015012 requests in 30.10s, 1.56GB read
Socket errors: connect 0, read 11684, write 0, timeout 23
Requests/sec: 66954.88
Transfer/sec: 53.19MB
Note that the load generator and the SUT share the same CPU power, so wrk can't produce enough load; we can check this with top on the host - the host server is fully loaded. The load generation VM (PID 3545) uses the same CPU cores as the SUT VM running Nginx (PID 4131):
# top -b
top - 20:39:57 up 1 day, 22:13, 4 users, load average: 9.03, 4.39, 2.64
Tasks: 188 total, 8 running, 180 sleeping, 0 stopped, 0 zombie
%Cpu(s): 65.6 us, 25.4 sy, 0.0 ni, 9.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 32058.4 total, 24400.9 free, 6056.4 used, 1601.1 buff/cache
MiB Swap: 8192.0 total, 8192.0 free, 0.0 used. 25510.1 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3545 libvirt+ 20 0 3724052 526820 21000 S 275.0 1.6 27:15.28 qemu-system-x86
4131 libvirt+ 20 0 3262420 1.6g 21132 S 275.0 5.2 16:32.54 qemu-system-x86
3551 root 20 0 0 0 0 R 25.0 0.0 1:41.21 vhost-3545
4136 root 20 0 0 0 0 R 25.0 0.0 1:20.74 vhost-4131
4138 root 20 0 0 0 0 R 25.0 0.0 1:19.12 vhost-4131
3549 root 20 0 0 0 0 R 18.8 0.0 1:41.26 vhost-3545
3550 root 20 0 0 0 0 S 18.8 0.0 1:39.53 vhost-3545
3552 root 20 0 0 0 0 R 18.8 0.0 1:43.35 vhost-3545
4135 root 20 0 0 0 0 R 18.8 0.0 1:19.22 vhost-4131
4137 root 20 0 0 0 0 R 18.8 0.0 1:19.67 vhost-4131
Or a little less for Tempesta FW:
# top -b
top - 20:55:01 up 1 day, 22:28, 4 users, load average: 4.07, 2.21, 2.53
Tasks: 195 total, 8 running, 187 sleeping, 0 stopped, 0 zombie
%Cpu(s): 48.0 us, 37.0 sy, 0.0 ni, 14.2 id, 0.0 wa, 0.0 hi, 0.8 si, 0.0 st
MiB Mem : 32058.4 total, 24358.7 free, 6097.1 used, 1602.6 buff/cache
MiB Swap: 8192.0 total, 8192.0 free, 0.0 used. 25469.4 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3545 libvirt+ 20 0 3724052 566296 21000 S 268.8 1.7 38:26.09 qemu-system-x86
4131 libvirt+ 20 0 3460056 1.6g 21132 S 231.2 5.2 23:53.14 qemu-system-x86
3549 root 20 0 0 0 0 R 25.0 0.0 2:19.51 vhost-3545
4135 root 20 0 0 0 0 R 25.0 0.0 1:59.48 vhost-4131
4137 root 20 0 0 0 0 S 25.0 0.0 2:00.91 vhost-4131
4138 root 20 0 0 0 0 R 25.0 0.0 1:59.67 vhost-4131
3550 root 20 0 0 0 0 R 18.8 0.0 2:16.52 vhost-3545
3552 root 20 0 0 0 0 R 18.8 0.0 2:22.02 vhost-3545
4136 root 20 0 0 0 0 R 18.8 0.0 2:02.14 vhost-4131
3551 root 20 0 0 0 0 R 12.5 0.0 2:19.66 vhost-3545
In this test case we run wrk
on a separate server. Nginx and Tempesta FW are
listening on separate sockets inside a VM on the first server.
+-----------------------+ +--------------+
| [Server 1] NIC | | [Server 2] |
| +--------------+ +--* <=== * NIC |
| | [SUT VM] | m | | |
| | | a | | wrk |
| | Nginx -----*-- c | +--------------+
| | ^ | v |
| | | | t |
| | Tempesta FW -*-- a |
| +--------------+ p |
+-----------------------+
A macvtap virtual interface is used to attach the VM directly to the server NIC. In each test we configure the number of interface queues to be equal to the number of virtual CPUs:
<interface type='direct'>
<mac address='52:54:00:07:88:51'/>
<source dev='eth2' mode='private'/>
<model type='virtio'/>
<driver name='vhost' queues='4'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</interface>
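The macvtap interface created on top of eth2 can be inspected on the host:
# list macvtap interfaces with details
ip -d link show type macvtap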
Ping time from server 2 to the VM (we used a small network load instead of disabling C-states):
# ping -qc 5 172.16.0.200
PING 172.16.0.200 (172.16.0.200) 56(84) bytes of data.
--- 172.16.0.200 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3998ms
rtt min/avg/max/mdev = 0.043/0.060/0.068/0.012 ms
The network throughput:
# iperf3 -c 172.16.0.200 -p 5000
Connecting to host 172.16.0.200, port 5000
[ 4] local 172.16.0.101 port 46758 connected to 172.16.0.200 port 5000
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 1.06 GBytes 9.14 Gbits/sec 20 684 KBytes
[ 4] 1.00-2.00 sec 1.09 GBytes 9.32 Gbits/sec 0 686 KBytes
[ 4] 2.00-3.00 sec 1.09 GBytes 9.35 Gbits/sec 0 686 KBytes
[ 4] 3.00-4.00 sec 1.09 GBytes 9.33 Gbits/sec 0 687 KBytes
[ 4] 4.00-5.00 sec 1.09 GBytes 9.36 Gbits/sec 0 689 KBytes
[ 4] 5.00-6.00 sec 1.09 GBytes 9.37 Gbits/sec 0 689 KBytes
[ 4] 6.00-7.00 sec 1.09 GBytes 9.36 Gbits/sec 0 724 KBytes
[ 4] 7.00-8.00 sec 1.09 GBytes 9.36 Gbits/sec 0 752 KBytes
[ 4] 8.00-9.00 sec 1.09 GBytes 9.34 Gbits/sec 0 764 KBytes
[ 4] 9.00-10.00 sec 1.09 GBytes 9.34 Gbits/sec 0 773 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 10.9 GBytes 9.33 Gbits/sec 20 sender
[ 4] 0.00-10.00 sec 10.9 GBytes 9.33 Gbits/sec receiver
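The server side of iperf3 is a plain listening instance on the SUT VM; presumably started as (the port matches the client command above):
iperf3 -s -p 5000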
Now we use the same SUT VM on the same 1st server, but use the 2nd server instead of a VM to generate the workload with wrk:
Nginx:
# wrk --latency -H 'Connection: close' -c 16384 -d 30 -t 16 http://172.16.0.200:9090/
Running 30s test @ http://172.16.0.200:9090/
16 threads and 8192 connections
^C Thread Stats Avg Stdev Max +/- Stdev
Latency 85.96ms 219.14ms 1.73s 91.60%
Req/Sec 1.25k 786.01 5.83k 73.10%
Latency Distribution
50% 7.38ms
75% 10.40ms
90% 218.75ms
99% 1.10s
39835 requests in 2.03s, 31.65MB read
Requests/sec: 19644.42
Transfer/sec: 15.61MB
Tempesta FW:
# wrk --latency -H 'Connection: close' -c 16384 -d 30 -t 16 http://172.16.0.200/
Running 30s test @ http://172.16.0.200/
16 threads and 16384 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 44.68ms 123.44ms 1.70s 85.79%
Req/Sec 1.58k 613.33 11.94k 76.34%
Latency Distribution
50% 5.80ms
75% 6.90ms
90% 208.40ms
99% 421.52ms
755829 requests in 30.06s, 641.53MB read
Socket errors: connect 0, read 0, write 0, timeout 1338
Requests/sec: 25140.11
Transfer/sec: 21.34MB
Nginx:
# wrk --latency -H 'Connection: close' -c 16384 -d 30 -t 16 http://172.16.0.200:9090/
Running 30s test @ http://172.16.0.200:9090/
16 threads and 16384 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 94.23ms 153.62ms 2.00s 87.33%
Req/Sec 2.22k 475.19 10.61k 79.10%
Latency Distribution
50% 23.06ms
75% 121.31ms
90% 279.46ms
99% 601.62ms
1063509 requests in 30.10s, 844.86MB read
Socket errors: connect 0, read 568, write 0, timeout 315
Requests/sec: 35332.82
Transfer/sec: 28.07MB
Tempesta FW:
# wrk --latency -H 'Connection: close' -c 16384 -d 30 -t 16 http://172.16.0.200/
Running 30s test @ http://172.16.0.200/
16 threads and 16384 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 35.47ms 106.19ms 2.00s 88.99%
Req/Sec 3.51k 0.86k 10.87k 74.44%
Latency Distribution
50% 4.41ms
75% 8.07ms
90% 204.60ms
99% 438.66ms
1677128 requests in 30.10s, 1.39GB read
Socket errors: connect 0, read 708, write 0, timeout 4388
Requests/sec: 55719.40
Transfer/sec: 47.29MB
Nginx:
# wrk --latency -H 'Connection: close' -c 16384 -d 30 -t 16 http://172.16.0.200:9090/
Running 30s test @ http://172.16.0.200:9090/
16 threads and 16384 connections
^C Thread Stats Avg Stdev Max +/- Stdev
Latency 33.59ms 101.61ms 1.92s 91.52%
Req/Sec 5.18k 1.71k 11.37k 62.63%
Latency Distribution
50% 6.82ms
75% 14.34ms
90% 23.79ms
99% 490.17ms
1374094 requests in 16.71s, 1.07GB read
Socket errors: connect 0, read 0, write 0, timeout 317
Requests/sec: 82227.96
Transfer/sec: 65.32MB
Tempesta FW:
# wrk --latency -H 'Connection: close' -c 16384 -d 30 -t 16 http://172.16.0.200/
Running 30s test @ http://172.16.0.200/
16 threads and 16384 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 27.60ms 88.15ms 1.82s 90.86%
Req/Sec 7.07k 1.16k 22.23k 73.76%
Latency Distribution
50% 3.59ms
75% 5.45ms
90% 11.89ms
99% 411.26ms
3378447 requests in 30.10s, 2.81GB read
Socket errors: connect 0, read 47, write 0, timeout 72
Requests/sec: 112244.48
Transfer/sec: 95.48MB
And here we reach the KVM interrupt bottleneck. Since our hardware doesn't support vAPIC, we had to stop our tests. The perf kvm stat report output (see the Interruptions & network performance wiki page for the problem description) is:
VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
EXTERNAL_INTERRUPT 5073570 75.37% 2.02% 0.22us 3066.52us 0.96us ( +- 0.49% )
EPT_MISCONFIG 1029496 15.29% 1.78% 0.34us 1795.92us 4.19us ( +- 0.35% )
MSR_WRITE 279208 4.15% 0.16% 0.28us 5695.07us 1.36us ( +- 2.52% )
HLT 194422 2.89% 95.74% 0.30us 1504068.39us 1192.90us ( +- 4.03% )
PENDING_INTERRUPT 89818 1.33% 0.03% 0.32us 189.53us 0.70us ( +- 0.83% )
PAUSE_INSTRUCTION 40905 0.61% 0.26% 0.26us 1390.91us 15.39us ( +- 1.82% )
PREEMPTION_TIMER 17384 0.26% 0.01% 0.44us 183.21us 1.49us ( +- 1.47% )
IO_INSTRUCTION 5482 0.08% 0.01% 1.75us 186.08us 3.26us ( +- 1.19% )
CPUID 972 0.01% 0.00% 0.30us 5.29us 0.66us ( +- 1.94% )
MSR_READ 104 0.00% 0.00% 0.49us 2.54us 0.94us ( +- 3.29% )
EXCEPTION_NMI 6 0.00% 0.00% 0.37us 0.78us 0.59us ( +- 9.68% )
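For reference, such a report can be collected for the SUT VM (QEMU process PID 4131 in the listings above) with the standard perf kvm workflow:
# record VM exits of the SUT VM (stop with Ctrl-C), then show the report
perf kvm stat record -p 4131
perf kvm stat report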