Nginx benchmark results in the CPS test #519
Well, I noticed the difference in the throughput numbers: 0.34 for 1 CPU and up to 0.48 on 12 CPUs, so the difference is about 40%. Assuming that the RPS ratio is the same, the curve is still too flat... Also, https://github.com/F-Stack/f-stack#nginx-testing-result says that you used Linux 3.10.104, which was released in October 2016 and is just a patch level of the original 3.10 from 2013. Given that there have been a lot of scalability improvements in the Linux TCP/IP stack during these 7 years, I'm wondering if you have a performance comparison against the newer Linux TCP/IP stacks?
It is most likely an interrupt bottleneck, since the driver in the Linux kernel runs in combined interrupt and poll mode (NAPI). I have a video that shows this: https://youtu.be/d0vPUwJT1mw - at 1:34, ksoftirqd is at 100% CPU usage under load, yet the Nginx CPU usage is still ok.
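For context, the NAPI pattern looks roughly like this (a minimal sketch only; the mydev_* struct and helpers are hypothetical driver-specific pieces, while napi_schedule(), napi_complete_done() and napi_gro_receive() are the real kernel APIs):

```c
#include <linux/interrupt.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

struct mydev_priv {
	struct napi_struct napi;
	/* ring buffers, register mappings, ... */
};

/* hypothetical hardware helpers */
void mydev_mask_rx_irq(struct mydev_priv *priv);
void mydev_unmask_rx_irq(struct mydev_priv *priv);
struct sk_buff *mydev_next_rx_skb(struct mydev_priv *priv);

/* Hard IRQ handler: do almost nothing, just defer to the poll loop. */
static irqreturn_t mydev_rx_irq(int irq, void *dev_id)
{
	struct mydev_priv *priv = dev_id;

	mydev_mask_rx_irq(priv);	/* stop further RX interrupts for this queue */
	napi_schedule(&priv->napi);	/* run mydev_poll() in the NET_RX softirq,
					 * possibly in ksoftirqd under heavy load */
	return IRQ_HANDLED;
}

/* Softirq poll callback: process up to `budget` packets per round. */
static int mydev_poll(struct napi_struct *napi, int budget)
{
	struct mydev_priv *priv = container_of(napi, struct mydev_priv, napi);
	int work_done = 0;

	while (work_done < budget) {
		struct sk_buff *skb = mydev_next_rx_skb(priv);

		if (!skb)
			break;
		napi_gro_receive(napi, skb);
		work_done++;
	}

	/* Re-enable the interrupt only once the ring is drained; until then the
	 * softirq keeps polling, which is what eats ksoftirqd CPU under load. */
	if (work_done < budget && napi_complete_done(napi, work_done))
		mydev_unmask_rx_irq(priv);

	return work_done;
}
```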
The DPDK/mTCP project ported a multithreaded version of apache bench that can do high-performance HTTP server load testing. I also made a PR for HTTP SSL load testing, mtcp-stack/mtcp#285; the apache bench statistics seem broken, but it does the job of generating high-speed web server load.
Hi @vincentmli , thank you very much for sharing the video - I really enjoyed watching it (it was also quite interesting to learn more about BIG-IP traffic handling). Now I see that the benchmark which I cared about
The right benchmark result is Next question is the Nginx configuration files for the F-Stack and the Linux TCP/IP stack cases. I had a look at https://github.com/F-Stack/f-stack/blob/dev/app/nginx-1.16.1/conf/nginx.conf and there is an issue with the filesystem. You have switched Which configuration files were used for the benchmark? What was the Linux
F-Stack's improvements are at the NIC driver level - a userspace DPDK poll mode driver vs. the Linux kernel's interrupt/poll mode (NAPI) - so you get the DPDK benefit from F-Stack. The problem with DPDK is the lack of a mature TCP/IP stack; F-Stack glues the FreeBSD TCP/IP stack and DPDK together to solve that (F-Stack has also done some custom work in the FreeBSD TCP/IP stack to fit the DPDK model, as I understand it). sendfile and access_log are Nginx configuration options that should be irrelevant to F-Stack: F-Stack improves network I/O, not filesystem I/O like sendfile/access_log. Though it would be interesting to test with and without sendfile/access_log - slow filesystem I/O could potentially affect network I/O if the network is waiting for data from the filesystem to transmit.
I can't speak for the F-Stack guys since I am just an observer, but the Linux TCP/IP stack is a very complex stack and kind of bloated (in my opinion :)). I believe the F-Stack benchmark test is based on physical hardware, not VM virtio-net in KVM/Qemu; virtio-net does not support RSS, so you can only run F-Stack on a single core with a single queue. You could run an SR-IOV VF of a hardware NIC with RSS offload support to scale in a multi-core VM with multiple queues.
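To illustrate the poll-mode scaling model F-Stack builds on, here is a minimal sketch: each lcore busy-polls its own RX queue (traffic spread across queues by RSS), so no interrupts and no ksoftirqd are involved. Port, queue and mempool setup are omitted, and the "queue index == lcore id" mapping is an assumption for illustration only:

```c
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

static int lcore_rx_loop(void *arg)
{
	uint16_t port_id = *(uint16_t *)arg;
	uint16_t queue_id = (uint16_t)rte_lcore_id();
	struct rte_mbuf *bufs[BURST_SIZE];

	for (;;) {
		/* Busy-poll the RX ring: the core runs at 100% CPU even when idle. */
		uint16_t nb = rte_eth_rx_burst(port_id, queue_id, bufs, BURST_SIZE);

		for (uint16_t i = 0; i < nb; i++) {
			/* In F-Stack the mbuf would be fed into the ported
			 * FreeBSD TCP/IP stack here; this sketch just drops it. */
			rte_pktmbuf_free(bufs[i]);
		}
	}
	return 0;
}
```

Such a loop would typically be launched on every worker core with rte_eal_remote_launch().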
Hi @vincentmli ,
Well, I made some observations. E.g. there is a problem with the sockets hash table in Linux. I checked F-Stack and it seems the same problem exists there: the PCB hash lookup takes a read lock on the shared hash table:

```c
static struct inpcb *
in_pcblookup_hash(...)
{
	struct inpcb *inp;

	/* read lock on the shared pcbinfo hash, taken for every lookup */
	INP_HASH_RLOCK(pcbinfo);
	inp = in_pcblookup_hash_locked(...);
	...
```
Agree. I mentioned the filesystem I/O because Nginx without access logging is practically unusable, even in pure non-caching proxy mode, so the benchmarks are somewhat theoretical.
Unfortunately, at the moment I have no SR-IOV capable NICs to test with, but I'm wondering if SR-IOV can be used in a VM the same way as a physical NIC on a hardware server? I.e. it seems that using SR-IOV we can coalesce interrupts on the NIC inside a VM and tune
DPDK also supports interrupts, there is an example.
That's not a real hardware interrupt - the example uses epoll.
https://github.com/DPDK/dpdk/blob/master/examples/l3fwd-power/main.c#L860
Yeah, I meant that you still need to go to the kernel and back (i.e. make 2 context switches) if you use epoll.
As far as I understand it from reading the code, DPDK handles the interrupt from userspace; epoll is just for event polling, not interrupt handling: https://github.com/DPDK/dpdk/blob/master/lib/librte_eal/linux/eal_interrupts.c#L1167
I happened to come back to this thread with the question about interrupt handling in DPDK, and I found the answer in a StackOverflow discussion: https://stackoverflow.com/questions/53892565/dpdk-interrupts-rather-than-polling . So there is no real interrupt handling in DPDK.
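To illustrate the conclusion, here is a rough sketch of how l3fwd-power goes to sleep on an RX queue (based on reading the example; error handling, the epoll timeout choice and the switch back to pure polling are all simplified). The wake-up still goes through the kernel via epoll on the VFIO/UIO event fd, so DPDK itself never runs an interrupt handler:

```c
#include <rte_ethdev.h>
#include <rte_interrupts.h>

static void sleep_until_rx(uint16_t port_id, uint16_t queue_id)
{
	struct rte_epoll_event event;

	/* Register the queue's interrupt event fd with the per-thread epoll set. */
	rte_eth_dev_rx_intr_ctl_q(port_id, queue_id, RTE_EPOLL_PER_THREAD,
				  RTE_INTR_EVENT_ADD, NULL);

	rte_eth_dev_rx_intr_enable(port_id, queue_id);		/* arm the HW interrupt */
	rte_epoll_wait(RTE_EPOLL_PER_THREAD, &event, 1, -1);	/* block in the kernel  */
	rte_eth_dev_rx_intr_disable(port_id, queue_id);		/* resume busy polling  */
}
```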
Could you tell me the price for compatibility of the Tempesta patches with the newest kernel and the latest gcc and llvm compilers, for the one web server that I have? I couldn't find any prices. I tested Tempesta two years ago and it was cool - module compiling and so on. Please share the price with us.
Hi @osevan , thank you for the request! Could you please drop me a message at ak@tempesta-tech.com or, better, schedule a call at https://calendly.com/tempesta-tech/30min , so we can discuss your scenario and talk about Tempesta FW's capabilities.
Hi,
We're testing our in-kernel HTTPS proxy against Nginx and comparing our results with kernel-bypass proxies, which is how I came to your project.
I noticed in your performance data https://github.com/F-Stack/f-stack/blob/dev/CPS.png that Nginx on top of the Linux TCP/IP stack doesn't scale at all with increasing CPU number - why? Even with some hard lock contention, I would not expect to see an absolutely flat performance curve for the Linux kernel and Nginx. To me, it seems there is some misconfiguration of Nginx... Could you please share the Nginx configuration file for the test? I would much appreciate it if you could show `perf top` for Nginx/Linux.

Also, we found it quite problematic to generate enough load to test a high-performance HTTP server. In our case we needed more than 40 cores and two 10G NICs for `wrk` to generate enough load to reach 100% of resources on our server on 4 cores. What did you use to get the maximum results for 20 cores?

Thanks in advance!