envoy.gzip compression performance is poor #8448
Comments
@Pluies Do you know what the delay is when compressing with Nginx and JS? In order to reduce latency we decided to compress and flush on each data frame. The downside is that it degrades the compression ratio. The second option is to let gzip compress until the buffer gets full. I think we could make it configurable.
That's a very good question; I guess we could try and measure it with the compose-based stuff. I'll give it a go and report back! As far as I can tell, compressing in small chunks this way is something only Envoy does – httpd, nginx, and Akamai (the ones I've tested so far :) ) all seem to wait until the file is fully available before compressing, if I understand correctly – resulting in much better results. Having it optional would be a great way of letting users choose between latency and compression performance, as the current results are a bit baffling. By the way, by data frame, do you refer exclusively to HTTP/2 streams? (Sorry if it's a silly question, I probably need to brush up on my HTTP knowledge.)
OK, I've been running tests on the polymer JS file duplicated 10 times, bringing it to 30MB for an extreme test case, with the following curl command: This hopefully should test TTFB, which (once again, if I understand correctly) should be much lower in Envoy than in other servers. I'm getting a fair amount of variance run-after-run, so please don't read too much into this test, but here are some numbers:
As expected, we're seeing a fair uptick in total time between the "base" and "optimised" versions of the same server (as we're trading overall speed for data size), but I'm seeing basically no difference in TTFB between servers... So, other HTTP servers are also compressing-while-streaming, but somehow in an efficient fashion? Or is curl not telling us the full story? (e.g. headers being sent first while the body is being compressed, or other shenanigans)
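The exact curl invocation isn't preserved in the thread. As a hedged reconstruction (the URL and header here are placeholders, not the original test setup), curl's `--write-out` timing variables are one way to capture both TTFB and total transfer time:

```sh
# Fetch the compressed asset, discard the body, and print TTFB / total time / size.
curl -s -o /dev/null \
  -H 'Accept-Encoding: gzip' \
  -w 'ttfb=%{time_starttransfer}s total=%{time_total}s size=%{size_download}B\n' \
  http://localhost:8080/big.js
```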
It looks like Envoy and Nginx optimized/base are very close:
It could also be the underlying compression library that we are using. I'm looking at some other, more modern libraries that may improve the performance.
@gsagula thanks for looking into it! :) As you point out, the difference between optimised {nginx,apache,gzip} and optimised Envoy is not huge... but it's still ~10-15% extra, which wipes out gains from a bunch of other application-level improvements we've been working on at $CURRENT_COMPANY, for example. Regarding the compression library, as far as I can tell both Envoy and Nginx use the venerable zlib? Here's the ngx_http_gzip_filter_module.c source, which refers to the exact same parameters.
The window_bits parameter is exposed by Envoy (see [docs](https://www.envoyproxy.io/docs/envoy/latest/api-v2/config/filter/http/gzip/v2/gzip.proto)), and according to our testing it is one of the most important settings for the resulting compressed size. Averaging over a few JS & CSS files, we get pretty consistently (to within a few percent):

- gzip, no extra configuration: 100 (baseline)
- gzip, compression_level=BEST: ~99
- gzip, compression_level=BEST, window_bits=15: ~80

So setting this gives us 20% extra compression! The tradeoff in memory usage is documented by Envoy, whose documentation I've lifted in this PR. Background: envoyproxy/envoy#8448
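For concreteness, a minimal sketch of the kind of HTTP filter configuration these numbers correspond to (field names per the v2 gzip.proto linked above; the enclosing connection-manager config is omitted, and the values shown are just the ones from the experiment):

```yaml
http_filters:
- name: envoy.gzip
  typed_config:
    "@type": type.googleapis.com/envoy.config.filter.http.gzip.v2.Gzip
    compression_level: BEST   # zlib level 9
    window_bits: 15           # larger sliding window: better ratio, more memory per stream
```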
On a related note, we are also hit by this (getting worse latency via Envoy than Nginx when compressing). I wrote #10464 as a starting point for measuring things.
Perf annotations and flamegraphs could also help with debugging.
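For example, one common way to capture such a flamegraph (a sketch using Linux perf and the FlameGraph scripts; the process selector, sampling rate, and script paths are placeholders):

```sh
# Sample on-CPU stacks of a running Envoy for 30s, then fold them into a flamegraph.
perf record -F 99 -g -p "$(pgrep -f envoy)" -- sleep 30
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > envoy-gzip.svg
```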
OK, deleted my last comment because it was inaccurate -- the problem was with the mocks. Getting rid of those, I get back ~6ms. I'll keep removing the mocks/boilerplate to get more accurate measurements.
We suspect that the gzip/compressor filters are underperforming due to excessive flushes. This should help with the investigation and support the needed changes, if any. Helps with envoyproxy#8448. Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
Adding stats to track flushes: #10500.
OK, so I started running with the stats added in #10500, and the ratio of flush vs finish vs upstream_rq seems potentially concerning: I am wondering if the change introduced in #3025 -- which was a fix for #2909 -- didn't mean to use
Sounds like maybe we went a step too far in fixing the missing of
@rgs1 thanks for this. I'd love to check on the potential regression (manual check if required) when you have something to test.
Also related to perf (but still doesn't address the excessive flush issue): #10508.
We suspect that the gzip filter is underperforming due to excessive flushes. The changes introduced in envoyproxy#3025 -- which were an attempt to fix envoyproxy#2909 -- are causing too many flushes (one per encodeData() call) to ensure the compression is properly finished. I think at that point it was possible for encodeData() to be called with end_stream=false while encodeTrailers() wasn't handled, so Z_FINISH could be missed. This isn't the case anymore, so there's no need to flush every time we encode a chunk of the response. We could make this configurable, but given that it's a considerable perf regression _and_ that we'd still be doing the correct thing by just calling Z_FINISH when end_stream=true or from encodeTrailers(), I don't think it's worth it. This also adds stats for visibility. Helps with envoyproxy#8448. Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
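To illustrate the tradeoff the commit describes (a standalone zlib sketch, not Envoy code, with error handling omitted): flushing with Z_SYNC_FLUSH after every chunk forces a partial deflate block per chunk, while feeding chunks with Z_NO_FLUSH and finishing once with Z_FINISH lets deflate choose its own block boundaries, which is the direction the change above moves towards.

```cpp
#include <zlib.h>
#include <string>
#include <vector>

// Compress `chunks` into one gzip stream. `flush_each_chunk = true` mimics the
// old per-encodeData() Z_SYNC_FLUSH behaviour; `false` uses Z_NO_FLUSH and a
// single Z_FINISH at the end. Return codes are ignored for brevity.
std::string compressChunks(const std::vector<std::string>& chunks, bool flush_each_chunk) {
  z_stream zs{};
  // windowBits 15 | 16 selects gzip framing; level/memlevel values are illustrative.
  deflateInit2(&zs, Z_BEST_SPEED, Z_DEFLATED, 15 | 16, 8, Z_DEFAULT_STRATEGY);

  std::string out;
  char buf[16384];
  for (size_t i = 0; i < chunks.size(); ++i) {
    const bool last = (i + 1 == chunks.size());
    zs.next_in = reinterpret_cast<Bytef*>(const_cast<char*>(chunks[i].data()));
    zs.avail_in = static_cast<uInt>(chunks[i].size());
    const int flush = last ? Z_FINISH : (flush_each_chunk ? Z_SYNC_FLUSH : Z_NO_FLUSH);
    do {
      zs.next_out = reinterpret_cast<Bytef*>(buf);
      zs.avail_out = sizeof(buf);
      deflate(&zs, flush);
      out.append(buf, sizeof(buf) - zs.avail_out);
    } while (zs.avail_out == 0);
  }
  deflateEnd(&zs);
  return out;
}
```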
OK, first attempt at reducing the flush calls: #10518.
So #10518 doesn't appear to move the needle in my initial tests...
FYI #10530 for better instrumentation.
Helps with envoyproxy#8448. Related to envoyproxy#10530. Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
Since Envoy::Compressor::ZlibCompressorImpl::CompressionStrategy is simply static_cast'ed to uint64_t the Standard strategy (4) becomes Z_FIXED (4 as well). This basically disables the use of dynamic Huffman codes when the gzip filter is configured to use default values. Make the Standard strategy equal to 0 to translate to Z_DEFAULT_STRATEGY. Contributes to envoyproxy#8448 Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@linux.intel.com>
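To make the mismatch concrete (a sketch against zlib's strategy constants; only the Standard value is taken from the commit message above, the other enumerator values are assumed for illustration):

```cpp
#include <zlib.h>
#include <cassert>
#include <cstdint>

// Before the fix the filter-level enum put Standard at 4 and the value was
// static_cast directly into zlib's strategy argument, where 4 means Z_FIXED
// (dynamic Huffman codes disabled). Z_DEFAULT_STRATEGY is 0.
enum class CompressionStrategy : uint64_t {
  Filtered = 1, // assumed: lines up with Z_FILTERED
  Huffman = 2,  // assumed: lines up with Z_HUFFMAN_ONLY
  Rle = 3,      // assumed: lines up with Z_RLE
  Standard = 4, // after the fix this becomes 0, i.e. Z_DEFAULT_STRATEGY
};

int main() {
  const int zlib_strategy = static_cast<int>(CompressionStrategy::Standard);
  assert(zlib_strategy == Z_FIXED);  // 4 == 4: the "default" silently disables dynamic Huffman
  assert(Z_DEFAULT_STRATEGY == 0);   // what Standard was meant to map to
  return 0;
}
```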
It looks that… EDIT: Never mind, it seems that the "worse compression ratio" part was fixed already.
What PR fixed the "worse compression ratio" issue?
This one: #10676
Finally I've got some time to look into it. Below is the output of the Intel profiler. It seems Envoy's code is hardly the problem; all hotspots are in zlib itself. I tried to replace zlib with zlib-ng with all optimizations switched on, and for the majority of cases performance improved about twofold. Zlib:
Zlib-ng
Do we want to migrate to zlib-ng?
@rojkov What do you think of adding a build flag for choosing zlib-ng (similar to
That could be a good starting point. I'm on holiday at the moment; I will submit a patch later.
Awesome, enjoy your holiday! https://github.com/dio/envoy/pull/new/zlib-ng: it builds, but I'm not sure how to switch the
@dio nice -- go ahead and create the PR so I can test it locally 👍
Nice! I think we can likely switch to zlib-ng, but maybe initially it should be a compile option as we discussed. Per @dio, we may also need patches to other code that uses zlib.
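Assuming the compile option ends up as a Bazel define (an assumption at this point in the thread, not a documented flag), switching the bundled zlib for zlib-ng would look roughly like:

```sh
# Hypothetical build toggle: select zlib-ng instead of the default zlib.
bazel build --define zlib=ng //source/exe:envoy-static
```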
@rgs1 where are we at with this issue? Is Envoy now comparable to other proxies or do we have more work to do here?
Alas, the switch to zlib-ng did not move the needle for us in one of our main workloads (that is, in the only workload where it truly matters -- I didn't check if it made things any better in the other ones).
OK, that's sad. I would love to figure this out. I think @rojkov might have some more cycles to help. @rojkov, is this a perf-related investigation you could help drive to closure?
Yes, I'll take a look. Perhaps there's a way to avoid double buffering.
I configured Envoy and Nginx to use similar settings for zlib: compression level 1, default strategy, one thread, memory level 8, and the same sliding window size. The test results for downloading a 157K JSON file with
Envoy
Nginx
Nginx is about 30% faster overall with TTFB almost 3 times shorter. I tried to do three things:
The first thing improved TTFB a bit (by about 1-2%), but TTLB didn't change at all. The impact of the other two approaches was not noticeable at all. Flamegraphs show that most of the time is spent in zlib: Envoy and Nginx. Though in the case of Envoy the call stacks are much deeper. Callgrind doesn't reveal excessive cache thrashing either, though I might have interpreted the numbers the wrong way (sorted by last-level cache misses for data writes, limited to 0.25%):
Looks like Nginx moves data more effectively, though. Nginx uses adaptive settings for zlib if… Now I've run out of ideas. Any help is appreciated :)
So, I made one more attempt to solve this mystery. It turned out I was measuring the wrong numbers: flamegraphs and profilers show where CPU cycles are spent, not where wall time is spent. I used
Now Envoy is ~25% faster than Nginx. @rgs1 What were the zlib settings in your setup? Could you check whether you set them equal to Nginx's? In my setup they are
Nginx doesn't document how to specify memlevel and wbits (because they are calculated dynamically for known
Wow. So awesome!
Great work! The only change you made was to set per_connection_buffer_limit_bytes to 8192? What was it previously?
Yes, only this change. Initially there was nothing in my config and the default is 1M.
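For anyone reproducing this, a sketch of where that knob lives in a listener config (only the relevant field is shown; the filter chain with the HTTP connection manager and gzip filter, and the matching cluster-side limit, are omitted):

```yaml
static_resources:
  listeners:
  - name: ingress
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    # Default is 1MiB; 8KiB is the value that closed the gap with Nginx here.
    per_connection_buffer_limit_bytes: 8192
    filter_chains: []  # omitted for brevity
```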
@rojkov hmm, interesting. Could it be a problem that we have different limits for listeners and clusters:
? This is all using zlib-ng, right?
Our settings:
So mostly the same; I don't expect that memory_level: 9 vs memory_level: 8 would make such a big difference...
No, I compiled Envoy with the same old zlib used by Nginx. I don't think the different limits matter that much, nor does memlevel 9. So I don't understand why your results are so different :(
@rojkov let me dig a bit more on our side. Regardless of our case, this is a great finding -- thanks! Shall we document this somewhere?
Yeah, adding some notes to… By the way, at some point I experimented with CPU frequency a bit: while benchmarking compression with k6 I artificially loaded all the cores to 100% (with a full Envoy rebuild). This made Envoy serve twice as fast, and Nginx showed similar dynamics, which can be counterintuitive yet is explainable. Just a side note on how important it is to run experiments under the same environmental conditions.
Thanks @rojkov, this is amazing work and a really great finding around how buffer sizes affect overall throughput.
I'm a little confused and have a few questions that we can follow up on:
cc @antoniovicente for event and buffer expertise. cc @ggreenway as this very likely relates to the perf work you have been doing.
As an aside, this is really neat. We should document this approach; also, would it be worthwhile to add these hooks into the code base under some macro guard so that we can more easily use these annotations for perf investigations?
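As a generic illustration of what such macro-guarded annotations could look like (a hypothetical sketch, not the instrumentation used above; real hooks would feed a stats sink or profiler API rather than stderr):

```cpp
#include <chrono>
#include <cstdio>

#ifdef ENVOY_PERF_ANNOTATIONS // hypothetical guard: compiled out by default
// Prints how long the enclosing scope took.
struct ScopedPerfTimer {
  const char* label;
  std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
  ~ScopedPerfTimer() {
    const auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                        std::chrono::steady_clock::now() - start)
                        .count();
    std::fprintf(stderr, "[perf] %s: %lld us\n", label, static_cast<long long>(us));
  }
};
#define PERF_SCOPE(label) ScopedPerfTimer _perf_scope_ { label }
#else
#define PERF_SCOPE(label)
#endif

void encodeDataExample() {
  PERF_SCOPE("gzip_filter.encodeData");
  // ... compression work ...
}
```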
I understood that there is one single write with a single slice in both cases. This is how it looks in logs:
A-ha. I should have asked this question myself... So, I did closed-loop testing, that is, a new connection to the listener is not created until the current one is closed. TTLB should be shorter for a real client. I guess I need to use something other than k6 for benchmarking. I've just tried to benchmark with 10 virtual simultaneous users connecting to a single-threaded Envoy and got… With 20 virtual users I got 306 reqs/s vs 364 reqs/s respectively. With 80 VUs I got 420 reqs/s vs 311 reqs/s, whereas Nginx consistently gives ~420 reqs/s starting from 10 VUs; increasing VUs beyond that doesn't increase throughput.
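For reference, the kind of closed-loop k6 test described here looks roughly like this (URL, VU count, and duration are placeholders): each virtual user only issues its next request after the previous response has been fully read.

```javascript
import http from 'k6/http';

export const options = {
  vus: 10,          // concurrent virtual users, as in the comparison above
  duration: '30s',
};

export default function () {
  // Placeholder URL for the compressed asset served via Envoy or Nginx.
  http.get('http://localhost:10000/payload.json', {
    headers: { 'Accept-Encoding': 'gzip' },
  });
}
```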
I actually do wonder if some of the fixes @ggreenway is working on may impact what we are seeing. Something still doesn't add up to me here.
The discussion stopped in 2020; is this still a problem now?
Description:
Hello! 👋
We recently realised our outbound traffic had increased a fair amount since we switched from serving some assets directly from Akamai to serving them from Ambassador, and traced it down to our gzipped assets being much larger.
I then ran some tests, and it appears that gzipping content in Envoy results in fairly poor compression results, even when tweaking the settings to produce the smallest possible output.
For example, when gzipping html & js from Youtube (picked as a random "big website" example), Envoy-compressed html is 12% larger than nginx and js is 13% larger.
This isn't nginx-specific either: I've tested gzip locally, Apache httpd, and nginx, and all of their results are within a few percent of each other, while Envoy is consistently over 10% larger.
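A quick way to reproduce the size comparison (a sketch; the actual methodology is in the repository linked below, and the hostnames and asset path here are placeholders):

```sh
# Compare the compressed size of the same asset served by two backends.
for host in envoy.example.local nginx.example.local; do
  size=$(curl -s -H 'Accept-Encoding: gzip' -o /dev/null -w '%{size_download}' "http://$host/app.js")
  echo "$host: $size bytes (gzip)"
done
```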
Relevant Links:
Here is my testing methodology: https://github.com/Pluies/gzip-comparison
And the relevant Envoy configuration files are:
Is this a known issue? Is there a problem with my configuration? I'm by no means an Envoy expert, so please let me know if there are any obvious issues with the config above or anything else I've missed that might explain this discrepancy.
Thanks a lot!