-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
filter: implemented gzip http filter #2087
Conversation
Awesome! @dnoe can you own the review on this one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just a few comments to pick through.
private: | ||
bool isContentTypeAllowed(const HeaderMap& headers); | ||
|
||
bool skipCompression_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use skip_compression_
to match the other member name style.
} | ||
|
||
private: | ||
bool isContentTypeAllowed(const HeaderMap& headers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it can be a const
method.
compression_level_(json_config.getString("compression_level", "default")), | ||
compression_strategy_(json_config.getString("compression_strategy", "default")), | ||
content_types_(json_config.getStringArray("content_types", true)), | ||
memory_level_(json_config.getInteger("memory_level", 8)), window_bits_{31} {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason for making memory_level configurable but not window_bits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
window_bits changes the encoding type. I'm not sure when user will want to change it in the case of GZIP.
|
||
private: | ||
const std::string compression_level_{}; | ||
const std::string compression_strategy_{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make more sense to do the string->enum resolution at construction time and then store the enum here instead. You'd still retain the above functions but make them static functions that take the string and return enum, and call them in the ctor initializer list.
FilterHeadersStatus GzipFilter::encodeHeaders(HeaderMap& headers, bool) { | ||
// TODO(gsagula): In order to fully implement RFC2616-14.3, more work is required here. The | ||
// current implementation only checks if `gzip` is found in `accept-encoding` header, but | ||
// it disregards the presence of qvalue or the order/priority of other encoding types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think qvalue is a typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'qvalue' is described in the RFC, e.g Accept-Encoding: gzip;q=1.0, identity; q=0.5, *;q=0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind :)
return true; | ||
} | ||
for (const auto& content_type : config_->getContentTypes()) { | ||
if (headers.ContentType()->value().find(content_type.c_str())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it'll return true if there isn't an exact match but a subset of the content type is contained in the white list. Is that definitely what you want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, I fixed it by providing a prefixed list of content-types that the user can select it from in the config.
Thanks @dnoe for reviewing it. I will take look at that over the weekend. |
@dnoe I did the initial changes that you suggested. I also included the part of the RFC2616 that was missing plus some more robust configuration options. Please take another look when you have a chance. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good, just a few nits and comments. Still needs documentation for the filter config but it looks like that's meant to come in a follow up PR.
if (headers.AcceptEncoding() != nullptr) { | ||
return std::regex_search( | ||
headers.AcceptEncoding()->value().c_str(), | ||
std::regex{"(?!.*gzip;\\s*q=0(,|$))(?=(.*gzip)|(^\\*$))", std::regex::optimize}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that std::regex::optimize
flag makes matching faster at the expense of slower construction. Since this uses a temporary it'll still be doing construction for each call so this trade off doesn't seem to make sense. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I think ideally you could have this std::regex
live the same lifespan as Envoy itself, but watch out for the "static initialization order fiasco")
|
||
bool GzipFilter::isContentTypeAllowed(const HeaderMap& headers) const { | ||
if (config_->getContentTypeValues().size() > 0 && headers.ContentType() != nullptr) { | ||
for (auto const& value : config_->getContentTypeValues()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today I learned that const auto
and auto const
are equivalent! I think everywhere else in Envoy you'll see it written as const auto
, so please consider changing it as a nit.
|
||
bool GzipFilter::isCacheControlAllowed(const HeaderMap& headers) const { | ||
if (config_->getCacheControlValues().size() > 0 && headers.CacheControl() != nullptr) { | ||
for (auto const& value : config_->getCacheControlValues()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const auto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, CONSTRUCT_ON_FIRST_USE is perfect. Just a few things.
@@ -36,7 +44,7 @@ FilterHeadersStatus GzipFilter::encodeHeaders(HeaderMap& headers, bool end_strea | |||
} | |||
|
|||
return Http::FilterHeadersStatus::Continue; | |||
} // namespace Http | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep this namespace comment in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like that namespace Http
was in a wrong place. It should be in the line 128 only.
namespace Envoy { | ||
namespace Http { | ||
|
||
namespace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of this namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that this doesn't apply to c++11 or up anymore. I will remove it.
Looks good to me. Did you see @mattklein123 comment about making it V2 ready? Let me know here or on Slack if you have questions about that and what needs to be done in the data-plane-api repository. |
@dnoe I started working on proto yesterday. I reach you on Slack if I have any question. Thank you again for reviewing it! |
5229c58
to
cd5e964
Compare
@dnoe Please take another look whenever you can. Thanks! |
Will take a look at this today (sorry I didn't managed to get to it before the weekend!) |
No worries! I also had a busy week and couldn't send this before Thursday night. Take your time! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! A few remaining things to look at.
|
||
for (const auto& value : gzip.cache_control()) { | ||
cache_control_values_.insert(value); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a simpler way to initialize these which will eliminate the constructor body and allow you to make all the members of the GzipFilterConfig
const. See the part about RepeatedPtrField in https://developers.google.com/protocol-buffers/docs/reference/cpp-generated
This should work: cache_control_values_(gzip.cache_control().cbegin(), gzip.cache_control.cend())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I knew that repeated provides STL-like iterators, but I didn't realized that it could be initialized like that. Cool, thanks! 👍
|
||
ZlibCompressionLevelEnum | ||
GzipFilterConfig::compressionLevelEnum(const GzipV2CompressionLevelEnum& compression_level) { | ||
if (compression_level == GzipV2CompressionLevelEnum::Gzip_CompressionLevel_Enum_BEST) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be done with a switch
statement since what comes from the proto is a true enum. It'll probably be a bit more readable code than the if statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried switch here, but for some reason didn't work. Let me try it again.
|
||
ZlibCompressionStrategyEnum GzipFilterConfig::compressionStrategyEnum( | ||
const GzipV2CompressionStrategyEnum& compression_strategy) { | ||
if (compression_strategy == GzipV2CompressionStrategyEnum::Gzip_CompressionStrategy_RLE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for this, if a switch statement works prefer it to repeated if statements.
return ZlibCompressionStrategyEnum::Standard; | ||
} | ||
|
||
const uint64_t GzipFilter::WINDOW_BITS{15 | 16}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment about where these values come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this to get rid of the magic number, but good call about the comment.
@mattklein123 It's ready. Would be great to have yours or @htuch feedback on this. |
|
||
bool GzipFilter::isContentTypeAllowed() const { | ||
if (config_->contentTypeValues().size() && response_headers_->ContentType()) { | ||
return std::any_of(config_->contentTypeValues().begin(), config_->contentTypeValues().end(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you are not using a fast lookup here. Why not pull the content type up and use config_->contentTypeValues().find()?
Maybe it's because content-type could be something like "text/html; charset=utf8", but can't you parse out the substring of the content-type before the ";" and then do the unordered_set find?
Moreover, if you had a compressible type "foobar" but you had content-type:foo that would match this impl, if I'm reading it correctly. In practice that might not be an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👀
} | ||
|
||
bool GzipFilter::isCacheControlAllowed() const { | ||
if (config_->cacheControlValues().size() && response_headers_->CacheControl()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would cache-control influence the compressibility of something? Except maybe cache-control:no-transform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not affect compression, but the necessity for such. Since the response is going to be cached, there are situations where skip compression might be desirable. Less common I believe, envoy could be behind another proxy that might cache certain content and then serve it compressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RE "response is going to be cached", I think that's what Vary:Accept-Encoding is for. HTTP caches should be able to hold a gzipped response and serve it to a non-gzip-accepting client by decompressing it on the fly, if that header is in the cached response. https://blog.stackpath.com/accept-encoding-vary-important is worth a read.
RE "behind another proxy" -- I think in that case the site admin would just disable gzipping in Envoy then, right? Why would it need to be based on cache-control headers?
I'm not objecting to this setting, I just want to better understand what the usage model is and its relationship to caches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out Joshua.
I think that Vary should be also available in the configuration. In regards to cache-control, I was thinking more in in terms of proxied requests, like Nginx does on the presence of Via request header. See gzip_proxied at http://nginx.org/en/docs/http/ngx_http_gzip_module.html
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, sounds like you are going for a drop-in replacement for nginx gzip support? Fair enough. I don't know why someone would configure cache-control's effect on gzipping in an nginx proxy either, but it must have been put in for a reason. In that case, let's just do it efficiently. In the case of cache-control, I think it means splitting any number of attributes on ",", and doing a hash lookup in your set on each of them. E.g. cache-control: no-cache, no-store, must-revalidate, max-age=0
should (IMO) result in 4 exact hash lookups.
The most relevant setting (AFAICT), cache-control:no-transform, is not mentioned in the nginx doc you referenced, and I don't see it in your PR either. By default, IMO, you should honor cache-control:no-transform and avoid gzipping any response with that header.
I think it's very important for proxies to get the defaults right, especially Vary:accept-encoding, as messing these up can result in hard-to-diagnose downstream failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point on having cache-control:no-transform
by default. I tried to make the feature as least opinionated as possible, so the users could opt for skipping compression on whatever cache-control value/s they need to, including no-transform.
I will work on these changes, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd defer to @htuch and @mattklein123 and @alyssawilk on the design philosophy here, as I am still new to Envoy development. A few options are:
- If nginx's gzip module does it, then Envoy's should too.
- Provide mechanism not policy about how the proxy should behave, and let the config figure out what to do (ie "least opinionated as possible)
- Provide a paved path toward a few different sensible scenarios
There is some subtlety here that I think is worth getting right in the C++ code, so I kind of favor option 3, and these scenarios come to mind:
a. Enable gzip with some level of compression aggressiveness, respecting cache-control:no-transform from origin, and adding vary:accept-encoding to the response to the client.
b. Disable gzip completely, e.g. because the Envoy being configured is sitting in a high-bandwidth rack and some other proxy will gzip en route to client.
But I totally understand if the market for Envoy wants it to be a clean drop-in replacement for nginx and whatever features that has in its gzip setup.
The challenge with #2 above (least opinionated) is that you then need the person configuring Envoy to know the HTTP semantics and best practices, and enforce them via config.
|
||
// Acceptance Testing with default configuration. | ||
TEST_F(GzipFilterTest, AcceptanceGzipEncoding) { | ||
setUpTest(R"EOF({})EOF"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this complicated constant rather than just "{}"?
bool GzipFilter::isContentTypeAllowed(HeaderMap& headers) const { | ||
if (config_->contentTypeValues().size() && headers.ContentType()) { | ||
std::cmatch match; | ||
std::regex_search(headers.ContentType()->value().c_str(), match, contentTypeRegex()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks a lot better.
I'm not super-intuitive about regexes. Will this strip any extra whitespace too?
Note could also write this with str::find and some sort of string-view-trimming helper function (if such exists in absl::). That might be faster.
In any case this function deserves some direct unit-testing wihout waking up the rest of the gzip filter.
} | ||
|
||
bool GzipFilter::isContentTypeAllowed(HeaderMap& headers) const { | ||
if (config_->contentTypeValues().size() && headers.ContentType()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use .empty() to check whether it's empty, rather than taking the size(), which might in theory be a non-trivial calculation, depending on the STL impl.
} | ||
|
||
return headers.TransferEncoding() && | ||
headers.TransferEncoding()->value().find( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably worth putting TransferEncoding() in a temp as I think it's actually a lookup.
} | ||
|
||
bool GzipFilter::isCacheControlAllowed(HeaderMap& headers) const { | ||
if (config_->cacheControlValues().size() && headers.CacheControl()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use !empty() rather than size()
} | ||
|
||
bool GzipFilter::isTransferEncodingAllowed(HeaderMap& headers) const { | ||
if (headers.TransferEncoding()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably worth doing a pass over all these methods and storing header lookups in temps rather than doing them twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a great chance that the second call to TransferEncoding() might never happen, reason why this is also the second last checking in the if statement. Do you think that it's worth doing all these assignments to temps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO yes...HeaderMap's impl currently instantiates a discrete member variable per common header so the lookup is fast, but I'm wondering if that might be something that gets re-tuned later as that policy might be a drag on header construction speed as the number of potentially interesting headers grows. And in any case the temp is free, so I don't see a downside to a temp in readability or performance.
|
||
FilterHeadersStatus GzipFilter::decodeHeaders(HeaderMap& headers, bool) { | ||
// TODO(gsagula): The current implementation checks for the presence of 'gzip' and if the same | ||
// is followed by Qvalue. Since gzip is the only available encoding right now, order/priority of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note there is also "deflate"(which isn't used much anymore and "br" (brotli) which is "up and coming", though I'm not sure how fast :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, comment should be more specific -> "Since gzip is the only available content-encoding in Envoy at the moment".
I haven't look at Brotli specification yet, but I saw that it's supposed to be superior to gzip which is quite impressive.
namespace Envoy { | ||
namespace Http { | ||
|
||
static const std::regex& acceptEncodingRegex() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use anon namespace rather than 'static' (from google style guide).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I initially did, but it got caught in previous review. So, I thought it wasn't relevant anymore and ended up moving back to static.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-reading https://google.github.io/styleguide/cppguide.html#Unnamed_Namespaces_and_Static_Variables it does not forbid 'static', but within the context of this style guide, everyone I know that writes C++ based on this guide interprets anonymous namespace to be the preferred mechanism here, even though they have the same effect. I think this interpretation might have been due to an earlier version of the styleguide or because the only example given is with an anonymous namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is super sweet. Thanks for the high quality PR. A couple of random comments. I'm going to defer to @jmarantz for logic review as I have very little familiarity with high level web stuff so can't help that much with the various headers and when to use them.
@gsagula do you mind finalizing the docs PR in data-plane-api? (unhide filter docs, clean them up, etc.) Then we can do a final pass on everything together.
Thank you!
source/common/config/filter_json.cc
Outdated
*cache_control = json_cache_control; | ||
} | ||
|
||
const auto& json_compression_level = json_config.getString("compression_level", "default"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically this is taking a reference to a temporary. I would lose the &
on the getString
calls. Same below.
return (window_bits_ > 0 ? window_bits_ : DEFAULT_WINDOW_BITS) | GZIP_HEADER_VALUE; | ||
} | ||
|
||
GzipFilter::GzipFilter(GzipFilterConfigSharedPtr config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const GzipFilterConfigSharedPtr&
GzipFilter::GzipFilter(GzipFilterConfigSharedPtr config) | ||
: skip_compression_{true}, compressed_data_(), compressor_(), config_(config) {} | ||
|
||
void GzipFilter::onDestroy() {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would just define this in the header file.
|
||
Compressor::ZlibCompressorImpl compressor_; | ||
|
||
GzipFilterConfigSharedPtr config_{nullptr}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The initialization here is redundant and can be removed.
@mattklein123 Working on all the above. @jmarantz I'm happy to go with the design changes that you suggested earlier. I will open another PR in data-plane to unlock docs, so we can work on the config changes from there. Thanks for reviewing it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking better, but I'm still looking for the Vary:Accept-Encoding addition.
source/common/common/utility.cc
Outdated
@@ -98,6 +98,27 @@ void StringUtil::rtrim(std::string& source) { | |||
} | |||
} | |||
|
|||
void StringUtil::ltrim(std::string& source) { | |||
std::size_t pos = source.find_first_not_of(" \t\f\v\n\r"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use absl::string_view you will not have to copy/allocate any string data for this operation.
https://github.com/abseil/abseil-cpp/blob/master/absl/strings/string_view.h
absl is already incorporated into the Envoy build.
I admit that you may need to construct a temp std::string later to do the hash lookup though, but in one case it would at least save you one intermediate copy. You could also avoid that intermediate copy by using std::unordered_setabsl::string_view but you'll need to keep the backing store for the strings separately then. Not sure if that's worth it, but it would save allocations/string-copying per request in the server.
|
||
bool GzipFilter::isCacheControlAllowed(HeaderMap& headers) const { | ||
if (headers.CacheControl() && !config_->cacheControlValues().empty()) { | ||
auto header_values = StringUtil::split(headers.CacheControl()->value().c_str(), ','); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't forget to trim the tokens. If you just split on "," you'll wind up with tokens like " max-age=0" and " public". For some reason, it's common to have "public, max-age=300" in headers.
I'm not really sure if splitting off the max-age value is really a concern. E.g. would someone want to say "I'm willing to gzip, but only if it's cached less than 5 minutes"?
But you need to strip the whitespace so you can find "no-transform".
} | ||
|
||
bool GzipFilter::isTransferEncodingAllowed(HeaderMap& headers) const { | ||
if (headers.TransferEncoding()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO yes...HeaderMap's impl currently instantiates a discrete member variable per common header so the lookup is fast, but I'm wondering if that might be something that gets re-tuned later as that policy might be a drag on header construction speed as the number of potentially interesting headers grows. And in any case the temp is free, so I don't see a downside to a temp in readability or performance.
namespace Envoy { | ||
namespace Http { | ||
|
||
static const std::regex& acceptEncodingRegex() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-reading https://google.github.io/styleguide/cppguide.html#Unnamed_Namespaces_and_Static_Variables it does not forbid 'static', but within the context of this style guide, everyone I know that writes C++ based on this guide interprets anonymous namespace to be the preferred mechanism here, even though they have the same effect. I think this interpretation might have been due to an earlier version of the styleguide or because the only example given is with an anonymous namespace.
@jmarantz Thanks again for the detailed review. I have my eyes on the comments above. I'm back to work tomorrow, so I won't be able to look at them till evening. I did some modifications in gzip.proto (not in use yet) based on our previous discussions:
Note that I will also include verification for "Cache-Control: no-transform". Please take a look at whenever you have time: |
Signed-off-by: Gabriel <gsagula@gmail.com>
Signed-off-by: Gabriel <gsagula@gmail.com>
Signed-off-by: Gabriel <gsagula@gmail.com>
Signed-off-by: Gabriel <gsagula@gmail.com>
…ted api SHA Signed-off-by: Gabriel <gsagula@gmail.com>
Manual system-testing with real browser & envoy as a forward-proxy @jmarantz I think we are done here, but feel free to give it another pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Epic PR! LGTM from a quick skim. @jmarantz do you mind doing the final pass on this one?
source/common/config/filter_json.cc
Outdated
@@ -430,5 +430,54 @@ void FilterJson::translateClientSslAuthFilter( | |||
*proto_config.mutable_ip_white_list()); | |||
} | |||
|
|||
void FilterJson::translateGzipFilter(const Json::Object& json_config, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we decide to keep v1 support or are we dropping it? If we are keeping it we should add docs here: https://www.envoyproxy.io/docs/envoy/latest/api-v1/http_filters/http_filters. My preference is to drop it though and encourage people to upgrade to v2. Either way is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decided to drop V1 as per Harvey's recommendation, but I forgot to remove it from Envoy code. I'll do that tomorrow 👀
Oh also needs a master merge @gsagula |
Signed-off-by: Gabriel <gsagula@gmail.com>
Thanks @mattklein123 |
@jmarantz Lastest commit addresses all the places where code needs to handle case-insensitive header values. I tried to extend xxHash instead of defining a new one, but I ran out of time. I'll get back to it at some point. |
xxHash vs a new one can be a TODO. RE rebasing: this is a bit of the blind leading the blind here, but I don't think you need to rebase. You need to merge.
But I might have that wrong. I attempted to do that flow for one of my own branches and I think it it did the right thing. |
In any case, please go ahead & merge so I can see the final state before my walkthrough. I'd recommend saving a simple tree of all your actual code somewhere outside of git's control :) |
Signed-off-by: Gabriel <gsagula@gmail.com>
I think it still needs to merge upstream -> git merge upstream/master.
Sounds good. Resolving last conflicts. |
Signed-off-by: Gabriel <gsagula@gmail.com>
@jmarantz It's ready for another pass. Whenever you get a chance. Thanks! |
Signed-off-by: Gabriel <gsagula@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few more v1 reverts needed I think?
: v1_converter_({BUFFER, CORS, DYNAMO, FAULT, GRPC_HTTP1_BRIDGE, GRPC_JSON_TRANSCODER, | ||
GRPC_WEB, HEALTH_CHECK, IP_TAGGING, RATE_LIMIT, ROUTER, LUA, | ||
EXT_AUTHORIZATION}) {} | ||
: v1_converter_({BUFFER, CORS, DYNAMO, ENVOY_GZIP, FAULT, GRPC_HTTP1_BRIDGE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should revert this change.
source/common/json/config_schemas.cc
Outdated
@@ -1075,6 +1075,48 @@ const std::string Json::Schema::GRPC_JSON_TRANSCODER_FILTER_SCHEMA(R"EOF( | |||
} | |||
)EOF"); | |||
|
|||
const char Json::Schema::GZIP_HTTP_FILTER_SCHEMA[] = R"EOF( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should revert this change.
source/common/json/config_schemas.h
Outdated
@@ -39,6 +39,7 @@ class Schema { | |||
static const std::string BUFFER_HTTP_FILTER_SCHEMA; | |||
static const std::string FAULT_HTTP_FILTER_SCHEMA; | |||
static const std::string GRPC_JSON_TRANSCODER_FILTER_SCHEMA; | |||
static const char GZIP_HTTP_FILTER_SCHEMA[]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert
source/server/config/http/gzip.cc
Outdated
|
||
HttpFilterFactoryCb GzipFilterConfig::createFilterFactory(const Json::Object& json_config, | ||
const std::string&, FactoryContext&) { | ||
envoy::config::filter::http::gzip::v2::Gzip proto_config; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this version can never happen if we drop v1? NOT_IMPLEMENTED?
Signed-off-by: Gabriel <gsagula@gmail.com>
Signed-off-by: Gabriel <gsagula@gmail.com>
@mattklein123 Done with V1 cleanup. Sorry, I tried to do that very quickly this morning so that you or Joshua could take another look, but I missed some spots. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks great, thanks for pushing this all the way through!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work! Thanks for the great contribution.
* use route directive regardless of rpc status Signed-off-by: Kuat Yessenov <kuat@google.com> * log response code Signed-off-by: Kuat Yessenov <kuat@google.com>
static uint64_t djb2CaseInsensitiveHash(absl::string_view input) { | ||
uint64_t hash = 5381; | ||
for (unsigned char c : input) { | ||
hash += ((hash << 5) + hash) + absl::ascii_tolower(c); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like an unintended deviation from the original DJB2 hash having:
hash += ((hash << 5) + hash) + absl::ascii_tolower(c);
instead of:
hash = ((hash << 5) + hash) + absl::ascii_tolower(c);
While at it, a typo in ingnoring
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andre-rosa can i suggest opening a PR with your suggested change - seems like the typo is fixed already so i would suggest looking at current main
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your answer. Please check #21787.
Description:
This PR implements http filter that compresses data dispatched from an upstream service upon client request. To enable this filter, create an object literal under filters with name field “gzip” and an empty config object for minimum configuration. Envoy will deliver compressed content to any http request that contains gzip in accept-encoding header.
#269
Risk Level: Medium
Testing: unit testing, integration testing and manual testing thru envoy front-proxy.
Note: envoy::api::v2::filter will be included in this PR.