Envoy crashes when ext_authz is reconfigured while under load #8025
Please provide a core dump or at least a back trace with fully resolved symbols. Thank you!
Hey @mattklein123, I was trying to get a back trace with fully resolved symbols using https://github.com/envoyproxy/envoy/tree/master/bazel#stack-trace-symbol-resolution, but I think those docs cover stack traces produced while running tests via Bazel, whereas what I have is an Envoy binary with no test for this behavior. How can I provide you a back trace with resolved symbols?
Okay, I didn't have much luck with using […]. Here is the tail of the logs from when Envoy started crashing: https://paste.fedoraproject.org/paste/gnQJR9yI9ri5n9TwXbmFpg/raw
Here are the entire logs: https://drive.google.com/file/d/17lp120Rk1AfLrqnbOqfQSM-QavoLyFkF/view
@mattklein123 let me know if you need anything else, thanks 👍
@containscafeine unfortunately, that stack trace doesn't give me useful info. I would work on getting a core dump or, if you can't do that, a self-contained repro.
@mattklein123 okay, let me get something useful for you tomorrow. In the meantime, can you point me to how to get a core dump? I skimmed the docs but couldn't find anything. Thanks for the help :)
Hey @mattklein123, I tried again with a fresh Bazel build and tried to decode the logs with […]. Anyway, here is the repro you asked for:
{
"admin": {
"access_log_path": "/tmp/admin_access_log",
"address": {
"socket_address": {
"address": "127.0.0.1",
"port_value": 8001
}
}
},
"dynamic_resources": {
"ads_config": {
"api_type": "GRPC",
"grpc_services": [
{
"envoy_grpc": {
"cluster_name": "xds_cluster"
}
}
]
},
"cds_config": {
"ads": {}
},
"lds_config": {
"ads": {}
}
},
"node": {
"cluster": "ambassador-default",
"id": "test-id"
},
"static_resources": {
"clusters": [
{
"connect_timeout": "1s",
"hosts": [
{
"socket_address": {
"address": "127.0.0.1",
"port_value": 8003
}
}
],
"http2_protocol_options": {},
"name": "xds_cluster"
}
]
}
}
Make sure you update
{
"@type": "/envoy.config.bootstrap.v2.Bootstrap",
"static_resources": {
"clusters": [
{
"connect_timeout": "3.000s",
"dns_lookup_family": "V4_ONLY",
"lb_policy": "ROUND_ROBIN",
"load_assignment": {
"cluster_name": "cluster_extauth_example_auth_3000",
"endpoints": [
{
"lb_endpoints": [
{
"endpoint": {
"address": {
"socket_address": {
"address": "example-auth",
"port_value": 3000,
"protocol": "TCP"
}
}
}
}
]
}
]
},
"name": "cluster_extauth_example_auth_3000",
"type": "STRICT_DNS"
},
{
"connect_timeout": "3.000s",
"dns_lookup_family": "V4_ONLY",
"lb_policy": "ROUND_ROBIN",
"load_assignment": {
"cluster_name": "cluster_tour_8080",
"endpoints": [
{
"lb_endpoints": [
{
"endpoint": {
"address": {
"socket_address": {
"address": "tour",
"port_value": 8080,
"protocol": "TCP"
}
}
}
}
]
}
]
},
"name": "cluster_tour_8080",
"type": "STRICT_DNS"
}
],
"listeners": [
{
"address": {
"socket_address": {
"address": "0.0.0.0",
"port_value": 8080,
"protocol": "TCP"
}
},
"filter_chains": [
{
"filters": [
{
"config": {
"access_log": [
{
"config": {
"format": "ACCESS [%START_TIME%] \"%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%\" %RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT% %DURATION% %RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)% \"%REQ(X-FORWARDED-FOR)%\" \"%REQ(USER-AGENT)%\" \"%REQ(X-REQUEST-ID)%\" \"%REQ(:AUTHORITY)%\" \"%UPSTREAM_HOST%\"\n",
"path": "/dev/fd/1"
},
"name": "envoy.file_access_log"
}
],
"http_filters": [
{
"config": {
"http_service": {
"authorization_request": {
"allowed_headers": {
"patterns": [
{
"exact": "cookie"
},
{
"exact": "x-forwarded-for"
},
{
"exact": "user-agent"
},
{
"exact": "proxy-authorization"
},
{
"exact": "authorization"
},
{
"exact": "from"
},
{
"exact": "x-forwarded-proto"
},
{
"exact": "x-qotm-session"
},
{
"exact": "x-forwarded-host"
}
]
},
"headers_to_add": []
},
"authorization_response": {
"allowed_client_headers": {
"patterns": [
{
"exact": "location"
},
{
"exact": "www-authenticate"
},
{
"exact": "x-qotm-session"
},
{
"exact": "proxy-authenticate"
},
{
"exact": "authorization"
},
{
"exact": "set-cookie"
}
]
},
"allowed_upstream_headers": {
"patterns": [
{
"exact": "location"
},
{
"exact": "www-authenticate"
},
{
"exact": "x-qotm-session"
},
{
"exact": "proxy-authenticate"
},
{
"exact": "authorization"
},
{
"exact": "set-cookie"
}
]
}
},
"path_prefix": "/extauth",
"server_uri": {
"cluster": "cluster_extauth_example_auth_3000",
"timeout": "5.000s",
"uri": "http://extauth"
}
}
},
"name": "envoy.ext_authz"
},
{
"name": "envoy.cors"
},
{
"name": "envoy.router"
}
],
"http_protocol_options": {
"accept_http_10": false
},
"normalize_path": true,
"route_config": {
"virtual_hosts": [
{
"domains": [
"*"
],
"name": "backend",
"routes": [
{
"match": {
"case_sensitive": true,
"prefix": "/backend/",
"runtime_fraction": {
"default_value": {
"denominator": "HUNDRED",
"numerator": 100
},
"runtime_key": "routing.traffic_shift.cluster_tour_8080"
}
},
"route": {
"cluster": "cluster_tour_8080",
"prefix_rewrite": "/",
"priority": null,
"timeout": "3.000s"
}
}
]
}
]
},
"server_name": "envoy",
"stat_prefix": "ingress_http",
"use_remote_address": true,
"xff_num_trusted_hops": 0
},
"name": "envoy.http_connection_manager"
}
],
"use_proxy_proto": false
}
],
"name": "ambassador-listener-8080"
}
]
}
}
Note that this refers to
Note that you get 200 OK with this request.
{
"@type": "/envoy.config.bootstrap.v2.Bootstrap",
"static_resources": {
"clusters": [
{
"connect_timeout": "3.000s",
"dns_lookup_family": "V4_ONLY",
"lb_policy": "ROUND_ROBIN",
"load_assignment": {
"cluster_name": "cluster_extauth_example_auth2_3000",
"endpoints": [
{
"lb_endpoints": [
{
"endpoint": {
"address": {
"socket_address": {
"address": "example-auth2",
"port_value": 3000,
"protocol": "TCP"
}
}
}
}
]
}
]
},
"name": "cluster_extauth_example_auth2_3000",
"type": "STRICT_DNS"
},
{
"connect_timeout": "3.000s",
"dns_lookup_family": "V4_ONLY",
"lb_policy": "ROUND_ROBIN",
"load_assignment": {
"cluster_name": "cluster_tour_8080",
"endpoints": [
{
"lb_endpoints": [
{
"endpoint": {
"address": {
"socket_address": {
"address": "tour",
"port_value": 8080,
"protocol": "TCP"
}
}
}
}
]
}
]
},
"name": "cluster_tour_8080",
"type": "STRICT_DNS"
}
],
"listeners": [
{
"address": {
"socket_address": {
"address": "0.0.0.0",
"port_value": 8080,
"protocol": "TCP"
}
},
"filter_chains": [
{
"filters": [
{
"config": {
"access_log": [
{
"config": {
"format": "ACCESS [%START_TIME%] \"%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%\" %RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT% %DURATION% %RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)% \"%REQ(X-FORWARDED-FOR)%\" \"%REQ(USER-AGENT)%\" \"%REQ(X-REQUEST-ID)%\" \"%REQ(:AUTHORITY)%\" \"%UPSTREAM_HOST%\"\n",
"path": "/dev/fd/1"
},
"name": "envoy.file_access_log"
}
],
"http_filters": [
{
"config": {
"http_service": {
"authorization_request": {
"allowed_headers": {
"patterns": [
{
"exact": "cookie"
},
{
"exact": "x-forwarded-for"
},
{
"exact": "user-agent"
},
{
"exact": "proxy-authorization"
},
{
"exact": "authorization"
},
{
"exact": "from"
},
{
"exact": "x-forwarded-proto"
},
{
"exact": "x-qotm-session"
},
{
"exact": "x-forwarded-host"
}
]
},
"headers_to_add": []
},
"authorization_response": {
"allowed_client_headers": {
"patterns": [
{
"exact": "location"
},
{
"exact": "www-authenticate"
},
{
"exact": "x-qotm-session"
},
{
"exact": "proxy-authenticate"
},
{
"exact": "authorization"
},
{
"exact": "set-cookie"
}
]
},
"allowed_upstream_headers": {
"patterns": [
{
"exact": "location"
},
{
"exact": "www-authenticate"
},
{
"exact": "x-qotm-session"
},
{
"exact": "proxy-authenticate"
},
{
"exact": "authorization"
},
{
"exact": "set-cookie"
}
]
}
},
"path_prefix": "/extauth",
"server_uri": {
"cluster": "cluster_extauth_example_auth2_3000",
"timeout": "5.000s",
"uri": "http://extauth"
}
}
},
"name": "envoy.ext_authz"
},
{
"name": "envoy.cors"
},
{
"name": "envoy.router"
}
],
"http_protocol_options": {
"accept_http_10": false
},
"normalize_path": true,
"route_config": {
"virtual_hosts": [
{
"domains": [
"*"
],
"name": "backend",
"routes": [
{
"match": {
"case_sensitive": true,
"prefix": "/backend/",
"runtime_fraction": {
"default_value": {
"denominator": "HUNDRED",
"numerator": 100
},
"runtime_key": "routing.traffic_shift.cluster_tour_8080"
}
},
"route": {
"cluster": "cluster_tour_8080",
"prefix_rewrite": "/",
"priority": null,
"timeout": "3.000s"
}
}
]
}
]
},
"server_name": "envoy",
"stat_prefix": "ingress_http",
"use_remote_address": true,
"xff_num_trusted_hops": 0
},
"name": "envoy.http_connection_manager"
}
],
"use_proxy_proto": false
}
],
"name": "ambassador-listener-8080"
}
]
}
}
At this point, Envoy crashes when the auth service config is updated. Let me know if you need anything else! Thanks! 👍
I think it's because there is a request (initiated by envoy/source/extensions/filters/common/ext_authz/ext_authz_http_impl.cc Lines 205 to 207 in eff0201
I tried adding a try-catch block around the request:
diff --git a/source/extensions/filters/common/ext_authz/ext_authz_http_impl.cc b/source/extensions/filters/common/ext_authz/ext_authz_http_impl.cc
index b40a927fd..ff44a716a 100644
--- a/source/extensions/filters/common/ext_authz/ext_authz_http_impl.cc
+++ b/source/extensions/filters/common/ext_authz/ext_authz_http_impl.cc
@@ -202,9 +202,14 @@ void RawHttpClientImpl::check(RequestCallbacks& callbacks,
std::make_unique<Buffer::OwnedImpl>(request.attributes().request().http().body());
}
- request_ = cm_.httpAsyncClientForCluster(config_->cluster())
- .send(std::move(message), *this,
- Http::AsyncClient::RequestOptions().setTimeout(config_->timeout()));
+ try {
+ request_ = cm_.httpAsyncClientForCluster(config_->cluster())
+ .send(std::move(message), *this,
+ Http::AsyncClient::RequestOptions().setTimeout(config_->timeout()));
+ } catch (const EnvoyException&) {
+ callbacks_->onComplete(std::make_unique<Response>(errorResponse()));
+ callbacks_ = nullptr;
+ }
}
void RawHttpClientImpl::onSuccess(Http::MessagePtr&& message) {
It seems to remedy the issue, as demonstrated here: https://github.com/dio/update-lds-cds-when-under-load. While this is clearly a hack, I think the right fix is to resolve the upstream "cluster" the way "dynamic_forward_proxy" does.
@containscafeine could you try replacing your Envoy image with the image defined in https://github.com/dio/update-lds-cds-when-under-load/blob/6d9ef4998becf9760177cbbd7bd1eb3b78bfcb64/fixed/docker-compose.yaml#L4? That will give us more ideas. Thank you!
@dio I just tried the binary from that image, and it works perfectly: no more crashes! Thanks for looking into this ;)
@containscafeine yeah, we need a proper fix for this. Do you want to take it? Since today, for the
I find this comment interesting: envoy/api/envoy/api/v2/core/http_uri.proto, lines 30 to 33 in 39a4423
@mattklein123 do you have any suggestions on this?
There are a bunch of examples like this. The general pattern is to get the thread-local cluster from the CM (cluster manager). If it doesn't exist, handle that case; otherwise it should be safe to make an async client call.
@mattklein123 Cool. That will be relatively easy to fix. I mixed this up in my head with this issue: #6099 😅
Description:
This has been tested on master as well as the v1.11.1 release.
Repro steps:
ext_authz
using a controller (in this case Ambassador)
Config:
Logs: