Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation faults in Lua filter (LuaThreadLocal) #10241

Closed
jmuia opened this issue Mar 3, 2020 · 10 comments
Closed

Segmentation faults in Lua filter (LuaThreadLocal) #10241

jmuia opened this issue Mar 3, 2020 · 10 comments

Comments

@jmuia
Copy link
Contributor

jmuia commented Mar 3, 2020

Title: Segmentation faults in Lua filter (LuaThreadLocal)

Description:
Envoy occasionally crashes, which is unexpected.

Repro steps:
Unsure how to reproduce at the moment, it seems spurious (<5 times per day in a QA fleet since collecting core dumps).

It has been observed in:

We haven't collected core dumps from other versions.

Config:
We have various Lua filters configured.

Call Stack:
From the 1.12.2 build described above:

    #0  Envoy::ThreadLocal::Slot::getTyped<Envoy::Extensions::Filters::Common::Lua::ThreadLocalState::LuaThreadLocal> (this=0xe) at bazel-out/k8-opt/bin/external/envoy/include/envoy/thread_local/_virtual_includes/thread_local_interface/envoy/thread_local/thread_local.h:51
    #1  0x000055e16e7980ec in Envoy::Extensions::Filters::Common::Lua::ThreadLocalState::registerType<Envoy::Extensions::Filters::Common::Lua::BufferWrapper>()::{lambda()#1}::operator()() const (__closure=<optimized out>) at bazel-out/k8-opt/bin/external/envoy/source/extensions/filters/common/lua/_virtual_includes/lua_lib/extensions/filters/common/lua/lua.h:390
    #2  std::_Function_handler<void (), Envoy::Extensions::Filters::Common::Lua::ThreadLocalState::registerType<Envoy::Extensions::Filters::Common::Lua::BufferWrapper>()::{lambda()#1}>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/7/bits/std_function.h:316
    #3  0x000055e16ec9b053 in std::function<void ()>::operator()() const (this=0x7f7fb763b850) at /usr/include/c++/7/bits/std_function.h:706
    #4  Envoy::Event::DispatcherImpl::runPostCallbacks (this=this@entry=0x55e17203cdc0) at external/envoy/source/common/event/dispatcher_impl.cc:222
    #5  0x000055e16ec9b14c in Envoy::Event::DispatcherImpl::run (this=0x55e17203cdc0, type=Envoy::Event::Dispatcher::RunType::Block) at external/envoy/source/common/event/dispatcher_impl.cc:193
    #6  0x000055e16ec951fa in Envoy::Server::WorkerImpl::threadRoutine (this=0x55e172222850, guard_dog=...) at external/envoy/source/server/worker_impl.cc:110
    #7  0x000055e16eff01d5 in std::function<void ()>::operator()() const (this=<optimized out>) at /usr/include/c++/7/bits/std_function.h:706
    #8  Envoy::Thread::ThreadImplPosix::<lambda(void*)>::operator() (__closure=0x0, arg=<optimized out>) at external/envoy/source/common/common/posix/thread_impl.cc:33
    #9  Envoy::Thread::ThreadImplPosix::<lambda(void*)>::_FUN(void *) () at external/envoy/source/common/common/posix/thread_impl.cc:35
    #10 0x00007f7fbba2a6ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
    #11 0x00007f7fbb76041d in clone () from /lib/x86_64-linux-gnu/libc.so.6
@mattklein123
Copy link
Member

Is there any more detail you can share about your use of Lua or any of the times that it crashes? Do you use LDS to deliver Lua config? Nothing immediately comes to mind from the core dump. cc @dio

@dio
Copy link
Member

dio commented Mar 4, 2020

ACK, more info on this will be very helpful.

@surki
Copy link
Contributor

surki commented Mar 4, 2020

@mattklein123 @dio I see a similar crash due to SDS problem (when certs are missing), may be related to this? #9765 (comment)

@jmuia do you have listeners with tls context configured?

@mattklein123
Copy link
Member

Yes I think it's possible that the crash is due to listener teardown issues. We just fixed something similar in a different filter (cc @yuval-k), but it would be nice to confirm that is what is happening.

@jmuia
Copy link
Contributor Author

jmuia commented Mar 5, 2020

Is there any more detail you can share about your use of Lua or any of the times that it crashes?

Uses of Lua include:

  • Transforming request headers (eg. setting new request headers based on incoming request headers)
  • Responding to certain healthcheck requests

Timing of crashes:
I need to collect more data here, but the couple of crashes that I've spot checked so far have happened upon a cold start.

...
[2020-03-04 01:38:32.457][2678][info][config] [external/envoy/source/server/listener_manager_impl.cc:707] all dependencies initialized. starting workers
[2020-03-04 01:38:32.479][3604][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:83] Caught Segmentation fault, suspect faulting address 0x0

Do you use LDS to deliver Lua config?

Yep, we use LDS for this.

Nothing immediately comes to mind from the core dump.

If it would be helpful I can provide the backtrace for all threads from the core dump.

do you have listeners with tls context configured?

Yep, the listener that is configured with a Lua filter also configures tls_context.

--

I've attached below the (sanitized) log output prior to a crash:

[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:251] initializing epoch 0 (hot restart version=11.104)
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:253] statically linked extensions:
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.transport_sockets.downstream: envoy.transport_sockets.alts, envoy.transport_sockets.raw_buffer, envoy.transport_sockets.tap, envoy.transport_sockets.tls, raw_buffer, tls
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.filters.http: envoy.buffer, envoy.cors, envoy.csrf, envoy.ext_authz, envoy.fault, envoy.filters.http.adaptive_concurrency, envoy.filters.http.dynamic_forward_proxy, envoy.filters.http.grpc_http1_reverse_bridge, envoy.filters.http.grpc_stats, envoy.filters.http.header_to_metadata, envoy.filters.http.jwt_authn, envoy.filters.http.on_demand, envoy.filters.http.original_src, envoy.filters.http.rbac, envoy.filters.http.tap, envoy.grpc_http1_bridge, envoy.grpc_json_transcoder, envoy.grpc_web, envoy.gzip, envoy.health_check, envoy.http_dynamo_filter, envoy.ip_tagging, envoy.lua, envoy.rate_limit, envoy.router, envoy.squash
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.thrift_proxy.filters: envoy.filters.thrift.rate_limit, envoy.filters.thrift.router
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.dubbo_proxy.protocols: dubbo
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.filters.network: envoy.client_ssl_auth, envoy.echo, envoy.ext_authz, envoy.filters.network.dubbo_proxy, envoy.filters.network.kafka_broker, envoy.filters.network.local_ratelimit, envoy.filters.network.mysql_proxy, envoy.filters.network.rbac, envoy.filters.network.sni_cluster, envoy.filters.network.thrift_proxy, envoy.filters.network.zookeeper_proxy, envoy.http_connection_manager, envoy.mongo_proxy, envoy.ratelimit, envoy.redis_proxy, envoy.tcp_proxy
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.health_checkers: envoy.health_checkers.redis
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.thrift_proxy.protocols: auto, binary, binary/non-strict, compact, twitter
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.access_loggers: envoy.file_access_log, envoy.http_grpc_access_log, envoy.tcp_grpc_access_log
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.retry_priorities: envoy.retry_priorities.previous_priorities
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.resolvers: envoy.ip
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.dubbo_proxy.route_matchers: default
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.filters.udp_listener: envoy.filters.udp_listener.udp_proxy
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.resource_monitors: envoy.resource_monitors.fixed_heap, envoy.resource_monitors.injected_resource
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.udp_listeners: raw_udp_listener
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.filters.listener: envoy.listener.http_inspector, envoy.listener.original_dst, envoy.listener.original_src, envoy.listener.proxy_protocol, envoy.listener.tls_inspector
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.dubbo_proxy.filters: envoy.filters.dubbo.router
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.retry_host_predicates: envoy.retry_host_predicates.omit_canary_hosts, envoy.retry_host_predicates.previous_hosts
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.dubbo_proxy.serializers: dubbo.hessian2
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.thrift_proxy.transports: auto, framed, header, unframed
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.stats_sinks: envoy.dog_statsd, envoy.metrics_service, envoy.stat_sinks.hystrix, envoy.statsd
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.tracers: envoy.dynamic.ot, envoy.lightstep, envoy.tracers.datadog, envoy.tracers.opencensus, envoy.tracers.xray, envoy.zipkin
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.clusters: envoy.cluster.eds, envoy.cluster.logical_dns, envoy.cluster.original_dst, envoy.cluster.static, envoy.cluster.strict_dns, envoy.clusters.aggregate, envoy.clusters.dynamic_forward_proxy, envoy.clusters.redis
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.transport_sockets.upstream: envoy.transport_sockets.alts, envoy.transport_sockets.raw_buffer, envoy.transport_sockets.tap, envoy.transport_sockets.tls, raw_buffer, tls
[2020-03-04 01:38:25.967][2678][info][main] [external/envoy/source/server/server.cc:255]   envoy.grpc_credentials: envoy.grpc_credentials.aws_iam, envoy.grpc_credentials.default, envoy.grpc_credentials.file_based_metadata
[2020-03-04 01:38:26.069][2678][info][main] [external/envoy/source/server/server.cc:336] admin address: IP:PORT
[2020-03-04 01:38:26.078][2678][info][main] [external/envoy/source/server/server.cc:455] runtime: layers:
  - name: static_layer
    static_layer:
      envoy.deprecated_features:envoy.api.v2.Cluster.tls_context: true
[2020-03-04 01:38:26.082][2678][info][config] [external/envoy/source/server/configuration_impl.cc:62] loading 0 static secret(s)
[2020-03-04 01:38:26.082][2678][info][config] [external/envoy/source/server/configuration_impl.cc:68] loading 5 cluster(s)
[2020-03-04 01:38:26.130][2678][info][config] [external/envoy/source/server/configuration_impl.cc:72] loading 0 listener(s)
[2020-03-04 01:38:26.133][2678][info][config] [external/envoy/source/server/configuration_impl.cc:97] loading tracing configuration
[2020-03-04 01:38:26.133][2678][info][config] [external/envoy/source/server/configuration_impl.cc:116] loading stats sink configuration
[2020-03-04 01:38:26.135][2678][info][main] [external/envoy/source/server/server.cc:550] starting main dispatch loop
[2020-03-04 01:38:26.189][2678][info][upstream] [external/envoy/source/common/upstream/cluster_manager_impl.cc:167] cm init: initializing cds
[2020-03-04 01:38:26.369][2678][info][upstream] [external/envoy/source/common/upstream/cds_api_impl.cc:74] cds: add 72 cluster(s), remove 5 cluster(s)
[2020-03-04 01:38:26.418][2678][info][upstream] [external/envoy/source/common/upstream/cds_api_impl.cc:90] cds: add/update cluster 'a-cluster'
<truncated for brevity cds: add/update cluster ....>
[2020-03-04 01:38:26.418][2678][info][upstream] [external/envoy/source/common/upstream/cds_api_impl.cc:90] cds: add/update cluster 'some-cluster'
[2020-03-04 01:38:28.352][2678][info][upstream] [external/envoy/source/common/upstream/cluster_manager_impl.cc:171] cm init: all clusters initialized
[2020-03-04 01:38:28.365][2678][info][main] [external/envoy/source/server/server.cc:529] all clusters initialized. initializing init manager
[2020-03-04 01:38:30.777][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'a-listener'
[2020-03-04 01:38:30.779][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'other-listener'
[2020-03-04 01:38:31.192][2678][info][lua] [external/envoy/source/extensions/filters/http/lua/lua_filter.cc:510] envoy_on_response() function not found. Lua filter will not hook responses.
[2020-03-04 01:38:31.194][2678][info][lua] [external/envoy/source/extensions/filters/http/lua/lua_filter.cc:510] envoy_on_response() function not found. Lua filter will not hook responses.
[2020-03-04 01:38:31.258][2678][info][lua] [external/envoy/source/extensions/filters/http/lua/lua_filter.cc:510] envoy_on_response() function not found. Lua filter will not hook responses.
[2020-03-04 01:38:31.259][2678][info][lua] [external/envoy/source/extensions/filters/http/lua/lua_filter.cc:510] envoy_on_response() function not found. Lua filter will not hook responses.
[2020-03-04 01:38:31.272][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'listener-with-lua-and-tls-context'
[2020-03-04 01:38:31.288][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'another-listener'
[2020-03-04 01:38:31.293][2678][info][lua] [external/envoy/source/extensions/filters/http/lua/lua_filter.cc:510] envoy_on_response() function not found. Lua filter will not hook responses.
[2020-03-04 01:38:31.294][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'another-listener-2'
[2020-03-04 01:38:31.297][2678][info][lua] [external/envoy/source/extensions/filters/http/lua/lua_filter.cc:510] envoy_on_response() function not found. Lua filter will not hook responses.
[2020-03-04 01:38:31.298][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'another-listener-3'
[2020-03-04 01:38:31.299][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'another-listener-4'
[2020-03-04 01:38:31.300][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'another-listener-5'
[2020-03-04 01:38:32.069][2678][info][lua] [external/envoy/source/extensions/filters/http/lua/lua_filter.cc:510] envoy_on_response() function not found. Lua filter will not hook responses.
[2020-03-04 01:38:32.070][2678][info][lua] [external/envoy/source/extensions/filters/http/lua/lua_filter.cc:510] envoy_on_response() function not found. Lua filter will not hook responses.
[2020-03-04 01:38:32.120][2678][info][lua] [external/envoy/source/extensions/filters/http/lua/lua_filter.cc:510] envoy_on_response() function not found. Lua filter will not hook responses.
[2020-03-04 01:38:32.120][2678][info][lua] [external/envoy/source/extensions/filters/http/lua/lua_filter.cc:510] envoy_on_response() function not found. Lua filter will not hook responses.
[2020-03-04 01:38:32.137][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'listener-with-lua-and-tls-context'
[2020-03-04 01:38:32.457][2678][info][config] [external/envoy/source/server/listener_manager_impl.cc:707] all dependencies initialized. starting workers
[2020-03-04 01:38:32.479][3604][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:83] Caught Segmentation fault, suspect faulting address 0x0

@mattklein123
Copy link
Member

Possibly related: #9940

@yuval-k
Copy link
Contributor

yuval-k commented Apr 2, 2020

looking at the code and stack trace i imagine that the issue is capturing this in this line:

[this]() { T::registerType(tls_slot_->getTyped<LuaThreadLocal>().state_.get()); });

an quick solution would be to pass tls_slot_ by copy instead of this
do note that note above is just from reviewing code, I didn't test this.

@wbpcode
Copy link
Member

wbpcode commented Jul 7, 2020

The reason is that when the worker executes registerType, the ThreadLocalState object captured by the callback function has been destructed. I encountered this problem when trying to implement routing granularity Lua scripts #11235. This problem is triggered when you create a ThreadLocalState object during initialization and immediately destruct it. Your Listener listener-with-lua-and-tls-context was updated (but why?) during the initialization phase, which means that the old listener configuration was destructed, so this problem was triggered.

[2020-03-04 01:38:31.272][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'listener-with-lua-and-tls-context'
.....................
[2020-03-04 01:38:32.137][2678][info][upstream] [external/envoy/source/server/lds_api.cc:73] lds: add/update listener 'listener-with-lua-and-tls-context'

@wbpcode
Copy link
Member

wbpcode commented Jul 16, 2020

@jmuia You can check to see if this problem has been fixed by #11944 . If it has been fixed, we can close this issue. If this issue still exists, I hope you can provide more information, such as how to reproduce it.

@jmuia
Copy link
Contributor Author

jmuia commented Dec 15, 2020

Thanks @wbpcode.

The issue was spurious and we never found a repro. I'm not sure yet what conditions changed in our environment, but we stopped triggering this segfault without deploying #11944 internally.

I'll close this issue for now since you've merged #11944 and will re-open if it re-surfaces.

@jmuia jmuia closed this as completed Dec 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants