Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XRay Tracer: Segfault in finishSpan #10142

Closed
dpetzold opened this issue Feb 23, 2020 · 6 comments
Closed

XRay Tracer: Segfault in finishSpan #10142

dpetzold opened this issue Feb 23, 2020 · 6 comments
Assignees

Comments

@dpetzold
Copy link

dpetzold commented Feb 23, 2020

Title: "Segfault in Envoy::Extensions::Tracers::XRay::Span::finishSpan"*

Description:
Hello. After enabling the xray tracer Envoy segfaults on the first request. I was able to determine the fault occurs at this line:

https://github.com/envoyproxy/envoy/blob/master/source/extensions/tracers/xray/tracer.cc#L71

The code is covered here:

https://github.com/envoyproxy/envoy/blob/master/test/extensions/tracers/xray/tracer_test.cc

which passes for me locally. So I am at a loss how to duplicate via test. Removing the validation code here:

https://github.com/envoyproxy/envoy/blob/master/source/extensions/tracers/xray/daemon.proto

Fixed the segfault but I know that is not the correct solution and I have unfortunately hit my knowledge limit. The stacktrace is from a dbg build using master. I will be happy to look at this more with some guidance. Thank you.

Repro steps:
Enable the tracer using the sample config below. Make a request.

Config:

admin:                   
  access_log_path: /tmp/admin_access.log
  address:                         
    socket_address:
      protocol: TCP
      address: 0.0.0.0    
      port_value: 9901                    
                                                 
dynamic_resources:
  cds_config:          
    path: /etc/envoy/clusters.yaml
                                                 
stats_sinks:                           
  - name: envoy.statsd   
    typed_config:                                                                                 
      "@type": type.googleapis.com/envoy.config.metrics.v2.StatsdSink
      prefix: envoy.local
      address:         
        socket_address:                  
          address: 172.22.0.4
          port_value: 8125                                                                        
          protocol: UDP      
                                                 
tracing:                        
  http:                                    
    name: envoy.tracers.xray
    typed_config:                          
      "@type": type.googleapis.com/envoy.config.trace.v3.XRayConfig
      daemon_endpoint:       
        address: 172.22.0.3          
        port_value: 2000
        protocol: UDP

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        protocol: TCP
        address: 0.0.0.0
        port_value: 443
    traffic_direction: OUTBOUND
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager
          generate_request_id: true
          tracing:
            verbose: true
          stat_prefix: ingress_http
          rds:
            config_source:
              path: /etc/envoy/routes.yaml
            route_config_name: local_route

          http_filters:
          - name: envoy.router
          access_log:
          - name: envoy.file_access_log
            typed_config:
              '@type': type.googleapis.com/envoy.config.accesslog.v2.FileAccessLog
              path: /dev/stdout

      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.api.v2.auth.DownstreamTlsContext
          common_tls_context:
            tls_certificates:
              certificate_chain:
                filename: /etc/ssl/cert.pem
              private_key:
                filename: /etc/ssl/cert.key

Call Stack:

#0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x000055d26f753f5f in Envoy::SignalAction::sigHandler (sig=6, info=0x7f5d8270f1f0, context=0x7f5d8270f0c0) at source/common/signal/signal_action.cc:74
#2  <signal handler called>
#3  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#4  0x00007f5d84f9f899 in __GI_abort () at abort.c:79
#5  0x000055d26f75713d in Envoy::TerminateHandler::<lambda()>::operator()(void) const (__closure=0x0) at source/exe/terminate_handler.cc:15
#6  0x000055d26f75714f in Envoy::TerminateHandler::<lambda()>::_FUN(void) () at source/exe/terminate_handler.cc:16
#7  0x000055d27016201c in __cxxabiv1::__terminate(void (*)()) ()
#8  0x000055d2701feea9 in __cxa_call_terminate ()
#9  0x000055d270161cc4 in __gxx_personality_v0 ()
#10 0x000055d27021232b in _Unwind_RaiseException_Phase2 ()
#11 0x000055d270212d1a in _Unwind_Resume ()
#12 0x000055d26ebce445 in Envoy::Extensions::Tracers::XRay::Span::finishSpan (this=0x55d273b349a0) at source/extensions/tracers/xray/tracer.cc:71
#13 0x000055d26f41e955 in Envoy::Tracing::HttpTracerUtility::finalizeDownstreamSpan (span=..., request_headers=0x55d273b21600, response_headers=0x55d273b21b80, response_trailers=0x0, 
    stream_info=..., tracing_config=...) at source/common/tracing/http_tracer_impl.cc:208
#14 0x000055d26f300dd8 in Envoy::Http::ConnectionManagerImpl::ActiveStream::~ActiveStream (this=0x55d273bc2d80, __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
    at source/common/http/conn_manager_impl.cc:602
#15 0x000055d26f301200 in Envoy::Http::ConnectionManagerImpl::ActiveStream::~ActiveStream (this=0x55d273bc2d80, __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
    at source/common/http/conn_manager_impl.cc:611
#16 0x000055d26da0f486 in std::default_delete<Envoy::Event::DeferredDeletable>::operator() (this=0x55d273c0df78, __ptr=0x55d273bc2d80) at /usr/include/c++/9/bits/unique_ptr.h:81
#17 0x000055d26ee21033 in std::unique_ptr<Envoy::Event::DeferredDeletable, std::default_delete<Envoy::Event::DeferredDeletable> >::reset (this=0x55d273c0df78, __p=0x55d273bc2d80)
    at /usr/include/c++/9/bits/unique_ptr.h:394
#18 0x000055d26ee1e634 in Envoy::Event::DispatcherImpl::clearDeferredDeleteList (this=0x55d2737466e0) at source/common/event/dispatcher_impl.cc:95
#19 0x000055d26ee1d911 in Envoy::Event::DispatcherImpl::<lambda()>::operator()(void) const (__closure=0x55d27398daa0) at source/common/event/dispatcher_impl.cc:42
#20 0x000055d26ee1fe4f in std::_Function_handler<void(), Envoy::Event::DispatcherImpl::DispatcherImpl(Envoy::Buffer::WatermarkFactoryPtr&&, Envoy::Api::Api&, Envoy::Event::TimeSystem&)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/9/bits/std_function.h:300
#21 0x000055d26d8d0e38 in std::function<void ()>::operator()() const (this=0x55d27398daa0) at /usr/include/c++/9/bits/std_function.h:690
#22 0x000055d26ee4a38e in Envoy::Event::TimerImpl::<lambda(int, short int, void*)>::operator()(int, short, void *) const (__closure=0x0, arg=0x55d27398da20)
    at source/common/event/timer_impl.cc:15
#23 0x000055d26ee4a44f in Envoy::Event::TimerImpl::<lambda(int, short int, void*)>::_FUN(int, short, void *) () at source/common/event/timer_impl.cc:15
#24 0x000055d26f4efc78 in event_process_active_single_queue (base=0x55d2737fa840, activeq=0x55d273750750, max_to_process=2147483647, endtime=0x0)
    at /home/derrick/.cache/bazel/_bazel_derrick/acfe07f95c4ac7685cbaf68a9a7df9a3/sandbox/linux-sandbox/2313/execroot/envoy/external/com_github_libevent_libevent/event.c:1713
#25 0x000055d26f4f0162 in event_process_active (base=0x55d2737fa840)
    at /home/derrick/.cache/bazel/_bazel_derrick/acfe07f95c4ac7685cbaf68a9a7df9a3/sandbox/linux-sandbox/2313/execroot/envoy/external/com_github_libevent_libevent/event.c:1805
#26 0x000055d26f4f0a51 in event_base_loop (base=0x55d2737fa840, flags=0)
    at /home/derrick/.cache/bazel/_bazel_derrick/acfe07f95c4ac7685cbaf68a9a7df9a3/sandbox/linux-sandbox/2313/execroot/envoy/external/com_github_libevent_libevent/event.c:2047
#27 0x000055d26ee48f1f in Envoy::Event::LibeventScheduler::run (this=0x55d273746730, mode=Envoy::Event::Dispatcher::RunType::Block) at source/common/event/libevent_scheduler.cc:47
#28 0x000055d26ee1fa81 in Envoy::Event::DispatcherImpl::run (this=0x55d2737466e0, type=Envoy::Event::Dispatcher::RunType::Block) at source/common/event/dispatcher_impl.cc:196
#29 0x000055d26ee056e4 in Envoy::Server::WorkerImpl::threadRoutine (this=0x55d273983ce0, guard_dog=...) at source/server/worker_impl.cc:110
#30 0x000055d26ee04f52 in Envoy::Server::WorkerImpl::<lambda()>::operator()(void) const (__closure=0x55d273c68938) at source/server/worker_impl.cc:75
#31 0x000055d26ee05fcb in std::_Function_handler<void(), Envoy::Server::WorkerImpl::start(Envoy::Server::GuardDog&)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at /usr/include/c++/9/bits/std_function.h:300
#32 0x000055d26d8d0e38 in std::function<void ()>::operator()() const (this=0x55d273c68938) at /usr/include/c++/9/bits/std_function.h:690
#33 0x000055d26f7b35c0 in Envoy::Thread::ThreadImplPosix::<lambda(void*)>::operator()(void *) const (__closure=0x0, arg=0x55d273c68930) at source/common/common/posix/thread_impl.cc:33
#34 0x000055d26f7b35e8 in Envoy::Thread::ThreadImplPosix::<lambda(void*)>::_FUN(void *) () at source/common/common/posix/thread_impl.cc:35
#35 0x00007f5d85174669 in start_thread (arg=<optimized out>) at pthread_create.c:479
#36 0x00007f5d8509c323 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
@mattklein123
Copy link
Member

cc @marcomagdy

@marcomagdy
Copy link
Contributor

marcomagdy commented Feb 24, 2020

@dpetzold we've communicated on twitter about this, but I'll put it here for posterity. Can you try to narrow down the validation field that's causing the error? comment out individual validators in the daemon.proto file and see which one is the offender. I'll try to reproduce the issue on my end.

@dpetzold
Copy link
Author

@marcomagdy I can confirm is the the validation on the name. Below is the output from the xray daemon when the name validation is commented out. I'll look into why its not being set.

xray-daemon_1  | 2020-02-24T19:38:41Z [Debug] processor: sending partial batch
xray-daemon_1  | 2020-02-24T19:38:41Z [Debug] processor: segment batch size: 1. capacity: 50
xray-daemon_1  | 2020-02-24T19:38:41Z [Info] Successfully sent batch of 1 segments (0.443 seconds)
xray-daemon_1  | 2020-02-24T19:38:41Z [Error] Unprocessed segment: {
xray-daemon_1  |   ErrorCode: "MissingName",
xray-daemon_1  |   Id: "0000023a103d126e",
xray-daemon_1  |   Message: "Invalid segment. ErrorCode: MissingName"
xray-daemon_1  | }

@dpetzold
Copy link
Author

dpetzold commented Feb 24, 2020

Ok so the name wasn't being set since segment_name wasn't defined in the config. I think it should warn or default. Maybe both?

@marcomagdy
Copy link
Contributor

Thanks. I'll have a PR with a fix up shortly.

@dallasanta
Copy link

So the final fix for this is to make sure segment_name is not empty (#10520)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants