Distributed Tracing
Distributed tracing is a useful feature in a micro-services environment. Phantom's support for it is based on Twitter's Zipkin, an open source implementation of the system described in the Google Dapper paper. Zipkin itself is implemented in Scala; Phantom uses Brave, a Java distributed tracing implementation that is compatible with Zipkin. The Collector, Query and Web interfaces of Zipkin are deployed as-is, following the instructions in Zipkin Install.
The tracing instrumentation using Brave libraries in Phantom is as shown below:
Phantom can participate in a distributed trace (or initiate one), in which all calls to deployed handlers are instrumented to emit spans. The core libraries take care of copying relevant span information across threads (Netty worker, Hystrix command processor), and this support extends to the Http, Thrift and Command protocols. For Http, the trace and parent span information is also copied onto the Http request headers so that downstream services invoked by Phantom handlers can participate in the trace initiated by Phantom or its client.
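The two kinds of copying described above (across threads, and onto outgoing Http headers) can be sketched in plain Java. This is a minimal illustration and not Phantom's or Brave's actual code: the class TracePropagationSketch, the TraceContext holder, the ACTIVE thread-local and the helper names are all hypothetical. The X-B3-TraceId, X-B3-SpanId and X-B3-ParentSpanId header names are the ones commonly used by Zipkin-compatible tracers and are assumed here; X-B3-Sampled with true/false values follows the usage described later on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Illustrative sketch only: hypothetical names, not Phantom's or Brave's classes. */
final class TracePropagationSketch {

    /** Immutable holder for the identifiers that describe the active span. */
    static final class TraceContext {
        final String traceId;
        final String spanId;
        final String parentSpanId; // null for a root span
        final boolean sampled;

        TraceContext(String traceId, String spanId, String parentSpanId, boolean sampled) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
            this.sampled = sampled;
        }
    }

    /** Trace state bound to the current thread (assumed stand-in for the tracer's own state). */
    static final ThreadLocal<TraceContext> ACTIVE = new ThreadLocal<>();

    /** Captures the caller's context and re-installs it on whichever thread runs the task. */
    static <T> Callable<T> withCurrentTrace(Callable<T> task) {
        final TraceContext captured = ACTIVE.get();
        return () -> {
            TraceContext previous = ACTIVE.get();
            ACTIVE.set(captured);
            try {
                return task.call();
            } finally {
                ACTIVE.set(previous); // do not leak context into pooled threads
            }
        };
    }

    /** Builds the headers to place on an outgoing Http request for the given context. */
    static Map<String, String> b3Headers(TraceContext ctx) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (ctx == null) {
            return headers; // no active trace, nothing to propagate
        }
        headers.put("X-B3-TraceId", ctx.traceId);
        headers.put("X-B3-SpanId", ctx.spanId);
        if (ctx.parentSpanId != null) {
            headers.put("X-B3-ParentSpanId", ctx.parentSpanId);
        }
        headers.put("X-B3-Sampled", String.valueOf(ctx.sampled));
        return headers;
    }

    /** Sets a context on the calling thread and reads it back from a worker thread. */
    static Map<String, String> demo() {
        ACTIVE.set(new TraceContext("463ac35c9f6413ad", "a2fb4a1d1a96d312", null, true));
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Map<String, String>> task = withCurrentTrace(() -> b3Headers(ACTIVE.get()));
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
            ACTIVE.remove();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Wrapping the task at submission time is one simple way to carry thread-bound state into a worker pool; without the wrapper, the worker thread in demo() would see no active trace and would produce no headers.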
The following core components in Phantom are instrumented in order to support tracing:
- Socket Listeners i.e. Netty Channel Handlers - these components are instrumented to start a new Server trace, disable tracing for the request, or participate in a trace that was initiated by the client application. They also take care of copying active trace related information to handler requests that originate here.
- Executors i.e. Hystrix Command wrappers for handlers - these components are instrumented to emit client traces for all outgoing calls to services. The Http implementation additionally copies active trace information onto request headers.
- Task context (as described in Nesting calls) - copies active trace information to nested calls on handlers when TaskResult executeCommand(String commandName, byte[] data, Map<String,String> params) is called. A sample trace with nested calls is shown below:
Phantom also provides support for initializing traces in Java Servlet containers. This is available as a servlet filter, ServletTraceFilter, that may be suitably initialized and configured using standard filter definitions in web.xml deployment configuration files. This feature is useful when tracing Web APIs using Phantom, as explained below:
- Edit the web application/API web.xml and register the Phantom web app event listener (WebContextLoaderListener) as follows:
<listener>
    <listener-class>com.flipkart.phantom.runtime.impl.spring.web.WebContextLoaderListener</listener-class>
</listener>
This listener looks for a file named common-web-config.xml in the configuration deployment directory, i.e. /resources/external, and if present, creates a Spring Application Context containing common proxy and web beans that serves as the parent for the concerned web application. common-web-config.xml contains the tracing filter declaration and the other event producer-consumer beans required by tracing. A sample is available here: Web API Tracing
- Edit the web application/API web.xml again, add the Spring DelegatingFilterProxy as a filter and specify the URL pattern mapping:
<filter>
    <filter-name>ServletTraceFilter</filter-name>
    <filter-class>org.springframework.web.filter.DelegatingFilterProxy</filter-class>
</filter>
<filter-mapping>
    <filter-name>ServletTraceFilter</filter-name>
    <url-pattern>/apis</url-pattern>
</filter-mapping>
This ensures that all requests matching the URL pattern /apis are intercepted and traced by the Phantom tracing filter.
Phantom uses the Trooper Event framework libraries to publish service proxy events to logical VM endpoints like evt://com.flipkart.phantom.events.HTTP_HANDLER. The events contain useful information about Hystrix events like Success, Failure, Fallback, Thread pool rejections etc.
The events are consumed from endpoints by consumers like the RequestLogger, which logs the events to file.
The tracing implementation, i.e. a Brave SpanCollector, piggybacks on this implementation to emit service proxy events containing Zipkin Span information to the logical VM endpoint evt://com.flipkart.phantom.events.TRACING_COLLECTOR. This indirection provides a useful abstraction between span emitting and collection/forwarding.
The PushToZipkinEventConsumer class is an Event consumer that consumes the emitted Span data from the evt://com.flipkart.phantom.events.TRACING_COLLECTOR endpoint and forwards it to any Thrift interface that can receive and store Span data, such as the Brave Flume agent or the Zipkin Collector process.
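The indirection between span emitting and forwarding can be pictured with a toy publish/subscribe registry keyed by endpoint name. This is only a sketch of the idea: the EventBus class below is a hypothetical stand-in and does not reflect the Trooper Event framework API, and the string payloads stand in for real Span objects. Only the two endpoint names are taken from this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Illustrative sketch only: a toy event bus, not the Trooper Event framework API. */
final class SpanEventSketch {

    static final String TRACING_ENDPOINT = "evt://com.flipkart.phantom.events.TRACING_COLLECTOR";
    static final String HTTP_ENDPOINT = "evt://com.flipkart.phantom.events.HTTP_HANDLER";

    /** Minimal endpoint-keyed publish/subscribe registry (hypothetical). */
    static final class EventBus {
        private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

        void subscribe(String endpoint, Consumer<String> consumer) {
            subscribers.computeIfAbsent(endpoint, key -> new ArrayList<>()).add(consumer);
        }

        void publish(String endpoint, String payload) {
            List<Consumer<String>> targets =
                    subscribers.getOrDefault(endpoint, Collections.emptyList());
            for (Consumer<String> consumer : targets) {
                consumer.accept(payload);
            }
        }
    }

    /** Publishes to two endpoints and returns what the tracing consumer forwarded. */
    static List<String> demo() {
        EventBus bus = new EventBus();
        List<String> forwarded = new ArrayList<>();

        // Plays the role of PushToZipkinEventConsumer: subscribes to the tracing
        // endpoint and forwards whatever span data arrives there.
        bus.subscribe(TRACING_ENDPOINT, span -> forwarded.add("forwarded:" + span));

        // Plays the role of the span collector: publishes without knowing who listens.
        bus.publish(TRACING_ENDPOINT, "span-1");

        // Events on other endpoints never reach the tracing consumer.
        bus.publish(HTTP_ENDPOINT, "hystrix-success");
        return forwarded;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the publisher only knows the endpoint name, the forwarding target (a Flume agent, a Zipkin Collector, or a test double) can be swapped by changing which consumer subscribes, without touching the instrumentation.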
A sample configuration for using this span collector is available in Http proxy with tracing as shown below:
<bean id="zipkinCollector" class="com.flipkart.phantom.event.consumer.PushToZipkinEventConsumer">
    <property name="requestLogger" ref="commonRequestLogger"/>
    <property name="spanCollector">
        <bean class="com.flipkart.phantom.task.impl.collector.DelegatingZipkinSpanCollector">
            <property name="zipkinCollectorHost" value="localhost"/>
            <property name="zipkinCollectorPort" value="9410"/>
        </bean>
    </property>
    <property name="subscriptions">
        <list>
            <value>evt://com.flipkart.phantom.events.TRACING_COLLECTOR</value>
        </list>
    </property>
</bean>
Sampling of requests is essential for low-overhead distributed tracing, as described by Zipkin and Dapper. It is also what makes it possible to leave tracing instrumentation turned on by default in production deployments while incurring minimal overhead in data capture and storage. Sampling is controlled using one or both of the mechanisms described below:
- For traces originating outside Phantom (say in the client app) - In this scenario, the trace is initiated externally and Phantom only participates in it by emitting spans for handler calls. Tracing can be switched on or off by setting the request header attribute X-B3-Sampled to true or false. Additionally, each deployed handler may be configured with a Brave TraceFilter implementation to override emitting spans for that handler.
- For traces originating within Phantom (say when a Http, Thrift or Command proxy is deployed) - This is controlled by configuring a Brave TraceFilter implementation on Phantom's protocol specific Netty channel handlers, i.e. RoutingHttpChannelHandler or one of its sub-types, AsyncCommandProcessingChannelHandler, CommandProcessingChannelHandler and ThriftChannelHandler. Tracing is turned off by default for these handlers. It may be turned on with a suitable implementation like the Brave ZooKeeper based tracing filter ZooKeeperSamplingTraceFilter, or a simple FixedSampleRateTraceFilter as in the Http proxy with tracing example shown below:
<bean id="httpRequestHandler" class="com.flipkart.phantom.runtime.impl.server.netty.handler.http.HttpChannelHandler" scope="prototype">
    <property name="defaultChannelGroup" ref="defaultChannelGroup"/>
    <property name="repository" ref="httpProxyRepository"/>
    <property name="defaultProxy" value="defaultProxy" />
    <property name="eventProducer" ref="serviceProxyEventProducer"/>
    <property name="eventDispatchingSpanCollector" ref="eventDispatchingSpanCollector"/>
    <property name="traceFilter">
        <!-- Trace every call -->
        <bean class="com.github.kristofa.brave.FixedSampleRateTraceFilter">
            <constructor-arg index="0" value="1"/>
        </bean>
    </property>
</bean>
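The two sampling mechanisms can be combined into a single decision, sketched below in plain Java. This is an illustration under stated assumptions, not Brave's or Phantom's implementation: the SamplingSketch and EveryNthSampler names are hypothetical, and the rate semantics used here (0 or less disables tracing, 1 traces every request, N traces one request in every N) are chosen for the example rather than taken from the FixedSampleRateTraceFilter source.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative sketch only: hypothetical names, not Brave's TraceFilter API. */
final class SamplingSketch {

    /** Traces one request in every sampleRate; assumed semantics, see lead-in. */
    static final class EveryNthSampler {
        private final int sampleRate;
        private final AtomicLong seen = new AtomicLong();

        EveryNthSampler(int sampleRate) {
            this.sampleRate = sampleRate;
        }

        boolean shouldTrace() {
            if (sampleRate <= 0) {
                return false; // tracing disabled
            }
            // With sampleRate == 1 every count is a multiple, so every request is traced.
            return seen.incrementAndGet() % sampleRate == 0;
        }
    }

    /**
     * Honors the caller's X-B3-Sampled header value when one is present (trace
     * originated outside); otherwise asks the local sampler (trace originates here).
     */
    static boolean decide(String sampledHeader, EveryNthSampler localSampler) {
        if (sampledHeader != null) {
            return Boolean.parseBoolean(sampledHeader.trim());
        }
        return localSampler.shouldTrace();
    }

    /** Counts how many of the given number of header-less requests would be traced. */
    static int countTraced(int sampleRate, int requests) {
        EveryNthSampler sampler = new EveryNthSampler(sampleRate);
        int traced = 0;
        for (int i = 0; i < requests; i++) {
            if (decide(null, sampler)) {
                traced++;
            }
        }
        return traced;
    }

    public static void main(String[] args) {
        System.out.println(countTraced(10, 100));
        System.out.println(decide("false", new EveryNthSampler(1)));
    }
}
```

A counter-based sampler keeps the per-request cost to one atomic increment, which fits the goal of leaving instrumentation enabled in production; a probabilistic or centrally configured sampler would be an equally valid choice.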