Skip to content
This repository has been archived by the owner on Feb 8, 2023. It is now read-only.

OpenTracing API & IPFS #277

Open
frrist opened this issue Jan 9, 2018 · 4 comments
Open

OpenTracing API & IPFS #277

frrist opened this issue Jan 9, 2018 · 4 comments

Comments

@frrist
Copy link
Member

frrist commented Jan 9, 2018

OpenTracing API

Goals from reading:

  • Understand what OpenTracing is and how its works at a high level
  • Understand what a Tracer is and how it works at a high level
  • Understand how OpenTracing is Instramented in the go-ipfs code base
  • Understand how Tracers can be instramented with the go-ipfs code base

Why use OpenTracing?

Tracing can reconstruct the different journeys operations take as they propagate throughout a system. OpenTracing is a standard mechanism used to instrument tracing on a system without binding to any particular tracing vendor. This allows every component of a distributed system to be instrumented in isolation and for different tracing technologies to be swapped out via a single configuration change [1].

Some overloaded terms

  • Tracing
    • Tracing involves a specialized use of logging to record information about a program's execution. [2]
  • Tracer
    • A tracing system, Jaeger, Zipkin, Appdash, etc.
  • Trace
    • The story of a transaction or workflow as it propagates through a (potentially distributed) system. [3]

What is OpenTracing?

What should I know about OpenTracing?

Terminology & Concepts

Everthing below is covered in the above links, this is an attempt at a summary

Trace

  • Directed Acyclic Graph consisting of Spans
  • Each edged between a Spans is a Reference

Span

  • Operation Name
  • Start Timestamp
  • End Timestamp
  • Set of zero or more Span Tags ⚠️ (should be added carefully)
  • Set of zero or more Span Logs
    • Key Value pairs + a Timestamp
    • Think file or or errorMessage
  • Set of zero or more References
    • Model directed causal relationships between a child Span and a parent Span
  • SpanContext
    • Any OpenTracing-implementation-dependent state needed to refer to a distinct Span across a process boundaries
  • Baggage Item
    • Key Value pairs that cross process boundaries

Reference

Model direct casual relationships between a child Span and a parent Span

  • ChildOf
    • The parent Span depends on the child Span in some capacity.
  • FollowsFrom
    • The parent Span does not depend in any way on the result of the child Span

Tracer

  • A Tracer is an Interface that creates Span's and understands how to Inject and Extract them across process boundaries (Carrier)
  • Capabilities
    • Start new Span
    • Inject SpanContext into a carrier
    • Extract SpanContext from a carrier
  • Common Tracers

Carrier

TL;DR: A Trace is the story of an actions execution written and viewed with a Tracer where each chapter is a Span

Pictures and Diagrams

Causal relationships between Spans in a single Trace:
        [Span A]  ←←←(the root span)
            |
     +------+------+
     |             |
 [Span B]      [Span C] ←←←(Span C is a `ChildOf` Span A)
     |             |
 [Span D]      +---+-------+
               |           |
           [Span E]    [Span F] >>> [Span G] >>> [Span H]
                                       ↑
                                       ↑
                         (Span G `FollowsFrom` Span F)
Temporal relationships between Spans in a single Trace:
––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|> time
 [Span A···················································]
   [Span B··············································]
      [Span D··········································]
    [Span C········································]
         [Span E·······]        [Span F··] [Span G··] [Span H··]
Trace JSON snippet:
  "data": [
    {
      "traceID": "925650bd537d3c3",
      "spans": [
        {
          "traceID": "925650bd537d3c3",
          "spanID": "5f932c2028df4d75",
          "flags": 1,
          "operationName": "AddFile",
          "references": [
            {
              "refType": "CHILD_OF",
              "traceID": "925650bd537d3c3",
              "spanID": "925650bd537d3c3"
            }
          ],
          "startTime": 1515512927406588,
          "duration": 57,
          "tags": [
            {
              "key": "component",
              "type": "string",
              "value": "coreunix"
            }
          ],
          "logs": [
            {
              "timestamp": 1515512927406644,
              "fields": [
                {
                  "key": "file",
                  "type": "string",
                  "value": "img"
                }
              ]
            },
            {
              "timestamp": 1515512927406645,
              "fields": [
                {
                  "key": "duration",
                  "type": "string",
                  "value": "55.764µs"
                }
              ]
            }
          ],
          "processID": "p1",
          "warnings": null
        },
        {...Span...},
        {...Span...},
        ],
        "processes": {
        "p1": {
          "serviceName": "IPFS-NODE-ID",
          "tags": [
            {
              "key": "hostname",
              "type": "string",
              "value": "blue.travis.fyi"
            },
            {
              "key": "ip",
              "type": "string",
              "value": "192.168.0.200"
            },
            {
              "key": "jaeger.version",
              "type": "string",
              "value": "Go-2.11.1dev0"
            },
        {},
        {},
        ]

How Does go-ipfs Instrument OpenTracing?

Plugin Interface

(currently in feat/opentrace branch)

github.com/ipfs/go-ipfs/plugin/tracer.go

// PluginTracer is an interface that can be implemented to add a tracer plugin
type PluginTracer interface {
        Plugin
        InitTracer() (opentracing.Tracer, error)
}

go-jaeger-plugin

go-jaeger-plugin implements the PluginTracer Interface, and configures a tracer

github.com/ipfs/go-jaeger-plugin/plugin/jaeger.go

func (*jaegerPlugin) InitGlobalTracer() (opentracing.Tracer, error) {

        tracerCfg := &config.Configuration{
                Sampler: &config.SamplerConfig{
                        // Type specifies the type of the sampler: const, probabilistic, rateLimiting, or remote
                        Type: "const",

                        // "const" sampler, 0 or 1 for always false/true respectively
                        // "probabilistic" sampler, a probability between 0 and 1
                        // "rateLimiting" sampler, the number of spans per second
                        // "remote" sampler, param is the same as for "probabilistic" and indicates the initial
                        //  sampling rate before the actual one is received from the mothership
                        Param: 1,

                        // SamplingServerURL is the address of jaeger-agent's HTTP sampling server
                        // SamplingServerURL: "",

                        // MaxOperations is the maximum number of operations that the sampler
                        // will keep track of. If an operation is not tracked, a default probabilistic
                        // sampler will be used rather than the per operation specific sampler.
                        // MaxOperations: 0,

                        // SamplingRefreshInterval controls how often the remotely controlled sampler will poll
                        // jaeger-agent for the appropriate sampling strategy.
                        // SamplingRefreshInterval: 0,
                },

                Reporter: &config.ReporterConfig{
                        // QueueSize controls how many spans the reporter can keep in memory before it starts
                        // dropping new spans. The queue is continuously drained by a background go-routine,
                        // as fast as spans can be sent out of process
                        // QueueSize: 0,

                        // BufferFlushInterval controls how often the buffer is force-flushed, even if it's not full.
                        // It is generally not useful, as it only matters for very low traffic services.
                        // BufferFlushInterval: 0,

                        // LogSpans, when true, enables LoggingReporter that runs in parallel with the main reporter
                        // and logs all submitted spans. Main Configuration.Logger must be initialized in the code for
                        // this option to have any effect.
                        LogSpans: true,

                        // LocalAgentHostPort instructs reporter to send spans to jaeger-agent at this address
                        LocalAgentHostPort: "",
                },
        }
        // New creates a new Jaeger Tracer, and a closer func that can be used to flush buffers before shutdown.
        //TODO: Get the actual IPFS Node Id
        //TODO: Handle closer gracefully
        tracer, _, err := tracerCfg.New("IPFS-NODE-ID")
        if err != nil {
                return nil, err
        }
        return tracer, nil
}

Me made a tracer!

go-ipfs/plugins Initializer

(currently in feat/opentrace branch)

The plugins initializer in go-ipfs will call the tracers InitTracer method and set the returned tracer as the GlobalTracer via an OpenTracing API call.

github.com/ipfs/go-ipfs/plugin/loader/initializer.go

func runTracerPlugin(pl plugin.PluginTracer) error {
        tracer, err := pl.InitTracer()
        if err != nil {
                return err
        }
        // SetGlobalTracer sets the [singleton] opentracing.Tracer returned by
        // GlobalTracer(). Those who use GlobalTracer (rather than directly manage an
        // opentracing.Tracer instance) should call SetGlobalTracer as early as
        // possible in main(), prior to calling the `StartSpan` global func below.
        // Prior to calling `SetGlobalTracer`, any Spans started via the `StartSpan`
        // (etc) globals are noops.
        opentracing.SetGlobalTracer(tracer)
        return nil
}

We set the tracer as the GlobalTracer!

ServeHTTP (Root Span Created)

(currently in feat/opentrace branch)

When a request comes into the ServeHTTP method, a rootSpan is created, tagged accordingly, and Finish'd when handling of the request is complete.

func (i internalHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
        ...
        rootSpan := opentracing.StartSpan("ServeHTTP")
        defer rootSpan.Finish()

        otExt.HTTPUrl.Set(rootSpan, r.URL.String())
        otExt.HTTPMethod.Set(rootSpan, r.Method)
        rootSpan.SetTag("requestId", reqId["requestId"])

        // All spans created after this point will be children of rootSpan
        ctx = opentracing.ContextWithSpan(ctx, rootSpan)
        ...  
}
 

We created a rootSpan and associated it with a context!

go-log

Spans are created when Events are done. Because we set the tracer as GlobalTracer earlier opentracing will create all traces using that tracer. Since we previously associated a Span with a context in ServeHTTP any Spans created here that have been associated with that context will have references to other Spans created during the request.

func (el *eventLogger) EventBegin(ctx context.Context, event string, metadata ...Loggable) *EventInProgress {
        ...
        // StartSpanFromContext starts and returns a Span with `operationName`, using
        // any Span found within `ctx` as a ChildOfRef. If no such parent could be
        // found, StartSpanFromContext creates a root (parentless) Span.
        span, ctx := opentrace.StartSpanFromContext(ctx, event)
        ...
        eip.doneFunc = func(additional []Loggable) {
                ...
                // if the GlobalTracer == noopTracer stahp
                if traceingDisabled() {
                        return
                }
                // SpanTag the system
                otExt.Component.Set(span, el.system)
                for _, m := range metadata {
                        for l, v := range m.Loggable() {
                                if l == "error" {
                                        //SpanTag the error
                                        otExt.Error.Set(span, true)
                                }
                                //extract field to SpanLog
                                f := getOpentracingField(l, v)
                                span.LogFields(f)
                        }
                }
                //when the event-log done function is called finished the Span
                span.Finish()
        }
        ...
}

We can log with go-log and generate spans for traces with the GlobalTracer!

What does this look like in practice?

  • Jaeger Dashboard UI
    Jaeger Dashboard
  • Trace of ipfs add -r dir/ command
    Trace of ipfs add -r dir/
  • Expanded Spans with metadata
    Outter Trace with Spans
  • Expanded rootSpan with metadata
    Span with Tags and Logs
  • Expanded Span with error
    selection_005

Where do we go next?

  • Carrier for SpanContext propagation across IPFS nodes (Currently only works on single node)
    • Auxiliary field in protobuf maybe?
  • Improve Event logging in critical paths
    • Event logs on exported Interface methods could make comparing different implementations easier
  • Support for different implementations
    • Opentracing has only been instrumented on go-ipfs
  • Performance Considerations
    • Play with sampling config
  • Consider adding to CI runs?
  • Given that we can get the trace json object and add it to ipfs, we could consider creating an ipfs webapp to render them.
  • ⚠️ What should be a SpanLog vs. SpanTag ⚠️ this is a place where ambiguity can occur.
@daviddias
Copy link
Member

@jacobheun, @alanshaw getting this on your radar for the objectives in js-ipfs and js-libp2p error handling, monitoring and debugging.

@lanzafame
Copy link

@frrist What are your thoughts on using OpenCensus which provides both the OpenTracing API and tags and metrics?

@frrist
Copy link
Member Author

frrist commented Jul 12, 2018

OpenCensus looks really interesting (this is the first I've heard of it), it appears to do everything OpenTracing does, plus more. However, I haven't seen any information that says the OT and OC API's are compatible; I think they are similar concepts with different specs. I also don't see JavaScript listed under supported languages for OC, although there is a GH repo so maybe its on the way. What OT and OC would look like working together is a bit fuzzy to me - although I think it's probably doable.

Have you played around with OC at all?

@lanzafame
Copy link

However, I haven't seen any information that says the OT and OC API's are compatible

sorry, I misspoke, OC tracing outputs are compatible with OT backends, which is the 'promise' of OT that the code you write to do tracing doesn't lock you into a particular tracing backend, to that extent OC provides that same.

I also don't see JavaScript listed under supported languages for OC, although there is a GH repo so maybe its on the way.

There is currently node support, with frontend JS support on its way this year, see the paragraph Client Application Support.

What OT and OC would look like working together is a bit fuzzy to me - although I think it's probably doable.

Technically, you would not use OT. OC makes use of OT's tracing format.

Have you played around with OC at all?

Yes, I have been playing around with it in ipfs-cluster. I added OC stats to the go-libp2p-gorpc library. Check out the metrics.go and the server.go files for examples of how basic stats are recorded.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants