Question or Bug? #88

feifeiiiiiiiiiii · 2017-03-24T05:43:29Z

I'm using restify and redis to test zipkin-js; the script is as follows:

```js
const restify = require('restify');
const {Tracer, ExplicitContext, ConsoleRecorder, BatchRecorder} = require('zipkin');
const zipkinMiddleware = require('zipkin-instrumentation-restify').restifyMiddleware;
const zipkinClient = require('zipkin-instrumentation-redis');
const Redis = require('redis');

const redisConnectionOptions = {
  host: '192.168.4.114',
  port: 6379
};

// ExplicitContext: the current trace context has to be tracked manually
const ctxImpl = new ExplicitContext();
// ConsoleRecorder logs each recorded annotation to the console
const recorder = new ConsoleRecorder();
const tracer = new Tracer({ctxImpl, recorder});

// Wrap the redis client so each command is traced
const redis = zipkinClient(tracer, Redis, redisConnectionOptions, "redis");
const app = restify.createServer();

// Create or continue a trace for every incoming request
app.use(zipkinMiddleware({
  tracer,
  serviceName: 'service-a'
}));

app.get('/echo/:name', function (req, res, next) {
  const name = req.params.name;
  // redis.get is issued from inside the redis.set callback, i.e. after an async hop
  redis.set(name, name, (err) => {
    if (err) return next(err);
    redis.get(name, (err) => {
      if (err) return next(err);
      res.send(req.params);
      return next();
    });
  });
});

app.listen(9999, function () {
  console.log('%s listening at %s', app.name, app.url);
});
```

I don't think the result is right.


eirslett · 2017-03-24T08:42:40Z

Have you tried CLSContext? With ExplicitContext, you have to keep track of the context manually (explicitly).

feifeiiiiiiiiiii · 2017-03-24T08:47:58Z

@eirslett Using CLSContext isn't right either. I suspect currentCtx is wrong.

feifeiiiiiiiiiii · 2017-03-24T10:11:30Z

@eirslett I tried patching CLSContext's scoped function like this:

```js
scoped(callable) {
  // Patched: runs the callable directly, without opening a new scope
  return callable();
}
```

and the result looks right. I think CLSContext's _session should save the newest ctx, but right now it can't get the newest ctx.

Please help me solve the problem.
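A toy model suggests why the patched version only *looks* right. Assuming the patch effectively stops restoring the previous context (LeakyContext below is a hypothetical stand-in, not CLSContext itself), a single request behaves, but once two requests interleave, the second request's trace ID leaks into the first request's async callback.

```javascript
// Hypothetical toy context mimicking the patch: scoped() never restores.
class LeakyContext {
  constructor() {
    this.current = null;
  }
  getContext() {
    return this.current;
  }
  scoped(callable) {
    return callable(); // no save/restore, so any value set inside "escapes"
  }
  letContext(ctx, callable) {
    return this.scoped(() => {
      this.current = ctx;
      return callable();
    });
  }
}

const ctx = new LeakyContext();
const pending = []; // stands in for the event loop's queue of async callbacks
const seen = [];

// Simulates a request whose handler does async work and reads the context later.
function handleRequest(traceId) {
  ctx.letContext(traceId, () => {
    pending.push(() => seen.push({ request: traceId, sawContext: ctx.getContext() }));
  });
}

handleRequest('T1');
handleRequest('T2'); // a second request arrives before T1's callback runs
pending.forEach((cb) => cb()); // the async callbacks fire

console.log(`T1's callback saw ${seen[0].sawContext}`); // prints "T1's callback saw T2"
```

With one request at a time the leaked value happens to be the right one, which is why the output can look correct in a simple test.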

eirslett · 2017-03-24T14:46:57Z

The way it works is that the context has a value outside of a scope; inside a scope you can set a new value, and that new value is used until the end of the scope. After that, the old value is back.
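The semantics described here (an outer value, a new value inside the scope, the old value back afterwards) can be sketched with a hypothetical toy class. It is a simplified model, not zipkin's actual ExplicitContext source; the try/finally makes sure the outer value comes back even if the callable throws.

```javascript
// Hypothetical toy context with scoped save/restore semantics.
class ScopedContext {
  constructor() {
    this.current = null; // the value outside of any scope
  }
  getContext() {
    return this.current;
  }
  setContext(ctx) {
    this.current = ctx;
  }
  scoped(callable) {
    const previous = this.current; // remember the outer value
    try {
      return callable();
    } finally {
      this.current = previous; // restore it when the scope ends
    }
  }
  letContext(ctx, callable) {
    return this.scoped(() => {
      this.setContext(ctx);
      return callable();
    });
  }
}

const ctx = new ScopedContext();
const observed = [];
ctx.setContext('outer');
ctx.letContext('inner', () => {
  observed.push(ctx.getContext()); // 'inner'
  ctx.letContext('innermost', () => observed.push(ctx.getContext())); // 'innermost'
  observed.push(ctx.getContext()); // back to 'inner'
});
observed.push(ctx.getContext()); // back to 'outer'

console.log(observed.join(' -> ')); // prints "inner -> innermost -> inner -> outer"
```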

feifeiiiiiiiiiii · 2017-03-24T15:37:30Z

@eirslett What's the best way to solve the problem?

ppg · 2017-03-26T23:32:00Z

This seems like a fundamental flaw in the restify and express instrumentations:

zipkin-js/packages/zipkin-instrumentation-restify/src/restifyMiddleware.js, line 99 in 9806955:

```js
next();
```

zipkin-js/packages/zipkin-instrumentation-express/src/expressMiddleware.js, line 99 in 9806955:

```js
next();
```

I think the belief here was that, since next() is called inside tracer.scoped, all subsequent middlewares and the request handler would run within that scope. That's not how middleware works, though. Only the synchronous path (i.e. the redis.set call above) runs inside that scope and therefore gets the right nested span. The handler (which is essentially what next is in the example above) returns, and the scope ends, before the redis.set callback runs; specifically, before that callback calls next() (or, more appropriately, res.send) in the example above.

To make it concrete:

1. The restify middleware reaches line 99 (the first link) with the tracer having created a new scoped span; let's call it T1/P1/S1. It is now in the context.

2. The request handler calls redis.set, which synchronously calls

   zipkin-js/packages/zipkin-instrumentation-redis/src/zipkinClient.js, lines 51 to 60 in 9806955:

   ```js
   const callback = args.pop();
   let id;
   tracer.scoped(() => {
     id = tracer.createChildId();
     tracer.setId(id);
     commonAnnotations(method);
   });
   const wrapper = mkZipkinCallback(callback, id);
   const newArgs = [...args, wrapper];
   actualFn.apply(this, newArgs);
   ```

   and creates a new child span from the current context, i.e. T1/P1/S2. This ID is captured for the redis call, but the context is unwound again before redis.set synchronously returns.

3. The redis client library starts the call to the redis server asynchronously, but continues its synchronous flow and returns execution to the request handler.

4. redis.set is the last synchronous call in the request handler, so the handler also returns execution to line 99 in the instrumentation.

5. The restify middleware's call to next() has now completed (even though async work is still pending, it has no hook into that), so it exits the scope, thus discarding the scoped context on the tracer.

6. The redis call completes and calls back into the wrapper, which calls back into the request handler.

7. The request handler now calls redis.get, following the same synchronous path (at least up to the callback) as in step 2. However, at this point the tracer has discarded its scoped context, since next()'s synchronous path has completed; therefore redis.get gets a new trace ID, T3/P3/S3, completely ignorant of the original one.
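The seven steps can be reproduced with a toy model that needs no zipkin, restify or redis. Everything in it (ScopedContext, fakeRedis, middleware, and the eventLoop array standing in for real asynchronous I/O) is a hypothetical stand-in, not zipkin-js code; it only assumes, as in step 2, that the instrumented client reads the current context synchronously.

```javascript
// Hypothetical toy context with scoped save/restore semantics (not zipkin-js code).
class ScopedContext {
  constructor() {
    this.current = null;
  }
  getContext() {
    return this.current;
  }
  letContext(ctx, callable) {
    const previous = this.current;
    this.current = ctx;
    try {
      return callable();
    } finally {
      this.current = previous; // the scope ends: restore the outer value
    }
  }
}

const ctx = new ScopedContext();
const eventLoop = []; // callbacks queued by "asynchronous" I/O, run later
const traceIdsSeen = []; // which trace each redis command was attributed to

// Stand-in for the instrumented redis client: it reads the current context
// synchronously (step 2) and completes the I/O later (step 3).
function fakeRedis(command, callback) {
  const parent = ctx.getContext();
  traceIdsSeen.push(`${command}: ${parent ? parent.traceId : 'NEW ROOT TRACE'}`);
  eventLoop.push(() => callback(null, 'OK'));
}

// Stand-in for the middleware: next() is called inside the scope (step 1),
// but only its synchronous part actually runs there.
function middleware(next) {
  ctx.letContext({ traceId: 'T1' }, () => next());
}

function handler() {
  fakeRedis('set', () => {
    fakeRedis('get', () => {}); // step 7: runs after the scope has exited
  });
}

middleware(handler); // steps 1-5
while (eventLoop.length > 0) eventLoop.shift()(); // steps 6-7

console.log(traceIdsSeen.join(', ')); // prints "set: T1, get: NEW ROOT TRACE"
```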

As far as a fix goes, I don't think the current approach can ever work. next in express certainly (and, as it appears, in restify) doesn't wrap all subsequent calls; it only provides a hook to signal them, so any asynchronous work will always fall outside the tracer's scope. The only fix would be to ensure every middleware captures the tracer's context on its synchronous path and injects it into its asynchronous work, and there's no way to get every middleware to do that.
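The fix described here, capturing the context on the synchronous path and re-establishing it in the asynchronous callback, can be sketched as follows. bindToCurrentContext, fakeRedis and ScopedContext are hypothetical helpers for illustration, not zipkin-js APIs. Note that every callback has to be wrapped by hand, which is exactly why this doesn't scale to middlewares you don't own.

```javascript
// Hypothetical toy context with scoped save/restore semantics (not zipkin-js code).
class ScopedContext {
  constructor() {
    this.current = null;
  }
  getContext() {
    return this.current;
  }
  letContext(ctx, callable) {
    const previous = this.current;
    this.current = ctx;
    try {
      return callable();
    } finally {
      this.current = previous;
    }
  }
}

// Hypothetical helper: capture the context now, re-establish it when fn runs later.
function bindToCurrentContext(ctx, fn) {
  const captured = ctx.getContext();
  return function boundCallback(...args) {
    return ctx.letContext(captured, () => fn.apply(this, args));
  };
}

const ctx = new ScopedContext();
const eventLoop = []; // callbacks queued by "asynchronous" I/O, run later
const traceIdsSeen = [];

// Stand-in for an instrumented client that reads the context synchronously.
function fakeRedis(command, callback) {
  const parent = ctx.getContext();
  traceIdsSeen.push(`${command}: ${parent ? parent.traceId : 'NEW ROOT TRACE'}`);
  eventLoop.push(() => callback(null, 'OK'));
}

function handler() {
  // The handler must remember to bind every async callback it creates.
  fakeRedis('set', bindToCurrentContext(ctx, () => {
    fakeRedis('get', bindToCurrentContext(ctx, () => {}));
  }));
}

ctx.letContext({ traceId: 'T1' }, handler); // the middleware's scope around next()
while (eventLoop.length > 0) eventLoop.shift()();

console.log(traceIdsSeen.join(', ')); // prints "set: T1, get: T1"
```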

Could we instead approach this by passing an explicit context object around on the request object? That seems much easier to reason about, although it would require updating clients like redis to know about the context.

codefromthecrypt · 2017-03-27T01:12:56Z

I think I get what the concern is: how do we get async callbacks to honor the scope we think they ought to be in? I think this is an implicit (no pun intended) challenge in the act of instrumenting, and each library will behave differently. The way I have recently thought of it is this:

1. When instrumenting, it is the instrumentation author's job to propagate the trace identifiers between the request and the response and error callbacks.
   - Usually this is very explicit, i.e. a constant is defined and used directly when constructing the callbacks.
   - In other words, this doesn't need to use a Context, though it can if that helps.
2. In cases where you want downstream code to "see" the trace identifiers, you must scope them.
   - Think of using the Context as publishing the trace identifiers: if someone is using CLS, they will be able to see them.
3. We've run as far as we can without tests.
   - Propagation is very easy to get wrong, and while GitHub issue comments can be insightful, they aren't a way to keep behavior intact. I'd love to see a volunteer backfill tests so that propagation malpractice is a build break rather than a product of experience.
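Items 1 and 2 can be sketched together with toy stand-ins. The instrument wrapper, the annotation strings and ScopedContext are hypothetical, not the zipkin-js Tracer API: the wrapper keeps the child id in a constant for its own send/recv annotations (item 1), then scopes the user's callback with that id so downstream code can see it (item 2).

```javascript
// Hypothetical toy context with scoped save/restore semantics (not zipkin-js code).
class ScopedContext {
  constructor() {
    this.current = null;
  }
  getContext() {
    return this.current;
  }
  letContext(ctx, callable) {
    const previous = this.current;
    this.current = ctx;
    try {
      return callable();
    } finally {
      this.current = previous;
    }
  }
}

const ctx = new ScopedContext();
const annotations = []; // what a recorder would receive
const eventLoop = []; // callbacks queued by "asynchronous" I/O, run later
let nextSpanId = 1;

// Hypothetical instrumentation wrapper around a callback-style operation.
function instrument(command, doIo) {
  return function instrumented(callback) {
    const parent = ctx.getContext();
    // Item 1: the child id is a constant captured on the synchronous path...
    const id = { traceId: parent ? parent.traceId : 'T-new', spanId: `S${nextSpanId++}` };
    annotations.push(`${id.spanId} ${command} send`);
    doIo((err, result) => {
      // ...and used directly in the response/error callback; no Context needed.
      annotations.push(`${id.spanId} ${command} ${err ? 'error' : 'recv'}`);
      // Item 2: scoping the user's callback "publishes" the id to downstream code.
      ctx.letContext(id, () => callback(err, result));
    });
  };
}

const fakeSet = instrument('set', (done) => eventLoop.push(() => done(null, 'OK')));
let contextSeenDownstream = null;

ctx.letContext({ traceId: 'T1', spanId: 'S0' }, () => {
  fakeSet(() => {
    contextSeenDownstream = ctx.getContext();
  });
});
while (eventLoop.length > 0) eventLoop.shift()();

console.log(annotations.join(', ')); // prints "S1 set send, S1 set recv"
```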

Disclaimer: I help maintain this code but don't use any of it, so I might be mildly off. Feel free to correct me if I've misunderstood anything.

feifeiiiiiiiiiii · 2017-03-27T01:52:40Z

@ppg @adriancole thanks for your detailed answers.

ppg · 2017-04-06T05:30:52Z

@adriancole thanks for the response. I don't think we're in any disagreement, but regarding your item 2: per the example @feifeiiiiiiiiiii provided, using CLS with scoped doesn't actually scope the trace identifiers, so downstream code won't be able to "see" them; only synchronous code will. It's hard (for me) to imagine many uses of express that don't have some asynchronous behavior in their middleware and endpoints, so out of the box the express middleware isn't going to work very well for them. The only solution would be for every single middleware to call .scoped, which, aside from being tedious, is unrealistic if you don't own all the middlewares.

A better approach for express (and likely restify) seems to be attaching the context to either the request or res.locals; that is, after all, where express says to put request-scoped objects, which is what the TraceId is. Then, when the context (or, more appropriately, the span) is needed, you could grab it from there and either pass it into context/span-aware clients or create child spans to perform work.
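A minimal sketch of the res.locals idea, with plain objects standing in for express's req/res. The property name res.locals.zipkinTraceId and the tracedRedisGet client are hypothetical; the point is only that an explicitly passed id survives the async hop that broke the scoped approach.

```javascript
// Hypothetical sketch: res.locals.zipkinTraceId and tracedRedisGet are
// illustrative names, not an existing zipkin-js or express API.
const eventLoop = []; // callbacks queued by "asynchronous" I/O, run later
const recorded = []; // child span ids a recorder would receive
let nextSpanId = 1;

// Middleware stores the request-scoped trace id on res.locals instead of in an
// ambient context. A real one would read incoming headers or start a new trace.
function tracingMiddleware(req, res, next) {
  res.locals.zipkinTraceId = { traceId: 'T1', spanId: 'S0' };
  next();
}

// A context-aware client receives the parent id explicitly and derives a child.
function tracedRedisGet(parentId, key, callback) {
  recorded.push({ traceId: parentId.traceId, parentId: parentId.spanId, spanId: `S${nextSpanId++}` });
  eventLoop.push(() => callback(null, `value-of-${key}`));
}

function handler(req, res) {
  const parentId = res.locals.zipkinTraceId;
  tracedRedisGet(parentId, 'a', () => {
    // Still attributed to T1 after the async hop: nothing ambient is needed.
    tracedRedisGet(parentId, 'b', () => res.send('done'));
  });
}

const req = { params: {} };
const res = { locals: {}, sent: null, send(body) { this.sent = body; } };
tracingMiddleware(req, res, () => handler(req, res));
while (eventLoop.length > 0) eventLoop.shift()();

console.log(recorded.map((id) => `${id.traceId}/${id.parentId}/${id.spanId}`).join(', '));
// prints "T1/S0/S1, T1/S0/S2"
```

The trade-off is the one raised earlier in the thread: clients like redis have to accept the id as an explicit argument.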

codefromthecrypt · 2017-04-06T06:33:50Z

I think perhaps the interactions around "scoped" need more exploration. I don't know how scheduling works here. Maybe we can get more specific, like raising a test with user-provided code that needs to add to a trace, showing how scoping doesn't work? That would make it faster for me to help, as I'm not as good at JS as others.

One thing I'd like to see is whether there's an instrumentation approach that spares users from needing special knowledge of a library-specific request or response scope. I'm not saying we won't end up needing that; I just want to see concretely where it ends.

PS: I've written a few words on this here, and I'm happy to have help elaborating the general problem: https://docs.google.com/document/d/16byriP7jCi2xmLf8IveTTy5ttoGFgd9hn8hg_QkfMTU/edit#heading=h.burk117bfcxf

feifeiiiiiiiiiii closed this as completed Mar 24, 2017

feifeiiiiiiiiiii reopened this Mar 24, 2017

evan-scott-zocdoc mentioned this issue Apr 25, 2018: CLSContext now supports continuations with async-await #201 (closed)

feifeiiiiiiiiiii closed this as completed Nov 11, 2018

kenspirit mentioned this issue Dec 12, 2018: Should TraceId be set to express req.headers or res.locals if it's empty at the beginning? #318 (closed)
