Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question or Bug? #88

Closed
feifeiiiiiiiiiii opened this issue Mar 24, 2017 · 10 comments
Closed

Question or Bug? #88

feifeiiiiiiiiiii opened this issue Mar 24, 2017 · 10 comments

Comments

@feifeiiiiiiiiiii
Copy link
Contributor

feifeiiiiiiiiiii commented Mar 24, 2017

I use restify and redis to test zipkin-js, script as follow

const restify = require('restify');
const {Tracer, ExplicitContext, ConsoleRecorder, BatchRecorder} = require('zipkin');
const zipkinMiddleware = require('zipkin-instrumentation-restify').restifyMiddleware;
const zipkinClient = require('zipkin-instrumentation-redis');
const Redis = require('redis');

const redisConnectionOptions = {
  host: '192.168.4.114',
  port: 6379
};

const ctxImpl = new ExplicitContext();
const recorder = new ConsoleRecorder();
const tracer = new Tracer({ctxImpl, recorder});

const redis = zipkinClient(tracer, Redis, redisConnectionOptions, "redis");
const app = restify.createServer();

app.use(zipkinMiddleware({
  tracer,
  serviceName: 'service-a'
}));

app.get('/echo/:name', function (req, res, next) {
  const name = req.params.name
  redis.set(name, name, (err, data) => {
    redis.get(name, (err, data) => {
      res.send(req.params)
      return next()
    })
  });
});

app.listen(9999, function () {
  console.log('%s listening at %s', app.name, app.url);
});

I think result is not right

2017-03-24 1 32 20

@eirslett
Copy link
Contributor

Try with CLSContext? With ExplicitContext you have to keep track of the context manually/explicitly.

@feifeiiiiiiiiiii
Copy link
Contributor Author

@eirslett Use CLSContext also not right, I doubt currentCtx not right

@feifeiiiiiiiiiii
Copy link
Contributor Author

@eirslett I try to modify CLSContext's scoped function patch

 scoped(callable) {
     return callable();
 }

and get result look like right, and I think CLSContext _session should save the newest ctx, but now can't get newest ctx

please help me to solve the problem

@eirslett
Copy link
Contributor

The way it works, is that the context has a value outside of a scope, inside a scope you can set a new value, and that new value will be used until the end of the scope - after that, the old value will be back.

@feifeiiiiiiiiiii
Copy link
Contributor Author

feifeiiiiiiiiiii commented Mar 24, 2017

@eirslett how best solve the problem?

@ppg
Copy link

ppg commented Mar 26, 2017

This seems like a fundamental flaw of the restify and express instrumentations:


I think the belief here was that since next() is called inside the tracer.scoped that all subsequent middlewares and the request handler will be within that scope. That's not how middleware works though; the non-asynchronous path (i.e. redis.set above) is called inside that scope and therefore gets the right nested span; but the execution of the handler (which is what next essentially is in the example above) continues on and finishes before the redis.set callback; specifically before it calls next() (or res.send more appropriately) in the example above.

To make it concrete:

  1. The restify middleware arrives at L99 (the first link) with tracer having created a new scoped span; let's call this T1/P1/S1 and its in the context.
  2. The request handler calls redis.set which calls, synchronously,
    const callback = args.pop();
    let id;
    tracer.scoped(() => {
    id = tracer.createChildId();
    tracer.setId(id);
    commonAnnotations(method);
    });
    const wrapper = mkZipkinCallback(callback, id);
    const newArgs = [...args, wrapper];
    actualFn.apply(this, newArgs);
    and creates a new child span from the current context; i.e. T1/P1/S2. This is captured for the redis call, but the context is unwound when redis.set synchronously returns.
  3. The redis client library starts the call out to the redis server asynchronously but continues it's synchronously flow and returns execution to the request handler.
  4. redis.set is the last synchronous call in the request handler, so it also returns execution to L99 in the instrumentation.
  5. The restify middleware has completed its call to next() (even though there is async work pending it has no hook into that) and exits the scope, thus destroying the context on tracer.
  6. The redis call completes and calls back to the wrapper, which calls back to the request handler.
  7. The request handler now calls redis.get following the same path synchronously (to the callback at least) as in step 2; however at this point the tracer has destroyed its scoped context since next()'s synchronous path has completed.; therefore it gets a new trace id T3/P3/S3, completely ignorant of the original one.

As far as fixing, I don't think the current approach can ever work. next on express for sure (and restify as it appears) doesn't wrap all subsequent calls, it only gives a hook to signal them; any asynchronous work will always be outside the scope of the tracer. The only fix would be to ensure all middlewares capture the tracer's context within the synchronous path and inject into its asynchronous work; there's no way one could get all middlewares to do that.

Can we not approach this by passing around an explicit context object on the request object? That seems much easier to reckon about, although would require updating clients like redis to know about the context.

@codefromthecrypt
Copy link
Member

I think I get what the concern is: how do we get async callbacks to honor the scope we think they ought to be in? I think this is an implicit (no pun intended) challenge in the act of instrumenting. Each library will behave differently. The way I have recently thought of it is this..

  1. When instrumenting, it is the instrumentation author's job to propagate the trace identifiers between the request and the response and error callbacks.
  • Usually this is very explicit. ie a constant is defined and used directly when constructing the callbacks.
  • iotw this doesn't need to use a Context, if it helps, it can.
  1. In cases where you want downstream code to "see" the trace identifiers, you must scope them.
  • think of using the Context as publishing the trace identifiers. If someone is using CLS, they will be able to see them.
  1. We've run as far as we can without tests
  • propagation is very easy to get wrong, and while insightful github issue comments aren't a way to keep behavior in tact. I'd love to see a volunteer backfill tests such that propagation malpractice is a build break vs a product of experience.

disclaimer.. I help maintain, but don't use any of this code, so I might be mildly off.. feelfree to correct me if I've misunderstood anything

@feifeiiiiiiiiiii
Copy link
Contributor Author

feifeiiiiiiiiiii commented Mar 27, 2017

@ppg @adriancole thanks for yours detailed answer

@ppg
Copy link

ppg commented Apr 6, 2017

@adriancole thanks for the response. I don't think we're in any disagreement, but for your item 2, per the example @feifeiiiiiiiiiii provided, using CLS with scoped doesn't actually scope it and downstream code won't be able to 'see' it, only synchronous code will 'see' it. Its hard (for me) to imagine many uses of express that don't have some asynchronous behavior in their middleware and endpoints, so out of the box the express middleware isn't going to work very well for them. The only solution is every single middleware would need to call .scoped, which aside from being tedious, is unrealistic if you don't own all the middlewares explicitly.

It seems like a better approach for express (and likely restify) would be to add a concept of the context onto either the request, or response.locals; it is after all where express says to put per request scoped objects, which is what the TraceId is. Then when the context, or more appropriately span, is needed you could grab from there and either pass into context/span aware clients, or create child spans to perform work.

@codefromthecrypt
Copy link
Member

I think perhaps the interactions around "scoped" need more exploration. I don't know how scheduling works here. Maybe we can get more specific, like raise a test which has a user-provided code which needs to add to a trace, and show how scoping doesn't work? It would be faster for me to help as I am not as good in js as others..

One thing I'd like to see is if there's an instrumentation approach we can use that prevents users from needing to know special knowledge about a library specific request or response scope. Not saying we won't end up needing that out, just want to see concretely where it ends..

PS I put a few words here and happy to have help elaborating the general problem https://docs.google.com/document/d/16byriP7jCi2xmLf8IveTTy5ttoGFgd9hn8hg_QkfMTU/edit#heading=h.burk117bfcxf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants