Use the route name to define the tracing span/operation name. #1406

objectiser · 2017-10-06T15:32:25Z

What this PR does / why we need it: Uses the route name as the span/operation name in the tracing data.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:
If this PR is accepted, I would propose doing a separate PR with an update to the bookinfo example to show how individual routes could be named (using business relevant terms) based on specific REST endpoints.

Release note:

The operation/span name in the tracing data now uses the route name, allowing business relevant names to be associated with the tracing information. Individual routes could be defined for each REST endpoint, allowing them to be individually named.

istio-merge-robot · 2017-10-06T15:32:33Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
We suggest the following additional approver: greghanson

Assign the PR to them by writing /assign @greghanson in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

OWNERS

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

istio-testing · 2017-10-06T15:32:34Z

Hi @objectiser. Thanks for your PR.

I'm waiting for an Istio member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

objectiser · 2017-10-06T15:38:06Z

This is a followup to #1386 and istio/api#186 to use @rshriram's idea for defining the operation/span name based on the route name. Some example images:

Jaeger screenshot of Bookinfo before route applied:

After route-rule-all-v1.yaml applied:

Same example (with routing rule) using Zipkin UI:

NOTE: The operation/span names use the current route rule names (e.g. productpage-default), but more specific routing rules could be defined with appropriate business relevant names.

istio-testing · 2017-10-06T15:38:22Z

@objectiser: you can't request testing unless you are an Istio member.

rshriram · 2017-10-06T17:31:29Z

/ok-to-test

rshriram · 2017-10-06T17:34:29Z

proxy/envoy/config.go

+					} else if config.ConfigMeta.Name != "" {
+						route = buildInboundRoute(config, rule, cluster)
+					}
+					if route != nil {


Is this needed on inbound only? Why not use it in outbound as well?

The outbound is handled in buildHTTPRoute (route.go).

rshriram · 2017-10-06T17:45:09Z

you need to run ./bin/check.sh to format code

codecov · 2017-10-06T18:04:52Z

Codecov Report

Merging #1406 into master will increase coverage by 1%.
The diff coverage is 80%.

@@           Coverage Diff            @@
##           master    #1406    +/-   ##
========================================
+ Coverage   83.02%   84.03%    +1%     
========================================
  Files          52       52            
  Lines        6456     6764   +308     
========================================
+ Hits         5360     5684   +324     
+ Misses        894      877    -17     
- Partials      202      203     +1

Impacted Files	Coverage Δ
proxy/envoy/resources.go	`85.08% <ø> (ø)`	⬆️
proxy/envoy/config.go	`91.56% <100%> (-0.03%)`	⬇️
proxy/envoy/route.go	`93.54% <76.47%> (-1.31%)`	⬇️
platform/consul/monitor.go	`79.76% <0%> (-3.58%)`	⬇️
adapter/config/crd/conversion.go	`100% <0%> (ø)`	⬆️
model/validation.go	`97.69% <0%> (+0.73%)`	⬆️
adapter/config/memory/monitor.go	`89.18% <0%> (+2.7%)`	⬆️
platform/kube/inject/inject.go	`93.41% <0%> (+5.49%)`	⬆️
platform/kube/inject/initializer.go	`78.6% <0%> (+12.44%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 13005f5...49784cd.

objectiser · 2017-10-06T18:16:02Z

/retest

rshriram · 2017-10-06T20:40:21Z

test/integration/routing.go

@@ -154,9 +155,34 @@ func (t *routing) verifyRouting(scheme, src, dst, headerKey, headerVal string,
 		}
 	}

+	if operation != "" {
+		errs = t.verifyDecorator(operation)
+	}


Sweet! Thanks for this!

rshriram · 2017-10-06T20:43:25Z

proxy/envoy/route.go

@@ -83,6 +83,22 @@ func buildInboundWebsocketRoute(rule *proxyconfig.RouteRule, cluster *Cluster) *
 	return route
 }

+func buildInboundRoute(config model.Config, rule *proxyconfig.RouteRule, cluster *Cluster) *HTTPRoute {
+	route := buildHTTPRouteMatch(rule.Match)


Can you fold this and buildInboundWebsocketRoute into a single function, buildInboundRoute? You can then check if rule.WebsocketUpgrade and set route.WebSocketUpgrade to true. This way, the code is compact. This should also remove the if-else chain that you have above in buildInboundListener.

Why do we need to know rules at inbound location? I understand the hack for websockets was placed due to the fact that Envoy cannot multiplex regular HTTP with websockets, but we should not rely on that.

It's for tagging the traces with the operation names. The thing is, multiple routes could lead to the same inbound route, like /. In this case, the operation name becomes pointless, right? @objectiser
You might as well have "inbound" as the operation name.

Not sure if pointless, but less useful - although using the current approach it still gives them the ability to define the span/operation name - even if all endpoints for the same destination use the same name (initially). But from there they then have the ability to refine the routes and provide more specific names.

rshriram

This looks good overall. Just one comment to deduplicate code. After that it's good to go.
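For readers following the deduplication request, a minimal sketch of a single inbound route builder might look like the following. The types here are simplified stand-ins written for this illustration (the `PathPrefix` and `Cluster` string fields are assumptions), not the actual Pilot definitions, and the merged code may differ.

```go
package main

import "fmt"

// RouteRule, Decorator and HTTPRoute are simplified, hypothetical stand-ins
// for the types touched by this PR.
type RouteRule struct {
	Name             string
	PathPrefix       string
	WebsocketUpgrade bool
}

type Decorator struct {
	Operation string
}

type HTTPRoute struct {
	Prefix           string
	Cluster          string
	WebSocketUpgrade bool
	Decorator        *Decorator
}

// buildInboundRoute covers both the plain HTTP and the websocket case, so the
// caller no longer needs an if-else to pick between two builders.
func buildInboundRoute(rule RouteRule, cluster string) *HTTPRoute {
	route := &HTTPRoute{Prefix: rule.PathPrefix, Cluster: cluster}
	if rule.WebsocketUpgrade {
		route.WebSocketUpgrade = true
	}
	// Only named rules contribute a tracing operation name.
	if rule.Name != "" {
		route.Decorator = &Decorator{Operation: rule.Name}
	}
	return route
}

func main() {
	plain := buildInboundRoute(RouteRule{Name: "get-product", PathPrefix: "/productpage"}, "in.9080")
	ws := buildInboundRoute(RouteRule{Name: "chat", PathPrefix: "/ws", WebsocketUpgrade: true}, "in.9080")
	fmt.Println(plain.Decorator.Operation, plain.WebSocketUpgrade)
	fmt.Println(ws.Decorator.Operation, ws.WebSocketUpgrade)
}
```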

kyessenov · 2017-10-06T21:15:44Z

proxy/envoy/config.go

-
+						route = buildInboundWebsocketRoute(rule, cluster)
+					} else if config.ConfigMeta.Name != "" {
+						route = buildInboundRoute(config, rule, cluster)


why add another route and not change default route?

I think I'm concerned that we rely on route rules by destination here. We cannot be certain that the request originates from within the mesh (or from within the snapshot of the rules known at destination). If a request does not fit any rule, then this logic does not apply.

My understanding is that there could be multiple routes here - if multiple routes are defined for different endpoints on the same destination? If so, then each endpoint should be configured with a different span/operation name.

kyessenov · 2017-10-06T21:20:24Z

proxy/envoy/route.go

+func buildDecorator(config model.Config) *Decorator {
+	if config.ConfigMeta.Name != "" {
+		return &Decorator{
+			Operation: config.ConfigMeta.Name,


Please use config.Key() or include the namespace. Names are not unique.

The intention is not to be unique but to enable users to specify a business relevant name that can be used in the tracing operation/span name.
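To make the trade-off in this exchange concrete, here is a small sketch under simplified assumptions: `ConfigMeta` and `Decorator` below are cut-down stand-ins for the real types, and the namespaces are invented for the example. It shows that using only the name yields the same operation for same-named rules in different namespaces, which is the non-uniqueness raised by the reviewer and accepted by the author as intended.

```go
package main

import "fmt"

// ConfigMeta and Decorator are simplified, hypothetical stand-ins; only the
// fields needed for the illustration are included.
type ConfigMeta struct {
	Name      string
	Namespace string
}

type Decorator struct {
	Operation string
}

// buildDecorator returns a decorator carrying the rule name as the tracing
// operation, or nil when the rule has no name.
func buildDecorator(meta ConfigMeta) *Decorator {
	if meta.Name == "" {
		return nil
	}
	return &Decorator{Operation: meta.Name}
}

func main() {
	metas := []ConfigMeta{
		{Name: "get-reviews", Namespace: "default"},
		{Name: "get-reviews", Namespace: "staging"}, // same operation name as above
		{}, // unnamed: no decorator
	}
	for _, meta := range metas {
		if d := buildDecorator(meta); d != nil {
			fmt.Printf("%s/%s -> operation %q\n", meta.Namespace, meta.Name, d.Operation)
		} else {
			fmt.Println("unnamed rule -> no decorator")
		}
	}
}
```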

objectiser · 2017-10-08T14:06:46Z

Modified version of the route-rule-all-v1.yaml:

apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: get-product
spec:
  destination:
    name: productpage
  precedence: 2
  route:
  - labels:
      version: v1
  match:
    request:
      headers:
        uri:
          exact: "/productpage"
---
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: login
spec:
  destination:
    name: productpage
  precedence: 3
  route:
  - labels:
      version: v1
  match:
    request:
      headers:
        uri:
          exact: "/login"
---
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: get-reviews
spec:
  destination:
    name: reviews
  precedence: 2
  route:
  - labels:
      version: v1
  match:
    request:
      headers:
        uri:
          prefix: "/reviews/"
---
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: get-ratings
spec:
  destination:
    name: ratings
  precedence: 2
  route:
  - labels:
      version: v1
  match:
    request:
      headers:
        uri:
          prefix: "/ratings/"
---
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: get-details
spec:
  destination:
    name: details
  precedence: 2
  route:
  - labels:
      version: v1
  match:
    request:
      headers:
        uri:
          prefix: "/details/"

With example image:

rshriram · 2017-10-09T17:00:47Z

@kyessenov thoughts?

kyessenov · 2017-10-09T20:39:15Z

Hi @objectiser ,
Can you give me some insight why you decided to use the inbound route instead of outbound route to record the rule name decorator? I'd assume that the client routing is responsible for creating the request connection between the client and the server, and it's the responsibility of the client to record the rule decorator in the trace metadata.

objectiser · 2017-10-09T21:17:16Z

@kyessenov Apologies as I haven't been working on the project for long, so may have misunderstood the terminology - but not sure what you mean by inbound and outbound rules.

From my perspective there are RouteRules, and that information is being used to create the inbound and outbound listeners in Envoy - possibly that is what you are referring to? If so, then the route rule is being used to set the decorator (tracing operation name) on both the outbound (egress) and inbound (ingress) listeners.

The reason this is important is that it means the same operation name is being used in the client and server span (representing the same request) - so consistent naming.

rshriram · 2017-10-10T02:32:13Z

In general (until your PR), RouteRules apply only on outbound. The only time we compute inbound rules is when we need to set websockets. What I am not clear about (maybe I missed this in the conversation above) is what happens if I have two route rules, from two different services, pointing to the same path of the destination service? [and maybe this is what @kyessenov was driving at]
For example:
from productpage, for reviews go to reviews:/getReviews
from foobar, for reviews/getAllReviews go to reviews:/getReviews

These rules will be applied on caller side and the decorators will be generated properly on productpage and foobar. But on the callee side/destination, you might end up overwriting the decorator (or possibly adding two Envoy routes for same /getReviews, but with different decorators).

objectiser · 2017-10-10T07:03:45Z

@rshriram Good point - you are correct this would be an issue. Another thing I think might be a problem is that using the route rule name means it wouldn't be possible to use a stable name (e.g. get-reviews) when having multiple rules for the same endpoint - each rule would need a different name.

As a possible solution:

I introduce an optional "operation" field on the RouteRule (possibly in a TracingInfo sub-message as discussed before)
If no 'operation' is defined, then the outbound rule will use the route rule name as the operation name in the tracing data - which should provide better clarity/info to users as they will clearly see which routes have been applied under which circumstances
If an 'operation' is defined, then apply the value to both the inbound and outbound rules - and in this scenario it is up to the user to use consistent naming for the same endpoint on the destination service when used from multiple sources (or even across multiple route rules for the same source/destination service).
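The three points above can be sketched as follows. This is only an illustration of the proposal: the `Operation` field on `RouteRule` is the hypothetical optional field being proposed, not an existing API, and the function names are invented for the example.

```go
package main

import "fmt"

// RouteRule is a simplified stand-in. Operation is the proposed optional
// field; it does not exist in the API at the time of this discussion.
type RouteRule struct {
	Name      string
	Operation string
}

// outboundOperation prefers the explicit operation and falls back to the
// rule name, so traces always show which rule was applied on the caller side.
func outboundOperation(r RouteRule) string {
	if r.Operation != "" {
		return r.Operation
	}
	return r.Name
}

// inboundOperation only decorates when the user supplied an explicit
// operation; the second return value reports whether a decorator applies.
func inboundOperation(r RouteRule) (string, bool) {
	if r.Operation != "" {
		return r.Operation, true
	}
	return "", false
}

func main() {
	rules := []RouteRule{
		{Name: "reviews-from-productpage", Operation: "get-reviews"},
		{Name: "reviews-from-foobar", Operation: "get-reviews"},
		{Name: "ratings-default"},
	}
	for _, r := range rules {
		in, ok := inboundOperation(r)
		fmt.Printf("%s: outbound=%q inbound=%q decorated=%v\n", r.Name, outboundOperation(r), in, ok)
	}
}
```

With this shape, two rules from different sources can share a stable name such as get-reviews, which is the case the rule name alone could not cover.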

Let me know whether this approach sounds reasonable. If so, I can either implement it as one PR, or update this PR just to implement step 2 (i.e. route rule name used on the outbound rule), and then create a separate PR for the optional operation?

rshriram · 2017-10-10T15:53:57Z

This sounds fine, except for the part about inferring the operation name on the inbound route. It still doesn't solve the multiple rules (or even no rule ones) arriving at the same inbound route. It almost feels like you need a global map of service name, operation name and path, and have the decorators act independent of route matching. Like having a decorator list in the virtual host, that does additional matching of path/prefix, in addition to route block doing the same. Which reminds me, didn't you have a PR in envoy to do just this? I vaguely recall that your initial version of PR in envoy was using a separate match block in Envoy (outside the confines of route).

cc @fabolive (in case you are interested in this discussion).

rshriram · 2017-10-10T15:55:19Z

and for the record, I totally support the decorators idea. I think it makes for a fantastic addition to the trace. Just trying to figure out how to handle the inbound part. Will the trace become mangled if the server does not add the decorator (i.e. only client adds the decorator?)

objectiser · 2017-10-10T16:45:30Z

@rshriram Yes that was how my original envoy PR worked - but was rejected because it created essentially a duplicate route matching definition - which at the time did appear to be an unnecessary overhead.

While discussing other options, I could create a separate PR for using the RouteRule name in outbound rule decorator - and check that it works fine in zipkin/jaeger. Sound reasonable?

rshriram · 2017-10-10T17:30:54Z

In retrospect, the duplication seems like an acceptable overhead if someone wants detailed tracing. Sorry, I didn't see the end to end picture there. Had I known that, I could have added my support as well :). In the interim, let's go with decorating outbound routes, as you proposed, and see how it turns up in Zipkin/Jaeger.

objectiser · 2017-10-11T14:09:17Z

@rshriram Unfortunately doesn't work properly in zipkin due to the shared span model - the operation name is not consistently set to the client span name.

Will have a think about alternatives - should this be closed for now?

rshriram · 2017-10-11T14:33:23Z

I see. How about we set a default decorator name for all inbound routes? Will that mess things up as well? Alternatively (and more complex), we could have a decorator selection algorithm for the inbound route which does the following:
a. If there is a specific path, that is not shared by other routes, we create a dedicated route in the inbound and add decorator
b. If a path is shared by multiple routes, then we select the first one from the sorted list
c. If it's the default path (e.g., prefix: / ), then we add a default decorator

I am not sure how complicated the above code will turn out to be.
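As a rough gauge of the complexity, options (a) to (c) can be sketched in a few lines. The assumptions here are loud ones: rules are reduced to a name and a single path, "sorted list" is taken to mean sorted by rule name (the thread does not say which ordering is meant), and the default operation name is supplied by the caller.

```go
package main

import (
	"fmt"
	"sort"
)

// RouteRule is a simplified, hypothetical stand-in: one name, one path.
type RouteRule struct {
	Name string
	Path string
}

// selectInboundDecorators maps each inbound path to an operation name:
// (a) a path used by one rule gets that rule's name,
// (b) a path shared by several rules gets the first name in sorted order,
// (c) the default path "/" always gets defaultOp.
func selectInboundDecorators(rules []RouteRule, defaultOp string) map[string]string {
	byPath := map[string][]string{}
	for _, r := range rules {
		byPath[r.Path] = append(byPath[r.Path], r.Name)
	}
	ops := map[string]string{"/": defaultOp}
	for path, names := range byPath {
		if path == "/" {
			continue // case (c): keep the default decorator
		}
		sort.Strings(names)
		ops[path] = names[0] // cases (a) and (b)
	}
	return ops
}

func main() {
	rules := []RouteRule{
		{Name: "get-reviews", Path: "/getReviews"},
		{Name: "all-reviews", Path: "/getReviews"},
		{Name: "get-ratings", Path: "/ratings/"},
		{Name: "reviews-default", Path: "/"},
	}
	ops := selectInboundDecorators(rules, "inbound")
	paths := make([]string, 0, len(ops))
	for p := range ops {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	for _, p := range paths {
		fmt.Printf("%-12s -> %s\n", p, ops[p])
	}
}
```

Note that case (b) still picks one name arbitrarily from the caller's point of view, so the client and server spans can disagree for the rule that was not selected.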

Also, it doesn't matter if operations don't appear properly in Zipkin, as long as this is additional data over what Zipkin gives by default. If Jaeger picks it up, then that is still useful for folks using Jaeger. But if it messes up common Zipkin usage, then we have to consider the alternatives.

objectiser · 2017-10-11T14:41:58Z

I tried setting a default blank decorator name for inbound, but that still caused inconsistent results - some spans with the route-rule name, some with the blank.

In terms of (a) and (b), this is effectively what this current implementation does, isn't it? If there is only a single route, then the name is defined, but if there were to be multiple applicable route rules, then the first would be selected.

In terms of zipkin usage - the result would be some spans having the route name, while others have the service name (as now).

objectiser · 2017-10-11T14:47:30Z

Or is the point of your algorithm that we just focus on exact path matches, to reduce the potential for conflict? If so, might require a future PR to enable exact matches to deal with path parameters (possibly by specifying some form of path template).

rshriram · 2017-10-11T16:08:19Z

hmm.. so using a dummy "default" name for inbound also causes inconsistent results in Zipkin? Would it be possible to provide a screen shot?

with regard to the options a/b/c, you are still missing the operation name for the default route..

case model.ProtocolHTTP, model.ProtocolHTTP2, model.ProtocolGRPC:
 			defaultRoute := buildDefaultRoute(cluster)

rshriram · 2017-10-11T16:16:47Z

Lost my comment. The code you have does not add a default decorator when building the default inbound route (this is outside the rules for loop):

case model.ProtocolHTTP, model.ProtocolHTTP2, model.ProtocolGRPC:
 			defaultRoute := buildDefaultRoute(cluster)

Secondly, would you mind posting a screen shot of how the mismatched operation names look in Zipkin?

objectiser · 2017-10-11T16:26:50Z

It would be similar to the issue I found in the original PR: #1386 (comment) - so the Jaeger image shows the productpage outbound requests using the route name, but the associated inbound requests on the details and reviews services have the host/port.

However, as shown in the second screenshot, with zipkin it is non-deterministic - the details request has resulted in the host/port name, whereas the reviews request used the route name. The problem is that zipkin will be receiving the client and server spans (with same id) in no fixed order, so if they have different operation names, then one will overwrite the other.
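The overwrite described above can be reproduced with a toy model. This is not Zipkin's storage code; it only assumes, as stated in the comment, that the client and server spans share an id and that whichever arrives last determines the stored name. The `Span` type and the example names are invented for the illustration.

```go
package main

import "fmt"

// Span is a hypothetical, minimal span record for this illustration.
type Span struct {
	ID   string
	Kind string // "client" or "server"
	Name string
}

// store keeps one name per span id; a later arrival with the same id
// overwrites the earlier name (last writer wins).
func store(arrivals []Span) map[string]string {
	names := map[string]string{}
	for _, s := range arrivals {
		names[s.ID] = s.Name
	}
	return names
}

func main() {
	client := Span{ID: "abc", Kind: "client", Name: "get-reviews"}
	server := Span{ID: "abc", Kind: "server", Name: "reviews:9080"}

	// The same two spans in different arrival orders give different names.
	fmt.Println(store([]Span{client, server})["abc"])
	fmt.Println(store([]Span{server, client})["abc"])
}
```

If both sides report the same operation name, the arrival order stops mattering, which is why consistent naming on client and server is the goal here.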

Currently if there is no route rule, then the behaviour would be as now - no decorator. So the operation name is defined by Envoy - using target service or host/port. If you want a different default then that should be simple to do - the problem is finding a suitable value?

rshriram · 2017-10-12T14:57:01Z

Okay, this does seem like a problem. What exactly did you mean by path templates?

objectiser · 2017-10-12T15:59:33Z

Was just thinking that if we were focusing on exact path matches to decorate with an operation name, then would have problems with paths that include parameters, e.g. "/orders/123".

Solution might be to enable exact path matches to define a template, such as "/orders/{orderId}", and have Envoy internally translate this to a regex. The other benefit is that the parameter names and values could be extracted and added to the spans as tags.
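A minimal sketch of the template idea, written with Go's standard `regexp` package rather than anything Envoy provides: `compileTemplate` and `extractParams` are hypothetical helper names, and the choice that a parameter matches one path segment (`[^/]+`) is an assumption made for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// param matches a {name} placeholder in a path template.
var param = regexp.MustCompile(`\{([A-Za-z_][A-Za-z0-9_]*)\}`)

// compileTemplate turns a template such as "/orders/{orderId}" into an
// anchored regular expression with one named group per placeholder.
// Literal parts of the template are escaped so they match verbatim.
func compileTemplate(tmpl string) (*regexp.Regexp, error) {
	var b strings.Builder
	b.WriteString("^")
	last := 0
	for _, m := range param.FindAllStringSubmatchIndex(tmpl, -1) {
		b.WriteString(regexp.QuoteMeta(tmpl[last:m[0]]))
		b.WriteString("(?P<" + tmpl[m[2]:m[3]] + ">[^/]+)")
		last = m[1]
	}
	b.WriteString(regexp.QuoteMeta(tmpl[last:]))
	b.WriteString("$")
	return regexp.Compile(b.String())
}

// extractParams returns the placeholder values for a matching path; these
// are the values that could be attached to a span as tags.
func extractParams(re *regexp.Regexp, path string) (map[string]string, bool) {
	m := re.FindStringSubmatch(path)
	if m == nil {
		return nil, false
	}
	params := map[string]string{}
	for i, name := range re.SubexpNames() {
		if name != "" {
			params[name] = m[i]
		}
	}
	return params, true
}

func main() {
	re, err := compileTemplate("/orders/{orderId}")
	if err != nil {
		panic(err)
	}
	params, ok := extractParams(re, "/orders/123")
	fmt.Println(re.String(), params, ok)
}
```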

objectiser · 2017-10-13T04:37:36Z

@rshriram Added the default route decorator for completeness.

Had another idea - apart from the conflicting inbound rule issue, I believe this PR works - i.e. it names all client/server spans consistently so doesn't result in issue with zipkin shared spans.

As a possible solution to the inbound rule issue, was thinking it could be solved if the operation name used by the outbound rule could be propagated to the inbound listener - and if provided, this would be used by the inbound listener overriding any other locally defined decorator value it may have. Thoughts?

If sounds reasonable, I'll create an issue on Envoy for discussion.

rshriram · 2017-10-13T15:45:33Z

> As a possible solution to the inbound rule issue, was thinking it could be solved if the operation name used by the outbound rule could be propagated to the inbound listener - and if provided, this would be used by the inbound listener overriding any other locally defined decorator value it may have. Thoughts?

Perfect. Let's do this!

objectiser · 2017-10-13T16:19:19Z

@rshriram @kyessenov Great - can you give your +1 on the envoy issue, so hopefully Matt will give his blessing 😄

objectiser · 2017-10-18T19:22:12Z

@rshriram The envoy PR envoyproxy/envoy#1858 has now been merged, so can this one be merged as well now? It doesn't directly depend on the envoy PR, so will just pick up the change the next time the commit is updated.

rshriram

Thanks for all the work in both repos! @kyessenov any other feedback? Lgtm on my end.

rshriram · 2017-10-19T00:54:31Z

Also please sync with master

objectiser · 2017-10-19T09:40:23Z

@rshriram No problem. Sync done.

rshriram · 2017-10-19T12:21:06Z

Please update the Envoy sha in the istio/proxy repo (WORKSPACE file). We pick up Envoy updates manually. It should be a one line change. Once istio/proxy updates, other updates propagate automatically.

objectiser · 2017-10-19T12:55:31Z

Envoy sha updated: istio/proxy#594

istio-testing added the release-note label Oct 6, 2017

googlebot added the cla: yes label Oct 6, 2017

istio-testing added the needs-ok-to-test label Oct 6, 2017

istio-testing removed the needs-ok-to-test label Oct 6, 2017

rshriram reviewed Oct 6, 2017

kyessenov reviewed Oct 6, 2017

objectiser force-pushed the opname branch from c270344 to 2e11b86 Compare October 6, 2017 22:31

objectiser force-pushed the opname branch from e697c35 to 77f6658 Compare October 13, 2017 08:12

objectiser mentioned this pull request Oct 13, 2017

tracing: Enable inbound request header to override tracing operation name envoyproxy/envoy#1849

Closed

rshriram approved these changes Oct 19, 2017

objectiser added 5 commits October 19, 2017 10:32

Use the route name to define the tracing span/operation name.

46e18e2

Fix formatting issues

13e2f20

Remove duplicate code in buildInboundWebsocketRoute

067d098

Added decorator for default route

e794dba

Fix failing test

49784cd

objectiser force-pushed the opname branch from 77f6658 to 49784cd Compare October 19, 2017 09:32

rshriram merged commit 812a88e into istio:master Oct 19, 2017

objectiser deleted the opname branch October 19, 2017 12:55
