Skip to content
This repository has been archived by the owner on Jun 14, 2018. It is now read-only.

Use the route name to define the tracing span/operation name. #1406

Merged
merged 5 commits into from
Oct 19, 2017

Conversation

objectiser
Copy link
Contributor

@objectiser objectiser commented Oct 6, 2017

What this PR does / why we need it: Uses the route name as the span/operation name in the tracing data.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:
If this PR is accepted, I would propose doing a separate PR with an update to the bookinfo example to show how individual routes could be named (using business relevant terms) based on specific REST endpoints.

Release note:

The operation/span name in the tracing data now uses the route name, allowing business relevant names to be associated with the tracing information. Individual routes could be defined for each REST endpoint, allowing them to be individually named.

@istio-merge-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
We suggest the following additional approver: greghanson

Assign the PR to them by writing /assign @greghanson in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@istio-testing
Copy link
Contributor

Hi @objectiser. Thanks for your PR.

I'm waiting for a istio member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@objectiser
Copy link
Contributor Author

This is a followup to #1386 and istio/api#186 to use @rshriram idea for defining the operation/span name based on the route name. Some example images:

Jaeger screenshot of Bookinfo before route applied:

istio-opname-jaeger-before

After route-rule-all-v1.yaml applied:

istio-opname-jaeger-after

Same example (with routing rule) using Zipkin UI:

istio-opname-zipkin-after

NOTE: The operation/span names are using the current route rule names (e.g. productpage-default) - but could define more specific routing rules with appropriate business relevant names.

@istio-testing
Copy link
Contributor

@objectiser: you can't request testing unless you are a istio member.

In response to this:

This is a followup to #1386 and istio/api#186 to use @rshriram idea for defining the operation/span name based on the route name. Some example images:

Jaeger screenshot of Bookinfo before route applied:

istio-opname-jaeger-before

After route-rule-all-v1.yaml applied:

istio-opname-jaeger-after

Same example (with routing rule) using Zipkin UI:

istio-opname-zipkin-after

NOTE: The operation/span names are using the current route rule names (e.g. productpage-default) - but could define more specific routing rules with appropriate business relevant names.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rshriram
Copy link
Member

rshriram commented Oct 6, 2017

/ok-to-test

} else if config.ConfigMeta.Name != "" {
route = buildInboundRoute(config, rule, cluster)
}
if route != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed on inbound only? Why not use it in outbound as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The outbound is handled in buildHTTPRoute (route.go).

@rshriram
Copy link
Member

rshriram commented Oct 6, 2017

you need to run ./bin/check.sh to format code

@codecov
Copy link

codecov bot commented Oct 6, 2017

Codecov Report

Merging #1406 into master will increase coverage by 1%.
The diff coverage is 80%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master    #1406    +/-   ##
========================================
+ Coverage   83.02%   84.03%    +1%     
========================================
  Files          52       52            
  Lines        6456     6764   +308     
========================================
+ Hits         5360     5684   +324     
+ Misses        894      877    -17     
- Partials      202      203     +1
Impacted Files Coverage Δ
proxy/envoy/resources.go 85.08% <ø> (ø) ⬆️
proxy/envoy/config.go 91.56% <100%> (-0.03%) ⬇️
proxy/envoy/route.go 93.54% <76.47%> (-1.31%) ⬇️
platform/consul/monitor.go 79.76% <0%> (-3.58%) ⬇️
adapter/config/crd/conversion.go 100% <0%> (ø) ⬆️
model/validation.go 97.69% <0%> (+0.73%) ⬆️
adapter/config/memory/monitor.go 89.18% <0%> (+2.7%) ⬆️
platform/kube/inject/inject.go 93.41% <0%> (+5.49%) ⬆️
platform/kube/inject/initializer.go 78.6% <0%> (+12.44%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 13005f5...49784cd. Read the comment docs.

@objectiser
Copy link
Contributor Author

/retest

@@ -154,9 +155,34 @@ func (t *routing) verifyRouting(scheme, src, dst, headerKey, headerVal string,
}
}

if operation != "" {
errs = t.verifyDecorator(operation)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet! Thanks for this!

@@ -83,6 +83,22 @@ func buildInboundWebsocketRoute(rule *proxyconfig.RouteRule, cluster *Cluster) *
return route
}

func buildInboundRoute(config model.Config, rule *proxyconfig.RouteRule, cluster *Cluster) *HTTPRoute {
route := buildHTTPRouteMatch(rule.Match)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just fold this and the buildInboundWebsocketRoute into one single function
buildInboundRoute ? You can then check
if rule.WebsocketUpgrade { and set the route.WebSocketUpgrade to true.. This way, the code is compact.. This should also kill the if-else loop that you have above in buildInboundListener

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to know rules at inbound location? I understand the hack for websockets was placed due to the fact that Envoy cannot multiplex regular HTTP with websockets, but we should not rely on that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its for tagging the traces with the operation names.. The thing is, multiple routes could lead to the same inbound route like / . In this case, the operation name becomes pointless right? @objectiser
You might as well have "inbound" as the operation name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if pointless, but less useful - although using the current approach it still gives them the ability to define the span/operation name - even if all endpoints for the same destination use the same name (initially). But from there they then have the ability to refine the routes and provide more specific names.

Copy link
Member

@rshriram rshriram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good overall. Just one comment to deduplicate code. After that its good to go.


route = buildInboundWebsocketRoute(rule, cluster)
} else if config.ConfigMeta.Name != "" {
route = buildInboundRoute(config, rule, cluster)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why add another route and not change default route?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm concerned that we rely on route rules by destination here. We cannot be certain that the request originates from within the mesh (or from within the snapshot of the rules known at destination). If a request doest not fit any rule, then this logic does not apply.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that there could be multiple routes here - if multiple routes are defined for different endpoints on the same destination? If so, then each endpoint should be configured with a different span/operation name.

func buildDecorator(config model.Config) *Decorator {
if config.ConfigMeta.Name != "" {
return &Decorator{
Operation: config.ConfigMeta.Name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use config.Key() or include the namespace. Names are not unique.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention is not to be unique but to enable users to specify a business relevant name that can be used in the tracing operation/span name.

@objectiser
Copy link
Contributor Author

Modified version of the route-rule-all-v1.yaml:

apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: get-product
spec:
  destination:
    name: productpage
  precedence: 2
  route:
  - labels:
      version: v1
  match:
    request:
      headers:
         uri:
          exact: "/productpage"
---
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: login
spec:
  destination:
    name: productpage
  precedence: 3
  route:
  - labels:
      version: v1
  match:
    request:
      headers:
         uri:
          exact: "/login"
---
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: get-reviews
spec:
  destination:
    name: reviews
  precedence: 2
  route:
  - labels:
      version: v1
  match:
    request:
      headers:
         uri:
          prefix: "/reviews/"
---
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: get-ratings
spec:
  destination:
    name: ratings
  precedence: 2
  route:
  - labels:
      version: v1
  match:
    request:
      headers:
         uri:
          prefix: "/ratings/"
---
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: get-details
spec:
  destination:
    name: details
  precedence: 2
  route:
  - labels:
      version: v1
  match:
    request:
      headers:
         uri:
          prefix: "/details/"

With example image:
istio-opnames

@rshriram
Copy link
Member

rshriram commented Oct 9, 2017

@kyessenov thoughts?

@kyessenov
Copy link
Contributor

Hi @objectiser ,
Can you give me some insight why you decided to use the inbound route instead of outbound route to record the rule name decorator? I'd assume that the client routing is responsible for creating the request connection between the client and the server, and it's the responsibility of the client to record the rule decorator in the trace metadata.

@objectiser
Copy link
Contributor Author

@kyessenov Apologies as I haven't been working on the project for long, so may have misunderstood the terminology - but not sure what you mean by inbound and outbound rules.

From my perspective there are RouteRule and that information is being used to create the inbound and outbound listeners in Envoy - possibly that is what you are referring to? If so, then the route rule is being used to set the decorator (tracing operation name) on both the outbound (egress) and inbound (ingress) listeners.

The reason this is important is that it means the same operation name is being used in the client and server span (representing the same request) - so consistent naming.

@rshriram
Copy link
Member

rshriram commented Oct 10, 2017

In general, (until your PR), RouteRules apply only on outbound. The only time we compute inbound rules is when we need to set websockets. What I am not clear about (may be I missed this in the conversation above), is what happens if I have two route rules, from two different services, pointing to the same path of the destination service? [and may be this is what @kyessenov was driving at]
for e.g.
from productpage, for reviews go to reviews:/getReviews
from foobar, for reviews/getAllReviews go to reviews:/getReviews

These rules will be applied on caller side and the decorators will be generated properly on productpage and foobar. But on the callee side/destination, you might end up overwriting the decorator (or possibly adding two Envoy routes for same /getReviews, but with different decorators).

@objectiser
Copy link
Contributor Author

@rshriram Good point - you are correct this would be an issue. Another thing I think might be a problem is that using the route rule name means it wouldn't be possible to use a stable name (e.g. get-reviews) when having multiple rules for the same endpoint - each rule would need a different name.

As a possible solution:

  • I introduce an optional "operation" field on the RouteRule (possibly in a TracingInfo sub-message as discussed before)
  • If no 'operation' is defined, then the outbound rule will use the route rule name as the operation name in the tracing data - which should provide better clarity/info to users as they will clearly see which routes have been applied under which circumstances
  • If an 'operation' is defined, then apply the value to both the inbound and outbound rules - and in this scenario it is up to the user to use consistent naming for the same endpoint on the destination service when used from multiple sources (or even across multiple route rules for the same source/destination service).

Let me know whether this approach sounds reasonable. If so - I can either impl as one PR, or update this PR just to implement step 2 (i.e. route rule used on outbound rule), and then create separate PR for the optional operation?

@rshriram
Copy link
Member

This sounds fine, except for the part about inferring the operation name on the inbound route. It still doesn't solve the multiple rules (or even no rule ones) arriving at the same inbound route. It almost feels like you need a global map of service name, operation name and path, and have the decorators act independent of route matching. Like having a decorator list in the virtual host, that does additional matching of path/prefix, in addition to route block doing the same. Which reminds me, didn't you have a PR in envoy to do just this? I vaguely recall that your initial version of PR in envoy was using a separate match block in Envoy (outside the confines of route).

cc @fabolive (in case you are interested in this discussion).

@rshriram
Copy link
Member

and for the record, I totally support the decorators idea. I think it makes for a fantastic addition to the trace. Just trying to figure out how to handle the inbound part. Will the trace become mangled if the server does not add the decorator (i.e. only client adds the decorator?)

@objectiser
Copy link
Contributor Author

@rshriram Yes that was how my original envoy PR worked - but was rejected because it created essentially a duplicate route matching definition - which at the time did appear to be an unnecessary overhead.

While discussing other options, I could create a separate PR for using the RouteRule name in outbound rule decorator - and check that it works fine in zipkin/jaeger. Sound reasonable?

@rshriram
Copy link
Member

rshriram commented Oct 10, 2017 via email

@objectiser
Copy link
Contributor Author

@rshriram Unfortunately doesn't work properly in zipkin due to the shared span model - the operation name is not consistently set to the client span name.

Will have a think about alternatives - should this be closed for now?

@rshriram
Copy link
Member

I see. How about we set a default decorator name for all inbound routes? Will that mess things up as well? Alternatively (and more complex), we could have a decorator selection algorithm for the inbound route which does the following:
a. If there is a specific path, that is not shared by other routes, we create a dedicated route in the inbound and add decorator
b. If a path is shared by multiple routes, then we select the first one from the sorted list
c. If its the default path (e.g., prefix: / ), then we add a default decorator

I am not sure how complicated the above code will turn out to be.

Also, it doesn't matter if operations don't appear properly in Zipkin, as long as this is additional data over what Zipkin gives by default. If Jaeger picks it up, then that still useful for folks using Jaeger. But if it messes up common zipkin usage, then we have to consider the alternatives.

@objectiser
Copy link
Contributor Author

I tried setting a default blank decorator name for inbound, but that still caused inconsistent results - some spans with the route-rule name, some with the blank.

In terms of (a) and (b), this is effectively what this current implementation does, isn't it? If there is only a single route, then the name is defined, but if there were to be multiple applicable route rules, then the first would be selected.

In terms of zipkin usage - the result would be some spans having the route name, while others have the service name (as now).

@objectiser
Copy link
Contributor Author

Or is the point of your algorithm that we just focus on exact path matches, to reduce the potential for conflict? If so, might require a future PR to enable exact matches to deal with path parameters (possibly by specifying some form of path template).

@rshriram
Copy link
Member

hmm.. so using a dummy "default" name for inbound also causes inconsistent results in Zipkin? Would it be possible to provide a screen shot?

with regard to the options a/b/c, you are still missing the operation name for the default route..

case model.ProtocolHTTP, model.ProtocolHTTP2, model.ProtocolGRPC:
 			defaultRoute := buildDefaultRoute(cluster)```

@rshriram
Copy link
Member

Lost my comment.. the code you ahve does not add a default decorator when building the default inbound route (this is outside the rules forloop)

case model.ProtocolHTTP, model.ProtocolHTTP2, model.ProtocolGRPC:
 			defaultRoute := buildDefaultRoute(cluster)

Secondly, would you mind posting a screen shot of how the mismatched operation names look in Zipkin?

@objectiser
Copy link
Contributor Author

It would be similar to the issue I found in the original PR: #1386 (comment) - so the Jaeger image shows the productpage outbound requests using the route name, but the associated inbound requests on the details and reviews services have the host/port.

However, as shown in the second screenshot, with zipkin it is non-deterministic - the details request has resulted in the host/port name, whereas the reviews request used the route name. The problem is that zipkin will be receiving the client and server spans (with same id) in no fixed order, so if they have different operation names, then one will overwrite the other.

Currently if there is no route rule, then the behaviour would be as now - no decorator. So the operation name is defined by Envoy - using target service or host/port. If you want a different default then that should be simple to do - the problem is finding a suitable value?

@rshriram
Copy link
Member

Okay, this does seem like a problem. What exactly did you mean by path templates?

@objectiser
Copy link
Contributor Author

Was just thinking that if we were focusing on exact path matches to decorate with an operation name, then would have problems with paths that include parameters, e.g. "/orders/123".

Solution might be to enable exact path matches to define a template, such as "/orders/{orderId}", and have Envoy internally translate this to a regex. The other benefit is that the parameter names and values could be extracted and added to the spans as tags.

@objectiser
Copy link
Contributor Author

@rshriram Added the default route decorator for compleness.

Had another idea - apart from the conflicting inbound rule issue, I believe this PR works - i.e. it names all client/server spans consistently so doesn't result in issue with zipkin shared spans.

As a possible solution to the inbound rule issue, was thinking it could be solved if the operation name used by the outbound rule could be propagated to the inbound listener - and if provided, this would be used by the inbound listener overriding any other locally defined decorator value it may have. Thoughts?

If sounds reasonable, I'll create an issue on Envoy for discussion.

@rshriram
Copy link
Member

As a possible solution to the inbound rule issue, was thinking it could be solved if the operation >name used by the outbound rule could be propagated to the inbound listener - and if provided, this >would be used by the inbound listener overriding any other locally defined decorator value it may >have. Thoughts?

Perfect. Lets do this!

@objectiser
Copy link
Contributor Author

@rshriram @kyessenov Great - can you give your +1 on the envoy issue, so hopefully Matt will give his blessing 😄

@objectiser
Copy link
Contributor Author

@rshriram The envoy PR envoyproxy/envoy#1858 has now been merged, so can this one be merged aswell now? It doesn't directly depend on the envoy PR, so will just pick up the change when the commit is updated next time.

Copy link
Member

@rshriram rshriram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the work in both repos! @kyessenov any other feedback? Lgtm on my end.

@rshriram
Copy link
Member

Also please sync with master

@objectiser
Copy link
Contributor Author

@rshriram No problem. Sync done.

@rshriram rshriram merged commit 812a88e into istio:master Oct 19, 2017
@rshriram
Copy link
Member

Please update the Envoy sha in istio/proxy repo (WORKSPACE file). we pick up Envoy updates manually. It should be a one line change. Once istio/proxy updates, other updates propagate automatically.

@objectiser objectiser deleted the opname branch October 19, 2017 12:55
@objectiser
Copy link
Contributor Author

Envoy sha updated: istio/proxy#594

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants