Using refinery and otel-collector to route traces based on content (add dataset to inmemory collector cache key) #269

Closed
tr-fteixeira opened this issue Jun 9, 2021 · 3 comments

Comments

@tr-fteixeira
Contributor

I started the discussion on the Pollinators Slack, here.

I'm trying to use Refinery as one of the tools to achieve "trace routing" or "trace multiplexing". It's an unusual use case, but here is the context of the ask.

This might be an unconventional use case, and it seems to run into problems here because Refinery uses only the trace ID to group spans.

Details 👇
Context:
We use Istio and a shared ingress gateway for multiple applications/environments/teams (each mapping to its own dataset).
This means I have to split the destination of the traces based on the content they carry (not doable in Istio alone, or in any single app).

[Screenshot attached]

What am I trying to do?
Get those Istio traces to the correct dataset, by means of:
Shared Istio gateway -> otel-collector (fan out to two exporters, one per dataset) -> Refinery (RulesBasedSampler drops traces belonging to other environments/namespaces) -> Honeycomb
This effectively duplicates all traces generated by Istio and tags each copy with a different dataset.
After that, the thought was to use Refinery rules to drop the copies going to the wrong dataset (a rough rules sketch is below).
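
For reference, this is roughly the kind of Refinery rules config I had in mind for the drop step. It's only a sketch: the dataset name, the namespace value, and the k8s.namespace.name field are placeholders for whatever attributes our spans actually carry, not our real config.

dataset-one:
  Sampler: RulesBasedSampler
  rule:
    - name: drop traces that belong to other namespaces
      drop: true
      condition:
        - field: k8s.namespace.name
          operator: "!="
          value: namespace-one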

What ends up happening?
With a single exporter, it works great. When multiple exporters are enabled, traces don't get evaluated properly: some are kept and some are dropped, seemingly at random.

What I think the problem is:
(No batching is enabled in the OTel collector.)
When Refinery collects the spans in memory and collates them to form a trace here, only the TraceID is taken into account, so it sometimes combines both exporters' data, each destined for a different dataset, into a single trace, making the sampling decision look wrong.

The actual questions
Does this sound right, or did I get it all wrong? 😃
Was this a conscious decision, or am I trying to use the tool for an unexpected use case?

Possible solutions

  • Adding sp.Dataset as a prefix to the cache object key should fix it (a rough sketch follows after this list).
  • Using a separate Refinery deployment for each exporter

I would prefer option 1, but wanted to hear your thoughts before working on a PR 😃
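
To make option 1 concrete, here is the rough shape of what I mean. This is a sketch of the idea, not a patch against the actual collector code; cacheKey is a hypothetical helper name.

package collect

import "github.com/honeycombio/refinery/types"

// Hypothetical sketch: instead of collating spans purely by TraceID, prefix
// the cache key with the dataset the span was sent to, so the two copies of
// a trace (one per exporter/dataset) are kept apart and sampled independently.
func cacheKey(sp *types.Span) string {
	return sp.Dataset + "/" + sp.TraceID
}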

@tr-fteixeira tr-fteixeira changed the title Using refinery and otel-collector to route traces based on content (add dataset to inmemory collectore cache key) Using refinery and otel-collector to route traces based on content (add dataset to inmemory collector cache key) Jun 22, 2021
@paulosman

paulosman commented Jul 21, 2021

Hi @tr-fteixeira - sorry for the slow response on this.

I'm going to try and summarize your use case. Can you please tell me if I got it right or if I'm still misunderstanding something? I'm going to try and remove components specific to your architecture that don't impact the use case:

You have the same traces going through an OTel Collector. The OTel collector has two exporters, each sending the traces to a Refinery instance, but with different Datasets specified. So it would look something like this:

exporters:
  otlp/honeycombOne:
    endpoint: "refinery.yourco.com:9090"
    headers:
      "x-honeycomb-team": "s3cret"
      "x-honeycomb-dataset": "dataset-one"
  otlp/honeycombTwo:
    endpoint: "refinery.yourco.com:9090"
    headers:
      "x-honeycomb-team": "s3cret"
      "x-honeycomb-dataset": "dataset-two"

So at this point, you're essentially tee'ing your traces, sending the same data to two different datasets.

In Refinery, you want to sample these at different rates or based on different rules, so you'd like to look at the dataset and make a rule-based sampling decision based on that value. For example, you may want to sample all traces sent to dataset-one at 1/5 and all traces sent to dataset-two at 1/10.
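
Roughly, the Refinery rules config for that per-dataset split could look something like this (just a sketch to anchor the example; your real rules would likely be more involved):

dataset-one:
  Sampler: DeterministicSampler
  SampleRate: 5

dataset-two:
  Sampler: DeterministicSampler
  SampleRate: 10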

Because Refinery makes sampling decisions based on trace IDs alone, you're going to end up with a seemingly arbitrary mix of sampling at 1/5 and 1/10 in each dataset, which will result in broken traces in both.

Is that correct? Please let me know if I've misunderstood or misstated anything.

If that's the case, I'm trying to think of edge cases where encoding the dataset could create a problem. I don't think there are any, but it will require some thought and/or testing. I'm concerned about how traces are sent to other nodes, what ends up being set in Redis, etc, etc.

Sampling on TraceID was an intentional design choice since trace IDs are meant to be unique. I'm hesitant to encode Dataset as an arbitrary second attribute just to guarantee uniqueness, but I do understand the need (I think?). This is definitely a "there be dragons" kind of situation, so please do be patient with us as we think it through :-)

@tr-fteixeira
Contributor Author

Hey @paulosman, thanks for taking a look at it. Yes, you got it right, that is the use case. At least, that's the only way I could think of to achieve what I needed.

By that I mean: if we go up an abstraction level, I am looking for the ability to send specific traces from the same source system to different datasets, based on rules over trace/span content, any way I can =). There are some discussions on OTel collector routing, but nothing quite there yet.

If you want any help with testing, or any clarifications, let me know (here or on Pollinators 👍).

@vreynolds
Contributor

Summary from Slack: we decided against changing the Refinery caching keys, given that this is a fringe use case for Refinery and fits better as a collector concern. Potential candidates for achieving this in the collector: the trace filter processor or the routing processor.
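
For reference, a rough sketch of what the routing processor approach could look like. The attribute name and exporter/pipeline names are placeholders, and whether you can route on a span/resource attribute versus a request header depends on the processor version, so check the collector-contrib docs before relying on this.

processors:
  routing:
    from_attribute: k8s.namespace.name
    default_exporters:
      - otlp/honeycombOne
    table:
      - value: namespace-two
        exporters:
          - otlp/honeycombTwo

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [routing]
      exporters: [otlp/honeycombOne, otlp/honeycombTwo]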
