Using refinery and otel-collector to route traces based on content (add dataset to inmemory collector cache key) #269
Hi @tr-fteixeira - sorry for the slow response on this. I'm going to try and summarize your use case. Can you please tell me if I got it right or if I'm still misunderstanding something? I'm going to try and remove components specific to your architecture that don't impact the use case: you have the same traces going through an OTel Collector. The OTel Collector has two exporters, each sending the traces to a Refinery instance, but with different Datasets specified. So it would look something like this:
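(A rough collector-config sketch of that tee; the endpoint, API key variable, and the header-based dataset selection are illustrative assumptions, not your exact setup:)

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  # Same trace data sent twice, once per dataset (names are hypothetical)
  otlp/dataset-one:
    endpoint: refinery:4317
    tls:
      insecure: true
    headers:
      x-honeycomb-team: ${REFINERY_API_KEY}
      x-honeycomb-dataset: dataset-one
  otlp/dataset-two:
    endpoint: refinery:4317
    tls:
      insecure: true
    headers:
      x-honeycomb-team: ${REFINERY_API_KEY}
      x-honeycomb-dataset: dataset-two

service:
  pipelines:
    traces:
      receivers: [otlp]
      # Fan out: every trace goes to both exporters
      exporters: [otlp/dataset-one, otlp/dataset-two]
```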
So at this point, you're essentially tee'ing your traces, sending the same data to two different datasets. In Refinery, you want to sample these at different rates or based on different rules, so you'd like to look at the dataset and make a rule-based sampling decision based on that value. For example, you may want to sample all traces sent to dataset-one at 1/5 and all traces sent to dataset-two at 1/10. Because Refinery makes sampling decisions based on trace IDs alone, you're going to end up with a seemingly arbitrary mix of sampling at 1/5 and 1/10 in each dataset, which will result in broken traces in both. Is that correct? Please let me know if I've misunderstood or misstated anything.

If that's the case, I'm trying to think of edge cases where encoding the dataset could create a problem. I don't think there are any, but it will require some thought and/or testing. I'm concerned about how traces are sent to other nodes, what ends up being set in Redis, etc. Sampling on TraceID was an intentional design choice since trace IDs are meant to be unique. I'm hesitant to encode Dataset as an arbitrary second attribute just to guarantee uniqueness, but I do understand the need (I think?). This is definitely a "there be dragons" kind of situation, so please do be patient with us as we think it through :-)
Hey @paulosman, thanks for taking a look at it, and yes, you got it right, that is the use case. At least, that's the only way I could think of achieving what I needed. By that I mean: if we go up an abstraction level, I am looking for the ability to send specific traces from the same source system to different datasets, based on rules (trace/span content), any way I can =). There are some discussions about the OTel collector routing this, but nothing is quite there yet. If you want any help with testing, or clarifications, let me know (here or on Pollinators 👍).
Summary from Slack: we decided against changing the Refinery caching keys, given that this is a fringe use case for Refinery and fits better as a collector concern. Potential candidates for achieving this with the collector: the trace filter processor or the routing processor.
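For reference, a rough sketch of the routing processor approach (assuming the collector-contrib routing processor; the attribute name, values, and exporter names are illustrative):

```yaml
processors:
  routing:
    attribute_source: resource          # route on a resource attribute carried by the spans
    from_attribute: k8s.namespace.name  # hypothetical attribute identifying the team/environment
    table:
      - value: team-one
        exporters: [otlp/dataset-one]
      - value: team-two
        exporters: [otlp/dataset-two]
    default_exporters: [otlp/dataset-one]

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [routing]
      exporters: [otlp/dataset-one, otlp/dataset-two]
```

With this, each trace is sent to only one exporter, so Refinery never sees the same TraceID under two datasets.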
Started the discussion on the Pollinators Slack, here.
I'm trying to use Refinery as one of the tools to achieve "trace routing" or "trace multiplexing" capabilities. It is an unusual use case, and here is the context of the ask.
This might be an unconventional use case, and it seems like it runs into some problems here, because Refinery uses only the TraceID to group spans.
Details 👇
Context:
Using Istio and a shared ingress gateway for multiple applications/environments/teams (each mapping to its own dataset).
This means I have to split the destination of the traces based on the content they carry (not doable in Istio alone, or in any app).
What am I trying to do?
Get those Istio traces to the correct dataset, by means of:
Shared Istio gateway -> otel-collector (fan out to 2 exporters, one for each dataset) -> Refinery (RulesBasedSampler - drop traces of other environments/namespaces) -> HC
This effectively duplicates all traces generated by Istio and adds different dataset metadata to each copy.
After that, the thought was to use Refinery rules to drop the copies going to the wrong dataset, e.g. with a rules config like the sketch below.
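(A rough per-dataset rules sketch in the v1-style Refinery rules file; the namespace field, values, and exact key names are illustrative, not a tested config:)

```yaml
dataset-one:
  Sampler: RulesBasedSampler
  Rule:
    # Drop anything that doesn't belong to this dataset's namespace
    - Name: drop traces from other namespaces
      Drop: true
      Condition:
        - Field: k8s.namespace.name
          Operator: "!="
          Value: team-one
    # Keep everything else untouched
    - Name: keep everything else
      SampleRate: 1
```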
What ends up happening?
For a single exporter, it works great. When multiple exporters are enabled, traces don't get evaluated properly, some being kept/dropped in a seemingly random fashion.
What I think the problem is:
(No batching is enabled in the OTel collector.)
When Refinery collects the spans in memory and collates them to form a trace here, only the TraceID is taken into account. So it is (sometimes) combining both exporters' data, with different upstream datasets, into one single trace, making the sampling decision look wrong.
The actual questions
Does this sound right? Or did I get it all wrong? 😃
Was this a conscious decision, or am I trying to use the tool for an unexpected use case?
Possible solutions
Adding sp.Dataset to the cache object key as a prefix should fix it. Would prefer to do 1, but wanted to hear your thoughts before working on a PR 😃