Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add tail sampling for traces #467

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Lp-Francois
Copy link

@Lp-Francois Lp-Francois commented Apr 17, 2024

Add tail sampling for traces. Useful when dropping health probes from your traces for instance, as they can be quite noisy 💸.

Documentation: https://grafana.com/docs/grafana-cloud/monitor-applications/application-observability/setup/sampling/tail/

Related issues:

Comments:

  • How to generate the example properly? I didn't find a way to generate the examples/tail-sampling/*.alloy files.
  • Is there an easy way of testing the generated config? I made sure the helm chart output looks OK (with helm template k8s-monitoring charts/k8s-monitoring -f examples/tail-sampling/values.yaml > examples/tail-sampling/output.yaml), but didn't tested it would work correctly.
    • If there is no official way, I guess I could copy paste from the helm chart output the generated config and pass it locally to the alloy cli?
  • The allowed configuration is quite a lot: https://grafana.com/docs/alloy/latest/reference/components/otelcol.processor.tail_sampling/#blocks I wonder if there would be an easy way to pass config for the policies? Maybe like the extraConfig is working, just let the user pass any string?

Note: It is really painful to work with these alloy_config/*.alloy.txt files 😅

@CLAassistant
Copy link

CLAassistant commented Apr 23, 2024

CLA assistant check
All committers have signed the CLA.

- also remove default value, as they have to be defined anyway

based on reviews from @t00mas
@Lp-Francois Lp-Francois requested a review from t00mas May 6, 2024 10:14
@Lp-Francois Lp-Francois requested a review from t00mas May 8, 2024 14:30
@Lp-Francois
Copy link
Author

@t00mas I applied your reviews :)

Could you have another look if you find some time please 🙏 ?

@t00mas
Copy link
Contributor

t00mas commented May 28, 2024

I'm OK with this, just take into account @rlankfo 's recommendation here #443 (comment)

@rlankfo
Copy link
Member

rlankfo commented May 28, 2024

I'd be concerned about adding the tail sampling processor at this stage of the pipeline. It won't be scalable -- to scale the tail sampling processor we recommend introducing a load balancing exporter in a separate layer that can route traces by trace ID. @petewall do you have any thoughts here?

@a-abella
Copy link
Contributor

I am hitting the need for this now and have some issues with the solution proposed in #443 (comment). It doesn't integrate well with the k8s-monitoring chart because k8s-monitoring supports a variety of authentication mechanisms for different backends that the grafana-sampling chart does not. Additionally the grafana-sampling chart has no built-in support for OTTL metric/datapoint filtering like this chart does.

If sampling within this k8s-monitoring chart is not encouraged due to scalability concerns then instead of including sampling templates, a suitable alternative could be a way to add an alloy extraConfig pre-batch processor within the traces template, allowing users to configure their own sampling in a way that integrates with this chart at their own peril.

Alternatively, and I have no idea how possible this is, it would be interesting if this chart and the grafana-sampling chart could interop such that spans and OTEL metrics arrive at the k8s-monitoring statefulset, Traces are forwarded to grafana-sampling load balancing endpoint for processing, then the grafana-sampling configuration points its outputs back to k8s-monitoring which handles final output/export to backends. This would allow k8s-monitoring to handle the routing of metric, log, and trace data to its various supported backend datasources while still providing a tail sampler that can scale with load.

@petewall petewall added this to the 2.0 milestone Sep 30, 2024
@petewall petewall self-requested a review as a code owner October 4, 2024 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants