[telemetry] extend collector telemetry with additional attributes #12316

Open

vigneshshanmugam opened this issue Feb 6, 2025 · 9 comments

vigneshshanmugam commented Feb 6, 2025

Is your feature request related to a problem? Please describe.

The current telemetry configuration in the collector (#4970) doesn't allow the use of custom providers (e.g., Meter, Tracer) or the addition of custom dimensions that can be dynamically injected into the components collecting the internal telemetry data. This limitation makes it challenging to use the data in a multi-tenant environment, or for impact analysis, etc.

Describe the solution you'd like

Given there is already an open issue to extend the Telemetry factory to accept various providers, I would like to describe the solution only for extending the collected metrics/trace/logs data with additional dimensions via attributes.

  • Extend the telemetry configuration to support a baggage propagator.
  • Add the contents of the baggage to the produced spans/metrics/logs as attributes. A common place to start adding them would be the ObsReport package, which is the base for capturing telemetry for receivers, processors, and exporters. For example, for spans (a metrics-side sketch follows this list):

	// Copy selected baggage members from the request context onto the span.
	// Uses go.opentelemetry.io/otel/baggage and go.opentelemetry.io/otel/attribute.
	for _, member := range baggage.FromContext(ctx).Members() {
		if filter(member) {
			span.SetAttributes(attribute.String(member.Key(), member.Value()))
		}
	}
	// similarly for metrics to be recorded

  • Do the same for the other providers.
  • Provide an option for filtering additional dimensions in case they might be problematic.
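To make the metrics side concrete, here is a minimal sketch assuming a plain OpenTelemetry Go metric instrument; recordWithBaggage, the counter argument, and the filter function are hypothetical names for illustration, not existing obsreport APIs:

package obsreportsketch // hypothetical package, for illustration only

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/baggage"
	"go.opentelemetry.io/otel/metric"
)

// recordWithBaggage records numItems on counter, attaching any baggage
// members from the request context that pass filter as metric attributes.
func recordWithBaggage(ctx context.Context, counter metric.Int64Counter, numItems int64, filter func(baggage.Member) bool) {
	var attrs []attribute.KeyValue
	for _, member := range baggage.FromContext(ctx).Members() {
		if filter(member) {
			attrs = append(attrs, attribute.String(member.Key(), member.Value()))
		}
	}
	counter.Add(ctx, numItems, metric.WithAttributes(attrs...))
}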

Alternatives considered

I am not sure if there is an easy way to propagate them across all components; for custom components it was possible via setting TelemetrySettings.

Happy to work on it if there is a consensus.

@vigneshshanmugam (Author)

@mx-psi Since you wrote the original proposal for the Telemetry extensions, I'm adding you in case you have any thoughts on this approach. Thanks.

mx-psi (Member) commented Feb 11, 2025

@vigneshshanmugam What about setting the resource?

// Resource specifies user-defined attributes to include with all emitted telemetry.
// Note that some attributes are added automatically (e.g. service.version) even
// if they are not specified here. In order to suppress such attributes the
// attribute must be specified in this map with null YAML value (nil string pointer).
Resource map[string]*string `mapstructure:"resource"`
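For example (the attribute names and values below are purely illustrative):

service:
  telemetry:
    resource:
      tenant.name: "acme"   # static, user-defined attribute
      service.version:      # null value suppresses the automatically added attribute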

@vigneshshanmugam (Author)

@mx-psi Thanks for the pointer. Resource only helps with static attributes; it doesn't provide a way to extend the telemetry trace data based on HTTP headers, client metadata, etc.

The use case is as follows: suppose the collector is deployed in a multi-tenant environment. Ideally, we want to determine which tenant (e.g., identifiable via x-tenant-id) is experiencing issues with data ingestion by analyzing the telemetry metrics/traces/logs produced by the collector.

@vigneshshanmugam (Author)

If baggage would be too much of a concern, can we allow the attributes via client metadata_keys in the telemetry config, with max cardinality limits similar to other components? For example:

telemetry:
  metadata_keys:
    - x-tenant-id
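To sketch what that could mean inside a component, assuming the request metadata is reachable through the collector's client package (the helper name and its wiring are hypothetical):

package telemetrysketch // hypothetical package, for illustration only

import (
	"context"

	"go.opentelemetry.io/collector/client"
	"go.opentelemetry.io/otel/attribute"
)

// tenantAttributes looks up each configured metadata key (e.g. "x-tenant-id")
// in the incoming request's client metadata and turns the first value into an
// attribute that a component could attach to its internal telemetry.
func tenantAttributes(ctx context.Context, metadataKeys []string) []attribute.KeyValue {
	info := client.FromContext(ctx)
	attrs := make([]attribute.KeyValue, 0, len(metadataKeys))
	for _, key := range metadataKeys {
		if values := info.Metadata.Get(key); len(values) > 0 {
			attrs = append(attrs, attribute.String(key, values[0]))
		}
	}
	return attrs
}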

Thoughts?

jade-guiton-dd (Contributor) commented Feb 21, 2025

Could you provide some more details about this "multi-tenant environment" use case? Is the idea that multiple users have applications forwarding telemetry to the same Collector, and you want to provide them with Collector telemetry relevant to the data they sent?

If that is your use case, it seems rather complicated, as telemetry items from multiple users will often be batched together, and failures in the pipeline will be hard to attribute to a single user.

It seems to me that adding metadata keys in the telemetry config would not fit this use case (or my interpretation of it at least), since the config is set on the Collector as a whole.

axw (Contributor) commented Feb 24, 2025

(@vigneshshanmugam and I work together)

> Could you provide some more details about this "multi-tenant environment" use case? Is the idea that multiple users have applications forwarding telemetry to the same Collector, and you want to provide them with Collector telemetry relevant to the data they sent?

Think of it a bit like virtual hosting. We have (conceptually) a single collector that serves multiple tenants; tenants will send requests to a tenant-specific URL that routes to the same collector. At the collector we can identify the tenant by the specified host/URL.

In case it's unclear, we're not talking about telemetry based on the request data (payload), only on the request metadata (e.g. request URL, headers). We want to add a tenant-identifying attribute to any request-based telemetry: at least metrics and spans, ideally also logs.

> If that is your use case, it seems rather complicated, as telemetry items from multiple users will often be batched together, and failures in the pipeline will be hard to attribute to a single user.

We configure the collector to not batch data for multiple tenants.

> It seems to me that adding metadata keys in the telemetry config would not fit this use case (or my interpretation of it at least), since the config is set on the Collector as a whole.

The way this can work is to make the meaning of the config "add this attribute if it exists in client metadata". That should be the case for any request-based instrumentation, and not for anything else.

jade-guiton-dd (Contributor) commented Feb 24, 2025

I see, that makes sense. I guess that as long as you carefully configure your Collector pipeline (no batch processor, batching disabled in exporters), it may be possible to propagate Baggage from the incoming requests all the way down the pipeline. Systematically adding that baggage to all emitted internal telemetry would probably be impossible, but if you use a limited and known set of components, the solution you propose would be doable.

I agree that unless the approach is rethought entirely (using one Collector instance per tenant, for example), I don't think there would be many alternatives in the current state of things.

This is a niche enough feature with enough caveats that I think it should ideally be implemented as an Extension, but I don't think that's really possible.

axw (Contributor) commented Feb 27, 2025

I had a bit of a look into this, and it's not super easy at the moment due to the lack of metric processors in SDKs. This is currently under discussion at open-telemetry/opentelemetry-specification#4298.

If we had that, then I think it would be relatively straightforward to implement this: in the service package we would install a processor on each of the log, metric, and trace providers, and these processors would add attributes from client metadata to each log record/metric data point/span.
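For spans, a minimal sketch of such a processor might look like the following, assuming the span's parent context carries the collector's client.Info; the processor type and its wiring into the service package are hypothetical:

package telemetrysketch // hypothetical package, for illustration only

import (
	"context"

	"go.opentelemetry.io/collector/client"
	"go.opentelemetry.io/otel/attribute"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// metadataSpanProcessor copies the configured client metadata keys onto every
// span started with a request-scoped context.
type metadataSpanProcessor struct {
	keys []string
}

var _ sdktrace.SpanProcessor = (*metadataSpanProcessor)(nil)

func (p *metadataSpanProcessor) OnStart(parent context.Context, s sdktrace.ReadWriteSpan) {
	info := client.FromContext(parent)
	for _, key := range p.keys {
		if values := info.Metadata.Get(key); len(values) > 0 {
			s.SetAttributes(attribute.String(key, values[0]))
		}
	}
}

func (p *metadataSpanProcessor) OnEnd(sdktrace.ReadOnlySpan)      {}
func (p *metadataSpanProcessor) Shutdown(context.Context) error   { return nil }
func (p *metadataSpanProcessor) ForceFlush(context.Context) error { return nil }

// The service package would then register it when building the tracer provider, e.g.:
//   sdktrace.NewTracerProvider(sdktrace.WithSpanProcessor(&metadataSpanProcessor{keys: cfg.MetadataKeys}))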

@jade-guiton-dd (Contributor)

That's a good point.

It sounds like the feature you're looking for already exists for spans/logs in the form of the baggagecopy processor, so the only thing missing is the ability to add it to the Collector's telemetry pipeline.

We use the go.opentelemetry.io/contrib/config package to instantiate our providers/processors/exporters based on the YAML config, so I think the best solution would be to work with the Go SIG on adding it to that package as a third option besides the simple and batch processors.

As for metrics: We could wait and see if the "measurement processors" proposal moves forward, then do a similar thing. We could also try to "push" the proposal forward, by working with the Go SIG to put out a POC implementation in the Go SDK.

I would be personally interested in metric processors moving forward, as a potential alternative to the very inelegant solution in #12384. If that PR moves forward in its current state, it could probably be modified to support your use case for metrics while waiting for proper metric processors to become a thing.
