Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify and improve GCP resource detection, second attempt #2310

Merged
merged 6 commits into from
Jun 1, 2022

Conversation

dashpole
Copy link
Contributor

@dashpole dashpole commented May 23, 2022

This PR supersedes GoogleCloudPlatform/opentelemetry-operations-go#383 and #2223. It was also discussed at the May 5th sig meeting.

It implements resource detection for GCE, GKE, GAE, Cloud Run, and Cloud Functions using a single detector. The underlying logic is implemented (and integration tested in GCP environments) in GoogleCloudPlatform/opentelemetry-operations-go. The underlying library provides a function, CloudPlatform, which detects which platform it is on, and then one function for each attribute. The reason for the split between "library" and detector is so that the library functions can be integration tested in real GCP environments, which is difficult to do in upstream repositories.

Unfortunately, the current resource detector was already marked stable, so we won't be able to remove existing types, etc. I've added deprecation warnings to them to encourage users to move to the new implementation.

When users switch to the new detector, there are a few breaking changes:

  • No longer supports detecting k8s.pod.name, container.name, k8s.namespace.name when running on k8s. These are expected to be set outside of this resource detector as described in the README.
  • No longer sets service.namespace to "cloud-run-managed" on Cloud Run.
  • Sets faas.name instead of service.name on Cloud Run.
  • Sets faas.id instead of service.instance.id on Cloud Run.

Additions:

  • Adds detector for Google App Engine
  • Correctly sets cloud.platform on GKE and Cloud Run
  • Detects cloud.availability_zone/region when running on GKE with workload identity enabled (the GCE metadata server is not accessible in that scenario)
  • Detects faas.version on cloud run and cloud functions.

@dashpole dashpole force-pushed the gcp_resource_v2 branch 2 times, most recently from ae4136a to 8a4626a Compare May 23, 2022 19:45
@dashpole dashpole changed the title Unify and improve GCP resource detection Unify and improve GCP resource detection, second attempt May 23, 2022
@codecov
Copy link

codecov bot commented May 23, 2022

Codecov Report

Merging #2310 (21e8fc8) into main (520efe3) will increase coverage by 0.0%.
The diff coverage is 82.1%.

Impacted file tree graph

@@          Coverage Diff          @@
##            main   #2310   +/-   ##
=====================================
  Coverage   73.6%   73.7%           
=====================================
  Files        141     142    +1     
  Lines       6384    6457   +73     
=====================================
+ Hits        4704    4764   +60     
- Misses      1539    1550   +11     
- Partials     141     143    +2     
Impacted Files Coverage Δ
detectors/gcp/cloud-function.go 100.0% <ø> (ø)
detectors/gcp/cloud-run.go 100.0% <ø> (ø)
detectors/gcp/gce.go 8.7% <ø> (ø)
detectors/gcp/gke.go 0.0% <ø> (ø)
detectors/gcp/detector.go 82.1% <82.1%> (ø)

@dashpole dashpole force-pushed the gcp_resource_v2 branch 2 times, most recently from f4abae8 to cb346ca Compare May 23, 2022 20:00
@dashpole dashpole marked this pull request as ready for review May 23, 2022 20:39
detectors/gcp/README.md Outdated Show resolved Hide resolved
detectors/gcp/README.md Outdated Show resolved Hide resolved
detectors/gcp/types.go Outdated Show resolved Hide resolved
detectors/gcp/detector.go Outdated Show resolved Hide resolved
@dmathieu
Copy link
Member

Looks like a go mod tidy is required.

@dashpole
Copy link
Contributor Author

Looks like a go mod tidy is required.

Done

dashpole and others added 3 commits June 1, 2022 15:44
@MrAlias MrAlias merged commit b7910af into open-telemetry:main Jun 1, 2022
@dashpole dashpole deleted the gcp_resource_v2 branch June 1, 2022 20:27
@jaredjenkins
Copy link

I know this is very old - but why did you stop detecting the pod, container, etc? Having to do this with env vars is less than ideal. Since it makes it hard to leverage the detected OTEL resources to build a unique ID for the resource like: /cluster/namespace/pod

@dashpole
Copy link
Contributor Author

dashpole commented Aug 4, 2023

@jaredjenkins can you expand on what you are unable to do? The previous detector was using environment variables to set those attributes.

You can achieve the previous behavior by adding:

- name: OTEL_RESOURCE_ATTRIBUTES
  value: k8s.pod.name=$(HOSTNAME),k8s.namespace.name=$(NAMESPACE),k8s.container.name=$(CONTAINER_NAME)

In your pod spec.

k8s.* attributes are not GCP-specific, and thus don't belong in the GCP detector.

@jaredjenkins
Copy link

@dashpole I hear you on the separation of concerns. Here's the context: I'm coming from a heavy Prometheus background where "resource labels" go through different stages: discovered -> available for augmentation or control flow. The latter is accomplished through a relabel rule that can be applied on scrape time. Benefit is that folks get consistent Instance IDs and can use the "resource labels" to build other labels.

detector := gcpdetector.NewDetector()
instanceID, err := kubernetesInstanceID(ctx, detector)
//...
res, err := resource.New(
	context.Background(),
	resource.WithDetectors(detector),
	// Keep the default detectors
	resource.WithTelemetrySDK(),
	// Add your own custom attributes to identify your application.
	resource.WithAttributes(
		semconv.ServiceNameKey.String("myservice"),
		semconv.ServiceInstanceIDKey.String(instanceID),
	),
)

// To obtain a unique identifier for Kube pods.
func kubernetesInstanceID(ctx context.Context, d resource.Detector) (string, error) {
res, err := d.Detect(ctx)
if err != nil {
	return "", errors.WithMessage(err, "cannot detect GCP resources")
}
attrs := res.Set()
cluster, ok := attrs.Value(semconv.K8SClusterNameKey)
if !ok {
	return "", errors.New("missing cluster name")
}
ns, ok := attrs.Value(semconv.K8SNamespaceNameKey)
if !ok {
	return "", errors.New("missing namespace name")
}
pod, ok := attrs.Value(semconv.K8SPodNameKey)
if !ok {
	return "", errors.New("missing pod name")
}
return fmt.Sprint(cluster, "/", ns, "/", pod), nil

We could do that using the env variables that we defined, but it's just not very ergonomic.

Based on this open-telemetry/opentelemetry-specification#3136. It looks like users are still required to provide some kind of UUID, but that's honestly pretty to useless for debugging. The OTLP -> Prometheus spec will extract the semconv.ServiceInstanceIDKey and translate it to the canonical instance tag and otherwise labels about the Kubernetes context are omitted.

You can of course try to read this data from the detectors like the GCP one, but you can't easily set semconv.ServiceInstanceIDKey based on the attributes not yet discovered.

@jaredjenkins
Copy link

I'm aware that we could probably perform this augmentation in the OTEL Collector - just not sure how yet (still new to this!).

@dashpole
Copy link
Contributor Author

dashpole commented Aug 4, 2023

Got it, that makes sense. I think the best path forward would be #4136

@pellared pellared added this to the untracked milestone Nov 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants