Unify and improve GCP resource detection, second attempt #2310

dashpole · 2022-05-23T19:42:35Z

This PR supersedes GoogleCloudPlatform/opentelemetry-operations-go#383 and #2223. It was also discussed at the May 5th sig meeting.

It implements resource detection for GCE, GKE, GAE, Cloud Run, and Cloud Functions using a single detector. The underlying logic is implemented (and integration tested in GCP environments) in GoogleCloudPlatform/opentelemetry-operations-go. The underlying library provides a function, CloudPlatform, which detects which platform it is on, and then one function for each attribute. The reason for the split between "library" and detector is so that the library functions can be integration tested in real GCP environments, which is difficult to do in upstream repositories.

Unfortunately, the current resource detector was already marked stable, so we won't be able to remove existing types, etc. I've added deprecation warnings to them to encourage users to move to the new implementation.

When users switch to the new detector, there are a few breaking changes:

No longer supports detecting k8s.pod.name, container.name, k8s.namespace.name when running on k8s. These are expected to be set outside of this resource detector as described in the README.
No longer sets service.namespace to "cloud-run-managed" on Cloud Run.
Sets faas.name instead of service.name on Cloud Run.
Sets faas.id instead of service.instance.id on Cloud Run.

Additions:

Adds detector for Google App Engine
Correctly sets cloud.platform on GKE and Cloud Run
Detects cloud.availability_zone/region when running on GKE with workload identity enabled (the GCE metadata server is not accessible in that scenario)
Detects faas.version on cloud run and cloud functions.

codecov · 2022-05-23T19:47:57Z

Codecov Report

Merging #2310 (21e8fc8) into main (520efe3) will increase coverage by 0.0%.
The diff coverage is 82.1%.

@@          Coverage Diff          @@
##            main   #2310   +/-   ##
=====================================
  Coverage   73.6%   73.7%           
=====================================
  Files        141     142    +1     
  Lines       6384    6457   +73     
=====================================
+ Hits        4704    4764   +60     
- Misses      1539    1550   +11     
- Partials     141     143    +2

Impacted Files	Coverage Δ
detectors/gcp/cloud-function.go	`100.0% <ø> (ø)`
detectors/gcp/cloud-run.go	`100.0% <ø> (ø)`
detectors/gcp/gce.go	`8.7% <ø> (ø)`
detectors/gcp/gke.go	`0.0% <ø> (ø)`
detectors/gcp/detector.go	`82.1% <82.1%> (ø)`

detectors/gcp/README.md

detectors/gcp/types.go

detectors/gcp/detector.go

dmathieu · 2022-05-31T07:00:25Z

Looks like a go mod tidy is required.

dashpole · 2022-05-31T13:06:16Z

Looks like a go mod tidy is required.

Done

Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>

jaredjenkins · 2023-08-04T07:30:58Z

I know this is very old - but why did you stop detecting the pod, container, etc? Having to do this with env vars is less than ideal. Since it makes it hard to leverage the detected OTEL resources to build a unique ID for the resource like: /cluster/namespace/pod

dashpole · 2023-08-04T13:20:12Z

@jaredjenkins can you expand on what you are unable to do? The previous detector was using environment variables to set those attributes.

You can achieve the previous behavior by adding:

- name: OTEL_RESOURCE_ATTRIBUTES
  value: k8s.pod.name=$(HOSTNAME),k8s.namespace.name=$(NAMESPACE),k8s.container.name=$(CONTAINER_NAME)

In your pod spec.

k8s.* attributes are not GCP-specific, and thus don't belong in the GCP detector.

jaredjenkins · 2023-08-04T14:26:29Z

@dashpole I hear you on the separation of concerns. Here's the context: I'm coming from a heavy Prometheus background where "resource labels" go through different stages: discovered -> available for augmentation or control flow. The latter is accomplished through a relabel rule that can be applied on scrape time. Benefit is that folks get consistent Instance IDs and can use the "resource labels" to build other labels.

detector := gcpdetector.NewDetector()
instanceID, err := kubernetesInstanceID(ctx, detector)
//...
res, err := resource.New(
	context.Background(),
	resource.WithDetectors(detector),
	// Keep the default detectors
	resource.WithTelemetrySDK(),
	// Add your own custom attributes to identify your application.
	resource.WithAttributes(
		semconv.ServiceNameKey.String("myservice"),
		semconv.ServiceInstanceIDKey.String(instanceID),
	),
)

// To obtain a unique identifier for Kube pods.
func kubernetesInstanceID(ctx context.Context, d resource.Detector) (string, error) {
res, err := d.Detect(ctx)
if err != nil {
	return "", errors.WithMessage(err, "cannot detect GCP resources")
}
attrs := res.Set()
cluster, ok := attrs.Value(semconv.K8SClusterNameKey)
if !ok {
	return "", errors.New("missing cluster name")
}
ns, ok := attrs.Value(semconv.K8SNamespaceNameKey)
if !ok {
	return "", errors.New("missing namespace name")
}
pod, ok := attrs.Value(semconv.K8SPodNameKey)
if !ok {
	return "", errors.New("missing pod name")
}
return fmt.Sprint(cluster, "/", ns, "/", pod), nil

We could do that using the env variables that we defined, but it's just not very ergonomic.

Based on this open-telemetry/opentelemetry-specification#3136. It looks like users are still required to provide some kind of UUID, but that's honestly pretty to useless for debugging. The OTLP -> Prometheus spec will extract the semconv.ServiceInstanceIDKey and translate it to the canonical instance tag and otherwise labels about the Kubernetes context are omitted.

You can of course try to read this data from the detectors like the GCP one, but you can't easily set semconv.ServiceInstanceIDKey based on the attributes not yet discovered.

jaredjenkins · 2023-08-04T14:27:11Z

I'm aware that we could probably perform this augmentation in the OTEL Collector - just not sure how yet (still new to this!).

dashpole · 2023-08-04T15:41:12Z

Got it, that makes sense. I think the best path forward would be #4136

dashpole force-pushed the gcp_resource_v2 branch 2 times, most recently from ae4136a to 8a4626a Compare May 23, 2022 19:45

dashpole changed the title ~~Unify and improve GCP resource detection~~ Unify and improve GCP resource detection, second attempt May 23, 2022

dashpole force-pushed the gcp_resource_v2 branch from 8a4626a to 30f2157 Compare May 23, 2022 19:47

dashpole force-pushed the gcp_resource_v2 branch 2 times, most recently from f4abae8 to cb346ca Compare May 23, 2022 20:00

dashpole marked this pull request as ready for review May 23, 2022 20:39

dashpole requested review from jmacd, MrAlias, Aneurysm9, evantorrie, XSAM, MadVikingGod, pellared, hanyuancheung and dmathieu as code owners May 23, 2022 20:39

MrAlias reviewed May 24, 2022

View reviewed changes

detectors/gcp/README.md Outdated Show resolved Hide resolved

detectors/gcp/README.md Outdated Show resolved Hide resolved

detectors/gcp/types.go Outdated Show resolved Hide resolved

detectors/gcp/detector.go Outdated Show resolved Hide resolved

dashpole force-pushed the gcp_resource_v2 branch from 8aaf8a6 to 1b66bea Compare May 24, 2022 17:35

MrAlias approved these changes May 24, 2022

View reviewed changes

dashpole force-pushed the gcp_resource_v2 branch from 1b66bea to 441bfe1 Compare May 24, 2022 17:46

dashpole mentioned this pull request May 25, 2022

[resourcedetection] Unify gke and gce detectors and fix gke zone/region detection open-telemetry/opentelemetry-collector-contrib#10347

Merged

dashpole force-pushed the gcp_resource_v2 branch from c0d12e2 to 17ac1b7 Compare May 31, 2022 13:04

dmathieu approved these changes Jun 1, 2022

View reviewed changes

dashpole force-pushed the gcp_resource_v2 branch from 80ac85e to ced4cc7 Compare June 1, 2022 15:44

dashpole and others added 3 commits June 1, 2022 15:44

update GCP resource detection

11cd22d

Apply suggestions from code review

da2d836

Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>

return partial errors

0debf39

dashpole added 2 commits June 1, 2022 15:44

gotidy

5c3e949

remove host.name from GKE, since GKE doesn't support it

73ba153

dashpole force-pushed the gcp_resource_v2 branch from ced4cc7 to 73ba153 Compare June 1, 2022 15:44

Merge branch 'main' into gcp_resource_v2

21e8fc8

MrAlias merged commit b7910af into open-telemetry:main Jun 1, 2022

dashpole deleted the gcp_resource_v2 branch June 1, 2022 20:27

dashpole mentioned this pull request Aug 4, 2023

Request to Add Detector for Kubernetes #4136

Open

10 tasks

pellared added this to the untracked milestone Nov 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unify and improve GCP resource detection, second attempt #2310

Unify and improve GCP resource detection, second attempt #2310

dashpole commented May 23, 2022 •

edited

Loading

codecov bot commented May 23, 2022 •

edited

Loading

dmathieu commented May 31, 2022

dashpole commented May 31, 2022

jaredjenkins commented Aug 4, 2023

dashpole commented Aug 4, 2023

jaredjenkins commented Aug 4, 2023

jaredjenkins commented Aug 4, 2023

dashpole commented Aug 4, 2023

Unify and improve GCP resource detection, second attempt #2310

Unify and improve GCP resource detection, second attempt #2310

Conversation

dashpole commented May 23, 2022 • edited Loading

codecov bot commented May 23, 2022 • edited Loading

Codecov Report

dmathieu commented May 31, 2022

dashpole commented May 31, 2022

jaredjenkins commented Aug 4, 2023

dashpole commented Aug 4, 2023

jaredjenkins commented Aug 4, 2023

jaredjenkins commented Aug 4, 2023

dashpole commented Aug 4, 2023

dashpole commented May 23, 2022 •

edited

Loading

codecov bot commented May 23, 2022 •

edited

Loading