-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update otelcontribcol to latest #1391
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Why id this safe to re-apply now? |
Are you referring to the upstream fix or our release schedule? |
I mean, if we reverted it before, why is it now safe to reapply? What changed? |
To ensure correct behavior, we've included a no_op_label in the metricstransform processor's label_set filter (explained in the description). This is a temporary solution until the upstream fix is available, but we need to update to the latest otelcontribcol regardless. |
This isn't really a revert then, since new changes were added to the original commit. Suggest just creating a new commit instead of reapplying.
why? |
Re @sdowell While not currently anticipated, there's a possibility that future vulnerability fixes need dependency upgrades that are incompatible with current version we use (0.103). In such cases, the version upgrade process will be similar. Otelcontribcol versions beyond 0.104.0 will introduce breaking changes. I could start a new PR. |
Recreated the commit and message. |
This change updates otelcoontribcol to latest and modifies to fit breaking changes from 0.104.0 and 0.106.0. Breaking change in 0.104.0 https://github.com/open-telemetry/opentelemetry-collector-contrib/releases/tag/v0.104.0 Breaking change in 0.106.0 open-telemetry/opentelemetry-collector-contrib#34430 - Localhost is now the default setting, while otel-agent and otel-collector require 0.0.0.0, so the feature gate has been removed. - The format of the environment variable was updated to meet the new syntax requirements. The otel-agent ConfigMap was split between the reconciler and controllers, ensuring that sync-related labels are only applied to reconcilers. - A `no_op_label` has been added to ensure that the aggregation in the metricstransform processor filters on all metric labels. This is a temporary workaround until a permanent fix is implemented upstream.
Also specifying host 0.0.0.0 instead of the new default localhost.
For all receivers and health_check. Adding networkpolicy for ingress on essential ports.
- port 13133: health_check for readiness check. Apply to otel-agent in reconciler-manager and reconcilers; Otel-collector. - port 55678: opencensus receiver for otel-collector to receive metric from other pods Fix test ConfigMap.
label_set: [] | ||
# Using a no_op_label to get around issue in the upstream | ||
# https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/34430 | ||
label_set: [no_op_label] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does setting label_set to nil or removing it work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it. Maybe clarify the comment?
exporters: | ||
opencensus: | ||
endpoint: otel-collector.config-management-monitoring:55678 | ||
tls: | ||
insecure: true | ||
processors: | ||
# Attributes processor adds custom configsync metric labels to applicable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did these attributes get removed from the agent config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be replaced by manifests/otel-agent-reconciler-cm.yaml
@@ -81,6 +81,7 @@ spec: | |||
ports: | |||
- containerPort: 55678 # Default OpenCensus receiver port. | |||
- containerPort: 8888 # Metrics. | |||
- containerPort: 13133 # Health check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the otel-agent readiness probe port?
/blocked open-telemetry/opentelemetry-collector-contrib#34844 Generated files (go.mod etc.) were removed, which would break AutoVC integration. No affect to the continuous build of the internal mirror so far. A CI can be added to restore the generated files to a new branch, along with base image update for CVE scan purpose. |
configMap: | ||
name: otel-agent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this ConfigMap still used anywhere?
exporters: | ||
opencensus: | ||
endpoint: otel-collector.config-management-monitoring:55678 | ||
tls: | ||
insecure: true | ||
processors: | ||
# Attributes processor adds custom configsync metric labels to applicable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be replaced by manifests/otel-agent-reconciler-cm.yaml
@code-review-assist review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pull request updates the otelcontribcol
version and introduces several important changes to address a breaking change in the default endpoint behavior. The upgrade to v0.115.0-gke.0
defaults the opencensus
receiver and prometheus
exporter endpoints to localhost, requiring explicit configuration for external access. The PR description effectively communicates this change and its implications. The subsequent modifications to set the opencensus
endpoint to 0.0.0.0:55678
, the prometheus
endpoint to 0.0.0.0:8675
, and the health_check
endpoint to 0.0.0.0:13133
are necessary and correctly implemented.
The introduction of the otel-agent-reconciler
ConfigMap is a valuable addition. The accompanying comments explaining the rationale for the changes, including the removal of the attributes
processor in the otel-agent
ConfigMap and the use of no_op_label
in the metric pipelines, enhance code clarity and maintainability. The comprehensive updates to the test files ensure consistency and validate the implemented changes.
To further improve this PR, I recommend the following:
- README Update: Document the new
otel-agent-reconciler
ConfigMap, its purpose, and the changes to the default endpoints. This will aid users in understanding the modifications and configuring their deployments accordingly. - Version Bump Justification: While the PR description mentions the breaking change, it's beneficial to explicitly state the reason for the version bump to
v0.115.0-gke.0
in the commit message or PR description. This helps track changes and provides valuable context for future maintainers. - NetworkPolicy Documentation: Add a comment to the NetworkPolicy in
manifests/templates/otel-collector.yaml
explaining its purpose and the ports it opens. This enhances readability and maintainability. - ConfigMap Clarification: Add a comment to
test/kustomization/expected.yaml
clarifying the relationship between the updatedotel-agent-config.yaml
and theotel-agent-reconciler-config.yaml
. - Upstream Issue Link: Include a link to the upstream issue ([processor/metricstransform] Aggregation skipped when label_set is empty but not nil open-telemetry/opentelemetry-collector-contrib#34430) related to the use of
no_op_label
directly in the comments withinpkg/metrics/otel.go
. This facilitates quick access to the relevant context.
@@ -89,7 +89,7 @@ COSIGN := $(BIN_DIR)/cosign | |||
GIT_SYNC_VERSION := v4.3.0-gke.4__linux_amd64 | |||
GIT_SYNC_IMAGE_NAME := gcr.io/config-management-release/git-sync:$(GIT_SYNC_VERSION) | |||
|
|||
OTELCONTRIBCOL_VERSION := v0.103.0-gke.6 | |||
OTELCONTRIBCOL_VERSION := v0.115.0-gke.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
batch: | ||
# Populate resource attributes from OTEL_RESOURCE_ATTRIBUTES env var and | ||
# the GCE metadata service, if available. | ||
resourcedetection: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the attributes
processor is removed, delete these comments to maintain clarity and prevent confusion.
batch: | |
# Populate resource attributes from OTEL_RESOURCE_ATTRIBUTES env var and | |
# the GCE metadata service, if available. | |
resourcedetection: | |
# Populate resource attributes from OTEL_RESOURCE_ATTRIBUTES env var and | |
# the GCE metadata service, if available. | |
resourcedetection: | |
detectors: [env, gcp] |
apiVersion: networking.k8s.io/v1 | ||
kind: NetworkPolicy | ||
metadata: | ||
name: allow-port-ingress | ||
namespace: config-management-monitoring | ||
spec: | ||
podSelector: | ||
matchLabels: | ||
app: opentelemetry | ||
component: otel-collector | ||
ingress: | ||
- from: | ||
- namespaceSelector: {} | ||
ports: | ||
- protocol: TCP | ||
port: 55678 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Using a no_op_label to get around issue in the upstream | ||
# https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/34430 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include a direct link to the upstream issue: open-telemetry/opentelemetry-collector-contrib#34430 for easier reference.
otel-agent-config.yaml: | | ||
receivers: | ||
opencensus: | ||
endpoint: 0.0.0.0:55678 | ||
exporters: | ||
opencensus: | ||
endpoint: otel-collector.config-management-monitoring:55678 | ||
tls: | ||
insecure: true | ||
processors: | ||
batch: | ||
# Populate resource attributes from OTEL_RESOURCE_ATTRIBUTES env var and | ||
# the GCE metadata service, if available. | ||
resourcedetection: | ||
detectors: [env, gcp] | ||
extensions: | ||
health_check: | ||
endpoint: 0.0.0.0:13133 | ||
service: | ||
extensions: [health_check] | ||
pipelines: | ||
metrics: | ||
receivers: [opencensus] | ||
processors: [batch, resourcedetection] | ||
exporters: [opencensus] | ||
telemetry: | ||
logs: | ||
level: "INFO" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updating otelcontribcol to latest to include fix of open-telemetry/opentelemetry-collector-contrib#35006 and other essential security updates in components like Prometheus exporter.
The upgrade includes a breaking change of defaulting to local host instead of 0.0.0.0, the feature is now stable and feature gate has been removed, Config Sync should explicitly set endpoint to host 0.0.0.0 to expose port for external pod access. This applies to the metric receiver
opencensus
, and the exporterprometheus
.