find files with '/var/log/pods/*/*/*.log' pattern: open .: permission denied #33083
A very similar issue was reported yesterday. Notably, using […] changed the error from […]. Also, the OpenTelemetry docs themselves say to use the singular […]. I tried using […] as well.
The pods have identical hostPath mounts:

```yaml
# Fluentbit
apiVersion: v1
kind: Pod
metadata:
  name: fluentbit-fluent-bit-dpnnk
  namespace: openobserve
spec:
  containers:
    - name: fluent-bit
      volumeMounts:
        - mountPath: /var/log
          name: varlog
  volumes:
    - hostPath:
        path: /var/log
        type: ""
      name: varlog
```

```yaml
# agent-collector
apiVersion: v1
kind: Pod
metadata:
  name: openobserve-collector-agent-collector-bvrln
  namespace: openobserve
spec:
  containers:
    - name: otc-container
      volumeMounts:
        - mountPath: /var/log
          name: varlog
  volumes:
    - hostPath:
        path: /var/log
        type: ""
      name: varlog
```
Hey @MathiasPius, could you share more details about the environment you are running? It could either be a permission issue that comes from the operator, or it could be platform specific 🤔. I wasn't able to reproduce it locally (using the collector Helm chart) on k8s […]. Sharing the values file I used for reference (using latest […]):

```yaml
mode: daemonset
presets:
  logsCollection:
    enabled: true
command:
  name: otelcontribcol
config:
  exporters:
    debug:
      verbosity: detailed
    otlp/some:
      ....
  receivers:
    filelog:
      start_at: end
      include_file_name: false
      include_file_path: true
      exclude:
        - /var/log/pods/default_daemonset-opentelemetry-collector*_*/opentelemetry-collector/*.log
      include:
        - /var/log/pods/*/*/*.log
  service:
    pipelines:
      logs:
        receivers: [filelog]
        processors: [batch]
        exporters: [otlp/some]
```
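For clarity, the effective file selection here is "matches an include glob and is not matched by any exclude glob". A minimal sketch of that logic, using Python's `PurePosixPath.match` as a stand-in for the receiver's real glob matching (the paths below are hypothetical examples, not taken from this cluster):

```python
from pathlib import PurePosixPath

def selected(path: str, include: list[str], exclude: list[str]) -> bool:
    # A file is read only if some include glob matches and no exclude glob does.
    p = PurePosixPath(path)
    return any(p.match(g) for g in include) and not any(p.match(g) for g in exclude)

include = ["/var/log/pods/*/*/*.log"]
exclude = ["/var/log/pods/default_daemonset-opentelemetry-collector*_*/opentelemetry-collector/*.log"]

# An ordinary pod log file is selected:
print(selected("/var/log/pods/openobserve_app_1234/app/0.log", include, exclude))
# The collector's own logs fall under the exclude glob:
print(selected(
    "/var/log/pods/default_daemonset-opentelemetry-collector-abc_999/opentelemetry-collector/0.log",
    include, exclude))
```

Note that `/var/log/pods/*/*/*.log` matches exactly three path segments under `/var/log/pods`, so a log nested one level deeper would not be picked up.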
I am using an […]. The whole config is available here: https://github.com/MathiasPius/kronform/blob/main/manifests/infrastructure/openobserve/agent-collector.yaml#L82-L146

Note that this is the version using […]. I deployed the following pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: investigator
  namespace: openobserve
spec:
  containers:
    - name: pod
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - mountPath: /var/log
          name: varlog
  volumes:
    - hostPath:
        path: /var/log
        type: ""
      name: varlog
```

And did some digging around: […]

I won't rule out that Talos might have something to do with it, but since both my own pods as shown above and Fluent-bit work out of the box, it seems like the problem is specific to the collector.
Thanks @MathiasPius. I tried to reproduce the issue (on a GKE cluster) using the operator as well, but I can't. I'm using: […]

Sharing the manifest I used for reference (`otel-col.yaml`):

```yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: otel-collector
  labels:
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources:
      - nodes
      - namespaces
      - events
      - pods
      - services
      - persistentvolumes
      - persistentvolumeclaims
    verbs: ["get", "watch", "list"]
  - apiGroups: [ "extensions" ]
    resources:
      - replicasets
    verbs: [ "get", "list", "watch" ]
  - apiGroups: [ "apps" ]
    resources:
      - statefulsets
      - deployments
      - replicasets
      - daemonsets
    verbs: [ "get", "list", "watch" ]
  - apiGroups: [ "batch" ]
    resources:
      - jobs
      - cronjobs
    verbs: [ "get", "list", "watch" ]
  - apiGroups: [ "storage.k8s.io" ]
    resources:
      - storageclasses
    verbs: [ "get", "list", "watch" ]
  - apiGroups:
      - ""
    resources:
      - nodes/stats
    verbs:
      - get
  - nonResourceURLs:
      - "/metrics"
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otelcol
subjects:
  - kind: ServiceAccount
    name: daemonset-collector # name of your service account
    namespace: default
roleRef: # referring to your ClusterRole
  kind: ClusterRole
  name: otel-collector
  apiGroup: rbac.authorization.k8s.io
---
```
```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: daemonset
spec:
  mode: daemonset
  serviceAccount:
  hostNetwork: true
  envFrom:
    - secretRef:
        name: otlp-secret
  volumeMounts:
    - name: varlogpods
      mountPath: /var/log/pods
      readOnly: true
  volumes:
    - name: varlogpods
      hostPath:
        path: /var/log/pods
  config: |
    exporters:
      debug: {}
      otlp:
        compression: none
        endpoint: ${env:otlp_endpoint}
        headers:
          Authorization: Bearer ${env:otlp_secret_token}
    extensions:
      health_check: {}
    processors:
      batch: {}
      filter/logs_instrumented_pods:
        logs:
          log_record:
            - resource.attributes["logs.exporter"] == "otlp"
      resource/k8s:
        attributes:
          - key: service.name
            from_attribute: app.label.component
            action: insert
      k8sattributes:
        extract:
          metadata:
            - k8s.namespace.name
            - k8s.deployment.name
            - k8s.statefulset.name
            - k8s.daemonset.name
            - k8s.cronjob.name
            - k8s.job.name
            - k8s.node.name
            - k8s.pod.name
            - k8s.pod.uid
            - k8s.pod.start_time
            - container.id
          labels:
            - tag_name: app.label.component
              key: app.kubernetes.io/component
              from: pod
            - tag_name: logs.exporter
              key: otel.logs.exporter
              from: pod
        filter:
          node_from_env_var: K8S_NODE_NAME
        passthrough: false
        pod_association:
          - sources:
              - from: resource_attribute
                name: k8s.pod.ip
          - sources:
              - from: resource_attribute
                name: k8s.pod.uid
          - sources:
              - from: connection
      memory_limiter:
        check_interval: 5s
        limit_percentage: 80
        spike_limit_percentage: 25
    receivers:
      filelog:
        exclude:
          - /var/log/pods/default_daemonset-collector*_*/opentelemetry-collector/*.log
        include:
          - /var/log/pods/*/*/*.log
        include_file_name: false
        include_file_path: true
        operators:
          - id: get-format
            routes:
              - expr: body matches "^\\{"
                output: parser-docker
              - expr: body matches "^[^ Z]+ "
                output: parser-crio
              - expr: body matches "^[^ Z]+Z"
                output: parser-containerd
            type: router
          - id: parser-crio
            regex: ^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
            timestamp:
              layout: 2006-01-02T15:04:05.999999999Z07:00
              layout_type: gotime
              parse_from: attributes.time
            type: regex_parser
          - combine_field: attributes.log
            combine_with: ""
            id: crio-recombine
            is_last_entry: attributes.logtag == 'F'
            max_log_size: 102400
            output: extract_metadata_from_filepath
            source_identifier: attributes["log.file.path"]
            type: recombine
          - id: parser-containerd
            regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
            timestamp:
              layout: '%Y-%m-%dT%H:%M:%S.%LZ'
              parse_from: attributes.time
            type: regex_parser
          - combine_field: attributes.log
            combine_with: ""
            id: containerd-recombine
            is_last_entry: attributes.logtag == 'F'
            max_log_size: 102400
            output: extract_metadata_from_filepath
            source_identifier: attributes["log.file.path"]
            type: recombine
          - id: parser-docker
            output: extract_metadata_from_filepath
            timestamp:
              layout: '%Y-%m-%dT%H:%M:%S.%LZ'
              parse_from: attributes.time
            type: json_parser
          - id: extract_metadata_from_filepath
            parse_from: attributes["log.file.path"]
            regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
            type: regex_parser
          - from: attributes.stream
            to: attributes["log.iostream"]
            type: move
          - from: attributes.container_name
            to: resource["k8s.container.name"]
            type: move
          - from: attributes.namespace
            to: resource["k8s.namespace.name"]
            type: move
          - from: attributes.pod_name
            to: resource["k8s.pod.name"]
            type: move
          - from: attributes.restart_count
            to: resource["k8s.container.restart_count"]
            type: move
          - from: attributes.uid
            to: resource["k8s.pod.uid"]
            type: move
          - from: attributes.log
            to: body
            type: move
          - type: json_parser
            if: 'body matches "^{.*}$"'
            severity:
              parse_from: attributes.level
        start_at: end
    service:
      extensions:
        - health_check
      pipelines:
        logs:
          exporters:
            - otlp
          processors:
            - k8sattributes
            - batch
            - resource/k8s
            - filter/logs_instrumented_pods
          receivers:
            - filelog
---
```

I would suggest simplifying your configuration here to focus only on:

```yaml
volumeMounts:
  - name: varlogpods
    mountPath: /var/log/pods
    readOnly: true
volumes:
  - name: varlogpods
    hostPath:
      path: /var/log/pods
```

@open-telemetry/operator-approvers any ideas on whether this could be an operator-specific issue, specifically around the privileges set?
Just tried applying your settings to my setup: MathiasPius/kronform@13014b1#diff-90c4b8f3bdde68e8eefedefaa8f8a89b6f5360e5135ccc32c524ca49e9057d9dR84-R299

Same exact result. I'm a little curious about the […]
One obvious difference between the Fluent Bit/busybox container images and the otel contrib image is that the former run as root by default, while the latter runs as a normal user. Can you try explicitly setting the securityContext to run the otel container as root? In the Collector CR, it'd be something like:

```yaml
securityContext:
  runAsUser: 0
  runAsGroup: 0
```
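To see why running as root matters here, a minimal sketch of the classic Unix discretionary permission check (this is an illustration, not kernel code; uid 10001 is a hypothetical stand-in for the collector image's non-root user). If `/var/log/pods` is owned `root:root` with mode `0750`, a uid 0 process is allowed through, while any non-root, non-group uid falls through to the "other" bits and is denied:

```python
import stat

def may_read(mode: int, file_uid: int, file_gid: int, uid: int, gids: set[int]) -> bool:
    """Simplified read-permission check against a file's mode bits."""
    if uid == 0:
        # Root effectively bypasses discretionary checks (CAP_DAC_OVERRIDE).
        return True
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)  # owner read bit
    if file_gid in gids:
        return bool(mode & stat.S_IRGRP)  # group read bit
    return bool(mode & stat.S_IROTH)      # "other" read bit

# /var/log/pods as root:root, mode 0750:
print(may_read(0o750, 0, 0, 0, {0}))          # runAsUser: 0 -> allowed
print(may_read(0o750, 0, 0, 10001, {10001}))  # non-root collector -> denied
```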
@swiatekm-sumo That's such an obvious oversight on my part 🤡. I had simply assumed that it ran as root, since those logs are often owned by root, and frankly a lot of Kubernetes-packaged software doesn't bother configuring a correct securityContext or running as non-root. Explicitly running as root like you suggested fixed the issue!
Hi @MathiasPius, where did you add this? In which file? I'm facing the same issue.
@Ivalberto Here's my entire OpenTelemetryCollector configuration, with the securityContext section highlighted: https://github.com/MathiasPius/kronform/blob/4055fbc830cb829d247be5759f14dae44d1ceb6e/manifests/infrastructure/openobserve/agent-collector.yaml#L275-L277
This solved it for me (Talos 1.8.0 + 1.8.2).
Just a heads up, Talos Linux 1.8.3 no longer requires this workaround because the permissions have been changed.
Thanks for reporting it to Talos!
No, thank you for your blog! It pushed me (hopefully :)) in the right direction and helped immensely, especially in the beginning :)
Component(s)
receiver/filelog
What happened?
Description
Filesystem permission issues attempting to set up an OpenTelemetry Collector to hoover up pod log files.
The collector is deployed using the opentelemetry operator.
I have a suspicion this might be caused by the way doublestar traverses the filesystem.
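For illustration: expanding a glob like `/var/log/pods/*/*/*.log` requires listing every intermediate directory, which is exactly where a permission error on the directory itself (rather than on the log files) would surface as `open .: permission denied`. A minimal sketch of that traversal, not the actual doublestar code, run against a throwaway temp directory:

```python
import fnmatch
import os
import tempfile

def expand(base: str, segments: list[str]) -> list[str]:
    """Expand a glob one segment at a time; each wildcard lists a directory."""
    paths = [base]
    for seg in segments:
        nxt = []
        for p in paths:
            # os.scandir is where EACCES would be raised if the walker's uid
            # may not read a directory such as a root-owned /var/log/pods.
            for entry in os.scandir(p):
                if fnmatch.fnmatch(entry.name, seg):
                    nxt.append(entry.path)
        paths = nxt
    return paths

# Throwaway stand-in for /var/log/pods/<ns_pod_uid>/<container>/<n>.log:
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "ns_pod_uid", "container"))
open(os.path.join(root, "ns_pod_uid", "container", "0.log"), "w").close()
print(expand(root, ["*", "*", "*.log"]))  # the single matching log file
```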
Steps to Reproduce
Deploy an OpenTelemetryCollector agent into a Talos Kubernetes cluster.
Expected Result
Logs are read from /var/log/pods/ correctly.
Actual Result
No logs are collected, and the collector agent pod reports permission denied and "no files match the configured criteria".
v0.99.0 & v0.100.0 (probably others)
Environment information
Environment
OS: Talos Linux 1.6.4, 1.7.0, 1.7.1
OpenTelemetry Collector configuration
Log output
Additional context
Talos creators suggested the issue might be resolved by granting the daemonset/pod the CAP_DAC_READ_SEARCH capability, but this did not work. I additionally tried adding the following security context, which also did not work: […]