issue - telegraf-operator - MountVolume.SetUp failed for volume "telegraf-config" : secret "telegraf-XXXX" not found #137

anthosz · 2024-02-07T11:59:13Z

Hello,

For the past few months we have been hitting this issue: about 50% of the time during an upgrade (when the pod respawns) and about 20% of the time during a pod reschedule (when it moves from one node to another).

This happens on a Varnish StatefulSet.

Template

apiVersion: v1
kind: Secret
metadata:
  name: varnish
[...]
data:
  key: "XXXXXXX"
---
[...]
---
apiVersion: apps/v1
kind: StatefulSet
[...]
spec:
  template:
    metadata:
      annotations:
        telegraf.influxdata.com/env-secretkeyref-SECRET_VALUE: varnish.secret
        telegraf.influxdata.com/volume-mounts: '{"datas":"/datas"}'
        telegraf.influxdata.com/inputs: |+
[...]

How to reproduce

Deploy a new version or move the pod to another node.

Current behavior (randomly):

Warning FailedMount 2m23s (x242 over 7h58m) kubelet MountVolume.SetUp failed for volume "telegraf-config" : secret "telegraf-config-varnish-0" not found

Due to that, the pod cannot start.

Workaround:

Kill the pod; the secret is then recreated correctly.
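
To apply this workaround only to pods that are actually stuck, the kubelet event text can be matched first. Below is a minimal Python sketch, assuming the message keeps the exact wording shown in the event above; the helper name `missing_telegraf_secret` is hypothetical and not part of telegraf-operator.

```python
import re

# Pattern built from the FailedMount event quoted in this issue.
# Assumption: kubelet keeps this wording for a missing secret volume.
_MISSING_SECRET = re.compile(
    r'MountVolume\.SetUp failed for volume "(?P<volume>[^"]+)" : '
    r'secret "(?P<secret>[^"]+)" not found'
)


def missing_telegraf_secret(message):
    """Return the missing secret name if the event concerns the
    "telegraf-config" volume, otherwise None."""
    match = _MISSING_SECRET.search(message)
    if match is None or match.group("volume") != "telegraf-config":
        return None
    return match.group("secret")


event = (
    'Warning FailedMount 2m23s (x242 over 7h58m) kubelet '
    'MountVolume.SetUp failed for volume "telegraf-config" : '
    'secret "telegraf-config-varnish-0" not found'
)
print(missing_telegraf_secret(event))
```

A watcher could call this on each pod event and delete the pod only when a secret name is returned, which is the same manual workaround described here.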

Expected behavior:

The secret is found.

Other information

The source secret is more than 100 days old, so the problem cannot be related to it.

However, the telegraf secret seems to be recreated every time the pod is spawned, and that is where the issue appears to be: the secret is not created, so telegraf cannot start (the missing secret cannot be mounted), and the pod stays stuck until we terminate it.

Versions

K8S: 1.28.4
Telegraf: 1.28.5
Telegraf operator: chart: 1.3.12 / APP version: 1.3.11

GBlodgett35 · 2024-02-23T20:08:11Z

We have the same issue. It looks like there is a race condition in the webhook with StatefulSets, because the pod names, and therefore the secret names, are the same from deploy to deploy. What is supposed to happen:

1. Old pod is deleted
2. Webhook deletes secret
3. New pod comes up
4. Webhook updates or creates secret

But what ends up happening is:

1. Old pod is deleted
2. New pod comes up
3. Webhook updates the existing secret
4. Webhook deletes secret due to the pod being deleted
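
The two orderings above can be replayed with a small Python sketch. It only models the secret as a name in a set and is not the operator's actual code; the function `replay` and the event tuples are hypothetical, and the secret name is taken from the event in the original report.

```python
def replay(events, secrets=()):
    """Apply webhook actions in order and return the secrets left afterwards."""
    remaining = set(secrets)
    for action, name in events:
        if action == "upsert":
            # Webhook creates or updates the secret for the new pod.
            remaining.add(name)
        elif action == "delete":
            # Webhook removes the secret because the old pod was deleted.
            remaining.discard(name)
        else:
            raise ValueError(f"unknown action: {action}")
    return remaining


SECRET = "telegraf-config-varnish-0"

# Intended order: cleanup for the old pod runs before the new pod is admitted.
expected_order = [("delete", SECRET), ("upsert", SECRET)]
# Observed order: cleanup for the old pod runs after the new pod is admitted.
racy_order = [("upsert", SECRET), ("delete", SECRET)]

print(replay(expected_order, {SECRET}))
print(replay(racy_order, {SECRET}))
```

Because a StatefulSet reuses the pod name, both actions target the same secret name, so the late delete removes the secret the new pod needs.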

conman2305 · 2024-03-01T20:17:29Z

Throwing in a +1 for this issue happening for us as well. Running influxdb/telegraf-operator:v1.3.10

anthosz · 2024-03-08T10:20:50Z

@GBlodgett35 did you find a workaround? It's a pity that we cannot tell the pod to respawn when this issue occurs :/

anthosz · 2024-04-02T21:38:13Z

Not sure why, but this is becoming recurrent during deployments and especially during node consolidation (and in some cases it is a real blocker, because there is no way to automatically trigger a restart when a pod is in error due to this missing secret).

I'm starting to doubt whether this project is still maintained, or whether we need to look into another solution. @gitirabassi @wojciechka

I cannot really help with Go, but if you need testing or more information, don't hesitate to ask.

GBlodgett35 · 2024-04-10T13:06:24Z

@anthosz We ended up embedding Telegraf in the image instead of using the operator :(

anthosz · 2024-04-10T16:06:17Z

That's what I feared; it seems this project is not maintained anymore 😑

tlereste · 2024-04-19T15:17:23Z

Hello,
I have the same issue with telegraf-operator 1.3.11 and kubernetes 1.27.3.
And also, this issue is random and appears 50% of the time.
I welcome any information on this problem.
Thanks.

anthosz · 2024-04-19T18:06:55Z

According to influxdata/telegraf#15192 (comment), it seems that it is no longer maintained by InfluxData. The only way forward is to open a PR with the fix ourselves (not sure whether anyone here knows Go or operators).

jmickey · 2024-06-24T23:51:16Z

@anthosz @tlereste @GBlodgett35 Hi folks. As this project appears to no longer be maintained, I've gone ahead and rewritten it from scratch over at https://github.com/jmickey/telegraf-sidecar-operator.

It currently supports the majority of the same annotations as this project, with one notable exception: telegraf.influxdata.com/istio- annotations aren't currently supported.

The project is technically pre-1.0.0, but I've been running it on a staging cluster for about a week and it's been working well.

It also resolves this issue.

anthosz · 2024-06-25T05:33:36Z

Thank you for the feedback. For now, I have personally moved everything to a sidecar and removed the operator.

anthosz mentioned this issue Apr 19, 2024

inputs.prometheus - http_headers - host header ignored influxdata/telegraf#15192

Closed

jmickey mentioned this issue Apr 22, 2024

fix race condition with statefulset pod recreation startup failure #142

Closed

3 tasks

jmickey mentioned this issue Jun 24, 2024

Secrets not removed if pod fails admission #144

Open
