Remediate intermittent deploy failures/timeout on prometheus #122

Closed
zachariahmiller opened this issue Jan 23, 2024 · 6 comments
Labels: ci (Issues pertaining to CI / Pipelines / Testing), possible-bug (Something may not be working)

Comments

@zachariahmiller
Contributor

Describe what should be investigated or refactored

Occasionally in local development, and frequently in CI, Prometheus fails to deploy successfully, and the deployment times out unless someone intervenes manually.

Additional context

This may be related to Pepr's termination of Istio sidecar containers in Jobs, with the admission/patch jobs not getting killed, but we don't always have good feedback in CI as to whether this is the case.

@zachariahmiller zachariahmiller added the possible-bug Something may not be working label Jan 23, 2024
@corang
Contributor

corang commented Feb 1, 2024

Just encountered this myself: the Istio sidecar isn't terminating correctly on the kube-prometheus patch something-or-other. The job container is done, but the sidecar won't die.

@zachariahmiller
Contributor Author

Yeah, that's the assumption. Pepr is supposed to terminate that job. It's an edge case, but I'm not totally sure whether it's the watch getting dropped or something in the actual job termination code.
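The stuck state described above (job container finished, sidecar still running) can be expressed as a check on the pod's status. The following is a minimal sketch, not Pepr's actual code: it assumes the pod is available as a plain dict in the JSON shape of a Kubernetes Pod (`status.containerStatuses[].name` / `.state`), that the sidecar uses Istio's default container name `istio-proxy`, and the helper name `sidecar_blocking_completion` and the `patch` container name in the example are hypothetical.

```python
from typing import Any, Dict, List

# Assumption: Istio's default sidecar container name.
SIDECAR_NAME = "istio-proxy"


def sidecar_blocking_completion(pod: Dict[str, Any], sidecar: str = SIDECAR_NAME) -> bool:
    """Hypothetical helper: True when every non-sidecar container has
    terminated but the sidecar is still running, i.e. the Job's work is
    done yet the pod cannot complete on its own."""
    statuses: List[Dict[str, Any]] = pod.get("status", {}).get("containerStatuses", [])
    sidecar_running = False
    workload_seen = False
    for cs in statuses:
        state = cs.get("state", {})
        if cs.get("name") == sidecar:
            sidecar_running = "running" in state
        else:
            workload_seen = True
            if "terminated" not in state:
                # A workload container is still running or waiting, so the pod is not stuck.
                return False
    return workload_seen and sidecar_running


# Illustrative pod: the (hypothetical) "patch" container exited, the sidecar did not.
stuck_pod = {
    "status": {
        "containerStatuses": [
            {"name": "patch", "state": {"terminated": {"exitCode": 0}}},
            {"name": "istio-proxy", "state": {"running": {}}},
        ]
    }
}
print(sidecar_blocking_completion(stuck_pod))
```

A watcher that sees this condition is the point at which something has to tell the sidecar to exit; if the watch is dropped, nothing reacts and the Job hangs as described.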

@corang
Contributor

corang commented Feb 5, 2024

I do want to say that after about 8 minutes the sidecar did eventually terminate, so it seems like Pepr is getting hung up on something.

@mjnagel mjnagel added the ci Issues pertaining to CI / Pipelines / Testing label Mar 28, 2024
@docandrew
Contributor

I'm running into this as well in the monitoring namespace when trying UDS Core on RKE2. I had to manually kubectl debug into the pod and kill istio-proxy with the magic /quitquitquit URL to get the UDS Core deployment to continue.
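The manual workaround above amounts to asking the sidecar to shut itself down over HTTP. A minimal sketch, assuming it runs inside the pod's network namespace (for example from an ephemeral container started with `kubectl debug`) and that the Istio agent listens on its default status port 15020, where Istio exposes `POST /quitquitquit` for stopping the sidecar; the helper name `quit_sidecar` is hypothetical.

```python
import urllib.request

# Assumption: default Istio agent status port inside the pod.
DEFAULT_QUIT_URL = "http://localhost:15020/quitquitquit"


def quit_sidecar(url: str = DEFAULT_QUIT_URL, timeout: float = 5.0) -> int:
    """Hypothetical helper: send an empty POST to the sidecar's quit
    endpoint and return the HTTP status code (non-2xx raises HTTPError)."""
    # Bypass any proxy settings from the environment; the target is local to the pod.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    req = urllib.request.Request(url, data=b"", method="POST")
    with opener.open(req, timeout=timeout) as resp:
        return resp.status
```

Calling `quit_sidecar()` with no arguments targets the default URL; this is the same request as the manual step, not a fix for why the sidecar was not terminated automatically.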

@mjnagel
Contributor

mjnagel commented May 28, 2024

Should be resolved by #419. Leaving this open until that is released and we have feedback, though.

@mjnagel
Contributor

mjnagel commented Jun 20, 2024

Tentatively closing - please reopen or create a new issue if you encounter this problem in latest core versions.

@mjnagel mjnagel closed this as completed Jun 20, 2024
4 participants