Image Automation controller silently stops working #286

sjdweb · 2022-01-04T14:22:31Z

Describe the bug

On our clusters, we had connectivity issues on Dec 26th. I noticed that since then, image automation has not made any updates, even though the ImagePolicies were up to date.

Here are the last logs for one of the controllers:

{"level":"error","ts":"2021-12-26T21:37:24.895Z","logger":"controller-runtime.manager.controller.imageupdateautomation","msg":"Reconciler error","reconciler group":"image.toolkit.fluxcd.io","reconciler kind":"ImageUpdateAutomation","name":"flux-system","namespace":"flux-system","error":"unable to clone 'ssh://git@github.com/myco/fleet-infra', error: SSH could not read data: Error waiting on socket"}

From that point on, the controller stopped working.
Killing the pod fixed the issue, but it would be great if the controller could self-heal in this scenario.

Steps to reproduce

Let the image automation controller experience a connectivity issue with the Git repository
Observe that the controller neither retries the connection nor crashes

Expected behavior

The controller should retry the reconciliation, since by then the connectivity issue would have been resolved.

Screenshots and recordings

No response

OS / Distro

Ubuntu 20.04

Flux version

flux version 0.16.1

Flux check

N/A

Git provider

GitHub

Container Registry provider

No response

Additional context

No response

Code of Conduct

I agree to follow this project's Code of Conduct


aholbreich · 2022-01-13T12:02:58Z

Same for us.

But it all seems to come down to the Git level, if I'm not mistaken. fluxcd/source-controller#439 (comment)

squaremo · 2022-01-13T12:06:23Z

@aholbreich There are two related problems:

getting "error: SSH could not read data: Error waiting on socket" in the logs
the image-automation-controller stops doing anything after that message

I know you are seeing the log message, from the comment you linked -- are you also experiencing the second problem, that the controller stops doing anything?

aholbreich · 2022-01-14T16:48:14Z

Hi, no. We haven't really seen controller problems.

pjbgf · 2022-03-07T17:02:13Z

This is the same as #282. I will be updating that thread instead; here's the latest comment:

#282 (comment)

stefanprodan transferred this issue from fluxcd/flux2 Jan 4, 2022

stefanprodan added the bug Something isn't working label Jan 4, 2022

squaremo self-assigned this Jan 5, 2022

kingdonb mentioned this issue Jan 6, 2022

Image update automation not committing the resolved version to git #159

Open

pjbgf self-assigned this Mar 7, 2022

pjbgf added this to Maintainers' Focus Mar 7, 2022

pjbgf moved this to In Progress in Maintainers' Focus Mar 7, 2022

pjbgf mentioned this issue Mar 11, 2022

Experimental managed transport for libgit2 operations fluxcd/source-controller#606

Merged

darkowlzz closed this as completed in fluxcd/source-controller#606 Mar 16, 2022

Repository owner moved this from In Progress to Done in Maintainers' Focus Mar 16, 2022

pjbgf mentioned this issue Mar 18, 2022

Add experimental managed transport for libgit2 operations #326

Merged
