Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty commits #166

Closed
Nicklason opened this issue May 16, 2021 · 25 comments
Closed

Empty commits #166

Nicklason opened this issue May 16, 2021 · 25 comments
Assignees
Labels
bug Something isn't working

Comments

@Nicklason
Copy link

I am using the ImageUpdateAutomation resource and see empty commits being pushed to my git repository. I looked at a similiar issue (#135) but they got an empty commit every minute, where I get them randomly, so I believe it is a different issue.

I have had the issue since 20. april. I was using flux v0.12.1 back then, and now I use latest version v0.13.4 and I still get them.

20/4/2021
image

16/5/2021
image

---
apiVersion: image.toolkit.fluxcd.io/v1alpha2
kind: ImageUpdateAutomation
metadata:
  name: flux-system
  namespace: flux-system
spec:
  sourceRef:
    kind: GitRepository
    name: flux-system
  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        name: fluxcdbot
        email: fluxcdbot@users.noreply.github.com
      messageTemplate: 'update manifests'
  interval: 1m0s
@stefanprodan stefanprodan added the bug Something isn't working label May 26, 2021
@squaremo
Copy link
Member

Any chance I can see the git repo you're using? I understand if it contains sensitive data and you'd prefer not. You might have to help me out with narrowing down where the problem happens ..

@Nicklason
Copy link
Author

The git repository is a private GitHub repository but I can share it with you. I would prefer to mail you a zip file but I guess I could just add you as a contributor.

@squaremo
Copy link
Member

squaremo commented Jun 2, 2021

I would prefer to mail you a zip file but I guess I could just add you as a contributor.

Either works for me -- you can email a zip to the address on my profile, or michael@weave.works.

@DaemonDude23
Copy link

I'm having this same problem as well. Same behavior on Flux2 v0.13.4 and v0.14.2.

I haven't built or put a new image in this AWS ECR repo since last week. I just updated to Flux's latest and reapplied my kustomization to continue testing today.
My image automation intervals are set to every 2m, so it seems that at each time of reconciliation, it's making a commit at that interval even if it's empty/unneeded because there's no update to the image tag.
I'm using BitBucket.

My Kustomization:

apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: www-test-eks-com-v1
  namespace: backend
spec:
  images:
  - name: 000000000000.dkr.ecr.us-east-1.amazonaws.com/www-test-eks-com-v1
    newTag: 20210527-223554 # {"$imagepolicy": "flux-system:www-test-eks-com-v1:tag"}
  interval: 10m0s
  path: ./flux/kustomize/apps/www-test-eks-com-v1
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
    namespace: flux-system
  validation: client
  healthChecks:
  - apiVersion: apps/v1
    kind: Deployment
    name: www-test-eks-com-v1
    namespace: backend
  timeout: 2m

Excerpt from the relevant deployment.yaml:

      containers:
      - name: www-test-eks-com-v1
        image: 000000000000.dkr.ecr.us-east-1.amazonaws.com/www-test-eks-com-v1:20210505-233849

Image Automation objects:

---
apiVersion: image.toolkit.fluxcd.io/v1alpha2
kind: ImageUpdateAutomation
metadata:
  name: www-test-eks-com-v1
  namespace: flux-system
spec:
  git:
    commit:
      messageTemplate: |
        Automated image update

        Automation name: {{ .AutomationObject }}

        Files:
        {{ range $filename, $_ := .Updated.Files -}}
        - {{ $filename }}
        {{ end -}}

        Objects:
        {{ range $resource, $_ := .Updated.Objects -}}
        - {{ $resource.Kind }} {{ $resource.Name }}
        {{ end -}}

        Images:
        {{ range .Updated.Images -}}
        - {{.}}
        {{ end -}}
      author:
        email: fluxcdbot@users.noreply.github.com
        name: fluxcdbot
  interval: 2m0s
  sourceRef:
    kind: GitRepository
    name: flux-system
    # namespace: flux-system
  suspend: false
---
apiVersion: image.toolkit.fluxcd.io/v1alpha2
kind: ImageRepository
metadata:
  name: www-test-eks-com-v1
  namespace: flux-system
spec:
  image: 000000000000.dkr.ecr.us-east-1.amazonaws.com/www-test-eks-com-v1
  interval: 2m0s
  secretRef:
    name: ecr-credentials
---
apiVersion: image.toolkit.fluxcd.io/v1alpha2
kind: ImagePolicy
metadata:
  name: www-test-eks-com-v1
  namespace: flux-system
spec:
  filterTags:
    pattern: '((?P<timestamp>\d{8}-\d{6})|(?P<timestamp>\d{8}-\d{6}-\d{1}))'
    extract: '$timestamp'
  imageRepositoryRef:
    name: www-test-eks-com-v1
  policy:
    alphabetical:
      order: asc

Flux CLI output concerning images:

❯ flux get images all
NAME                                    READY   MESSAGE                         LAST SCAN                       SUSPENDED
imagerepository/www-test-eks-com-v1     True    successful scan, found 3 tags   2021-06-03T13:00:30-07:00       False

NAME                            READY   MESSAGE                                                                                                                         LATEST IMAGE
imagepolicy/www-test-eks-com-v1 True    Latest image tag for '000000000000.dkr.ecr.us-east-1.amazonaws.com/www-test-eks-com-v1' resolved to: 20210527-223554   000000000000.dkr.ecr.us-east-1.amazonaws.com/www-test-eks-com-v1:20210527-223554

NAME                                            READY   MESSAGE                                                                 LAST RUN                        SUSPENDED
imageupdateautomation/www-test-eks-com-v1       True    committed and pushed 6421cfd32363cac84bd70b6fa2a8e0a4402a7845 to master 2021-06-03T13:00:45-07:00       False

Logs - Image Automation Controller:

{"level":"info","ts":"2021-06-03T19:51:44.079Z","logger":"controller-runtime.manager.controller.imageupdateautomation","msg":"Starting workers","reconciler group":"image.toolkit.fluxcd.io","reconciler kind":"ImageUpdateAutomation","worker count":1}
{"level":"info","ts":"2021-06-03T19:51:59.653Z","logger":"controller-runtime.manager.controller.imageupdateautomation","msg":"pushed commit to origin","reconciler group":"image.toolkit.fluxcd.io","reconciler kind":"ImageUpdateAutomation","name":"www-test-eks-com-v1","namespace":"flux-system","revision":"fa68e0a261ad6aa9b8cb4ecc8db89cc68f07326f","branch":"master"}
{"level":"info","ts":"2021-06-03T19:52:08.908Z","logger":"controller-runtime.manager.controller.imageupdateautomation","msg":"pushed commit to origin","reconciler group":"image.toolkit.fluxcd.io","reconciler kind":"ImageUpdateAutomation","name":"www-test-eks-com-v1","namespace":"flux-system","revision":"8e214093bf5e9870ea18319ff63032e000569296","branch":"master"}
{"level":"info","ts":"2021-06-03T19:54:12.551Z","logger":"controller-runtime.manager.controller.imageupdateautomation","msg":"pushed commit to origin","reconciler group":"image.toolkit.fluxcd.io","reconciler kind":"ImageUpdateAutomation","name":"www-test-eks-com-v1","namespace":"flux-system","revision":"c8c8dbd510859b8c8202eef437841693d37b8641","branch":"master"}
{"level":"info","ts":"2021-06-03T19:56:24.110Z","logger":"controller-runtime.manager.controller.imageupdateautomation","msg":"pushed commit to origin","reconciler group":"image.toolkit.fluxcd.io","reconciler kind":"ImageUpdateAutomation","name":"www-test-eks-com-v1","namespace":"flux-system","revision":"89e4da1fe3221f66222ff32af1120061ab7859bc","branch":"master"}

Events - Image Automation controller:

Events:
  Type    Reason  Age    From                         Message
  ----    ------  ----   ----                         -------
  Normal  info    15m    image-automation-controller  committed and pushed change 8b49c700a03b516d5796b267724a66d85b483d60 to master
  Normal  info    13m    image-automation-controller  committed and pushed change a98daef1c988002f452fa1e1c39986c7dd9310ca to master
  Normal  info    11m    image-automation-controller  committed and pushed change 41a74a71939bdb116a1682d757614d34b9d540ef to master
  Normal  info    9m4s   image-automation-controller  committed and pushed change fe2b439a906fc48b7771cbed71130fa0da253e18 to master
  Normal  info    8m1s   image-automation-controller  committed and pushed change efc761e63bd7544f81275bb8437f5cd47992518a to master
  Normal  info    7m13s  image-automation-controller  committed and pushed change d08486200e77261161d2d3eb44c07b0a19ce8e03 to master
  Normal  info    7m9s   image-automation-controller  committed and pushed change 1371b7876e1de0adb49e9581a44b5324e9f603dc to master
  Normal  info    7m3s   image-automation-controller  committed and pushed change 03cf55b6acc038d8a2b0fdb193138bac5399f49c to master
  Normal  info    6m1s   image-automation-controller  committed and pushed change fa68e0a261ad6aa9b8cb4ecc8db89cc68f07326f to master
  Normal  info    5m52s  image-automation-controller  committed and pushed change 8e214093bf5e9870ea18319ff63032e000569296 to master
  Normal  info    3m48s  image-automation-controller  committed and pushed change c8c8dbd510859b8c8202eef437841693d37b8641 to master
  Normal  info    96s    image-automation-controller  committed and pushed change 89e4da1fe3221f66222ff32af1120061ab7859bc to master

BitBucket's commit page:
Screenshot_20210603_130338

@squaremo
Copy link
Member

squaremo commented Jun 9, 2021

I'm having this same problem as well

Sorry to hear that @chicken231 , and thank you for all the environmental info. Is it possible to fork your repo, or to otherwise get access to it (keeping it private of course)?

@squaremo
Copy link
Member

squaremo commented Jun 9, 2021

@Nicklason I'm sure you would have mentioned it, but just in case: is there anything that appears to trigger the empty commits; e.g., making a commit yourself to the same branch, or restarting the controller, or a particular image getting a new tag?

Also: do you have any symlinks or other non-regular files/directories in your git repo?

@DaemonDude23
Copy link

@squaremo

I can't give you access to this cluster as it's for work (AWS EKS). Was the info and YAMLs I provided insufficient to reproduce or attempt to reproduce? Would anything short of giving you a copy of the whole repo be acceptable?

What I can do this weekend is package up all the relevant YAMLs relating to this deployment and image automation and test it in my bare metal lab cluster. Different container registry, different git repo provider, different k8s version, etc. If it exhibits the same symptoms, would providing those relevant YAMLs to you so you can kubectl apply -f be sufficient? I'd throw them all in one file for easy apply/delete.


As for your note to the OP, it seems every reconciliation loop (at every automation interval), it creates and pushes a commit no matter what. I'm 90% sure the same occurs if I attempt a reconciliation via the flux cli (flux reconcile image update www-test-eks-com-v1). I will be able to confirm that this weekend.

I have 0 symlinks in my entire repo. I just verified this with certainty on my end.

Thanks for any help! If we can get this fixed, I can roll out Flux for my CD.

@squaremo
Copy link
Member

squaremo commented Jun 9, 2021

I can't give you access to this cluster as it's for work (AWS EKS).

Fair enough, that's the usual situation -- it's worth asking just in case :-)

Was the info and YAMLs I provided insufficient to reproduce or attempt to reproduce?

A lot of the behaviour of the image-automation-controller -- especially buggy behaviour -- depends on the contents of the git repo. For example, the problem in #164 depended on a particular interleaving of commits, and #135 involved broken symlinks in the repo (thus the question about symlinks!). As a result, it's tricky to see what's really going on given just the Kubernetes configuration.

What I can do this weekend is package up all the relevant YAMLs relating to this deployment and image automation and test it in my bare metal lab cluster. Different container registry, different git repo provider, different k8s version, etc. If it exhibits the same symptoms, would providing those relevant YAMLs to you so you can kubectl apply -f be sufficient?

Yes for sure -- or slightly better, if you can make a github repo with the reduced/redacted files, I can fork it and go from there.

it seems every reconciliation loop (at every automation interval), it creates and pushes a commit no matter what.

That sounds like #135, which produced a commit every loop because go-git mistakenly reported a broken symlink as modified. So perhaps there's some other similar thing going on.

@Nicklason
Copy link
Author

@Nicklason I'm sure you would have mentioned it, but just in case: is there anything that appears to trigger the empty commits; e.g., making a commit yourself to the same branch, or restarting the controller, or a particular image getting a new tag?

Also: do you have any symlinks or other non-regular files/directories in your git repo?

I am not sure if it has something to do with me working on the repository or restarting flux pods, it just seems random. The repository shouldn't have any symbolic links either.

@squaremo squaremo self-assigned this Jun 21, 2021
@squaremo
Copy link
Member

@Nicklason Naively, I ran the controller against the definitions in your repo, to see if I could get empty commits. I've been running it for a day and have not seen any. I notice you see around one or two empty commits a day, but I will need to run it longer to have confidence that I'm failing to reproduce the problem (something something poisson distribution). In the meantime, I don't see any smoking guns in the repo -- it all looks innocuous 🤔.
The next step might be for you to run an image with lots of tracing output, so we can see what it thinks has changed. If you're OK to do that, I'll put one together ..

@Nicklason
Copy link
Author

@squaremo Yes I still want to help. If you make me an image I'll run it. It might be a good idea to add debugging to all the images anyway 😃

@DaemonDude23
Copy link

I wasn't able to reproduce empty commits in my lab (Gitlab repo, Gitlab registry, Flux v0.15.2). It works like a charm. The only differences in my manifests pasted above between my two environments is the object names and using SemVer image policy.
I'll update if anything new becomes relevant from my end, or if I can help in testing with this special image.

@squaremo
Copy link
Member

The image docker.io/squaremo/image-automation-controller:0a9c9a7437629093349eb3d3c77f8d9e5d6fba7c is built from #190. It outputs a bunch of tracing to the log at debug level -- if you use it, you'll need to set --log-level=debug in the controller deployment.

Among the tracing:

  • which files are added to the commit, and which are ignored
  • if any policies result in changing a file

By definition, it cannot be changing files if the result is an empty commit; but it's difficult to know what else to trace. Suggestions encouraged :-)

@Nicklason
Copy link
Author

Nicklason commented Jun 24, 2021

@squaremo You need to build it for arm64 :)

Might have been worth mentioning this earlier.

@DaemonDude23
Copy link

@squaremo I tested the image you provided with debug logs enabled in my x86 EKS cluster. Current Flux version = 0.15.3
Here is the log output with a few redactions: https://pastebin.com/DmmCajbD

An interesting excerpt from the logs that it records before creating the empty commit:

"old":"20210527-223554","new":"20210527-223554"

@hiddeco
Copy link
Member

hiddeco commented Jun 25, 2021

That smells like the comparison thinks it is looking at different types, e.g. the old value being a interpreted as a string, but the new value as an integer, or vice-versa.

@Nicklason
Copy link
Author

I find it hard to get anything useful from the logs. I have quite a lot of image policies and most of the log messages are about them and are repeated every minute. I think I would have to set up a notification for when a commit is made in order to get a chance to see the relevant logs.

These are the messages I keep seeing:

scanning files
reading file
found policy
found schema extension
applying setter

As @chicken231 said, I also see "old" and "new" being the same. The "old" and "new" are just links to a docker image. I don't know what it means but I would guess that it is used to check if there is a new tag matching the image policy, but you can correct me on that.

@squaremo
Copy link
Member

I find it hard to get anything useful from the logs

That is the nature of log messages at this level -- I don't know what I'm looking for, so I output everything I can think of. I was hoping that there would be a noticeable pattern of trace messages before a commit which turns out to be empty -- given the commit ref, you can grep the log for it and look at what came immediately before (and compare with a commit that isn't empty). But it's a shot in the dark, really -- I rather suspect there's some environmental cause, rather than a particular path through the code.

The trace message about applying a setter with "old" and "new" values is output whether or not the values differ (the filter sets the value unconditionally and the log follows). There's a message about finding a setter mark, so it would make sense to restrict the trace message to just when it's changing the value.

@Nicklason
Copy link
Author

That is the nature of log messages at this level

Yes I understand. Sorry my last comment came out a bit wrong.

I rather suspect there's some environmental cause

I think you are right. It looks like I have indirectly solved the issue - I have not had empty commits for a week (since 9/7/2021). The cluster consists of 6 network booted raspberry pi and a week ago (same day as last empty commit) I migrated to a new NAS, so it looks like the NAS was the problem.

I should have looked closer at the logs before moving to a new NAS and I should have included information about the environment when I made the issue, but I just didn't suspect that the NAS / filesystem would affect the controller.

@DaemonDude23
Copy link

I tried switching ImagePolicy policies with different RegExs, SemVer, but the empty commits kept coming.
I also was able to reproduce this in my lab, simply by deleting the container registry (GitLab) associated with an Image automation. The next day I noticed I had 500+ empty commits.
I'm on the latest Flux v0.17.1.

Is there anything I can do from my end to help get this fixed?

@squaremo
Copy link
Member

I also was able to reproduce this in my lab, simply by deleting the container registry (GitLab) associated with an Image automation. The next day I noticed I had 500+ empty commits.

To clarify: you stopped running the image registry that hosts images used by the automation?

@DaemonDude23
Copy link

@squaremo This was a month or more ago so my memory is fuzzy.
I changed my username on GitLab. Before they'd allow me to do that, I had to delete all container images in all of my repos (1 repo to 1 registry is the formula I think).

So I may have spoken incorrectly, I can't recall at what point in time these empty commits occurred, if it was after I deleted all the container images in the registry, or after I did that and after I changed username (then the endpoint should have returned a 404 I'd think). Either way I don't think it should be pushing a commit unless a newer image is actually detected.

How about I spin up a small test cluster somewhere and give a few folks in this thread access to it and the corresponding repo, assuming I can reproduce the problem there?

@squaremo
Copy link
Member

How about I spin up a small test cluster somewhere and give a few folks in this thread access to it and the corresponding repo, assuming I can reproduce the problem there?

That would be super helpful! Thank you for putting time into this 🙏

@DaemonDude23
Copy link

I'm happy to report that I can no longer reproduce this issue.
Flux version 0.24.1. image-update-automation controller version 0.14.1 (manually overridden due to this issue).

@pjbgf
Copy link
Member

pjbgf commented Mar 22, 2022

@DaemonDude23 thank you for testing this and confirming you could not reproduce it anymore, therefore I am closing the issue.

If users are still experiencing this, please add a comment and we will reopen the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants