This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Running fluxctl sync does not apply manifests #2818

Closed
dewe opened this issue Feb 4, 2020 · 4 comments
Labels
bug flux2 Resolution suggested - already fixed in Flux v2

Comments

@dewe
Contributor

dewe commented Feb 4, 2020

Describe the bug

When doing a manual sync with fluxctl sync, Flux does not apply the current manifests. Nothing gets applied until the next sync-interval.

We expect fluxctl sync to apply the manifests immediately rather than waiting for the sync interval.

This is important in situations where something gets deleted by accident and we need to re-apply the current state.

To Reproduce

We run a standard Kustomize-based Flux deployment with the following container args:

args:
- --manifest-generation=true
- --memcached-hostname=flux-memcached.flux-system
- --memcached-service=
- --git-poll-interval=60s
- --sync-interval=5m
- --ssh-keygen-dir=/var/fluxd/keygen
- --git-user="Flux CD"
- --git-branch=k8s-labs
- --git-url=git@bitbucket.org:<private-repo>.git
- --git-path=cluster/k8s-labs
- --git-label=flux-sync/k8s-labs

Expected behavior

When doing a fluxctl sync we expect to see method=Sync entries in the logs, i.e. Flux actually applying the manifests. But that doesn't happen until the next sync-interval.
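
For completeness, this is roughly how we trigger and check the sync (a sketch; it assumes Flux runs as deploy/flux in the flux-system namespace, matching the memcached hostname above):

# trigger a manual sync, port-forwarding to the Flux API in flux-system
fluxctl sync --k8s-fwd-ns flux-system

# follow the daemon logs and look for an immediate apply (method=Sync)
kubectl -n flux-system logs deploy/flux -f | grep method=Sync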

Logs

When running fluxctl sync:

ts=2020-02-04T09:20:07.868300265Z caller=loop.go:135 component=sync-loop jobID=67fd82e8-3d25-f9d4-c433-70deb65bb606 state=in-progress
ts=2020-02-04T09:20:13.222389638Z caller=loop.go:147 component=sync-loop jobID=67fd82e8-3d25-f9d4-c433-70deb65bb606 state=done success=true
ts=2020-02-04T09:20:15.168075186Z caller=loop.go:127 component=sync-loop event=refreshed url=ssh://git@bitbucket.org/<private-repo>.git branch=k8s-labs HEAD=73e34beca9eab1ce3c88bbbdec2ee82e1c022c95

Then a few minutes later at 09:23:20, the real sync kicks in:

ts=2020-02-04T09:21:01.297425897Z caller=loop.go:127 component=sync-loop event=refreshed url=ssh://git@bitbucket.org/<private-repo>.git branch=k8s-labs HEAD=73e34beca9eab1ce3c88bbbdec2ee82e1c022c95
ts=2020-02-04T09:22:04.167663692Z caller=loop.go:127 component=sync-loop event=refreshed url=ssh://git@bitbucket.org/<private-repo>.git branch=k8s-labs HEAD=73e34beca9eab1ce3c88bbbdec2ee82e1c022c95
ts=2020-02-04T09:23:11.70381995Z caller=loop.go:127 component=sync-loop event=refreshed url=ssh://git@bitbucket.org/<private-repo>.git branch=k8s-labs HEAD=73e34beca9eab1ce3c88bbbdec2ee82e1c022c95
ts=2020-02-04T09:23:20.318411895Z caller=sync.go:482 method=Sync cmd=apply args= count=283
ts=2020-02-04T09:23:23.697816155Z caller=sync.go:548 method=Sync cmd="kubectl apply -f -" took=3.375037855s err=null output="namespace/flex unchanged\nnamespace/flux-system unchanged\nnamespace/kubernetes-dashboard <cut out the rest of kubectl apply output>"
ts=2020-02-04T09:23:23.92548348Z caller=sync.go:548 method=Sync cmd="kubectl apply -f -" took=227.512498ms err=null output="statefulset.apps/thanos-compactor configured"
ts=2020-02-04T09:23:24.170148836Z caller=sync.go:548 method=Sync cmd="kubectl apply -f -" took=244.601502ms err=null output="statefulset.apps/thanos-receive configured"
ts=2020-02-04T09:23:24.272645768Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"

Additional context

  • Flux version: 1.17.1
  • Kubernetes version: 1.14 (EKS)
  • Git provider: Bitbucket
@dewe dewe added the blocked-needs-validation and bug labels Feb 4, 2020
@hiddeco
Member

hiddeco commented Feb 4, 2020

Related to #2487

@alewis001

@hiddeco May I ask why this issue is related to #2487 please?

I'm experiencing this sync issue but I don't understand the connection to the other one. My logs look the same as the OP's.

Flux Version: 1.20.0
Kubernetes Version: 1.14.7 (AKS)
Git Provider: Bitbucket

@kingdonb
Member

Could also be related to #3450

I'm sorry that this has gone unanswered for a while. I hope you have found your way to Flux v2, where this issue should already be mitigated. We've added an entry to the Flux v1 FAQ that addresses performance problems related to image scanning, as a subheading under "Why should I upgrade": https://fluxcd.io/legacy/flux/faq/#flux-v1-runtime-behavior-doesnt-scale-well

Flux v1 has been formally superseded now that the GitOps Toolkit APIs have been declared stable:

https://fluxcd.io/docs/migration/timetable/

The repo will remain in maintenance for some time, but no new features can be accepted. Critical bugs can be addressed if there is a PR to resolve them, but soon only CVEs will be addressed in Flux v1, and new users have been advised to use Flux v2 for some time now.

Thanks for using Flux!

I am closing this issue, but please don't take this as a sign that there is no support. Flux v1 remains in supported maintenance for now. I can only work on issues that are actively affecting users, and I am simply trying to get the number of issues in the queue down to a more manageable level. If you are still affected, we can reopen this, or address it in a new issue report.

@kingdonb
Member

kingdonb commented Jul 31, 2021

To clarify in plain language what I think causes this, since it feels unresolved to leave the discussion as it stands:

Flux's image automation (v1) scans the cluster for images that it can automate, then it scans their image repos for any tags that might be candidates for updating. When you have not enabled semver automation, the only information that Flux has to go by in order to determine which image is the correct "latest" image is the build timestamp.

In order to retrieve the build time, Flux must download the image's metadata, which Docker Hub and other registries count as a "pull" for rate-limiting purposes: even though the timestamp is a tiny piece of data, it is stored in a blob alongside the image layers rather than in a response header, so the registry must fetch it from storage. (Ref: cold storage vs. hot; cold storage is cheaper as long as you rarely or never need to access that data.)
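
For illustration only (this is not the exact call Flux makes), the timestamp in question is the Created field of the image config, which you can see by inspecting any image, e.g. with skopeo:

# inspecting an image fetches its manifest and config blob, which registries count against pull limits
skopeo inspect docker://docker.io/library/alpine:latest | jq .Created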

Flux v1 cannot properly assess which image should be synced until it has the metadata from every candidate image. So when Flux first starts and finds image automation enabled in some places, syncs don't happen at all until the image scanning process has completed, to avoid the risk of reverting a newer image (perhaps applied manually to the cluster while Flux was offline) back to an older one. This whole process is rate limited, so it takes longer than we'd like; in the pathological case of a cluster with many images, from many different sources, with many different candidate tags, the problem can be for all intents and purposes intractable and may never finish.

Flux v2 does not pull image metadata anymore; instead, automation requires "Sortable Image Tags", i.e. tags with enough information in the tag string itself to sort images without any metadata, simply by filtering (usually with some form of regex) and then ordering the tags returned by the registry's tag index on some sortable part of the string. This is vastly more efficient.
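
For reference, this is roughly what a sortable-tag policy looks like in Flux v2; the names and the main-<sha>-<timestamp> tag format here are made up for illustration, and the apiVersion may differ between Flux v2 releases:

apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImagePolicy
metadata:
  name: my-app
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: my-app            # an ImageRepository that scans the registry's tag list
  filterTags:
    # keep only tags like main-<git sha>-<unix timestamp> and capture the timestamp
    pattern: '^main-[a-f0-9]+-(?P<ts>[0-9]+)$'
    extract: '$ts'
  policy:
    numerical:
      order: asc            # highest timestamp wins; no per-image metadata is ever pulled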

I apologize for the length of time that has elapsed, but I hope you've resolved this issue one way or another, and I encourage you to check out Flux v2. Don't hesitate to contact us if you need support with migration or anything else. Thanks for using Flux.

@kingdonb kingdonb added the flux2 label and removed the blocked-needs-validation label Jul 31, 2021