This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Fluxctl times out loading notes from repo - context deadline exceeded on v1.22.2 #3489

Closed
dvp34 opened this issue Jun 7, 2021 · 6 comments
Labels
flux2 Resolution suggested - already fixed in Flux v2

Comments


dvp34 commented Jun 7, 2021

Describe the bug
We use GitHub Enterprise, and we started seeing the error below, which prevents the latest changes from being applied to Kubernetes.

To Reproduce

Steps to reproduce the behaviour:
Just run "fluxctl sync".
We started seeing this behavior more often with Linkerd 2.10.

  1. Provide Flux install instructions
  2. Provide a GitHub repository with Kubernetes manifests

Expected behavior

I expect "fluxctl sync" to complete and apply the resources to Kubernetes.

Logs

ts=2021-06-07T18:46:09.544252696Z caller=loop.go:108 component=sync-loop err="loading notes from repo: running git command: git [notes --ref flux list]: context deadline exceeded"

Additional context

  • Flux version: 1.22.2
  • Kubernetes version: 1.17
  • Git provider: GitHub Enterprise
  • Container registry provider: private
@dvp34 dvp34 added blocked-needs-validation Issue is waiting to be validated before we can proceed bug labels Jun 7, 2021
Member

kingdonb commented Jun 7, 2021

Greetings, thanks for posting. The message "loading notes from repo" is emitted from the getNotes function. This sits in the middle of the Sync function, which is at the center of how Flux v1 performs its sync. I am not sure whether this hits the remote git repo, but a "context deadline exceeded" error usually indicates that something is taking longer than the allowed timeout.

You can increase the timeout duration or you can address the root cause of whatever is taking longer. Can you provide some info about what dimensions are scaled out, so we can understand better how this error condition might be arrived at?

A very large git repo, or one with a lot of tags, are examples of what I mean. I inferred from your post that you have been using Flux v1 for a while. Something is growing over time, and as a result the timeout has become more likely to be reached, until it is apparently timing out almost every time.

Other things which might be a leading cause of this trouble: (1) your GitHub Enterprise instance is heavily taxed and under load, so is slow to respond, or your git repository is large and expensive to clone, or (2) your cluster or node hosting the Flux installation is similarly taxed or under heavy load, which is putting pressure on Flux resource limits and causing the context deadline exceeded error.

In general, we have recommended that all users migrate to Flux v2 as soon as possible since feature parity was announced. Flux v2 brings many performance benefits, and timeout-related failures are much easier to observe than in Flux v1. There are fewer points of failure due to the Kustomize changes, and they are better isolated, so we generally no longer have to guess whether a problem is caused by network issues during git clone or by a later apply operation, after the git sources have already been retrieved.

Author

dvp34 commented Jun 7, 2021

Thank you for the quick response, @kingdonb.
I did try to increase the timeout by calling fluxctl sync --timeout 15m, but the behavior is still similar.
The git repo is not too crazy, and we have observed this on a tiny repo too.
We are evaluating migrating to Flux v2. Thank you for the recommendation.
The node that hosts Flux seems to be okay; nothing crazy there. The only thing I can't seem to verify is the Enterprise instance.

Member

kingdonb commented Jun 7, 2021

@durprasa The timeouts that should be adjusted are not in fluxctl; they are options to the fluxd daemon. You will need to customize them in your Helm values, or in the container definition if you did not use Helm to install.

--git-timeout    20s    duration after which git operations time out
--sync-timeout   1m     duration after which sync operations time out

These are the timeout values that can be adjusted to positive effect.

They are documented here: https://docs.fluxcd.io/en/1.22.2/references/daemon/#setup-and-configuration

If the git repository is not large, and does not take long to clone from scratch, then I would start by attempting to adjust --sync-timeout here rather than --git-timeout. (That kind of makes sense, and I think that's the point during which git notes is called.)

If increasing the timeout abates the situation, there are a number of places where timeout-inducing loads could be coming from. Diagnosing this in Flux v1 is a tricky situation and will require some specific knowledge of your configuration.

We often see these sorts of timeouts coming from kustomize build, but not usually with the error message loading notes from repo. If you have a large number of image-automated workloads, or make heavy use of some other Flux feature that relies on git notes and that I have not thought of, that could be producing a large volume of notes. At a glance, these notes entries appear to be where the Flux daemon stages details or instructions to pass forward to the next step in the Flux sync.
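One way to check whether the volume of notes is a factor is to time the same git command from the log in a local clone. The sketch below is an assumption-laden diagnostic, not part of Flux: timeNotesList and countLines are hypothetical helpers, the 20s timeout mirrors the --git-timeout default, and the clone is assumed to already contain the notes ref (for example via git fetch origin refs/notes/flux:refs/notes/flux).

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// countLines returns the number of non-empty lines in out.
// `git notes list` prints one line per note.
func countLines(out []byte) int {
	n := 0
	for _, line := range bytes.Split(out, []byte("\n")) {
		if len(bytes.TrimSpace(line)) > 0 {
			n++
		}
	}
	return n
}

// timeNotesList runs `git notes --ref flux list` in repoDir under the given
// timeout and reports how many notes were listed and how long it took.
func timeNotesList(repoDir string, timeout time.Duration) (int, time.Duration, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	cmd := exec.CommandContext(ctx, "git", "notes", "--ref", "flux", "list")
	cmd.Dir = repoDir
	start := time.Now()
	out, err := cmd.Output()
	elapsed := time.Since(start)
	if ctx.Err() != nil {
		// The deadline fired before git finished.
		return 0, elapsed, ctx.Err()
	}
	return countLines(out), elapsed, err
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: notes-timing <path-to-local-clone>")
		return
	}
	n, elapsed, err := timeNotesList(os.Args[1], 20*time.Second)
	fmt.Printf("notes=%d elapsed=%s err=%v\n", n, elapsed, err)
}
```

If the command returns quickly with few notes on a workstation, the slowness is more likely on the cluster or git server side than in the notes themselves.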

Author

dvp34 commented Jun 7, 2021

Thank you again!
Let me try increasing --sync-timeout to see if that changes anything.

Author

dvp34 commented Jun 7, 2021

That did the trick: increasing sync-timeout helped. Thank you very much for your support.
We are evaluating migrating to Flux v2. In the meantime, we have to get out of the weeds :)

@kingdonb kingdonb added flux2 Resolution suggested - already fixed in Flux v2 and removed blocked-needs-validation Issue is waiting to be validated before we can proceed bug labels Jun 7, 2021
@kingdonb kingdonb self-assigned this Jun 7, 2021
Member

kingdonb commented Jun 7, 2021

Superb. Thanks for being responsive and playing along!

I'm going to close this since there is nothing left to resolve. But you can feel free to reply or open a new issue if you need more support. We're also happy to answer questions in #flux on the CNCF slack!

Thanks for using Flux.


2 participants