flux 1.11.0 no longer syncs without ClusterRole #1830
Curses, I did not intend this to be the case with #1442, though I admit I wasn't very diligent about trying out this scenario. Where exactly does it come to a halt, when it's not given a ClusterRole? (what do the logs say?) |
#1830, which should fix this, is complete but pending review |
Hey, thanks for the responses.
Without a ClusterRole:
The last line is spammed forever after. After adding the first set of permissions, I updated the repo and tried to
I always get the following after a restart with the tag behind:
The repo tag never moved and nothing was applied. I added that resource, killed the pod, and repeated until I added the
#1668 I assume? |
Brill, thanks for that @zeeZ, most helpful! |
You might have to stick to v1.10.1 for now @zeeZ -- sorry about that :-/ |
Yeah, sorry |
Now I am thinking that #1668 by itself won't be enough since it doesn't prevent flux from attempting to list cluster-scoped resources. We need to think about this. |
@zeeZ The fix will be included in the next release. For now, you can test whether your issue is definitely fixed by using the image. Please reopen this issue if it isn't fixed. |
@2opremio I actually checked out your branch earlier. With no config change from 1.10.1 to yours, sync worked as expected, thank you. What remains is the following, but it didn't have any impact for me as there are no CRDs managed by flux:
This is repeated every second |
Fantastic! I will look into fixing that as well |
@zeeZ Are you getting any other errors? (even if not repeated) |
No further errors after adding a watch/list CRD cluster role. |
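For reference, a minimal ClusterRole along those lines might look like the following sketch (the `flux-crd-reader` name is illustrative; the `flux` service account and `flux-system` namespace are taken from the error message quoted later in this thread):

```yaml
# Sketch: grant only list/watch on CRDs at cluster scope, so Flux's
# discovery can sync without cluster-wide read access to everything.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: flux-crd-reader        # illustrative name
rules:
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions"]
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: flux-crd-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flux-crd-reader
subjects:
  - kind: ServiceAccount
    name: flux
    namespace: flux-system
```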
Great, I will try to get a fix for that early next week |
I've created a sample repo of some of the things I did to lock down Flux, maybe it can be of some use: I believe that's as far as I can go without Helm or GC enabled. Removing any of the rules defined will produce some kind of error during common operations, though I haven't played around with it enough to be able to tell where sync is actually affected and what is just noise. |
I've taken a look at the remaining recurring error. It's a tricky one because the error is reported from deep inside client-go's reflector:

```go
func (r *Reflector) Run(stopCh <-chan struct{}) {
	glog.V(3).Infof("Starting reflector %v (%s) from %s", r.expectedType, r.resyncPeriod, r.name)
	wait.Until(func() {
		if err := r.ListAndWatch(stopCh); err != nil {
			utilruntime.HandleError(err)
		}
	}, r.period, stopCh)
}
```

I see a bunch of options:
I dealt with a similar problem in Scope before, going for (2), but the error handling wasn't so deep down in the call stack. |
Yes; adapting parts of client-go is usually a quixotic enterprise. If it's much more complicated than the solution in weaveworks/scope, I'd say it's not worth it. Can we mute glog by doing flag.Parse with some fake command-line options? I'm grasping at straws .. (it's probably better to do 3. instead) |
I went for (3) in the end |
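Judging from the `caller=main.go:175 type="internal kubernetes error"` lines quoted further down, the fix routes client-go's internally-handled errors through Flux's own logger. A minimal sketch of that technique, assuming a go-kit logger (`setupErrorLogging` is an illustrative name, not the actual Flux function):

```go
package main

import (
	"os"

	"github.com/go-kit/kit/log"
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
)

// setupErrorLogging replaces client-go's default error handlers so that
// errors raised via utilruntime.HandleError (see Reflector.Run above)
// reach our structured logger instead of glog.
func setupErrorLogging(logger log.Logger) {
	utilruntime.ErrorHandlers = []func(error){
		func(err error) {
			logger.Log("type", "internal kubernetes error", "err", err.Error())
		},
	}
}

func main() {
	logger := log.NewLogfmtLogger(os.Stderr)
	setupErrorLogging(logger)
	// ... construct clients and start informers/reflectors as usual ...
}
```

Since `Reflector.Run` reports list/watch failures via `utilruntime.HandleError` instead of returning them, the `ErrorHandlers` slice is the hook client-go itself provides for intercepting them.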
@zeeZ It should be fixed now. I would appreciate it if you could give it a try ( |
After removing the CRD role I still get a constant stream of

```
ts=2019-03-18T21:05:54.062786645Z caller=main.go:175 type="internal kubernetes error" err="github.com/weaveworks/flux/cluster/kubernetes/cached_disco.go:100: Failed to list *v1beta1.CustomResourceDefinition: customresourcedefinitions.apiextensions.k8s.io is forbidden: User \"system:serviceaccount:flux-system:flux\" cannot list resource \"customresourcedefinitions\" in API group \"apiextensions.k8s.io\" at the cluster scope"
```

I did some digging around the IsForbidden || IsNotFound workaround you added, but it seems ReasonForError returns StatusReasonUnknown. I'm not familiar with the K8s source, but I believe what we're dealing with here is not a metav1 error but a more generic one: https://github.com/kubernetes/client-go/blob/7d04d0e2a0a1a4d4a1cd6baa432a2301492e4e65/tools/cache/reflector.go#L251
While it stings a bit, I can live with allowing CRD listing. My initial issue was with list access to *everything* in the cluster, which has been resolved thanks to you. Perhaps documentation could be added with the minimum privileges Flux needs in order to operate properly, though I suspect that would be complicated with Helm and GC. Maybe a more restricted minimal example next to deploy? On a positive note, at least it is not *silently* firing a request every second that may add up for each instance you run ;) |
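To illustrate the classification problem (a self-contained sketch, not Flux code): the reflector wraps the original `*StatusError` in a plain `fmt.Errorf` at the line linked above, which discards the type that `IsForbidden` and `ReasonForError` rely on:

```go
package main

import (
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

func main() {
	// A genuine API "forbidden" error is a *StatusError and classifies cleanly.
	gr := schema.GroupResource{Group: "apiextensions.k8s.io", Resource: "customresourcedefinitions"}
	apiErr := apierrors.NewForbidden(gr, "", fmt.Errorf("RBAC denied"))
	fmt.Println(apierrors.IsForbidden(apiErr)) // true

	// Wrapping it in fmt.Errorf (as the reflector does) loses the type,
	// so ReasonForError falls back to StatusReasonUnknown ("").
	wrapped := fmt.Errorf("Failed to list *v1beta1.CustomResourceDefinition: %v", apiErr)
	fmt.Println(apierrors.IsForbidden(wrapped))           // false
	fmt.Printf("%q\n", apierrors.ReasonForError(wrapped)) // ""
}
```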
Crap, sorry about that. I need to do some further thinking. |
I run flux with explicit permissions, as limited as possible, with only a single namespaced Role and --k8s-namespace-whitelist set. After upgrading to 1.11.0 it no longer syncs unless it is able to list virtually everything in the cluster.
This is the ClusterRole I created from sync-loop errors before it was able to sync again. You can tell where I gave up:
The FAQ answers "Can I restrict the namespaces that Flux can see" with "yes, experimental". Sadly, this is no longer the case.
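As a rough illustration of the setup described in the first paragraph (a sketch under assumptions: the `demo` namespace and object names are illustrative, not from the issue; the `flux` service account and `flux-system` namespace come from the errors above):

```yaml
# Sketch: a single namespaced Role instead of a ClusterRole, bound to
# Flux's service account, so Flux only has rights inside one namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: flux
  namespace: demo            # illustrative target namespace
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: flux
  namespace: demo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: flux
subjects:
  - kind: ServiceAccount
    name: flux
    namespace: flux-system
```

This would be paired with --k8s-namespace-whitelist=demo on the daemon so it only queries that namespace.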
Also name-dropping #1217 and #1471.