Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg: Migrate from cluster-config-v1 to Infrastructure.config.openshift.io #260

Closed
wants to merge 5 commits into from

Conversation

wking
Copy link
Member

@wking wking commented Apr 23, 2019

Mostly, this is replacing cluster-config-v1 (which is deprecated) with the Infrastructure config object. As part of that, I'm dropping UserTags support (see discussion here and here). I'm also replacing the openshiftClusterID tag with a kubernetes.io/cluster/{infraName}: owned tag, see here and here.

This also replaces OperatorFailing with OperatorDegraded, catching up with openshift/api#287.

I've taken the vendor bump in two steps, since dep sees a lot in master's vendor/ that it doesn't think we need. Let me know if you'd rather have me squash that down into a single commit, or split the second vendor bump out into individual pivots.

[EDIT: expanding on the above motivation a bit, the OperatorFailing -> OperatorDegraded bit is going to be a 4.1 blocker soon (registry Bugzilla should be landing tonight, for now, there is this). Dropping cluster-config-v1 in favor of Infrastructure simplifies user-facing docs because all you need to tear down an AWS cluster is the kubernetes.io/cluster/{infrastructureName}: owned tag. And it's tied up in the 4.1-targetted rhbz#1685089. There may be some slush about delivering that bug, although at one point it was a beta-blocker].

Generated with:

  $ dep ensure

using:

  $ dep version
  dep:
   version     : v0.5.1
   build date  : 2019-03-20
   git hash    : faa61893
   go version  : go1.10.3
   go compiler : gc
   platform    : linux/amd64
   features    : ImportDuringSolve=false

I'm not sure why there are so many changes.  For example, the
go.opencensus.io stuff is originally from a531f87 (Update vendor,
2018-09-19, 2018-09-19, openshift#11), where it was used by
vendor/cloud.google.com and vendor/google.golang.org/api/transport.
Those consumers are still there, but are apparently unused by the
operator, because I can build the operator without issues after
removing them.  I expect this project wants a:

  [prune]
    non-go = true
    go-tests = true
    unused-packages = true

or similar in Gopkg.toml, but am punting on that for now.
Because I need InfrastructureName from openshift/api@c9b4e5b2d7 (Adds
platform-specific specs to Infrastructure in config.openshift.io,
2019-03-01, openshift/api#266) in order to drop the cluster-config-v1
stuff here.

Generated with:

  $ sed -i s/ea5d05408a95a765d44b5a4b31561b530f0b1f4c/ea5d05408a95a765d44b5a4b31561b530f0b1f4c/ Gopkg.toml  # openshift/api
  $ sed -i s/31ef839c86359881d2c312c4f0131febc6662400/0255926f53935175fe90b8e7672c4c06c17d79e6/ Gopkg.toml  # openshift/client-go, to avoid "undefined Features" errors after the API bump
  $ emacs Gopkg.toml  # remove openshift/installer, now that we're using the Infrastructure config
  $ dep ensure

using:

  $ dep version
  dep:
   version     : v0.5.1
   build date  : 2019-03-20
   git hash    : faa61893
   go version  : go1.10.3
   go compiler : gc
   platform    : linux/amd64
   features    : ImportDuringSolve=false
…ient"

The function was unused outside this package, and the lowercase name
makes the package's public API (which does not include this function)
more clear.
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wking
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: dmage

If they are not already assigned, you can assign the PR to them by writing /assign @dmage in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Apr 23, 2019
if err != nil {
return nil, err
if infra.Status.Platform == configv1.AWSPlatformType {
cfg.Storage.S3.Region = "us-east-1" // FIXME: installConfig.Platform.AWS.Region
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what to do about this. Until we get it sorted:

/hold

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible the region is part of the cloud-provider config (which is availble via Infrastructure since openshift/api#245).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abhinavdahiya suggests getting this from the metadata service (and using the region where your node is running), although I don't have example code at hand for that yet.

@@ -46,11 +50,23 @@ func NewDriver(c *imageregistryv1.ImageRegistryConfigStorageS3, listers *regopcl
// getS3Service returns a client that allows us to interact
// with the aws S3 service
func (d *driver) getS3Service() (*s3.S3, error) {
ctx := context.TODO()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure how far up I should push this context. Can we take it all the way up to cmd/cluster-image-registry-operator without too much trouble?

return nil, err
}

client, err := rest.RESTClientFor(config)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were a number of places that initialized fresh configs and clients from scratch. It seems like we should open a single REST client early on (in cmd/cluster-image-registry-operator?) and then pass that around for these other functions to use. That would also make it easy to plug in dummy clients for unit tests, etc. But I wanted some feedback on that before pushing it though. For now, I'm just inlining the various client initialization steps where we need them.

@wking
Copy link
Member Author

wking commented Apr 23, 2019

e2e-aws:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/260/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws/1304/artifacts/e2e-aws/installer/.openshift_install.log | grep fatal
time="2019-04-23T06:07:45Z" level=fatal msg="failed to initialize the cluster: Cluster operator image-registry is still updating: timed out waiting for the condition"
$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/260/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws/1304/artifacts/e2e-aws/pods/openshift-image-registry_cluster-image-registry-operator-fc9948cc-b767n_cluster-image-registry-operator.log.gz | gunzip
I0423 05:43:03.599910       1 main.go:18] Cluster Image Registry Operator Version: 4.0.0-142-g38fc8f9-dirty
...
I0423 05:43:04.942285       1 bootstrap.go:38] generating registry custom resource
E0423 05:43:04.942775       1 controller.go:235] unable to sync: unable to get storage configuration from cluster install config: GroupVersion is required when initializing a RESTClient, requeuing
...
I0423 05:47:50.891197       1 bootstrap.go:38] generating registry custom resource
E0423 05:47:50.891602       1 controller.go:235] unable to sync: unable to get storage configuration from cluster install config: GroupVersion is required when initializing a RESTClient, requeuing
W0423 05:49:33.748130       1 reflector.go:270] k8s.io/client-go/informers/factory.go:131: watch of *v1.ServiceAccount ended with: too old resource version: 10505 (11039)
W0423 05:50:03.482668       1 reflector.go:270] k8s.io/client-go/informers/factory.go:131: watch of *v1.ConfigMap ended with: too old resource version: 11916 (16477)
...
W0423 05:50:03.602167       1 reflector.go:270] github.com/openshift/client-go/route/informers/externalversions/factory.go:101: watch of *v1.Route ended with: The resourceVersion for the provided watch is too old.
W0423 05:52:51.403461       1 reflector.go:270] k8s.io/client-go/informers/factory.go:131: watch of *v1.ConfigMap ended with: too old resource version: 16622 (17342)
...
W0423 05:52:51.505446       1 reflector.go:270] github.com/openshift/cluster-image-registry-operator/pkg/generated/informers/externalversions/factory.go:101: watch of *v1.Config ended with: too old resource version: 13353 (17476)
W0423 05:58:14.444641       1 reflector.go:270] k8s.io/client-go/informers/factory.go:131: watch of *v1.ConfigMap ended with: too old resource version: 17477 (18844)
I0423 05:59:48.418350       1 bootstrap.go:38] generating registry custom resource
E0423 05:59:48.418814       1 controller.go:235] unable to sync: unable to get storage configuration from cluster install config: GroupVersion is required when initializing a RESTClient, requeuing
...

So something seems to be broken with my REST-client initialization.

@wking
Copy link
Member Author

wking commented Apr 23, 2019

Looks like RESTClientFor requires GroupVersion which InClusterConfig does not set. I will switch to UnversionedRESTClientFor, which can fill this in.

@wking
Copy link
Member Author

wking commented Apr 23, 2019

I will switch to UnversionedRESTClientFor, which can fill this in.

Done with f96b28b -> d89aeb1.

@wking
Copy link
Member Author

wking commented Apr 23, 2019

Heh:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/260/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws/1305/artifacts/e2e-aws/pods/openshift-image-registry_cluster-image-registry-operator-66d8b56784-5wt9v_cluster-image-registry-operator.log.gz | gunzip | head -n7
I0423 09:21:42.998300       1 main.go:18] Cluster Image Registry Operator Version: 4.0.0-142-g64d5f62-dirty
I0423 09:21:42.998640       1 main.go:19] Go Version: go1.10.8
I0423 09:21:42.998689       1 main.go:20] Go OS/Arch: linux/amd64
I0423 09:21:43.039920       1 controller.go:406] waiting for informer caches to sync
I0423 09:21:44.345449       1 controller.go:415] started events processor
I0423 09:21:44.345677       1 bootstrap.go:38] generating registry custom resource
E0423 09:21:44.346403       1 controller.go:235] unable to sync: unable to get storage configuration from cluster install config: NegotiatedSerializer is required when initializing a RESTClient, requeuing

Why is this so difficult? :p

@wking
Copy link
Member Author

wking commented Apr 23, 2019

Ok, with 1cbcb4d, I'm washing through typed/core/v1's NewForConfig to get defaults for both GroupVerision and NegotiatedSerializer. I'm curious to see how specific those defaults are...

@wking
Copy link
Member Author

wking commented Apr 23, 2019

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/260/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws/1306/artifacts/e2e-aws/pods/openshift-image-regist$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/260/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws/1306/artifacts/e2e-aws/pods/openshift-image-registry_cluster-image-registry-operator-7f59899bf6-8c8x8_cluster-image-registry-operator.log.gz | gunzip | head -n7
I0423 10:44:51.121540       1 main.go:18] Cluster Image Registry Operator Version: 4.0.0-142-gfa5b59b-dirty
I0423 10:44:51.123383       1 main.go:19] Go Version: go1.10.8
I0423 10:44:51.123407       1 main.go:20] Go OS/Arch: linux/amd64
I0423 10:44:51.153584       1 controller.go:406] waiting for informer caches to sync
I0423 10:44:52.461600       1 controller.go:415] started events processor
I0423 10:44:52.461691       1 bootstrap.go:38] generating registry custom resource
E0423 10:44:52.464035       1 controller.go:235] unable to sync: unable to get storage configuration from cluster install config: infrastructures "cluster" is forbidden: User "system:serviceaccount:openshift-image-registry:cluster-image-registry-operator" cannot get resource "infrastructures" in API group "" at the cluster scope, requeuingry_cluster-image-registry-operator-7f59899bf6-8c8x8_cluster-image-registry-operator.log.gz | gunzip | head -n7
I0423 10:44:51.121540       1 main.go:18] Cluster Image Registry Operator Version: 4.0.0-142-gfa5b59b-dirty
I0423 10:44:51.123383       1 main.go:19] Go Version: go1.10.8
I0423 10:44:51.123407       1 main.go:20] Go OS/Arch: linux/amd64
I0423 10:44:51.153584       1 controller.go:406] waiting for informer caches to sync
I0423 10:44:52.461600       1 controller.go:415] started events processor
I0423 10:44:52.461691       1 bootstrap.go:38] generating registry custom resource
E0423 10:44:52.464035       1 controller.go:235] unable to sync: unable to get storage configuration from cluster install config: infrastructures "cluster" is forbidden: User "system:serviceaccount:openshift-image-registry:cluster-image-registry-operator" cannot get resource "infrastructures" in API group "" at the cluster scope, requeuing

Ok, progress :). Looks like we need something like this, and I've pushed 1cbcb4d -> a248182 with a stab at that.

…t.io

Mostly, this is replacing cluster-config-v1 (which is deprecated [1])
with the Infrastructure config object.  As part of that, I'm dropping
UserTags support (see discussion in [2,3]).  I'm also replacing the
openshiftClusterID tag with a "kubernetes.io/cluster/{infraName}:
owned" tag [4,5].

The trip through clientcorev1.NewForConfig is because calling
RESTClientFor on the InClusterConfig raises:

  $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/260/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws/1304/artifacts/e2e-aws/pods/openshift-image-registry_cluster-image-registry-operator-fc9948cc-b767n_cluster-image-registry-operator.log.gz | gunzip
  I0423 05:43:03.599910       1 main.go:18] Cluster Image Registry Operator Version: 4.0.0-142-g38fc8f9-dirty
  ...
  I0423 05:43:04.942285       1 bootstrap.go:38] generating registry custom resource
  E0423 05:43:04.942775       1 controller.go:235] unable to sync: unable to get storage configuration from cluster install config: GroupVersion is required when initializing a RESTClient, requeuing

InClusterConfig does not set GroupVersion [6] and RESTClientFor
requires it [7] while UnversionedRESTClientFor can fill it in [8].
But UnversionedRESTClientFor leaves NegotiatedSerializer unset:

  $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/260/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws/1305/artifacts/e2e-aws/pods/openshift-image-registry_cluster-image-registry-operator-66d8b56784-5wt9v_cluster-image-registry-operator.log.gz | gunzip
  I0423 09:21:42.998300       1 main.go:18] Cluster Image Registry Operator Version: 4.0.0-142-g64d5f62-dirty
  ...
  I0423 09:21:44.345677       1 bootstrap.go:38] generating registry custom resource
  E0423 09:21:44.346403       1 controller.go:235] unable to sync: unable to get storage configuration from cluster install config: NegotiatedSerializer is required when initializing a RESTClient, requeuing

So I'm washing through clientcorev1.NewForConfig to get defaults for
both [9].

The RBAC change avoids:

  $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/260/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws/1306/artifacts/e2e-aws/pods/openshift-image-registry_cluster-image-registry-operator-7f59899bf6-8c8x8_cluster-image-registry-operator.log.gz | gunzip | head -n7
  I0423 10:44:51.121540       1 main.go:18] Cluster Image Registry Operator Version: 4.0.0-142-gfa5b59b-dirty
  ...
  I0423 10:44:52.461691       1 bootstrap.go:38] generating registry custom resource
  E0423 10:44:52.464035       1 controller.go:235] unable to sync: unable to get storage configuration from cluster install config: infrastructures "cluster" is forbidden: User "system:serviceaccount:openshift-image-registry:cluster-image-registry-operator" cannot get resource "infrastructures" in API group "" at the cluster scope, requeuing

[1]: openshift/installer#680
[2]: openshift/api#231 (comment)
[3]: openshift/api#266
[4]: openshift/installer#1280
[5]: https://bugzilla.redhat.com/show_bug.cgi?id=1685089#c11
[6]: https://github.com/kubernetes/client-go/blob/59781b88d0faa9d51be48b4ca49eb08e05246de4/rest/config.go#L422-L428
[7]: https://github.com/kubernetes/client-go/blob/59781b88d0faa9d51be48b4ca49eb08e05246de4/rest/config.go#L269-L270
[8]: https://github.com/kubernetes/client-go/blob/59781b88d0faa9d51be48b4ca49eb08e05246de4/rest/config.go#L331-L333
[9]: https://github.com/kubernetes/client-go/blob/59781b88d0faa9d51be48b4ca49eb08e05246de4/kubernetes/typed/core/v1/core_client.go#L144-L155
Catching up with openshift/api@8e476cb732 (Create a new
ClusterStatusCondition Degraded, 2019-04-16, openshift/api#287) and
openshift/api@a9fb3b1629 (Remove ClusterStatusConditionType Failing,
2019-04-16, openshift/api#287).
@wking
Copy link
Member Author

wking commented Apr 23, 2019

images:

release "release-latest" failed: pod release-latest was already deleted
$ oc delete project ci-op-yk62f9di
error: You must be logged in to the server (Unauthorized)

so I've waited long enough for the namespace to be reaped.

/retest

@wking
Copy link
Member Author

wking commented Apr 23, 2019

Hrm, still haven't cracked the RBAC incantation:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-image-registry-operator/260/pull-ci-openshift-cluster-image-registry-operator-master-e2e-aws/1312/artifacts/e2e-aws/pods/openshift-image-registry_cluster-image-registry-operator-554c68f744-5s7g7_cluster-image-registry-operator.log.gz | gunzip
I0423 17:16:59.933261       1 main.go:18] Cluster Image Registry Operator Version: 4.0.0-142-gcae9f2c-dirty
...
E0423 17:17:01.269295       1 controller.go:235] unable to sync: unable to get storage configuration from cluster install config: infrastructures "cluster" is forbidden: User "system:serviceaccount:openshift-image-registry:cluster-image-registry-operator" cannot get resource "infrastructures" in API group "" at the cluster scope, requeuing
...

@adambkaplan
Copy link
Contributor

/assign @legionus

We need someone to get this code over the finish line - can you or @dmage fork @wking 's branch and address the RBAC issues?

@openshift-ci-robot
Copy link
Contributor

@wking: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 24, 2019
@coreydaley
Copy link
Member

/retest
looks like a bunch of install failures, let's try to get some real errors

@openshift-ci-robot
Copy link
Contributor

@wking: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-aws 7a58c7c link /test e2e-aws
ci/prow/e2e-aws-image-registry 7a58c7c link /test e2e-aws-image-registry
ci/prow/e2e-aws-operator 7a58c7c link /test e2e-aws-operator
ci/prow/e2e-aws-upgrade 7a58c7c link /test e2e-aws-upgrade

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@coreydaley
Copy link
Member

Once #265 has merged I will see what I can do with this getting this fixed up.

@coreydaley coreydaley mentioned this pull request Apr 26, 2019
@coreydaley
Copy link
Member

Replaced by #267

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants