Skip to content

Commit

Permalink
Periodically refresh APIServer CR in memory
Browse files Browse the repository at this point in the history
If not explicitly set on its CR,
HCO webhook is consuming TLS configuration
from Openshift cluster-wide APIServer CR.
For performance reason it's not reading it on each request
to the HCO CR but it's consuming a cached representation.
The cache was only refreshed by a controller
based on an informer.
We got reports that due to the nature
of changes in the APIServer CR, the connection
to the APIserver itself could become stuck:
```
W1025 13:50:16.898592       1 reflector.go:424] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262: failed to list *v1.APIServer: Get "https://172.30.0.1:443/apis/config.openshift.io/v1/apiservers?resourceVersion=1572273": dial tcp 172.30.0.1:443: connect: connection refused
E1025 13:50:16.898683       1 reflector.go:140] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262: Failed to watch *v1.APIServer: failed to list *v1.APIServer: Get "https://172.30.0.1:443/apis/config.openshift.io/v1/apiservers?resourceVersion=1572273": dial tcp 172.30.0.1:443: connect: connection refused
I1025 13:50:43.182360       1 trace.go:205] Trace[621733159]: "Reflector ListAndWatch" name:sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262 (25-Oct-2022 13:50:19.338) (total time: 23843ms):
Trace[621733159]: ---"Objects listed" error:<nil> 23843ms (13:50:43.182)
Trace[621733159]: [23.843677488s] [23.843677488s] END
I1025 13:50:43.716723       1 trace.go:205] Trace[255710357]: "Reflector ListAndWatch" name:sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262 (25-Oct-2022 13:50:12.260) (total time: 31456ms):
Trace[255710357]: ---"Objects listed" error:<nil> 31456ms (13:50:43.716)
Trace[255710357]: [31.45666834s] [31.45666834s] END
I1025 13:50:43.968506       1 trace.go:205] Trace[2001360213]: "Reflector ListAndWatch" name:sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262 (25-Oct-2022 13:50:11.520) (total time: 32447ms):
Trace[2001360213]: ---"Objects listed" error:<nil> 32447ms (13:50:43.968)
Trace[2001360213]: [32.44785055s] [32.44785055s] END
```

On controller runtime the default SyncPeriod when all the
watched resources are refreshed is 10 hourse (
see kubernetes-sigs/controller-runtime#521
for its reasons) but it appears
too long for this specific use case.

Let's ensure we read the APIServer CR at least once every minute.

Make the logs less verbose.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2137896

Remove this once kubernetes-sigs/controller-runtime#2032
is properly addressed

Signed-off-by: Simone Tiraboschi <stirabos@redhat.com>
  • Loading branch information
tiraboschi committed Oct 26, 2022
1 parent 5aac78b commit 101672d
Show file tree
Hide file tree
Showing 3 changed files with 10 additions and 4 deletions.
8 changes: 7 additions & 1 deletion controllers/webhooks/controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@ package webhooks

import (
"context"
"time"

logf "sigs.k8s.io/controller-runtime/pkg/log"

Expand Down Expand Up @@ -31,7 +32,12 @@ var _ reconcile.Reconciler = &ReconcileAPIServer{}

func (r *ReconcileAPIServer) Reconcile(ctx context.Context, request reconcile.Request) (reconcile.Result, error) {
err := r.ci.RefreshAPIServerCR(ctx, r.client)
return reconcile.Result{}, err
// TODO: once https://github.com/kubernetes-sigs/controller-runtime/issues/2032
// is properly addressed, avoid polling on the APIServer CR
if err != nil {
return reconcile.Result{Requeue: true}, err
}
return reconcile.Result{RequeueAfter: 1 * time.Minute}, nil
}

// RegisterReconciler creates a new HyperConverged Reconciler and registers it into manager.
Expand Down
5 changes: 3 additions & 2 deletions controllers/webhooks/controller_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@ package webhooks

import (
"context"
"time"

"k8s.io/client-go/rest"

Expand Down Expand Up @@ -154,7 +155,7 @@ var _ = Describe("HyperconvergedController", func() {
res, err := r.Reconcile(context.TODO(), request)
Expect(err).ToNot(HaveOccurred())
Expect(res.Requeue).To(BeFalse())
Expect(res).Should(Equal(reconcile.Result{}))
Expect(res.RequeueAfter).To(Equal(1 * time.Minute))

// Update ApiServer CR
apiServer.Spec.TLSSecurityProfile = customTLSSecurityProfile
Expand All @@ -166,7 +167,7 @@ var _ = Describe("HyperconvergedController", func() {
res, err = r.Reconcile(context.TODO(), request)
Expect(err).ToNot(HaveOccurred())
Expect(res.Requeue).To(BeFalse())
Expect(res).Should(Equal(reconcile.Result{}))
Expect(res.RequeueAfter).To(Equal(1 * time.Minute))

Expect(hcoutil.GetClusterInfo().GetTLSSecurityProfile(nil)).To(Equal(customTLSSecurityProfile), "should return the up-to-date value")

Expand Down
1 change: 0 additions & 1 deletion pkg/webhooks/validator/validator.go
Original file line number Diff line number Diff line change
Expand Up @@ -436,7 +436,6 @@ func (wh WebhookHandler) MutateTLSConfig(cfg *tls.Config) {
// be executed only after a while for fresh connections and not on existing ones
cfg.GetConfigForClient = func(_ *tls.ClientHelloInfo) (*tls.Config, error) {
cipherNames, minTypedTLSVersion := wh.selectCipherSuitesAndMinTLSVersion()
wh.logger.Info("Mutating TLS config - callback", "CipherSuites", cfg.CipherSuites, "MinVersion", cfg.MinVersion)
cfg.CipherSuites = crypto.CipherSuitesOrDie(crypto.OpenSSLToIANACipherSuites(cipherNames))
cfg.MinVersion = crypto.TLSVersionOrDie(string(minTypedTLSVersion))
return cfg, nil
Expand Down

0 comments on commit 101672d

Please sign in to comment.