Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: CrashLoopBackOff once tlsProfile changed #640

Merged
merged 1 commit into from
Aug 28, 2023

Conversation

opokornyy
Copy link
Contributor

@opokornyy opokornyy commented Aug 1, 2023

What this PR does / why we need it:

This PR adds the usage of callbacks to Prometheus and the webhook server, enabling the dynamic reloading of the TLS configuration. This eliminates the necessity of restarting the SSP pod, which had previously led to the pod entering the CrashLoopBackOff state.

Fixes #

https://bugzilla.redhat.com/show_bug.cgi?id=2151248

Release note:

None

@kubevirt-bot kubevirt-bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Aug 1, 2023
@openshift-ci
Copy link

openshift-ci bot commented Aug 1, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@opokornyy
Copy link
Contributor Author

/cc @0xFelix @akrejcir

@opokornyy opokornyy marked this pull request as ready for review August 1, 2023 08:59
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 1, 2023
@opokornyy opokornyy changed the title fix: SSP operator moving to CrashLoopBackOff after tlsProfile change WIP: fix: SSP operator moving to CrashLoopBackOff after tlsProfile change Aug 1, 2023
@kubevirt-bot kubevirt-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 1, 2023
@codingben
Copy link
Member

Hi, if this PR is still under work, can you please convert it to Draft? so CI won't run.

Copy link
Member

@0xFelix 0xFelix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this goes into the right direction. The GetConfigForClient functions still need to be deduplicated, improved and should not be lambda functions.

@@ -239,9 +239,6 @@ func (r *sspReconciler) isRestartNeeded(sspObj *ssp.SSP) bool {
if reflect.DeepEqual(r.lastSspSpec, ssp.SSPSpec{}) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This func is never returning true, so it can likely be dropped.

Copy link
Collaborator

@akrejcir akrejcir Aug 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed my comment here, because it was irrelevant. Sorry for the noise.

main.go Outdated
os.Exit(1)
}

var webhookServer *webhook.Server
if tlsOptions.IsEmpty() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we can do this. I don't think this allows to change the config later without restart?

@@ -65,7 +64,7 @@ var _ = Describe("Crypto Policy", func() {
},
},
}

// is not supported anymore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean?

Namespace: strategy.GetSSPDeploymentNameSpace(),
}, deployment)
Expect(err).ToNot(HaveOccurred())
Expect(deployment.Status.ReadyReplicas).To(BeNumerically(">=", 1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready replicas is dangerous, might better use UpdatedReplicas. @jcanocan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can check both just to be sure.

main.go Outdated
tlsOptions, err := common.GetSspTlsOptions(ctx)
if err != nil {
setupLog.Error(err, "Error while getting tls profile")
os.Exit(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not exit here, better return an error.

main.go Outdated
}
cfg.CipherSuites = tlsOptions.CipherIDs(&setupLog)
cfg.MinVersion, _ = minTLSVersionId(tlsOptions.MinTLSVersion)
certificate, err := loadCertificates(path.Join(sdkTLSDir, sdkTLSCrt), path.Join(sdkTLSDir, sdkTLSKey))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how expensive it is to load certs on every request. You might cache some of this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.
There is already a certificate watcher in the controller-runtime library. Maybe we can use that, I will check...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it is: https://github.com/kubevirt/ssp-operator/blob/9d3869accf1c183cdf66b3fdf572a3d8e9ec5a2d/vendor/sigs.k8s.io/controller-runtime/pkg/certwatcher/certwatcher.go#L45-L44

You can simply create a new CertWatcher and then add it to the manager to start it. Or call Start() manually in a separate goroutine.

main.go Outdated
// webhook server configuration is used.
if sspTLSOptions.IsEmpty() {
return nil
// vm-console-proxy function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a link to that function? e.g. so I can compare the changes here and there, if there any changes.

main.go Outdated

funcs := []func(*tls.Config){tlsCfgFunc}
return &webhook.Server{Port: webhookPort, TLSMinVersion: sspTLSOptions.MinTLSVersion, TLSOpts: funcs}
// crypto_policy.go function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a link to that function, in that file?

main.go Outdated
Addr: metricsAddr,
Handler: mux,
TLSConfig: &tls.Config{
GetConfigForClient: func(_ *tls.ClientHelloInfo) (*tls.Config, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please extract it to a new function, to keep runPrometheusServer() short?

main.go Outdated
@@ -138,26 +180,63 @@ func main() {

ctx := ctrl.SetupSignalHandler()

tlsOptions, err := common.GetSspTlsOptions(ctx)
tlsCfgFunc := func(cfg *tls.Config) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to keep that function outside of main()? I think it's a good practice to keep main() short function, it's easier to read the code that way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently main() is very long and should be very short, in general.

@codingben
Copy link
Member

codingben commented Aug 1, 2023

Hi, if this PR is still under work, can you please convert it to Draft? so CI won't run.

Also I'd suggest to have shorter commit title, and update the PR description to be the same as commit description (so you won't forget to sync them later).

Now you have:

fix: SSP operator moving to CrashLoopBackOff after tlsProfile change

This commit adds the usage of callbacks to Prometheus and
Webhook server to fetch TLS config on every request.

Signed-off-by: Ondrej Pokorny <opokorny@redhat.com>

I'd suggest to give it a name according to "This commit will fix: CrashLoopBackOff once tlsProfile changed" sentence. For example:

fix: CrashLoopBackOff once tlsProfile changed

Add usage of callbacks to Prometheus and webhook
server to fetch TLS config on each client request.

Signed-off-by: Ondrej Pokorny <opokorny@redhat.com>

main.go Outdated
cfg.MinVersion, _ = minTLSVersionId(tlsOptions.MinTLSVersion)
certificate, err := loadCertificates(path.Join(sdkTLSDir, sdkTLSCrt), path.Join(sdkTLSDir, sdkTLSKey))
if err != nil {
setupLog.Error(err, "Prometheus server error")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to log the error here. Just return the error and the calling function will log it for you

Namespace: strategy.GetSSPDeploymentNameSpace(),
}, deployment)
Expect(err).ToNot(HaveOccurred())
Expect(deployment.Status.ReadyReplicas).To(BeNumerically(">=", 1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can check both just to be sure.

@opokornyy opokornyy changed the title WIP: fix: SSP operator moving to CrashLoopBackOff after tlsProfile change fix: CrashLoopBackOff once tlsProfile changed Aug 2, 2023
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 2, 2023
main.go Outdated
@@ -66,53 +68,115 @@ const (
webhookPort = 9443
)

func runPrometheusServer(metricsAddr string, tlsOptions common.SSPTLSOptions) error {
// crypto_policy.go function
func minTLSVersionId(s string) (uint16, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the func in internal/common/crypto_policy.go?

main.go Outdated
return &tls.Config{
GetConfigForClient: func(_ *tls.ClientHelloInfo) (*tls.Config, error) {
cfg := &tls.Config{}
tlsOptions, err := common.GetSspTlsOptions(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

common.GetSspTlsOptions is making API calls for every request if it stays like this. IMO we should have a Watcher/SharedInformer for the SSP CR and use it to retrieve the SSP if it was changed. Otherwise the SSP CR should be cached.

main.go Outdated
return cfg, nil
}
cfg.CipherSuites = tlsOptions.CipherIDs(&setupLog)
setupLog.Info("Configured ciphers", "ciphers", cfg.CipherSuites)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we don't need to log this on every request.

main.go Outdated
// please be aware that the APIServer is using http keepalive so this is going to
// be executed only after a while for fresh connections and not on existing ones
cfg.GetConfigForClient = func(_ *tls.ClientHelloInfo) (*tls.Config, error) {
tlsOptions, err := common.GetSspTlsOptions(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as above, SSP CR needs to be cached.

main.go Outdated
}

cfg.CipherSuites = tlsOptions.CipherIDs(&setupLog)
setupLog.Info("Configured ciphers", "ciphers", cfg.CipherSuites)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@opokornyy opokornyy force-pushed the fix-tls-bug branch 2 times, most recently from f886ca7 to 82e898e Compare August 8, 2023 09:37
main.go Outdated
Comment on lines 78 to 151
GetConfigForClient: func(_ *tls.ClientHelloInfo) (*tls.Config, error) {
cfg := &tls.Config{}
var err error
cfg.GetCertificate = certWatcher.GetCertificate

if controllers.TLSProfile == nil {
cfg.MinVersion = crypto.DefaultTLSVersion()
cfg.CipherSuites = nil
return cfg, nil
}

if controllers.TLSProfile.Type == ocpconfigv1.TLSProfileCustomType {
cfg.CipherSuites = common.CipherIDs(controllers.TLSProfile.Custom.Ciphers)
cfg.MinVersion, err = crypto.TLSVersion(string(controllers.TLSProfile.Custom.MinTLSVersion))
if err != nil {
return nil, err
}
return cfg, nil
}

cipherNames, minTypedTLSVersion := ocpconfigv1.TLSProfiles[controllers.TLSProfile.Type].Ciphers, ocpconfigv1.TLSProfiles[controllers.TLSProfile.Type].MinTLSVersion
cfg.CipherSuites = common.CipherIDs(cipherNames)
cfg.MinVersion, err = crypto.TLSVersion(string(minTypedTLSVersion))
if err != nil {
return nil, err
}
return cfg, nil
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we extract this to an external function? It is duplicated on getTLSConfigFunc().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed

@@ -67,6 +66,8 @@ var kvsspCRDs = map[string]string{
"kubevirtcommontemplatesbundles.ssp.kubevirt.io": "KubevirtCommonTemplatesBundle",
}

var TLSProfile *osconfv1.TLSSecurityProfile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid globals.

@@ -192,8 +193,8 @@ func (ti *TLSInfo) CreateTlsConfig() *tls.Config {
}

if !ti.sspTLSOptions.IsEmpty() {
tlsConfig.CipherSuites = ti.sspTLSOptions.CipherIDs(nil)
minVersion, err := ti.sspTLSOptions.MinTLSVersionId()
tlsConfig.CipherSuites = common.CipherIDs(ti.sspTLSOptions.OpenSSLCipherNames)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a lib for this too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is this function, but the problem is that it causes panic for some cipher names and we just want to log that the cipher is not supported

func CipherSuitesOrDie(cipherNames []string) []uint16 {
if len(cipherNames) == 0 {
return DefaultCiphers()
}
cipherValues := []uint16{}
for _, cipherName := range cipherNames {
cipher, err := CipherSuite(cipherName)
if err != nil {
panic(err)
}
cipherValues = append(cipherValues, cipher)
}
return cipherValues

main.go Outdated
}

if controllers.TLSProfile.Type == ocpconfigv1.TLSProfileCustomType {
cfg.CipherSuites = common.CipherIDs(controllers.TLSProfile.Custom.Ciphers)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you dedup the calls to common.CipherIDs and crypto.TLSVersion? This block and the block below look very similar.

main.go Outdated
Comment on lines 78 to 151
GetConfigForClient: func(_ *tls.ClientHelloInfo) (*tls.Config, error) {
cfg := &tls.Config{}
var err error
cfg.GetCertificate = certWatcher.GetCertificate

if controllers.TLSProfile == nil {
cfg.MinVersion = crypto.DefaultTLSVersion()
cfg.CipherSuites = nil
return cfg, nil
}

if controllers.TLSProfile.Type == ocpconfigv1.TLSProfileCustomType {
cfg.CipherSuites = common.CipherIDs(controllers.TLSProfile.Custom.Ciphers)
cfg.MinVersion, err = crypto.TLSVersion(string(controllers.TLSProfile.Custom.MinTLSVersion))
if err != nil {
return nil, err
}
return cfg, nil
}

cipherNames, minTypedTLSVersion := ocpconfigv1.TLSProfiles[controllers.TLSProfile.Type].Ciphers, ocpconfigv1.TLSProfiles[controllers.TLSProfile.Type].MinTLSVersion
cfg.CipherSuites = common.CipherIDs(cipherNames)
cfg.MinVersion, err = crypto.TLSVersion(string(minTypedTLSVersion))
if err != nil {
return nil, err
}
return cfg, nil
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed

@lyarwood
Copy link
Member

/cc

Copy link
Collaborator

@akrejcir akrejcir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A partial review. I will look at it more later.

@@ -192,8 +193,8 @@ func (ti *TLSInfo) CreateTlsConfig() *tls.Config {
}

if !ti.sspTLSOptions.IsEmpty() {
tlsConfig.CipherSuites = ti.sspTLSOptions.CipherIDs(nil)
minVersion, err := ti.sspTLSOptions.MinTLSVersionId()
tlsConfig.CipherSuites = common.CipherIDs(ti.sspTLSOptions.OpenSSLCipherNames, &logger.Log)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep the nil log here, because the change is not part of this PR. If you want, you can open a new PR to add the log.

tlsConfig.CipherSuites = ti.sspTLSOptions.CipherIDs(nil)
minVersion, err := ti.sspTLSOptions.MinTLSVersionId()
tlsConfig.CipherSuites = common.CipherIDs(ti.sspTLSOptions.OpenSSLCipherNames, &logger.Log)
minVersion, err := crypto.TLSVersion(ti.sspTLSOptions.MinTLSVersion)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct, because crypto.TLSVersion() function expects one of these names:

var versions = map[string]uint16{
"VersionTLS10": tls.VersionTLS10,
"VersionTLS11": tls.VersionTLS11,
"VersionTLS12": tls.VersionTLS12,
"VersionTLS13": tls.VersionTLS13,
}

So a string like: VersionTLS12, but ti.sspTLSOptions.MinTLSVersion is a string returned by this function:

func tlsVersionToHumanReadable(version ocpv1.TLSProtocolVersion) (string, error) {
switch version {
case "":
return "", nil
case ocpv1.VersionTLS10:
return "1.0", nil
case ocpv1.VersionTLS11:
return "1.1", nil
case ocpv1.VersionTLS12:
return "1.2", nil
case ocpv1.VersionTLS13:
return "1.3", nil
default:
return "", fmt.Errorf("invalid ocpv1.VersionTLS %v", version)
}
}

For example: 1.2.

main.go Outdated
"github.com/prometheus/client_golang/prometheus/promhttp"
ssp "kubevirt.io/ssp-operator/api/v1beta2"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move local imports to the block below.

main.go Outdated
// This callback executes on each client call returning a new config to be used
// please be aware that the APIServer is using http keepalive so this is going to
// be executed only after a while for fresh connections and not on existing ones
func getConfigForClientCallback(cfg *tls.Config, apiClient client.Client, ctx context.Context) (*tls.Config, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three comments for this function:

  • After thinking about this more, I think it is better to have a parameter type cache.Cache instead of client.Client. It has the same reading methods as Client, but cannot be used for writing.
  • Usually the context.Context parameter is the first.
  • Please don't include the word Callback in a function name. In my opinion, it's better to name a function, by what it does, then by how it is used.

main.go Outdated
// please be aware that the APIServer is using http keepalive so this is going to
// be executed only after a while for fresh connections and not on existing ones
func getConfigForClientCallback(cfg *tls.Config, apiClient client.Client, ctx context.Context) (*tls.Config, error) {
var err error
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line can be removed, because err is redefined below.

cfg.CipherSuites = common.CipherIDs(tlsProfile.Custom.Ciphers, &setupLog)
cfg.MinVersion, err = crypto.TLSVersion(string(tlsProfile.Custom.MinTLSVersion))
if err != nil {
return nil, err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this error is returned, the cfg parameter has already been modified. Can you change the code, so that on error, the cfg parameter is kept unchanged?

cfg.CipherSuites = common.CipherIDs(cipherNames, &setupLog)
cfg.MinVersion, err = crypto.TLSVersion(string(minTypedTLSVersion))
if err != nil {
return nil, err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here, please keep the cfg unchanged on error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This too?

main.go Outdated
}
}

func getPrometheusServer(metricsAddr string, ctx context.Context) (*prometheusServer, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rename this function to newPrometheusServer()? Because it is a constructor.

main.go Outdated
Comment on lines 156 to 129
setupLog.Info("Starting Prometheus metrics endpoint server with TLS")
metrics.Registry.MustRegister(common_templates.CommonTemplatesRestored)
metrics.Registry.MustRegister(common.SSPOperatorReconcileSucceeded)
handler := promhttp.HandlerFor(metrics.Registry, promhttp.HandlerOpts{})
mux := http.NewServeMux()
mux.Handle("/metrics", handler)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this logic to prometheusServer.Start() method? It makes sense to call it when the server is started, not when it's created.

main.go Outdated
Comment on lines 168 to 147
certWatcher, err := certwatcher.New(metricsServer.certPath, metricsServer.keyPath)
if err != nil {
return nil, err
}

tlsCfgFunc := func(cfg *tls.Config) {
cfg.CipherSuites = sspTLSOptions.CipherIDs(&setupLog)
setupLog.Info("Configured ciphers", "ciphers", cfg.CipherSuites)
go func() {
if err := certWatcher.Start(ctx); err != nil {
setupLog.Error(err, "certificate watcher error")
}
}()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, can you move the certWatcher creation and start to the prometheusServer.Start() method?

Copy link
Member

@0xFelix 0xFelix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments

main.go Outdated
Comment on lines 85 to 89
var tlsProfile *ocpconfigv1.TLSSecurityProfile

if len(sspList.Items) != 0 {
tlsProfile = sspList.Items[0].Spec.TLSSecurityProfile
}

if tlsProfile == nil {
cfg.MinVersion = crypto.DefaultTLSVersion()
cfg.CipherSuites = nil
return cfg, nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var tlsProfile *ocpconfigv1.TLSSecurityProfile
if len(sspList.Items) != 0 {
tlsProfile = sspList.Items[0].Spec.TLSSecurityProfile
}
if tlsProfile == nil {
cfg.MinVersion = crypto.DefaultTLSVersion()
cfg.CipherSuites = nil
return cfg, nil
}
if len(sspList.Items) == 0 {
cfg.MinVersion = crypto.DefaultTLSVersion()
cfg.CipherSuites = nil
return cfg, nil
}
tlsProfile := sspList.Items[0].Spec.TLSSecurityProfile

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do it like this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to check if spec.TLSSecurityProfile is nil or not, so we would need to do it like this

if len(sspList.Items) == 0 || sspList.Items[0].Spec.TLSSecurityProfile == nil {
  cfg.MinVersion = crypto.DefaultTLSVersion()
  cfg.CipherSuites = nil
  return cfg, nil
}

tlsProfile := sspList.Items[0].Spec.TLSSecurityProfile

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK

main.go Outdated
return cfg, nil
}

cipherNames := ocpconfigv1.TLSProfiles[tlsProfile.Type].Ciphers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe inline this?

cfg.CipherSuites = common.CipherIDs(cipherNames, &setupLog)
cfg.MinVersion, err = crypto.TLSVersion(string(minTypedTLSVersion))
if err != nil {
return nil, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This too?

main.go Outdated
cipherNames := ocpconfigv1.TLSProfiles[tlsProfile.Type].Ciphers
cfg.CipherSuites = common.CipherIDs(cipherNames, &ctrl.Log)

minTypedTLSVersion := ocpconfigv1.TLSProfiles[tlsProfile.Type].MinTLSVersion
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This too?

main.go Outdated
setupLog.Info("Starting Prometheus metrics endpoint server with TLS")
metrics.Registry.MustRegister(common_templates.CommonTemplatesRestored)
metrics.Registry.MustRegister(common.SSPOperatorReconcileSucceeded)
handler := promhttp.HandlerFor(metrics.Registry, promhttp.HandlerOpts{})
mux := http.NewServeMux()
mux.Handle("/metrics", handler)

minTlsVersion, err := tlsOptions.MinTLSVersionId()
s.server.Handler = mux
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge it with L138?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not possible to call s.server.Handler.Handle() function, so we probably can not do it like that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK

Namespace: strategy.GetSSPDeploymentNameSpace(),
}, deployment)
Expect(err).ToNot(HaveOccurred())
Expect(deployment.Status.ReadyReplicas).To(BeNumerically(">=", 1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should assert that we have the amount of ready and updated replicaes specified in spec.Replicas. Not just >= 1.

@jcanocan
Copy link
Contributor

@0xFelix @akrejcir @opokornyy I have two questions:

  1. As it is now, as far as I was able to test, the crypto suite is not reverting TLS profiles to its original state. Should we fix this as part of this PR or as a follow-up? Could @opokornyy confirm if it's still happening with your code?
  2. IIUC, when this PR lands, TLS configuration changes do not require restarting pods or even to update the pods. Am I right? I'm raising this because in case pods do not need to be updated, it's very likely that line Expect(deployment.Status.UpdatedReplicas).To(BeNumerically(">=", 1)) will never be satisfied.

Thanks! :)

Copy link
Member

@0xFelix 0xFelix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more question and one nit.

@jcanocan

  1. IMO this should be fixed in a follow up PR. It has nothing to do with SSP restarting.
  2. Good point!

main.go Outdated
}

tlsProfile := sspList.Items[0].Spec.TLSSecurityProfile

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: newline

g.Expect(err).ToNot(HaveOccurred())
g.Expect(deployment.Status.ReadyReplicas).To(BeNumerically(">=", 1))
}, env.Timeout(), time.Second).Should(Succeed())
deployment := &apps.Deployment{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need Eventually here, or does strategy.RevertToOriginalSspCr wait?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should wait, based on the previous comment the waitting was needed just because of the bug

// Because of bug[1], the SSP operator will move to CrashLoopBackOff state,
// so we need to wait until it is running.
// [1] - https://bugzilla.redhat.com/show_bug.cgi?id=2151248

func (s *existingSspStrategy) RevertToOriginalSspCr() {
waitForSspDeletionIfNeeded(s.ssp)
createOrUpdateSsp(s.ssp)
waitUntilDeployed()
}

Comment on lines 108 to 114
deployment := &apps.Deployment{}
err := apiClient.Get(ctx, client.ObjectKey{
Name: strategy.GetSSPDeploymentName(),
Namespace: strategy.GetSSPDeploymentNameSpace(),
}, deployment)
Expect(err).ToNot(HaveOccurred())
Expect(deployment.Status.ReadyReplicas).To(Equal(*deployment.Spec.Replicas))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it, can't we drop all of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we can drop it

Copy link
Member

@0xFelix 0xFelix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! It looks good to me.

/approve
/retest-required

@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 0xFelix

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 22, 2023
main.go Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
main.go Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
TLSConfig: &tlsConfig,
}
go func() {
if err := certWatcher.Start(ctx); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we should use a different context for the certWatcher than ctx, so that it can be closed when this function returns on error.

We can do it in a follow-up PR.

main.go Outdated Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
server.TLSConfig = s.getPrometheusTLSConfig(ctx, certWatcher)

if err := server.ListenAndServeTLS(s.certPath, s.keyPath); err != nil {
setupLog.Error(err, "Failed to start Prometheus metrics endpoint server")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error should be returned from this function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main.go Outdated Show resolved Hide resolved
err := server.ListenAndServeTLS(path.Join(sdkTLSDir, sdkTLSCrt), path.Join(sdkTLSDir, sdkTLSKey))
if err != nil {
setupLog.Error(err, "Failed to start Prometheus metrics endpoint server")
<-ctx.Done()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When below server.ListenAndServeTLS() returns because of an error, and this Start() function returns, this goroutine will still be waiting until context is closed.
Can you change the code, so that before the Start() returns, it makes sure that this goroutine has finished?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reconsidered this.
In this PR, we are already letting the above goroutine leak, from the Start() function, so this one can be left as is.

Can you add a // TODO comment to the code next to both goroutines, and open a new github issue, so that we don't forget to open a followup PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

This commit adds the usage of callbacks to Prometheus and
Webhook server to fetch TLS config on every request.

Signed-off-by: Ondrej Pokorny <opokorny@redhat.com>
@sonarcloud
Copy link

sonarcloud bot commented Aug 28, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 2 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@akrejcir
Copy link
Collaborator

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 28, 2023
@kubevirt-bot kubevirt-bot merged commit 4570a61 into kubevirt:main Aug 28, 2023
4 checks passed
@opokornyy
Copy link
Contributor Author

/cherry-pick release-v0.18

@kubevirt-bot
Copy link
Contributor

@opokornyy: new pull request created: #668

In response to this:

/cherry-pick release-v0.18

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants