-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CFE-984: Add support for custom CA bundle for reencrypt termination type routes #998
base: master
Are you sure you want to change the base?
Conversation
@bharath-b-rh: This pull request references CFE-984 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/jira refresh |
@bharath-b-rh: This pull request references CFE-984 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
cc @ShudiLi |
/retest |
1 similar comment
/retest |
/test e2e-gcp-operator |
991bf4a
to
98c7b2b
Compare
/test e2e-gcp-operator |
/test e2e-aws-ovn-single-node |
/cc @Miciah |
Tested it with 4.15.0-0.test-2023-11-28-002931-ci-ln-z8rzn72-latest % oc -n openshift-ingress get deployment router-int1 -oyaml | grep "var/run/configmaps" % oc -n openshift-ingress get deployment router-int1 -oyaml | grep -n "name: ingress" %oc -n openshift-config create configmap admin-ca-bundle --from-file=ca-bundle.crt=two-servers.pem % oc -n openshift-ingress rsh router-int1-86c8bf4447-8475m -----BEGIN CERTIFICATE----- % oc create route reencrypt myreen2 --service=sec-apach2 --hostname=reen222.int1.shudi-415g28.qe.gcp.devcluster.openshift.com % oc get route sh-4.4# curl https://reen222.int1.shudi-415g28.qe.gcp.devcluster.openshift.com -k sh-4.4# curl https://reenwithdstca.int1.shudi-415g28.qe.gcp.devcluster.openshift.com -k |
/label qe-approved |
@bharath-b-rh: This pull request references CFE-984 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Thank you @ShudiLi! |
/label docs-approved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. I just took a quick review. I plan to continue to review more later.
pkg/operator/controller/cabundle-configmap/cabundle_configmap.go
Outdated
Show resolved
Hide resolved
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
pkg/operator/controller/cabundle-configmap/cabundle_configmap.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice unit and E2E tests, thanks for doing that. I think it's looking great, just some nit picks and questions.
pkg/operator/controller/cabundle-configmap/cabundle_configmap.go
Outdated
Show resolved
Hide resolved
pkg/operator/controller/cabundle-configmap/cabundle_configmap.go
Outdated
Show resolved
Hide resolved
pkg/operator/controller/cabundle-configmap/cabundle_configmap.go
Outdated
Show resolved
Hide resolved
if curTime.Sub(cert.NotBefore).Hours() < 0 { | ||
log.Info("certificate not yet valid, but will be considered", "certificate", certID, "valid after", cert.NotBefore.String(), "certificate bundle", caCertBundleName) | ||
} | ||
if curTime.Sub(cert.NotAfter).Hours() > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I believe an expired CA is useless to HaProxy and will cause SSL errors, but have you thought about the scenario in which the CA expires while the router is running (most likely the scenario)? Then, the next time a cluster-admin does a rollout of the router, the CA will be removed.
I don't think this is problematic, but I wonder if it may cause "churn" or confusion for a cluster admin because it's possible the SSL errors of an expired CA are different than having no CA at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Router returns the common error, but yes, if the HAProxy debug logs are enabled a user can find out the exact SSL error, and it would create confusion, but I think we could cover this in docs? or should we just log the error and continue to keep the CA certificate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like it might be safer to err on the side of leaving the cert / logging the error and keeping the HaProxy failure mode of an expired CA Certificate. As long as the expired CA cert doesn't crash HaProxy (I'm pretty confident it doesn't), then I'm not totally sure what the net advantage is of skipping/removing it. HaProxy should handle expired CA's in a reasonable way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If HAProxy does not error out with the expired CA certificate, we should keep it. The cluster-admin expressed intent to use the CA certificate, so we should use it if possible (that is, if using it doesn't break ingress by causing HAProxy to refuse to start). However, it wouldn't be a bad idea to warn the cluster-admin about the expired CA certificate in an alert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion incorporated. Just the certificates with insecure signature algorithm and non-ca certificates will be skipped now.
func buildServerDeployment(name types.NamespacedName, labels map[string]string, port int32, secretName, caConfigMapName string) *appsv1.Deployment { | ||
const ( | ||
containerName = "hello-world-server" | ||
imageName = "quay.io/bharath-b-rh/hello-world:latest" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to try to use a standard image, and not one in a personal repo that could get deleted, or changed.
Some examples of images our existing tests use:
image-registry.openshift-image-registry.svc:5000/openshift/tools:latest
- The router deployment's:
deployment.Spec.Template.Spec.Containers[0].Image
quay.io/centos7/httpd-24-centos7
(this one I'm a little hesitant, but at least it's in a public repo)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can work on making source code public, and add more description to image in repo.
For reencrypt, since we need to verify service certificates, we need an https server and I was looking for alternatives. One I came across was apache httpd, but this requires enabling certain configurations for SSL handling, which would require building own image with changes or handling it through entrypoint script.
Can we use https://hub.docker.com/r/mendhak/http-https-echo/tags
?
Please suggest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we should use image-registry.openshift-image-registry.svc:5000/openshift/tools:latest
or quay.io/centos7/httpd-24-centos7
if we can. Alternatively, if the server code isn't too complex, we can build it into the ingress-operator executable, the way we did the the test servers in https://github.com/openshift/cluster-ingress-operator/tree/master/test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, we cannot use an image from Docker Hub as we could be affected by rate limiting or other issues if we did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something we can use quay.io/centos7/httpd-24-centos7
and add the httpd configuration we need via a configmap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the public image, instead now making use of the ingress-operator
image fetched from the deployment resource to use the test http server made available.
|
||
const ( | ||
retries = 3 | ||
retryInterval = time.Minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might just be my speculative opinion, but a minute quite high to sleep for between requests. Maybe 10 or 20 seconds at most?
I also think it doesn't hurt to do more retries, maybe 5 or 10. Since this is going through the internet, we deal with flakes, especially flakey DNS lookup issue within our CI cluster. Might seem overly generous, but flakes are quite time and resource consuming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially used a minute retry interval thinking of the time taken configmap changes to reflect in pod.
I have updated the retries and interval.
) | ||
|
||
var ( | ||
insecureCertificateSignatureAlgorithms = map[x509.SignatureAlgorithm]string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a godoc comment to explain why these algorithms are considered insecure, and why we need this map? Something like the following (but double-check what I wrote for factual accuracy!):
insecureCertificateSignatureAlgorithms = map[x509.SignatureAlgorithm]string{ | |
// insecureCertificateSignatureAlgorithms is used to warn about and | |
// filter out certificates that use algorithms that are no longer | |
// supported by OpenSSL. Configuring the router with a certificate that | |
// used one of these algorithms would cause HAProxy to refuse to start. | |
insecureCertificateSignatureAlgorithms = map[x509.SignatureAlgorithm]string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion incorporated.
// currentIngressCABundleConfigMap returns the current configmap. Returns | ||
// if configmap exists, and an error value when an error is encountered. | ||
func (r *reconciler) currentAdminCABundleConfigMap(ctx context.Context) (*corev1.ConfigMap, error) { | ||
cm, err := r.fetchCABundleConfigMap(ctx, r.config.AdminCAConfigMapName) | ||
if err != nil { | ||
if errors.IsNotFound(err) { | ||
return nil, nil | ||
} | ||
return nil, err | ||
} | ||
return cm, nil | ||
} | ||
|
||
// currentIngressCABundleConfigMap returns the current configmap. Returns | ||
// if configmap exists, and an error value when an error is encountered. | ||
func (r *reconciler) currentServiceCABundleConfigMap(ctx context.Context) (*corev1.ConfigMap, error) { | ||
cm, err := r.fetchCABundleConfigMap(ctx, r.config.ServiceCAConfigMapName) | ||
if err != nil { | ||
return nil, err | ||
} | ||
return cm, nil | ||
} | ||
|
||
// currentIngressCABundleConfigMap returns the current configmap. Returns | ||
// if configmap exists, and an error value when an error is encountered. | ||
func (r *reconciler) currentIngressCABundleConfigMap(ctx context.Context) (bool, *corev1.ConfigMap, error) { | ||
cm, err := r.fetchCABundleConfigMap(ctx, r.config.IngressCAConfigMapName) | ||
if err != nil { | ||
if errors.IsNotFound(err) { | ||
return false, nil, nil | ||
} | ||
return false, nil, err | ||
} | ||
return true, cm, nil | ||
} | ||
|
||
// fetchCABundleConfigMap fetches a configmap. Returns if | ||
// configmap exists, and an error value when an error is encountered. | ||
func (r *reconciler) fetchCABundleConfigMap(ctx context.Context, name types.NamespacedName) (*corev1.ConfigMap, error) { | ||
cm := &corev1.ConfigMap{} | ||
err := r.client.Get(ctx, name, cm) | ||
return cm, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider defining a single currrentConfigMap
function:
// currentConfigMap returns the current named configmap. Returns a Boolean
// indicating whether the configmap existed, the configmap if it did exist,
// and an error value.
func (r *reconciler) currentConfigMap(ctx context.Context, name types.NamespacedName) (bool, *corev1.ConfigMap, error) {
var cm corev1.ConfigMap
if err := r.client.Get(ctx, name, &cm); err != nil {
if errors.IsNotFound(err) {
return false, nil, nil
}
return false, nil, err
}
return true, &cm, nil
}
Re-using a function avoids code duplication. Having the explicit Boolean return value makes the logic more consistent and forces you to do explicit if !haveFooConfigmap
checks, which is a convention we try to follow in this operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion incorporated.
pkg/operator/controller/cabundle-configmap/cabundle_configmap.go
Outdated
Show resolved
Hide resolved
// <Certificate CommonName> which is used in logging and for better identification of | ||
// certificate in the bundle. | ||
func getCertificatePrintId(cert *x509.Certificate) string { | ||
return fmt.Sprintf("%s(SerialNumber)-%s(CommonName)", encodeSerialNumber(*cert.SerialNumber), cert.Subject.CommonName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is SerialNumber
guaranteed to be non-nil? I don't see that documented as part of the API contract: https://pkg.go.dev/crypto/x509#Certificate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SerialNumber
must be a positive integer in conforming certificates according to RFC. It's not documented in API doc, butParseCertificate
API reads the serial number from the certificate and when not present will have the empty value of type. Please let me know your thoughts, should I remove logic around serial number.
if cert.Subject.CommonName == c.commonName && | ||
bytes.Compare(cert.SubjectKeyId, c.subjectKeyId) == 0 && | ||
cert.SerialNumber.Cmp(&c.serialNumber) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it fine to have two CA certificates with the same CN if they have different subject key ids or different serial numbers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, CommonName
is just an identifier provided by the user.
if adminCABundle != nil && adminCABundle.Data != nil { | ||
data, exist := adminCABundle.Data[adminCABundleConfigMapKeyName] | ||
if !exist { | ||
return false, nil, fmt.Errorf("%s is invalid, must contain \"%s\" key with required CA certificates", adminCABundle.Name, adminCABundleConfigMapKeyName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it an error if adminCABundle.Data[adminCABundleConfigMapKeyName]
is missing but not if adminCABundle.Data
is nil?
Would it be safer to log that adminCABundle
is invalid but continue on to create the ingress CA bundle configmap with the just service CA?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it an error if adminCABundle.Data[adminCABundleConfigMapKeyName] is missing but not if adminCABundle.Data is nil?
A user could configure an empty configmap, but if it's not empty and the key name is misspelt, I thought that should be logged.
Would it be safer to log that adminCABundle is invalid but continue on to create the ingress CA bundle configmap with the just service CA?
Yeah, I have updated the code, to create events when adminCABundle has errors.
@@ -483,7 +483,7 @@ func TestDesiredRouterDeploymentSpecTemplate(t *testing.T) { | |||
if volume.Secret.SecretName != secretName { | |||
t.Errorf("router Deployment expected volume %s to have secret %s, got %s", volume.Name, secretName, volume.Secret.SecretName) | |||
} | |||
} else if volume.Name != "service-ca-bundle" { | |||
} else if volume.Name != controller.IngressCAConfigMapName().Name { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for the changes in this file? For test code, I think it is appropriate to use explicit string literals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The volume name is changed to ingress-ca-bundle
by this implementation and while updating the same thought to use the definition and avoid hard coding. Please let me know, if it needs to be reverted.
if err = wait.PollUntilContextTimeout(ctx, 2*time.Second, 3*time.Minute, true, func(ctx context.Context) (bool, error) { | ||
if err := kclient.Get(ctx, svcDeploymentName1, svcDeployment1); err != nil { | ||
t.Logf("failed to get server deployment %q: %v", svcDeploymentName1, err) | ||
return false, nil | ||
} | ||
for _, cond := range svcDeployment1.Status.Conditions { | ||
if cond.Type == appsv1.DeploymentAvailable { | ||
return cond.Status == corev1.ConditionTrue, nil | ||
} | ||
} | ||
return false, nil | ||
}); err != nil { | ||
t.Fatalf("timed out waiting for deployment %q to become ready: %v", svcDeploymentName1, err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be a helper function, like waitForDeploymentComplete
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion incorporated.
13a8229
to
b7046f6
Compare
/hold openshift/router#537 |
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
/remove-lifecycle stale |
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten |
/remove-lifecycle rotten |
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten |
@bharath-b-rh: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
I won't be able to review this due to other commitments. |
PR has the changes for enhancement proposed to support custom CA bundle to be used by the router to verify the server's certificate for the
reencrypt
termination type when thedestinationCA
is not configured.Below functionalities are added:
ingress-ca-bundle
configmap will be created by a new controllercabundleconfigmap
of the operator, which contains CA certificates bundle containing the certificates inservice-ca-bundle
configmap and user createdadmin-ca-bundle
configmap.ingress-ca-bundle
,service-ca-bundle
andadmin-ca-bundle
configmaps for any modifications and updatesingress-ca-bundle
to desired state.ingress-ca-bundle
available in operandrouter
as a file at/var/run/configmaps/ca-trust/ca-bundle.crt
and sets same path toDEFAULT_DESTINATION_CA_PATH
environment variable used by operandrouter
.