CASMTRIAGE-7616 make certmanager upgrade more robust, redo upgrade if issuers chart deploy fails #5603

Merged: rustydb merged 1 commit into release/1.6 from CASMTRIAGE-7616 on Dec 17, 2024


leliasen-hpe
Contributor

Description

If the cray-certmanager-issuers chart fails to deploy, a rerun of the cert-manager upgrade step in prerequisites.sh does not reattempt the cert-manager upgrade, because the step does not test whether the cray-certmanager-issuers chart is deployed.

The following changes were made to make the certmanager upgrade more robust:

  • Check whether the cray-certmanager-issuers chart is deployed; if it is not, redo the cert-manager upgrade.
  • Make the backup_secret variable global within the cert-manager upgrade block so that it can be used even when the upgrade section is skipped.
  • If no certificates exist at the end of the upgrade section and both cray-certmanager and cray-certmanager-issuers are installed, retry applying the certificates.
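The three changes above can be sketched as follows. This is a minimal, hypothetical bash sketch, not the code merged in this PR: the function names (release_deployed, certificates_exist, needs_certmanager_upgrade, restore_from_backup, retry_certificate_restore) are illustrative, both Helm releases are assumed to live in the cert-manager namespace, and restore_from_backup is only a placeholder because the PR does not show how the backup is read from the backup secret (cm-restore-data-16 in the logs below).

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the checks described in this PR; NOT the actual
# prerequisites.sh code. Assumes Helm 3 and kubectl are on PATH.

CM_NAMESPACE="cert-manager"   # assumption: namespace of both Helm releases

# Succeeds if the Helm release exists and `helm status` reports "deployed".
release_deployed() {
  local release=$1
  helm status "$release" -n "$CM_NAMESPACE" 2>/dev/null | grep -q '^STATUS: deployed'
}

# Succeeds if at least one cert-manager Certificate exists in any namespace.
certificates_exist() {
  [[ -n "$(kubectl get certificates -A --no-headers 2>/dev/null)" ]]
}

# Succeeds ("upgrade needed") if either chart is missing. The
# cray-certmanager-issuers check covers the case where a failed issuers
# deploy would otherwise never be retried.
needs_certmanager_upgrade() {
  ! release_deployed cray-certmanager || ! release_deployed cray-certmanager-issuers
}

# Hypothetical placeholder: the real restore logic is not shown in the PR.
restore_from_backup() {
  local backup_secret=$1
  echo "Attempting to restore cert-manager backup from secret ${backup_secret}"
}

# Run after the upgrade section: if no certificates exist, retry the
# restore, but only when both charts are deployed.
retry_certificate_restore() {
  local backup_secret=$1
  certificates_exist && return 0
  echo "WARNING: certificates were not restored after certmanager upgrade."
  if ! release_deployed cray-certmanager || ! release_deployed cray-certmanager-issuers; then
    echo "ERROR: cray-certmanager and/or cray-certmanager-issuers charts failed to deploy"
    return 1
  fi
  restore_from_backup "$backup_secret"
  if ! certificates_exist; then
    echo "ERROR: certificates failed to restore."
    return 1
  fi
  echo "Certificates were successfully restored"
}
```

Under these assumptions, a caller would assign backup_secret once, before deciding whether to upgrade (which is why making it global matters), run the upgrade only when needs_certmanager_upgrade succeeds, and finish with retry_certificate_restore "$backup_secret".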

Testing

Case 1: Certmanager and certmanager-issuers have been upgraded, but no certificates exist.

note: cert-manager helm chart version 1.12.9
note: no cert-manager upgrade steps needed, cert-manager 1.5.5 is not installed
WARNING: certificates were not restored after certmanager upgrade. 'kubectl get certificates -A' does not show certificates.
Certificates should have been restored from backup: 'kubectl get secret cm-restore-data-16'
cray-certmanager and cray-certmanager-issuers have been installed. Attempting to restore cert-manager backup
certificate.cert-manager.io/cfs-ara-postgres-tls created
certificate.cert-manager.io/cray-console-data-postgres-tls created
certificate.cert-manager.io/cray-dhcp-kea-postgres-tls created
certificate.cert-manager.io/cray-dns-powerdns-postgres-tls created
certificate.cert-manager.io/cray-hms-rts-default-tls created
certificate.cert-manager.io/cray-oauth2-proxies-customer-access created
certificate.cert-manager.io/cray-oauth2-proxies-customer-high-speed created
certificate.cert-manager.io/cray-oauth2-proxies-customer-management created
certificate.cert-manager.io/cray-sls-postgres-tls created
certificate.cert-manager.io/cray-smd-postgres-tls created
certificate.cert-manager.io/gitea-vcs-postgres-tls created
certificate.cert-manager.io/keycloak-postgres-tls created
certificate.cert-manager.io/cray-spire-postgres-tls created
certificate.cert-manager.io/cray-spire-tokens-tls created
certificate.cert-manager.io/tapms-webhook-server-cert created
Error from server (Conflict): error when applying patch:
{"metadata":{"resourceVersion":"11766"}}
to:
Resource: "cert-manager.io/v1, Resource=certificates", GroupVersionKind: "cert-manager.io/v1, Kind=Certificate"
Name: "dvs-mqtt-cert", Namespace: "istio-system"
for: "STDIN": Operation cannot be fulfilled on certificates.cert-manager.io "dvs-mqtt-cert": the object has been modified; please apply your changes to the latest version and try again
Error from server (Conflict): error when applying patch:
{"metadata":{"resourceVersion":"11764"}}
to:
Resource: "cert-manager.io/v1, Resource=certificates", GroupVersionKind: "cert-manager.io/v1, Kind=Certificate"
Name: "ingress-gateway-cert", Namespace: "istio-system"
for: "STDIN": Operation cannot be fulfilled on certificates.cert-manager.io "ingress-gateway-cert": the object has been modified; please apply your changes to the latest version and try again
Error from server (Conflict): error when applying patch:
{"metadata":{"resourceVersion":"11732"}}
to:
Resource: "cert-manager.io/v1, Resource=issuers", GroupVersionKind: "cert-manager.io/v1, Kind=Issuer"
Name: "cert-manager-issuer-common", Namespace: "ceph-rgw"
for: "STDIN": Operation cannot be fulfilled on issuers.cert-manager.io "cert-manager-issuer-common": the object has been modified; please apply your changes to the latest version and try again
Error from server (Conflict): error when applying patch:
{"metadata":{"resourceVersion":"11735"}}
to:
Resource: "cert-manager.io/v1, Resource=issuers", GroupVersionKind: "cert-manager.io/v1, Kind=Issuer"
Name: "cert-manager-issuer-common", Namespace: "istio-system"
for: "STDIN": Operation cannot be fulfilled on issuers.cert-manager.io "cert-manager-issuer-common": the object has been modified; please apply your changes to the latest version and try again
Error from server (Conflict): error when applying patch:
{"metadata":{"resourceVersion":"11780"}}
to:
Resource: "cert-manager.io/v1, Resource=issuers", GroupVersionKind: "cert-manager.io/v1, Kind=Issuer"
Name: "cert-manager-issuer-common", Namespace: "services"
for: "STDIN": Operation cannot be fulfilled on issuers.cert-manager.io "cert-manager-issuer-common": the object has been modified; please apply your changes to the latest version and try again
Error from server (Conflict): error when applying patch:
{"metadata":{"resourceVersion":"11374"}}
to:
Resource: "cert-manager.io/v1, Resource=issuers", GroupVersionKind: "cert-manager.io/v1, Kind=Issuer"
Name: "cert-manager-issuer-common", Namespace: "slurm-operator"
for: "STDIN": Operation cannot be fulfilled on issuers.cert-manager.io "cert-manager-issuer-common": the object has been modified; please apply your changes to the latest version and try again
Error from server (Conflict): error when applying patch:
{"metadata":{"resourceVersion":"11372"}}
to:
Resource: "cert-manager.io/v1, Resource=issuers", GroupVersionKind: "cert-manager.io/v1, Kind=Issuer"
Name: "cert-manager-issuer-common", Namespace: "sma"
for: "STDIN": Operation cannot be fulfilled on issuers.cert-manager.io "cert-manager-issuer-common": the object has been modified; please apply your changes to the latest version and try again
Error from server (Conflict): error when applying patch:
{"metadata":{"resourceVersion":"11577"}}
to:
Resource: "cert-manager.io/v1, Resource=issuers", GroupVersionKind: "cert-manager.io/v1, Kind=Issuer"
Name: "cert-manager-issuer-common", Namespace: "spire"
for: "STDIN": Operation cannot be fulfilled on issuers.cert-manager.io "cert-manager-issuer-common": the object has been modified; please apply your changes to the latest version and try again
Error from server (Conflict): error when applying patch:
{"metadata":{"resourceVersion":"11545"}}
to:
Resource: "cert-manager.io/v1, Resource=issuers", GroupVersionKind: "cert-manager.io/v1, Kind=Issuer"
Name: "cert-manager-issuer-common", Namespace: "tapms-operator"
for: "STDIN": Operation cannot be fulfilled on issuers.cert-manager.io "cert-manager-issuer-common": the object has been modified; please apply your changes to the latest version and try again
Certificates were successfully restored

Case 2: Error message when certificates fail to restore

note: cert-manager helm chart version 1.12.9
note: no cert-manager upgrade steps needed, cert-manager 1.5.5 is not installed
WARNING: certificates were not restored after certmanager upgrade. 'kubectl get certificates -A' does not show certificates.
Certificates should have been restored from backup: 'kubectl get secret cm-restore-data-16'
cray-certmanager and cray-certmanager-issuers have been installed. Attempting to restore cert-manager backup
# did nothing
ERROR: certificates failed to restore. 'kubectl get certificates -A' does not show certificates.

Case 3: Uninstalled cray-certmanager.

Upgrade ran smoothly and certificates were restored.

Case 4: Uninstalled cray-certmanager-issuers

note: cert-manager helm chart version 1.12.9
note: no cert-manager upgrade steps needed, cert-manager 1.5.5 is not installed
note: no helm install exists for cert-manager-issuers. Cert-manager upgrade is needed to install cert-manager-issuers
secret/cm-restore-data-16 created
W1217 16:27:52.588163   28802 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W1217 16:27:52.588183   28802 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W1217 16:27:52.588165   28802 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
release "cray-certmanager" uninstalled
namespace "cert-manager" deleted
2024-12-17T16:28:00Z INF Initializing the connection to the Kubernetes cluster using KUBECONFIG (system default), and context (current-context) command=ship
2024-12-17T16:28:00Z INF Initializing helm client object command=ship
         |\
         | \
         |  \
         |___\      Shipping your Helm workloads with Loftsman
       \--||___/
  ~~~~~~\_____/~~~~~~~

2024-12-17T16:28:00Z INF Ensuring that the loftsman namespace exists command=ship
2024-12-17T16:28:00Z INF Loftsman will use the packaged charts at /var/www/ephemeral/csm-1.6.1-alpha.5/helm as the Helm install source command=ship
2024-12-17T16:28:00Z INF Running a release for the provided manifest at /tmp/certmanager-tmp-manifest.yaml command=ship

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Releasing cray-drydock v2.18.4
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...

Case 5: If certmanager or certmanager-issuers is not deployed, no attempt is made to restore certificates

WARNING: certificates were not restored after certmanager upgrade. 'kubectl get certificates -A' does not show certificates.
Certificates should have been restored from backup: 'kubectl get secret cm-restore-data-16'
ERROR: cray-certmanager and/or cray-certmanager-issers charts failed to deploy

Checklist

  • If I added any command snippets, the steps they belong to follow the prompt conventions (see example).
  • If I added a new directory, I also updated .github/CODEOWNERS with the corresponding team in Cray-HPE.
  • My commits or Pull-Request Title contain my JIRA information, or I do not have a JIRA.

leliasen-hpe changed the title from "CASMTRIAGE-7616 make certmanager upgrade more robust, should redo upgrade if issuers chart deploy fails" to "CASMTRIAGE-7616 make certmanager upgrade more robust, redo upgrade if issuers chart deploy fails" on Dec 17, 2024
studenym-hpe self-requested a review on December 17, 2024 19:15
rustydb merged commit ffc55b8 into release/1.6 on Dec 17, 2024
8 checks passed
rustydb deleted the CASMTRIAGE-7616 branch on December 17, 2024 20:07
Srinivas-Anand-HPE pushed a commit that referenced this pull request on Dec 20, 2024
… issuers chart deploy fails (#5603)

CASMTRIAGE-7616 make certmanager upgrade more robust, should redo upgrade if issuers chart deploy fails

5 participants