Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MGMT-13713: Add confidential VM support #54

Conversation

mresvanis
Copy link
Contributor

@mresvanis mresvanis commented Mar 10, 2023

This change adds support for Azure Confidential VMs and Trusted Launch for VMs.
Feature link: https://issues.redhat.com/browse/OCPBU-233

Depends on: openshift/api#1403 - merged

Upstream Cluster API provider Azure PR: kubernetes-sigs/cluster-api-provider-azure#3265

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 10, 2023
@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 10, 2023

@mresvanis: This pull request references MGMT-13713 which is a valid jira issue.

In response to this:

This change adds support for Azure Confidential VMs.
Feature link: https://issues.redhat.com/browse/OCPBU-233

Depends on: openshift/api#1403 - open

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested review from elmiko and lobziik March 10, 2023 11:24
@mresvanis
Copy link
Contributor Author

/hold
This PR depends on openshift/api#1403, which is still open.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 10, 2023
@mresvanis
Copy link
Contributor Author

@mresvanis mresvanis force-pushed the mgmt-13713-add-confidential-vm-support branch from ec2ffce to 1b4afac Compare June 15, 2023 12:58
@mresvanis
Copy link
Contributor Author

/test unit

@mresvanis
Copy link
Contributor Author

/retest-required

@mresvanis mresvanis force-pushed the mgmt-13713-add-confidential-vm-support branch from 1b4afac to a1270aa Compare June 16, 2023 13:01
@openshift-ci-robot
Copy link

openshift-ci-robot commented Jun 16, 2023

@mresvanis: This pull request references MGMT-13713 which is a valid jira issue.

In response to this:

This change adds support for Azure Confidential VMs.
Feature link: https://issues.redhat.com/browse/OCPBU-233

Depends on: openshift/api#1403 - open

Upstream Cluster API provider Azure PR: kubernetes-sigs/cluster-api-provider-azure#3265

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mresvanis mresvanis force-pushed the mgmt-13713-add-confidential-vm-support branch from a1270aa to b64aef1 Compare June 20, 2023 10:03
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 20, 2023
@mresvanis mresvanis force-pushed the mgmt-13713-add-confidential-vm-support branch from b64aef1 to e12d9d5 Compare June 20, 2023 10:23
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 20, 2023
@mresvanis
Copy link
Contributor Author

@elmiko @lobziik since the respective upstream Cluster API PR has been merged and the OpenShift API PR is on the way, I would appreciate your review and feedback for this change. Thank you.

@elmiko
Copy link
Contributor

elmiko commented Jun 28, 2023

apologies @mresvanis , i am just seeing this. i will try to review this week.

@mresvanis mresvanis force-pushed the mgmt-13713-add-confidential-vm-support branch from e12d9d5 to f7e9189 Compare June 30, 2023 09:39
@mresvanis
Copy link
Contributor Author

/test unit

@mresvanis
Copy link
Contributor Author

/test e2e-azure-operator

Copy link
Contributor

@elmiko elmiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this generally makes sense to me, i just have a few questions

edit: fwiw, the failure on e2e-azure-operator is a known brittle test on azure, i doubt it is this PR affecting its failure.

@mresvanis mresvanis force-pushed the mgmt-13713-add-confidential-vm-support branch from f7e9189 to d51d3eb Compare July 6, 2023 14:52
@openshift-ci-robot
Copy link

openshift-ci-robot commented Jul 7, 2023

@mresvanis: This pull request references MGMT-13713 which is a valid jira issue.

In response to this:

This change adds support for Azure Confidential VMs.
Feature link: https://issues.redhat.com/browse/OCPBU-233

Depends on: openshift/api#1403 - merged

Upstream Cluster API provider Azure PR: kubernetes-sigs/cluster-api-provider-azure#3265

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mresvanis mresvanis force-pushed the mgmt-13713-add-confidential-vm-support branch from d51d3eb to cbc3c71 Compare July 7, 2023 13:26
@mresvanis
Copy link
Contributor Author

/unhold
This PR depends on openshift/api#1403, which has been merged.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 7, 2023
Copy link
Contributor

@elmiko elmiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this generally looks good to me, i'd like to get another member of the cloud team to review as well.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 7, 2023
@elmiko
Copy link
Contributor

elmiko commented Jul 7, 2023

on second thought, it seems we've broken the unit test somehow...

/unlgtm

@elmiko
Copy link
Contributor

elmiko commented Jul 7, 2023

/lgtm cancel

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 7, 2023
@mresvanis
Copy link
Contributor Author

/test unit

@openshift-ci-robot
Copy link

openshift-ci-robot commented Jul 10, 2023

@mresvanis: This pull request references MGMT-13713 which is a valid jira issue.

In response to this:

This change adds support for Azure Confidential VMs and Trusted Launch for VMs.
Feature link: https://issues.redhat.com/browse/OCPBU-233

Depends on: openshift/api#1403 - merged

Upstream Cluster API provider Azure PR: kubernetes-sigs/cluster-api-provider-azure#3265

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mresvanis
Copy link
Contributor Author

mresvanis commented Jul 10, 2023

on second thought, it seems we've broken the unit test somehow...

@elmiko I am also seeing this error randomly on the main branch when running the tests locally.

Specifically:

$ git log -1
commit 97ea1221e4ca999a161ddee4d1e075880fcbad67 (HEAD -> main, origin/release-4.15, origin/release-4.14, origin/main, origin/HEAD)

$ make test
Testing...
KUBEBUILDER_ASSETS="/home/mresvani/Projects/machine-api-provider-azure/bin/k8s/1.27.1-linux-amd64" ./hack/ci-test.sh
go run ./hack/../vendor/github.com/onsi/ginkgo/v2/ginkgo -v --randomize-all --randomize-suites --keep-going --race --trace --timeout=30m <omitted>
...
Summarizing 1 Failure:
  [TIMEDOUT] Handler Suite when polling the termination endpoint [JustBeforeEach] and the poll URL cannot be reached should return an error
  /home/mresvani/Projects/machine-api-provider-azure/pkg/termination/termination_test.go:138

Ran 1 of 13 Specs in 1765.299 seconds
FAIL! - Suite Timeout Elapsed -- 0 Passed | 1 Failed | 0 Pending | 12 Skipped
--- FAIL: TestReconciler (1765.30s)
FAIL

Ginkgo ran 9 suites in 30m1.387893037s

There were failures detected in the following suites:
   termination ./pkg/termination
    machineset ./pkg/cloud/azure/actuators/machineset [Suite did not run because the timeout elapsed]
     actuators ./pkg/cloud/azure/actuators [Suite did not run because the timeout elapsed]
         azure ./pkg/cloud/azure [Suite did not run because the timeout elapsed]
   versioninfo ./cmd/versioninfo [Suite did not run because the timeout elapsed]
       machine ./pkg/cloud/azure/actuators/machine [Suite did not run because the timeout elapsed]
  resourceskus ./pkg/cloud/azure/services/resourceskus [Suite did not run because the timeout elapsed]
        ttllru ./pkg/util/cache/ttllru [Suite did not run because the timeout elapsed]

Test Suite Failed
exit status 1
make: *** [Makefile:118: test] Error 1

I will try to track the issue down, since I can reproduce it easily locally, but I don't think it's caused by any changes in this PR. WDYT?

UPDATE: I think I found the unit tests issue and proposed a PR.

@elmiko
Copy link
Contributor

elmiko commented Jul 14, 2023

thanks for doing the debug work @mresvanis , i have approved the other PR, once it has merged and we can rebase this one i am happy to give an lgtm, i would like to get another member of the team to review as well.

Signed-off-by: Michail Resvanis <mresvani@redhat.com>
@mresvanis mresvanis force-pushed the mgmt-13713-add-confidential-vm-support branch from cbc3c71 to dd02e5a Compare July 17, 2023 08:35
@mresvanis
Copy link
Contributor Author

thanks for doing the debug work @mresvanis , i have approved the other PR, once it has merged and we can rebase this one i am happy to give an lgtm, i would like to get another member of the team to review as well.

@elmiko thank you for the review, I have rebased the PR branch to latest main.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 17, 2023

@mresvanis: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@elmiko
Copy link
Contributor

elmiko commented Jul 17, 2023

thanks @mresvanis , i'd like to get another review from the team but for me it's

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 17, 2023
Copy link
Contributor

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

Comment on lines +467 to +494
if vmSpec.SecurityProfile.Settings.SecurityType != machinev1.SecurityTypesConfidentialVM {
return nil, apierrors.InvalidMachineConfiguration("failed to generate security profile for vm %s. "+
"SecurityType should be set to %s when SecurityEncryptionType is defined.",
vmSpec.Name, compute.SecurityTypesConfidentialVM)
}

if vmSpec.SecurityProfile.Settings.ConfidentialVM == nil {
return nil, apierrors.InvalidMachineConfiguration("failed to generate security profile for vm %s. "+
"UEFISettings should be set when SecurityEncryptionType is defined.", vmSpec.Name)
}

if vmSpec.SecurityProfile.Settings.ConfidentialVM.UEFISettings.VirtualizedTrustedPlatformModule != machinev1.VirtualizedTrustedPlatformModulePolicyEnabled {
return nil, apierrors.InvalidMachineConfiguration("failed to generate security profile for vm %s. "+
"VirtualizedTrustedPlatformModule should be enabled when SecurityEncryptionType is defined.", vmSpec.Name)
}

if osDisk.ManagedDisk.SecurityProfile.SecurityEncryptionType == compute.SecurityEncryptionTypesDiskWithVMGuestState {
if vmSpec.SecurityProfile.EncryptionAtHost != nil && *vmSpec.SecurityProfile.EncryptionAtHost {
return nil, apierrors.InvalidMachineConfiguration("failed to generate security profile for vm %s. "+
"EncryptionAtHost cannot be set to true when SecurityEncryptionType is set to %s.",
vmSpec.Name, compute.SecurityEncryptionTypesDiskWithVMGuestState)
}
if vmSpec.SecurityProfile.Settings.ConfidentialVM.UEFISettings.SecureBoot != machinev1.SecureBootPolicyEnabled {
return nil, apierrors.InvalidMachineConfiguration("failed to generate security profile for vm %s. "+
"SecureBoot should be enabled when SecurityEncryptionType is set to %s.",
vmSpec.Name, compute.SecurityEncryptionTypesDiskWithVMGuestState)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should duplicate these validations into the webhooks in MAO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very good point, I'll open a PR as soon as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the respective MAO PR.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 18, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 18, 2023
@openshift-merge-robot openshift-merge-robot merged commit 458e10c into openshift:main Jul 18, 2023
@mresvanis mresvanis deleted the mgmt-13713-add-confidential-vm-support branch July 18, 2023 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants