Add support for confidential VMs #3265

Merged

mresvanis
Contributor

@mresvanis mresvanis commented Mar 10, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

This change adds support for Confidential VMs and Trusted launch for VMs.

Azure Confidential VMs are identified by SecurityProfile.SecurityType set to ConfidentialVM, which should be set together with the OSDisk.ManagedDisk.SecurityProfile.SecurityEncryptionType field. Trusted launch for VMs is identified by SecurityProfile.SecurityType set to TrustedLaunch, which should be set together with the SecurityProfile.UefiSettings section, i.e. the SecureBootEnabled and VTpmEnabled fields.
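To make the two shapes concrete, here is a minimal sketch using hypothetical, simplified stand-ins for the Azure compute types named above (the real SDK types use pointers and carry many more fields). It only illustrates which fields go together: ConfidentialVM pairs with an OS disk SecurityEncryptionType, TrustedLaunch pairs with UefiSettings. The helper functions `confidentialVM` and `trustedLaunch` are illustrative, not part of CAPZ or the SDK.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the Azure compute types named in the
// PR description. The real SDK types use pointers and carry many more fields.
type SecurityType string
type SecurityEncryptionType string

const (
	SecurityTypeConfidentialVM SecurityType = "ConfidentialVM"
	SecurityTypeTrustedLaunch  SecurityType = "TrustedLaunch"

	DiskWithVMGuestState SecurityEncryptionType = "DiskWithVMGuestState"
	VMGuestStateOnly     SecurityEncryptionType = "VMGuestStateOnly"
)

// UefiSettings mirrors SecurityProfile.UefiSettings.
type UefiSettings struct {
	SecureBootEnabled bool
	VTpmEnabled       bool
}

// SecurityProfile mirrors the VM-level SecurityProfile.
type SecurityProfile struct {
	SecurityType SecurityType
	UefiSettings *UefiSettings
}

// DiskSecurityProfile mirrors OSDisk.ManagedDisk.SecurityProfile.
type DiskSecurityProfile struct {
	SecurityEncryptionType SecurityEncryptionType
}

// confidentialVM pairs SecurityType ConfidentialVM with an OS disk
// SecurityEncryptionType. vTPM is always enabled, matching the PR's rule
// that vTPM must be on whenever SecurityEncryptionType is set.
func confidentialVM(enc SecurityEncryptionType, secureBoot bool) (SecurityProfile, DiskSecurityProfile) {
	vm := SecurityProfile{
		SecurityType: SecurityTypeConfidentialVM,
		UefiSettings: &UefiSettings{SecureBootEnabled: secureBoot, VTpmEnabled: true},
	}
	disk := DiskSecurityProfile{SecurityEncryptionType: enc}
	return vm, disk
}

// trustedLaunch pairs SecurityType TrustedLaunch with the UefiSettings section.
func trustedLaunch(secureBoot, vTPM bool) SecurityProfile {
	return SecurityProfile{
		SecurityType: SecurityTypeTrustedLaunch,
		UefiSettings: &UefiSettings{SecureBootEnabled: secureBoot, VTpmEnabled: vTPM},
	}
}

func main() {
	vm, disk := confidentialVM(VMGuestStateOnly, true)
	fmt.Println(vm.SecurityType, disk.SecurityEncryptionType, vm.UefiSettings.VTpmEnabled)
	// prints "ConfidentialVM VMGuestStateOnly true"

	tl := trustedLaunch(true, true)
	fmt.Println(tl.SecurityType, tl.UefiSettings.SecureBootEnabled, tl.UefiSettings.VTpmEnabled)
	// prints "TrustedLaunch true true"
}
```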

Related image-builder PR.

Which issue(s) this PR fixes:
Fixes #3264

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Add SecurityProfile SecurityType to API
Add OSDisk VMDiskSecurityProfile to API
Add SecurityProfile UefiSettings with SecureBootEnabled and VTpmEnabled to API

@k8s-ci-robot k8s-ci-robot added the release-note and kind/feature labels Mar 10, 2023
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes and size/XXL labels Mar 10, 2023
@mresvanis mresvanis force-pushed the add-confidential-vm-support branch from f1afdb5 to f0e5485 March 10, 2023 11:08
@mresvanis
Contributor Author

/test pull-cluster-api-provider-azure-e2e

@nawazkh
Member

nawazkh commented Mar 15, 2023

I started looking into this PR; it will take some time to go over the Azure docs and the new fields being added.
Thanks for all the work!

@CecileRobertMichon
Contributor

@nawazkh are you still reviewing this?

@nawazkh
Member

nawazkh commented Apr 4, 2023

Extremely sorry, it slipped off my radar after I was halfway through reviewing this PR. I will get started on it again!
Sorry @mresvanis !

@jackfrancis jackfrancis added this to the v1.9 milestone Apr 6, 2023
Contributor

@willie-yao willie-yao left a comment

Thanks for all your hard work on this! I just had one comment and a general question: I think it'd be great to have an E2E test for this feature. This may be out of scope for this PR but this seems like a great feature that can be tested E2E.

api/v1beta1/azuremachine_webhook_test.go
@mresvanis
Contributor Author

mresvanis commented Apr 18, 2023

> Thanks for all your hard work on this! I just had one comment and a general question: I think it'd be great to have an E2E test for this feature. This may be out of scope for this PR but this seems like a great feature that can be tested E2E.

@willie-yao Thank you for taking the time to review this change :)

I agree that having an E2E test for this feature would be quite useful. In order to keep this PR at its current size, would it be an option to open an issue for the E2E test and then tackle it in a future PR? Or would you prefer to have the E2E test as part of this PR?

(I would be happy to propose a change for the E2E test either way, I would just need a bit more time)

@willie-yao
Contributor

> In order to keep this PR at its current size, would it be an option to open an issue for the E2E test and then tackle it in a future PR?

Ideally, we would want to have an E2E test be a part of this PR in order to ensure the feature is fully working E2E. Since the next release is planned for May 2, it would be totally okay to take some time. The feature won't be available to users until then anyway.

Would this be enough time for you to implement an E2E test? I apologize for the rush on this; it's on us for not reviewing this PR sooner 😔

@mresvanis
Contributor Author

> Would this be enough time for you to implement an E2E test? I apologize for the rush on this; it's on us for not reviewing this PR sooner

Absolutely no problem, I'll get started as soon as I can and ping you here when I have something up for review. Thanks again for your time.

Member

@nawazkh nawazkh left a comment

This is a great effort @mresvanis, thanks for taking the time to put this together!
Also, I apologize for sharing my review so late.

I would generally share the whole review at once, but I am breaking it into chunks here to get the ball rolling. I plan to finish reviewing this PR in a couple more hours.

api/v1beta1/types.go Outdated
api/v1beta1/types.go
api/v1beta1/types.go Outdated
api/v1beta1/types.go Outdated
Member

@nawazkh nawazkh left a comment

Part 2/2 of the review.
I have skipped reviewing azuremachine_validation.go and azuremachine_validation_test.go, since we are expecting some refactoring.

Comment on lines 301 to 334
if storageProfile.OsDisk.ManagedDisk != nil &&
	storageProfile.OsDisk.ManagedDisk.SecurityProfile != nil &&
	storageProfile.OsDisk.ManagedDisk.SecurityProfile.SecurityEncryptionType != "" {
	if s.SecurityProfile.EncryptionAtHost != nil && *s.SecurityProfile.EncryptionAtHost {
		return nil, azure.WithTerminalError(errors.New("encryption at host is not supported when SecurityEncryptionType is set"))
	}

	securityProfile.UefiSettings = &compute.UefiSettings{}

	if s.SecurityProfile.SecureBoot == infrav1.SecureBootPolicyEnabled {
		securityProfile.UefiSettings.SecureBootEnabled = pointer.Bool(true)
	} else {
		securityProfile.UefiSettings.SecureBootEnabled = pointer.Bool(false)
	}

	if storageProfile.OsDisk.ManagedDisk.SecurityProfile.SecurityEncryptionType == compute.SecurityEncryptionTypesDiskWithVMGuestState && !*securityProfile.UefiSettings.SecureBootEnabled {
		return nil, azure.WithTerminalError(errors.Errorf("secure boot should be enabled when SecurityEncryptionType is set to %s", compute.SecurityEncryptionTypesDiskWithVMGuestState))
	}

	if s.SecurityProfile.VirtualizedTrustedPlatformModule == infrav1.VirtualizedTrustedPlatformModulePolicyDisabled {
		return nil, azure.WithTerminalError(errors.New("vTPM should be enabled when SecurityEncryptionType is set"))
	}

	securityProfile.UefiSettings.VTpmEnabled = pointer.Bool(true)

	securityProfile.SecurityType = compute.SecurityTypesConfidentialVM

	return securityProfile, nil
}
Member

Would it be better if we enforced these validations at the webhook level, so that there is less chance of a user running into a Terminal Error?

Member

I also think that if we refactor the securityProfile as per the earlier comments, we would need fewer (almost no) validations and would mostly be doing assignments at this stage of the v1beta1 -> Azure conversion.

Contributor Author

> Would it be better if we enforced these validations at the webhook level, so that there is less chance of a user running into a Terminal Error?

I absolutely agree, these validations are quite useful at the webhook level and we have them here. The latter is initially called through here. Does this make sense?

Member

Makes sense, thanks!

@CecileRobertMichon do you know why we perform validations in the generateSecurityProfile() func?
I am trying to understand the reasoning behind generateSecurityProfile() returning azure.WithTerminalError().
Can we not perform all the validations at the webhook level? Is it because these validations need s (of type *VMSpec) to be populated, and s is only populated later during reconciliation rather than at the webhook level?

Comment on lines 341 to 352
if s.SecurityProfile.SecureBoot != "" {
	if s.SKU.HasCapability(resourceskus.TrustedLaunchDisabled) && s.SecurityProfile.SecureBoot == infrav1.SecureBootPolicyEnabled {
		return nil, azure.WithTerminalError(errors.Errorf("secure boot is not supported for VM type %s", s.Size))
	}

	if s.SecurityProfile.SecureBoot == infrav1.SecureBootPolicyEnabled {
		securityProfile.UefiSettings.SecureBootEnabled = pointer.Bool(true)
		securityProfile.SecurityType = compute.SecurityTypesTrustedLaunch
	}
}
Member

However, I am not sure if s.SKU.HasCapability(...) can be run at the webhook level. If we cannot perform the s.SKU.HasCapability(...) validations in the webhook, then we might have to leave some of these dependent validations here.

What do you think @CecileRobertMichon @willie-yao ?

azure/services/resourceskus/sku.go
@codecov-commenter

codecov-commenter commented Apr 20, 2023

Codecov Report

Patch coverage: 85.83% and project coverage change: -0.24 ⚠️

Comparison is base (f20a204) 53.33% compared to head (353a109) 53.10%.

❗ Current head 353a109 differs from pull request most recent head 720566f. Consider uploading reports for the commit 720566f to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3265      +/-   ##
==========================================
- Coverage   53.33%   53.10%   -0.24%     
==========================================
  Files         185      182       -3     
  Lines       18433    18367      -66     
==========================================
- Hits         9832     9754      -78     
  Misses       8063     8063              
- Partials      538      550      +12     
Impacted Files                              Coverage Δ
api/v1beta1/types.go                        60.71% <ø> (ø)
azure/services/resourceskus/sku.go          0.00% <ø> (ø)
api/v1beta1/azuremachine_validation.go      83.54% <82.97%> (-0.17%) ⬇️
azure/services/virtualmachines/spec.go      88.35% <87.67%> (-0.64%) ⬇️

... and 20 files with indirect coverage changes

if profile != nil && securityEncryptionType != "" {
	if profile.EncryptionAtHost != nil && *profile.EncryptionAtHost && securityEncryptionType == SecurityEncryptionTypeDiskWithVMGuestState {
		allErrs = append(allErrs, field.Invalid(fieldPath.Child("EncryptionAtHost"), profile.EncryptionAtHost,
			"EncryptionAtHost cannot be set to 'true' when securityEncryptionType is set to DiskWithVMGuestState"))
Contributor

cc @anujmaheshwari1 I wonder if we do this correctly for AKS...

VirtualizedTrustedPlatformModulePolicyEnabled)))
}

if securityEncryptionType == SecurityEncryptionTypeDiskWithVMGuestState {
Contributor

are you sure DiskWithVMGuestState is what you want? I've only used CVM with VMGuestStateOnly

either way, VMGuestStateOnly should be a valid setting for CVM, I think, and works with secure boot.
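The confidential-VM rules discussed in this thread can be gathered in one place. A minimal sketch, assuming a hypothetical flattened `cvmOptions` struct rather than the real CAPZ API types: vTPM is required for any SecurityEncryptionType, while secure boot is required and EncryptionAtHost is rejected only for DiskWithVMGuestState (this follows the webhook snippet; the earlier spec.go snippet rejected encryption at host for any SecurityEncryptionType). With these rules, VMGuestStateOnly with secure boot enabled passes.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	DiskWithVMGuestState = "DiskWithVMGuestState"
	VMGuestStateOnly     = "VMGuestStateOnly"
)

// cvmOptions is a hypothetical, flattened view of the fields the validation
// looks at; the real API spreads them across SecurityProfile and OSDisk.
type cvmOptions struct {
	SecurityEncryptionType string
	EncryptionAtHost       bool
	SecureBoot             bool
	VTPM                   bool
}

// validateCVM collects every violation, webhook style, instead of returning
// on the first one.
func validateCVM(o cvmOptions) []error {
	if o.SecurityEncryptionType == "" {
		return nil // not a confidential VM; these rules do not apply
	}
	var errs []error
	// vTPM is required for any SecurityEncryptionType.
	if !o.VTPM {
		errs = append(errs, errors.New("vTPM must be enabled when securityEncryptionType is set"))
	}
	if o.SecurityEncryptionType == DiskWithVMGuestState {
		// Secure boot is only mandatory for DiskWithVMGuestState.
		if !o.SecureBoot {
			errs = append(errs, fmt.Errorf("secure boot must be enabled when securityEncryptionType is %s", DiskWithVMGuestState))
		}
		if o.EncryptionAtHost {
			errs = append(errs, fmt.Errorf("encryptionAtHost cannot be true when securityEncryptionType is %s", DiskWithVMGuestState))
		}
	}
	return errs
}

func main() {
	ok := cvmOptions{SecurityEncryptionType: VMGuestStateOnly, SecureBoot: true, VTPM: true}
	fmt.Println(len(validateCVM(ok))) // 0

	bad := cvmOptions{SecurityEncryptionType: DiskWithVMGuestState, EncryptionAtHost: true}
	for _, err := range validateCVM(bad) {
		fmt.Println(err)
	}
}
```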

securityProfile.UefiSettings = &compute.UefiSettings{}

if s.SecurityProfile.SecureBoot != "" {
	if s.SKU.HasCapability(resourceskus.TrustedLaunchDisabled) && s.SecurityProfile.SecureBoot == infrav1.SecureBootPolicyEnabled {
Contributor

@alexeldeib alexeldeib Apr 20, 2023

need a gen1/gen2 check here as well. I believe you only get TrustedLaunchDisabled from the SKU API on gen2 sizes which do not support it; it's annoyingly inverted from most capabilities.

as-is, I believe you allow a gen1 VM size with TL + secure boot, which is invalid.
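A minimal sketch of the suggested gen1/gen2 check, assuming the SKU API reports a `HyperVGenerations` capability (for example `V1,V2`) and, for Gen2 sizes without Trusted Launch support, `TrustedLaunchDisabled` set to `True`. The capability names and value formats are assumptions to confirm against real `az vm list-skus` output, and `supportsTrustedLaunch` is a hypothetical helper, not the `resourceskus` API.

```go
package main

import (
	"fmt"
	"strings"
)

// Capability mirrors a name/value pair reported for a VM size by the SKU API.
type Capability struct {
	Name  string
	Value string
}

// capabilityValue returns the value of the named capability, if present.
func capabilityValue(caps []Capability, name string) (string, bool) {
	for _, c := range caps {
		if strings.EqualFold(c.Name, name) {
			return c.Value, true
		}
	}
	return "", false
}

// supportsTrustedLaunch requires Gen2 support first because, per the review,
// TrustedLaunchDisabled is only reported for Gen2 sizes: a Gen1-only size has
// no such flag and would otherwise slip through.
func supportsTrustedLaunch(caps []Capability) bool {
	gens, _ := capabilityValue(caps, "HyperVGenerations")
	gen2 := false
	for _, g := range strings.Split(gens, ",") {
		if strings.EqualFold(strings.TrimSpace(g), "V2") {
			gen2 = true
		}
	}
	if !gen2 {
		return false
	}
	disabled, _ := capabilityValue(caps, "TrustedLaunchDisabled")
	return !strings.EqualFold(disabled, "True")
}

func main() {
	gen1 := []Capability{{Name: "HyperVGenerations", Value: "V1"}}
	gen2 := []Capability{{Name: "HyperVGenerations", Value: "V1,V2"}}
	gen2NoTL := []Capability{
		{Name: "HyperVGenerations", Value: "V2"},
		{Name: "TrustedLaunchDisabled", Value: "True"},
	}
	fmt.Println(supportsTrustedLaunch(gen1), supportsTrustedLaunch(gen2), supportsTrustedLaunch(gen2NoTL))
	// prints "false true false"
}
```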

Contributor

could sanity check sku api to confirm --

az vm list-skus -r virtualMachines -s Something_Thats_only_gen1 (Standard_NC6?)

@alexeldeib
Contributor

apologies for drive-by review but hope I added some useful comments -- happen to have worked a little on CVM/TL from AKS perspective so was curious :)

@alexeldeib
Contributor

some other random high level thoughts

  • sku api in webhooks: your intuition is correct; I don't think we do that today due to the requirements to fetch creds, call azure, etc. It would be nice, though.
  • other early validation: you can technically regex a lot of stuff without the sku api. I think that may work for the CVM case, but you'd have to update the regex in the future. I got tired of manually updating sizes for AKS, which is why I wrote the sku api clients to begin with. your mileage may vary :)
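A minimal sketch of the regex-based early validation mentioned in the second bullet, which needs no Azure credentials. The pattern is an assumption covering the AMD-based confidential sizes in the DCasv5/DCadsv5/ECasv5/ECadsv5 families; it is not an authoritative list and, as the comment warns, would have to be updated by hand when new sizes ship. `looksLikeCVMSize` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// cvmSizePattern is a hypothetical pattern for AMD-based confidential VM sizes
// such as Standard_DC4as_v5 or Standard_EC8ads_v5. It is an assumption and must
// be maintained by hand as new confidential sizes are released.
var cvmSizePattern = regexp.MustCompile(`^Standard_(DC|EC)[0-9]+ad?s_v5$`)

// looksLikeCVMSize is a cheap early check that does not call the SKU API.
func looksLikeCVMSize(size string) bool {
	return cvmSizePattern.MatchString(size)
}

func main() {
	for _, size := range []string{"Standard_DC4as_v5", "Standard_EC8ads_v5", "Standard_D4s_v5"} {
		fmt.Println(size, looksLikeCVMSize(size))
	}
}
```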

@mresvanis mresvanis force-pushed the add-confidential-vm-support branch from 2fc3e0c to 6e04db1 May 9, 2023 10:06
@nawazkh
Member

nawazkh commented May 12, 2023

This PR looks good to me!

@CecileRobertMichon
Contributor

I still don't see any documentation in the PR, @mresvanis are you still working on adding it?

@mresvanis
Contributor Author

> I still don't see any documentation in the PR, @mresvanis are you still working on adding it?

@CecileRobertMichon yes, I was waiting for the image-builder PR to get merged first, but I will add the documentation today so that you have time to review it.

@mresvanis mresvanis force-pushed the add-confidential-vm-support branch 2 times, most recently from 8574389 to 353a109 May 15, 2023 13:31
@mresvanis
Contributor Author

@CecileRobertMichon I added documentation for both Confidential VMs and Trusted Launch for VMs by adding 2 new topics. Please feel free to review. Thank you.

@nawazkh
Member

nawazkh commented May 17, 2023

I looked into the newly added docs and they look good to me! 🚀
I like the thought of waiting until we have a defined way of generating CVM images and then merging this doc with the steps.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label May 17, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: ab33867f4da255c6876eb53b839ff1231377fc59

@mresvanis
Contributor Author

@mboersma thank you for your input in the respective image-builder PR! Since that has been merged, your thoughts here would also be much appreciated.

Contributor

@mboersma mboersma left a comment

Thank you so much for your work and your patience. This is great!

I only found some typos and had some minor doc comments, feel free to ignore those you disagree with.

api/v1beta1/types.go
azure/services/virtualmachines/spec.go Outdated
azure/services/virtualmachines/spec_test.go Outdated
azure/services/virtualmachines/spec.go Outdated
azure/services/virtualmachines/spec.go Outdated
api/v1beta1/types.go Outdated
docs/book/src/topics/confidential-vms.md Outdated
docs/book/src/topics/confidential-vms.md Outdated
docs/book/src/topics/trusted-launch-for-vms.md Outdated
docs/book/src/topics/trusted-launch-for-vms.md Outdated
@mresvanis mresvanis force-pushed the add-confidential-vm-support branch from 353a109 to 3582327 June 12, 2023 11:47
@k8s-ci-robot k8s-ci-robot removed the lgtm label Jun 12, 2023
@k8s-ci-robot k8s-ci-robot requested a review from nawazkh June 12, 2023 11:47
@mresvanis mresvanis force-pushed the add-confidential-vm-support branch from 3582327 to cbaaaca June 12, 2023 11:54
This change adds support for Confidential VMs and Trusted launch for
VMs.

Azure Confidential VMs are defined by their SecurityProfile.SecurityType
ConfidentialVM, which should be defined along with the
OSDisk.ManagedDisk.SecurityProfile.SecurityEncryptionType field.

Trusted launch for VMs is defined by the SecurityProfile.SecurityType
TrustedLaunch, which should be defined along with the
SecurityProfile.UefiSettings section, i.e. the SecureBootEnabled and
VTpmEnabled fields.

Signed-off-by: Michail Resvanis <mresvani@redhat.com>
@mresvanis mresvanis force-pushed the add-confidential-vm-support branch from cbaaaca to 720566f June 12, 2023 12:15
Contributor

@mboersma mboersma left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Jun 12, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 4410040d6da82f0cc54a283e907b1a716c33f59d

Contributor

@CecileRobertMichon CecileRobertMichon left a comment

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label Jun 12, 2023
@k8s-ci-robot k8s-ci-robot merged commit 4cfde35 into kubernetes-sigs:main Jun 12, 2023
@mresvanis mresvanis deleted the add-confidential-vm-support branch June 12, 2023 15:35
Labels
approved, cncf-cla: yes, kind/feature, lgtm, release-note, size/XXL
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Add support for Confidential VMs and Trusted launch for VMs
9 participants