Enable vm boot diagnostics #808

Merged
merged 3 commits into from
Jun 24, 2024
7 changes: 7 additions & 0 deletions charts/internal/machineclass/templates/machineclass.yaml
@@ -51,6 +51,13 @@ providerSpec:
networkProfile:
acceleratedNetworking: {{ $machineClass.network.acceleratedNetworking }}
{{- end }}
{{- if hasKey $machineClass "diagnosticsProfile" }}
diagnosticsProfile:
enabled: {{ $machineClass.diagnosticsProfile.enabled }}
{{- if hasKey $machineClass.diagnosticsProfile "storageURI" }}
storageURI: {{ $machineClass.diagnosticsProfile.storageURI }}
{{- end }}
{{- end }}
hardwareProfile:
vmSize: {{ $machineClass.machineType }}
osProfile:
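The conditional rendering in this template hunk can be tried outside Helm with a minimal sketch like the one below. It assumes Go's standard `text/template` package and a hand-written `hasKey` helper standing in for the Sprig function that Helm provides; the `render` function, the two-space indentation, and the sample values are illustrative assumptions, not part of the chart.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// tpl reproduces only the diagnosticsProfile block of the template hunk.
// The indentation is an assumption; the real chart nests this under providerSpec.
const tpl = `{{- if hasKey . "diagnosticsProfile" }}
diagnosticsProfile:
  enabled: {{ .diagnosticsProfile.enabled }}
{{- if hasKey .diagnosticsProfile "storageURI" }}
  storageURI: {{ .diagnosticsProfile.storageURI }}
{{- end }}
{{- end }}`

// funcs provides a stand-in for Sprig's hasKey, which Helm makes available
// to chart templates but which is not part of text/template.
var funcs = template.FuncMap{
	"hasKey": func(m map[string]interface{}, key string) bool {
		_, ok := m[key]
		return ok
	},
}

// render is a hypothetical helper that executes the template against values.
func render(values map[string]interface{}) (string, error) {
	t, err := template.New("machineclass").Funcs(funcs).Parse(tpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, values); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Placeholder values; "my-custom-azure-storage" is taken from values.yaml in this diff.
	out, err := render(map[string]interface{}{
		"diagnosticsProfile": map[string]interface{}{
			"enabled":    true,
			"storageURI": "my-custom-azure-storage",
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

When the `diagnosticsProfile` key is absent the block renders nothing, and `storageURI` is emitted only when that key is set, which matches the two `hasKey` guards in the hunk.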
3 changes: 3 additions & 0 deletions charts/internal/machineclass/values.yaml
@@ -11,6 +11,9 @@ machineClasses:
subnet: my-subnet-in-my-vnet
# vnetResourceGroup: my-vnet-resource-group
# acceleratedNetworking: true
diagnosticsProfile:
enabled: false
# storageURI: my-custom-azure-storage
tags:
Name: shoot-crazy-botany
kubernetes.io-cluster-shoot-crazy-botany: "1"
13 changes: 10 additions & 3 deletions docs/usage/usage.md
@@ -353,13 +353,20 @@ nodeTemplate: # (to be specified only if the node capacity would be different fr
cpu: 2
gpu: 1
memory: 50Gi
diagnosticsProfile:
enabled: true
# storageURI: <string>
```

The `.nodeTemplate` is used to specify resource information of the machine during runtime. This then helps in Scale-from-Zero.
Some points to note for this field:
- Currently only cpu, gpu and memory are configurable.
- a change in the value lead to a rolling update of the machine in the workerpool
- all the resources needs to be specified
- Currently only cpu, gpu and memory are configurable.
- a change in the value leads to a rolling update of the machines in the worker pool
- all the resources need to be specified

The `.diagnosticsProfile` is used to enable [machine boot diagnostics](https://learn.microsoft.com/en-us/azure/virtual-machines/boot-diagnostics) (disabled by default).
A storage account is used to store the VM's boot console output and screenshots.
If `.diagnosticsProfile.storageURI` is not specified, Azure managed storage is used (the recommended option).

## Example `Shoot` manifest (non-zoned)

5 changes: 5 additions & 0 deletions example/30-worker.yaml
@@ -113,3 +113,8 @@ spec:
# - name: kubelet-dir
# type: standard
# size: 36Gi
# providerConfig:
# apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
# kind: WorkerConfig
# diagnosticsProfile:
# enabled: false
55 changes: 55 additions & 0 deletions hack/api-reference/api.md
@@ -403,6 +403,19 @@ github.com/gardener/gardener/pkg/apis/extensions/v1alpha1.NodeTemplate
<p>NodeTemplate contains resource information of the machine which is used by Cluster Autoscaler to generate nodeTemplate during scaling a nodeGroup from zero.</p>
</td>
</tr>
<tr>
<td>
<code>diagnosticsProfile</code></br>
<em>
<a href="#azure.provider.extensions.gardener.cloud/v1alpha1.DiagnosticsProfile">
DiagnosticsProfile
</a>
</em>
</td>
<td>
<p>DiagnosticsProfile specifies boot diagnostic options</p>
</td>
</tr>
</tbody>
</table>
<h3 id="azure.provider.extensions.gardener.cloud/v1alpha1.WorkerStatus">WorkerStatus
@@ -652,6 +665,48 @@ map[string]bool
</tr>
</tbody>
</table>
<h3 id="azure.provider.extensions.gardener.cloud/v1alpha1.DiagnosticsProfile">DiagnosticsProfile
</h3>
<p>
(<em>Appears on:</em>
<a href="#azure.provider.extensions.gardener.cloud/v1alpha1.WorkerConfig">WorkerConfig</a>)
</p>
<p>
<p>DiagnosticsProfile specifies boot diagnostic options</p>
</p>
<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<code>enabled</code></br>
<em>
bool
</em>
</td>
<td>
<p>Enabled configures boot diagnostics to be stored or not</p>
</td>
</tr>
<tr>
<td>
<code>storageURI</code></br>
<em>
string
</em>
</td>
<td>
<p>StorageURI is the URI of the storage account to use for storing console output and screenshot.
If not specified azure managed storage will be used.</p>
</td>
</tr>
</tbody>
</table>
<h3 id="azure.provider.extensions.gardener.cloud/v1alpha1.DomainCount">DomainCount
</h3>
<p>
11 changes: 11 additions & 0 deletions pkg/apis/azure/types_worker.go
@@ -16,6 +16,8 @@ type WorkerConfig struct {
metav1.TypeMeta
// NodeTemplate contains resource information of the machine which is used by Cluster Autoscaler to generate nodeTemplate during scaling a nodeGroup from zero.
NodeTemplate *extensionsv1alpha1.NodeTemplate
// DiagnosticsProfile specifies boot diagnostic options
DiagnosticsProfile *DiagnosticsProfile
}

// +genclient
@@ -65,3 +67,12 @@ type VmoDependency struct {
// Name is the name of the VMO resource on Azure.
Name string
}

// DiagnosticsProfile specifies boot diagnostic options
type DiagnosticsProfile struct {
// Enabled configures boot diagnostics to be stored or not
Enabled bool
// StorageURI is the URI of the storage account to use for storing console output and screenshot.
// If not specified azure managed storage will be used.
StorageURI *string
}
11 changes: 11 additions & 0 deletions pkg/apis/azure/v1alpha1/types_worker.go
@@ -19,6 +19,8 @@ type WorkerConfig struct {
// NodeTemplate contains resource information of the machine which is used by Cluster Autoscaler to generate nodeTemplate during scaling a nodeGroup from zero.
// +optional
NodeTemplate *extensionsv1alpha1.NodeTemplate `json:"nodeTemplate,omitempty"`
// DiagnosticsProfile specifies boot diagnostic options
DiagnosticsProfile *DiagnosticsProfile `json:"diagnosticsProfile,omitempty"`
}

// +genclient
@@ -76,3 +78,12 @@ type VmoDependency struct {
// Name is the name of the VMO resource on Azure.
Name string `json:"name"`
}

// DiagnosticsProfile specifies boot diagnostic options
type DiagnosticsProfile struct {
// Enabled configures boot diagnostics to be stored or not
Enabled bool `json:"enabled,omitempty"`
// StorageURI is the URI of the storage account to use for storing console output and screenshot.
// If not specified azure managed storage will be used.
StorageURI *string `json:"storageURI,omitempty"`
}
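The JSON tags above determine what a serialized `WorkerConfig` looks like. The following is a minimal sketch, assuming simplified local copies of the two types (without `TypeMeta` or `NodeTemplate`) and a hypothetical `mustJSON` helper; the storage URI value is a placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified local copies of the v1alpha1 types from this diff, so the
// sketch compiles on its own. TypeMeta and NodeTemplate are left out.
type DiagnosticsProfile struct {
	Enabled    bool    `json:"enabled,omitempty"`
	StorageURI *string `json:"storageURI,omitempty"`
}

type WorkerConfig struct {
	DiagnosticsProfile *DiagnosticsProfile `json:"diagnosticsProfile,omitempty"`
}

// mustJSON is a hypothetical helper that marshals v or panics.
func mustJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	uri := "my-custom-azure-storage" // placeholder value

	// A nil profile pointer is dropped entirely.
	fmt.Println(mustJSON(WorkerConfig{}))

	// With omitempty on a bool, Enabled: false is not written out.
	fmt.Println(mustJSON(WorkerConfig{DiagnosticsProfile: &DiagnosticsProfile{}}))

	// Both fields appear once they carry non-zero values.
	fmt.Println(mustJSON(WorkerConfig{
		DiagnosticsProfile: &DiagnosticsProfile{Enabled: true, StorageURI: &uri},
	}))
}
```

One consequence worth noting: because `enabled` carries `omitempty`, an explicitly disabled profile serializes as an empty object, so a consumer reading the JSON cannot tell `enabled: false` apart from an unset field.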
34 changes: 34 additions & 0 deletions pkg/apis/azure/v1alpha1/zz_generated.conversion.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

26 changes: 26 additions & 0 deletions pkg/apis/azure/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

26 changes: 26 additions & 0 deletions pkg/apis/azure/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

10 changes: 10 additions & 0 deletions pkg/controller/worker/machines.go
@@ -204,6 +204,16 @@ func (w *workerDelegate) generateMachineConfig(ctx context.Context) error {
machineClassSpec["zone"] = zone.name
}

if workerConfig.DiagnosticsProfile != nil {
diagnosticProfile := map[string]interface{}{
"enabled": workerConfig.DiagnosticsProfile.Enabled,
}
if workerConfig.DiagnosticsProfile.StorageURI != nil {
diagnosticProfile["storageURI"] = workerConfig.DiagnosticsProfile.StorageURI
}
machineClassSpec["diagnosticsProfile"] = diagnosticProfile
}

if pool.NodeTemplate != nil {
// Currently Zone field is mandatory, and passing it an
// empty string turns it to `null` string during marshalling which fails CRD validation
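The mapping added to `generateMachineConfig` can be exercised in isolation with a sketch like the one below. It assumes a local copy of the `DiagnosticsProfile` type and a hypothetical helper named `applyDiagnosticsProfile`; neither the helper nor its signature exists in the PR. Unlike the hunk, which stores the `*string` pointer in the map, this sketch stores the dereferenced string.

```go
package main

import "fmt"

// Local copy of the internal DiagnosticsProfile type from this diff.
type DiagnosticsProfile struct {
	Enabled    bool
	StorageURI *string
}

// applyDiagnosticsProfile is a hypothetical helper mirroring the hunk above:
// it adds a "diagnosticsProfile" entry to the machine class spec only when
// a profile is configured, and adds "storageURI" only when one is set.
func applyDiagnosticsProfile(spec map[string]interface{}, p *DiagnosticsProfile) {
	if p == nil {
		return
	}
	profile := map[string]interface{}{
		"enabled": p.Enabled,
	}
	if p.StorageURI != nil {
		// Assumption: dereference for readability; the PR keeps the pointer.
		profile["storageURI"] = *p.StorageURI
	}
	spec["diagnosticsProfile"] = profile
}

func main() {
	spec := map[string]interface{}{}

	// No profile configured: the spec is left untouched.
	applyDiagnosticsProfile(spec, nil)
	fmt.Println(len(spec))

	// Profile without a storage URI: only "enabled" is set.
	applyDiagnosticsProfile(spec, &DiagnosticsProfile{Enabled: true})
	fmt.Println(spec)
}
```

Leaving the key out when no profile is configured is what lets the chart's `hasKey` guard skip the block entirely for existing worker pools.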
32 changes: 31 additions & 1 deletion pkg/controller/worker/machines_test.go
@@ -6,6 +6,7 @@ package worker_test

import (
"context"
"encoding/json"
"fmt"
"path/filepath"
"strings"
@@ -141,6 +142,10 @@ var _ = Describe("Machines", func() {
nodeTemplateZone3 machinev1alpha1.NodeTemplate
nodeTemplateZone4 machinev1alpha1.NodeTemplate

diagnosticProfile apiv1alpha1.DiagnosticsProfile
providerConfig *runtime.RawExtension
workerConfig apiv1alpha1.WorkerConfig

shootVersionMajorMinor string
shootVersion string

@@ -226,6 +231,25 @@ var _ = Describe("Machines", func() {
Zone: "no-zone",
}

diagnosticProfile = apiv1alpha1.DiagnosticsProfile{
Enabled: true,
StorageURI: ptr.To("azure-storage-uri"),
}

workerConfig = apiv1alpha1.WorkerConfig{
TypeMeta: metav1.TypeMeta{
APIVersion: apiv1alpha1.SchemeGroupVersion.String(),
Kind: "WorkerConfig",
},
DiagnosticsProfile: &diagnosticProfile,
}

marshalledWorkerConfig, err := json.Marshal(workerConfig)
Expect(err).NotTo(HaveOccurred())
providerConfig = &runtime.RawExtension{
Raw: marshalledWorkerConfig,
}

namePool2 = "pool-zones"
minPool2 = 30
maxPool2 = 45
@@ -310,7 +334,8 @@ var _ = Describe("Machines", func() {
Type: &dataVolume2Type,
},
},
Labels: labels,
Labels: labels,
ProviderConfig: providerConfig,
}

pool2 = extensionsv1alpha1.WorkerPool{
@@ -502,6 +527,11 @@ var _ = Describe("Machines", func() {
machineClassPool3["nodeTemplate"] = nodeTemplateZone3
machineClassPool4["nodeTemplate"] = nodeTemplateZone4

machineClassPool1["diagnosticsProfile"] = map[string]interface{}{
"enabled": diagnosticProfile.Enabled,
"storageURI": diagnosticProfile.StorageURI,
}

machineClassPool1["dataDisks"] = []map[string]interface{}{
{
"name": dataVolume2Name,