Refactor: Lima Config and External Drivers

# Lima Config and External Drivers

This document tries to catalog all the configuration challenges related to supporting external virtualization drivers in Lima.

It is not fully thought through; treat it as a first draft to start the discussion.

## Status Quo

Right now all drivers are built-in (`qemu`, `vz`, `wsl2`), and driver-specific configuration is mixed in with generic configuration in various ways.


### Properties that are specific to a driver

These are properties used only by a single driver. They should be moved under `vmOpts`.

#### `cpuTypes`

Specific to `qemu` driver.

#### `rosetta`

Specific to `vz` driver.

#### `video.display`

Seems to be a `qemu`-specific setting. It is not clear whether `video.vnc` is supposed to work with other drivers.

#### `hostResolver`

Not sure if it is specific to `qemu`, or works with other drivers too.
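
As a rough illustration of where such properties could end up, here is a sketch that assumes the `vmOpts.<driver>` nesting used in the example under "Existing driver-specific fields should be moved into `vmOpts`". The key names and value shapes under `vmOpts` are assumptions, not a settled schema.

```yaml
# Hypothetical layout; nothing here is a settled schema.
vmOpts:
  qemu:
    cpuTypes:
      x86_64: host   # illustrative per-arch value
  vz:
    rosetta:
      enabled: true  # illustrative value
```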


### Properties whose valid values depend on the driver

These properties are used by more than one driver, but not necessarily all of them. Some drivers may not support them at all.

#### `arch`

The `vz` driver only allows the native arch (`limayaml/validate.go`). The `wsl2` driver ignores the value, but should also only allow the native arch.

#### `mountType`

`qemu` driver on `macOS` does not support `virtiofs`. `wsl2` driver only supports `wsl2` mount type.

#### `firmware.legacyBIOS`

Default value depends on `arch` and `vmType`.

#### `audio.device`

Mostly a `qemu` setting, but I think `vz` supports `vz` and `none`.

#### `nestedVirtualization`

CPU type must be set to `host` with the `qemu` driver when using nested virtualization.

#### `cpus`, `memory`

For `wsl2` these settings can only be specified globally because all distros are really just containers in a single VM.


### List properties that contain driver sub-properties

#### `images`

The `wsl2` driver needs tarballs of the rootfs, not regular images. There is no property to specify the type of image. We could use the file extension. Currently there is only a single template containing a `wsl2` image (`templates/experimental/wsl2.yaml`), and it only has a single `wsl2` image in the list.

#### `mounts`

Mounts can have mount-type specific sub-properties like `sshfs` and `9p`. Not all drivers support all mount types. A new driver may add a new mount type that needs additional properties under `mounts` that need their own validation logic.

Mount entries are "combined" if they use the same `mountPoint`. Some of the additional properties may have custom merge strategies. E.g. should `fsType` and `fsArgs`[^fsargs] only be merged as a combination?

[^fsargs]: `fsArgs` is currently undocumented in `default.yaml`.
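
To make the merge question concrete, consider a base template and an embedding template that both define a mount for the same `mountPoint`. The paths and values below are made up for illustration, and the sketch deliberately does not claim what the combined entry looks like.

```yaml
# Base template (illustrative values)
mounts:
- location: "~"
  mountPoint: /mnt/home
  sshfs:
    cache: true
---
# Template that embeds the base (illustrative values)
mounts:
- location: "~"
  mountPoint: /mnt/home
  writable: true
  9p:
    msize: 128KiB
# Both entries share the mountPoint /mnt/home, so they are combined into
# one mount. Open question: which sub-properties merge field by field,
# and which should only be taken over as a group?
```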

#### `networks`

None of the `networks` entries work with `wsl2` afaik. The `vzNAT` property only works with the `vz` driver.


### List properties with driver-specific selector

#### `firmware.images[].vmType`

Similar to the `arch` selector for `images[]`. It is confusing, though, that these are not under `vmOpts`.


## Operations that need to be aware of driver-specific settings

### Template embedding (combining entries in lists of properties)

Template embedding (resolving `base` template references) requires merging of property lists (`images`, `mounts`, `networks`, `additionalDisks`).

Most of them have special combining rules for entries that share a matching key (e.g. `mountPoint` for `mounts`). These rules are different for each list type, and are currently hard-coded in Go code.

### Filling in template defaults

Template defaults sometimes depend on the selected driver (e.g. `mountType`).

### Validating templates

Templates cannot be validated without knowing the `vmType` to select a driver. If a template needs to be validated against every available driver, validation will have to be run multiple times. Even then it won't be possible to validate a template for a driver that is not available on the host (e.g. validate for `wsl2` on macOS).

### Marshaling templates

We sometimes need to marshal `lima.yaml` back as JSON (or YAML). Examples are `limactl info` and `limactl ls`. This is not possible without knowing the full schema, so we probably need to keep the YAML representation around as the single source of truth, and only unmarshal bits as needed.

Alternatively we could implement an unmarshaling function for each driver, and then merge `vmOpts` data with the shared config from `lima.yaml`.

### Generating boot scripts

Some driver settings require generation of boot scripts. Examples are `02-wsl2-setup.sh` and `05-rosetta-volume.sh`. There can be hidden references in other scripts, e.g. `05-lima-mounts.sh` has additional code for Rosetta mounts.

These scripts need to run early and cannot be replaced with provisioning scripts.

### Drivers might not be running when we need to validate their config

Lima performs validation for stopped instances in various situations. (Which instance is using an `additionalDisk`? Which instance is using a managed network?)

Therefore much of the validation functionality must be available by running the external driver as a subprocess that communicates over STDIN/STDOUT, and not via IPC/RPC with an already running driver.

## Suggestions

### Drivers can only add new fields under `vmOpts`

Any other fields are part of the strict Lima schema, and use of unknown keys is an error.
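
A small sketch of what this rule would mean in practice. `exampleOption` is a made-up field name, used only to show where a driver-defined key is accepted and where it is not.

```yaml
vmType: qemu
vmOpts:
  qemu:
    exampleOption: true   # hypothetical field; defined and validated by the driver
exampleOption: true       # hypothetical; unknown top-level key, so Lima reports an error
```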

### Existing driver-specific fields should be moved into `vmOpts`

This is true at least for fields only used by a single driver. For fields used by multiple drivers (e.g. `audio.device`) this is less clear. It is probably also a good idea, but it gets quite verbose:

```yaml
vmOpts:
  qemu:
    audio:
      device: coreaudio
  vz:
    audio:
      device: vz
```

### `vmType` must be specified up-front

Either the user specifies `--vm-type` explicitly, or the platform default is chosen. We will not pick a `vmType` based on other settings in the template. Instead validation will fail if the other settings are not compatible with the `vmType`.

E.g. it will not be supported to set `audio.device: coreaudio` and have Lima infer that it needs to set `vmType: qemu`. Instead the `vz` driver will throw a validation error that it doesn't support this setting value.
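
The following template fragment illustrates that case. It uses the current top-level `audio.device` property; the outcome described in the comments is the behavior proposed here, not necessarily what Lima does today.

```yaml
vmType: vz
audio:
  device: coreaudio   # not supported by the `vz` driver
# Proposed behavior: validation fails with an error from the `vz` driver.
# Lima does not silently switch to `vmType: qemu`.
```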

### Boot scripts are renumbered and documented

Currently a lot of boot scripts are clustered around the `00`-`11` range. We need to renumber them so drivers can easily insert scripts at particular points in the sequence. We should reserve odd numbers for driver boot scripts.

### Drivers have to set defaults and validate their settings before Lima picks final defaults

Lima does not know the full schema of the driver settings, and the driver may not know the full schema of Lima settings (because Lima may have added new settings after the driver was last updated). Therefore communication about settings has to happen at the textual (YAML) level.

Each driver must have a validation function that receives the fully embedded template, but without defaults filled in. It returns an object with these fields:

* `version`: protocol version for validation results
* `errors`: list of validation errors
* `warnings`: list of validation warnings
* `knownFields`: a list of all fields validated by the driver
* `driverDefaults`: a template with any default settings the driver wants to apply
* `bootScripts`: a map of boot script filenames and scripts
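
A result with the fields listed above might look like the following sketch. Every value is invented for illustration: the integer `version`, the warning text, the selection of `knownFields`, and the script body are all assumptions, and the script filename is simply borrowed from the existing Rosetta boot script mentioned earlier.

```yaml
# Hypothetical result from a `vz` driver for a template that passes validation.
version: 1
errors: []
warnings:
- "illustrative warning text"
knownFields:
- arch
- mountType
- audio.device
- rosetta
driverDefaults:
  mountType: virtiofs   # illustrative driver default
bootScripts:
  05-rosetta-volume.sh: |
    #!/bin/sh
    # script body generated by the driver goes here
```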

#### `errors` and `warnings`

These are displayed by Lima to the user. If the `errors` list is not empty, then validation has failed and `driverDefaults` and `bootScripts` are undefined.
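
For contrast, a failed validation could be reported like this. The error text and the integer `version` are made up; `driverDefaults` and `bootScripts` are left out because they are undefined when `errors` is not empty.

```yaml
# Hypothetical result for a template that fails validation.
version: 1
errors:
- "audio.device: value coreaudio is not supported by the vz driver"
warnings: []
```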

#### `knownFields` list

The purpose of this list is to allow Lima to check that all driver-specific settings that are **not** validated by the driver have not been assigned a value. The list is a flat list of strings, which may use "dotted names" like `audio.device`.

Maybe this is not necessary if we manage to move all driver settings under `vmOpts`.

#### `driverDefaults` template

The final template is then generated using the regular `base` embedding mechanism:

```yaml
base:
- $LIMA_HOME/_config/override.yaml (if it exists)
- the fully embedded source template
- the `driverDefaults` template
- $LIMA_HOME/_config/default.yaml (if it exists)
- the Lima builtin defaults
```
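
A worked example of how a driver default would surface in the final template. It assumes that entries earlier in the `base` list take precedence over later ones, as the ordering above suggests, and that neither `override.yaml` nor the source template sets `mountType`. The `virtiofs` default is illustrative.

```yaml
# Fully embedded source template: does not set mountType
vmType: vz
---
# driverDefaults returned by the driver (illustrative)
mountType: virtiofs
---
# Relevant part of the final template under the assumptions above
vmType: vz
mountType: virtiofs
```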

#### Driver boot scripts

At least initially we won't provide a way for drivers to add `LIMA_CIDATA_*` variables to `lima.env`. The driver will have to perform variable interpolation on the `bootScripts` before returning them.

_Originally posted by @jandubois in https://github.com/lima-vm/lima/discussions/3501_

Refactor: Lima Config and External Drivers #3769
