The Forklift validation service uses Open Policy Agent (OPA) policy rules to determine “concerns” that should be applied to VMs before migration.
The concerns list for each VM is generated by the validation service, but stored in the provider inventory as an attribute of each VM.
The forklift-validation service is installed from the forklift-operator here: https://github.com/konveyor/forklift-operator
The operator creates the following OpenShift/Kubernetes objects:
- forklift-validation service
- forklift-validation-config configMap
- forklift-validation deployment
Validation rules are written in the Rego policy language. There is a separate Rego file for each validation requirement test.
Each Rego rules file defines a set rule called concerns. This rule in each file tests for a specific condition, and if the condition is true it adds a {“category”, “label”, “assessment”} hash to the concerns set. An example rule file is as follows:
package io.konveyor.forklift.vmware
has_drs_enabled {
    input.host.cluster.drsEnabled
}
concerns[flag] {
    has_drs_enabled
    flag := {
        "category": "Information",
        "label": "VM running in a DRS-enabled cluster",
        "assessment": "Distributed resource scheduling is not currently supported by OpenShift Virtualization. The VM can be migrated but it will not have this feature in the target environment."
    }
}
Note: Category can be one of: “Critical”, “Warning”, or “Information”
Same-named rules in different files are OR’d together, so the resultant concerns set after all the rules have been evaluated is the combined output from any concerns set rules that evaluate to true.
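The OR behaviour described above can be pictured in Python as a union over independent rule functions. This is only an illustrative model and not how OPA evaluates Rego internally; the names drs_rule, multiple_disks_rule and evaluate_concerns are hypothetical, and the assessment strings are shortened for the sketch.

```python
from typing import Callable, Dict, List, Optional

Concern = Dict[str, str]
Rule = Callable[[dict], Optional[Concern]]


def drs_rule(vm: dict) -> Optional[Concern]:
    # Mirrors has_drs_enabled: contributes a concern only when the flag is set.
    if vm.get("host", {}).get("cluster", {}).get("drsEnabled"):
        return {
            "category": "Information",
            "label": "VM running in a DRS-enabled cluster",
            "assessment": "DRS is not supported (shortened for this sketch).",
        }
    return None


def multiple_disks_rule(vm: dict) -> Optional[Concern]:
    # Mirrors has_multiple_disks from the user-supplied rule example.
    if len(vm.get("disks", [])) > 1:
        return {
            "category": "Information",
            "label": "Multiple disks detected",
            "assessment": "Multiple disks detected (shortened for this sketch).",
        }
    return None


def evaluate_concerns(vm: dict, rules: List[Rule]) -> List[Concern]:
    # Each rule is evaluated independently; the result is the union of
    # whatever the triggered rules contribute, with duplicates dropped.
    concerns: List[Concern] = []
    for rule in rules:
        flag = rule(vm)
        if flag is not None and flag not in concerns:
            concerns.append(flag)
    return concerns
```

A rule that does not trigger simply contributes nothing, which is why adding a new rule file never removes concerns raised by the others.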
The concerns set contents are added to the concerns key in the inventory record for the VM. This key’s value is subsequently used by the UI to flag conditions in the migration plan that may affect the success of the VM migration.
The OPA rules can refer to any key in the JSON input structure, for example input.snapshot.kind (see Appendix A for an example of the JSON input).
The inventory service records the version of the rules used to determine the concerns for each VM. There is a rules_version.rego file for each provider, for example:
package io.konveyor.forklift.vmware
RULES_VERSION := 5
rules_version = {
"rules_version": RULES_VERSION
}
If any rules for a provider are updated, this file must also be edited to increment RULES_VERSION; otherwise the inventory service will not detect the change and will not re-validate the concerns for the VMs.
The current rules version can be queried as follows:
GET https://forklift-validation/v1/data/io/konveyor/forklift/vmware/rules_version
Returns:
{
"result": {
"rules_version": 5
}
}
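As a rough sketch of how a client could use this response, the following Python parses the returned body and compares it with a previously recorded version. The comparison logic is an assumption about how a caller might decide to re-validate, not a description of the inventory service's actual code; both function names are hypothetical.

```python
import json


def parse_rules_version(body: str) -> int:
    # The response wraps the value as {"result": {"rules_version": N}}.
    return int(json.loads(body)["result"]["rules_version"])


def needs_revalidation(recorded_version: int, current_version: int) -> bool:
    # Assumption: concerns are considered stale whenever the version
    # recorded for a VM differs from the version the service reports.
    return recorded_version != current_version
```

Under this model, editing a rule without incrementing RULES_VERSION leaves both numbers equal, so nothing would be re-validated.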
Each of the validation OPA rules is defined within a package. The current package namespaces are io.konveyor.forklift.vmware and io.konveyor.forklift.ovirt.
The rule directory paths reflect the namespaces, for example:
Within this GitHub repo:
- policies/io/konveyor/forklift/vmware
- policies/io/konveyor/forklift/ovirt
Within the validation service container image:
- /usr/share/opa/policies/io/konveyor/forklift/vmware
- /usr/share/opa/policies/io/konveyor/forklift/ovirt
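The mapping from package namespace to directory is mechanical, as the short Python sketch below illustrates. The helper name package_to_dir is hypothetical and exists only for this example.

```python
import posixpath


def package_to_dir(package: str, root: str) -> str:
    # Each dot-separated package segment becomes one directory level.
    return posixpath.join(root, *package.split("."))
```

For example, package_to_dir("io.konveyor.forklift.vmware", "policies") gives the repository path listed above, and using /usr/share/opa/policies as the root gives the path inside the container image.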
Rules defined within the same namespace are read and combined/merged when OPA loads.
Forklift 2.2 adds the ability for users to add their own validation rules without having to push them to the upstream repository and rebuild the container image.
The forklift-validation operator creates a volume called validation-extra-rules mounted at /usr/share/opa/policies/extra.
The data for the volume is taken from the validation-service-config configMap, but by default this is empty. Users can add their own data by defining and merging their own configMap definition, for example:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: validation-service-config
  namespace: openshift-mtv
data:
  vmware_multiple_disks.rego: |-
    package io.konveyor.forklift.vmware
    has_multiple_disks {
      count(input.disks) > 1
    }
    concerns[flag] {
      has_multiple_disks
      flag := {
        "category": "Information",
        "label": "Multiple disks detected",
        "assessment": "Example user-supplied extra validation rule - multiple disks have been detected on this VM."
      }
    }
There are several things to be aware of when defining additional rules:
- The validation service pod needs to be restarted (deployment scaled down/up) after editing the configMap for the new rules to be seen. If there are any errors in the user-added rule, the validation service will fail to start. Check the pod logs for OPA startup errors.
- User-defined rules should be written to be part of the existing package namespace paths, either io.konveyor.forklift.vmware or io.konveyor.forklift.ovirt.
- If a user-defined rule re-defines an existing default value, the validation service will fail to start. For example if an existing rule contains the line
  default valid_input = false
  then defining another rule with the line
  default valid_input = true
  will fail.
  Existing default values can be checked by connecting to the terminal of the validation pod, and entering the following commands:
  cd /usr/share/opa/policies/io/konveyor/forklift
  grep -R "default" *
  Check the pod logs for any OPA startup errors.
- Adding a user-defined rule to the configMap will not automatically add the new concerns to the inventory as the built-in rules version won’t have changed. Remove and re-add the provider to force validation rule re-evaluation using the new user-supplied rules.
- If a user-defined rule is created with the same name as an existing rule, the net effect will be the OR’ing of the two rules.
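The grep check for existing defaults can also be done before a rule is added to the configMap. The Python sketch below scans Rego source text for default declarations that appear more than once in the same package. It is a simple text-based approximation using regular expressions rather than a real Rego parser, and the function name find_duplicate_defaults is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

PACKAGE_RE = re.compile(r"^\s*package\s+([\w.]+)", re.MULTILINE)
DEFAULT_RE = re.compile(r"^\s*default\s+(\w+)\s*:?=", re.MULTILINE)


def find_duplicate_defaults(sources: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each fully qualified default name to the files declaring it,
    keeping only names that are declared in more than one place."""
    seen: Dict[str, List[str]] = defaultdict(list)
    for filename, text in sources.items():
        match = PACKAGE_RE.search(text)
        package = match.group(1) if match else ""
        for default in DEFAULT_RE.finditer(text):
            seen[f"{package}.{default.group(1)}"].append(filename)
    return {name: files for name, files in seen.items() if len(files) > 1}
```

An empty result suggests the new rule does not clash with an existing default; the pod logs remain the authoritative check once the service restarts.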
In normal operation the forklift-validation service is only ever called by the forklift-inventory service. After retrieving VM inventory from the source provider, the forklift-inventory service calls the forklift-validation service once for each VM, to populate a concerns array associated with the VM’s record in the inventory database.
The forklift-validation service is called using a RESTful POST to the provider-specific validate rule path, i.e.
/v1/data/io/konveyor/forklift/vmware/validate
or
/v1/data/io/konveyor/forklift/ovirt/validate
For example:
POST https://forklift-validation/v1/data/io/konveyor/forklift/vmware/validate
The POST is made with a JSON body that corresponds to the output from a forklift-inventory service workloads query for a VM. An example of such a query is as follows:
GET https://<inventory_service_route>/providers/vsphere/c872d364.../workloads/vm-2958
Tip: Use GET https://<inventory_service_route>/providers/<provider_type> to get all provider UUIDs, then GET https://<inventory_service_route>/providers/<provider_type>/<UUID>/vms to get all of the VMs on that provider.
The JSON returned from this forklift-inventory service GET is wrapped as the value of a new key called "input" and used as the JSON body of the forklift-validation service POST request. See Appendix A for a listing of a typical JSON body.
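A minimal Python sketch of the wrapping step follows. It only constructs the request object and does not send it; the function name build_validation_request and the default base_url are illustrative assumptions, and a real call would also need whatever TLS and authentication settings the cluster requires.

```python
import json
import urllib.request


def build_validation_request(
    vm: dict,
    provider: str,
    base_url: str = "https://forklift-validation",
) -> urllib.request.Request:
    # The provider-specific validate path, e.g. .../forklift/vmware/validate
    url = f"{base_url}/v1/data/io/konveyor/forklift/{provider}/validate"
    # The inventory JSON for the VM becomes the value of the "input" key.
    body = json.dumps({"input": vm}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The returned object could then be passed to urllib.request.urlopen when the service is reachable, for instance through a manually created route during testing.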
Because the validation service is only called from within the <forklift> namespace by the forklift-inventory service, there is by default no external OpenShift/Kubernetes route defined for it. When testing, it is often useful to manually create an HTTPS route to the validation service so that it is accessible from outside the cluster.
The JSON body output from the validation service is a result hash containing the concerns set that has been created from the rules files, together with an errors list and the rules version. Each of the rules files adds its own hash to the concerns set if the rule is triggered. An example output is as follows:
{
"result": {
"concerns": [
{
"assessment": "Distributed resource scheduling is not currently supported by OpenShift Virtualization. The VM can be migrated but it will not have this feature in the target environment.",
"category": "Information",
"label": "VM running in a DRS-enabled cluster"
},
{
"assessment": "Hot pluggable CPU or memory is not currently supported by OpenShift Virtualization. Review CPU or memory configuration after migration.",
"category": "Warning",
"label": "CPU/Memory hotplug detected"
},
{
"assessment": "Multiple disks have been detected on this VM.",
"category": "Information",
"label": "Multiple disks detected"
},
{
"assessment": "NUMA node affinity is not currently supported by OpenShift Virtualization. The VM can be migrated but it will not have this feature in the target environment.",
"category": "Warning",
"label": "NUMA node affinity detected"
},
{
"assessment": "Online snapshots are not currently supported by OpenShift Virtualization.",
"category": "Information",
"label": "VM snapshot detected"
},
{
"assessment": "CPU affinity is not currently supported by OpenShift Virtualization. The VM can be migrated but it will not have this feature in the target environment.",
"category": "Warning",
"label": "CPU affinity detected"
},
{
"assessment": "Changed Block Tracking (CBT) has not been enabled on this VM. This feature is a prerequisite for VM warm migration.",
"category": "Warning",
"label": "Changed Block Tracking (CBT) not enabled"
}
],
"errors": [],
"rules_version": 5
}
}
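When inspecting a response like this by hand, it can help to group the concerns by category. The following Python sketch does that for a parsed response; the function name labels_by_category is hypothetical and the code assumes the response shape shown above.

```python
from collections import defaultdict
from typing import Dict, List


def labels_by_category(response: dict) -> Dict[str, List[str]]:
    # Collect the label of every concern under its category
    # ("Critical", "Warning" or "Information").
    grouped: Dict[str, List[str]] = defaultdict(list)
    for concern in response.get("result", {}).get("concerns", []):
        grouped[concern["category"]].append(concern["label"])
    return dict(grouped)
```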
The inventory attributes that the validation service tests for have been deliberately added as simple attributes to the VM inventory provider model.
For example one particular validation requirement is to test whether a VMware VM has NUMA node affinity configured. The VMware API path to determine this is as follows:
MOR:VirtualMachine.config.extraConfig["numa.nodeAffinity"]
The Forklift Provider Inventory model has simplified this to a single testable attribute with a list value:
"numaNodeAffinity": [
"0",
"1"
],
This is therefore testable with a single Rego line, as follows:
count(input.numaNodeAffinity) != 0
Each of the Rego rules has a corresponding unit test that exercises the conditions that would trigger the rule. For example, the VMware shareable disk rule is as follows:
package io.konveyor.forklift.vmware
has_shareable_disk {
    some i
    input.disks[i].shared
}
concerns[flag] {
    has_shareable_disk
    flag := {
        "category": "Warning",
        "label": "Shareable disk detected",
        "assessment": "Shared disks are only supported by certain OpenShift Virtualization storage configurations. Ensure that the correct storage is selected for the disk."
    }
}
The corresponding test for this rule is as follows:
package io.konveyor.forklift.vmware
test_with_no_disks {
    mock_vm := {
        "name": "test",
        "disks": []
    }
    results := concerns with input as mock_vm
    count(results) == 0
}
test_with_no_shareable_disk {
    mock_vm := {
        "name": "test",
        "disks": [
            { "shared": false }
        ]
    }
    results := concerns with input as mock_vm
    count(results) == 0
}
test_with_shareable_disk {
    mock_vm := {
        "name": "test",
        "disks": [
            { "shared": false },
            { "shared": true },
            { "shared": false }
        ]
    }
    results := concerns with input as mock_vm
    count(results) == 1
}
There are tests for each rule, and the tests are run as a complete set, rather than individually. This means that each test must be written with an awareness of the other tests, and each test will exercise all of the concerns set rules in the namespace in an OR manner.
For example, the oVirt rule to detect valid NIC interfaces has the following Rego rules:
valid_nic_interfaces [i] {
    some i
    regex.match(`e1000|rtl8139|virtio`, input.nics[i].interface)
}
number_of_nics [i] {
    some i
    input.nics[i].id
}
concerns[flag] {
    count(valid_nic_interfaces) != count(number_of_nics)
    …
The oVirt rule to detect whether a NIC is set to PCI passthrough mode has the following Rego rules:
nic_set_to_pci_passthrough [i] {
    some i
    regex.match(`pci_passthrough`, input.nics[i].interface)
}
concerns[flag] {
    count(nic_set_to_pci_passthrough) > 0
    …
The corresponding test of the PCI passthrough rule also tests the valid NIC interface rule, so this must be allowed for in the results count, for example:
test_with_pci_passthrough {
    mock_vm := {
        "name": "test",
        "nics": [
            {
                "id" : "656e7031-7330-3030-3a31-613a34613a31",
                "interface": "pci_passthrough",
                "plugged": true,
                "profile": {
                    "portMirroring": false,
                    "networkFilter": "",
                    "qos": "",
                    "properties": []
                }
            }
        ]
    }
    results := concerns with input as mock_vm
    # count should be 2 as this test also invalidates the
    # nic_interface_type rule
    count(results) == 2
}
It should also be noted that NIC tests contain the attributes to match (and pass) all of the other NIC rules, such as nics_with_port_mirroring_enabled or nics_with_qos_enabled.
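The reason the expected count is 2 rather than 1 can be modelled in Python by evaluating both NIC rules against the same mock VM. This is an illustrative re-implementation of the two conditions only; the function name nic_concerns and the concern strings are placeholders, because the real labels are not shown in the excerpts above.

```python
import re
from typing import List


def nic_concerns(vm: dict) -> List[str]:
    nics = vm.get("nics", [])
    concerns: List[str] = []

    # Rule 1: every NIC that has an id must use a valid interface type.
    valid = [n for n in nics
             if re.search(r"e1000|rtl8139|virtio", n.get("interface", ""))]
    with_id = [n for n in nics if "id" in n]
    if len(valid) != len(with_id):
        concerns.append("invalid NIC interface type")  # placeholder label

    # Rule 2: at least one NIC is set to PCI passthrough mode.
    if any(re.search(r"pci_passthrough", n.get("interface", "")) for n in nics):
        concerns.append("NIC in PCI passthrough mode")  # placeholder label

    return concerns
```

A NIC whose interface is pci_passthrough fails the first rule and triggers the second, so a single mock NIC produces two concerns.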
The tests should be run together, in the following manner:
$ pwd
.../git/forklift-validation/policies/io/konveyor/forklift/vmware
$ ls *test*
changed_block_tracking_test.rego
cpu_affinity_test.rego
cpu_memory_hotplug_test.rego
dpm_enabled_test.rego
drs_enabled_test.rego
fault_tolerance_test.rego
ha_enabled_test.rego
host_affinity_test.rego
memory_ballooning_test.rego
name_test.rego
numa_affinity_test.rego
passthrough_device_test.rego
rdm_disk_test.rego
shareable_disk_test.rego
snapshot_test.rego
sriov_device_test.rego
uefi_boot_test.rego
usb_controller_test.rego
$ opa test . --explain full
data.io.konveyor.forklift.vmware.test_with_changed_block_tracking_enabled: PASS (1.434945ms)
data.io.konveyor.forklift.vmware.test_with_changed_block_tracking_disabled: PASS (410.751µs)
data.io.konveyor.forklift.vmware.test_without_cpu_affinity#01: PASS (397.413µs)
data.io.konveyor.forklift.vmware.test_with_cpu_affinity#01: PASS (392.375µs)
data.io.konveyor.forklift.vmware.test_with_hotplug_disabled: PASS (374.934µs)
data.io.konveyor.forklift.vmware.test_with_cpu_hot_add_enabled: PASS (358.079µs)
data.io.konveyor.forklift.vmware.test_with_cpu_hot_remove_enabled: PASS (364.151µs)
data.io.konveyor.forklift.vmware.test_with_memory_hot_add_enabled: PASS (362.469µs)
data.io.konveyor.forklift.vmware.test_without_dpm_enabled: PASS (355.382µs)
data.io.konveyor.forklift.vmware.test_with_dpm_enabled: PASS (374.199µs)
data.io.konveyor.forklift.vmware.test_without_drs_enabled: PASS (354.674µs)
data.io.konveyor.forklift.vmware.test_with_drs_enabled: PASS (403.224µs)
data.io.konveyor.forklift.vmware.test_with_fault_tolerance_disabled: PASS (420.773µs)
data.io.konveyor.forklift.vmware.test_with_fault_tolerance_enabled: PASS (361.583µs)
data.io.konveyor.forklift.vmware.test_without_ha_enabled: PASS (787.522µs)
data.io.konveyor.forklift.vmware.test_with_ha_enabled: PASS (855.455µs)
data.io.konveyor.forklift.vmware.test_without_host_affinity_vms: PASS (386.044µs)
data.io.konveyor.forklift.vmware.test_with_other_host_affinity_vms: PASS (388.889µs)
data.io.konveyor.forklift.vmware.test_with_host_affinity_vm: PASS (417.673µs)
data.io.konveyor.forklift.vmware.test_without_ballooned_memory: PASS (379.208µs)
data.io.konveyor.forklift.vmware.test_with_balloned_memory: PASS (401.975µs)
data.io.konveyor.forklift.vmware.test_valid_vm_name: PASS (339.828µs)
data.io.konveyor.forklift.vmware.test_vm_name_too_long: PASS (335.458µs)
data.io.konveyor.forklift.vmware.test_vm_name_invalid_char_underscore: PASS (335.918µs)
data.io.konveyor.forklift.vmware.test_vm_name_invalid_char_slash: PASS (329.709µs)
data.io.konveyor.forklift.vmware.test_without_cpu_affinity: PASS (339.376µs)
data.io.konveyor.forklift.vmware.test_with_cpu_affinity: PASS (426.495µs)
data.io.konveyor.forklift.vmware.test_with_no_device#01: PASS (431.456µs)
data.io.konveyor.forklift.vmware.test_with_other_xyz_device: PASS (400.697µs)
data.io.konveyor.forklift.vmware.test_with_pci_passthrough_device: PASS (840.322µs)
data.io.konveyor.forklift.vmware.test_with_no_disks#01: PASS (907.954µs)
data.io.konveyor.forklift.vmware.test_with_no_shareable_disk#01: PASS (418.082µs)
data.io.konveyor.forklift.vmware.test_with_shareable_disk#01: PASS (416.483µs)
data.io.konveyor.forklift.vmware.test_with_no_disks: PASS (388.499µs)
data.io.konveyor.forklift.vmware.test_with_no_shareable_disk: PASS (375.962µs)
data.io.konveyor.forklift.vmware.test_with_shareable_disk: PASS (446.255µs)
data.io.konveyor.forklift.vmware.test_with_no_snapshot: PASS (359.438µs)
data.io.konveyor.forklift.vmware.test_with_snapshot: PASS (365.453µs)
data.io.konveyor.forklift.vmware.test_with_no_device#02: PASS (341.82µs)
data.io.konveyor.forklift.vmware.test_with_other_yyy_device: PASS (356.789µs)
data.io.konveyor.forklift.vmware.test_with_sriov_nic: PASS (391.878µs)
data.io.konveyor.forklift.vmware.test_without_uefi_boot: PASS (398.853µs)
data.io.konveyor.forklift.vmware.test_with_uefi_boot: PASS (440.887µs)
data.io.konveyor.forklift.vmware.test_with_no_device: PASS (417.793µs)
data.io.konveyor.forklift.vmware.test_with_other_xxx_device: PASS (442.267µs)
data.io.konveyor.forklift.vmware.test_with_usb_controller: PASS (421.734µs)
--------------------------------------------------------------------------------
PASS: 46/46
Using the --explain full argument helps trace the reason for a test failure.
Tip: When writing tests that look for an attribute of a possibly repeating item (e.g. disks or NICs), include in the test an attribute for both a pass and a fail, e.g.
"disks": [ { "shared": false }, { "shared": true }, { "shared": false } ]
Some Rego constructs using not have subtle implications when testing repeating structures. For example, it might seem simpler to replace the valid_nic_interfaces rule above with the following:
valid_nic_interfaces {
    regex.match(`e1000|rtl8139|virtio`, input.nics[_].interface)
}
However, testing for not valid_nic_interfaces would be incorrect if only one NIC out of several had an invalid interface type, because the rule is true as soon as any single NIC matches.
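The difference between the two formulations is the difference between "any NIC is valid" and "all NICs are valid". A small Python sketch, with hypothetical function names, makes the gap visible:

```python
import re
from typing import List

VALID_INTERFACE = re.compile(r"e1000|rtl8139|virtio")


def any_nic_valid(nics: List[dict]) -> bool:
    # Corresponds to the simplified rule: true once one NIC matches.
    return any(VALID_INTERFACE.search(n["interface"]) for n in nics)


def all_nics_valid(nics: List[dict]) -> bool:
    # Corresponds to comparing the count of valid NICs with the total.
    valid = sum(1 for n in nics if VALID_INTERFACE.search(n["interface"]))
    return valid == len(nics)
```

For a VM with one virtio NIC and one pci_passthrough NIC, any_nic_valid returns True, so negating it raises no concern, while all_nics_valid returns False and correctly signals the problem.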
Debugging the policy rules when they are running “live” from the forklift-validation service image can be challenging. Use the trace statement to add debug lines to rules files.
By default, explanations are disabled, so trace statements won’t appear in any output.
Call the validation service with ?explain=notes&pretty to enable debugging trace output.
There is an example debug.rego file in each rules directory, for example:
package io.konveyor.forklift.ovirt
debug {
trace(sprintf("** debug ** vm name: %v", [input.name]))
}
This can be called using a RESTful POST such as the following, and a standard JSON input payload:
POST …/v1/data/io/konveyor/forklift/vmware/debug?explain=notes&pretty
This will return a JSON body such as the following:
{
"explanation": [
"query:1 Enter data.io.kon...debug = _",
"/usr/share.../debug.rego:3 | Enter data.io.kon...debug",
"/usr/share.../debug.rego:4 | | Note \"** debug ** vm name: test\""
],
"result": true
}
Note: ?explain=full can also be used, which will return more detailed output.
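When a response contains many explanation lines, the trace notes can be picked out programmatically. The Python sketch below builds the debug URL and extracts the text of each Note line from a response shaped like the example above. Both function names and the base_url argument are hypothetical, and the parsing assumes the pretty-printed explanation format shown in the example.

```python
import re
from typing import List


def debug_url(base_url: str, provider: str, explain: str = "notes") -> str:
    # "pretty" is a bare flag with no value, so the query string is
    # assembled by hand rather than with urlencode.
    return (f"{base_url}/v1/data/io/konveyor/forklift/"
            f"{provider}/debug?explain={explain}&pretty")


def extract_notes(response: dict) -> List[str]:
    # Keep only the quoted text of lines of the form: ... Note "text"
    notes: List[str] = []
    for line in response.get("explanation", []):
        match = re.search(r'Note "(.*)"\s*$', line)
        if match:
            notes.append(match.group(1))
    return notes
```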
The OPA runtime can be initialized with one or more files that contain policies or data. If the path is a directory, OPA will recursively load ALL rego, JSON, and YAML files.
The OPA run command line within the forklift-validation container image is as follows (ignoring the TLS cert-related arguments):
/usr/bin/opa run --server /usr/share/opa/policies
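To preview which files under a policy directory would be picked up by a recursive load, a directory walk such as the following Python sketch can be used. It approximates the behaviour described above by file suffix only; treating .yml as a YAML file alongside .yaml is an assumption, and the function name loadable_files is hypothetical.

```python
from pathlib import Path
from typing import List

# Suffixes assumed to be loaded; ".yml" is included as an assumption.
LOADABLE_SUFFIXES = {".rego", ".json", ".yaml", ".yml"}


def loadable_files(root: str) -> List[str]:
    # Walk the tree recursively and return matching paths relative to root.
    base = Path(root)
    return sorted(
        path.relative_to(base).as_posix()
        for path in base.rglob("*")
        if path.is_file() and path.suffix in LOADABLE_SUFFIXES
    )
```

Because the load is recursive, rules mounted under /usr/share/opa/policies/extra are read together with the built-in rules, and unit test files or stray JSON documents placed in the tree would be read as well.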
The following are examples of various validation service policy tests.
This is the simplest type of test; it flags an attribute that is set to true:
dpm_enabled {
    input.host.cluster.dpmEnabled
}
concerns[flag] {
    dpm_enabled
    …
This flags an attribute that is set to false (or is absent from the input):
change_tracking_disabled {
    not input.changeTrackingEnabled
}
concerns[flag] {
    change_tracking_disabled
    …
This flags if a value is true anywhere in a possibly repeating structure of objects, such as multiple disks or NICs:
has_rdm_disk {
    input.disks[_].rdm
}
concerns[flag] {
    has_rdm_disk
    …
This flags if a value is false anywhere in a possibly repeating structure of objects:
unplugged_nics {
    input.nics[_].plugged == false
    # Can’t use: not input.nics[_].plugged
}
concerns[flag] {
    unplugged_nics
    …
Alternatively (using a counter):
unplugged_nics [i] {
    some i
    input.nics[i].plugged == false
}
concerns[flag] {
    count(unplugged_nics) > 0
    …
Use a default to prevent the rule returning undefined:
default warn_placement_policy = false
warn_placement_policy {
    regex.match(`\bmigratable\b`, input.placementPolicyAffinity)
}
concerns[flag] {
    warn_placement_policy
    …
Use a default to prevent the rule returning undefined:
default has_cpu_affinity = false
has_cpu_affinity {
    count(input.cpuAffinity) != 0
}
concerns[flag] {
    has_cpu_affinity
    …
The same rule name defined multiple times results in the rules being OR’d together:
default has_hotplug_enabled = false
has_hotplug_enabled {
    input.cpuHotAddEnabled
}
has_hotplug_enabled {
    input.cpuHotRemoveEnabled
}
has_hotplug_enabled {
    input.memoryHotAddEnabled
}
concerns[flag] {
    has_hotplug_enabled
    …
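For readers more familiar with general-purpose languages, the combination of a default and repeated rule bodies behaves roughly like the Python sketch below. It is an analogy rather than a translation of OPA's evaluation model; the helper names are hypothetical. The is_set helper reflects the Rego behaviour that a bare reference succeeds when the value is defined and is not false.

```python
HOTPLUG_KEYS = ("cpuHotAddEnabled", "cpuHotRemoveEnabled", "memoryHotAddEnabled")


def is_set(doc: dict, key: str) -> bool:
    # A bare reference in a rule body succeeds only when the key exists
    # and its value is not false.
    return key in doc and doc[key] is not False


def has_hotplug_enabled(vm: dict) -> bool:
    # any() plays the role of the three OR'd rule bodies; the False result
    # when none of them match plays the role of the default.
    return any(is_set(vm, key) for key in HOTPLUG_KEYS)
```

With the hot-plug values from the Appendix A input (cpuHotAddEnabled true, the other two false), this returns True, which matches the "CPU/Memory hotplug detected" concern in the example output.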
The following is a typical input JSON body for a call to the validation service:
{
"input": {
"selfLink": "providers/vsphere/...0324e/workloads/vm-431",
"id": "vm-431",
"parent": {
"kind": "Folder",
"id": "group-v22"
},
"revision": 1,
"name": "iscsi-target",
"revisionValidated": 1,
"isTemplate": false,
"networks": [
{
"kind": "Network",
"id": "network-31"
},
{
"kind": "Network",
"id": "network-33"
}
],
"disks": [
{
"key": 2000,
"file": "[iSCSI_Datastore] ...001.vmdk",
"datastore": {
"kind": "Datastore",
"id": "datastore-63"
},
"capacity": 17179869184,
"shared": false,
"rdm": false
},
{
"key": 2001,
"file": "[iSCSI_Datastore] ...002.vmdk",
"datastore": {
"kind": "Datastore",
"id": "datastore-63"
},
"capacity": 10737418240,
"shared": true,
"rdm": false
}
],
"concerns": [],
"policyVersion": 5,
"uuid": "42256329-8c3a-2a82-54fd-01d845a8bf49",
"firmware": "bios",
"powerState": "poweredOn",
"connectionState": "connected",
"snapshot": {
"kind": "VirtualMachineSnapshot",
"id": "snapshot-3034"
},
"changeTrackingEnabled": false,
"cpuAffinity": [
0,
2
],
"cpuHotAddEnabled": true,
"cpuHotRemoveEnabled": false,
"memoryHotAddEnabled": false,
"faultToleranceEnabled": false,
"cpuCount": 2,
"coresPerSocket": 1,
"memoryMB": 2048,
"guestName": "Red Hat Enterprise Linux 7 (64-bit)",
"balloonedMemory": 0,
"ipAddress": "10.119.2.96",
"storageUsed": 30436770129,
"numaNodeAffinity": [
"0",
"1"
],
"devices": [
{
"kind": "RealUSBController"
}
],
"host": {
"id": "host-29",
"parent": {
"kind": "Cluster",
"id": "domain-c26"
},
"revision": 1,
"name": "esx13.acme.com",
"selfLink": "providers/vsphere/...324e/hosts/host-29",
"status": "green",
"inMaintenance": false,
"managementServerIp": "10.119.2.101",
"thumbprint": "...:9E:98",
"timezone": "UTC",
"cpuSockets": 2,
"cpuCores": 16,
"productName": "VMware ESXi",
"productVersion": "6.5.0",
"networking": {
"pNICs": [
{
"key": "key-vim.host.PhysicalNic-vmnic0",
"linkSpeed": 10000
},
{
"key": "key-vim.host.PhysicalNic-vmnic1",
"linkSpeed": 10000
},
...
],
"vNICs": [
{
"key": "key-vim.host.VirtualNic-vmk2",
"portGroup": "VM_Migration",
"dPortGroup": "",
"ipAddress": "192.168.79.13",
"subnetMask": "255.255.255.0",
"mtu": 9000
},
{
"key": "key-vim.host.VirtualNic-vmk0",
"portGroup": "Management Network",
"dPortGroup": "",
"ipAddress": "10.119.2.13",
"subnetMask": "255.255.255.128",
"mtu": 1500
},
...
],
"portGroups": [
{
"key": "key-vim.host.PortGroup-VM Network",
"name": "VM Network",
"vSwitch": "key-vim.host.VirtualSwitch-vSwitch0"
},
{
"key": "key-vim.host.PortGroup-Management Network",
"name": "Management Network",
"vSwitch": "key-vim.host.VirtualSwitch-vSwitch0"
},
...
],
"switches": [
{
"key": "key-vim.host.VirtualSwitch-vSwitch0",
"name": "vSwitch0",
"portGroups": [
"key-vim.host.PortGroup-VM Network",
"key-vim.host.PortGroup-Management Network"
],
"pNICs": [
"key-vim.host.PhysicalNic-vmnic4"
]
},
...
]
},
"networks": [
{
"kind": "Network",
"id": "network-31"
},
{
"kind": "Network",
"id": "network-34"
},
...
],
"datastores": [
{
"kind": "Datastore",
"id": "datastore-35"
},
{
"kind": "Datastore",
"id": "datastore-63"
}
],
"vms": null,
"networkAdapters": [],
"cluster": {
"id": "domain-c26",
"parent": {
"kind": "Folder",
"id": "group-h23"
},
"revision": 1,
"name": "V2V_Cluster",
"selfLink": "providers/vsphere/...324e/clusters/domain-c26",
"folder": "group-h23",
"networks": [
{
"kind": "Network",
"id": "network-31"
},
{
"kind": "Network",
"id": "network-34"
},
...
],
"datastores": [
{
"kind": "Datastore",
"id": "datastore-35"
},
{
"kind": "Datastore",
"id": "datastore-63"
}
],
"hosts": [
{
"kind": "Host",
"id": "host-44"
},
{
"kind": "Host",
"id": "host-29"
}
],
"dasEnabled": false,
"dasVms": [],
"drsEnabled": true,
"drsBehavior": "fullyAutomated",
"drsVms": [],
"datacenter": null
}
}
}
}