diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md index a96b5d78068..4f8e28af816 100644 --- a/.github/ISSUE_TEMPLATE/bug_report.md +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -7,7 +7,7 @@ assignees: '' --- -## Describe the bug +# Describe the bug `[Author TODO: A clear and concise description of what the bug is.]` @@ -40,6 +40,7 @@ assignees: '' `[Author TODO: Do you have any idea of what the solution might be?]` ## Checks + - [ ] Have you searched the Firecracker Issues database for similar problems? - [ ] Have you read the existing relevant Firecracker documentation? - [ ] Are you certain the bug being reported is a Firecracker issue? diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md index d23953ebd1d..e97efb2e630 100644 --- a/.github/ISSUE_TEMPLATE/feature_request.md +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -7,17 +7,20 @@ assignees: '' --- -## Why is this feature request important? What are the use cases? Please describe. +# Feature Request -`[Author TODO: A clear and concise description of what the problem is.]` +`[Author TODO: Why is this feature request important? What are the use cases? +Please describe.]` ## Describe the desired solution -`[Author TODO: A clear and concise description of how you would like the feature to work.]` +`[Author TODO: A clear and concise description of how you would like +the feature to work.]` ## Describe possible alternatives -`[Author TODO: A clear and concise description of any alternative solutions or features you have considered.]` +`[Author TODO: A clear and concise description of any alternative solutions +or features you have considered.]` `[Author TODO: How do you work around not having this feature?]` @@ -26,6 +29,7 @@ assignees: '' `[Author TODO: Add additional context about this feature request here.]` ## Checks + - [ ] Have you searched the Firecracker Issues database for similar requests? 
- [ ] Have you read all the existing relevant Firecracker documentation? - [ ] Have you read and understood Firecracker's core tenets? diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md index 344e7036efa..9af60610a44 100644 --- a/.github/pull_request_template.md +++ b/.github/pull_request_template.md @@ -1,4 +1,4 @@ -## Reason for This PR +# Reason for This PR `[Author TODO: add issue # or explain reasoning.]` diff --git a/CHANGELOG.md b/CHANGELOG.md index 9dfe51e1498..b00c5308bff 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,6 +3,7 @@ ## [Unreleased] ### Changed + - Changed Docker images repository from DockerHub to Amazon ECR. ## [0.24.0] @@ -23,7 +24,8 @@ ### Changed - Change the information provided in `DescribeInstance` command to provide microVM - state information (Not started/Running/Paused) instead of whether it's started or not. + state information (Not started/Running/Paused) instead of whether it's + started or not. - Removed the jailer `--extra-args` parameter. It was a noop, having been replaced by the `--` separator for extra arguments. - Changed the output of the `--version` command line parameter to include a list diff --git a/CHARTER.md b/CHARTER.md index 473d227f815..142236bf8bb 100644 --- a/CHARTER.md +++ b/CHARTER.md @@ -9,7 +9,7 @@ execution of container and function workloads. These tenets guide Firecracker's development: -1. **Built-In Security**: We provide compute security barriers that +1. **Built-In Security**: We provide compute security barriers that enable multi-tenant workloads, and cannot be mistakenly disabled by customers. Customer workloads are simultaneously considered sacred (shall not be touched) and malicious (shall be defended against). diff --git a/CREDITS.md b/CREDITS.md index 47bc2459fe9..f6e677a301e 100644 --- a/CREDITS.md +++ b/CREDITS.md @@ -15,7 +15,6 @@ written in Rust with a focus on safety and security. Thanks go to: * [Jason D. 
Clinton](https://github.com/jclinton) * Sonny Rao - Contributors to the Firecracker repository: * Aaron Hill diff --git a/FAQ.md b/FAQ.md index d6528479ff6..b6e25aeb87f 100644 --- a/FAQ.md +++ b/FAQ.md @@ -133,7 +133,7 @@ Example of a kernel valid command line that enables the serial console (which goes in the `boot_args` field of the `/boot-source` Firecracker API resource): -``` +```console console=ttyS0 reboot=k panic=1 pci=off nomodules ``` @@ -156,10 +156,12 @@ calls into kvm ptp instead of actual network NTP traffic. To be able to do this you need to have a guest kernel compiled with `KVM_PTP` support: -``` + +```console CONFIG_PTP_1588_CLOCK=y CONFIG_PTP_1588_CLOCK_KVM=y ``` + Our [recommended guest kernel config](resources/microvm-kernel-x86_64.config) already has these included. @@ -169,8 +171,8 @@ Now `/dev/ptp0` should be available in the guest. Next you need to configure For example when using `chrony`: 1. Add `refclock PHC /dev/ptp0 poll 3 dpoll -2 offset 0` to the chrony conf -file (`/etc/chrony/chrony.conf`) -2. Restart the `chrony` daemon. + file (`/etc/chrony/chrony.conf`) +1. Restart the `chrony` daemon. You can see more info about the `refclock` parameters [here](https://chrony.tuxfamily.org/doc/3.4/chrony.conf.html#refclock). @@ -192,7 +194,7 @@ For example, when you create two network interfaces by calling `/network-interfaces/1` and then `/network-interfaces/0`, it may result in this mapping: -``` +```console /network-interfaces/1 -> eth0 /network-interfaces/0 -> eth1 ``` @@ -202,8 +204,9 @@ mapping: Firecracker does not implement ACPI and PM devices, therefore operations like gracefully rebooting or powering off the guest are supported in unconventional ways. -Running the `poweroff` or `halt` commands inside a Linux guest will bring it down but -Firecracker process remains unaware of the guest shutdown so it lives on. 
+Running the `poweroff` or `halt` commands inside a Linux guest will bring it +down but Firecracker process remains unaware of the guest shutdown so it lives +on. Running the `reboot` command in a Linux guest will gracefully bring down the guest system and also bring a graceful end to the Firecracker process. @@ -222,7 +225,7 @@ docs/rootfs-and-kernel-setup.md). If you see errors like ... -``` +```console [] fc_vmm: page allocation failure: order:6, mode:0x140c0c0 (GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null) [] fc_vmm cpuset= mems_allowed=0 @@ -233,6 +236,7 @@ allocation of 2^`order` bytes (in this case, 6) and there aren't sufficient contiguous pages. Possible mitigations are: + - Track the failing allocations in the `dmesg` output and rebuild the host kernel so as to use `vmalloc` instead of `kmalloc` for them. - Reduce memory pressure on the host. @@ -246,8 +250,9 @@ the microVM. One example of such file can be found at `tests/framework/vm_config ### Firecracker fails to start and returns an Out of Memory error -If the Firecracker process exits with `12` exit code (`Out of memory` error), the root -cause is that there is not enough memory on the host to be used by the Firecracker microVM. +If the Firecracker process exits with `12` exit code (`Out of memory` error), +the root cause is that there is not enough memory on the host to be used by the +Firecracker microVM. If the microVM was not configured in terms of memory size through an API request, the host needs to meet the minimum requirement in terms of free memory size, @@ -255,8 +260,9 @@ namely 128 MB of free memory which the microVM defaults to. ### Firecracker fails to start and returns "Resource busy" error -If another hypervisor like VMware or VirtualBox is running on the host and locks `/dev/kvm`, -Firecracker process will fail to start with "Resource busy" error. 
+If another hypervisor like VMware or VirtualBox is running on the host and +locks `/dev/kvm`, Firecracker process will fail to start with "Resource busy" +error. This issue can be resolved by terminating the other hypervisor running on the host, and allowing Firecracker to start. diff --git a/docs/RELEASE_POLICY.md b/docs/RELEASE_POLICY.md index 4bce5722edc..e6d15783f9b 100644 --- a/docs/RELEASE_POLICY.md +++ b/docs/RELEASE_POLICY.md @@ -8,12 +8,12 @@ customers effectively plan their Firecracker based operations. Firecracker uses [semantic versioning](http://semver.org/) for all releases. Semantic versions are comprised of three fields in the form: - - vMAJOR.MINOR.PATCH +`vMAJOR.MINOR.PATCH`. For example: v0.20.0, v0.22.0-beta5, and v99.123.77+foo.bar.baz.5. Firecracker publishes major, minor and patch releases: + * Patch release - The `PATCH` field is incremented whenever critical bugs and/or security issues are found in a supported release. The fixes in a PATCH release do not change existing behavior or the user interface. Upgrade is recommended. @@ -37,7 +37,7 @@ Firecracker publishes major, minor and patch releases: ## Release support The Firecracker maintainers will only provide support for Firecracker releases -under https://github.com/firecracker-microvm/firecracker/releases. +under our [repository's release page](https://github.com/firecracker-microvm/firecracker/releases). The Firecracker maintainers will provide patch releases for critical bugs and security issues when they are found, for: @@ -47,35 +47,34 @@ security issues when they are found, for: * any Firecracker `vMAJOR.MINOR` release for at least 6 months from release date; * for each `vMAJOR`, the latest `MINOR` for 1 year since release date; -#### Examples: +### Examples 1. 
Considering an example where the last Firecracker releases are: - * v2.10.0 released on 2022-05-01 - * v2.11.0 released on 2022-07-10 - * v2.12.0 released on 2022-09-11 + * v2.10.0 released on 2022-05-01 + * v2.11.0 released on 2022-07-10 + * v2.12.0 released on 2022-09-11 - In case of an event occurring in 2022-10-03, all three releases will be - patched since less than 6 months elapsed from their MINOR release time. + In case of an event occurring in 2022-10-03, all three releases will be + patched since less than 6 months elapsed from their MINOR release time. 1. Considering an example where the last Firecracker releases are: - * v2.10.0 released on 2022-05-01 - * v2.11.0 released on 2022-07-10 - * v2.12.0 released on 2022-09-11 + * v2.10.0 released on 2022-05-01 + * v2.11.0 released on 2022-07-10 + * v2.12.0 released on 2022-09-11 - In case of of an event occurring in 2023-05-04, v2.11 and v2.12 will be - patched since those were the last 2 Firecracker major releases and less than - an year passed since their release time. + In case of an event occurring in 2023-05-04, v2.11 and v2.12 will be + patched since those were the last 2 Firecracker major releases and less than + a year passed since their release time. 1. Considering an example where the last Firecracker releases are: - * v2.14.0 released on 2022-05-01 - * v3.0.0 released on 2022-07-10 - * v3.1.0 released on 2022-09-11 - - In case of of an event occurring in 2023-01-13, v2.14 will be patched since - is the last minor of v2 and has less than one year since release while v3.0 - and v3.1 will be patched since were the last two Firecracker releases and - less than 6 months have passed since release time. 
- + * v2.14.0 released on 2022-05-01 + * v3.0.0 released on 2022-07-10 + * v3.1.0 released on 2022-09-11 + + In case of an event occurring in 2023-01-13, v2.14 will be patched since it + is the last minor of v2 and has less than one year since release while v3.0 + and v3.1 will be patched since they were the last two Firecracker releases + and less than 6 months have passed since release time. ## Developer preview features diff --git a/docs/api_requests/actions.md b/docs/api_requests/actions.md index cc55464ec26..36186212199 100644 --- a/docs/api_requests/actions.md +++ b/docs/api_requests/actions.md @@ -56,7 +56,9 @@ the guest OS. For Linux, that means the guest kernel needs a few tens of milliseconds probing the device. This can be disabled by using these kernel command line parameters: -```i8042.noaux i8042.nomux i8042.nopnp i8042.dumbkbd``` +```console +i8042.noaux i8042.nomux i8042.nopnp i8042.dumbkbd +``` **Note2** This action is only supported on `x86_64` architecture. diff --git a/docs/api_requests/patch-block.md b/docs/api_requests/patch-block.md index 6a71f021944..68e2c5911e5 100644 --- a/docs/api_requests/patch-block.md +++ b/docs/api_requests/patch-block.md @@ -1,22 +1,37 @@ -## Updating block devices after boot +# Updating block devices after boot -Firecracker offers support to update attached block devices after the microVM has been started. This is provided via PATCH /drives API which notifies Firecracker that the underlying block file has been changed on the host. It should be called when the path to the block device is changed or if the file size has been modified. It is important to note that external changes to the block device file do not automatically trigger a notification in Firecracker so the explicit PATCH API call is mandatory. +Firecracker offers support to update attached block devices after the microVM +has been started. 
This is provided via PATCH /drives API which notifies +Firecracker that the underlying block file has been changed on the host. It +should be called when the path to the block device is changed or if the file +size has been modified. It is important to note that external changes to the +block device file do not automatically trigger a notification in Firecracker +so the explicit PATCH API call is mandatory. ## How it works -The implementation of the PATCH /drives API does not modify the host backing file. It only updates the emulation layer block device properties, path and length and then triggers a virtio device reconfiguration that is handled by the guest driver which will update the size of the raw block device. -With that being said, a sequence which performs resizing/altering of the block underlying host file followed by a PATCH /drives API call is not an atomic operation as the guest can also modify the block file via emulation during the sequence, if the raw block device is mounted or accessible. +The implementation of the PATCH /drives API does not modify the host backing +file. It only updates the emulation layer block device properties, path and +length and then triggers a virtio device reconfiguration that is handled by the +guest driver which will update the size of the raw block device. +With that being said, a sequence which performs resizing/altering of the block +underlying host file followed by a PATCH /drives API call is not an atomic +operation as the guest can also modify the block file via emulation during +the sequence, if the raw block device is mounted or accessible. ## Supported use case -This feature was designed to work with a cooperative guest in order to effectively simulate hot plug/unplug functionality for block devices. +This feature was designed to work with a cooperative guest in order to +effectively simulate hot plug/unplug functionality for block devices. 
The following guarantees need to be provided: -* guest did not mount the device -* guest does not read or write from the raw block device /dev/vdX during the update sequence +* guest did not mount the device +* guest does not read or write from the raw block device `/dev/vdX` during the + update sequence -Example sequence that configures a microVM with a placeholder drive and then updates it with the real one: +Example sequence that configures a microVM with a placeholder drive and then +updates it with the real one: ```bash # Create and set up a block device. @@ -26,11 +41,11 @@ curl --unix-socket ${socket} -i \ -X PUT "http://localhost/drives/scratch" \ -H "accept: application/json" \ -H "Content-Type: application/json" \ - -d "{ + -d "{ \"drive_id\": \"scratch\", - \"path_on_host\": \"${ro_drive_path}\", - \"is_root_device\": false, - \"is_read_only\": true + \"path_on_host\": \"${ro_drive_path}\", + \"is_root_device\": false, + \"is_read_only\": true }" # Finish configuring and start the microVM. Wait for the guest to boot. @@ -48,22 +63,33 @@ curl --unix-socket ${socket} -i \ -H "accept: application/json" \ -H "Content-Type: application/json" \ -d "{ - \"drive_id\": \"scratch\", - \"path_on_host\": \"${updated_ro_drive_path}\" + \"drive_id\": \"scratch\", + \"path_on_host\": \"${updated_ro_drive_path}\" }" - + # It's now safe to mount the block device in the guest and use it # with the updated backing file. ``` ## Data integrity and other issues -We do not recommend using this feature outside of its supported use case scope. If the required guarantees are not provided, data integrity and potential other issues may arise depending on the actual use case. There are two major aspects that need be considered here: +We do not recommend using this feature outside of its supported use case scope. +If the required guarantees are not provided, data integrity and potential other +issues may arise depending on the actual use case. 
There are two major aspects +that need to be considered here: ### Atomicity of the update sequence -If the guest has the opportunity to perform I/O against the block device during the update sequence it can either read data while it is changed or can overwrite data already written by a host process. For example a truncate operation can be undone if the guest issues a write for the last sector of the raw block device, or the guest application can become inconsistent or/and can create inconsistency in the block device itself. +If the guest has the opportunity to perform I/O against the block device during +the update sequence it can either read data while it is changed or can +overwrite data already written by a host process. For example a truncate +operation can be undone if the guest issues a write for the last sector of the +raw block device, or the guest application can become inconsistent or/and can +create inconsistency in the block device itself. ### In flight I/O requests -If the atomicity of the operation is guaranteed by using methods to make the microVM quiescence during the update sequence (for example pausing the microVM) the guest itself or block device can still become incosistent from in flight I/O requests in the guest that will be executed after it is resumed. +If the atomicity of the operation is guaranteed by using methods to make the +microVM quiescent during the update sequence (for example pausing the microVM) +the guest itself or block device can still become inconsistent from in flight +I/O requests in the guest that will be executed after it is resumed. diff --git a/docs/api_requests/patch-network-interface.md b/docs/api_requests/patch-network-interface.md index 17017636fca..aefe911d558 100644 --- a/docs/api_requests/patch-network-interface.md +++ b/docs/api_requests/patch-network-interface.md @@ -6,7 +6,7 @@ call. E.g. 
for a network interface created with: -``` +```console PUT /network-interfaces/iface_1 HTTP/1.1 Host: localhost Content-Type: application/json @@ -36,7 +36,7 @@ Accept: application/json A `PATCH` request can be sent at any future time, to update the rate limiters: -``` +```console PATCH /network-interfaces/iface_1 HTTP/1.1 Host: localhost Content-Type: application/json @@ -64,14 +64,12 @@ found in our [OpenAPI spec](../../src/api_server/swagger/firecracker.yaml). In the above example, the RX rate limit is updated, but the TX rate limit remains unchanged. - -# Removing Rate Limiting +## Removing Rate Limiting A rate limit can be disabled by providing a 0-sized token bucket. E.g., following the above example, the TX rate limit can be disabled with: - -``` +```console PATCH /network-interfaces/iface_1 HTTP/1.1 Host: localhost Content-Type: application/json diff --git a/docs/ballooning.md b/docs/ballooning.md index c335cdb4351..04a5534b3c2 100644 --- a/docs/ballooning.md +++ b/docs/ballooning.md @@ -18,17 +18,18 @@ then tries again. While the actual size of the balloon is larger than the target size, it will free memory until it hits the target size. The device can be configured with the following options: + * `deflate_on_oom`: if this is set to `true` and a guest process wants to -allocate some memory which would make the guest enter an out-of-memory state, -the kernel will take some pages from the balloon and give them to said -process instead asking the OOM killer process to kill some processes to free -memory. Note that this applies to allocations from guest processes which would -make the system enter an OOM state. This does not apply to instances when the -kernel needs memory for its activities (i.e. constructing caches), or when the -user requests more memory than the amount available through an inflate. 
+ allocate some memory which would make the guest enter an out-of-memory state, + the kernel will take some pages from the balloon and give them to said + process instead asking the OOM killer process to kill some processes to free + memory. Note that this applies to allocations from guest processes which would + make the system enter an OOM state. This does not apply to instances when the + kernel needs memory for its activities (i.e. constructing caches), or when the + user requests more memory than the amount available through an inflate. * `stats_polling_interval_s`: unsigned integer value which if set to 0 -disables the virtio balloon statistics and otherwise represents the interval -of time in seconds at which the balloon statistics are updated. + disables the virtio balloon statistics and otherwise represents the interval + of time in seconds at which the balloon statistics are updated. ## Security disclaimer @@ -36,12 +37,13 @@ of time in seconds at which the balloon statistics are updated. from a driver in the guest.** In normal conditions, the balloon device will: + * not change the target size, which is set directly by the host * consume exactly as many pages as required to achieve the target size * correctly update the value of the actual size of the balloon seen by the host * not use pages that were previously inflated if they were not returned to the -guest via a deflate operation (unless the `deflate_on_oom` flag was set and the -guest is in an out of memory state) + guest via a deflate operation (unless the `deflate_on_oom` flag was set and the + guest is in an out of memory state) * provide correct statistics when available However, Firecracker does not and cannot introspect into the guest to check the @@ -58,11 +60,12 @@ restrict the amount of memory available to the guest. 
It is also the users' responsibility to monitor the memory consumption of the VM and, in case unexpected increases in memory usage are observed, we recommend the following options: + * migrate the VM to a machine with higher memory availability through -snapshotting at the cost of disrupting the workload; + snapshotting at the cost of disrupting the workload; * kill the Firecracker process that exceeds memory restrictions; * enable swap with a sufficient amount of memory to handle the demand at the -cost of memory access speed; + cost of memory access speed; Users should also never rely solely on the statistics provided by the balloon when controlling the Firecracker process as they are provided directly by the @@ -101,7 +104,7 @@ configuration file given as a command line argument to the Firecracker process. Here is an example command on how to install the balloon through the API: -``` +```console socket_location=... amount_mb=... deflate_on_oom=... @@ -127,7 +130,7 @@ represents the target size of the balloon, and `deflate_on_oom` and To install the balloon via the JSON config file, insert the following JSON object into your configuration file: -``` +```console "balloon": { "amount_mb": 0, "deflate_on_oom": false, @@ -139,7 +142,7 @@ After installing the balloon device, users can poll the configuration of the device at any time by sending a GET request on "/balloon". Here is an example of such a request: -``` +```console socket_location=... curl --unix-socket $socket_location -i \ @@ -155,7 +158,7 @@ one used to configure the device (via a PUT request on "/balloon"). After it has been installed, the balloon device can only be operated via the API through the following command: -``` +```console socket_location=... amount_mb=... polling_interval=... @@ -180,7 +183,7 @@ in the balloon configuration to a non-zero value. If enabled, users can receive the latest balloon statistics by issuing a GET request on "/balloon". 
Here is an example of such a request: -``` +```console socket_location=... curl --unix-socket $socket_location -i \ @@ -208,25 +211,25 @@ As defined in the virtio 1.1 specification, the traditional virtio balloon device has support for the following statistics: * `VIRTIO_BALLOON_S_SWAP_IN`: The amount of memory that has been swapped in -(in bytes). + (in bytes). * `VIRTIO_BALLOON_S_SWAP_OUT`: The amount of memory that has been swapped out -to disk (in bytes). + to disk (in bytes). * `VIRTIO_BALLOON_S_MAJFLT`: The number of major page faults that have -occurred. + occurred. * `VIRTIO_BALLOON_S_MINFLT`: The number of minor page faults that have -occurred. + occurred. * `VIRTIO_BALLOON_S_MEMFREE`: The amount of memory not being used for any -purpose (in bytes). -* `VIRTIO_BALLOON_S_MEMTOT`: The total amount of memory available (in bytes). + purpose (in bytes). +* `VIRTIO_BALLOON_S_MEMTOT`: The total amount of memory available (in bytes). * `VIRTIO_BALLOON_S_AVAIL`: An estimate of how much memory is available (in -bytes) for starting new applications, without pushing the system to swap. + bytes) for starting new applications, without pushing the system to swap. * `VIRTIO_BALLOON_S_CACHES`: The amount of memory, in bytes, that can be -quickly reclaimed without additional I/O. Typically these pages are used for -caching files from disk. + quickly reclaimed without additional I/O. Typically these pages are used for + caching files from disk. * `VIRTIO_BALLOON_S_HTLB_PGALLOC`: The number of successful hugetlb page -allocations in the guest. + allocations in the guest. * `VIRTIO_BALLOON_S_HTLB_PGFAIL`: The number of failed hugetlb page allocations -in the guest. + in the guest. The driver is querried for updated statistics every time the amount of time specified in that field passes. The driver may not provide all the @@ -236,7 +239,7 @@ statistics are preserved. To change the statistics polling interval, users can sent a PATCH request on "/balloon/statistics". 
Here is an example of such a request: -``` +```console socket_location=... polling_interval=... diff --git a/docs/benchmarks/state-serialize.md b/docs/benchmarks/state-serialize.md index 7ad4fc9f171..cf1b51e1548 100644 --- a/docs/benchmarks/state-serialize.md +++ b/docs/benchmarks/state-serialize.md @@ -1,10 +1,13 @@ +# MicroVM state serialization benchmarks -### MicroVM state serialization benchmarks -The benchmarks have been performed using a synthetic state snapshot that contains 100 structs and a 10k element array. +The benchmarks have been performed using a synthetic state snapshot that +contains 100 structs and a 10k element array. Source code: [src/snapshot/benches/main.rs](../../src/snapshot/benches/main.rs). -Snapshot size: 83886 bytes. +Snapshot size: 83886 bytes. -### Host configuration +## Host configuration + +```console - Architecture: x86_64 - CPU op-mode(s): 32-bit, 64-bit - Byte Order: Little Endian @@ -28,9 +31,24 @@ Snapshot size: 83886 bytes. - L2 cache: 256K - L3 cache: 4096K - NUMA node0 CPU(s): 0-3 -- Flags: `fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d` +- Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca + cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht + tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art + arch_perfmon pebs bts 
rep_good nopl xtopology + nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 + monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr + pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt + tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm + 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd + ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid + fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms + invpcid rtm mpx rdseed adx smap clflushopt intel_pt + xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln + pts hwp hwp_notify hwp_act_window hwp_epp md_clear + flush_l1d +``` -### Current baseline +## Current baseline | Test | Mean | |---------------------|---------------| diff --git a/docs/dev-machine-setup.md b/docs/dev-machine-setup.md index 3ef09c5630f..ed4de5b906d 100644 --- a/docs/dev-machine-setup.md +++ b/docs/dev-machine-setup.md @@ -21,38 +21,36 @@ Note that Firecracker development on macOS has no hard dependency on VMware Fusion or Ubuntu. All that is required is a Linux VM that supports nested virtualization. This is but one example of that setup: -1. Download and install -[VMware Fusion](https://www.vmware.com/products/fusion/fusion-evaluation.html). -2. Download an [Ubuntu 18.04.2 LTS](https://www.ubuntu.com/download/desktop) -ISO image. -3. Open VMware Fusion, open the **File** menu, and select **New...** to bring up -the **Select the Installation Method** window. -4. Find the ISO image you downloaded in step 2, and drag it onto the VMware -window opened in step 3. -5. You should now be at the **Create a New Virtual Machine** window. Ensure the -Ubuntu 18.04.2 image is highlighted, and click **Continue**. -6. On the **Linux Easy Install** window, leave the **Use Easy Install** option -checked, enter a password, and click **Continue**. -7. On the **Finish** window, click **Finish**, and save the `.vmwarevm` file if -prompted. -8. After the VM starts up, open the **Virtual Machine** menu, and select **Shut -Down**. -9. 
After the VM shuts down, open the **Virtual Machine** menu, and select -**Settings...**. -10. From the settings window, select **Processors & Memory**, and then unfurl -the **Advanced options** section. -11. Check the **Enable hypervisor applications in this virtual machine** option, -close the settings window, open the **Virtual Machine** menu, and select **Start -Up**. -12. If you receive a **Cannot connect the virtual device sata0:1 because no -corresponding device is available on the host.** error, you can respond **No** -to the prompt. -13. Once the VM starts up, log in as the user you created in step 6. -14. After logging in, open the **Terminal** app, and run -`sudo apt install curl -y` to install cURL. -15. Now you can continue with the Firecracker -[Getting Started](getting-started.md) instructions to install and configure -Firecracker in the new VM. +1. Download and install [VMware Fusion](https://www.vmware.com/products/fusion/fusion-evaluation.html). +1. Download an [Ubuntu 18.04.2 LTS](https://www.ubuntu.com/download/desktop) + ISO image. +1. Open VMware Fusion, open the **File** menu, and select **New...** to bring up + the **Select the Installation Method** window. +1. Find the ISO image you downloaded in step 2, and drag it onto the VMware + window opened in step 3. +1. You should now be at the **Create a New Virtual Machine** window. Ensure the + Ubuntu 18.04.2 image is highlighted, and click **Continue**. +1. On the **Linux Easy Install** window, leave the **Use Easy Install** option + checked, enter a password, and click **Continue**. +1. On the **Finish** window, click **Finish**, and save the `.vmwarevm` file if + prompted. +1. After the VM starts up, open the **Virtual Machine** menu, + and select **Shut Down**. +1. After the VM shuts down, open the **Virtual Machine** menu, and select + **Settings...**. +1. From the settings window, select **Processors & Memory**, and then unfurl + the **Advanced options** section. +1. 
Check the **Enable hypervisor applications in this virtual machine** option, + close the settings window, open the **Virtual Machine** menu, and select **Start + Up**. +1. If you receive a **Cannot connect the virtual device sata0:1 because no + corresponding device is available on the host.** error, you can respond **No** + to the prompt. +1. Once the VM starts up, log in as the user you created in step 6. +1. After logging in, open the **Terminal** app, and run + `sudo apt install curl -y` to install cURL. +1. Now you can continue with the Firecracker [Getting Started](getting-started.md) + instructions to install and configure Firecracker in the new VM. ## Cloud @@ -62,25 +60,28 @@ Firecracker development environment on AWS can be setup using bare metal instanc Follow these steps to create a bare metal instance. 1. If you don't already have an AWS account, create one using the [AWS Portal](https://portal.aws.amazon.com/billing/signup). -1. Login to [AWS console](https://console.aws.amazon.com/console/home). You must select a region that offers bare metal EC2 instances. To check which regions support bare-metal, visit [Amazon EC2 On-Demand Pricing](https://aws.amazon.com/ec2/pricing/on-demand/) and look for `*.metal` instance types. +1. Login to [AWS console](https://console.aws.amazon.com/console/home). You must + select a region that offers bare metal EC2 instances. To check which regions + support bare-metal, visit [Amazon EC2 On-Demand Pricing](https://aws.amazon.com/ec2/pricing/on-demand/) + and look for `*.metal` instance types. 1. Click on `Launch a virtual machine` in `Build Solution` section. 1. Firecracker requires a relatively new kernel, so you should use a recent -Linux distribution - such as `Ubuntu Server 18.04 LTS (HVM), SSD Volume Type`. + Linux distribution - such as `Ubuntu Server 18.04 LTS (HVM), SSD Volume Type`. 1. In `Step 2`, scroll to the bottom and select `i3.metal` instance type. Click - on `Next: Configure Instance Details`. 
+ on `Next: Configure Instance Details`. 1. In `Step 3`, click on `Next: Add Storage`. 1. In `Step 4`, click on `Next: Add Tags`. 1. In `Step 5`, click on `Next: Configure Security Group`. 1. In `Step 6`, take the default security group. This opens up port 22 and is -needed so that you can ssh into the machine later. Click on `Review and Launch`. + needed so that you can ssh into the machine later. Click on `Review and Launch`. 1. Verify the details and click on `Launch`. If you do not have an existing -key pair, then you can select `Create a new key pair` to create a key pair. -This is needed so that you can use it later to ssh into the machine. + key pair, then you can select `Create a new key pair` to create a key pair. + This is needed so that you can use it later to ssh into the machine. 1. Click on the instance id in the green box. Copy `Public DNS` from the -`Description` tab of the selected instance. + `Description` tab of the selected instance. 1. Login to the newly created instance: - ``` + ```console ssh -i ubuntu@ ``` @@ -98,89 +99,89 @@ Here is a brief summary of steps to create such a setup (full instructions to set up a Ubuntu-based VM on GCE with nested KVM enablement can be found in GCE [documentation](https://cloud.google.com/compute/docs/instances/enable-nested-virtualization-vm-instances)). - 1. Select a GCP project and zone - - ``` - $ FC_PROJECT = your_name-firecracker - $ FC_REGION = us-east1 - $ FC_ZONE = us-east1-b - ``` - -
Click here for instructions to create a new project -

- It might be convenient to keep your Firecracker-related GCP resources in - a separate project, so that you can keep track of resources more easily - and remove everything easily once your are done. - - For convenience, give the project a unique name (e.g., - your_name-firecracker), so that GCP does not need to create a project - id different than project name (by appending randomized numbers to the - name you provide). - - ``` - $ gcloud projects create ${FC_PROJECT} --enable-cloud-apis --set-as-default - ``` - -

-
- - ``` - $ gcloud config set project ${FC_PROJECT} - $ gcloud config set compute/region ${FC_REGION} - $ gcloud config set compute/zone ${FC_ZONE} - ``` - - 1. The next step is to create a VM image able to run nested KVM (as outlined - [here](https://cloud.google.com/compute/docs/instances/enable-nested-virtualization-vm-instances)). - - **IMPORTANT:** Notice that Firecracker requires a relatively new kernel, - so you should use a recent Linux distribution image - such as Ubuntu 18 - (used in the commands below), or equivalent. - - ``` - $ FC_VDISK=disk-ub18 - $ FC_IMAGE=ub18-nested-kvm - $ gcloud compute disks create ${FC_VDISK}\ - --image-project ubuntu-os-cloud --image-family ubuntu-1804-lts - $ gcloud compute images create ${FC_IMAGE} --source-disk ${FC_VDISK}\ - --licenses "https://www.googleapis.com/compute/v1/projects/vm-options/global/licenses/enable-vmx"\ - --source-disk-zone ${FC_ZONE} - ``` - - 1. Now we create the VM: - - ``` - $ FC_VM = firecracker-vm - $ gcloud compute instances create ${FC_VM} --zone ${FC_ZONE}\ - --image ${FC_IMAGE} - ``` - - 1. Connect to the VM via SSH. - - ``` - $ gcloud compute ssh ${FC_VM} - ``` - - When doing it for the first time, a key-pair will be created for you - (you will be propmpted for a passphrase - can just keep it empty) and - uploaded to GCE. Done! You should see the prompt of the new VM: - - ``` - ubuntu@firecracker-vm:~$ - ``` - - 1. Verify that VMX is enabled, enable KVM - - ``` - $ grep -cw vmx /proc/cpuinfo - 1 - $ sudo setfacl -m u:${USER}:rw /dev/kvm - $ [ -r /dev/kvm ] && [ -w /dev/kvm ] && echo "OK" || echo "FAIL" - OK - ``` - - Now you can continue with the Firecracker [Getting Started](getting-started.md) - instructions to install and configure Firecracker in the new VM. +1. Select a GCP project and zone + + ```console + $ FC_PROJECT=your_name-firecracker + $ FC_REGION=us-east1 + $ FC_ZONE=us-east1-b + ``` + +
Click here for instructions to create a new project +

+ It might be convenient to keep your Firecracker-related GCP resources in + a separate project, so that you can keep track of resources more easily + and remove everything easily once you are done. + + For convenience, give the project a unique name (e.g., + your_name-firecracker), so that GCP does not need to create a project + id different than project name (by appending randomized numbers to the + name you provide). + + ```console + $ gcloud projects create ${FC_PROJECT} --enable-cloud-apis --set-as-default + ``` + +

+
+ + ``` + $ gcloud config set project ${FC_PROJECT} + $ gcloud config set compute/region ${FC_REGION} + $ gcloud config set compute/zone ${FC_ZONE} + ``` + +1. The next step is to create a VM image able to run nested KVM (as outlined + [here](https://cloud.google.com/compute/docs/instances/enable-nested-virtualization-vm-instances)). + + **IMPORTANT:** Notice that Firecracker requires a relatively new kernel, + so you should use a recent Linux distribution image - such as Ubuntu 18 + (used in the commands below), or equivalent. + + ``` + $ FC_VDISK=disk-ub18 + $ FC_IMAGE=ub18-nested-kvm + $ gcloud compute disks create ${FC_VDISK}\ + --image-project ubuntu-os-cloud --image-family ubuntu-1804-lts + $ gcloud compute images create ${FC_IMAGE} --source-disk ${FC_VDISK}\ + --licenses "https://www.googleapis.com/compute/v1/projects/vm-options/global/licenses/enable-vmx"\ + --source-disk-zone ${FC_ZONE} + ``` + +1. Now we create the VM: + + ``` + $ FC_VM=firecracker-vm + $ gcloud compute instances create ${FC_VM} --zone ${FC_ZONE}\ + --image ${FC_IMAGE} + ``` + +1. Connect to the VM via SSH. + + ``` + $ gcloud compute ssh ${FC_VM} + ``` + + When doing it for the first time, a key-pair will be created for you + (you will be prompted for a passphrase - you can leave it empty) and + uploaded to GCE. Done! You should see the prompt of the new VM: + + ``` + ubuntu@firecracker-vm:~$ + ``` + +1. Verify that VMX is enabled, enable KVM + + ``` + $ grep -cw vmx /proc/cpuinfo + 1 + $ sudo setfacl -m u:${USER}:rw /dev/kvm + $ [ -r /dev/kvm ] && [ -w /dev/kvm ] && echo "OK" || echo "FAIL" + OK + ``` + +Now you can continue with the Firecracker [Getting Started](getting-started.md) +instructions to install and configure Firecracker in the new VM. #### Addendum @@ -188,29 +189,30 @@ set up a Ubuntu-based VM on GCE with nested KVM enablement can be found in GCE In a nutshell, setting up a GCP account involves the following steps: - 1. 
Log in to GCP [console](https://console.cloud.google.com/) with your - Google credentials. If you don't have account, you will be prompted to join - the trial. +1. Log in to GCP [console](https://console.cloud.google.com/) with your + Google credentials. If you don't have an account, you will be prompted to join + the trial. - 1. Install GCP CLI & SDK (full instructions can be found - [here](https://cloud.google.com/sdk/docs/quickstart-debian-ubuntu)) +1. Install GCP CLI & SDK (full instructions can be found + [here](https://cloud.google.com/sdk/docs/quickstart-debian-ubuntu)). - ``` - $ export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" - $ echo "deb http://packages.cloud.google.com/apt $CLOUD_SDK_REPO main"\ - | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list - $ curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add - - $ sudo apt-get update && sudo apt-get install -y google-cloud-sdk - ``` + ```console + $ export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" + $ echo "deb http://packages.cloud.google.com/apt $CLOUD_SDK_REPO main"\ + | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list + $ curl https://packages.cloud.google.com/apt/doc/apt-key.gpg \ + | sudo apt-key add - + $ sudo apt-get update && sudo apt-get install -y google-cloud-sdk + ``` - 1. Configure the `gcloud` CLI by running: +1. Configure the `gcloud` CLI by running: - ``` - $ gcloud init --console-only - ``` + ```console + $ gcloud init --console-only + ``` - Follow the prompts to authenticate (open the provided link, authenticate, - copy the token back to console) and select the default project. + Follow the prompts to authenticate (open the provided link, authenticate, + copy the token back to console) and select the default project. 
### Microsoft Azure diff --git a/docs/devctr-image.md b/docs/devctr-image.md index b966c7d51e4..d48208f2890 100644 --- a/docs/devctr-image.md +++ b/docs/devctr-image.md @@ -1,6 +1,6 @@ # Publishing a New Container Image -## What's the Container Image? +## About the Container Image Firecracker uses a [Docker container](https://www.docker.com/) to standardize the build process. This also fixes the build tools and dependencies to specific @@ -25,7 +25,8 @@ registry. The Firecracker CI suite must also be updated to use the new image. access to the repository: ```bash - aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws + aws ecr-public get-login-password --region us-east-1 \ + | docker login --username AWS --password-stdin public.ecr.aws ``` 1. Navigate to the Firecracker directory. Verify that you have the latest @@ -33,8 +34,8 @@ registry. The Firecracker CI suite must also be updated to use the new image. ```bash docker images - REPOSITORY TAG IMAGE ID CREATED SIZE - public.ecr.aws/firecracker/fcuvm v26 8d00deb17f7a 2 weeks ago 2.41GB + REPOSITORY TAG IMAGE ID CREATED SIZE + public.ecr.aws/firecracker/fcuvm v26 8d00deb17f7a 2 weeks ago 2.41GB ``` 1. Make your necessary changes, if any, to the @@ -53,9 +54,9 @@ registry. The Firecracker CI suite must also be updated to use the new image. ```bash docker images - REPOSITORY TAG IMAGE ID CREATED SIZE - fcuvm latest 1f9852368efb 2 minutes ago 2.36GB - public.ecr.aws/firecracker/fcuvm v26 8d00deb17f7a 2 weeks ago 2.41GB + REPOSITORY TAG IMAGE ID CREATED SIZE + fcuvm latest 1f9852368efb 2 weeks ago 2.36GB + public.ecr.aws/firecracker/fcuvm v26 8d00deb17f7a 2 weeks ago 2.41GB ``` 1. Tag the new image with the next available version and the architecture @@ -65,10 +66,10 @@ registry. The Firecracker CI suite must also be updated to use the new image. 
docker tag 1f9852368efb public.ecr.aws/firecracker/fcuvm:v26_x86_64 docker images - REPOSITORY TAG IMAGE ID CREATED SIZE - fcuvm latest 1f9852368efb 5 minutes ago 2.36GB - public.ecr.aws/firecracker/fcuvm v27_x86_64 1f9852368efb 5 minutes ago 2.36GB - public.ecr.aws/firecracker/fcuvm v26 8d00deb17f7a 2 weeks ago 2.41GB + REPOSITORY TAG IMAGE ID CREATED + fcuvm latest 1f9852368efb 1 week ago + public.ecr.aws/firecracker/fcuvm v27_x86_64 1f9852368efb 1 week ago + public.ecr.aws/firecracker/fcuvm v26 8d00deb17f7a 2 weeks ago ``` 1. Push the image. @@ -83,56 +84,61 @@ Login to the `aarch64` build machine. Steps 1-4 are identical across architectures, change `x86_64` to `aarch64`. -Then: +Then continue with the above steps: -5. Build a new container image with the updated Dockerfile. +1. Build a new container image with the updated Dockerfile. ```bash docker build -t fcuvm -f tools/devctr/Dockerfile.aarch64 . ``` -5. Verify that the new image exists. +1. Verify that the new image exists. ```bash docker images - REPOSITORY TAG IMAGE ID CREATED SIZE - fcuvm latest 1f9852368efb 2 minutes ago 2.36GB - public.ecr.aws/firecracker/fcuvm v26 8d00deb17f7a 2 weeks ago 2.41GB + REPOSITORY TAG IMAGE ID CREATED + fcuvm latest 1f9852368efb 2 minutes ago + public.ecr.aws/firecracker/fcuvm v26 8d00deb17f7a 2 weeks ago ``` -5. Tag the new image with the next available version and the architecture +1. Tag the new image with the next available version and the architecture you're on. ```bash docker tag 1f9852368efb public.ecr.aws/firecracker/fcuvm:v26_aarch64 docker images - REPOSITORY TAG IMAGE ID CREATED SIZE - fcuvm latest 1f9852368efb 5 minutes ago 2.36GB - public.ecr.aws/firecracker/fcuvm v27_aarch64 1f9852368efb 5 minutes ago 2.36GB - public.ecr.aws/firecracker/fcuvm v26 8d00deb17f7a 2 weeks ago 2.41GB + REPOSITORY TAG IMAGE ID + fcuvm latest 1f9852368efb + public.ecr.aws/firecracker/fcuvm v27_aarch64 1f9852368efb + public.ecr.aws/firecracker/fcuvm v26 8d00deb17f7a ``` -5. 
Push the image. +1. Push the image. ```bash docker push public.ecr.aws/firecracker/fcuvm:v27_aarch64 ``` -5. Create a manifest to point the latest container version to each specialized +1. Create a manifest to point the latest container version to each specialized image, per architecture. ```bash - docker manifest create public.ecr.aws/firecracker/fcuvm/dev:v27 public.ecr.aws/firecracker/fcuvm/dev:v27_x86_64 public.ecr.aws/firecracker/fcuvm/dev:v27_aarch64 + docker manifest create public.ecr.aws/firecracker/fcuvm/dev:v27 \ + public.ecr.aws/firecracker/fcuvm/dev:v27_x86_64 public.ecr.aws/firecracker/fcuvm/dev:v27_aarch64 + docker manifest push public.ecr.aws/firecracker/fcuvm/dev:v27 ``` -5. Update the image tag in the +1. Update the image tag in the [`devtool` script](https://github.com/firecracker-microvm/firecracker/blob/master/tools/devtool). Commit and push the change. ```bash - sed -i 's%DEVCTR_IMAGE="public.ecr.aws/firecracker/fcuvm:v26"%DEVCTR_IMAGE="public.ecr.aws/firecracker/fcuvm:v27"%' tools/devtool + PREV_IMAGE=public.ecr.aws/firecracker/fcuvm:v26 + CURR_IMAGE=public.ecr.aws/firecracker/fcuvm:v27 + sed -i "s%DEVCTR_IMAGE=\"$PREV_IMAGE\"%DEVCTR_IMAGE=\"$CURR_IMAGE\"%" \ + tools/devtool ``` ## Troubleshooting @@ -151,7 +157,7 @@ See [this article](https://medium.com/@mauridb/docker-multi-architecture-images-365a44c26be6) for explanations and fix. -### How can I test the image after pushing it to the Docker registry? +### How to test the image after pushing it to the Docker registry Either fetch and run it locally on another machine than the one you used to build it, or clean up any artifacts from the build machine and fetch. @@ -217,8 +223,8 @@ Let's say you want to update ```bash docker ps - CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES - e9f0487fdcb9 fcuvm/dev:v14 "bash" 53 seconds ago Up 52 seconds zen_beaver + CONTAINER ID IMAGE COMMAND CREATED + e9f0487fdcb9 fcuvm/dev:v14 "bash" 53 seconds ago ``` 1. 
Commit the modified container to a new image. Use the `container ID`. @@ -229,9 +235,9 @@ Let's say you want to update ```bash docker image ls - REPOSITORY TAG IMAGE ID CREATED SIZE - fcuvm/dev v15_x86_64 514581e654a6 18 seconds ago 2.31GB - fcuvm/dev v14 c8581789ead3 2 months ago 2.31GB + REPOSITORY TAG IMAGE ID CREATED + fcuvm/dev v15_x86_64 514581e654a6 18 seconds ago + fcuvm/dev v14 c8581789ead3 2 months ago ``` 1. Repeat for `aarch64`. diff --git a/docs/device-api.md b/docs/device-api.md index e1cd7bf6302..e213ae8be9e 100644 --- a/docs/device-api.md +++ b/docs/device-api.md @@ -1,4 +1,4 @@ -# Device-API Functionality +# Device The Device-API following functionality matrix indicates which devices are required for an API call to be usable. @@ -28,14 +28,14 @@ depends on the device (see [issue #2173](https://github.com/firecracker-microvm/ | `machine-config` | O | O | O | O | O | | `metrics` | O | O | O | O | O | | `mmds` | O | O | O | **R** | O | -| `mmds/config` | O | O | O | O\* | O | +| `mmds/config` | O | O | O | O* | O | | `network-interfaces/{id}` | O | O | O | **R** | O | | `snapshot/create` | O | O | O | O | O | | `snapshot/load` | O | O | O | O | O | | `vm` | O | O | O | O | O | | `vsock` | O | O | O | O | O | -\*: See [issue #2174](https://github.com/firecracker-microvm/firecracker/issues/2174) +*: See [issue #2174](https://github.com/firecracker-microvm/firecracker/issues/2174) ## Input Schema @@ -97,7 +97,8 @@ specification: [firecracker.yaml](./../src/api_server/swagger/firecracker.yaml). | | uds_path | O | O | O | O | **R** | | | vsock_id | O | O | O | O | **R** | -\*: The `TokenBucket` can be configured with either the virtio-net or virtio-block drivers, or both. +\*: The `TokenBucket` can be configured with either the virtio-net +or virtio-block drivers, or both. 
## Output Schema diff --git a/docs/getting-started.md b/docs/getting-started.md index d34134b4a73..cb91842ccbe 100644 --- a/docs/getting-started.md +++ b/docs/getting-started.md @@ -17,7 +17,7 @@ instance using Ubuntu 18.04 on EC2. Firecracker uses [KVM](https://www.linux-kvm.org) and needs read/write access that can be granted as shown below: -``` +```console sudo setfacl -m u:${USER}:rw /dev/kvm ``` @@ -51,16 +51,16 @@ and run it on your x86_64 or aarch64 Linux machine. On the EC2 instance, this binary can be downloaded as: ```wrap -latest=$(basename $(curl -fsSLI -o /dev/null -w %{url_effective} https://github.com/firecracker-microvm/firecracker/releases/latest)) -``` - -```wrap -curl -L https://github.com/firecracker-microvm/firecracker/releases/download/${latest}/firecracker-${latest}-$(uname -m).tgz | tar -xz +release_url="https://github.com/firecracker-microvm/firecracker/releases" +latest=$(basename $(curl -fsSLI -o /dev/null -w %{url_effective} ${release_url}/latest)) +arch=`uname -m` +curl -L ${release_url}/download/${latest}/firecracker-${latest}-${arch}.tgz \ +| tar -xz ``` Rename the binary to "firecracker": -``` +```console mv firecracker-${latest}-$(uname -m) firecracker ``` @@ -274,9 +274,13 @@ git checkout tags/v0.10.1 Within the Firecracker repository root directory: -1. with the __default__ musl target: ```tools/devtool build``` -1. (__Experimental only__) using the gnu target: -```tools/devtool build -l gnu``` +1. With the __default__ musl target: + + ```tools/devtool build``` + +1. 
(__Experimental only__) using the gnu target: + + ```tools/devtool build -l gnu``` This will build and place the two Firecracker binaries at: diff --git a/docs/initrd.md b/docs/initrd.md index 08536e9fe69..557c7d5f9ff 100644 --- a/docs/initrd.md +++ b/docs/initrd.md @@ -4,13 +4,16 @@ ### Based on alpine or suse -You can use this script to generate an initrd either based on alpine or suse linux: https://github.com/marcov/firecracker-initrd +You can use the script found [here](https://github.com/marcov/firecracker-initrd) +to generate an initrd either based on alpine or suse linux. -The script extracts the init system from each distribution and creates a initrd. +The script extracts the init system from each distribution and creates a +initrd. ### Custom -Use this option for creating an initrd if you're building your own init or if you need any specific files / logic in your initrd. +Use this option for creating an initrd if you're building your own init or if +you need any specific files / logic in your initrd. ```bash mkdir initrd @@ -40,5 +43,8 @@ curl --unix-socket /tmp/firecracker.socket -i \ - You should not use a drive with `is_root_device: true` when using an initrd - Make sure your kernel configuration has `CONFIG_BLK_DEV_INITRD=y` -- If you don't want to place your init at the root of your initrd, you can add `rdinit=/path/to/init` to your `boot_args` property -- If you intend to `pivot_root` in your init, it won't be possible because the initrd is mounted as a rootfs and cannot be unmounted. You will need to use `switch_root` instead. \ No newline at end of file +- If you don't want to place your init at the root of your initrd, you can add + `rdinit=/path/to/init` to your `boot_args` property +- If you intend to `pivot_root` in your init, it won't be possible because the + initrd is mounted as a rootfs and cannot be unmounted. You will need to use + `switch_root` instead. 
\ No newline at end of file diff --git a/docs/jailer.md b/docs/jailer.md index 867dcbc61f8..951b00b265a 100644 --- a/docs/jailer.md +++ b/docs/jailer.md @@ -36,9 +36,9 @@ jailer --id \ `=` (e.g cpuset.cpus=0). This argument can be used multiple times to set multiple cgroups. This is useful to avoid providing privileged permissions to another process for setting the cgroups before or after the jailer is executed. - The `--cgroup` flag can help as well to set Firecracker process cgroups before the - VM starts running, with no need to create the entire cgroup hierarchy manually (which - requires privileged permissions). + The `--cgroup` flag can help as well to set Firecracker process cgroups + before the VM starts running, with no need to create the entire cgroup + hierarchy manually (which requires privileged permissions). - `chroot_base` represents the base folder where chroot jails are built. The default is `/srv/jailer`. - `netns` represents the path to a network namespace handle. If present, the @@ -195,7 +195,7 @@ STDERR)`. Close `dev_null_fd`, because it is no longer necessary. Finally, the jailer switches the `uid` to `123`, and `gid` to `100`, and execs -``` +```console ./firecracker \ --id="551e7604-e35c-42b3-b825-416853441234" \ --start-time-us= \ @@ -206,7 +206,7 @@ Now firecracker creates the socket at `/srv/jailer/firecracker/551e7604-e35c-42b3-b825-416853441234/root/` to interact with the VM. -Note: default value for is `/run/firecracker.socket` +Note: default value for `` is `/run/firecracker.socket`. ### Observations diff --git a/docs/logger.md b/docs/logger.md index 029a1419491..fa6e69959ab 100644 --- a/docs/logger.md +++ b/docs/logger.md @@ -68,7 +68,7 @@ warnings etc.(depending on the level). 
If the path provided is a named pipe, you can use the script below to read from it: -```shell script +```shell logs=logs.fifo while true diff --git a/docs/metrics.md b/docs/metrics.md index 7e07239ed40..e420adefa87 100644 --- a/docs/metrics.md +++ b/docs/metrics.md @@ -42,12 +42,12 @@ The metrics get flushed in two ways: * without user intervention every 60 seconds; * upon user demand, by issuing a `FlushMetrics` request. You can -find how to use this request in the [actions API](api_requests/actions.md). + find how to use this request in the [actions API](api_requests/actions.md). If the path provided is a named pipe, you can use the script below to read from it: -```shell script +```shell metrics=metrics.fifo while true diff --git a/docs/mmds/mmds-design.md b/docs/mmds/mmds-design.md index 8305a9b82b1..a5f0994fbb2 100644 --- a/docs/mmds/mmds-design.md +++ b/docs/mmds/mmds-design.md @@ -43,10 +43,10 @@ Patch functionality, based on [RFC 7396](https://tools.ietf.org/html/rfc7396). M related API requests come from the host, which is considered a trusted environment, so there are no checks beside the kind of validation done by HTTP server and `serde-json` (the crate used to de/serialize JSON). There is no maximum limit for - the stored metadata size, but one might consider that storing and retrieving large - amount of data may induce bottlenecks for the HTTP REST API processing, which is - based on `micro-http` crate. MMDS contents can be retrieved using the Firecracker - API, via a `GET` request to the `/mmds` resource. +the stored metadata size, but one might consider that storing and retrieving large +amount of data may induce bottlenecks for the HTTP REST API processing, which is +based on `micro-http` crate. MMDS contents can be retrieved using the Firecracker +API, via a `GET` request to the `/mmds` resource. ## The data store @@ -108,9 +108,10 @@ a reply. 
### MMDS Network Stack Somewhat confusingly, this is the name of the component which taps the device -model. It has a user-configured IPv4 address (see +model. It has a user-configured IPv4 address (see [Firecracker MMDS configuration API](../../src/api_server/swagger/firecracker.yaml)) -and MAC (`06:01:23:45:67:01`) addresses. The latter is also used to respond to ARP requests. +and MAC (`06:01:23:45:67:01`) addresses. The latter is also used to respond to +ARP requests. For every frame coming from the guest, the following steps take place: 1. Apply a heuristic to determine whether the frame may contain an ARP request diff --git a/docs/mmds/mmds-user-guide.md b/docs/mmds/mmds-user-guide.md index b905cc26958..02dafc8749d 100644 --- a/docs/mmds/mmds-user-guide.md +++ b/docs/mmds/mmds-user-guide.md @@ -4,7 +4,7 @@ The Firecracker microVM Metadata Service (MMDS) is a mutable data store which can be used for sharing information between host and guests, in a secure and easy at hand way. -# Activating the microVM Metadata Service +## Activating the microVM Metadata Service By default, MMDS is not reachable from the guest operating system. At microVM runtime, MMDS is tightly coupled with a network interface, which allows MMDS @@ -29,11 +29,11 @@ curl --unix-socket /tmp/firecracker.socket -i \ }' ``` -# Configuring the microVM Metadata Service +## Configuring the microVM Metadata Service MMDS can be configured pre-boot only, using the Firecracker API server. This can be achieved through an HTTP `PUT` request to `/mmds/config` resource. The -complete MMDS configuration API is described in the +complete MMDS configuration API is described in the [firecracker swagger file](../../src/api_server/swagger/firecracker.yaml). At the moment, MMDS is configurable with respect to the IPv4 address used by @@ -41,7 +41,7 @@ guest applications when issuing requests to MMDS. 
If MMDS configuration is not provided before booting up the guest, the MMDS IPv4 address defaults to `169.254.169.254`. -### Example +The IPv4 address for issuing requests to the MMDS can be configured like this: ```bash MMDS_IPV4_ADDR=169.254.170.2 @@ -56,9 +56,8 @@ curl --unix-socket /tmp/firecracker.socket -i \ MMDS is tightly coupled with a network interface which is used to route MMDS packets. To send MMDS intended packets, guest applications must insert a new rule into the routing table of the guest OS. This new rule must forward MMDS -intended packets to a network interface which allows MMDS requests. - -### Example +intended packets to a network interface which allows MMDS requests. For +example: ```bash MMDS_IPV4_ADDR=169.254.170.2 @@ -66,21 +65,20 @@ MMDS_NET_IF=eth0 ip route add ${MMDS_IPV4_ADDR} dev ${MMDS_NET_IF} ``` -# Inserting and updating metadata +## Inserting and updating metadata Inserting and updating metadata is possible through the Firecracker API server. The metadata inserted in MMDS must be any valid JSON. An user can create or update the MMDS data store before the microVM is started or during its operation. To insert metadata into MMDS, an HTTP `PUT` request to the `/mmds` resource has to be issued. This request must have a payload with metadata structured in -[JSON](https://tools.ietf.org/html/rfc7159) format. To replace existing metadata, a -subsequent HTTP `PUT` request to the `/mmds` resource must be issued, using as a -payload the new metadata. A complete description of metadata insertion firecracker -API can be found in the +[JSON](https://tools.ietf.org/html/rfc7159) format. To replace existing +metadata, a subsequent HTTP `PUT` request to the `/mmds` resource must be +issued, using as a payload the new metadata. A complete description of +metadata insertion firecracker API can be found in the [firecracker swagger file](../../src/api_server/swagger/firecracker.yaml). 
- -### Example +An example of an API request for inserting metadata is provided below: ```bash curl --unix-socket /tmp/firecracker.socket -i \ @@ -115,7 +113,7 @@ resource has to be issued, using as a payload the metadata patch, as describes. A complete description of updating metadata Firecracker API can be found in the [firecracker swagger file](../../src/api_server/swagger/firecracker.yaml). -### Example +An example of an API request for updating existing metadata is provided below: ```bash curl --unix-socket /tmp/firecracker.socket -i \ @@ -131,7 +129,7 @@ curl --unix-socket /tmp/firecracker.socket -i \ }' ``` -# Retrieving metadata +## Retrieving metadata MicroVM metadata can be retrieved both from host and guest operating systems. For the scope of this chapter, let's assume the data store content is the JSON @@ -148,7 +146,7 @@ below: } ``` -## Retrieving metadata in the host operating system +### Retrieving metadata in the host operating system To retrieve existing MMDS metadata from host operating system, an HTTP `GET` request to the `/mmds` resource must be issued. The HTTP response returns the @@ -156,7 +154,7 @@ existing metadata, as a JSON formatted text. A complete description of retrieving metadata Firecracker API can be found in the [firecracker swagger file](../../src/api_server/swagger/firecracker.yaml). -### Example +Below you can see how to retrieve metadata from the host: ```bash curl -s --unix-socket /tmp/firecracker.socket http://localhost/mmds @@ -175,7 +173,7 @@ Output: } ``` -## Retrieving metadata in the guest operating system +### Retrieving metadata in the guest operating system To retrieve existing MMDS metadata from guest operating system, an HTTP `GET` request must be issued. The requested resource can be referenced by its @@ -193,9 +191,9 @@ the output to IMDS. Retrieving MMDS resources in IMDS format, other than JSON `string` and `object` types, is not supported. 
-### Example +Below is an example on how to retrieve the `latest/meta-data` resource in +JSON format: -Retrieving the `latest/meta-data` resource in JSON format: ```bash MMDS_IPV4_ADDR=169.254.170.2 RESOURCE_POINTER_OBJ=latest/meta-data @@ -220,11 +218,13 @@ curl -s -H "Accept: application/json" "http://${MMDS_IPV4_ADDR}/${RESOURCE_POINT ``` Output: + ```json "ami-87654321" ``` Retrieving the `latest` resource in IMDS format: + ```bash MMDS_IPV4_ADDR=169.254.170.2 RESOURCE_POINTER=latest @@ -238,6 +238,7 @@ meta-data/ ``` Retrieving the `latest/meta-data/` resource in IMDS format: + ```bash MMDS_IPV4_ADDR=169.254.170.2 RESOURCE_POINTER=latest/meta-data @@ -252,6 +253,7 @@ reservation-id ``` Retrieving the `latest/meta-data/ami-id` resource in IMDS format: + ```bash MMDS_IPV4_ADDR=169.254.170.2 RESOURCE_POINTER=latest/meta-data/ami-id @@ -264,7 +266,7 @@ Output: ami-87654321 ``` -### Errors +## Errors *200* - `Ok` @@ -288,9 +290,9 @@ header was formed. The requested HTTP functionality is not supported by MMDS or the requested resource is not supported in IMDS format. -# Appendix +## Appendix -#### Example use case: credential rotation +### Example use case: credential rotation For this example, the guest expects to find some sort of credentials (say, a secret access key) by issuing a `GET` request to diff --git a/docs/network-performance.md b/docs/network-performance.md index 0a75f2c570e..74e928c8bf5 100644 --- a/docs/network-performance.md +++ b/docs/network-performance.md @@ -1,7 +1,8 @@ # Firecracker network performance numbers This document provides details about Firecracker network performance. -The numbers presented are dependent on the hardware (CPU, networking card, etc.), OS version and settings. +The numbers presented are dependent on the hardware (CPU, networking card, +etc.), OS version and settings. Scope of the measurements is to illustrate the limits for the emulation thread. 
## TCP Throughput @@ -12,35 +13,48 @@ Ingress| 25Gbps | 23Gbps | 20Gbps | 18Gbps Egress | 25Gbps | 23Gbps | 20Gbps | 18Gbps Bidirectional | 18Gbps | 18Gbps | 18Gbps | 18Gbps -### Setup and test description - -Throughput measurements were done using [iperf3](https://iperf.fr/). The target is to fully saturate the emulation thread and keep it at 100% utilization. +**Setup and test description** +Throughput measurements were done using [iperf3](https://iperf.fr/). The target +is to fully saturate the emulation thread and keep it at 100% utilization. No adjustments were done to socket buffer, or any other network related kernel parameters. -To identify the limit of emulation thread, TCP throughput was measured between host and guest. An EC2 [M5d.metal](https://aws.amazon.com/ec2/instance-types/m5/) instance, running [Amazon Linux 2](https://aws.amazon.com/amazon-linux-ami/), was used as a host. +To identify the limit of the emulation thread, TCP throughput was measured between +host and guest. An EC2 [M5d.metal](https://aws.amazon.com/ec2/instance-types/m5/) +instance, running [Amazon Linux 2](https://aws.amazon.com/amazon-linux-ami/), +was used as a host. -For ingress or egress throughput measurements, a Firecracker microVM running Kernel 4.14 with 4GB of Ram, 8 vCPUs and one network interface was used. -The measurements were taken using 6 iperf3 clients running on host and 6 iperf3 serves running on guest and vice versa. +For ingress or egress throughput measurements, a Firecracker microVM running +Kernel 4.14 with 4GB of RAM, 8 vCPUs and one network interface was used. +The measurements were taken using 6 iperf3 clients running on host and 6 iperf3 +servers running on guest and vice versa. -For bidirectional throughput measurements, a Firecracker microVM running Amazon Linux 2, Kernel 4.14 with 4GB of Ram, 12 vCPUs and one network interface was used. -The measurements were taken using 4 iperf3 clients and 4 iperf3 servers running on both host and guest.
+For bidirectional throughput measurements, a Firecracker microVM running Amazon +Linux 2, Kernel 4.14 with 4GB of RAM, 12 vCPUs and one network interface was used. +The measurements were taken using 4 iperf3 clients and 4 iperf3 servers running +on both host and guest. ## Latency -The virtualization layer, Firecracker emulation thread plus host kernel stack, is responsible for adding on average 0.06ms of network latency. - -### Setup and test description +The virtualization layer, Firecracker emulation thread plus host kernel stack, +is responsible for adding on average 0.06ms of network latency. +**Setup and test description** Latency measurements were done using ping round trip times. -2 x EC2 M5d.metal instances running Amazon Linux 2 within the same [VPC](https://aws.amazon.com/vpc/) were used, with a security group configured so that it would allow traffic from instances using private IPs. A 10Mbps background traffic was running between instances. +2 x EC2 M5d.metal instances running Amazon Linux 2 within the same [VPC](https://aws.amazon.com/vpc/) +were used, with a security group configured so that it would allow traffic +from instances using private IPs. A 10Mbps background traffic was running +between instances. Round trip time between instances was measured. ```rtt min/avg/max/mdev = 0.101/0.198/0.237/0.044 ms``` -On one of the instances, a Firecracker microVM running Kernel 4.14, with 1 GB of RAM, 2 vCPUs, one network interface running was used. -Round trip between the microVM and the other instance was measured, while a 10Mbps background traffic was running. +On one of the instances, a Firecracker microVM running Kernel 4.14, with 1 GB +of RAM, 2 vCPUs, one network interface running was used. +Round trip between the microVM and the other instance was measured, while a +10Mbps background traffic was running.
```rtt min/avg/max/mdev = 0.191/0.321/0.519/0.058 ms``` -From the difference between those we can conclude that ~0.06ms are the virtualization overhead. +From the difference between those we can conclude that ~0.06ms are the +virtualization overhead. diff --git a/docs/network-setup.md b/docs/network-setup.md index f6443dedc17..cfb11d9ee0e 100644 --- a/docs/network-setup.md +++ b/docs/network-setup.md @@ -89,7 +89,7 @@ to resolve DNS names. In production, you'd want to use the right DNS server for your environment. For testing, you can add a public DNS server to `/etc/resolv.conf` by adding a line like this: -``` +```console nameserver 8.8.8.8 ``` diff --git a/docs/prod-host-setup.md b/docs/prod-host-setup.md index bc9ffef99b7..f6c26b6e893 100644 --- a/docs/prod-host-setup.md +++ b/docs/prod-host-setup.md @@ -14,34 +14,6 @@ option, and the recommended setting for production workloads. This can also be explicitly requested by supplying `--seccomp-level=2` to the Firecracker executable. -## Jailer Configuration - -Using Jailer in a production Firecracker deployment is highly recommended, -as it provides additional security boundaries for the microVM. -The Jailer process applies -[cgroup](https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt), -namespace isolation and drops privileges of the Firecracker process. - -To set up the jailer correctly, you'll need to: - -- Create a dedicated non-privileged POSIX user and group to run Firecracker - under. Use the created POSIX user and group IDs in Jailer's ``--uid `` - and ``--gid `` flags, respectively. This will run the Firecracker as - the created non-privileged user and group. All file system resources used for - Firecracker should be owned by this user and group. Apply least privilege to - the resource files owned by this user and group to prevent other accounts from - unauthorized file access. 
- When running multiple Firecracker instances it is recommended that each runs - with its unique `uid` and `gid` to provide an extra layer of security for - their individually owned resources in the unlikely case where any one of the - jails is broken out of. - -Additional details of Jailer features can be found in the -[Jailer documentation](jailer.md). - - -## Firecracker Configuration - ### 8250 Serial Device Firecracker implements the 8250 serial device, which is visible from the guest @@ -73,6 +45,31 @@ for consuming and storing this data safely. We suggest using any upper-bounded forms of storage, such as fixed-size or ring buffers, programs like `journald` or `logrotate`, or redirecting to a named pipe. +## Jailer Configuration + +Using Jailer in a production Firecracker deployment is highly recommended, +as it provides additional security boundaries for the microVM. +The Jailer process applies +[cgroup](https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt), +namespace isolation and drops privileges of the Firecracker process. + +To set up the jailer correctly, you'll need to: + +- Create a dedicated non-privileged POSIX user and group to run Firecracker + under. Use the created POSIX user and group IDs in Jailer's ``--uid `` + and ``--gid `` flags, respectively. This will run the Firecracker as + the created non-privileged user and group. All file system resources used for + Firecracker should be owned by this user and group. Apply least privilege to + the resource files owned by this user and group to prevent other accounts from + unauthorized file access. + When running multiple Firecracker instances it is recommended that each runs + with its unique `uid` and `gid` to provide an extra layer of security for + their individually owned resources in the unlikely case where any one of the + jails is broken out of. + +Additional details of Jailer features can be found in the +[Jailer documentation](jailer.md). 
+ ## Host Security Configuration ### Mitigating Side-Channel Issues @@ -81,8 +78,8 @@ When deploying Firecracker microVMs to handle multi-tenant workloads, the following host environment configurations are strongly recommended to guard against side-channel security issues. -Some of the mitigations are platform specific. When applicable, this information -will be specified between brackets. +Some of the mitigations are platform specific. When applicable, this +information will be specified between brackets. #### Disable Simultaneous Multithreading (SMT) @@ -91,7 +88,7 @@ threads on the same physical core. SMT can be disabled by adding the following Kernel boot parameter to the host: -``` +```console nosmt=force ```` @@ -100,7 +97,8 @@ Verification can be done by running: ```bash (grep -q "^forceoff$" /sys/devices/system/cpu/smt/control && \ echo "Hyperthreading: DISABLED (OK)") || \ -(grep -q "^notsupported$\|^notimplemented$" /sys/devices/system/cpu/smt/control && \ +(grep -q "^notsupported$\|^notimplemented$" \ +/sys/devices/system/cpu/smt/control && \ echo "Hyperthreading: Not Supported (OK)") || \ echo "Hyperthreading: ENABLED (Recommendation: DISABLED)" ``` @@ -138,7 +136,7 @@ reveal what memory line was accessed by another process. KSM can be disabled by executing the following as root: -``` +```console echo "0" > /sys/kernel/mm/ksm/run ``` @@ -160,7 +158,8 @@ Indirect Branch Restricted Speculation (IBRS). Verification can be done by running: ```bash -(grep -Eq '^Mitigation: Full [[:alpha:]]+ retpoline, IBPB: conditional, IBRS_FW' \ +(grep -Eq '^Mitigation: Full [[:alpha:]]+ retpoline, \ +IBPB: conditional, IBRS_FW' \ /sys/devices/system/cpu/vulnerabilities/spectre_v2 && \ echo "retpoline, IBPB, IBRS: ENABLED (OK)") \ || echo "retpoline, IBPB, IBRS: DISABLED (Recommendation: ENABLED)" @@ -196,7 +195,7 @@ affected hardware. 
They can be enabled by adding the following Linux kernel boot parameter: -``` +```console l1tf=full,force ``` @@ -222,7 +221,7 @@ Speculative Store Bypass and SpectreNG. It can be enabled by adding the following Linux kernel boot parameter: -``` +```console spec_store_bypass_disable=seccomp ``` @@ -261,7 +260,9 @@ having guest memory contents on microVM storage devices. Verify that swap is disabled by running: ```bash -grep -q "/dev" /proc/swaps && echo "swap partitions present (Recommendation: no swap)" || echo "no swap partitions (OK)" +grep -q "/dev" /proc/swaps && \ +echo "swap partitions present (Recommendation: no swap)" \ +|| echo "no swap partitions (OK)" ``` ### Known kernel issues @@ -272,13 +273,13 @@ General recommendation: Keep the host and the guest kernels up to date. ##### Description -In a Linux KVM guest that has PV TLB enabled, a process in the guest kernel +In a Linux KVM guest that has PV TLB enabled, a process in the guest kernel may be able to read memory locations from another process in the same guest. ##### Impact -Under certain conditions the TLB will contain invalid entries. A malicious -attacker running on the guest can get access to the memory of other running +Under certain conditions the TLB will contain invalid entries. A malicious +attacker running on the guest can get access to the memory of other running process on that guest. ##### Vulnerable systems @@ -289,9 +290,9 @@ are present: - the host kernel >= 4.10. - the guest kernel >= 4.16. - the `KVM_FEATURE_PV_TLB_FLUSH` is set in the CPUID of the -guest. This is the `EAX` bit 9 in the `KVM_CPUID_FEATURES (0x40000001)` entry. + guest. This is the `EAX` bit 9 in the `KVM_CPUID_FEATURES (0x40000001)` entry. -This can be checked by running +This can be checked by running ```bash cpuid -r @@ -301,20 +302,20 @@ and by searching for the entry corresponding to the leaf `0x40000001`. 
Example output: -``` +```console 0x40000001 0x00: eax=0x200 ebx=0x00000000 ecx=0x00000000 edx=0x00000000 EAX 010004fb = 0010 0000 0000 -EAX Bit 9: KVM_FEATURE_PV_TLB_FLUSH = 1 +EAX Bit 9: KVM_FEATURE_PV_TLB_FLUSH = 1 ``` ##### Mitigation -The vulnerability is fixed by the following host kernel +The vulnerability is fixed by the following host kernel [patches](https://lkml.org/lkml/2020/1/30/482). The fix was integrated in the mainline kernel and in 4.19.103, 5.4.19, 5.5.3 stable kernel releases. Please follow [kernel.org](https://www.kernel.org/) and -once the fix is available in your stable release please update the host kernel. +once the fix is available in your stable release please update the host kernel. If you are not using a vanilla kernel, please check with Linux distro provider. #### [ARM only] Physical counter directly passed through to the guest diff --git a/docs/rootfs-and-kernel-setup.md b/docs/rootfs-and-kernel-setup.md index ab38d352db5..7a2679fbd95 100644 --- a/docs/rootfs-and-kernel-setup.md +++ b/docs/rootfs-and-kernel-setup.md @@ -11,6 +11,7 @@ make vmlinux Here's a quick step-by-step guide to building your own kernel that Firecracker can boot: + 1. Get the Linux source code: ```bash @@ -18,17 +19,17 @@ can boot: cd linux.git ``` -2. Check out the Linux version you want to build (e.g. we'll be using v4.20 +1. Check out the Linux version you want to build (e.g. we'll be using v4.20 here): ```bash git checkout v4.20 ``` -3. You will need to configure your Linux build. You can start from - [our recommended config](../resources/microvm-kernel-x86_64.config) - just - copy it to `.config` (under the Linux sources dir). You can make interactive - config adjustments using: +1. You will need to configure your Linux build. You can start from + [our recommended config](../resources/microvm-kernel-x86_64.config) by + copying it to `.config` (under the Linux sources dir). 
You can make + interactive config adjustments using: ```bash make menuconfig @@ -37,16 +38,15 @@ can boot: *Note*: there are many ways of building a kernel config file, other than `menuconfig`. You are free to use whichever one you choose. -4. Build the uncompressed kernel image: +1. Build the uncompressed kernel image: ```bash make vmlinux ``` -5. Upon a successful build, you can find the uncompressed kernel image under +1. Upon a successful build, you can find the uncompressed kernel image under `./vmlinux`. - ## Creating a rootfs Image A rootfs image is just a file system image, that hosts at least an init @@ -56,6 +56,7 @@ support for it will have to be compiled into the kernel, so it can be mounted at boot time. To build an EXT4 image: + 1. Prepare a properly-sized file. We'll use 50MiB here, but this depends on how much data you'll want to fit inside: @@ -63,7 +64,7 @@ To build an EXT4 image: dd if=/dev/zero of=rootfs.ext4 bs=1M count=50 ``` -2. Create an empty file system on the file you created: +1. Create an empty file system on the file you created: ```bash mkfs.ext4 rootfs.ext4 @@ -87,6 +88,7 @@ and many other features. For the sake of simplicity, let's set up an Alpine-based rootfs, with OpenRC as an init system. To that end, we'll use the official Docker image for Alpine Linux: + 1. First, let's start the Alpine container, bind-mounting the EXT4 image created earlier, to `/my-rootfs`: @@ -94,7 +96,7 @@ Alpine Linux: docker run -it --rm -v /tmp/my-rootfs:/my-rootfs alpine ``` -2. Then, inside the container, install the OpenRC init system, and some basic +1. Then, inside the container, install the OpenRC init system, and some basic tools: ```bash @@ -102,7 +104,7 @@ Alpine Linux: apk add util-linux ``` -3. And set up userspace init (still inside the container shell): +1. And set up userspace init (still inside the container shell): ```bash # Set up a login terminal on the serial console (ttyS0): @@ -123,7 +125,7 @@ Alpine Linux: exit ``` -4. 
Finally, unmount your rootfs image: +1. Finally, unmount your rootfs image: ```bash sudo umount /tmp/my-rootfs diff --git a/docs/snapshotting/network-for-clones.md b/docs/snapshotting/network-for-clones.md index c15285898bb..4156bfd34d4 100644 --- a/docs/snapshotting/network-for-clones.md +++ b/docs/snapshotting/network-for-clones.md @@ -126,14 +126,18 @@ which is unique on the host for each VM. In the demo, we use `clone 0`): ```bash -# for packets that leave the namespace and have the source IP address of the original guest, -# rewrite the source address to clone address 192.168.0.3) -sudo ip netns exec fc0 iptables -t nat -A POSTROUTING -o veth0 -s 192.168.241.2 -j SNAT --to 192.168.0.3 +# for packets that leave the namespace and have the source IP address of the +# original guest, rewrite the source address to clone address 192.168.0.3 +sudo ip netns exec fc0 iptables -t nat -A POSTROUTING -o veth0 \ +-s 192.168.241.2 -j SNAT --to 192.168.0.3 # do the reverse operation; rewrites the destination address of packets # heading towards the clone address to 192.168.241.2 -sudo ip netns exec fc0 iptables -t nat -A PREROUTING -i veth0 -d 192.168.0.3 -j DNAT —to 192.168.241.2 -sudo ip route add 192.168.0.3 via 10.0.0.2 # (adds a route on the host for the clone address) +sudo ip netns exec fc0 iptables -t nat -A PREROUTING -i veth0 \ +-d 192.168.0.3 -j DNAT --to 192.168.241.2 + +# (adds a route on the host for the clone address) +sudo ip route add 192.168.0.3 via 10.0.0.2 ``` **Full connectivity to/from the clone should be present at this point.** diff --git a/docs/snapshotting/random-for-clones.md b/docs/snapshotting/random-for-clones.md index 396c6987e01..128b3dc4a8b 100644 --- a/docs/snapshotting/random-for-clones.md +++ b/docs/snapshotting/random-for-clones.md @@ -1,3 +1,5 @@ +# Entropy for Clones + This document provides a high level perspective on the implications of restoring multiple VM clones from a single snapshot.
We start with an overview of the Linux random number generation (RNG) @@ -118,10 +120,10 @@ the read result via bind mounting another file on top of customer code continues its run in the clone): 1. Open one of the special devices files (either `/dev/random` or `/dev/urandom`). - 2. Issue an `RNDCLEARPOOL ioctl` (requires `CAP_SYS_ADMIN`). This + 1. Issue an `RNDCLEARPOOL ioctl` (requires `CAP_SYS_ADMIN`). This clears and sets the entropy pools to the initial state. Should also cause the reinitialization of the `/dev/urandom` `CSPRNG`. - 3. Issue an `RNDADDENTROPY ioctl` (requires `CAP_SYS_ADMIN`) to mix + 1. Issue an `RNDADDENTROPY ioctl` (requires `CAP_SYS_ADMIN`) to mix the provided bytes into the input entropy pool and increase the entropy count. This should also cause the `/dev/urandom` `CSPRNG` to be reseeded. The bytes can be generated locally in the guest, @@ -213,20 +215,7 @@ int main(int argc, char ** argv) { } ``` -## References - -[1] http://man7.org/linux/man-pages/man7/random.7.html (please note the - online man pages are specific to a particular version, mentioned at - the end; also, for some reason, this man page is not available by - default on all distros) - -[2] https://www.2uo.de/myths-about-urandom - -[3] https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/Studies/LinuxRNG/LinuxRNG_EN.pdf - -[4] http://man7.org/linux/man-pages/man4/random.4.html - -[1]: http://man7.org/linux/man-pages/man7/random.7.html +[1]: "Lala" [2]: https://www.2uo.de/myths-about-urandom [3]: https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/Studies/LinuxRNG/LinuxRNG_EN.pdf [4]: http://man7.org/linux/man-pages/man4/random.4.html diff --git a/docs/snapshotting/snapshot-support.md b/docs/snapshotting/snapshot-support.md index 72a3fbb4be3..b6793b9ccfc 100644 --- a/docs/snapshotting/snapshot-support.md +++ b/docs/snapshotting/snapshot-support.md @@ -2,31 +2,31 @@ ## Table of Contents -- [What is microVM snapshotting?](#what-is-microvm-snapshotting) 
+- [What is microVM snapshotting?](#about-microvm-snapshotting) - [Snapshotting in Firecracker](#snapshotting-in-firecracker) - - [Supported platforms](#supported-platforms) - - [Overview](#overview) - - [Snapshot files management](#snapshot-files-management) - - [Performance](#performance) - - [Known issues and limitations](#known-issues-and-limitations) + - [Supported platforms](#supported-platforms) + - [Overview](#overview) + - [Snapshot files management](#snapshot-files-management) + - [Performance](#performance) + - [Known issues and limitations](#known-issues-and-limitations) - [Firecracker Snapshotting characteristics](#firecracker-snapshotting-characteristics) - [Snapshot versioning](#snapshot-versioning) - [Snapshot API](#snapshot-api) - - [Pausing the microVM](#pausing-the-microvm) - - [Creating snapshots](#creating-snapshots) - - [Creating full snapshots](#creating-full-snapshots) - - [Creating diff snapshots](#creating-diff-snapshots) - - [Resuming the microVM](#resuming-the-microvm) - - [Loading snapshots](#loading-snapshots) + - [Pausing the microVM](#pausing-the-microvm) + - [Creating snapshots](#creating-snapshots) + - [Creating full snapshots](#creating-full-snapshots) + - [Creating diff snapshots](#creating-diff-snapshots) + - [Resuming the microVM](#resuming-the-microvm) + - [Loading snapshots](#loading-snapshots) - [Provisioning host disk space for snapshots](#provisioning-host-disk-space-for-snapshots) - [Ensure continued network connectivity for clones](#ensure-continued-network-connectivity-for-clones) - [Snapshot security and uniqueness](#snapshot-security-and-uniqueness) - - [Secure and insecure usage examples](#usage-examples) - - [Reusing snapshotted states securely](#reusing-snapshotted-states-securely) + - [Secure and insecure usage examples](#usage-examples) + - [Reusing snapshotted states securely](#reusing-snapshotted-states-securely) - [Known Issues](#known-issues) - - [Vsock device limitations](#vsock-device-limitations) + - 
[Vsock device limitations](#vsock-device-limitations) -## What is microVM snapshotting? +## About microVM snapshotting MicroVM snapshotting is a mechanism through which a running microVM and its resources can be serialized and saved to an external medium in the form of a @@ -37,10 +37,11 @@ guest workload at that particular point in time. ### Supported platforms -The Firecracker snapshot feature is in [developer preview](../RELEASE_POLICY.md) +The Firecracker snapshot feature is in [developer preview](../RELEASE_POLICY.md) on all CPU micro-architectures listed in [README](../README.md#supported-platforms). ### Overview + A Firecracker microVM snapshot can be used for loading it later in a different Firecracker process, and the original guest workload is being simply resumed. @@ -55,16 +56,18 @@ the process. In order to make restoring possible, Firecracker snapshots save the full state of the following resources: + - the guest memory, - the emulated HW state (both KVM and Firecracker emulated HW). The state of the components listed above is generated independently, which brings flexibility to our snapshotting support. This means that taking a snapshot results in multiple files that are composing the full microVM snapshot: + - the guest memory file, - the microVM state file, -- zero or more disk files (depending on how many the guest had; these are -**managed by the users**). +- zero or more disk files (depending on how many the guest had; these are + **managed by the users**). The design allows sharing of memory pages and read only disks between multiple microVMs. When loading a snapshot, instead of loading at resume time the full @@ -78,43 +81,44 @@ resumed microVM. ### Snapshot files management -The Firecracker snapshot design offers a very simple interface to interact with +The Firecracker snapshot design offers a very simple interface to interact with snapshots but provides no functionality to package or manage them on the host. 
-Using snapshots in production is currently not recommended as there are open +Using snapshots in production is currently not recommended as there are open [Known issues and limitations](#known-issues-and-limitations). The [threat containment model](../design.md#threat-containment) model states -that the host, host/API communication and snapshot files are trusted by Firecracker. +that the host, host/API communication and snapshot files are trusted by Firecracker. -To ensure a secure integration with the snapshot functionality, users need to secure -snapshot files by implementing authentication and encryption schemes while managing their -lifecycle or moving them across the trust boundary, like for example when provisioning -them from a respository to a host over the network. +To ensure a secure integration with the snapshot functionality, users need to secure +snapshot files by implementing authentication and encryption schemes while +managing their lifecycle or moving them across the trust boundary, like for +example when provisioning them from a repository to a host over the network. -Firecracker is optimized for fast load/resume and it's designed to do some very basic -sanity checks only on the vm state file. It only verifies integrity using a 64 bit CRC -value embedded in the vm state file, but this is only as a partial measure to protect -against accidental corruption, as the disk files and memory file need to be secured as -well. +Firecracker is optimized for fast load/resume and it's designed to do some very basic +sanity checks only on the vm state file. It only verifies integrity using a 64 +bit CRC value embedded in the vm state file, but this is only as a partial +measure to protect against accidental corruption, as the disk files and memory +file need to be secured as well. ### Performance The Firecracker snapshot create/resume performance depends on the memory size, -vCPU count and emulated devices count.
The Firecracker CI runs snapshots tests +vCPU count and emulated devices count. The Firecracker CI runs snapshots tests on AWS **m5d.metal** instances for Intel and on AWS **m6g.metal** for ARM. -The baseline for snapshot resume latency target on Intel is under **8ms** with 5ms p90, -and on ARM is under **3ms** for a microvm with this specs: 2vCPU/512MB/1 block/1 net device. +The baseline for snapshot resume latency target on Intel is under **8ms** with +5ms p90, and on ARM is under **3ms** for a microVM with the following specs: +2vCPU/512MB/1 block/1 net device. ### Known issues and limitations -- High snapshot latency on 5.4+ host kernels - -[#2129](https://github.com/firecracker-microvm/firecracker/issues/2129) +- High snapshot latency on 5.4+ host kernels - [#2129](https://github.com/firecracker-microvm/firecracker/issues/2129) - Guest network connectivity is not guaranteed to be preserved after resume. -For recommendations related to guest network connectivity for clones please -see [Network connectivity for clones](network-for-clones.md). -- Vsock device does not have full snapshotting support. Please see [Vsock device limitations](#vsock-device-limitations) -- Poor entropy and replayable randomness when resuming multiple microvms which -deal with cryptographic secrets. Please see [Snapshot security and uniqueness](#snapshot-security-and-uniqueness) + For recommendations related to guest network connectivity for clones please + see [Network connectivity for clones](network-for-clones.md). +- Vsock device does not have full snapshotting support. + Please see [Vsock device limitations](#vsock-device-limitations). +- Poor entropy and replayable randomness when resuming multiple microvms which + deal with cryptographic secrets. Please see [Snapshot security and uniqueness](#snapshot-security-and-uniqueness). ## Firecracker Snapshotting characteristics @@ -130,15 +134,18 @@ deal with cryptographic secrets. Please see [Snapshot security and uniqueness](# creation. 
The disk contents are _not_ explicitly flushed to their backing files. - The API calls exposing the snapshotting functionality have clear **Prerequisites** that describe the requirements on when/how they should be used. - + ## Snapshot versioning Firecracker snapshotting implementation offers support for microVM versioning (`cross-version snapshots`) in the following contexts: -- saving snapshots at older versions (being able to create a snapshot with any version -in the `[N, N + o]` interval, while being in Firecracker version `N+o`), -- loading snapshots from older versions (being able to load a snapshot created by any -Firecracker version in the `[N, N + o]` interval, in a Firecracker version `N+o`). + +- saving snapshots at older versions (being able to create a snapshot with any + version in the `[N, N + o]` interval, while being in Firecracker + version `N+o`), +- loading snapshots from older versions (being able to load a snapshot created + by any Firecracker version in the `[N, N + o]` interval, in a Firecracker + version `N+o`). The design supports an unlimited number of versions, the value of `o` (maximum number of older versions that we can restore from / save a snapshot to, from the current @@ -169,20 +176,22 @@ curl --unix-socket /tmp/firecracker.socket -i \ Successive calls of this request keep the microVM in the `Paused` state. **Effects**: + - _on success_: microVM is guaranteed to be `Paused`. - _on failure_: no side-effects. ### Creating snapshots -Now that the microVM is paused, you can create a snapshot, which can be either a `full` -one or a `diff` one. Full snapshots always create a complete, resume-able snapshot of -the current microVM state and memory. Diff snapshots save the current microVM state -and the memory dirtied since the last snapshot (full or diff). Diff snapshots are not -resume-able, but can be merged into a full snapshot. 
In this context, we will refer to -the base as the first memory file created by a `/snapshot/create` API call and the -layer as a memory file created by a subsequent `/snapshot/create` API call. The -order in which the snapshots were created matters and they should be merged in the -same order in which they were created. To merge a `diff` snapshot memory file on +Now that the microVM is paused, you can create a snapshot, which can be either +a `full` one or a `diff` one. Full snapshots always create a complete, +resume-able snapshot of the current microVM state and memory. Diff snapshots +save the current microVM state and the memory dirtied since the last snapshot +(full or diff). Diff snapshots are not resume-able, but can be merged into a +full snapshot. In this context, we will refer to the base as the first memory +file created by a `/snapshot/create` API call and the layer as a memory file +created by a subsequent `/snapshot/create` API call. The order in which the +snapshots were created matters and they should be merged in the same order +in which they were created. To merge a `diff` snapshot memory file on top of a base, users should copy its content over the base, as the following example does: @@ -220,31 +229,36 @@ curl --unix-socket /tmp/firecracker.socket -i \ Details about the required and optional fields can be found in the [swagger definition](../../src/api_server/swagger/firecracker.yaml). -*Note*: If the files indicated by `snapshot_path` and `mem_file_path` don't exist at - the specified paths, then they will be created right before generating the - snapshot. + +*Note*: If the files indicated by `snapshot_path` and `mem_file_path` don't +exist at the specified paths, then they will be created right before generating +the snapshot. **Prerequisites**: The microVM is `Paused`. + **Effects**: + - _on success_: - - The file indicated by `snapshot_path` (e.g. `/path/to/snapshot_file`) contains the - devices' model state and emulation state.
The one indicated by `mem_file_path` - (e.g. `/path/to/mem_file`) contains a full copy of the guest memory. + - The file indicated by `snapshot_path` (e.g. `/path/to/snapshot_file`) + contains the devices' model state and emulation state. The one indicated + by `mem_file_path` (e.g. `/path/to/mem_file`) contains a full copy of the + guest memory. - The generated snapshot files are immediately available to be used (current process releases ownership). At this point, the block devices backing files should be backed up externally by the user. Please note that block device contents are only guaranteed to be committed/flushed to the host FS, but not necessarily to the underlying persistent storage (could still live in host FS cache). - - If diff snapshots were enabled, the snapshot creation resets then the dirtied page - bitmap and marks all pages clean (from a diff snapshot point of view). - -If a `version` is specified, the new snapshot is saved at that version, otherwise -it will be saved at the same version of the running Firecracker. The version is only -used for the microVM state file as it contains internal state structures for device -emulation, vCPUs and others that can change their format from a Firecracker version -to another. Versioning is not required for the block and memory files. The separate -block device file components of the snapshot have to be handled by the user. + - If diff snapshots were enabled, the snapshot creation then resets the + dirtied page bitmap and marks all pages clean (from a diff snapshot point + of view). + - If a `version` is specified, the new snapshot is saved at that version, + otherwise it will be saved at the same version of the running Firecracker. + The version is only used for the microVM state file as it contains internal + state structures for device emulation, vCPUs and others that can change + their format from one Firecracker version to another. Versioning is not + required for the block and memory files.
The separate block device file + components of the snapshot have to be handled by the user. - _on failure_: no side-effects. @@ -252,6 +266,7 @@ block device file components of the snapshot have to be handled by the user. For creating a diff snapshot, you should use the same API command, but with `snapshot_type` field set to `Diff`. + *Note*: If not specified, `snapshot_type` is by default `Full`. ```bash @@ -268,18 +283,21 @@ curl --unix-socket /tmp/firecracker.socket -i \ ``` **Prerequisites**: The microVM is `Paused`. - On a fresh microVM, `track_dirty_pages` field should be set to `true`, - when configuring the `/machine-config` resource, while on a snapshot - loaded microVM, `enable_diff_snapshots` from `PUT /snapshot/load` - request body, should be set. + +*Note*: On a fresh microVM, `track_dirty_pages` field should be set to `true`, +when configuring the `/machine-config` resource, while on a snapshot loaded +microVM, `enable_diff_snapshots` from `PUT /snapshot/load` request body, +should be set. **Effects**: + - _on success_: - The file indicated by `snapshot_path` contains the devices' model state and emulation state, same as when creating a full snapshot. The one indicated by - `mem_file_path` contains this time a **diff copy** of the guest memory - the diff - consists of the memory pages which have been dirtied since the last snapshot creation - or since the creation of the microVM, whichever of these events was the most recent. + `mem_file_path` contains this time a **diff copy** of the guest memory; the + diff consists of the memory pages which have been dirtied since the last + snapshot creation or since the creation of the microVM, whichever of these + events was the most recent. - All the other effects mentioned in the **Effects** paragraph from **Creating full snapshots** section apply here. - _on failure_: no side-effects.
@@ -300,13 +318,14 @@ curl --unix-socket /tmp/firecracker.socket -i \ ``` Enabling this support enables KVM dirty page tracking, so it comes at a cost -(which consists of CPU cycles spent by KVM accounting for dirtied pages); it should only -be used when needed. +(which consists of CPU cycles spent by KVM accounting for dirtied pages); it +should only be used when needed. Creating a snapshot will **not** influence state, will **not** stop or end the microVM, -it can be used as before, so the microVM can be resumed if you still want to use it. -At this point, in case you plan to continue using the current microVM, you should make -sure to also copy the disk backing files. +it can be used as before, so the microVM can be resumed if you still want to +use it. +At this point, in case you plan to continue using the current microVM, you +should make sure to also copy the disk backing files. ### Resuming the microVM @@ -326,6 +345,7 @@ curl --unix-socket /tmp/firecracker.socket -i \ Successive calls of this request are ignored (microVM remains in the running state). **Effects**: + - _on success_: microVM is guaranteed to be `Resumed`. - _on failure_: no side-effects. @@ -351,72 +371,79 @@ curl --unix-socket /tmp/firecracker.socket -i \ Details about the required and optional fields can be found in the [swagger definition](../../src/api_server/swagger/firecracker.yaml). -**Prerequisites**: A full memory snapshot and a microVM state file **must** be provided. - The disk backing files, network interfaces backing TAPs and/or vsock - backing socket that were used for the original microVM's configuration - should be set up and accessible to the new Firecracker process (in - which the microVM is resumed). These host-resources need to be - accessible at the same relative paths to the new Firecracker process - as they were to the original one. +**Prerequisites**: A full memory snapshot and a microVM state file **must** be +provided. 
The disk backing files, network interfaces backing TAPs and/or vsock +backing socket that were used for the original microVM's configuration +should be set up and accessible to the new Firecracker process (in +which the microVM is resumed). These host-resources need to be +accessible at the same relative paths to the new Firecracker process +as they were to the original one. + **Effects:** + - _on success_: - The complete microVM state is loaded from snapshot into the current Firecracker process. - - It then resets the dirtied page bitmap and marks all pages clean (from a diff - snapshot point of view). - - The loaded microVM is now in the `Paused` state, so it needs to be resumed for it - to run. - - The memory file pointed by `mem_file_path` **must** be considered immutable from - Firecracker and host point of view. It backs the guest OS memory for read access - through the page cache. External modification to this file corrupts the guest - memory and leads to undefined behavior. - - The file indicated by `snapshot_path`, that is used to load from, is released and no - longer used by this process. - - If `enable_diff_snapshots` is set, then diff snapshots can be taken afterwards. - - If `resume_vm` is set, the vm is automatically resumed if load is successful. + - It then resets the dirtied page bitmap and marks all pages clean (from a + diff snapshot point of view). + - The loaded microVM is now in the `Paused` state, so it needs to be resumed + for it to run. + - The memory file pointed by `mem_file_path` **must** be considered immutable + from Firecracker and host point of view. It backs the guest OS memory for + read access through the page cache. External modification to this file + corrupts the guest memory and leads to undefined behavior. + - The file indicated by `snapshot_path`, that is used to load from, is + released and no longer used by this process. + - If `enable_diff_snapshots` is set, then diff snapshots can be taken + afterwards. 
+ - If `resume_vm` is set, the vm is automatically resumed if load is + successful. - _on failure_: A specific error is reported and then the current Firecracker process is ended (as it might be in an invalid state). *Notes*: -Please, keep in mind that only by setting to true `enable_diff_snapshots`, when loading a -snapshot, or `track_dirty_pages`, when configuring the machine on a fresh microVM, you can -then create a `diff` snapshot. Also, `track_dirty_pages` is not saved when creating a -snapshot, so you need to explicitly set `enable_diff_snapshots` when sending `LoadSnapshot` -command if you want to be able to do diff snapshots from a loaded microVM. -Another thing that you should be aware of is the following: if a fresh microVM can create -diff snapshots, then if you create a **full** snapshot, the memory file contains -the whole guest memory, while if you create a **diff** one, that file is sparse and only -contains the guest dirtied pages. +Please, keep in mind that only by setting to true `enable_diff_snapshots`, when +loading a snapshot, or `track_dirty_pages`, when configuring the machine on a +fresh microVM, you can then create a `diff` snapshot. Also, `track_dirty_pages` +is not saved when creating a snapshot, so you need to explicitly set +`enable_diff_snapshots` when sending `LoadSnapshot`command if you want to be +able to do diff snapshots from a loaded microVM. +Another thing that you should be aware of is the following: if a fresh microVM +can create diff snapshots, then if you create a **full** snapshot, the memory +file contains the whole guest memory, while if you create a **diff** one, that +file is sparse and only contains the guest dirtied pages. With these in mind, some possible snapshotting scenarios are the following: -- `Boot from a fresh microVM` -> `Pause` -> `Create snapshot` -> `Resume` -> `Pause` -> - `Create snapshot` -> ... 
; -- `Boot from a fresh microVM` -> `Pause` -> `Create snapshot` -> `Resume` -> `Pause` -> - `Resume` -> ... -> `Pause` -> `Create snapshot` -> ... ; -- `Load snapshot` -> `Resume` -> `Pause` -> `Create snapshot` -> `Resume` -> `Pause` -> - `Create snapshot` -> ... ; -- `Load snapshot` -> `Resume` -> `Pause` -> `Create snapshot` -> `Resume` -> `Pause` -> - `Resume` -> ... -> `Pause` -> `Create snapshot` -> ... ; - where `Create snapshot` can refer to either a full or a diff snapshot for all the - aforementioned flows. - -It is also worth knowing, a microVM that is restored from snapshot will be resumed with -the guest OS wall-clock continuing from the moment of the snapshot creation. For this -reason, the wall-clock should be updated to the current time, on the guest-side. -More details on how you could do this can be found at a -[related FAQ](../../FAQ.md#my-guest-wall-clock-is-drifting-how-can-i-fix-it). + +- `Boot from a fresh microVM` -> `Pause` -> `Create snapshot` -> `Resume` -> + `Pause` -> `Create snapshot` -> ... ; +- `Boot from a fresh microVM` -> `Pause` -> `Create snapshot` -> `Resume` -> + `Pause` -> `Resume` -> ... -> `Pause` -> `Create snapshot` -> ... ; +- `Load snapshot` -> `Resume` -> `Pause` -> `Create snapshot` -> `Resume` -> + `Pause` -> `Create snapshot` -> ... ; +- `Load snapshot` -> `Resume` -> `Pause` -> `Create snapshot` -> `Resume` -> + `Pause` -> `Resume` -> ... -> `Pause` -> `Create snapshot` -> ... ; + where `Create snapshot` can refer to either a full or a diff snapshot for + all the aforementioned flows. + +It is also worth knowing, a microVM that is restored from snapshot will be +resumed with the guest OS wall-clock continuing from the moment of the +snapshot creation. For this reason, the wall-clock should be updated to the +current time, on the guest-side. More details on how you could do this can +be found at a [related FAQ](../../FAQ.md#my-guest-wall-clock-is-drifting-how-can-i-fix-it). 
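One `Pause` -> `Create snapshot` -> `Resume` cycle from the flows listed above could be scripted roughly as follows. This is a minimal sketch, not part of the Firecracker tooling: the function names, the `API_SOCKET` variable and the output file paths are illustrative, and it assumes the pause and resume requests are `PATCH /vm` calls with a `state` field, as in the Firecracker API definition.

```bash
#!/usr/bin/env bash
# Sketch of one `Pause -> Create snapshot -> Resume` cycle.
# Assumption: the socket path below is a placeholder for your API socket.
API_SOCKET="${API_SOCKET:-/tmp/firecracker.socket}"

# Build the JSON body for `PUT /snapshot/create`.
# $1: snapshot type (`Full` or `Diff`)
# $2: microVM state file path, $3: guest memory file path
snapshot_create_body() {
    printf '{"snapshot_type": "%s", "snapshot_path": "%s", "mem_file_path": "%s"}' \
        "$1" "$2" "$3"
}

# Send one JSON request to the API socket.
# $1: HTTP method, $2: resource path, $3: JSON body
fc_request() {
    curl --unix-socket "$API_SOCKET" -sS --fail \
        -X "$1" "http://localhost$2" \
        -H 'Accept: application/json' \
        -H 'Content-Type: application/json' \
        -d "$3"
}

# Pause, snapshot, resume. Stops at the first failing request,
# which can leave the microVM paused.
snapshot_cycle() {
    fc_request PATCH /vm '{"state": "Paused"}' &&
    fc_request PUT /snapshot/create "$(snapshot_create_body "$1" "$2" "$3")" &&
    fc_request PATCH /vm '{"state": "Resumed"}'
}
```

For example, `snapshot_cycle Diff ./snapshot_file ./mem_file` would request a diff snapshot, which only succeeds if dirty page tracking was enabled as described in the notes above.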
## Provisioning host disk space for snapshots -Depending on VM memory size, snapshots can consume a lot of disk space. Firecracker +Depending on VM memory size, snapshots can consume a lot of disk space. Firecracker integrators **must** ensure that the provisioned disk space is sufficient for normal operation of their service as well as during failure scenarios. If the service exposes -the snapshot triggers to customers, integrators **must** enforce proper disk quotas to -avoid any DoS threats that would cause the service to fail or function abnormally. +the snapshot triggers to customers, integrators **must** enforce proper disk +quotas to avoid any DoS threats that would cause the service to fail or +function abnormally. ## Ensure continued network connectivity for clones -For recomandations related to continued network connectivity for multiple clones created from -a single Firecracker microVM snapshot please see [this doc](network-for-clones.md). +For recommendations related to continued network connectivity for multiple +clones created from a single Firecracker microVM snapshot please see [this doc](network-for-clones.md). ## Snapshot security and uniqueness @@ -434,7 +461,7 @@ For more information please see [this doc](random-for-clones.md) #### Example 1: secure usage (currently in dev preview) -``` +```console Boot microVM A -> ... -> Create snapshot S -> Terminate -> Load S in microVM B -> Resume -> ... ``` @@ -447,7 +474,7 @@ secure. #### Example 2: potentially insecure usage -``` +```console Boot microVM A -> ... -> Create snapshot S -> Resume -> ... -> Load S in microVM B -> Resume -> ... ``` @@ -459,7 +486,8 @@ before microVM B resumes execution from snapshot S or not. In this example, we consider both microVMs insecure as soon as microVM A resumes execution. #### Example 3: potentially insecure usage -``` + +```console Boot microVM A -> ... -> Create snapshot S -> ... -> Load S in microVM B -> Resume -> ... -> Load S in microVM C -> Resume -> ...
@@ -500,7 +528,9 @@ Firecracker. to vsock packet loss, leads to Vsock device breakage when doing a snapshot while there are active Vsock connections. - _**Workaround**_: Close all active Vsock connections prior to snapshotting the VM. + _**Workaround**_: Close all active Vsock connections prior to snapshotting + the VM. -2. _Incremental/diff_ snapshots are not yet supported for Vsock devices. Creating a -`diff` snapshot on a microVM with a `vsock` device configured is not allowed. \ No newline at end of file +1. _Incremental/diff_ snapshots are not yet supported for Vsock devices. + Creating a `diff` snapshot on a microVM with a `vsock` device configured + is not allowed. \ No newline at end of file diff --git a/docs/snapshotting/versioning.md b/docs/snapshotting/versioning.md index 8d685152439..f23739747b2 100644 --- a/docs/snapshotting/versioning.md +++ b/docs/snapshotting/versioning.md @@ -1,44 +1,68 @@ # Firecracker snapshot versioning -This document describes how Firecracker persists its state across multiple versions, diving deep into the snapshot format, encoding, compatibility and limitations. +This document describes how Firecracker persists its state across multiple +versions, diving deep into the snapshot format, encoding, compatibility and +limitations. ## Introduction -The design behind the snapshot implementation enables version tolerant save and restore across multiple Firecracker versions which we call a version space. For example, one can pause a microVM, save it to disk with Firecracker version **0.23.0** and later load it in Firecracker version **0.24.0**. It also works in reverse: Firecracker version **0.23.0** loads what **0.24.0** saves. -Below is an example graph showing backward and forward snapshot compatibility. This is the general picture, but keep in mind that when adding new features some version translations would not be possible. 
+The design behind the snapshot implementation enables version tolerant save +and restore across multiple Firecracker versions which we call a version space. +For example, one can pause a microVM, save it to disk with Firecracker version +**0.23.0** and later load it in Firecracker version **0.24.0**. It also works +in reverse: Firecracker version **0.23.0** loads what **0.24.0** saves. + +Below is an example graph showing backward and forward snapshot compatibility. +This is the general picture, but keep in mind that when adding new features +some version translations would not be possible. ![Version graph]( ../images/version_graph.png?raw=true "Version graph") A non-exhaustive list of how cross-version snapshot support can be used: -Example scenario #1 - load snapshot from older version: -* Start Firecracker v0.23 → Boot microVM → *Workload starts* → Pause → CreateSnapshot(snap) → kill microVM +Example scenario #1 - load snapshot from older version: + +* Start Firecracker v0.23 → Boot microVM → *Workload starts* → Pause → + CreateSnapshot(snap) → kill microVM * Start Firecracker v0.24 → LoadSnapshot → Resume → *Workload continues* -Example scenario #2 - load snapshot in older version: -* Start Firecracker v0.24 → Boot microVM → *Workload starts* → Pause → CreateSnapshot(snap, “0.23”) → kill microVM +Example scenario #2 - load snapshot in older version: + +* Start Firecracker v0.24 → Boot microVM → *Workload starts* → Pause → + CreateSnapshot(snap, “0.23”) → kill microVM * Start Firecracker v0.23 → LoadSnapshot(snap) → Resume → *Workload continues* Example scenario #3 - load snapshot in older version: -* Start Firecracker v0.24 → LoadSnapshot(older_snap) → Resume → *Workload continues* → Pause → CreateSnapshot(snap, “0.23”) → kill microVM -* Start Firecracker v0.23 → LoadSnapshot(snap) → Resume → *Workload continues* +* Start Firecracker v0.24 → LoadSnapshot(older_snap) → Resume → + *Workload continues* → Pause → CreateSnapshot(snap, “0.23”) → kill microVM +* 
Start Firecracker v0.23 → LoadSnapshot(snap) → Resume → *Workload continues* ## Overview + Firecracker persists the microVM state as 2 separate objects: - - a **guest memory** file - - a **microVM state** file. - -*The block devices attached to the microVM are not considered part of the state and need to be managed separately.* + +* a **guest memory** file +* a **microVM state** file. + +*The block devices attached to the microVM are not considered part of the +state and need to be managed separately.* ### Guest memory + The guest memory file contains the microVM memory saved as a dump of all pages. ### MicroVM state -In the VM state file, Firecracker stores the internal state of the VMM (device emulation, KVM and vCPUs) with 2 exceptions - serial emulation and vsock backend. -While we continuously improve and extend Firecracker's features by adding new capabilities, devices or enhancements, the microVM state file may change both structurally and semantically with each new release. The state file includes versioning information and each Firecracker release implements distinct save/restore logic for the supported version space. +In the VM state file, Firecracker stores the internal state of the VMM (device +emulation, KVM and vCPUs) with 2 exceptions - serial emulation and vsock backend. + +While we continuously improve and extend Firecracker's features by adding new +capabilities, devices or enhancements, the microVM state file may change both +structurally and semantically with each new release. The state file includes +versioning information and each Firecracker release implements distinct +save/restore logic for the supported version space. ## MicroVM state file format @@ -51,21 +75,38 @@ A microVM state file is further split into four different fields: | state | N | Bincode blob containing the microVM state. | crc| 64 | Optional CRC64 sum of magic_id, version and state fields. 
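To look at the fixed-size fields at the start of a state file, a small helper along these lines may be useful. It is only a sketch: `dump_state_header` is a hypothetical name, and it assumes that `magic_id` (64 bits) and `version` (16 bits) are stored back to back at the beginning of the file, in the order the table lists them. The bytes are printed in file order and no byte order is interpreted.

```bash
# Print the two fixed-size header fields of a microVM state file as raw hex
# bytes, in file order. Assumption: 8 bytes of magic_id followed by 2 bytes
# of version, with no padding in between.
# $1: path to the state file
dump_state_header() {
    # magic_id: first 8 bytes (64 bits).
    printf 'magic_id: %s\n' "$(od -An -v -tx1 -N8 "$1" | tr -d ' \n')"
    # version: next 2 bytes (16 bits).
    printf 'version: %s\n' "$(od -An -v -tx1 -j8 -N2 "$1" | tr -d ' \n')"
}
```

Running `dump_state_header ./snapshot_file` prints one line per field; the remaining bytes are the bincode `state` blob and the optional CRC, which this helper does not touch.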
-Note: the last 16 bits of `magic_id` encode the storage version which specifies the encoding used for the `version` and `state` fields. The current implementation sets this field to 1, which identifies it as a [Serde bincode](https://github.com/servo/bincode) compatible encoder/decoder. +**Note**: the last 16 bits of `magic_id` encode the storage version which specifies +the encoding used for the `version` and `state` fields. The current +implementation sets this field to 1, which identifies it as a [Serde bincode](https://github.com/servo/bincode) +compatible encoder/decoder. ### Version tolerant ser/de -Firecracker reads and writes the `state` blob of the snapshot by using per version, separate serialization and deserialization logic. This logic is mostly autogenerated by a Rust procedural macro based on `struct` and `enum` annotations. Basically, one can say that these structures support versioning. The versioning logic is generated by parsing a structure's history log (encoded using Rust annotations) and emitting Rust code. + +Firecracker reads and writes the `state` blob of the snapshot by using per +version, separate serialization and deserialization logic. This logic is mostly +autogenerated by a Rust procedural macro based on `struct` and `enum` +annotations. Basically, one can say that these structures support versioning. +The versioning logic is generated by parsing a structure's history log (encoded +using Rust annotations) and emitting Rust code. Versioned serialization and deserialization is divided into two translation layers: - - field translator, - - semantic translator. -The _field translator_ implements the logic to convert between different versions of the same Rust POD structure: it can deserialize or serialize from source version to target. -The translation is done field by field - the common fields are copied from source to target, and the fields that are unique to the target are (de)serialized with their default values. 
+* field translator, +* semantic translator. -The _semantic translator_ is only concerned with translating the semantics of the serialized/deserialized fields. +The _field translator_ implements the logic to convert between different +versions of the same Rust POD structure: it can deserialize or serialize from +source version to target. +The translation is done field by field - the common fields are copied from +source to target, and the fields that are unique to the target are +(de)serialized with their default values. -The _field translator_ is generated automatically through a procedural macro, and the _semantic translation methods_ have to be annotated in the structure by the user. +The _semantic translator_ is only concerned with translating the semantics of +the serialized/deserialized fields. + +The _field translator_ is generated automatically through a procedural macro, +and the _semantic translation methods_ have to be annotated in the structure +by the user. This block diagram illustrates the concept: @@ -75,8 +116,12 @@ This block diagram illustrates the concept: ## VM state encoding -During research and prototyping we considered multiple storage formats. The criteria used for comparing these are: performance, size, rust support, specification, versioning support, community and tooling. Performance, size and Rust support are hard requirements while all others can be the subject of trade offs. -More info about this comparison can be found here (https://github.com/firecracker-microvm/firecracker/blob/9d427b33d989c3225d874210f6c2849465941dc0/docs/snapshotting/design.md). +During research and prototyping we considered multiple storage formats. The +criteria used for comparing these are: performance, size, rust support, +specification, versioning support, community and tooling. Performance, size +and Rust support are hard requirements while all others can be the subject +of trade offs. 
+More info about this comparison can be found [here](https://github.com/firecracker-microvm/firecracker/blob/9d427b33d989c3225d874210f6c2849465941dc0/docs/snapshotting/design.md#snapshot-format). Key benefits of using *bincode*: @@ -84,47 +129,88 @@ Key benefits of using *bincode*: * Minimal CPU overhead * Simple implementation -The current implementation relies on the Serde bincode encoder (https://github.com/servo/bincode). +The current implementation relies on the [Serde bincode encoder](https://github.com/servo/bincode). -Versionize is compatible to Serde with bincode backend: structures serialized with versionize at a specific version can be deserialized with Serde. Also structures serialized with serde can be deserialized with versionize. +Versionize is compatible to Serde with bincode backend: structures serialized +with versionize at a specific version can be deserialized with Serde. Also +structures serialized with serde can be deserialized with versionize. ## Snapshot compatibility -### Host kernel - -The minimum kernel version required by Firecracker snapshots is 4.14. Snapshots can be saved and restored on the same kernel version without any issues. There might be issues when restoring snapshots created on different host kernel version even when using the same Firecracker version. -SnapshotCreate and SnapshotLoad operations across different host kernels is considerted unstable in Firecracker as the saved KVM state might have different semantics on different kernels. +### Host kernel +The minimum kernel version required by Firecracker snapshots is 4.14. Snapshots +can be saved and restored on the same kernel version without any issues. There +might be issues when restoring snapshots created on different host kernel +version even when using the same Firecracker version. 
+SnapshotCreate and SnapshotLoad operations across different host kernels is +considered unstable in Firecracker as the saved KVM state might have different +semantics on different kernels. ### Device model -The current Firecracker devices are backwards compatible up to the version that introduces them. Ideally this property would be kept over time, but there are situations when a new version of a device exposes new features to the guest that do not exist in an older version. In such cases restoring a snapshot at an older version becomes impossible without breaking the guest workload. -The microVM state file links some resources that are external to the snapshot: -- tap devices by device name, -- block devices by block file path, -- vsock backing Unix domain socket by socket name. +The current Firecracker devices are backwards compatible up to the version that +introduces them. Ideally this property would be kept over time, but there are +situations when a new version of a device exposes new features to the guest +that do not exist in an older version. In such cases restoring a snapshot at +an older version becomes impossible without breaking the guest workload. -To successfully restore a microVM one should check that: -- tap devices are available, their names match their original names since these are the values saved in the microVM state file, and they are accessible to the Firecracker process where the microVM is being restored, -- block devices are set up at their original relative or absolute paths with the proper permissions, as the Firecracker process with the restored microVM will attempt to access them exactly as they were accessed in the original Firecracker process, -- the vsock backing Unix domain socket is available, its name matches the original name, and it is accessible to the -new Firecracker process. 
+The microVM state file links some resources that are external to the snapshot: -### CPU model +* tap devices by device name, +* block devices by block file path, +* vsock backing Unix domain socket by socket name. -Firecracker micromVMs snapshot functionality is available for Intel/AMD/ARM64 CPU models that support the hardware virtualizations extensions, more details are available [here](../../README.md#supported-platforms). Snapshots are not compatible across CPU architectures and even across CPU models of the same architecture. They are only compatible if the CPU features exposed to the guest are an invariant when saving and restoring the snapshot. The trivial scenario is creating and restoring snapshots on hosts that have the same CPU model. +To successfully restore a microVM one should check that: -To make snapshots more portable across Intel CPUs Firecracker provides an API to select a CPU template which is only available for Intel - T2 and C3. Firecracker CPU templates mask CPUID to restrict the exposed features to a common denominator of multiple CPU models. These templates are mapped as close as possible to AWS T2/C3 instances in terms of CPU features. There are no templates available for AMD or ARM64. +* tap devices are available, their names match their original names since these + are the values saved in the microVM state file, and they are accessible to + the Firecracker process where the microVM is being restored, +* block devices are set up at their original relative or absolute paths with + the proper permissions, as the Firecracker process with the restored microVM + will attempt to access them exactly as they were accessed in the original + Firecracker process, +* the vsock backing Unix domain socket is available, its name matches the + original name, and it is accessible to the new Firecracker process. 
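Part of the checklist above can be automated before calling `LoadSnapshot`. The helpers below are a sketch under assumptions: the function names are made up for illustration, the tap check relies on Linux sysfs, and the names and paths passed in stand for whatever the original microVM was configured with. The vsock socket is left out because how it must be prepared depends on the setup.

```bash
# Pre-restore checks for host resources referenced by a microVM state file.

# Succeeds if a network interface with the given name exists on this host.
# $1: tap device name saved in the state file (Linux sysfs lookup)
tap_exists() {
    [ -e "/sys/class/net/$1" ]
}

# Succeeds if the block device backing file exists and is readable by the
# current user. $1: the path used by the original Firecracker process
block_file_readable() {
    [ -f "$1" ] && [ -r "$1" ]
}

# Run both checks and report what is missing.
# $1: tap device name, $2: block device backing file path
check_restore_resources() {
    local rc=0
    tap_exists "$1" || { echo "missing tap device: $1" >&2; rc=1; }
    block_file_readable "$2" || { echo "unreadable block file: $2" >&2; rc=1; }
    return "$rc"
}
```

These checks should be run as the same user and from the same working directory (or jail) as the new Firecracker process, since relative paths and permissions are resolved from its point of view.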
-It is important to note that guest workloads can still execute instructions that are being masked by CPUID and restoring and saving of such workloads will lead to undefined result. Firecracker retrieves the state of a discrete list MSRs from KVM, more specificically the MSRs corresponding to the guest exposed features. +### CPU model +Firecracker microVM snapshot functionality is available for Intel/AMD/ARM64 +CPU models that support the hardware virtualization extensions, more details +are available [here](../../README.md#supported-platforms). Snapshots are not +compatible across CPU architectures and even across CPU models of the same +architecture. They are only compatible if the CPU features exposed to the guest +are an invariant when saving and restoring the snapshot. The trivial scenario +is creating and restoring snapshots on hosts that have the same CPU model. + +To make snapshots more portable across Intel CPUs Firecracker provides an API +to select a CPU template which is only available for Intel - T2 and C3. +Firecracker CPU templates mask CPUID to restrict the exposed features to a +common denominator of multiple CPU models. These templates are mapped as close +as possible to AWS T2/C3 instances in terms of CPU features. There are no +templates available for AMD or ARM64. + +It is important to note that guest workloads can still execute instructions +that are being masked by CPUID and restoring and saving of such workloads will +lead to undefined results. Firecracker retrieves the state of a discrete list of +MSRs from KVM, more specifically the MSRs corresponding to the guest +exposed features. ## Implementation -To enable Firecracker cross version snapshots we have designed and built two crates: -- [versionize](https://crates.io/crates/versionize) - defines the Versionize trait, implements serialization of primitive types and provides a helper class to map Firecracker versions to individual structure versions.
-- [versionize_derive](https://crates.io/crates/versionize_derive) - exports a procedural macro that consumes structures and enums and their annotations to produce an implementation of the `Versionize` trait - -The microVM state file format is implemented in the [snapshot crate](../../src/snapshot/src/lib.rs) in the Firecracker repository. -All Firecracker devices implement the [Persist](../../src/snapshot/src/persist.rs) trait which exposes an interface that enables creating from and saving to the microVM state. \ No newline at end of file +To enable Firecracker cross version snapshots we have designed and built two +crates: + +* [versionize](https://crates.io/crates/versionize) - defines the `Versionize` + trait, implements serialization of primitive types and provides a helper + class to map Firecracker versions to individual structure versions. +* [versionize_derive](https://crates.io/crates/versionize_derive) - exports + a procedural macro that consumes structures and enums and their annotations + to produce an implementation of the `Versionize` trait. + +The microVM state file format is implemented in the [snapshot crate](../../src/snapshot/src/lib.rs) +in the Firecracker repository. +All Firecracker devices implement the [Persist](../../src/snapshot/src/persist.rs) +trait which exposes an interface that enables creating from and saving to the +microVM state. \ No newline at end of file diff --git a/docs/vsock.md b/docs/vsock.md index fa601196e41..a0d8bf512e5 100644 --- a/docs/vsock.md +++ b/docs/vsock.md @@ -50,11 +50,11 @@ Client A initiates connection to Server A in [figure below](#vsock-connections): 1. Host: At VM configuration time, add a virtio-vsock device, with some path specified in `uds_path`; -2. Guest: create an AF_VSOCK socket and `listen()` on ``; -3. Host: `connect()` to AF_UNIX at `uds_path`. -4. Host: `send()` "CONNECT \n". -5. Guest: `accept()` the new connection. -6. Host: `read()` "OK \n". +1. 
Guest: create an AF_VSOCK socket and `listen()` on ``; +1. Host: `connect()` to AF_UNIX at `uds_path`. +1. Host: `send()` "CONNECT ``\n". +1. Guest: `accept()` the new connection. +1. Host: `read()` "OK ``\n". The channel is established between the sockets obtained at steps 3 (host) and 5 (guest). @@ -74,10 +74,10 @@ Client B initiates connection to Server B in [figure below](#vsock-connections): 1. Host: At VM configuration time, add a virtio-vsock device, with some `uds_path` (e.g. `/path/to/v.sock`). -2. Host: create and listen on an AF_UNIX socket at `/path/to/v.sock_PORT`. -3. Guest: create an AF_VSOCK socket and connect to `HOST_CID` (i.e. integer +1. Host: create and listen on an AF_UNIX socket at `/path/to/v.sock_PORT`. +1. Guest: create an AF_VSOCK socket and connect to `HOST_CID` (i.e. integer value 2) and `PORT`; -4. Host: `accept()` the new connection. +1. Host: `accept()` the new connection. The channel is established between the sockets obtained at steps 4 (host) and 3 (guest). @@ -106,7 +106,7 @@ curl --unix-socket /tmp/firecracker.socket -i \ Once the microvm is started, Firecracker will create and start listening on the AF_UNIX socket at `uds_path`. Incoming connections will get forwarded to the guest microvm, and translated to AF_VSOCK. The destination port is expected to -be specified by sending the text command "CONNECT \n", immediately +be specified by sending the text command "CONNECT ``\n", immediately after the AF_UNIX connection is established. Connections initiated from within the guest will be forwarded to AF_UNIX sockets expected to be listening at `./v.sock_`. I.e. a guest connection to port 52 will get forwarded to @@ -117,7 +117,6 @@ the guest will be forwarded to AF_UNIX sockets expected to be listening at The examples below assume a running microvm, with a vsock device configured as shown [above](#setting-up-the-virtio-vsock-device). 
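The two naming rules described above (the `CONNECT` text command for host-initiated connections, and the `<uds_path>_<port>` socket path for guest-initiated ones) can be captured in two small helpers. This is an illustrative sketch; the function names are not part of Firecracker or of any tool mentioned here.

```bash
# Path of the AF_UNIX socket the host must listen on to receive
# guest-initiated connections to a given port: `<uds_path>_<port>`.
# $1: uds_path configured for the vsock device, $2: port number
vsock_host_listen_path() {
    printf '%s_%s' "$1" "$2"
}

# Text command the host sends immediately after connecting to `uds_path`,
# asking to be forwarded to a guest port. Includes the trailing newline.
# $1: guest port number
vsock_connect_request() {
    printf 'CONNECT %s\n' "$1"
}
```

With the values used in this document, `vsock_host_listen_path ./v.sock 52` yields `./v.sock_52`, and `vsock_connect_request 52` yields the `CONNECT 52` line expected on the host-side connection.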
- ### Using External Socket Tools (`nc-vsock` and `socat`) #### Connecting From Host to Guest @@ -126,7 +125,7 @@ First, make sure the vsock port is bound and listened to on the guest side. Say, port 52: ```bash -$ nc-vsock -l 52 +nc-vsock -l 52 ``` On the host side, connect to `./v.sock` and issue a connection request to that @@ -139,7 +138,7 @@ CONNECT 52 `socat` will display the connection acknowledgement message: -``` +```console OK 1073741824 ``` @@ -152,14 +151,14 @@ First make sure the AF_UNIX corresponding to your desired port is listened to on the host side: ```bash -$ socat - UNIX-LISTEN:./v.sock_52 +socat - UNIX-LISTEN:./v.sock_52 ``` On the guest side, create an AF_VSOCK socket and connect it to the previously chosen port on the host (CID=2): ```bash -$ nc-vsock 2 52 +nc-vsock 2 52 ``` ## Known issues diff --git a/tests/README.md b/tests/README.md index b8d3f166dcd..46772a4e6c3 100644 --- a/tests/README.md +++ b/tests/README.md @@ -72,6 +72,7 @@ cargo test --test integration_tests Unlike unit tests, Rust integration tests are each run in a separate process. `Cargo` also packages them in a new crate. This has several known side effects: + 1. Only the `pub` functions can be called. This is fine, as it allows the VMM to be consumed as a programmatic user would. If any function is necessary but not `pub`, please consider carefully whether it conceptually *needs* to @@ -270,13 +271,12 @@ Accessing **pytest.ini** will allow you to modify logger settings. will not need to rebuild everything. If any Rust source file is changed, the build is done incrementally. -2. Pass the `-k substring` option to Pytest to only run a subset of tests by +1. Pass the `-k substring` option to Pytest to only run a subset of tests by specifying a part of their name. -3. Only run the tests contained in a file or directory, as specified in the +1. Only run the tests contained in a file or directory, as specified in the **Running** section. 
- ## Implementation Goals - Easily run tests manually on a development/test machine, and in a continuous