diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9a44cc10356..c9e595b7c0a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -18,6 +18,11 @@ and this project adheres to
 
 ### Fixed
 
+- [#4796](https://github.com/firecracker-microvm/firecracker/pull/4796): Fixed
+  Vsock not notifying guest about `TRANSPORT_RESET_EVENT` event after snapshot
+  restore. This resulted in guest waiting indefinitely on a connection which was
+  reset during snapshot creation.
+
 ## \[1.9.0\]
 
 ### Added
@@ -812,7 +817,9 @@ and this project adheres to
   `--show-level` and `--show-log-origin` that can be used for configuring the
   Logger when starting the process. When using this method for configuration,
   only `--log-path` is mandatory.
-- Added a [guide](docs/devctr-image.md) for updating the dev container image.
+- Added a
+  [guide](https://github.com/firecracker-microvm/firecracker/blob/v0.22.0/docs/devctr-image.md)
+  for updating the dev container image.
 - Added a new API call, `PUT /mmds/config`, for configuring the `MMDS` with a
   custom valid link-local IPv4 address.
 - Added experimental JSON response format support for MMDS guest applications
diff --git a/docs/devctr-image.md b/docs/devctr-image.md
deleted file mode 100644
index f16ed4dc2cb..00000000000
--- a/docs/devctr-image.md
+++ /dev/null
@@ -1,305 +0,0 @@
-# Publishing a New Container Image
-
-## About the Container Image
-
-Firecracker uses a [Docker container](https://www.docker.com/) to standardize
-the build process. This also fixes the build tools and dependencies to specific
-versions. Every once in a while, something needs to be updated. To do this, a
-new container image needs to be built locally, then published to the
-[AWS ECR](https://aws.amazon.com/ecr/) registry. The Firecracker CI suite must
-also be updated to use the new image.
-
-## Prerequisites
-
-1. Access to the
-   [`fcuvm` ECR repository](https://gallery.ecr.aws/firecracker/fcuvm).
-1. The `docker` package installed locally. You should already have this if
-   you've ever built Firecracker from source.
-1. Access to both an `x86_64` and `aarch64` machines to build the container
-   images.
-1. Ensure `aws --version` is >=1.17.10.
-
-## Steps
-
-### **\[optional\]** Update `poetry.lock`
-
-This step is optional but recommended, to be on top of Python package changes.
-
-```sh
-./tools/devtool shell --privileged
-poetry update --lock --directory tools/devctr/
-```
-
-This will change `poetry.lock`, which you can commit with your changes.
-
-### `x86_64`
-
-1. Login to the Docker organization in a shell. Make sure that your account has
-   access to the repository:
-
-   ```bash
-   aws ecr-public get-login-password --region us-east-1 \
-   | docker login --username AWS --password-stdin public.ecr.aws
-   ```
-
-   For non-TTY devices, although not recommended a less secure approach can be
-   used:
-
-   ```bash
-   docker login --username AWS --password \
-   $(aws ecr-public get-login-password --region us-east-1) public.ecr.aws
-   ```
-
-1. Navigate to the Firecracker directory. Verify that you have the latest
-   container image locally.
-
-   ```bash
-   docker images
-   REPOSITORY                         TAG      IMAGE ID       CREATED       SIZE
-   public.ecr.aws/firecracker/fcuvm   v26      8d00deb17f7a   2 weeks ago   2.41GB
-   ```
-
-1. Make your necessary changes, if any, to the
-   [Dockerfile](https://docs.docker.com/engine/reference/builder/). There's one
-   for all the architectures in the Firecracker source tree.
-
-1. Commit the changes, if any.
-
-1. Build a new container image with the updated Dockerfile.
-
-   ```bash
-   tools/devtool build_devctr
-   ```
-
-1. Verify that the new image exists.
-
-   ```bash
-   docker images
-   REPOSITORY                         TAG      IMAGE ID       CREATED       SIZE
-   public.ecr.aws/firecracker/fcuvm   latest   1f9852368efb   2 weeks ago   2.36GB
-   public.ecr.aws/firecracker/fcuvm   v26      8d00deb17f7a   2 weeks ago   2.41GB
-   ```
-
-1. Tag the new image with the next available version `X` and the architecture
-   you're on. Note that this will not always be "current version in devtool +
-   1", as sometimes that version might already be used on feature branches.
-   Always check the "Image Tags" on
-   [the fcuvm repository](https://gallery.ecr.aws/firecracker/fcuvm) to make
-   sure you do not accidentally overwrite an existing image.
-
-   As a sanity check, run:
-
-   ```bash
-   docker pull public.ecr.aws/firecracker/fcuvm:vX
-   ```
-
-   and verify that you get an error message along the lines of
-
-   ```
-   Error response from daemon: manifest for public.ecr.aws/firecracker/fcuvm:vX not
-   found: manifest unknown: Requested image not found
-   ```
-
-   This means the version you've chosen does not exist yet, and you are good to
-   go.
-
-   ```bash
-   docker tag 1f9852368efb public.ecr.aws/firecracker/fcuvm:v27_x86_64
-
-   docker images
-   REPOSITORY                         TAG          IMAGE ID       CREATED
-   public.ecr.aws/firecracker/fcuvm   latest       1f9852368efb   1 week ago
-   public.ecr.aws/firecracker/fcuvm   v27_x86_64   1f9852368efb   1 week ago
-   public.ecr.aws/firecracker/fcuvm   v26          8d00deb17f7a   2 weeks ago
-   ```
-
-1. Push the image.
-
-   ```bash
-   docker push public.ecr.aws/firecracker/fcuvm:v27_x86_64
-   ```
-
-### `aarch64`
-
-Login to the `aarch64` build machine.
-
-Steps 1-4 are identical across architectures, change `x86_64` to `aarch64`.
-
-Then continue with the above steps:
-
-1. Build a new container image with the updated Dockerfile.
-
-   ```bash
-   tools/devtool build_devctr
-   ```
-
-1. Verify that the new image exists.
-
-   ```bash
-   docker images
-   REPOSITORY                         TAG      IMAGE ID       CREATED
-   public.ecr.aws/firecracker/fcuvm   latest   1f9852368efb   2 minutes ago
-   public.ecr.aws/firecracker/fcuvm   v26      8d00deb17f7a   2 weeks ago
-   ```
-
-1. Tag the new image with the next available version `X` and the architecture
-   you're on. Note that this will not always be "current version in devtool +
-   1", as sometimes that version might already be used on feature branches.
-   Always check the "Image Tags" on
-   [the fcuvm repository](https://gallery.ecr.aws/firecracker/fcuvm) to make
-   sure you do not accidentally overwrite an existing image.
-
-   As a sanity check, run:
-
-   ```bash
-   docker pull public.ecr.aws/firecracker/fcuvm:vX
-   ```
-
-   and verify that you get an error message along the lines of
-
-   ```
-   Error response from daemon: manifest for public.ecr.aws/firecracker/fcuvm:vX not
-   found: manifest unknown: Requested image not found
-   ```
-
-   This means the version you've chosen does not exist yet, and you are good to
-   go.
-
-   ```bash
-   docker tag 1f9852368efb public.ecr.aws/firecracker/fcuvm:v27_aarch64
-
-   docker images
-   REPOSITORY                         TAG           IMAGE ID
-   public.ecr.aws/firecracker/fcuvm   latest        1f9852368efb
-   public.ecr.aws/firecracker/fcuvm   v27_aarch64   1f9852368efb
-   public.ecr.aws/firecracker/fcuvm   v26           8d00deb17f7a
-   ```
-
-1. Push the image.
-
-   ```bash
-   docker push public.ecr.aws/firecracker/fcuvm:v27_aarch64
-   ```
-
-1. Create a manifest to point the latest container version to each specialized
-   image, per architecture.
-
-   ```bash
-   docker manifest create public.ecr.aws/firecracker/fcuvm:v27 \
-       public.ecr.aws/firecracker/fcuvm:v27_x86_64 public.ecr.aws/firecracker/fcuvm:v27_aarch64
-
-   docker manifest push public.ecr.aws/firecracker/fcuvm:v27
-   ```
-
-1. Update the image tag in the
-   [`devtool` script](https://github.com/firecracker-microvm/firecracker/blob/main/tools/devtool).
-   Commit and push the change.
-
-   ```bash
-   PREV_TAG=v26
-   CURR_TAG=v27
-   sed -i "s%DEVCTR_IMAGE_TAG=\"$PREV_TAG\"%DEVCTR_IMAGE_TAG=\"$CURR_TAG\"%" tools/devtool
-   ```
-
-## Troubleshooting
-
-Check out the
-[`rust-vmm-container` readme](https://github.com/rust-vmm/rust-vmm-container)
-for additional troubleshooting steps and guidelines.
-
-### I can't push the manifest
-
-```bash
-docker manifest is only supported when experimental cli features are enabled
-```
-
-See
-[this article](https://medium.com/@mauridb/docker-multi-architecture-images-365a44c26be6)
-for explanations and fix.
-
-### How to test the image after pushing it to the Docker registry
-
-Either fetch and run it locally on another machine than the one you used to
-build it, or clean up any artifacts from the build machine and fetch.
-
-```bash
-docker system prune -a
-
-docker images
-REPOSITORY   TAG   IMAGE ID   CREATED   SIZE
-
-tools/devtool shell
-[Firecracker devtool] About to pull docker image public.ecr.aws/firecracker/fcuvm:v15
-[Firecracker devtool] Continue?
-```
-
-### I don't have access to the AWS ECR registry
-
-```bash
-docker push public.ecr.aws/firecracker/fcuvm:v27
-The push refers to repository [public.ecr.aws/firecracker/fcuvm]
-e2b5ee0c4e6b: Preparing
-0fbb5fd5f156: Preparing
-...
-a1aa3da2a80a: Waiting
-denied: requested access to the resource is denied
-```
-
-Only a Firecracker maintainer can update the container image. If you are one,
-ask a member of the team to add you to the AWS ECR repository and retry.
-
-### I pushed the wrong tag
-
-Tags can be deleted from the [AWS ECR interface](https://aws.amazon.com/ecr/).
-
-Also, pushing the same tag twice will overwrite the initial content.
-
-### I did everything right and nothing works anymore
-
-If you see unrelated `Python` errors, it's likely because the dev container
-pulls `Python 3` at build time. `Python 3` means different minor versions on
-different platforms, and is not backwards compatible. So it's entirely possible
-that `docker build` has pulled in unwanted `Python` dependencies.
-
-To include **only your** changes, an alternative to the method described above
-is to make the changes *inside* the container, instead of in the `Dockerfile`.
-
-Let's say you want to update
-[`cargo-audit`](https://github.com/RustSec/cargo-audit) (random example).
-
-1. Enter the container as `root`.
-
-   ```bash
-   tools/devtool shell -p
-   ```
-
-1. Make the changes locally. Do not exit the container.
-
-   ```bash
-   cargo install cargo-audit --force
-   ```
-
-1. Find your running container.
-
-   ```bash
-   docker ps
-   CONTAINER ID   IMAGE       COMMAND   CREATED
-   e9f0487fdcb9   fcuvm:v14   "bash"    53 seconds ago
-   ```
-
-1. Commit the modified container to a new image. Use the `container ID`.
-
-   ```bash
-   docker commit e9f0487fdcb9 fcuvm:v15_x86_64
-   ```
-
-   ```bash
-   docker image ls
-   REPOSITORY   TAG          IMAGE ID       CREATED
-   fcuvm        v15_x86_64   514581e654a6   18 seconds ago
-   fcuvm        v14          c8581789ead3   2 months ago
-   ```
-
-1. Repeat for `aarch64`.
-
-1. Create and push the manifest.
diff --git a/src/vmm/src/device_manager/mmio.rs b/src/vmm/src/device_manager/mmio.rs
index 69451650045..6020ea483ea 100644
--- a/src/vmm/src/device_manager/mmio.rs
+++ b/src/vmm/src/device_manager/mmio.rs
@@ -34,7 +34,7 @@ use crate::devices::virtio::device::VirtioDevice;
 use crate::devices::virtio::mmio::MmioTransport;
 use crate::devices::virtio::net::Net;
 use crate::devices::virtio::rng::Entropy;
-use crate::devices::virtio::vsock::TYPE_VSOCK;
+use crate::devices::virtio::vsock::{Vsock, VsockUnixBackend, TYPE_VSOCK};
 use crate::devices::virtio::{TYPE_BALLOON, TYPE_BLOCK, TYPE_NET, TYPE_RNG};
 use crate::devices::BusDevice;
 #[cfg(target_arch = "x86_64")]
@@ -489,6 +489,16 @@ impl MMIODeviceManager {
                     // so for Vsock we don't support connection persistence through snapshot.
                     // Any in-flight packets or events are simply lost.
                     // Vsock is restored 'empty'.
+                    // The only reason we still `kick` it is to make guest process
+                    // `TRANSPORT_RESET_EVENT` event we sent during snapshot creation.
+                    let vsock = virtio
+                        .as_mut_any()
+                        .downcast_mut::<Vsock<VsockUnixBackend>>()
+                        .unwrap();
+                    if vsock.is_activated() {
+                        info!("kick vsock {id}.");
+                        vsock.signal_used_queue().unwrap();
+                    }
                 }
                 TYPE_RNG => {
                     let entropy = virtio.as_mut_any().downcast_mut::<Entropy>().unwrap();
diff --git a/tests/integration_tests/functional/test_vsock.py b/tests/integration_tests/functional/test_vsock.py
index a09bd246e9b..c860e2d920b 100644
--- a/tests/integration_tests/functional/test_vsock.py
+++ b/tests/integration_tests/functional/test_vsock.py
@@ -14,6 +14,9 @@
 """
 
 import os.path
+import subprocess
+import time
+from pathlib import Path
 from socket import timeout as SocketTimeout
 
 from framework.utils_vsock import (
@@ -126,7 +129,7 @@ def test_vsock_epipe(uvm_plain, bin_vsock_path, test_fc_session_root_path):
     validate_fc_metrics(metrics)
 
 
-def test_vsock_transport_reset(
+def test_vsock_transport_reset_h2g(
     uvm_nano, microvm_factory, bin_vsock_path, test_fc_session_root_path
 ):
     """
@@ -215,3 +218,67 @@ def test_vsock_transport_reset(
     check_host_connections(path, blob_path, blob_hash)
     metrics = vm2.flush_metrics()
     validate_fc_metrics(metrics)
+
+
+def test_vsock_transport_reset_g2h(uvm_nano, microvm_factory):
+    """
+    Vsock transport reset test.
+    """
+    test_vm = uvm_nano
+    test_vm.add_net_iface()
+    test_vm.api.vsock.put(vsock_id="vsock0", guest_cid=3, uds_path=f"/{VSOCK_UDS_PATH}")
+    test_vm.start()
+    test_vm.wait_for_up()
+
+    host_socket_path = os.path.join(
+        test_vm.path, f"{VSOCK_UDS_PATH}_{ECHO_SERVER_PORT}"
+    )
+    host_socat_commmand = [
+        "socat",
+        "-dddd",
+        f"UNIX-LISTEN:{host_socket_path},fork",
+        "STDOUT",
+    ]
+    host_socat = subprocess.Popen(
+        host_socat_commmand, stdout=subprocess.PIPE, stderr=subprocess.PIPE
+    )
+
+    # Give some time for host socat to create socket
+    time.sleep(0.5)
+    assert Path(host_socket_path).exists()
+    test_vm.create_jailed_resource(host_socket_path)
+
+    # Create a socat process in the guest which will connect to the host socat
+    guest_socat_commmand = f"tmux new -d 'socat - vsock-connect:2:{ECHO_SERVER_PORT}'"
+    test_vm.ssh.run(guest_socat_commmand)
+
+    # socat should be running in the guest now
+    code, _, _ = test_vm.ssh.run("pidof socat")
+    assert code == 0
+
+    # Create snapshot.
+    snapshot = test_vm.snapshot_full()
+    test_vm.resume()
+
+    # After `create_snapshot` + 'restore' calls, connection should be dropped
+    code, _, _ = test_vm.ssh.run("pidof socat")
+    assert code == 1
+
+    # Kill host socat as it is not useful anymore
+    host_socat.kill()
+    host_socat.communicate()
+
+    # Terminate VM.
+    test_vm.kill()
+
+    # Load snapshot.
+    vm2 = microvm_factory.build()
+    vm2.spawn()
+    vm2.restore_from_snapshot(snapshot, resume=True)
+    vm2.wait_for_up()
+
+    # After snap restore all vsock connections should be
+    # dropped. This means guest socat should exit same way
+    # as it did after snapshot was taken.
+    code, _, _ = vm2.ssh.run("pidof socat")
+    assert code == 1
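
Review note on `test_vsock_transport_reset_g2h`: the test waits a fixed `time.sleep(0.5)` for the host `socat` to create its listening socket and then asserts that the path exists. On a loaded machine that fixed delay could be too short. A minimal sketch of a polling alternative follows; `wait_for_path` and its `timeout` and `interval` parameters are hypothetical names introduced here for illustration, not part of this change or of the test framework.

```python
import time
from pathlib import Path


def wait_for_path(path, timeout=5.0, interval=0.05):
    """Poll until `path` exists or `timeout` seconds elapse.

    Hypothetical helper (not part of the Firecracker test framework).
    Returns True if the path exists by the deadline, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if Path(path).exists():
            return True
        time.sleep(interval)
    # One final check so a path created right at the deadline is not missed.
    return Path(path).exists()
```

Under this assumption, the sleep-then-assert pair in the test could become a single `assert wait_for_path(host_socket_path)`, which returns as soon as the socket appears and only fails after the full timeout.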