Align device events registration after device activation #1687

acatangiu · 2020-03-19T18:31:22Z

Reason for This PR

Description of Changes

Device access to guest memory on/after activation

Passing memory during device activation instead of during construction enables the following story: #1702, #1708, #1709, and finally #1713.

The changes in this PR are in line with the model currently defined in https://github.com/rust-vmm/vm-virtio.
They are also compatible with the rust-vmm/vm-virtio#10 proposal where memory is also passed to the device during activation:

    /// Associated guest memory
    type M: GuestMemory;

    ...

    /// Activates this device for real usage.
    /// The ownership of the VirtioDeviceConfig object moves into the VirtioDevice object if
    /// activate succeeds, otherwise it should return the ownership back.
    fn activate(&mut self, config: VirtioDeviceConfig<M>) -> ActivateResult<VirtioDeviceConfig<M>>;
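As a rough illustration of the idea (not the actual Firecracker or rust-vmm code), the sketch below shows a device that is constructed without guest memory and only receives it in `activate`. The names `GuestMem`, `Device`, and `ActivateError` are hypothetical, and handing the memory back on failure is an assumption modelled on the doc comment above.

```rust
/// Hypothetical stand-in for guest memory (the real code uses a vm-memory type).
#[derive(Debug, Clone, PartialEq)]
pub struct GuestMem {
    pub size: usize,
}

/// Device state; the `Activated` variant owns the memory handle.
#[derive(Debug)]
pub enum DeviceState {
    Inactive,
    Activated(GuestMem),
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ActivateError {
    AlreadyActivated,
}

/// Illustrative device that can be created before guest memory exists.
pub struct Device {
    state: DeviceState,
}

impl Device {
    /// Construction takes no memory argument.
    pub fn new() -> Self {
        Device { state: DeviceState::Inactive }
    }

    pub fn is_activated(&self) -> bool {
        matches!(self.state, DeviceState::Activated(_))
    }

    /// Memory is handed over at activation time. On failure the memory is
    /// returned to the caller together with the error (an assumption of this sketch).
    pub fn activate(&mut self, mem: GuestMem) -> Result<(), (ActivateError, GuestMem)> {
        if self.is_activated() {
            return Err((ActivateError::AlreadyActivated, mem));
        }
        self.state = DeviceState::Activated(mem);
        Ok(())
    }

    /// Memory is only reachable once the device is activated.
    pub fn mem(&self) -> Option<&GuestMem> {
        match &self.state {
            DeviceState::Activated(mem) => Some(mem),
            DeviceState::Inactive => None,
        }
    }
}
```

With this shape, a caller can build the device first, configure guest memory later, and call `activate` once both exist.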

Events registration during activate

Postpone external events registration to device activation time.
This makes the block and net devices unaware of external events prior to their activation.

During creation, register a dedicated activation event that notifies the device when it is time to register the other external event sources.

This activation event is unregistered after successful device activation.
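The flow described above can be sketched roughly as follows. This is a simplified model, not the PR's implementation: `Registry` is a hypothetical stand-in for the event manager, and the `Source` enum replaces the real EventFd-based event sources.

```rust
use std::collections::HashSet;

/// Hypothetical identifiers for event sources (real code would use EventFds).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Source {
    Activate,
    Queue,
    RateLimiter,
}

/// Minimal stand-in for an event manager: it only tracks registered sources.
#[derive(Default)]
pub struct Registry {
    registered: HashSet<Source>,
}

impl Registry {
    pub fn register(&mut self, s: Source) -> Result<(), String> {
        if self.registered.insert(s) {
            Ok(())
        } else {
            Err(format!("{:?} is already registered", s))
        }
    }

    pub fn unregister(&mut self, s: Source) -> Result<(), String> {
        if self.registered.remove(&s) {
            Ok(())
        } else {
            Err(format!("{:?} is not registered", s))
        }
    }

    pub fn is_registered(&self, s: Source) -> bool {
        self.registered.contains(&s)
    }
}

pub struct Device {
    activated: bool,
}

impl Device {
    /// At creation time only the dedicated activation event is registered.
    pub fn new(reg: &mut Registry) -> Result<Self, String> {
        reg.register(Source::Activate)?;
        Ok(Device { activated: false })
    }

    pub fn activate(&mut self) {
        self.activated = true;
    }

    /// Handler for the activation event: register the remaining sources,
    /// then drop the activation event, which is no longer needed.
    pub fn handle_activate_event(&mut self, reg: &mut Registry) -> Result<(), String> {
        if !self.activated {
            return Err("activation event received before activation".to_string());
        }
        reg.register(Source::Queue)?;
        reg.register(Source::RateLimiter)?;
        reg.unregister(Source::Activate)
    }
}
```

Before activation the registry contains only `Source::Activate`, so the device cannot be woken by queue or rate limiter events it is not ready to handle.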

Coverage slightly decreased because untestable EventFd error cases.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license.

PR Checklist

[Author TODO: Meet these criteria.]
[Reviewer TODO: Verify that these criteria are met. Request changes if not]

All commits in this PR are signed (git commit -s).
The reason for this PR is clearly provided (issue no. or explanation).
The description of changes is clear and encompassing.
Any required documentation changes (code and docs) are included in this PR.
Any newly added unsafe code is properly documented.
Any API changes are reflected in firecracker/swagger.yaml.
Any user-facing changes are mentioned in CHANGELOG.md.

acatangiu · 2020-03-19T18:34:08Z

Still need to add unit tests for the new code.

acatangiu · 2020-03-20T14:02:48Z

Tests added, PR is complete.

src/devices/src/virtio/vsock/device.rs

andreeaflorescu · 2020-03-25T13:07:10Z

src/devices/src/virtio/net/event_handler.rs

+                self_subscriber.clone(),
+            )
+            .unwrap_or_else(|e| {
+                error!(


Do we want to continue the execution if we cannot register the events needed for this device? It might lead to silent failures.

I believe that once a guest is booted (customer workload has possibly started) we should always avoid crashing.

A malfunctioning device has the blast-radius limited to operations on that device. The guest workload might or might not be affected and data might be recoverable if the malfunction is detected.
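For reference, the log-and-continue approach under discussion looks roughly like the sketch below. `register_events` is a hypothetical function that simulates a registration failure, and `eprintln!` stands in for the `error!` logging macro seen in the diff.

```rust
/// Hypothetical registration call that can fail (simulated with a flag).
pub fn register_events(fail: bool) -> Result<(), String> {
    if fail {
        Err("could not register event source".to_string())
    } else {
        Ok(())
    }
}

/// Returns whether the events were registered. On failure the error is
/// logged and execution continues instead of panicking.
pub fn register_or_log(fail: bool) -> bool {
    register_events(fail)
        .map(|()| true)
        .unwrap_or_else(|e| {
            // Stand-in for the `error!` logging macro.
            eprintln!("Failed to register device events: {}", e);
            false
        })
}
```

The trade-off is the one raised in this thread: the process survives, but the failure is only visible in the logs unless the caller acts on the returned status.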

andreeaflorescu · 2020-03-25T13:39:23Z

src/devices/src/virtio/block/event_handler.rs

+            DeviceState::Inactive => warn!(
+                "Block: The device is not yet activated. Spurious event received: {:?}",
+                source
+            ),


This error case can be tested and should probably be tested. We can check that we do not trigger process_*_event functions when the device is not activated. This applies to both block and net devices. It looks like the activate for vsock device is not tested at all, so maybe we should open an issue for that one.

Sounds good. Will add a test for this error case.

Added a test for the vsock device event handler.

Enhanced handler tests for all devices to validate correct events handling for both pre- and post-activation.
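A test of this kind could look roughly like the sketch below. `CountingDevice` is a hypothetical device that counts processed events so that the check can observe whether the handler did any work; it is not one of the real block, net, or vsock devices.

```rust
/// Illustrative device that counts how many queue events it has processed.
#[derive(Default)]
pub struct CountingDevice {
    activated: bool,
    processed: usize,
}

impl CountingDevice {
    pub fn is_activated(&self) -> bool {
        self.activated
    }

    pub fn activate(&mut self) {
        self.activated = true;
    }

    pub fn processed(&self) -> usize {
        self.processed
    }

    fn process_queue_event(&mut self) {
        self.processed += 1;
    }

    /// Event handler entry point: events that arrive before activation
    /// are logged and dropped.
    pub fn handle_event(&mut self) {
        if self.is_activated() {
            self.process_queue_event();
        } else {
            eprintln!("The device is not yet activated. Spurious event received.");
        }
    }
}

/// Body of a unit test: no processing before activation, processing after.
pub fn check_event_gating() {
    let mut dev = CountingDevice::default();

    // Pre-activation: the event must be ignored.
    dev.handle_event();
    assert_eq!(dev.processed(), 0);

    // Post-activation: the same event must now be processed.
    dev.activate();
    dev.handle_event();
    assert_eq!(dev.processed(), 1);
}
```

In a real test module, `check_event_gating` would carry the `#[test]` attribute and drive the device through the event manager rather than calling the handler directly.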

andreeaflorescu · 2020-03-25T13:42:06Z

src/devices/src/virtio/block/event_handler.rs

+    use vm_memory::{Bytes, GuestAddress};
+
+    #[test]
+    fn test_interest_list() {


This test does not bring much value because interest_list just returns a static vector created in the function body. To test this basically means to duplicate the code in the test. This is typically a bad engineering practice referred to as WET. You can read more about it here: https://en.wikipedia.org/wiki/Don%27t_repeat_yourself

I'm fine with that, but we should be consistent. We have a lot of tests doing exactly this (think errors formatting).

I am in favor of removing all such tests, to be honest, but when they were added the reasoning was that such a test brings an extra layer of redundancy that would catch unintentional changes.

@firecracker-microvm/compute-capsule I'd really like for us to align here and maybe even formalize it in a best-practices doc in the codebase.

I agree we should be consistent. If people from the team agree with this, we shouldn't introduce this test in this PR just for the sake of consistency though :D

I've removed the WET tests.

src/devices/src/virtio/device.rs

alexandruag · 2020-03-27T10:28:19Z

Hi everyone! Taking a step back, it seems a lot of complexity stems from the multi-step Firecracker configuration process. Is there any reason for devices, memory, mmds, etc. to be configured via separate API calls at this point? If the API only allowed the "one call" configuration we talked about a while ago (or, to push things even further, Firecracker is always started using a config file and only keeps the API for runtime PATCHes, etc.), quite a lot of logic gets eliminated and we get configuration information upfront. This is a breaking change, but it seems worthwhile to seriously consider sooner rather than later.

Also wanted to mention that the VirtioDevice interface from rust-vmm is still mostly based on the old crosvm/Firecracker code, so unfortunately it's unclear at this point what it will end up looking like. FWIW, not immediately going back to tying device memory configuration to activation might be better in terms of keeping options open.

acatangiu · 2020-03-27T15:14:09Z

I agree with @alexandruag that the current multi-step configuration process introduces a complexity cost and imposes limitations on the Firecracker design that far outweigh any benefits it brings in a production scenario.

Unfortunately, like Alex mentioned, removing the multi-step configuration is a breaking change in terms of API and usage patterns and needs research and buy-in from customers/users of Firecracker before we can do it.

In the meantime, we should move forward with the changes in this PR as their blast-radius is very small and reverting the way memory is passed to the device can be easily done in the future. By that time we might even have an agreed-upon interface in rust-vmm that we can also adhere to.

iulianbarbu

Overall LGTM. Please add unit tests in */event_handler.rs that verify that there is no event handling while the device is inactive.

iulianbarbu · 2020-03-31T11:20:37Z

src/devices/src/virtio/vsock/event_handler.rs

+
+        // Push a queue event
+        // - the driver has something to send (there's data in the TX queue); and
+        // - the backend has no pending RX data.


This comment says that the backend has no pending RX data, but in the code block below, you do:
device.backend.set_pending_rx(true).

You're right, I had updated the test but forgot to update the comment.
Fixed.

andreeaflorescu · 2020-03-31T11:04:36Z

src/devices/src/virtio/vsock/event_handler.rs

-            _ if source == evq => raise_irq = self.handle_evq_event(event),
-            _ if source == backend => {
-                raise_irq = self.notify_backend(event);
+        match self.device_state {


Shouldn't this be using is_activated function instead?

Sure. I've made the changes so all devices now use is_activated() in their event handler.

andreeaflorescu · 2020-03-31T11:06:00Z

src/devices/src/virtio/block/device.rs

@@ -161,11 +160,16 @@ impl Block {
    }

    pub(crate) fn process_queue(&mut self, queue_index: usize) -> bool {
+        let mem = match self.device_state {
+            DeviceState::Activated(ref mem) => mem,
+            // This should never happen, it's been already validated in the event handler.


If this should never happen, should we use a panic instead to avoid programming errors?

I'm usually very apprehensive of adding crash conditions because of the risk of crashing in production.

In this area, however, we have a lot of unit tests, so any future programming errors will be caught sooner with a panic, as you say. Also, because of the high degree of coverage, I believe it is safe to say that we shouldn't crash in prod while tests are passing.

Per your suggestion, I've modified all instances of this check to have unreachable!() on the unreachable path.
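The resulting pattern can be sketched as below. This is a simplified model rather than the merged code: `GuestMem` is a hypothetical stand-in for guest memory, `handle_queue_event` is an assumed name for the event handler entry point, and the processing step is a placeholder.

```rust
/// Hypothetical stand-in for guest memory.
pub struct GuestMem {
    pub bytes: Vec<u8>,
}

pub enum DeviceState {
    Inactive,
    Activated(GuestMem),
}

pub struct Block {
    pub device_state: DeviceState,
}

impl Block {
    pub fn is_activated(&self) -> bool {
        matches!(self.device_state, DeviceState::Activated(_))
    }

    /// Event handler entry point: validates the state before any processing.
    pub fn handle_queue_event(&mut self, queue_index: usize) -> bool {
        if !self.is_activated() {
            eprintln!("Block: The device is not yet activated. Spurious event received.");
            return false;
        }
        self.process_queue(queue_index)
    }

    /// Only called after the handler has confirmed that the device is activated.
    fn process_queue(&mut self, _queue_index: usize) -> bool {
        let mem = match self.device_state {
            DeviceState::Activated(ref mem) => mem,
            // Already validated in the event handler; reaching this arm
            // would be a programming error, so fail loudly.
            DeviceState::Inactive => unreachable!(),
        };
        // Placeholder for real queue processing: report whether memory is usable.
        !mem.bytes.is_empty()
    }
}
```

Keeping `process_queue` private means the only way in is through the handler that performs the check, which is what justifies the `unreachable!()` arm in this sketch.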

ioanachirca · 2020-03-31T11:37:52Z

src/devices/src/virtio/block/event_handler.rs

+                "Block: The device is not yet activated. Spurious event received: {:?}",
+                source
+            ),
+        };
    }

    // Returns the rate_limiter and queue event fds.


Suggested change

-    // Returns the rate_limiter and queue event fds.
+    // Returns the activate event fd.

Could also remove the comment, as it's not essential and requires maintenance with each change of the function's meaning.

Instead of passing memory at device creation, bring it in during device activation. This enables future scenarios where devices can be created prior to guest memory configuration. Signed-off-by: Adrian Catangiu <acatan@amazon.com>

Postpone block external events registration to device activation time. This makes the block device unaware of external events prior to its activation. During creation register a dedicated activation event which will notify the device when it's time to register the other external events sources. This activation event is unregistered after successful device activation. Signed-off-by: Adrian Catangiu <acatan@amazon.com>

Postpone net external events registration to device activation time. This makes the net device unaware of external events prior to its activation. During creation register a dedicated activation event which will notify the device when it's time to register the other external events sources. This activation event is unregistered after successful device activation. Signed-off-by: Adrian Catangiu <acatan@amazon.com>

Allow Vsock creation without guest memory. Access to memory will be given to the device during its activation. Signed-off-by: Adrian Catangiu <acatan@amazon.com>

Make sure all events are ignored prior to device activation. Added test for vsock event handling through EventManager. Signed-off-by: Adrian Catangiu <acatan@amazon.com>

acatangiu self-assigned this Mar 19, 2020

acatangiu added the Status: Awaiting review label Mar 19, 2020

acatangiu added Status: Author and removed Status: Awaiting review labels Mar 19, 2020

acatangiu force-pushed the device_activate branch from ccd1383 to 249d674 Compare March 20, 2020 14:02

acatangiu removed the Status: Author label Mar 20, 2020

acatangiu force-pushed the device_activate branch from 249d674 to 9bfb07e Compare March 20, 2020 14:30

iulianbarbu self-requested a review March 23, 2020 10:44

acatangiu force-pushed the device_activate branch from 04f3317 to ad12fa5 Compare March 24, 2020 17:47

andreeaflorescu self-requested a review March 25, 2020 09:56

andreeaflorescu reviewed Mar 25, 2020

serban300 self-requested a review March 25, 2020 16:00

acatangiu linked an issue Mar 27, 2020 that may be closed by this pull request

Decouple Vmm from its Firecracker-specific RPC interface #1713

Closed

acatangiu removed a link to an issue Mar 27, 2020

Decouple Vmm from its Firecracker-specific RPC interface #1713

Closed

iulianbarbu reviewed Mar 27, 2020

acatangiu force-pushed the device_activate branch 2 times, most recently from 8945700 to 1707be1 Compare March 30, 2020 19:52

iulianbarbu reviewed Mar 31, 2020

andreeaflorescu reviewed Mar 31, 2020

ioanachirca reviewed Mar 31, 2020

acatangiu force-pushed the device_activate branch from ded4139 to 88ee2bb Compare March 31, 2020 15:56

acatangiu added 4 commits March 31, 2020 19:21

devices: re-introduce mem in VirtioDevice activate

6409db6

Instead of passing memory at device creation, bring it in during device activation. This enables future scenarios where devices can be created prior to guest memory configuration. Signed-off-by: Adrian Catangiu <acatan@amazon.com>

devices: vsock: pass memory during activate

24ed2f1

Allow Vsock creation without guest memory. Access to memory will be given to the device during its activation. Signed-off-by: Adrian Catangiu <acatan@amazon.com>

devices: vsock: only handle events after device activation

be73e7c

Make sure all events are ignored prior to device activation. Added test for vsock event handling through EventManager. Signed-off-by: Adrian Catangiu <acatan@amazon.com>

acatangiu force-pushed the device_activate branch from 88ee2bb to be73e7c Compare March 31, 2020 16:22

andreeaflorescu approved these changes Apr 1, 2020

Merge branch 'master' into device_activate

a113b2e

iulianbarbu approved these changes Apr 1, 2020

iulianbarbu merged commit ce7a3d9 into firecracker-microvm:master Apr 1, 2020

acatangiu deleted the device_activate branch April 1, 2020 15:41
