/test-vfio
The "lazy attach" mechanism [1] was added to hotplug LBS (Large BAR space) devices after re-scanning the PCI bus, fixing LBS hotplug in Kata Containers. Since PCI rescan is removed in kata-containers/agent#782, lazy attach is no longer needed.
Depends-on: github.com/kata-containers/agent#782
fixes kata-containers#2664
[1] kata-containers#2461
Signed-off-by: Julio Montes <julio.montes@intel.com>
Force-pushed from 5b1ddf6 to 379f833
/test
Codecov Report
@@            Coverage Diff             @@
##           master     #782      +/-   ##
==========================================
- Coverage   60.15%   60.06%   -0.09%
==========================================
  Files          17       17
  Lines        2665     2672       +7
==========================================
+ Hits         1603     1605       +2
- Misses        900      906       +6
+ Partials      162      161       -1
Oh, this looks quite interesting.
We're being hit by a workaround for an old QEMU bug. Not exactly related to this PR, but we really should document the workarounds we carry for specific versions of the packages we rely on, mainly so we can revisit them when we bump our minimum requirements.
Code-wise, it looks good. Conceptually, it looks good too.
I'll add the "Approve" once @amorenoz mentions his tests are all good, introducing the "AaaS" concept ("Ack-as-a-Service").
Force-pushed from 379f833 to ebd5b2a
/test
Nice @devimc - the best type of PR 😄
/test-vfio
@devimc You mentioned that AFAIK, for q35 we hotplug devices on a pci-to-pci bridge and this mechanism makes use of SHPC itself and does not use ACPI at all. (This is the original commit that added the rescan logic in Kata: #380)
@amshinde As far as I have seen, the 5s delay is still there. However, the rescan systematically breaks the PCI devices due to a race condition between SHPC and the rescan. I don't see a clear way of avoiding that race that does not defeat the very reason the rescan was added. I know that the use of pcie-root-ports should be the preferred way in q35 machine types, but even there the rescan (which, IIUC, is not needed in that case) breaks the PCIe hotplug.
Note that both shpc and PCI-E native hotplug have a delay (though I think the PCI-E one is a little shorter). Despite adding a delay, I think applying this patch is the correct thing to do: As well as badly breaking devices because of the race, the rescan only ever improved the delay by accident. It's really a kernel bug that the rescan and hotplug break so badly. I've spoken to Alex Williamson (kernel VFIO maintainer) and he says it's a known problem, but considered a low priority, so this is unlikely to be fixed any time soon. AFAICT, if the kernel bug were fixed it would likely mean that either:
So even if there wasn't the bad failure mode, the rescan still wouldn't accomplish what we want. If we want to properly address that hotplug delay, we'd need to define a new hotplug protocol and teach the guest kernel and QEMU to use it. Given that the 5s delay is there for the benefit of a human pressing buttons and watching blinkenlights when physically hotplugging a device, a faster virtual-only hotplug protocol would actually make a lot of sense for a bunch of applications, but it's not something we can accomplish in Kata alone. So despite the extra delay, I think we should go ahead and apply this fix.
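For readers unfamiliar with the rescan under discussion: on Linux, a full PCI bus rescan is triggered by writing `1` to `/sys/bus/pci/rescan`. The snippet below is only a minimal sketch of that trigger. The function name and the overridable path parameter are illustrative assumptions, not the agent's actual code, and this is the operation the PR removes, not something it adds.

```python
# Standard Linux sysfs trigger for a full PCI bus rescan.
PCI_RESCAN_PATH = "/sys/bus/pci/rescan"


def rescan_pci_bus(rescan_path=PCI_RESCAN_PATH):
    """Ask the kernel to rescan all PCI buses.

    Hypothetical helper for illustration only. Writing to the real sysfs
    file requires root inside the guest. The write is synchronous: it
    returns once the kernel has finished walking the buses.
    """
    with open(rescan_path, "w") as f:
        f.write("1")
```

Because the write only returns after the scan completes, a caller gets an implicit "wait until probed" for free, which is the side effect the thread describes as having improved the delay only by accident.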
@dgibson thanks for the explanation. I wonder if there is a good place to document these for any new contributors to understand the finer nuances. Maybe as a code comment or commit message?
We could certainly update the commit message with that context. I don't see an obvious place to put it in the code, though.
I agree we want to have this change in. @dgibson you mention a native delay. But with the current qemu on q35, do we still have this delay?
The delay isn't in qemu, it's in the guest kernel.
PCI bus rescan code was added a long time ago in Clear Containers due to lack of ACPI support in QEMU 2.9 + q35 [1]. Now this code is messing up PCIe hotplug in Kata Containers. A workaround for this issue is the "lazy attach" mechanism [2], which hotplugs LBS (Large BAR space) devices after re-scanning the PCI bus. Unfortunately, some non-LBS devices are affected too, for instance SR-IOV devices. It would not make sense to lazy-attach non-LBS devices, because Kata would end up lazy-attaching all devices. Therefore, the PCI bus rescan code and the "lazy attach" mechanism should be removed.
Depends-on: github.com/kata-containers/runtime#2670
fixes kata-containers#781
fixes kata-containers/runtime#2664
[1] clearcontainers/agent#139
[2] kata-containers/runtime#2461
Signed-off-by: Julio Montes <julio.montes@intel.com>
Force-pushed from ebd5b2a to b26f728
PR rebased, but VFIO CI will fail.
/test-vfio
Do you mean the kernel implements the same delay required for physical hardware (or human operators) irrespective of whether it's a physical or virtual device?
Well, from the guest kernel's point of view a virtual device is (at least theoretically) indistinguishable from a physical one. Plus I believe the delay is written into the PCI specs, and presumably predates the widespread use of virtual devices.
@devimc - I think this needs to be ported to 2.0, but please could you verify?
True, but there is also paravirtualization all over the place. Maybe the kernel could relax the historical PCI delays when running in a VM.
Thanks for answering my question.
The rust agent does indeed contain PCI rescan code.
PCI bus rescan code was added a long time ago in Clear Containers due to lack of ACPI support in QEMU 2.9 + q35 [1]. Now this code is messing up PCIe hotplug in Kata Containers. A workaround for this issue is the "lazy attach" mechanism [2], which hotplugs LBS (Large BAR space) devices after re-scanning the PCI bus. Unfortunately, some non-LBS devices are affected too, for instance SR-IOV devices. It would not make sense to lazy-attach non-LBS devices, because Kata would end up lazy-attaching all devices. Therefore, the PCI bus rescan code and the "lazy attach" mechanism should be removed.
Forward port of: kata-containers/agent#782
Fixes: kata-containers#683
Suggested-by: Julio Montes <julio.montes@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
but we don't have a vfio CI in 2.0, |
@c3d it won't pass until we reimplement the way hotplugged devices are handled in both runtime and agent
Uh.. possibly. Working out criteria to do that safely will require some thought, though.
/test-ubuntu
As discussed in the conversation, I think this is a necessary change, despite the extra delays.
VFIO CI failing:
@devimc - it might be useful to modify |
Ok, I had a look at the test code, and I'm pretty sure what's happening is that it's looking for the eth interface from the VFIO device more-or-less immediately after starting the container, whereas the hotplug will take several seconds. The rescan we used to have not only probed the device sooner, but it was also synchronous, so it had the side effect of waiting for the scan to complete, which meant a bunch of things were already set up once we started the container. We have a couple of options:
@jodh-intel, thoughts?
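To illustrate the timing problem described above: a test that checks for the interface exactly once right after container start will race with the multi-second hotplug. One possible shape for a more tolerant check is to poll with a deadline. This is only a sketch under assumptions: the function name, the default timeout and interval, and the use of `/sys/class/net` to detect the interface are illustrative choices, not what the Kata tests actually do.

```python
import os
import time


def wait_for_interface(name, timeout=30.0, interval=0.5,
                       sysfs_net="/sys/class/net"):
    """Poll until network interface `name` appears, or `timeout` elapses.

    Hypothetical helper: returns True as soon as <sysfs_net>/<name>
    exists, and False if it has not appeared by the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(os.path.join(sysfs_net, name)):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test would call something like `wait_for_interface("eth1", timeout=30)` before asserting on the device. Note that this only confirms the named interface showed up; as noted later in the thread, it cannot by itself prove that every expected secondary device has appeared.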
@dgibson - that sounds like the best option to me, since it's the least surprising.
That's a pita. @amshinde / @mcastelino - any thoughts on how we might handle this reliably? One other consideration - whatever we decide needs to be implementable for the 2.0 agent at some future date (ideally before 2.0 :)
I should clarify: while it's true that we can't be sure that the expected secondary devices have appeared, I think that's also true of the existing rescan.
Yes, I haven't looked at porting my patches to 2.0 yet, but it's on my list.
@devimc - please can you rebase as the branch is now conflicted? Forward port PR: kata-containers/kata-containers#684.
closing in favour of #850 - thanks @jodh-intel
The "lazy attach" mechanism [1] was added to hotplug LBS (Large BAR space) devices after re-scanning the PCI bus, fixing LBS hotplug in Kata Containers. Since PCI rescan is removed in kata-containers/agent#782, lazy attach is no longer needed.
fixes kata-containers#2664
[1] kata-containers#2461
Signed-off-by: Julio Montes <julio.montes@intel.com>
PCI bus rescan code was added a long time ago in Clear Containers due to lack of
ACPI support in QEMU 2.9 + q35 [1]. Now this code is messing up PCIe hotplug
in Kata Containers. A workaround for this issue is the "lazy attach"
mechanism [2], which hotplugs LBS (Large BAR space) devices after re-scanning the
PCI bus. Unfortunately, some non-LBS devices are affected too, for
instance SR-IOV devices. It would not make sense to lazy-attach non-LBS
devices, because Kata would end up lazy-attaching all devices. Therefore,
the PCI bus rescan code and the "lazy attach" mechanism should be removed.
Depends-on: github.com/kata-containers/runtime#2670
fixes #781
fixes kata-containers/runtime#2664
[1] clearcontainers/agent#139
[2] kata-containers/runtime#2461
Signed-off-by: Julio Montes <julio.montes@intel.com>