
fix up remaining issues with pci hotplug and uevent listening #380

Merged
merged 3 commits into kata-containers:master on Sep 21, 2018

Conversation

bergwolf
Member

  1. rescan pci bus before waiting for a new device
  2. fix redefined notify channel when waiting for a new device

Fixes: #372

I made this depend on #375 since they both modify the same function and would otherwise conflict.

@bergwolf
Member Author

/test

@bergwolf bergwolf requested review from amshinde and sboeuf September 20, 2018 04:08
@codecov

codecov bot commented Sep 20, 2018

Codecov Report

Merging #380 into master will increase coverage by 0.04%.
The diff coverage is 33.33%.

@@            Coverage Diff             @@
##           master     #380      +/-   ##
==========================================
+ Coverage   46.47%   46.51%   +0.04%     
==========================================
  Files          16       16              
  Lines        2498     2498              
==========================================
+ Hits         1161     1162       +1     
+ Misses       1186     1185       -1     
  Partials      151      151

@jodh-intel
Contributor

jodh-intel commented Sep 20, 2018

lgtm

Approved with PullApprove

device.go Outdated
@@ -36,6 +36,7 @@ var (
sysBusPrefix = "/sys/bus/pci/devices"
pciBusPathFormat = "%s/%s/pci_bus/"
systemDevPath = "/dev"
pciBusRescanPath = "/sys/bus/pci/rescan"

This variable already exists in grpc.go as pciBusRescanFile.

device.go Outdated
fieldLogger.Info("Device found on pci device path")
return "", nil
// Rescan pci bus if we need to wait for a new pci device
if err = ioutil.WriteFile(pciBusRescanPath, []byte{'1'}, 0200); err != nil {

please use pciBusMode


Or could you implement a new function here to rescan the PCI bus, and use it in grpc.go as well?
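A minimal sketch of what such a shared helper could look like - the constant names follow the reviewers' pointers to grpc.go, while the function name, main, and error handling are illustrative, not the PR's actual code:

package main

import (
    "io/ioutil"
    "log"
)

const (
    // Assumed values; the agent keeps these in grpc.go per the review above.
    pciBusRescanFile = "/sys/bus/pci/rescan"
    pciBusMode       = 0200
)

// rescanPciBus asks the kernel to re-enumerate the PCI bus by writing
// "1" to the sysfs rescan trigger.
func rescanPciBus() error {
    return ioutil.WriteFile(pciBusRescanFile, []byte("1"), pciBusMode)
}

func main() {
    // Requires root; on an unprivileged host the write simply fails.
    if err := rescanPciBus(); err != nil {
        log.Fatalf("pci rescan failed: %v", err)
    }
}

device.go and grpc.go could then both call rescanPciBus() instead of duplicating the WriteFile call.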

amshinde and others added 3 commits September 20, 2018 23:04
…ing"

We should not check for the sysfs path associated with the PCI address,
as the device node may not have been created yet. Only a uevent with a
device name associated indicates that the device is ready.

This change was also causing the sandbox lock not to be released,
leaving the uevent listening loop locked permanently.

This reverts commit 7caf1c8.

Fixes kata-containers#379

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
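As a rough illustration of that readiness rule (a sketch, not the agent's actual listener): kobject uevents arrive as NUL-separated KEY=VALUE pairs, and only the presence of DEVNAME signals that a device node exists under /dev.

package main

import (
    "fmt"
    "strings"
)

// parseUevent splits a raw kobject uevent payload (NUL-separated
// "KEY=VALUE" pairs) into a map.
func parseUevent(raw []byte) map[string]string {
    ev := make(map[string]string)
    for _, f := range strings.Split(string(raw), "\x00") {
        if kv := strings.SplitN(f, "=", 2); len(kv) == 2 {
            ev[kv[0]] = kv[1]
        }
    }
    return ev
}

// deviceReady reports whether an "add" uevent announces a usable device
// node: only a uevent carrying DEVNAME means the node has been created.
func deviceReady(ev map[string]string) bool {
    return ev["ACTION"] == "add" && ev["DEVNAME"] != ""
}

func main() {
    raw := []byte("ACTION=add\x00DEVPATH=/devices/pci0000:00/0000:00:02.0\x00")
    fmt.Println(deviceReady(parseUevent(raw))) // false: no DEVNAME yet
}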
Otherwise the channel we are waiting on is not the one we
put in the device watchers map.

Fixes: kata-containers#372

Signed-off-by: Peng Tao <bergwolf@gmail.com>
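The underlying bug is Go's variable-shadowing pitfall: a ':=' inside a block re-declares the channel, so the one stored in the watchers map is never the one being received on. A contrived illustration (names made up; this is not the PR's code):

package main

import "fmt"

// Buggy pattern: inside the branch, ':=' declares a brand-new channel,
// so the channel stored in the watchers map is never the one returned
// (and therefore never the one received on).
func waitChanBuggy(watchers map[string]chan string, addr string) chan string {
    notifyChan := make(chan string)
    if _, ok := watchers[addr]; !ok {
        notifyChan := make(chan string) // BUG: shadows the outer notifyChan
        watchers[addr] = notifyChan
    }
    return notifyChan // nothing will ever be sent on this channel
}

// Fixed pattern: plain '=' keeps a single channel, so the map entry and
// the returned channel are the same object.
func waitChanFixed(watchers map[string]chan string, addr string) chan string {
    notifyChan, ok := watchers[addr]
    if !ok {
        notifyChan = make(chan string)
        watchers[addr] = notifyChan
    }
    return notifyChan
}

func main() {
    watchers := make(map[string]chan string)
    fmt.Println(waitChanBuggy(watchers, "0000:00:02.0") == watchers["0000:00:02.0"]) // false
    fmt.Println(waitChanFixed(watchers, "0000:00:03.0") == watchers["0000:00:03.0"]) // true
}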
So that we make sure the device pci kobject shows up.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
@bergwolf
Member Author

@devimc updated. PTAL.

@devimc left a comment

lgtm

@amshinde
Member

@sboeuf PTAL. I think rescanning the PCI bus is OK until we figure out the qemu issue with SHPC hotplug.

@sboeuf left a comment

Yes that's fine for now.

@sboeuf

sboeuf commented Sep 20, 2018

I've just restarted the metrics build.

@bergwolf
Member Author

The metrics failure doesn't seem to be related to this PR. I created kata-containers/tests#763 to track it.

00:32:45 ERROR: command perf not available

@gnawux
Member

gnawux commented Sep 21, 2018

Ignore the unrelated metrics failure.

Let's merge it.

@gnawux gnawux merged commit f699f87 into kata-containers:master Sep 21, 2018
@grahamwhaley
Contributor

Thanks @gnawux - already aware of the perf item - we have an issue over at kata-containers/ci#43 as well.
But, that is not an actual failure for the metrics right now - it stops us running a test (network performance) that we are not actually using as a pass/fail metric yet (so, yes, it is annoying and distracting that the error occurs, but it is not the actual failure problem).

Let me go see if I can find the Jenkins logs to check (as once you have closed the PR, the links vanish from the github view ;-))... here we go: http://jenkins.katacontainers.io/job/kata-metrics-agent-ubuntu-16-04-PR/44/console

Wow, metrics was having a bad day :-(

The important bit of the metrics is the bounds check at the end of the logs - the first thing to do with a metrics log is to check whether we got to the end and whether the check happened. Then you find a table like:

03:36:43 Report Summary:
03:36:43 +-----+----------------------+-------+--------+--------+-------+--------+--------+------+------+-----+
03:36:43 | P/F |         NAME         |  FLR  |  MEAN  |  CEIL  |  GAP  |  MIN   |  MAX   | RNG  | COV  | ITS |
03:36:43 +-----+----------------------+-------+--------+--------+-------+--------+--------+------+------+-----+
03:36:43 | *F* | boot-times           | 95.0% | 110.2% | 105.0% | 10.0% | 108.1% | 112.6% | 4.2% | 1.1% |  20 |
03:36:43 | *F* | memory-footprint     | 95.0% | 87.4%  | 105.0% | 10.0% | 87.4%  | 87.4%  | 0.0% | 0.0% |   1 |
03:36:43 | P   | memory-footprint-ksm | 95.0% | 97.2%  | 105.0% | 10.0% | 97.2%  | 97.2%  | 0.0% | 0.0% |   1 |
03:36:43 +-----+----------------------+-------+--------+--------+-------+--------+--------+------+------+-----+

What I've been seeing over the last week is too much variance in the metrics results, and I'm not sure why yet. I would not normally expect either memory footprint or boot times to vary so much - which is why I have the pass/fail boundaries set at 5% - but we see here that boot times were up 10%, and memory footprint was down 12.6%. I need to do some research into what is going on; I just haven't found time yet this week.

This is a key reason why the metrics CIs are not marked as 'required' items in Github yet...
Thanks for your patience ;-)

@sboeuf

sboeuf commented Sep 21, 2018

@bergwolf @amshinde @egernst @devimc
I have spent some time understanding what was going on with pc, q35, and the different ways PCI hotplug is performed.

pc case

  • This machine type uses ACPI to hotplug PCI devices. This means the whole notification mechanism relies on ACPI methods and it works well when hotplugging on the main bus (pci host bridge).
  • But why does hotplug onto a pci-bridge not work with shpc=off? The ACPI mechanism should still work, right? Yes, the ACPI mechanism notifying the guest OS about a new PCI device works properly and the PCI slot is rescanned. Unfortunately, the PCI spec defines that any PCI bridge created (coldplugged) with no device attached to it will not get any memory resources allocated for the future BARs of its devices. For this reason, an empty PCI bridge is unusable from a PCI hotplug perspective, and the guest OS generates this error:
[    8.424258] pci 0000:01:00.0: BAR 6: no space for [mem size 0x00040000 pref]
[    8.425179] pci 0000:01:00.0: BAR 6: failed to assign [mem size 0x00040000 pref]
[    8.425274] pci 0000:01:00.0: BAR 4: no space for [mem size 0x00004000 64bit pref]
[    8.425353] pci 0000:01:00.0: BAR 4: failed to assign [mem size 0x00004000 64bit pref]
[    8.425434] pci 0000:01:00.0: BAR 1: no space for [mem size 0x00001000]
[    8.425501] pci 0000:01:00.0: BAR 1: failed to assign [mem size 0x00001000]
[    8.425568] pci 0000:01:00.0: BAR 0: no space for [io  size 0x0020]
[    8.425635] pci 0000:01:00.0: BAR 0: failed to assign [io  size 0x0020]
[    8.472148] PCI Interrupt Link [LNKA] enabled at IRQ 11
[    8.472498] virtio-pci 0000:01:00.0: virtio_pci: leaving for legacy driver
[    8.493270] virtio-pci: probe of 0000:01:00.0 failed with error -12
  • But then, how does it work with shpc=on? Well, by enabling shpc on the PCI bridge, we create an shpc controller in slot 0 of this PCI bridge, and this allocates an shpc BAR, which forces the bridge not to be empty, hence usable for later hotplug. Here is the output from the kernel when the PCI device is properly added:
[    8.512248] pci 0000:01:00.0: BAR 6: assigned [mem 0x80000000-0x8003ffff pref]
[    8.512417] pci 0000:01:00.0: BAR 4: assigned [mem 0x80040000-0x80043fff 64bit pref]
[    8.512597] pci 0000:01:00.0: BAR 1: assigned [mem 0x80044000-0x80044fff]
[    8.512795] pci 0000:01:00.0: BAR 0: assigned [io  0xc000-0xc01f]
[    8.535774] PCI Interrupt Link [LNKA] enabled at IRQ 11
[    8.536077] virtio-pci 0000:01:00.0: enabling device (0000 -> 0003)
[    8.558381] virtio_net virtio3 enp1s0: renamed from eth0
[    8.597475] IPv6: ADDRCONF(NETDEV_UP): enp1s0: link is not ready
[    9.564367] IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0: link becomes ready

So that's the explanation for pc, and why we need shpc to be enabled on any pci-bridge that we create (a configuration sketch follows the note below).

Note: coldplugging a PCI device onto the PCI bridge will have the same effect, forcing the PCI bridge to have its own BAR space, and further hotplug will work (unless you're hotplugging a PCI device with large BARs).
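To make the bridge setup above concrete, here is a sketch of the kind of QEMU arguments involved - the machine type is pc as discussed, but the bus, id, chassis_nr, and addr values are made up for illustration and are not virtcontainers' actual command line:

package main

import (
    "fmt"
    "strings"
)

func main() {
    // Illustrative pc-machine QEMU arguments: coldplugging the bridge
    // with shpc=on puts the SHPC controller in slot 0, which reserves
    // BAR space and keeps the otherwise-empty bridge usable for later
    // PCI hotplug.
    args := []string{
        "-machine", "pc",
        "-device", "pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2",
    }
    fmt.Println("qemu-system-x86_64 " + strings.Join(args, " "))
}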

q35 case

  • There is no ACPI involved in the handling of PCI hotplug in the q35 case. It relies either on shpc (for PCI devices) or on PCIe native hotplug.
  • Because we use a pci-bridge, shpc is going to be used in this case. But it will be used fully, not only for the "PCI bridge trick": it provides the mechanism for communicating with the guest OS, which is why the guest OS needs to support it (and why we enabled CONFIG_HOTPLUG_PCI_SHPC).
  • The problem is the way shpc works and the assumptions made by the shpc driver, drivers/pci/hotplug/shpchp_ctrl.c. When the driver detects that a device has been hotplugged, it eventually calls into handle_button_press_event(), which delays queuing the work for 5 seconds... yeah, 5 seconds! We can understand this from a real HW perspective, but for virtualization it does not make sense... Unless we patch the driver itself (and I don't think we should do that), the PCI rescan looks like the best optimization we can use here (see the sketch below). Indeed, the PCI device is already there; we're simply waiting for the driver to resume its work.
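Putting the two fixes together, the agent-side flow could look roughly like this (illustrative names, not the PR's exact code): trigger the rescan first, then block on the uevent-driven watcher channel instead of polling sysfs, so the guest never sits out the shpc driver's 5 second delay.

package main

import (
    "fmt"
    "io/ioutil"
    "time"
)

// rescanPciBus repeats the one-liner sketched earlier so this example
// is self-contained. Writing the trigger requires root.
func rescanPciBus() error {
    return ioutil.WriteFile("/sys/bus/pci/rescan", []byte("1"), 0200)
}

// waitForDeviceNotify kicks a PCI bus rescan, then waits for the
// uevent listener to announce the device on notifyChan.
func waitForDeviceNotify(notifyChan <-chan string, timeout time.Duration) (string, error) {
    if err := rescanPciBus(); err != nil {
        return "", err
    }
    select {
    case devName := <-notifyChan:
        return devName, nil
    case <-time.After(timeout):
        return "", fmt.Errorf("timed out waiting for device uevent")
    }
}

func main() {
    notify := make(chan string, 1)
    notify <- "vda" // simulate the uevent listener announcing a device
    name, err := waitForDeviceNotify(notify, time.Second)
    fmt.Println(name, err) // prints the rescan error when not run as root
}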

I hope this helps clarify things :)

@amshinde
Member

@sboeuf Thanks for summarising our findings. In conclusion:

  1. It always makes sense to leave SHPC on for a pci-bridge.
  2. With the current hotplug implementation for q35 in virtcontainers with a pci-bridge, we need to rescan the pci bus because of the 5 second delay in the shpc controller.
  3. Since this is not required for pc, we could do better and pass an extra parameter to the agent telling it whether to scan the bus. I am not sure how expensive scanning the bus would get when a large number of devices are attached.

Another point to consider with q35 hotplug is that by attaching pcie devices to a pci-bridge we lose the pcie capabilities of the device, since it is exposed as a plain pci device - that is, all the advantages of pcie we would get with the q35 platform. Hence, what we really need to do is revisit q35 hotplug in virtcontainers and implement hotplug using pcie root ports instead of a pci-bridge.

cc @devimc

@bergwolf
Member Author

@sboeuf Thanks for the detailed summary! And I agree with @amshinde that we should move to pcie root ports for q35, since pcie support is the main advantage of the q35 platform.

@gnawux
Member

gnawux commented Sep 22, 2018

Thanks @sboeuf, I agree we should move to pcie as well.
