This repository has been archived by the owner on May 12, 2021. It is now read-only.

IOMMU groups with multiple devices #708

Closed
eguzman3 opened this issue Sep 10, 2018 · 3 comments

Comments

@eguzman3
Contributor

Description of problem

When passing the VFIO device file for an IOMMU group that contains multiple devices, the QMP device_add command successfully adds the first device but fails to add any device after it.

This seems to happen because only one device ID is generated per VFIO device file (/dev/vfio/XX). When the PCI devices associated with this IOMMU group are attached via the QMP device_add command, the id parameter is set to that single generated device ID, which results in the Duplicate ID error (see log below) for every PCI device after the first one.

Potential Fixes

  • generate VFIO device IDs per device/BDF rather than per device file/IOMMU group
  • don't pass the id parameter to the device_add command since it's optional
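As a rough illustration of the first option, here is a minimal sketch of deriving an ID from each device's BDF. The helper name vfioIDFromBDF and the "vfio-" naming scheme are assumptions made for this example; they are not taken from the Kata runtime.

```go
package main

import (
	"fmt"
	"strings"
)

// vfioIDFromBDF is a hypothetical helper (not part of the Kata runtime).
// It builds an ID from a PCI address such as "15:00.1" by replacing the
// ':' and '.' separators with '-', so every function in an IOMMU group
// ends up with a different ID.
func vfioIDFromBDF(bdf string) string {
	r := strings.NewReplacer(":", "-", ".", "-")
	return "vfio-" + r.Replace(bdf)
}

func main() {
	// The two BDFs from the log in this issue.
	for _, bdf := range []string{"15:00.0", "15:00.1"} {
		fmt.Println(bdf, "->", vfioIDFromBDF(bdf))
	}
}
```

Since BDFs are unique on the host, IDs built this way would not collide within a group, although they would no longer carry the randomly generated part of the current IDs.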

Log

CLI: docker run --runtime=kata --device=/dev/vfio/34 --rm -it busybox

Device 15:00.0 has already been successfully attached with id = vfio-97d2a47534f15ff4 (prior to this part of the log). This shows device 15:00.1 failing to hotplug since it shares the same id.

time="2018-09-10T14:46:16.107445474-07:00" level=info msg="{"error": {"class": "GenericError", "desc": "Duplicate ID 'vfio-97d2a47534f15ff4' for device"}}" arch=amd64 command=create container=dd9219202646dd82c0d4ba829426da75a93ff995b763c5e623f4904bf2dee2ff name=kata-runtime pid=21953 source=virtcontainers subsystem=qmp
time="2018-09-10T14:46:16.107681462-07:00" level=error msg="failed to hotplug VFIO device" arch=amd64 command=create container=dd9219202646dd82c0d4ba829426da75a93ff995b763c5e623f4904bf2dee2ff error="QMP command failed" name=kata-runtime pid=21953 sandbox=dd9219202646dd82c0d4ba829426da75a93ff995b763c5e623f4904bf2dee2ff sandboxid=dd9219202646dd82c0d4ba829426da75a93ff995b763c5e623f4904bf2dee2ff source=virtcontainers subsystem=sandbox vfio device BDF="15:00.1" vfio device ID=vfio-97d2a47534f15ff4
time="2018-09-10T14:46:16.107826113-07:00" level=error msg="Failed to add device" arch=amd64 command=create container=dd9219202646dd82c0d4ba829426da75a93ff995b763c5e623f4904bf2dee2ff error="QMP command failed" name=kata-runtime pid=21953 source=virtcontainers subsystem=device
time="2018-09-10T14:46:16.211620142-07:00" level=error msg="QMP command failed" arch=amd64 command=create container=dd9219202646dd82c0d4ba829426da75a93ff995b763c5e623f4904bf2dee2ff name=kata-runtime pid=21953 source=runtime

@sboeuf

sboeuf commented Sep 11, 2018

The issue is not completely clear to me, but it sounds like once you have passed the first device, the IOMMU group is locked, and since all the other devices that you're trying to pass are part of the same group, you cannot assign them to this VM.
Is this something that you can make work with QEMU only?
I am trying to identify whether this is Kata-specific and, if so, to make sure we identify the root cause of this failure.

@eguzman3
Contributor Author

Each VFIO device file (/dev/vfio/XX) gets assigned one device ID, regardless of how many actual PCI devices this file represents.

if devInfo.ID, err = dm.newDeviceID(); err != nil {

Then when resolving the PCI devices represented by the VFIO device file, the device ID of the file is used as an ID for each of the VFIO PCI devices.

ID: utils.MakeNameID("vfio", device.DeviceInfo.ID, maxDevIDSize),

These IDs are then passed to QEMU, which expects every device to have a unique ID.

if err := q.qmpMonitorCh.qmp.ExecutePCIVFIODeviceAdd(q.qmpMonitorCh.ctx, devID, device.BDF, addr, bridge.ID); err != nil {
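The flow above can be reproduced outside the runtime with a small simulation. This is only a sketch: idRegistry stands in for QEMU's requirement that device IDs be unique, and hotplugGroup is a hypothetical function that reuses one group-level ID for every BDF, as described in the three steps above. Neither name comes from the Kata code.

```go
package main

import "fmt"

// idRegistry is a hypothetical stand-in for QEMU's bookkeeping of device
// IDs. It maps an ID to the BDF that claimed it and rejects reuse.
type idRegistry map[string]string

func (r idRegistry) add(id, bdf string) error {
	if _, taken := r[id]; taken {
		// Message text mirrors the QMP error shown in the log.
		return fmt.Errorf("Duplicate ID '%s' for device", id)
	}
	r[id] = bdf
	return nil
}

// hotplugGroup attaches every BDF using the same group-level ID and
// stops at the first failure. It returns how many devices were attached.
func hotplugGroup(r idRegistry, groupID string, bdfs []string) (int, error) {
	for i, bdf := range bdfs {
		if err := r.add(groupID, bdf); err != nil {
			return i, fmt.Errorf("failed to hotplug %s: %w", bdf, err)
		}
	}
	return len(bdfs), nil
}

func main() {
	reg := idRegistry{}
	n, err := hotplugGroup(reg, "vfio-97d2a47534f15ff4", []string{"15:00.0", "15:00.1"})
	fmt.Println("attached:", n)
	fmt.Println("error:", err)
}
```

In this simulation the first device attaches and the second is rejected, which matches the behaviour in the log: 15:00.0 succeeds and 15:00.1 fails with the same ID.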

@amshinde
Member

@sboeuf This is a Kata-specific issue. In fact, this looks like a regression introduced by the device refactoring, as I have been able to pass multiple devices in an IOMMU group in the past.
@eguzman3 We need to generate unique IDs for each PCI device in the group. A simple approach would be to append the index of the device in the devices slice when passing the ID to QEMU.
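A minimal sketch of the index-suffix idea, for illustration only. The function perDeviceIDs and the "-<index>" format are assumptions for this example and are not the code that was eventually merged.

```go
package main

import "fmt"

// perDeviceIDs is a hypothetical helper. Given the single ID generated for
// the VFIO device file and the BDFs resolved from it, it returns one ID per
// BDF by appending the device's position in the slice.
func perDeviceIDs(groupID string, bdfs []string) map[string]string {
	ids := make(map[string]string, len(bdfs))
	for i, bdf := range bdfs {
		ids[bdf] = fmt.Sprintf("%s-%d", groupID, i)
	}
	return ids
}

func main() {
	bdfs := []string{"15:00.0", "15:00.1"}
	ids := perDeviceIDs("vfio-97d2a47534f15ff4", bdfs)
	for _, bdf := range bdfs {
		fmt.Println(bdf, "->", ids[bdf])
	}
}
```

One thing to watch, assuming the ID is capped at a maximum length as the maxDevIDSize argument in the snippet above suggests: the suffix would need to be applied in a way that survives any truncation, otherwise two long IDs could be cut back to the same string.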

eguzman3 added a commit to eguzman3/runtime that referenced this issue Sep 11, 2018
Adds per-device VFIO ids allowing IOMMU groups with
multiple devices to be passed to qemu.

Fixes kata-containers#708

Signed-off-by: Edward Guzman <eguzman@nvidia.com>
eguzman3 added a commit to eguzman3/runtime that referenced this issue Sep 12, 2018
egernst pushed a commit to egernst/runtime that referenced this issue Feb 9, 2021