This repository has been archived by the owner on Oct 3, 2024. It is now read-only.

Libvirt Fails to Use GVT-G Device (iommu_group: No such file or directory) #74

Closed
CuriousTommy opened this issue Feb 23, 2019 · 22 comments

Comments

@CuriousTommy

I have a more detailed post here.

To summarize, I used to be able to run GVT-g on my Fedora 29 machine, but for unrelated reasons I had to reinstall it. I followed this guide to get GVT-g working on my machine.

Since I had already done the steps once, I re-imported my win10.xml file, edited my grub file to enable GVT-g and intel_iommu, and manually created the mdev device.

# This is the command I used to create the mdev device.
echo fff6f017-3417-4ad3-b05e-17ae3e1a4615 | sudo tee /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_8/create
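Before creating the device, it can help to confirm which mdev types the kernel actually exposes for the GPU. The sketch below is only an illustration: the function name list_mdev_types is made up here, and it assumes the parent device path used in the command above and the standard available_instances attribute of the mdev sysfs interface.

```shell
# List the mdev types a parent device offers and how many instances
# of each can still be created.
# $1: sysfs directory of the parent device (defaults to the path used above).
list_mdev_types() {
    local parent="${1:-/sys/bus/pci/devices/0000:00:02.0}" type_dir avail
    for type_dir in "$parent"/mdev_supported_types/*; do
        # If the glob matched nothing, no mdev types are exposed.
        [ -d "$type_dir" ] || continue
        avail=$(cat "$type_dir/available_instances" 2>/dev/null || echo "?")
        printf '%s available_instances=%s\n' "${type_dir##*/}" "$avail"
    done
}

# Usage on the host:
# list_mdev_types
```

If nothing is printed, the mdev_supported_types directory is missing or empty, which would point at GVT-g not being enabled rather than at libvirt.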

But now, no matter what I do, I always get this error...

Error starting domain: internal error: Process exited prior to exec: libvirt:  error : failed to access '/sys/bus/mdev/devices/fff6f017-3417-4ad3-b05e-17ae3e1a4615/iommu_group': No such file or directory

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 111, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/libvirtobject.py", line 66, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1420, in startup
    self._backend.create()
  File "/usr/lib64/python3.7/site-packages/libvirt.py", line 1080, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirt.libvirtError: internal error: Process exited prior to exec: libvirt:  error : failed to access '/sys/bus/mdev/devices/fff6f017-3417-4ad3-b05e-17ae3e1a4615/iommu_group': No such file or directory

While trying to find a solution, I noticed that the mdev device does exist, but iommu_group does not exist. The weird thing is that my laptop is able to show the IOMMU groups just fine.

[root@dell-precision-5510-laptop-localdomain systemd-services]# '/sys/bus/mdev/devices/fff6f017-3417-4ad3-b05e-17ae3e1a4615/'
bash: /sys/bus/mdev/devices/fff6f017-3417-4ad3-b05e-17ae3e1a4615/: Is a directory
[root@dell-precision-5510-laptop-localdomain systemd-services]# '/sys/bus/mdev/devices/fff6f017-3417-4ad3-b05e-17ae3e1a4615/iommu_group'
bash: /sys/bus/mdev/devices/fff6f017-3417-4ad3-b05e-17ae3e1a4615/iommu_group: No such file or directory
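The transcript above runs the paths as if they were commands, which works but makes the result easy to misread. A more explicit check could look like the following sketch. The function name check_mdev_iommu_group is hypothetical, and the second argument exists only so the check can be pointed at a directory other than the real sysfs path.

```shell
# Report whether an mdev device exists and whether it has an iommu_group entry.
# $1: mdev UUID
# $2: mdev devices directory (defaults to the real sysfs location)
check_mdev_iommu_group() {
    local uuid="$1" dev
    dev="${2:-/sys/bus/mdev/devices}/$uuid"
    if [ ! -d "$dev" ]; then
        echo "mdev device missing"
        return 1
    fi
    if [ -e "$dev/iommu_group" ]; then
        echo "iommu_group present"
    else
        echo "iommu_group missing"
        return 2
    fi
}

# Usage on the host:
# check_mdev_iommu_group fff6f017-3417-4ad3-b05e-17ae3e1a4615
```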

Does anyone know what I am doing wrong here?

@TerrenceXu

@CuriousTommy, we will have a look. Do you mean that the vGPU was created successfully but the iommu_group cannot be found?

@CuriousTommy
Author

@TerrenceXu When you say "created successfully", are you asking if the UUID device shows up in /sys/bus/mdev/devices/? I can confirm that it does exist there. I also provided a listing of the directory (using ls -al), showing that iommu_group doesn't exist.

[user@dell-precision-5510-laptop-localdomain ~]$ cd /sys/bus/mdev/devices/fff6f017-3417-4ad3-b05e-17ae3e1a4615/
[user@dell-precision-5510-laptop-localdomain fff6f017-3417-4ad3-b05e-17ae3e1a4615]$ ls -al
total 0
drwxr-xr-x.  4 root root    0 Feb 25 08:12 .
drwxr-xr-x. 11 root root    0 Feb 25 08:09 ..
drwxr-xr-x.  2 root root    0 Feb 25 08:12 intel_vgpu
lrwxrwxrwx.  1 root root    0 Feb 25 08:12 mdev_type -> ../mdev_supported_types/i915-GVTg_V5_8
drwxr-xr-x.  2 root root    0 Feb 25 08:12 power
--w-------.  1 root root 4096 Feb 25 08:12 remove
lrwxrwxrwx.  1 root root    0 Feb 25 08:12 subsystem -> ../../../../bus/mdev
-rw-r--r--.  1 root root 4096 Feb 25 08:12 uevent

@CuriousTommy
Author

I did a dmesg | grep "gvt" and noticed something interesting here:

[    3.028975] i915 0000:00:02.0: Direct firmware load for i915/gvt/vid_0x8086_did_0x191b_rid_0x06.golden_hw_state failed with error -2

Do you think this is the cause of my issue? Or is this unrelated?

@TerrenceXu

@CuriousTommy, I think this issue is not related to the log line you highlighted. Can you post the full dmesg log?

@CuriousTommy
Author

@hyuan3
Contributor

hyuan3 commented Feb 27, 2019

Was the full log dumped after running the command to create the vGPU mdev device?

@CuriousTommy
Author

I dumped it after I created the mdev device and tried to start the VM in libvirt (from a fresh boot).

@fred1gao
Contributor

@CuriousTommy
From the dmesg log, it looks like the Nvidia graphics driver (nouveau) is also loaded:
[drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 1

Can you disable the Nvidia graphics driver and then retry?
Please also add drm.debug=0xff in grub.cfg once the Nvidia graphics driver is disabled.
Thanks

@CuriousTommy
Author

@fred1gao I updated my grub file to add the debug option:

GRUB_CMDLINE_LINUX="resume=/dev/mapper/fedora_localhost--live-swap rd.lvm.lv=fedora_localhost-live/root rd.luks.uuid=luks-65690778-7a43-4784-90a1-961ea0cc4069 rd.lvm.lv=fedora_localhost-live/swap kvm.ignore_msrs=1 intel_iommu=on i915.enable_gvt=1 i915.enable_guc=0 drm.debug=0xff rhgb quiet"

I also blacklisted nouveau in /etc/modprobe.d/
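For reference, a blacklist entry in /etc/modprobe.d/ is a one-line file. The sketch below is an assumption about what such a file could contain, not the exact file used in this thread. The function name write_nouveau_blacklist and the file name blacklist-nouveau.conf are arbitrary choices.

```shell
# Write a modprobe.d fragment that blacklists the nouveau driver.
# $1: target file (the default file name is an arbitrary choice).
write_nouveau_blacklist() {
    local conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    printf 'blacklist nouveau\n' > "$conf"
}

# Usage (as root), followed by a reboot:
# write_nouveau_blacklist
```

After rebooting, lsmod | grep nouveau should print nothing if the blacklist took effect. If the module is still loaded, it may be coming from the initramfs, which would then need to be regenerated.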

Here is the new output of dmesg: result2.txt
However, it seems like the older messages are cut out for some reason.

@fred1gao
Contributor

@CuriousTommy
Same result with blacklisted nouveau?
if so, can you try to capture the full dmesg log or stuff in /var/log/kern.log?

@CuriousTommy
Author

Same result with blacklisted nouveau?

Yup, still complains about iommu_group not existing.

if so, can you try to capture the full dmesg log or stuff in /var/log/kern.log?

I am going to ask around and find out how to do a full dmesg log dump. I will provide a full dump once I figure this out. Also, /var/log/kern.log does not seem to exist on my system.

$ cat /var/log/kern.log
cat: /var/log/kern.log: No such file or directory
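On a default Fedora install, kernel messages normally go to the systemd journal rather than to /var/log/kern.log, so one way to get the full kernel log for the current boot is journalctl. This is a minimal sketch under that assumption. The function name dump_kernel_log and the default output file name are hypothetical.

```shell
# Save the kernel messages of the current boot from the systemd journal.
# $1: output file (the default name is an arbitrary choice).
dump_kernel_log() {
    local out="${1:-kernel-log.txt}"
    # -k: kernel messages only, -b 0: current boot
    journalctl -k -b 0 --no-pager > "$out"
}

# Usage (may need root to read the full journal):
# dump_kernel_log result.txt
```

Unlike plain dmesg, the journal is not limited by the size of the kernel ring buffer, so early boot messages are less likely to be cut off when drm.debug=0xff produces a lot of output.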

@CuriousTommy
Author

@fred1gao I got the entire kernel log! Be warned that the file is fairly large (about 10 MB).
result3.txt

@fred1gao
Contributor

@CuriousTommy the dmesg log looks fine, can you replace intel_iommu=on with intel_iommu=igfx_off in grub.cfg ?
BTW: have you done the below command to create vgpu instance?
echo "fff6f017-3417-4ad3-b05e-17ae3e1a4615" > "/sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create"
--prefer i915-GVTg_V5_4 if there is no BIG RAM size.

[ 578.101534] iommu: Adding device b323e53e-782f-41bb-a7b8-3538d76fc73a to group 0
[ 578.101538] vfio_mdev b323e53e-782f-41bb-a7b8-3538d76fc73a: MDEV: group_id = 0

@CuriousTommy
Author

CuriousTommy commented Feb 28, 2019

can you replace intel_iommu=on with intel_iommu=igfx_off in grub.cfg?

Does intel_iommu=igfx_off disable my integrated graphics (because I need that)?

Edit: Would I still be able to display stuff on my screen if I apply this option?

BTW: have you done the below command to create vgpu instance?

echo "fff6f017-3417-4ad3-b05e-17ae3e1a4615" > "/sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create"

--prefer i915-GVTg_V5_4 if there is no BIG RAM size.

A while back I tried to use i915-GVTg_V5_4, but it slowed down my machine. I am willing to try it again later and report my results, though.

@CuriousTommy
Author

@fred1gao (Just in case you didn't see my edit) does having intel_iommu=igfx_off still let me display stuff on the screen? I don't want to accidentally render my laptop unusable.

On a side note, I tried to compile the latest gvt-linux kernel to see if that would resolve my issue. But with that kernel I can no longer create an mdev device (if I recall correctly, mdev_supported_types doesn't exist).

@fred1gao
Contributor

fred1gao commented Mar 4, 2019

@CuriousTommy

  1. intel_iommu=igfx_off will not disable your integrated graphics, and your display should be OK as well.
  2. "but it slowed down my machine"
    Please reset drm.debug=0x0 in grub.cfg if your machine is slow.
  3. Please send the full log after running the command below.
    echo "fff6f017-3417-4ad3-b05e-17ae3e1a4615" > "/sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create"

@CuriousTommy
Author

CuriousTommy commented Mar 4, 2019

@fred1gao Here is the dmesg dump result4.txt.

Here are the grub options I set, in case you want to see them:

GRUB_CMDLINE_LINUX="resume=/dev/mapper/fedora_localhost--live-swap rd.lvm.lv=fedora_localhost-live/root rd.luks.uuid=luks-65690778-7a43-4784-90a1-961ea0cc4069 rd.lvm.lv=fedora_localhost-live/swap kvm.ignore_msrs=1 intel_iommu=igfx_off i915.enable_gvt=1 i915.enable_guc=0 log_buf_len=2000M drm.debug=0xff rhgb quiet"
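Changes to GRUB_CMDLINE_LINUX only take effect after grub.cfg has been regenerated (on Fedora typically with grub2-mkconfig, where the output path depends on whether the machine boots via BIOS or UEFI) and the machine has been rebooted. A quick way to confirm that the running kernel really received an option is to look at /proc/cmdline. The helper name cmdline_has below is made up for this sketch.

```shell
# Succeed if the given parameter appears as a whole word on the kernel command line.
# $1: parameter, for example intel_iommu=igfx_off
# $2: command line file (defaults to /proc/cmdline)
cmdline_has() {
    # Put each parameter on its own line, then match the whole line literally.
    tr -s ' ' '\n' < "${2:-/proc/cmdline}" | grep -Fxq -- "$1"
}

# Usage on the host:
# cmdline_has i915.enable_gvt=1 && echo "GVT-g option is active"
# cmdline_has intel_iommu=igfx_off || echo "intel_iommu=igfx_off is not active"
```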

@fred1gao
Contributor

fred1gao commented Mar 5, 2019

@CuriousTommy
Please do the extra step of creating the vGPU instance, which is supposed to create the IOMMU group:
echo "fff6f017-3417-4ad3-b05e-17ae3e1a4615" > "/sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create"

I cannot see the expected log lines below:
[ 578.101534] iommu: Adding device b323e53e-782f-41bb-a7b8-3538d76fc73a to group 0
[ 578.101538] vfio_mdev b323e53e-782f-41bb-a7b8-3538d76fc73a: MDEV: group_id = 0
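To check for these two lines without reading the whole dump, the log can be filtered by UUID. The following is a rough sketch. The function name find_mdev_group_lines is hypothetical, and the match patterns are taken from the two log lines quoted above, so they may differ on other kernel versions.

```shell
# Print kernel log lines that show an mdev device being added to an IOMMU group.
# $1: mdev UUID
# $2: optional saved log file (if omitted, dmesg is read, which may need root)
find_mdev_group_lines() {
    local uuid="$1" log="${2:-}"
    if [ -n "$log" ]; then
        grep -F -- "$uuid" "$log"
    else
        dmesg | grep -F -- "$uuid"
    fi | grep -E 'iommu: Adding device|MDEV: group_id'
}

# Usage:
# find_mdev_group_lines fff6f017-3417-4ad3-b05e-17ae3e1a4615 result4.txt
```

The function exits with a non-zero status when no matching line is found, which is the situation described in this comment.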

@CuriousTommy
Author

CuriousTommy commented Mar 5, 2019

@fred1gao That's odd... I could have sworn that I used sudo su to get that command to work. I will redo it, and I will also add a link to a video showing the steps I took to obtain the log.

@CuriousTommy
Author

CuriousTommy commented Mar 5, 2019

@fred1gao Here is the log and video showing that I did it.
result5.txt
Screencast from 03-05-2019 11:36:03 AM.zip

Edit: The messages don't appear, but as you can see from the video I provided, I did type the command.

@apnix-uk

apnix-uk commented Mar 8, 2019

Is the vfio_mdev module loaded?

It wasn't for me and I had what seemed to be pretty much the exact same issue. I loaded the module and iommu_group appeared in /sys/bus/mdev/devices/my-uuid/ immediately. I now have a different error but that's progress for you :-p

@CuriousTommy
Author

CuriousTommy commented Mar 9, 2019

@apnix-uk HELL YEAH!!! Thank you kind human creature! This was my issue.

lsmod | grep "vfio_mdev" didn't list the module, but after I used sudo modprobe vfio_mdev to enable it, I was able to get my GVT-g VM working again! So I guess this issue can be closed.
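The check and the fix can be combined so that the module is only loaded when it is missing. This is a small sketch. The function names module_loaded and ensure_vfio_mdev are made up, and the check reads /proc/modules, which is the same information lsmod prints.

```shell
# Succeed if a kernel module is listed as loaded.
# $1: module name
# $2: module list file (defaults to /proc/modules)
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Load vfio_mdev only if it is not loaded yet.
ensure_vfio_mdev() {
    module_loaded vfio_mdev || sudo modprobe vfio_mdev
}

# Usage on the host, before creating the mdev device or starting the VM:
# ensure_vfio_mdev
```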

Edit: If you want to apply this to dracut, make sure you apply both vfio_mdev and vfio_iommu_type1.
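One possible way to do this with dracut is a small config fragment. The exact option used in this thread is not stated, so treat the following as an assumption: it uses dracut's force_drivers setting, and the function name write_vfio_dracut_conf and the file name vfio-mdev.conf are arbitrary choices.

```shell
# Write a dracut config fragment that adds both modules to the initramfs
# and asks for them to be loaded early.
# $1: target file (the default file name is an arbitrary choice).
write_vfio_dracut_conf() {
    local conf="${1:-/etc/dracut.conf.d/vfio-mdev.conf}"
    printf 'force_drivers+=" vfio_mdev vfio_iommu_type1 "\n' > "$conf"
}

# Usage (as root), then regenerate the initramfs for the running kernel:
# write_vfio_dracut_conf && dracut -f
```

The spaces inside the quotes matter, because dracut appends the string to the existing list of drivers.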
