-
Notifications
You must be signed in to change notification settings - Fork 96
Direct firmware load failed with error -2 and mdev_supported_types missing (Coffee Lake) #77
Comments
@albionxy I am glad I am not the only one with this issue. I am having a similar problem on my Fedora machine (Dell Precision 5510). When I try the latest gvt-linux kernel, the However, if I use the official Fedora Kernel, I can create the mdev device, but the Edit: You can find my bug report here: #74 |
@CuriousTommy: Thank you for your info! Good to know that it is not just some misconfiguration on my system or something. I can reproduce the issue also with gvt-staging-2019y-01m-10d-11h-40m-01s, which is mentioned to work for Coffee Lake in #14 (always reproducible). |
This might be a long shot, but maybe the |
The error your met is not a bug and it will not occurs mdev_supported_types load failed. |
@CuriousTommy Thank you, your hint was spot on! Adding kvmgt vfio-iommu-type1 vfio-mdev to MODULES=(...) in /etc/mkinitcpio.conf did the trick for me. I guess I've read the respective part in the tutorial over, as this used to be not required, as far as I understand. |
@TerrenceXu: Thank you for the hint. I can boot the gvt-staging (3.3.19) kernel with Windows 10 now. However, a popup saying 'This installation package is not supported by this processor type' shows up when I try to install the Intel installer (tried 25.20.100.6577 and 25.20.100.6519) and 'Error 31' remains. Strangely it says 'Microsoft Basic Display Adapter' instead of 'Intel UHD Graphics 630', which I would have expected to say even before installing the driver. (qemu script I used: TestGvt.txt) |
@albionxy You can try to modify "-vga qxl" to "-vga none", since DMA-BUF not support multi monitor display . |
@TerrenceXu: For some reason I can now install the driver, even with "-vga qxl". Not sure why. However, the
error message remains. Either it indicates a real issue, or it is wrong and should be removed (or changed from error to info, or something). But in either case, I think it is a bug. |
@albionxy , it is not a real issue, and we will remove or change from erro to info, thank you for your reminder! |
Before calling __ip_options_compile(), we need to ensure the network header is a an IPv4 one, and that it is already pulled in skb->head. RAW sockets going through a tunnel can end up calling ipv4_link_failure() with total garbage in the skb, or arbitrary lengthes. syzbot report : BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:355 [inline] BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0x294/0x1120 net/ipv4/ip_options.c:123 Write of size 69 at addr ffff888096abf068 by task syz-executor.4/9204 CPU: 0 PID: 9204 Comm: syz-executor.4 Not tainted 5.1.0-rc5+ #77 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 check_memory_region_inline mm/kasan/generic.c:185 [inline] check_memory_region+0x123/0x190 mm/kasan/generic.c:191 memcpy+0x38/0x50 mm/kasan/common.c:133 memcpy include/linux/string.h:355 [inline] __ip_options_echo+0x294/0x1120 net/ipv4/ip_options.c:123 __icmp_send+0x725/0x1400 net/ipv4/icmp.c:695 ipv4_link_failure+0x29f/0x550 net/ipv4/route.c:1204 dst_link_failure include/net/dst.h:427 [inline] vti6_xmit net/ipv6/ip6_vti.c:514 [inline] vti6_tnl_xmit+0x10d4/0x1c0c net/ipv6/ip6_vti.c:553 __netdev_start_xmit include/linux/netdevice.h:4414 [inline] netdev_start_xmit include/linux/netdevice.h:4423 [inline] xmit_one net/core/dev.c:3292 [inline] dev_hard_start_xmit+0x1b2/0x980 net/core/dev.c:3308 __dev_queue_xmit+0x271d/0x3060 net/core/dev.c:3878 dev_queue_xmit+0x18/0x20 net/core/dev.c:3911 neigh_direct_output+0x16/0x20 net/core/neighbour.c:1527 neigh_output include/net/neighbour.h:508 [inline] ip_finish_output2+0x949/0x1740 net/ipv4/ip_output.c:229 ip_finish_output+0x73c/0xd50 net/ipv4/ip_output.c:317 NF_HOOK_COND include/linux/netfilter.h:278 [inline] ip_output+0x21f/0x670 net/ipv4/ip_output.c:405 dst_output include/net/dst.h:444 [inline] NF_HOOK include/linux/netfilter.h:289 [inline] raw_send_hdrinc net/ipv4/raw.c:432 [inline] raw_sendmsg+0x1d2b/0x2f20 net/ipv4/raw.c:663 inet_sendmsg+0x147/0x5d0 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg+0xdd/0x130 net/socket.c:661 sock_write_iter+0x27c/0x3e0 net/socket.c:988 call_write_iter include/linux/fs.h:1866 [inline] new_sync_write+0x4c7/0x760 fs/read_write.c:474 __vfs_write+0xe4/0x110 fs/read_write.c:487 vfs_write+0x20c/0x580 fs/read_write.c:549 ksys_write+0x14f/0x2d0 fs/read_write.c:599 __do_sys_write fs/read_write.c:611 [inline] __se_sys_write fs/read_write.c:608 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:608 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x458c29 Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f293b44bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458c29 RDX: 0000000000000014 RSI: 00000000200002c0 RDI: 0000000000000003 RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f293b44c6d4 R13: 00000000004c8623 R14: 00000000004ded68 R15: 00000000ffffffff The buggy address belongs to the page: page:ffffea00025aafc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0x1fffc0000000000() raw: 01fffc0000000000 0000000000000000 ffffffff025a0101 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888096abef80: 00 00 00 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 f2 ffff888096abf000: f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00 >ffff888096abf080: 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 ^ ffff888096abf100: 00 00 00 00 f1 f1 f1 f1 00 00 f3 f3 00 00 00 00 ffff888096abf180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Fixes: ed0de45 ("ipv4: recompile ip options in ipv4_link_failure") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stephen Suryaputra <ssuryaextr@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
hello Mr Xu, could you please tell me how to use the tag to try. I'm a newbie of IGPU passthrough. I'd like to pass through my i7-8700k UHD630 to VM MacOS. My system is below: I created a VM, passed through AMD rx580 and installed MacOS 10.15. It works fine. Now I want to add UHD630 to the VM but I can't find the folder "mdev_supported_types". Could you please give me some links for newbie to learn? Thank you very much. Aaron |
@TerrenceXu |
You can run "modprobe kvmgt" in your host and try again. :) |
@TerrenceXu I was referring to my previous comment that the error message still appears:
and that it would be removed:
But I still get this error message with the current kernel. Thus, I wanted to remind that it still needs to be removed. |
@TerrenceXu, I also get the error message on kernel 5.4.23 which is lts, if not bug, could the error message be removed or changed to more informational?
|
And there are no multiple TRBs on EP0 and WA1 workaround, so it doesn't need to change TRB for EP0. It fixes below oops. configfs-gadget gadget: high-speed config #1: b android_work: sent uevent USB_STATE=CONFIGURED Unable to handle kernel read from unreadable memory at virtual address 0000000000000008 Mem abort info: android_work: sent uevent USB_STATE=DISCONNECTED ESR = 0x96000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000008b5bb7000 [0000000000000008] pgd=0000000000000000 Internal error: Oops: 96000004 [#1] PREEMPT SMP Modules linked in: CPU: 2 PID: 430 Comm: HwBinder:401_1 Not tainted 5.4.24-06071-g6fa8921409c1-dirty #77 Hardware name: Freescale i.MX8QXP MEK (DT) pstate: 60400085 (nZCv daIf +PAN -UAO) pc : cdns3_gadget_ep_dequeue+0x1d4/0x270 lr : cdns3_gadget_ep_dequeue+0x48/0x270 sp : ffff800012763ba0 x29: ffff800012763ba0 x28: ffff00082c653c00 x27: 0000000000000000 x26: ffff000068fa7b00 x25: ffff0000699b2000 x24: ffff00082c6ac000 x23: ffff000834f0a480 x22: ffff000834e87b9c x21: 0000000000000000 x20: ffff000834e87800 x19: ffff000069eddc00 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000001 x11: ffff80001180fbe8 x10: 0000000000000001 x9 : ffff800012101558 x8 : 0000000000000001 x7 : 0000000000000006 x6 : ffff000835d9c668 x5 : ffff000834f0a4c8 x4 : 0000000096000000 x3 : 0000000000001810 x2 : 0000000000000000 x1 : ffff800024bd001c x0 : 0000000000000001 Call trace: cdns3_gadget_ep_dequeue+0x1d4/0x270 usb_ep_dequeue+0x34/0xf8 composite_dev_cleanup+0x154/0x170 configfs_composite_unbind+0x6c/0xa8 usb_gadget_remove_driver+0x44/0x70 usb_gadget_unregister_driver+0x74/0xe0 unregister_gadget+0x28/0x58 gadget_dev_desc_UDC_store+0x80/0x110 configfs_write_file+0x1e0/0x2a0 __vfs_write+0x48/0x90 vfs_write+0xe4/0x1c8 ksys_write+0x78/0x100 __arm64_sys_write+0x24/0x30 el0_svc_common.constprop.0+0x74/0x168 el0_svc_handler+0x34/0xa0 el0_svc+0x8/0xc Code: 52830203 b9407660 f94042e4 11000400 (b9400841) ---[ end trace 1574516e4c1772ca ]--- Kernel panic - not syncing: Fatal exception SMP: stopping secondary CPUs Kernel Offset: disabled CPU features: 0x0002,20002008 Memory Limit: none Rebooting in 5 seconds.. Fixes: f616c3b ("usb: cdns3: Fix dequeue implementation") Cc: stable <stable@vger.kernel.org> Signed-off-by: Peter Chen <peter.chen@nxp.com> Signed-off-by: Felipe Balbi <balbi@kernel.org>
I still get this error with kernel |
I still get this error in 5.7.4 arch linux kernel |
the golden state firmware load fail is not fatal, if not found, gvt will use hw value instead, it's just an optimization for module load, as it's error message in kernel api, so can not be skipped in gvt |
Richard reported a warning which can be reproduced by running the LTP madvise6 test (cgroup v1 in the non-hierarchical mode should be used): WARNING: CPU: 0 PID: 12 at mm/page_counter.c:57 page_counter_uncharge (mm/page_counter.c:57 mm/page_counter.c:50 mm/page_counter.c:156) Modules linked in: CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.9.0-rc7-22-default #77 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812d-rebuilt.opensuse.org 04/01/2014 Workqueue: events drain_local_stock RIP: 0010:page_counter_uncharge (mm/page_counter.c:57 mm/page_counter.c:50 mm/page_counter.c:156) Call Trace: __memcg_kmem_uncharge (mm/memcontrol.c:3022) drain_obj_stock (./include/linux/rcupdate.h:689 mm/memcontrol.c:3114) drain_local_stock (mm/memcontrol.c:2255) process_one_work (./arch/x86/include/asm/jump_label.h:25 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:108 kernel/workqueue.c:2274) worker_thread (./include/linux/list.h:282 kernel/workqueue.c:2416) kthread (kernel/kthread.c:292) ret_from_fork (arch/x86/entry/entry_64.S:300) The problem occurs because in the non-hierarchical mode non-root page counters are not linked to root page counters, so the charge is not propagated to the root memory cgroup. After the removal of the original memory cgroup and reparenting of the object cgroup, the root cgroup might be uncharged by draining a objcg stock, for example. It leads to an eventual underflow of the charge and triggers a warning. Fix it by linking all page counters to corresponding root page counters in the non-hierarchical mode. Please note, that in the non-hierarchical mode all objcgs are always reparented to the root memory cgroup, even if the hierarchy has more than 1 level. This patch doesn't change it. The patch also doesn't affect how the hierarchical mode is working, which is the only sane and truly supported mode now. Thanks to Richard for reporting, debugging and providing an alternative version of the fix! Fixes: bf4f059 ("mm: memcg/slab: obj_cgroup API") Reported-by: <ltp@lists.linux.it> Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201026231326.3212225-1-guro@fb.com Debugged-by: Richard Palethorpe <rpalethorpe@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In some scenarios (kdump), dpdma hardware irqs has been enabled when calling request_irq in probe function, and then the dpdma irq handler xilinx_dpdma_irq_handler is invoked to access xdev->chan[i]. But at this moment xdev->chan[i] hasn't been initialized. We should ensure the dpdma controller to be in a consistent and clean state before further initialization. So add dpdma_hw_init() to do this. Furthermore, in xilinx_dpdma_disable_irq, disable all interrupts instead of error interrupts. This patch is to fix the kdump kernel crash as below: [ 3.696128] Unable to handle kernel NULL pointer dereference at virtual address 000000000000012c [ 3.696710] xilinx-zynqmp-dpdma fd4c0000.dma-controller: Xilinx DPDMA engine is probed [ 3.704900] Mem abort info: [ 3.704902] ESR = 0x96000005 [ 3.704905] EC = 0x25: DABT (current EL), IL = 32 bits [ 3.704907] SET = 0, FnV = 0 [ 3.704912] EA = 0, S1PTW = 0 [ 3.713800] ahci-ceva fd0c0000.ahci: supply ahci not found, using dummy regulator [ 3.715585] Data abort info: [ 3.715587] ISV = 0, ISS = 0x00000005 [ 3.715589] CM = 0, WnR = 0 [ 3.715592] [000000000000012c] user address but active_mm is swapper [ 3.715596] Internal error: Oops: 96000005 [#1] SMP [ 3.715599] Modules linked in: [ 3.715608] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-12170-g60894882155f-dirty #77 [ 3.723937] Hardware name: ZynqMP ZCU102 Rev1.0 (DT) [ 3.723942] pstate: 80000085 (Nzcv daIf -PAN -UAO -TCO BTYPE=--) [ 3.723956] pc : xilinx_dpdma_irq_handler+0x418/0x560 [ 3.793049] lr : xilinx_dpdma_irq_handler+0x3d8/0x560 [ 3.798089] sp : ffffffc01186bdf0 [ 3.801388] x29: ffffffc01186bdf0 x28: ffffffc011836f28 [ 3.806692] x27: ffffff8023e0ac80 x26: 0000000000000080 [ 3.811996] x25: 0000000008000408 x24: 0000000000000003 [ 3.817300] x23: ffffffc01186be70 x22: ffffffc011291740 [ 3.822604] x21: 0000000000000000 x20: 0000000008000408 [ 3.827908] x19: 0000000000000000 x18: 0000000000000010 [ 3.833212] x17: 0000000000000000 x16: 0000000000000000 [ 3.838516] x15: 0000000000000000 x14: ffffffc011291740 [ 3.843820] x13: ffffffc02eb4d000 x12: 0000000034d4d91d [ 3.849124] x11: 0000000000000040 x10: ffffffc0112d2d48 [ 3.854428] x9 : ffffffc0112d2d40 x8 : ffffff8021c00268 [ 3.859732] x7 : 0000000000000000 x6 : ffffffc011836000 [ 3.865036] x5 : 0000000000000003 x4 : 0000000000000000 [ 3.870340] x3 : 0000000000000001 x2 : 0000000000000000 [ 3.875644] x1 : 0000000000000000 x0 : 000000000000012c [ 3.880948] Call trace: [ 3.883382] xilinx_dpdma_irq_handler+0x418/0x560 [ 3.888079] __handle_irq_event_percpu+0x5c/0x178 [ 3.892774] handle_irq_event_percpu+0x34/0x98 [ 3.897210] handle_irq_event+0x44/0xb8 [ 3.901030] handle_fasteoi_irq+0xd0/0x190 [ 3.905117] generic_handle_irq+0x30/0x48 [ 3.909111] __handle_domain_irq+0x64/0xc0 [ 3.913192] gic_handle_irq+0x78/0xa0 [ 3.916846] el1_irq+0xc4/0x180 [ 3.919982] cpuidle_enter_state+0x134/0x2f8 [ 3.924243] cpuidle_enter+0x38/0x50 [ 3.927810] call_cpuidle+0x1c/0x40 [ 3.931290] do_idle+0x20c/0x270 [ 3.934502] cpu_startup_entry+0x28/0x58 [ 3.938410] rest_init+0xbc/0xcc [ 3.941631] arch_call_rest_init+0x10/0x1c [ 3.945718] start_kernel+0x51c/0x558 Fixes: 7cbb0c6 ("dmaengine: xilinx: dpdma: Add the Xilinx DisplayPort DMA engine driver") Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com> Link: https://lore.kernel.org/r/20210430064041.4058180-1-quanyang.wang@windriver.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
this kernel message still appears despite passthrough working
ubuntu jammy 22.04LTS kernel 5.15.0-27-generic |
I'm also getting the "golden_hw_state failed with error -2" error, it doesn't stop me from creating and using vgpu's though, what does this error affect? System: Arch Linux (5.17.5-arch1-1) |
@smsilva98 It seems to be a superfluous error message, which however may confuse clutters the log. Not being able to reopen the ticket myself, I suggest you open a new ticket (unless @TerrenceXu reopens this one) |
@ChristophSchmidpeter Okay thank you, I opened a new issue at #212 |
When request_firmware runs its prints an error message "Direct firmware load for i915/gvt/vid_0x8086_did_0x5917_rid_0x07.golden_hw_state failed with error -2 " into the logs, as mentioned in issue intel#77 this is not a real error message and should be removed. firmware_request_nowarn is the proper system call to use. It runs the exact same code as request_firmware except passes an extra bit to the internally called _request_firmware to disable the branch of code that prints the error message. I posted and discussed some of the internal code in issue intel#212
I have built the gvt-staging, but after booting with required kernel parameters (see below) mdev_supported_types does not appear. Dmesg prints the error (see also DmesgGvtStaging.txt):
i915 0000:00:02.0: Direct firmware load for i915/gvt/vid_0x8086_did_0x3e9b_rid_0x00.golden_hw_state failed with error -2
System: Arch Linux
CPU: Coffee Lake (i7-8750H)
GPU: Optimus (UHD Graphics 630 + Quadro P600)
Driver: bumblebee 3.2.1-20 xf86-video-intel 1:2.99.917+860+g3a2dec17-1 nvidia-dkms 418.43-4
Branch: gvt-staging (built 03.03.19)
Kernel Parameters for dmesg: i915.enable_gvt=1 kvm.ignore_msrs=1 intel_iommu=on ignore_loglevel drm.debug=2 "dyndbg=file gvt +p"
The text was updated successfully, but these errors were encountered: