Rebase the dkms module on the linux-intel-lts 6.6/linux branch #207

Closed
wants to merge 24 commits into from

@bbaa-bbaa bbaa-bbaa commented Oct 6, 2024

Rebase the dkms module on the linux-intel-lts 6.6/linux branch
Based on lts-v6.6.63-linux-241126T173815Z

Tested on kernel 6.12.4-zen1 (Arch Linux) and 6.8.12-5-pve (Proxmox VE).
Other versions may need more testing.

Required kernel version: 6.8 to 6.12
Intel's upstream tree seems to rely on features backported from 6.8, so more research is needed to make the module work on the 6.6 kernel.

Migrating to the 6.6 branch means that we need to drop support for kernels older than 6.8.
We may need to create a new branch to preserve the original 6.1 kernel-based branch.

@bbaa-bbaa bbaa-bbaa marked this pull request as draft October 6, 2024 11:52
@bbaa-bbaa bbaa-bbaa changed the title [WIP]Rebase the dkms module on the linux-intel-lts 6.6/linux branch [RFC]Rebase the dkms module on the linux-intel-lts 6.6/linux branch Oct 7, 2024
@bbaa-bbaa bbaa-bbaa marked this pull request as ready for review October 7, 2024 12:46

RaidenSummoner commented Oct 24, 2024

I tried your patch on PVE, which uses kernel 6.8.12-2. When I reboot the guest machine, it still reports an error:

  • [ 609.615343] i915 0000:00:02.0: [drm] ERROR [CRTC:80:pipe A] mismatch in post_csc_lut hw_state doesn't match sw_state

  • [ 609.615353] ------------[ cut here ]------------

  • [ 609.615354] i915 0000:00:02.0: pipe state doesn't match!

  • [ 609.615397] WARNING: CPU: 1 PID: 980 at /var/lib/dkms/i915-sriov-dkms/2024.10.07/build/drivers/gpu/drm/i915/display/intel_modeset_verify.c:222 intel_modeset_verify_crtc+0x5d6/0x6b0 [i915]

  • [ 609.615558] Modules linked in: tcp_diag inet_diag ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter nf_tables nvme_fabrics overlay qrtr bonding tls softdog sunrpc binfmt_misc nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_sof_pci_intel_tgl snd_sof_intel_hda_common kvm soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda crct10dif_pclmul snd_sof_pci polyval_clmulni snd_sof_xtensa_dsp polyval_generic ghash_clmulni_intel snd_sof sha256_ssse3 snd_sof_utils sha1_ssse3 snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match aesni_intel snd_soc_acpi crypto_simd soundwire_generic_allocation snd_hda_codec_hdmi cryptd snd_hda_codec_realtek snd_hda_codec_generic soundwire_bus snd_soc_core snd_compress ac97_bus

  • [ 609.615599] snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec cmdlinepart snd_hda_core snd_hwdep snd_pcm spi_nor rapl snd_timer intel_cstate serio_raw wmi_bmof pcspkr mtd snd mei_me soundcore ee1004 mei igen6_edac intel_pmc_core intel_vsec pmt_telemetry pmt_class acpi_pad acpi_tad mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd parport_pc ppdev lp parport efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c i915(OE) xe drm_gpuvm drm_exec gpu_sched drm_buddy i2c_algo_bit drm_suballoc_helper drm_ttm_helper ttm nvme spi_intel_pci nvme_core drm_display_helper r8169 xhci_pci xhci_pci_renesas crc32_pclmul psmouse spi_intel realtek cec nvme_auth i2c_i801 i2c_smbus xhci_hcd ahci rc_core libahci video wmi

  • [ 609.615646] CPU: 1 PID: 980 Comm: Xorg Tainted: P U W OE 6.8.12-2-pve #1

  • [ 609.615648] Hardware name: Intel ITX-N100-2L/ITX-N100-2L, BIOS IN100L06 07/17/2024

  • [ 609.615650] RIP: 0010:intel_modeset_verify_crtc+0x5d6/0x6b0 [i915]

  • [ 609.615781] Code: fc ff ff 49 8b 7f 08 48 8b 5f 50 48 85 db 75 03 48 8b 1f e8 ac 59 dc db 48 89 da 48 c7 c7 b0 d2 d4 c0 48 89 c6 e8 ea 5b 3d db <0f> 0b e9 89 fd ff ff 80 fb 01 0f 87 f8 ec 0c 00 41 0f b6 8c 24 50

  • [ 609.615783] RSP: 0018:ffff9d68445f7838 EFLAGS: 00010246

  • [ 609.615786] RAX: 0000000000000000 RBX: ffff8c89414b88a0 RCX: 0000000000000000

  • [ 609.615787] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000

  • [ 609.615788] RBP: ffff9d68445f78b0 R08: 0000000000000000 R09: 0000000000000000

  • [ 609.615790] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c894bbb2000

  • [ 609.615791] R13: ffff8c8943882000 R14: ffff8c8960306000 R15: ffff8c8963ca8000

  • [ 609.615792] FS: 000077cacce1cac0(0000) GS:ffff8c8cafa80000(0000) knlGS:0000000000000000

  • [ 609.615794] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

  • [ 609.615796] CR2: 00007ffa84d627d8 CR3: 00000001082f0005 CR4: 0000000000f72ef0

  • [ 609.615798] PKRU: 55555554

  • [ 609.615799] Call Trace:

  • [ 609.615800]

  • [ 609.615804] ? show_regs+0x6d/0x80

  • [ 609.615810] ? __warn+0x89/0x160

  • [ 609.615813] ? intel_modeset_verify_crtc+0x5d6/0x6b0 [i915]

  • [ 609.615959] ? report_bug+0x17e/0x1b0

  • [ 609.615963] ? handle_bug+0x46/0x90

  • [ 609.615966] ? exc_invalid_op+0x18/0x80

  • [ 609.615968] ? asm_exc_invalid_op+0x1b/0x20

  • [ 609.615972] ? intel_modeset_verify_crtc+0x5d6/0x6b0 [i915]

  • [ 609.616110] ? intel_modeset_verify_crtc+0x5d6/0x6b0 [i915]

  • [ 609.616243] ? intel_fbc_nuke+0x42/0xd0 [i915]

  • [ 609.616378] intel_atomic_commit_tail+0x974/0xf60 [i915]

  • [ 609.616528] intel_atomic_commit+0x3bd/0x410 [i915]

  • [ 609.616660] drm_atomic_commit+0x96/0xd0

  • [ 609.616664] ? __pfx___drm_printfn_info+0x10/0x10

  • [ 609.616666] drm_atomic_connector_commit_dpms+0xd7/0x100

  • [ 609.616669] drm_mode_obj_set_property_ioctl+0x1ad/0x3e0

  • [ 609.616672] ? __pfx_drm_connector_property_set_ioctl+0x10/0x10

  • [ 609.616674] drm_connector_property_set_ioctl+0x3b/0x60

  • [ 609.616676] drm_ioctl_kernel+0xb9/0x120

  • [ 609.616678] drm_ioctl+0x2c2/0x530

  • [ 609.616680] ? __pfx_drm_connector_property_set_ioctl+0x10/0x10

  • [ 609.616682] __x64_sys_ioctl+0xa0/0xf0

  • [ 609.616686] x64_sys_call+0xa68/0x24b0

  • [ 609.616688] do_syscall_64+0x81/0x170

  • [ 609.616690] ? __x64_sys_setitimer+0x15b/0x1b0

  • [ 609.616694] ? syscall_exit_to_user_mode+0x89/0x260

  • [ 609.616697] ? do_syscall_64+0x8d/0x170

  • [ 609.616699] ? __sys_recvmsg+0xc6/0xe0

  • [ 609.616703] ? syscall_exit_to_user_mode+0x89/0x260

  • [ 609.616705] ? do_syscall_64+0x8d/0x170

  • [ 609.616707] ? syscall_exit_to_user_mode+0x89/0x260

  • [ 609.616709] ? do_syscall_64+0x8d/0x170

  • [ 609.616711] ? do_syscall_64+0x8d/0x170

  • [ 609.616713] ? irqentry_exit+0x43/0x50

  • [ 609.616716] entry_SYSCALL_64_after_hwframe+0x78/0x80

  • [ 609.616718] RIP: 0033:0x77caccb1cc5b

  • [ 609.616737] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00

  • [ 609.616738] RSP: 002b:00007ffd65f02a30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010

  • [ 609.616741] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000077caccb1cc5b

  • [ 609.616742] RDX: 00007ffd65f02ac0 RSI: 00000000c01064ab RDI: 0000000000000010

  • [ 609.616743] RBP: 00007ffd65f02ac0 R08: 0000000000000002 R09: 0000000000000000

  • [ 609.616744] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c01064ab

  • [ 609.616745] R13: 0000000000000010 R14: 00005eeb050d6c00 R15: 00005eeb06d6d5a0

  • [ 609.616747]

  • [ 609.616748] ---[ end trace 0000000000000000 ]---

  • [ 621.996225] i915 0000:00:02.0: VF1 FLR

  • [ 649.356926] i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB invalidation response timed out for seqno 727

  • [ 651.404802] i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB invalidation response timed out for seqno 728

  • [ 653.452670] i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB invalidation response timed out for seqno 729

  • [ 655.500536] i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB invalidation response timed out for seqno 730

But my host device no longer gets stuck after applying your patch. Is this trace a serious concern?
UPDATE: After uninstalling my desktop environment on PVE, I no longer get this error and it still works well now.

bbaa-bbaa commented Oct 25, 2024

> But my host device no longer gets stuck after applying your patch. Is this trace a serious concern?

I don't know.
post_csc_lut seems to be a struct related to color management. Because color management in the upstream Intel repository depends on DRM changes that have not been merged into the mainline tree, I simply reverted the relevant commits (https://github.com/bbaa-bbaa/linux-intel-lts/tree/6bf2df6d263e101cbb9a2c996fa6de30ddd13e01/drivers/gpu/drm/i915). There may be some unexpected side effects.

Could you add drm.debug=0x19f log_buf_len=4M to the kernel cmdline to provide more logs for investigation?
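
For anyone wondering what drm.debug=0x19f turns on, here is a rough Python sketch that decodes the bitmask. The bit-to-name table is an assumption based on the DRM_UT_* debug categories in the kernel's include/drm/drm_print.h; please check it against the kernel you are actually running. The function name decode_drm_debug is made up for this illustration.

```python
# Assumed mapping of drm.debug bits to category names, following the
# DRM_UT_* enum in include/drm/drm_print.h (verify against your kernel).
DRM_DEBUG_CATEGORIES = {
    0x001: "CORE",
    0x002: "DRIVER",
    0x004: "KMS",
    0x008: "PRIME",
    0x010: "ATOMIC",
    0x020: "VBL",
    0x040: "STATE",
    0x080: "LEASE",
    0x100: "DP",
    0x200: "DRMRES",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by a drm.debug mask."""
    return [name for bit, name in DRM_DEBUG_CATEGORIES.items() if mask & bit]


if __name__ == "__main__":
    # 0x19f is the value requested in the comment above.
    print(decode_drm_debug(0x19F))
```

Under that assumed table, 0x19f enables everything except the VBL, STATE and DRMRES categories, which keeps the log focused on modeset and atomic-commit activity.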

admarty commented Oct 26, 2024

Thank you for this fork! I had many errors in dmesg when using the original repo, but after trying your fork, all the errors are gone.
I'm using Proxmox VE 8.2, kernel 6.8.12-2-pve, on an Intel N100 CPU.

RaidenSummoner commented Oct 26, 2024

After uninstalling the desktop environment, I no longer get these errors, but my Windows 10/11 VM randomly either works well or shows error 43 in Device Manager, with nothing in dmesg. I can't find any way to fix it, so I have temporarily disabled it.

admarty commented Oct 26, 2024

Have you set the cpu = host?

RaidenSummoner commented Oct 26, 2024

Yes. This problem happens randomly. I rebooted my VM 10 times: 7 times it worked well and 3 times initialization failed with error 43. I found a suggested solution:

  • Because Intel's driver hard-codes the PCI address to 02.0, an SR-IOV iGPU function passed through at 02.1/2/3 or another address cannot be driven normally. Even if the driver appears to load correctly when PCI-Express is checked for the passed-through SR-IOV iGPU, some software, such as the media servers Jellyfin and Emby, will not work properly because of the PCI address and cannot do hardware decoding or transcoding.
  • We need to specify the address of the passed-through iGPU in the virtual machine's conf file:
  • nano /etc/pve/qemu-server/<VMID>.conf
  • Add:
  • args: -set device.hostpci0.addr=02.0 -set device.hostpci0.x-igd-gms=0x2

But after trying it, the GPU disappeared from my VM entirely.

admarty commented Oct 26, 2024

Can you try this instead:
qm set {VMID} --args "-cpu host,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,+kvm_pv_eoi,+kvm_pv_unhalt,hv-vendor-id=GenuineIntel"
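
If you want to script this for several VMs, the command above can be assembled from a flag list. This is only a sketch: the helper name build_qm_command and the VM ID 100 are hypothetical, and the flag list is copied verbatim from the command above, so nothing here is verified beyond that. It only builds the argument list and does not run qm.

```python
import shlex

# Hyper-V enlightenment and KVM flags, copied from the qm command above.
HV_FLAGS = [
    "hv_ipi", "hv_relaxed", "hv_reset", "hv_runtime", "hv_spinlocks=0x1fff",
    "hv_stimer", "hv_synic", "hv_time", "hv_vapic", "hv_vpindex",
    "+kvm_pv_eoi", "+kvm_pv_unhalt",
]


def build_qm_command(vmid, vendor_id="GenuineIntel", flags=HV_FLAGS):
    """Build the `qm set` argument list that sets a custom -cpu string."""
    cpu = ",".join(["host", *flags, f"hv-vendor-id={vendor_id}"])
    return ["qm", "set", str(vmid), "--args", f"-cpu {cpu}"]


if __name__ == "__main__":
    # 100 is a placeholder VM ID; print the command instead of executing it.
    print(shlex.join(build_qm_command(100)))
```

Printing the command first makes it easy to review the -cpu string before passing the list to something like subprocess.run on the Proxmox host.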

RaidenSummoner commented Oct 26, 2024

Thanks! args: -cpu hv-vendor-id=GenuineIntel seems to work for me with the latest Xe driver for Windows.

johntdavis84 commented Oct 27, 2024

What are the advantages of making this change?

I'm asking for a couple of reasons, as I anticipate these are issues others will have:

  1. It seems to make Windows VMs more difficult to work with? I've never had to manually edit my Windows VM conf file to get SR-IOV working. In fact, it's been easier to manage the guest-side drivers in Windows than in Linux.
  2. Kernel 6.5-based guests. I have a VM that depends on an Ubuntu fork that's stuck on the 6.5.0.x kernel. I'd request that, if kernel 6.5 support is going to be removed, the 6.5-based fork be in place before that. Will the 6.5-based fork still be maintained for future bug fixes/performance improvements?

ETA: The latest SR-IOV related kernel commits from Intel suggest that iGPU SR-IOV support might hit in kernel 6.13. How much work would it be to maintain the current status quo for kernel support vs. re-basing, if we're that close to in-kernel official support that would (hopefully) be an easier, more stable alternative than this DKMS driver?

bbaa-bbaa commented Oct 28, 2024

> What are the advantages of making this change?

The 6.1-based module is too old; rebasing onto a newer branch is the simplest way to get upstream fixes.
We encountered issues similar to #186 and #204 with the older version of the module, and it's hard to pick out the relevant fixes from the extensive patch set.

> It seems to make Windows VMs more difficult to work with?

In my setup, modules based on the 6.1 branch also require setting the vendor ID for the guest driver to work (#8 (comment)). This might be related to the driver version on the guest side. If your VM works with older versions of the module, there should be no issues with the newer versions either.

> Will the 6.5-based fork still be maintained for future bug fixes/performance improvements?

In fact, the older version of the module is based on kernel 6.1. I currently don’t use kernels below version 6.8, and the 6.1-based branch may be maintained by other contributors.
Additionally, if you have performance issues, please report them to Intel upstream. I don't have the capability to fix upstream issues.

> ETA: The latest SR-IOV related kernel commits from Intel suggest that iGPU SR-IOV support might hit in kernel 6.13. How much work would it be to maintain the current status quo for kernel support vs. re-basing?

Rebasing onto a new branch is the simplest way to get upstream fixes for newer kernels.
My recommendation is to use the module based on kernel 6.1 for Linux versions 6.1–6.7, and the module based on kernel 6.6 for Linux versions 6.8–6.11. For kernel 6.6, you might also consider using Intel’s upstream kernel directly, which is more likely to receive first-party support. If SR-IOV support is eventually merged into the mainline kernel, using the mainline kernel would be the best choice.
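
The recommendation above can be written down as a small lookup. This is a sketch only: the function names are hypothetical and the version ranges are copied from the recommendation as stated (the PR description separately lists 6.12 as tested, which this table does not cover). A release such as the output of `uname -r` is reduced to its major.minor pair before comparison.

```python
import re


def parse_kernel_release(release):
    """Extract (major, minor) from a release string such as '6.8.12-2-pve'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def recommended_module_base(release):
    """Map a kernel release to the module base suggested in this thread.

    Returns None for kernels outside both ranges.
    """
    version = parse_kernel_release(release)
    if (6, 1) <= version <= (6, 7):
        return "6.1-based module"
    if (6, 8) <= version <= (6, 11):
        return "6.6-based module"
    return None


if __name__ == "__main__":
    for release in ("6.5.0-44-generic", "6.8.12-2-pve", "6.12.4-zen1"):
        print(release, "->", recommended_module_base(release))
```

Comparing tuples rather than strings avoids the usual pitfall where "6.10" sorts before "6.8" lexicographically.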

resiliencer added a commit to resiliencer/i915-sriov-dkms that referenced this pull request Nov 4, 2024
Source code had been synchronized with @Strongz repo,
but I had got some i915 errors during boot and init in dmesg.
So I had integrated @BBAA Pull Request strongtz#207 and it solved the problem.
Now everything works on linux kernel 6.11.2-zen.
@rb-andrade

Has anyone tried this with Tiger Lake? I tried with i7-1165G7 with no success. I get code 43 on Windows. GuC firmware version 70.1.1 is being loaded instead of the latest one for some reason.

@resiliencer

> Has anyone tried this with Tiger Lake? I tried with i7-1165G7 with no success. I get code 43 on Windows. GuC firmware version 70.1.1 is being loaded instead of the latest one for some reason.

Just try https://github.com/resiliencer/i915-sriov-dkms#first-of-all-you-need-to-extract-intelgopdriverefi-from-your-bios
For me this solved the error number 43 issue for any VM with Windows OS.

resiliencer commented Nov 12, 2024

I don't know whether SR-IOV works or is supported on Tiger Lake, since I haven't tested it, but here is the information I have seen:

Intel's response in the past:
https://community.intel.com/t5/Graphics/SR-IOV-support-for-intel-Iris-Xe-Graphics-on-i7-1165G7/m-p/1294037
Documentation that has not been updated for a long time:
https://open-iov.org/index.php/GPU_Support

Links with success stories with Tiger Lake:
https://forum.proxmox.com/threads/audio-is-same-iommu-group-as-ethernet-how-pass-through.128404/post-561804
https://dlworld.github.io/virtualization/2022/09/18/tigerlake-sriov/
#8 (comment)
#8 (comment)

I also think that before Windows or another guest OS initializes the iGPU video driver, the guest VM's UEFI must first correctly initialize IntelGopDriver.efi. Skipping that step causes many problems (such as error code 43) across different Intel generations.
#8 (comment)

Interesting that GuC/HuC firmware file names in dmesg on Raptor Lake have the tgl prefix:
[...] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/tgl_guc_70.bin version 70.29.2
[...] i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
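
To compare which firmware actually gets loaded on different machines, the two lines above can be pulled out of a saved dmesg dump. This is a minimal sketch: the regular expression is fitted only to the line format shown above, and the helper name parse_firmware_lines is made up, so other kernels may print these lines differently.

```python
import re

# Matches lines of the form shown above:
#   "... GuC firmware i915/tgl_guc_70.bin version 70.29.2"
FIRMWARE_RE = re.compile(
    r"(?P<kind>GuC|HuC) firmware (?P<path>\S+) version (?P<version>\d+(?:\.\d+)*)"
)


def parse_firmware_lines(dmesg_text):
    """Return {kind: (firmware path, version)} for GuC/HuC lines in dmesg text."""
    found = {}
    for match in FIRMWARE_RE.finditer(dmesg_text):
        found[match.group("kind")] = (match.group("path"), match.group("version"))
    return found


SAMPLE = """\
[...] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/tgl_guc_70.bin version 70.29.2
[...] i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
"""

if __name__ == "__main__":
    for kind, (path, version) in parse_firmware_lines(SAMPLE).items():
        print(kind, path, version)
```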

https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/intel-linux/1421529-intel-begins-sorting-out-sr-iov-support-for-the-xe-kernel-graphics-driver?p=1421543#post1421543

Initially we plan to add SR-IOV functionality to the following SDV platforms
already supported by the Xe driver:

  • TGL (up to 7 VFs)
    ...

bbaa-bbaa and others added 9 commits November 20, 2024 21:49
* drm/i915: Do not attempt to load the GSC multiple times

commit 59d3cfdd7f9655a0400ac453bf92199204f8b2a1 upstream.

If the GSC FW fails to load the GSC HW hangs permanently; the only ways
to recover it are FLR or D3cold entry, with the former only being
supported on driver unload and the latter only on DGFX, for which we
don't need to load the GSC. Therefore, if GSC fails to load there is no
need to try again because the HW is stuck in the error state and the
submission to load the FW would just hang the GSCCS.

Note that, due to wa_14015076503, on MTL the GuC escalates all GSCCS
hangs to full GT resets, which would trigger a new attempt to load the
GSC FW in the post-reset HW re-init; this issue is also fixed by not
attempting to load the GSC FW after an error.

Fixes: 15bd4a67e914 ("drm/i915/gsc: GSC firmware loading")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <stable@vger.kernel.org> # v6.3+
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240820215952.2290807-1-daniele.ceraolospurio@intel.com
(cherry picked from commit 03ded4d432a1fb7bb6c44c5856d14115f6f6c3b9)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

* drm/i915/fence: Mark debug_fence_init_onstack() with __maybe_unused

[ Upstream commit fcd9e8afd546f6ced378d078345a89bf346d065e ]

When debug_fence_init_onstack() is unused (CONFIG_DRM_I915_SELFTEST=n),
it prevents kernel builds with clang, `make W=1` and CONFIG_WERROR=y:

.../i915_sw_fence.c:97:20: error: unused function 'debug_fence_init_onstack' [-Werror,-Wunused-function]
   97 | static inline void debug_fence_init_onstack(struct i915_sw_fence *fence)
      |                    ^~~~~~~~~~~~~~~~~~~~~~~~

Fix this by marking debug_fence_init_onstack() with __maybe_unused.

See also commit 6863f5643dd7 ("kbuild: allow Clang to find unused static
inline functions for W=1 build").

Fixes: 214707fc2ce0 ("drm/i915/selftests: Wrap a timer into a i915_sw_fence")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240829155950.1141978-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit 5bf472058ffb43baf6a4cdfe1d7f58c4c194c688)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

* drm/i915/fence: Mark debug_fence_free() with __maybe_unused

[ Upstream commit f99999536128b14b5d765a9982763b5134efdd79 ]

When debug_fence_free() is unused
(CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS=n), it prevents kernel builds
with clang, `make W=1` and CONFIG_WERROR=y:

.../i915_sw_fence.c:118:20: error: unused function 'debug_fence_free' [-Werror,-Wunused-function]
  118 | static inline void debug_fence_free(struct i915_sw_fence *fence)
      |                    ^~~~~~~~~~~~~~~~

Fix this by marking debug_fence_free() with __maybe_unused.

See also commit 6863f5643dd7 ("kbuild: allow Clang to find unused static
inline functions for W=1 build").

Fixes: fc1584059d6c ("drm/i915: Integrate i915_sw_fence with debugobjects")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240829155950.1141978-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit 8be4dce5ea6f2368cc25edc71989c4690fa66964)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

* drm/i915/guc: prevent a possible int overflow in wq offsets

[ Upstream commit d3d37f74683e2f16f2635ee265884f7ca69350ae ]

It may be possible for the sum of the values derived from
i915_ggtt_offset() and __get_parent_scratch_offset()/
i915_ggtt_offset() to go over the u32 limit before being assigned
to wq offsets of u64 type.

Mitigate these issues by expanding one of the right operands
to u64 to avoid any overflow issues just in case.

Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.

Fixes: c2aa552ff09d ("drm/i915/guc: Add multi-lrc context registration")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Link: https://patchwork.freedesktop.org/patch/msgid/20240725155925.14707-1-n.zhandarovich@fintech.ru
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 1f1c1bd56620b80ae407c5790743e17caad69cec)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

* drm/i915/gem: fix bitwise and logical AND mixup

commit 394b52462020b6cceff1f7f47fdebd03589574f3 upstream.

CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND is an int, defaulting to 250. When
the wakeref is non-zero, it's either -1 or a dynamically allocated
pointer, depending on CONFIG_DRM_I915_DEBUG_RUNTIME_PM. It's likely that
the code works by coincidence with the bitwise AND, but with
CONFIG_DRM_I915_DEBUG_RUNTIME_PM=y, there's the off chance that the
condition evaluates to false, and intel_wakeref_auto() doesn't get
called. Switch to the intended logical AND.

v2: Use != to avoid clang -Wconstant-logical-operand (Nathan)

Fixes: ad74457a6b5a ("drm/i915/dgfx: Release mmap on rpm suspend")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: stable@vger.kernel.org # v6.1+
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> # v1
Link: https://patchwork.freedesktop.org/patch/msgid/643cc0a4d12f47fd8403d42581e83b1e9c4543c7.1726680898.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit 4c1bfe259ed1d2ade826f95d437e1c41b274df04)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

* drm/i915/hdcp: fix connector refcounting

commit 4cc2718f621a6a57a02581125bb6d914ce74d23b upstream.

We acquire a connector reference before scheduling an HDCP prop work,
and expect the work function to release the reference.

However, if the work was already queued, it won't be queued multiple
times, and the reference is not dropped.

Release the reference immediately if the work was already queued.

Fixes: a6597faa2d59 ("drm/i915: Protect workers against disappearing connectors")
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Suraj Kandpal <suraj.kandpal@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org # v5.10+
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240924153022.2255299-1-jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit abc0742c79bdb3b164eacab24aea0916d2ec1cb5)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---------

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
Co-authored-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Co-authored-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Co-authored-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Co-authored-by: Jani Nikula <jani.nikula@intel.com>
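
For readers skimming the squashed log above, the "prevent a possible int overflow in wq offsets" commit is about a 32-bit sum wrapping before it is stored in a 64-bit field. The sketch below imitates that in Python with explicit masks; the two offsets are hypothetical values chosen to trigger the wrap, not values taken from the driver.

```python
U32_MASK = 0xFFFF_FFFF
U64_MASK = 0xFFFF_FFFF_FFFF_FFFF


def add_as_u32(a, b):
    """Sum two u32 values the way C does: the result wraps modulo 2**32."""
    return (a + b) & U32_MASK


def add_widened_to_u64(a, b):
    """Sum after widening one operand to u64, as the commit describes."""
    return ((a & U64_MASK) + b) & U64_MASK


if __name__ == "__main__":
    ggtt_offset = 0xFFFF_F000      # hypothetical u32 offset near the limit
    scratch_offset = 0x0000_2000   # hypothetical u32 offset
    print(hex(add_as_u32(ggtt_offset, scratch_offset)))
    print(hex(add_widened_to_u64(ggtt_offset, scratch_offset)))
```

With these inputs the 32-bit sum silently loses the carry, while the widened sum keeps it, which is the difference the upstream fix is guarding against.
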
pick from 1da86618bdce301d23e89ecce92161f9d3b3c5e7
Test with Linux version 6.12.0-zen1-1-zen (linux-zen@archlinux)
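
As a side note on the "fix bitwise and logical AND mixup" commit in the squashed log above: the commit message says the old check worked "by coincidence". A small sketch of why, assuming the default of 250 mentioned there; the pointer-like wakeref value is hypothetical and was picked so that its low byte is zero.

```python
# Default of CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND per the commit message.
USERFAULT_AUTOSUSPEND = 250  # 0xFA


def bitwise_check(wakeref):
    """The old condition: true only if wakeref shares a set bit with 250."""
    return (wakeref & USERFAULT_AUTOSUSPEND) != 0


def logical_check(wakeref):
    """The intended condition: true if both values are non-zero."""
    return wakeref != 0 and USERFAULT_AUTOSUSPEND != 0


if __name__ == "__main__":
    # -1 has all bits set, so both checks agree.
    print(bitwise_check(-1), logical_check(-1))
    # A hypothetical pointer-like value with a zero low byte: the bitwise
    # check fails even though the wakeref is non-zero.
    pointer_like = 0xFFFF_8C89_4100_0000
    print(bitwise_check(pointer_like), logical_check(pointer_like))
```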

maur5 commented Nov 22, 2024

Dropping this for anyone else who wanders into this thread....
Here's a working deployment script for Ubuntu Noble (6.8.0-48) on Proxmox 8.2.7:
https://gist.github.com/maur5/f2278472cdf06067d9d360bc5e6df365

@bbaa-bbaa bbaa-bbaa changed the title [RFC]Rebase the dkms module on the linux-intel-lts 6.6/linux branch Rebase the dkms module on the linux-intel-lts 6.6/linux branch Nov 23, 2024
@bbaa-bbaa

> I tried your patch on PVE, which uses kernel 6.8.12-2. When I reboot the guest machine, it still reports an error:

@RaidenSummoner Could you try this branch?
https://github.com/bbaa-bbaa/i915-sriov-dkms/tree/xelpd_display

@RaidenSummoner

> > I have tried your patch on pve which uses kernel 6.8.12-2, when I reboot the guest machine, it still report an error:
>
> @RaidenSummoner Could you try this branch? https://github.com/bbaa-bbaa/i915-sriov-dkms/tree/xelpd_display

I'll check it after finishing my college exams in a couple of weeks.

pick from moetayuko/intel-gpu-i915-backports@24fff30

upstream: ff283a7182cbd652283d7b7fd5181a9d259939f9..e41e858ce16d85a60e10a8c9e04f86a2915530ea
e41e858ce16d85a60e10a8c9e04f86a2915530ea drm/i915/mtl: Handle lut equal check for MTL degamma
ab1ef80c8e74b660144e92d674b1fe1ba3450c4a drm/i915/mtl: Add legacy degamma lut support
093b7595e293f48d5ad896722293dd68d0786075 drm/i915/mtl: Add support for 24 bit precision DG LUT
56d4cba62ba67ec49019fcd13cfc0a270d87c894 drm/i915/mtl: Add check for 24 bit precision DG LUT
8d866c66114b7878497ebbd6e91579f2150efd98 drm/i915/color: Add checks for extended luts
f7e432035547eac89801d6edabc8febc5f0234d2 drm: Add helper functions for extended LUT
6765c66426bd5e25d76af123b93ec9bfe7143547 drm: Add Client Cap for advance degamma mode
98b4cf4004adeb05a4b4e445f791807bec3a53b8 drm/i915/mtl: Attach degamma mode property
d13eda747e737e12699d34f724666b87a909726b drm/i915/mtl: Add degamma lut range for 24 bit degamma LUT
d9960911cb914e6f0c260ea7f3dc0a88bbaa2115 drm/i915/color: Use new helper function
a2730f60bf25897a7cc866f774d2ce8d6c8bdc81 drm/color: Add pipe degamma mode property
9c398ddbf01fa3885b179a956bd183408ae532e3 drm/i915/xelpd: Enable plane gamma
7da61cd2518056f2d68276ad4ecad92b49b96c5a drm/i915/xelpd: Program Plane Gamma Registers
38a23653ddfda26b5c69bcdbbfa2fa5055183501 drm/i915/xelpd: Add register definitions for Plane Gamma
91e211e2a74ef40ae3e5ff4e7774870547cde6da drm/i915/xelpd: Define and Initialize Plane Gamma Lut range
228662a6a86d7d16aa0f9ea20e4ff11a6d893b6f drm: Add Plane Gamma Lut property
ee76df2c1c0cb438d80dfd8f4031d2c249bffdd7 drm: Add Plane Gamma Mode property
fd8f1cea8590dcc7f5f2a93f617fdbf5e543ea57 drm/i915/xelpd: Enable Plane CSC
3fae4f870e2fabcb7eecb36605808d2bf830fa5e drm: Add helper to attach Plane ctm property
52ca4dc2c13b14de92865e192c5cab2d315694c5 drm: Add Plane CTM property
35d977ac50a4a4d9420814b5b0fc85981d1eef7c drm/i915/xelpd: Load plane color luts from atomic flip
a7be0033a1457f40a1361f5e53c3b451323ba4ce drm/i915/xelpd: Initialize plane color features
88e485068702559efee335bd756715a6a5013342 drm/i915/xelpd: Add plane color check to glk_plane_color_ctl
7c685da055ba5d3d1c533854cbc27e2994702082 drm/i915/xelpd: Program Plane Degamma Registers
f0d4a02bcc142c386b29080159c0d14c28f89143 drm/i915/xelpd: Add color capabilities of SDR planes
24ed0d35c453afb5c93ce038a33d73f285071a24 drm/i915/xelpd: Enable plane color features
af6b56192f73c7367efaad7453f0282087696901 drm/i915/xelpd: Add register definitions for Plane Degamma
dec8f49885a5a53cffa0068a073018fac1bf5afc drm/i915/xelpd: Define Degamma Lut range struct for HDR planes
5bddb14b4c331155694b26405c713de7a28b686b drm: Add Plane Degamma Lut property
1fc3ba91e52ea98d9a494a00a160acfdad03f20d drm: Add Plane Degamma Mode property
c4f2521b14292f292780ecb43588f47fb5ba9ba1 drm: Add Enhanced Gamma LUT precision structure
e228413bc0a943c5c46533602f9934391fc8102a drm/i915/xelpd: Adjust gamma lut size for legacy
75ba29fef538a28042b5f343c6961017357133d6 drm/i915/xelpd: Enable XE_LPD Gamma Lut readout
3b850ab9a7eb677512dd31ab2d98ae75c84e0066 drm/i915/xelpd: logarithmic gamma enabled only with advance gamma mode
06a8a821371e147623e397dc978fa751eef40656 drm: Add Client Cap for advance gamma mode
2e5622438f8c87a5e3dd58f5374d8f5b79320a00 drm/i915/xelpd: Attach gamma mode property
c12115088f707bd8090f94bb1bdc607afad810df drm/i915/xelpd: Add support for Logarithmic gamma mode
3ed97b9fcb330e56363062a50568ba3d574be3df drm/i915/xelpd: Define color lut range structure
ff283a7182cbd652283d7b7fd5181a9d259939f9 drm: Add gamma mode property

Signed-off-by: bbaa <bbaa@bbaa.fun>
Port the previously reverted XeLPD display related changes
@bbaa-bbaa bbaa-bbaa closed this Dec 26, 2024
@bbaa-bbaa bbaa-bbaa deleted the 6.6-base branch January 4, 2025 04:27