QubesOS 4.2 suspend and pcie device assignment (also in case of latest kernel, boot) BROKEN on GPD WinMax 2 devices (G1619-03 G1619-04) culprit device narrowed down #9584

LindaFerum · 2024-11-16T23:35:56Z

Qubes OS release

Qubes OS 4.2 (fully updated)

Xen 4.17.5

kernel 6.6.54-1.qubes.fc37.x86_64 (sort of works, but with problem, see below)

kernel 6.11.2-1.qubes.fc37.x86_64 (same problem but arises immediately upon entering user's password at desktop screen, making system completely unusable)

Hardware as identified by Qubes OS itself (HCL)

Brand: GPD
Model: G1619-04

CPU: AMD Ryzen 7 6800U with Radeon Graphics
Chipset: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Root Complex [1022:14b5] (rev 01)
Graphics: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt [Radeon 680M] [1002:1681] (rev c1) (prog-if 00 [VGA controller])

RAM: 29437 Mb

QubesOS version: R4.2.3
BIOS: 1.05
Kernel: 6.6.54-1
Xen: unknown (this is weird ? )

Hardware, according to device model ID written on unit's bottom cover:
Model G1619-03 (the mismatch between model ID on the physical unit and the one reported by Qubes is peculiar)
AMD Ryzen 7 6800U with Radeon Graphics
32 GB RAM
Radeon 680M

Brief summary

The initial problem was that the system could not wake up from sleep, even after disabling all the wakeup sources that commonly keep this laptop from sleeping (i.e. everything except the keyboard).

The device would go to sleep normally, but during wakeup it would enter a "maximum performance" state (loud fan, running hot) and the screen would go black.

Later, similar behavior appeared when assigning a particular PCIe device to a VM.

The culprit device is identified as AMD Rembrandt USB4 XHCI controller #4 (XHC4 in acpitool), aka pci:0000:74:00.4.
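For anyone checking wakeup sources on a similar machine, here is a minimal Python sketch that lists which ACPI devices currently have wakeup enabled. It assumes the usual `/proc/acpi/wakeup` table layout (device name, S-state, status, optional sysfs node). The function names and the sample input are hypothetical and were not captured from the affected laptop; only the XHC4 name and its sysfs node come from this report.

```python
from typing import List, NamedTuple, Optional


class WakeupEntry(NamedTuple):
    device: str                 # ACPI device name, e.g. "XHC4"
    s_state: str                # deepest sleep state it can wake from
    enabled: bool               # True if wakeup is currently enabled
    sysfs_node: Optional[str]   # e.g. "pci:0000:74:00.4", or None


def parse_acpi_wakeup(text: str) -> List[WakeupEntry]:
    """Parse text in the table format used by /proc/acpi/wakeup."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and continuation rows that
        # begin with a status column instead of a device name.
        if len(fields) < 3 or fields[0] == "Device" or fields[0].startswith("*"):
            continue
        device, s_state, status = fields[:3]
        node = fields[3] if len(fields) > 3 else None
        entries.append(WakeupEntry(device, s_state, status == "*enabled", node))
    return entries


def enabled_wakeup_sources(text: str) -> List[str]:
    """Return the names of devices whose wakeup is enabled."""
    return [e.device for e in parse_acpi_wakeup(text) if e.enabled]


# Illustrative input only: DEVA is a made-up device, and the enabled or
# disabled states shown here are assumptions, not measurements.
SAMPLE = (
    "Device\tS-state\t  Status   Sysfs node\n"
    "DEVA\t  S4\t*disabled\n"
    "XHC4\t  S4\t*enabled   pci:0000:74:00.4\n"
)

if __name__ == "__main__":
    print(enabled_wakeup_sources(SAMPLE))
```

On a real system the input would be the contents of `/proc/acpi/wakeup`, read in dom0. Writing a device name to that file toggles its wakeup state, so a listing like this is mainly useful for confirming what is still enabled after such changes.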

If this device is assigned to a VM and the VM starts, the laptop shows the same sudden burst of heat and fan noise, after which the screen quickly becomes laggy and finally goes black. Once the screen is black, the machine can only be recovered by a forced reboot.

The assignment options (permissive, strict reset) make no difference.

Later I decided to try kernel-latest (6.11.2-1.qubes.fc37.x86_64), and the situation is much, much worse there.

The "burst of heat and noise followed by black screen" occurs immediately after the system boots to the desktop, making the device unusable.

Steps to reproduce

1. Have a Ryzen 7 6800U computer with an AMD Rembrandt USB4 XHCI controller #4, ideally a GPD Win Max 2 model G1619-03 (though G1619-04 may also be affected).
2. Assign that AMD Rembrandt USB4 XHCI controller #4 to any VM.
3. Start that VM.
4. Observe immediate performance degradation, abnormal fan noise, the screen going black, and a complete lockup.

ALTERNATIVELY

1. Use the same hardware as above.
2. Without assigning the culprit device to any VM, suspend to RAM.
3. Attempt to wake the system.
4. Observe a black screen from which the system cannot recover.

ALTERNATIVELY

1. Use the same hardware as above.
2. Run Qubes OS with the 6.11.2-1.qubes.fc37.x86_64 kernel.
3. Observe immediate degradation and a black screen upon reaching the user's password entry.

All variants reproduce reliably

Expected behavior

Suspend and resume working, at the very least. Ideally, assignment of the controller to a VM would work too, but I can live without that, since some other USB controllers can be assigned without problems.

6.11.2-1.qubes.fc37.x86_64 kernel working too

Actual behavior

Suspend and resume cause an immediate black screen, presumably due to the AMD Rembrandt USB4 XHCI controller.

With 6.11.2-1.qubes.fc37.x86_64 the same failure occurs regardless of suspend/resume or VM assignment, presumably due to the same device.

I don't know which logs would be appropriate here and how to best catch them, but given that I can reliably reproduce the behavior, please let me know how to grab the logs that are most likely to be useful and I will do my best to grab them


LindaFerum · 2024-11-17T22:58:27Z

Okay! I did catch some logs, and I also figured out how to make it boot (more or less) with kernel 6.11.2-1.qubes.fc37.x86_64: the trick seems to be not to autostart any VMs with PCI passthrough.

First, a log of trying to pass the annoying AMD Rembrandt USB4 XHCI controller #4 to a VM (named "worst-usb"), causing eventual hang and blackscreen
journalctl:
journalctl-bad-usb-launch.txt

and the VM itself:
guest-worst-usb.log

Now, other USB controllers also have problems if the VMs they are assigned to are started too early (same apparent symptoms: rapid degradation, black screen, and lockup).

Journalctl:

sudden-fail-other-USB-journalctl.txt

And finally, the one that annoys and vexes me the most: the journalctl from the situation where the machine is put to sleep and then woken up. Behaviorally, the keyboard lights up, the fan spins up wildly, and the LED switches to the "normal operation" (non-blinking) signal, BUT the log looks like the system never even tried to wake up. It's a terribly unpleasant conundrum, and help would be very much appreciated.

sleep-wakeup-failure.txt

marmarek · 2024-11-18T02:14:57Z

Based on PCI device address (73:00.4), this USB controller seems to be part of your GPU (73:00.0). I guess the GPU (or its driver) is not happy about taking away its part. Theoretically they should work separately, but in practice some devices do assume different functions of the same device are handled by the same kernel/VM. Looks like you got such a case here.
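To illustrate the address reasoning above: functions of one PCI device share the same domain, bus, and device number and differ only in the function digit after the dot. The following Python sketch groups addresses that way. The helper names are hypothetical, and the address list in the example is typed in by hand from this thread rather than read from the machine.

```python
import re
from typing import List, Tuple

# Accepts "73:00.4" or "0000:73:00.4" (the 4-hex-digit domain is optional).
_BDF = re.compile(
    r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def parse_bdf(addr: str) -> Tuple[int, int, int, int]:
    """Split a PCI address into (domain, bus, device, function)."""
    m = _BDF.match(addr.strip())
    if m is None:
        raise ValueError(f"not a PCI address: {addr!r}")
    domain, bus, dev, fn = m.groups()
    return int(domain or "0", 16), int(bus, 16), int(dev, 16), int(fn)


def sibling_functions(addr: str, all_addrs: List[str]) -> List[str]:
    """Return the addresses that share addr's domain, bus and device."""
    target = parse_bdf(addr)
    siblings = []
    for other in all_addrs:
        parsed = parse_bdf(other)
        if parsed[:3] == target[:3] and parsed != target:
            siblings.append(other)
    return siblings


if __name__ == "__main__":
    # Addresses mentioned in this thread, listed by hand for illustration.
    known = ["73:00.0", "73:00.3", "73:00.4", "74:00.4"]
    print(sibling_functions("73:00.4", known))  # ['73:00.0', '73:00.3']
```

On a live Linux system the address list could instead be taken from the entries under `/sys/bus/pci/devices`. Note that the original report gives the controller as 0000:74:00.4 while this comment uses 73:00.4, so the actual bus number is worth confirming with `lspci` in dom0 before drawing conclusions.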
In practice, it means you need to keep all the 73:00.* devices in the same place and not assign some of them to sys-usb; if that place is dom0, so be it. It isn't ideal for security, but it looks like your hardware doesn't allow anything better. To limit the impact of the USB controllers that stay in dom0, add qubes.rd.hide_pci=73:00.3,73:00.4 to the kernel cmdline so that normal drivers are not attached to them. This of course assumes you don't need to use them (a monitor on the USB-C port should still work).
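As a small illustration of the suggestion above, this Python sketch builds the `qubes.rd.hide_pci=` argument from a list of addresses and appends it to a kernel command line string without duplicating it. Both helper names are hypothetical, and the starting command line `quiet` is only a placeholder. Where the command line is actually stored is an assumption not stated in this thread: on a GRUB-based install it is typically the `GRUB_CMDLINE_LINUX` setting in `/etc/default/grub`, followed by regenerating the GRUB configuration.

```python
from typing import Iterable


def hide_pci_arg(addrs: Iterable[str]) -> str:
    """Build the qubes.rd.hide_pci= kernel argument from PCI addresses."""
    cleaned = [a.strip() for a in addrs if a.strip()]
    if not cleaned:
        raise ValueError("at least one PCI address is required")
    return "qubes.rd.hide_pci=" + ",".join(cleaned)


def append_to_cmdline(cmdline: str, arg: str) -> str:
    """Append arg unless an argument with the same key is already present."""
    key = arg.split("=", 1)[0]
    if any(tok.split("=", 1)[0] == key for tok in cmdline.split()):
        return cmdline
    return (cmdline + " " + arg).strip()


if __name__ == "__main__":
    # The addresses are the ones named in the comment above; verify them
    # against the actual machine before using them.
    arg = hide_pci_arg(["73:00.3", "73:00.4"])
    print(append_to_cmdline("quiet", arg))
```

The second helper leaves an existing `qubes.rd.hide_pci=` entry untouched instead of merging address lists, which keeps the sketch simple. Merging would need a decision about which entry wins.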

LindaFerum added labels P: default, T: bug · Nov 16, 2024

andrewdavidwong added labels C: other, hardware support, needs diagnosis, affects-4.2 · Nov 29, 2024