Test AMD Radeon Pro W7700 & RX 7700 XT GPUs #680

Open
geerlingguy opened this issue Nov 4, 2024 · 74 comments

@geerlingguy
Owner

geerlingguy commented Nov 4, 2024

I have an AMD Radeon Pro W7700 that I'm planning on testing in an Ampere workstation.

gpu-amd-radeon-pro-w7700

This card runs great on the Pi 5, using @Coreforge's branch we started working on in #222

Two massive features of this card:

  • Support for full DisplayPort 2.1 (for 8K at up to 144 Hz), or 16K at 60 Hz (lol)
  • Built-in AV1 hardware encode/decode

It remains to be seen how many bits of the driver we can get running on this card.

Current steps to get this card working with Pi OS Bookworm

Last updated: 2024-11-11

  1. Clone the Raspberry Pi Linux kernel source and patch the default Raspberry Pi 6.6.y kernel tree with Coreforge's GPU-enablement patch (or just check out Coreforge's branch directly).
  2. Before compiling the kernel, run make menuconfig and select the options:
    1. Kernel Features > Page Size > 4 KB (for Box86 compatibility)
    2. Kernel Features > Kernel support for 32-bit EL0 > Fix up misaligned multi-word loads and stores in user space
    3. Kernel Features > Fix up misaligned loads and stores from userspace for 64bit code
    4. Device Drivers > Graphics support > AMD GPU (optionally SI/CIK support too)
    5. Device Drivers > Graphics support > Direct Rendering Manager (XFree86 4.1.0 and higher DRI support) > Force Architecture can write-combine memory
  3. Recompile the kernel following Raspberry Pi's instructions
  4. Install the AMD firmware (see note below)
  5. Reboot the Pi with the card attached using an appropriate PCIe riser and external ATX power supply.
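
After step 2, one way to double-check that the menuconfig choices actually landed is to grep the generated .config. The sketch below is only an illustration: check_kconfig is a hypothetical helper, and the two symbols in the usage comment (CONFIG_ARM64_4K_PAGES and CONFIG_DRM_AMDGPU) are the upstream names for the 4 KB page size and AMD GPU options. The options added by Coreforge's patch are not checked because their symbol names are not given in this issue.

```shell
# Hypothetical helper: report whether each Kconfig symbol is enabled
# (built in "=y" or module "=m") in a kernel .config file.
# Returns non-zero if any requested symbol is not enabled.
check_kconfig() {
  kc_file=$1
  shift
  kc_status=0
  for kc_sym in "$@"; do
    if grep -Eq "^${kc_sym}=(y|m)$" "$kc_file"; then
      echo "ok      ${kc_sym}"
    else
      echo "missing ${kc_sym}"
      kc_status=1
    fi
  done
  return "$kc_status"
}

# Usage (run from the kernel source tree after `make menuconfig`):
#   check_kconfig .config CONFIG_ARM64_4K_PAGES CONFIG_DRM_AMDGPU
```

If anything prints as "missing", re-run make menuconfig before starting the (long) kernel build.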

AMD GPU Firmware for Bookworm

Because Pi OS 12 is based on Debian 12 Bookworm, and its firmware-amd-graphics package doesn't include the firmware for the latest-generation AMD cards, you will have to install that package and then download supplemental firmware files from the linux-firmware repo:

# Install the base AMD GPU firmware
sudo apt install -y firmware-amd-graphics

# Download supplemental firmware files for 7000-series cards
cd /usr/lib/firmware/amdgpu
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/psp_13_0_10_sos.bin && \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/smu_13_0_10.bin && \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_pfp.bin && \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_mes_2.bin && \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_mes1.bin && \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/psp_13_0_10_ta.bin && \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_me.bin && \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_rlc.bin && \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_mec.bin && \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_imu.bin && \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/sdma_6_0_3.bin
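
If you are not sure which files your card still needs, the amdgpu driver names each one it could not load in dmesg ("Direct firmware load for ... failed with error -2", where -2 is ENOENT, i.e. file not found). A rough sketch for pulling those names out and for checking a firmware directory follows; failed_firmware and missing_firmware are hypothetical helper names, not part of any package.

```shell
# Hypothetical helper: read dmesg text on stdin and print each firmware
# path the kernel reported as failing to load (one per line, de-duplicated).
failed_firmware() {
  sed -n 's/.*Direct firmware load for \([^ ]*\) failed with error.*/\1/p' | sort -u
}

# Hypothetical helper: print every listed file name that does not exist
# in the given firmware directory.
missing_firmware() {
  fw_dir=$1
  shift
  for fw_name in "$@"; do
    [ -e "$fw_dir/$fw_name" ] || echo "$fw_name"
  done
}

# Usage:
#   sudo dmesg | failed_firmware
#   missing_firmware /usr/lib/firmware/amdgpu psp_13_0_10_sos.bin smu_13_0_10.bin
```

An empty result from both commands after a reboot suggests the driver found everything it asked for.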

Confirm everything is working by plugging a monitor into the graphics card; then confirm the card's GPU is in use by running glxinfo -B (part of the mesa-utils package), for example:

$ DISPLAY=:0 glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: AMD (0x1002)
    Device: AMD Radeon Pro W7700 (gfx1101, LLVM 15.0.6, DRM 3.54, 6.6.60-v8-AMDGPU+) (0x7470)
    Version: 23.2.1
    Accelerated: yes
    Video memory: 15360MB
...

(Prepend DISPLAY=:0 if running commands over SSH.)
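
For a scripted version of the same check, something like the sketch below could work. It only looks at three fields of the glxinfo -B output shown above: direct rendering, the vendor ID (0x1002 is AMD's PCI vendor ID, as seen in the lspci/dmesg output), and the Accelerated flag. glx_uses_amd is a hypothetical helper name.

```shell
# Hypothetical helper: read `glxinfo -B` output on stdin and succeed only if
# direct rendering is on, the vendor ID is AMD (0x1002), and the renderer
# reports itself as accelerated.
glx_uses_amd() {
  awk '
    /^direct rendering:/  { direct = ($3 == "Yes") }
    /^[ \t]*Vendor:/      { amd = ($0 ~ /0x1002/) }
    /^[ \t]*Accelerated:/ { accel = ($2 == "yes") }
    END { exit !(direct && amd && accel) }
  '
}

# Usage:
#   DISPLAY=:0 glxinfo -B | glx_uses_amd && echo "AMD GPU is rendering"
```

If the check fails, the desktop is most likely still rendering on something other than the card, so look at the full glxinfo -B output by hand.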

Hardware video transcoding support

If you would like to enable hardware transcoding, you need to install the Mesa VAAPI drivers:

sudo apt install mesa-va-drivers vainfo

Then you should be able to see the VAAPI info, and apps like OBS (sudo apt install obs-studio) should be able to use the hardware transcoding instead of x264 on the CPU. Confirm it's working with vainfo:

pi@pi5-pcie:~ $ vainfo
libva info: VA-API version 1.17.0
libva info: Trying to open /usr/lib/aarch64-linux-gnu/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_17
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.17 (libva 2.12.0)
vainfo: Driver version: Mesa Gallium driver 23.2.1-1~bpo12+rpt3 for AMD Radeon Pro W7700 (gfx1101, LLVM 15.0.6, DRM 3.54, 6.6.60-v8-AMDGPU+)
vainfo: Supported profile and entrypoints
      VAProfileH264ConstrainedBaseline:	VAEntrypointVLD
      VAProfileH264ConstrainedBaseline:	VAEntrypointEncSlice
      VAProfileH264Main               :	VAEntrypointVLD
      VAProfileH264Main               :	VAEntrypointEncSlice
      VAProfileH264High               :	VAEntrypointVLD
      VAProfileH264High               :	VAEntrypointEncSlice
      VAProfileHEVCMain               :	VAEntrypointVLD
      VAProfileHEVCMain               :	VAEntrypointEncSlice
      VAProfileHEVCMain10             :	VAEntrypointVLD
      VAProfileHEVCMain10             :	VAEntrypointEncSlice
      VAProfileJPEGBaseline           :	VAEntrypointVLD
      VAProfileVP9Profile0            :	VAEntrypointVLD
      VAProfileVP9Profile2            :	VAEntrypointVLD
      VAProfileAV1Profile0            :	VAEntrypointVLD
      VAProfileAV1Profile0            :	VAEntrypointEncSlice
      VAProfileNone                   :	VAEntrypointVideoProc
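
In that list, VAEntrypointVLD means decode and VAEntrypointEncSlice means encode, so on this card AV1 shows both while VP9 is decode-only. To test for one specific profile/entrypoint pair from a script, a minimal sketch (va_supports is a hypothetical helper name) could be:

```shell
# Hypothetical helper: read `vainfo` output on stdin and succeed if the given
# profile is listed with the given entrypoint.
va_supports() {
  va_profile=$1
  va_entry=$2
  awk -v p="$va_profile" -v e="$va_entry" '
    $1 == p && $NF == e { found = 1 }
    END { exit !found }
  '
}

# Usage:
#   vainfo 2>/dev/null | va_supports VAProfileAV1Profile0 VAEntrypointEncSlice \
#     && echo "AV1 hardware encode is available"
```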
@geerlingguy
Owner Author

lspci output:

0000:01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev 11) (prog-if 00 [Normal decode])
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 38
	Region 0: Memory at 1b80200000 (32-bit, non-prefetchable) [size=16K]
	Bus: primary=01, secondary=02, subordinate=03, sec-latency=0
	I/O behind bridge: 0000f000-00000fff [disabled] [32-bit]
	Memory behind bridge: 80000000-801fffff [size=2M] [32-bit]
	Prefetchable memory behind bridge: 1800000000-1817ffffff [size=384M] [32-bit]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [48] Vendor Specific Information: Len=08 <?>
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Upstream Port, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ SlotPowerLimit 0W
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 16GT/s, Width x16, ASPM L1, Exit Latency L1 <64us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s (downgraded), Width x1 (downgraded)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR+
			 10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 4
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-
			 AtomicOpsCap: Routing+ 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
			 AtomicOpsCtl: EgressBlck-
		LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
		LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
			 EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [270 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Capabilities: [320 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [400 v1] Data Link Feature <?>
	Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
	Capabilities: [440 v1] Lane Margining at the Receiver <?>
	Kernel driver in use: pcieport

0000:02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch (rev 11) (prog-if 00 [Normal decode])
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 39
	Bus: primary=02, secondary=03, subordinate=03, sec-latency=0
	I/O behind bridge: 0000f000-00000fff [disabled] [32-bit]
	Memory behind bridge: 80000000-801fffff [size=2M] [32-bit]
	Prefetchable memory behind bridge: 1800000000-1817ffffff [size=384M] [32-bit]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Downstream Port (Slot-), MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0
			ExtTag+ RBE+
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 16GT/s, Width x16, ASPM L1, Exit Latency L1 <1us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 16GT/s, Width x16
			TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR+
			 10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 4
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS- ARIFwd-
			 AtomicOpsCap: Routing+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled, ARIFwd-
			 AtomicOpsCtl: EgressBlck-
		LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
		LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -3.5dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
			 EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 000000ffffffe000  Data: 0008
	Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [270 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Capabilities: [2a0 v1] Access Control Services
		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
	Capabilities: [400 v1] Data Link Feature <?>
	Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
	Capabilities: [450 v1] Lane Margining at the Receiver <?>
	Kernel driver in use: pcieport

0000:03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 32 [Radeon PRO W7700] (prog-if 00 [VGA controller])
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 32 [Radeon PRO W7700]
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 38
	Region 0: Memory at 1800000000 (64-bit, prefetchable) [size=256M]
	Region 2: Memory at 1810000000 (64-bit, prefetchable) [size=2M]
	Region 5: Memory at 1b80000000 (32-bit, non-prefetchable) [size=1M]
	Expansion ROM at 1b80100000 [disabled] [size=128K]
	Capabilities: [48] Vendor Specific Information: Len=08 <?>
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [64] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 16GT/s, Width x16, ASPM L1, Exit Latency L1 <1us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 16GT/s, Width x16
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
			 10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
			 EmergencyPowerReduction Form Factor Dev Specific, EmergencyPowerReductionInit-
			 FRS-
			 AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
			 AtomicOpsCtl: ReqEn-
		LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
		LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
			 EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [200 v1] Physical Resizable BAR
		BAR 0: current size: 256MB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB
		BAR 2: current size: 2MB, supported: 2MB 4MB 8MB 16MB 32MB 64MB 128MB 256MB
	Capabilities: [240 v1] Power Budgeting <?>
	Capabilities: [270 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Capabilities: [2a0 v1] Access Control Services
		ACSCap:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
	Capabilities: [2d0 v1] Process Address Space ID (PASID)
		PASIDCap: Exec+ Priv+, Max PASID Width: 10
		PASIDCtl: Enable- Exec- Priv-
	Capabilities: [320 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
	Capabilities: [450 v1] Lane Margining at the Receiver <?>
	Kernel modules: amdgpu

0000:03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin B routed to IRQ 255
	Region 0: Memory at 1b80120000 (32-bit, non-prefetchable) [disabled] [size=16K]
	Capabilities: [48] Vendor Specific Information: Len=08 <?>
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [64] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 16GT/s, Width x16, ASPM L1, Exit Latency L1 <1us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 16GT/s, Width x16
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
			 10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
			 EmergencyPowerReduction Form Factor Dev Specific, EmergencyPowerReductionInit-
			 FRS-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
			 AtomicOpsCtl: ReqEn-
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
			 EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [2a0 v1] Access Control Services
		ACSCap:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
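
The line worth noticing in that dump is the LnkSta on 0000:01:00.0: the link trained at 8GT/s x1 (both marked downgraded), which is expected here because the Pi 5 exposes a single PCIe lane. A small sketch for pulling just those lines out of lspci -vv output, assuming the output format shown above (downgraded_links is a hypothetical helper name):

```shell
# Hypothetical helper: read `lspci -vv` output on stdin and print, for each
# device, any LnkSta line that lspci flagged as "(downgraded)".
downgraded_links() {
  awk '
    # Device header lines start in column 0, e.g. "0000:01:00.0 PCI bridge: ..."
    /^[0-9a-fA-F:.]+ /        { dev = $1 }
    # "$1 = $1" rebuilds the line with single spaces and no leading tabs.
    /LnkSta:.*\(downgraded\)/ { $1 = $1; print dev ": " $0 }
  '
}

# Usage:
#   sudo lspci -vv | downgraded_links
```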

And dmesg pcie output:

pi@pi5-pcie:~ $ dmesg | grep pci
[    0.000000] Linux version 6.6.58-v8-16k+ (pi@pi5-pcie) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #13 SMP PREEMPT Wed Oct 30 15:16:21 CDT 2024
[    0.000000] Kernel command line: reboot=w coherent_pool=1M 8250.nr_uarts=1 pci=pcie_bus_safe cgroup_disable=memory numa_policy=interleave iommu_dma_numa_policy=interleave system_heap.max_order=0  smsc95xx.macaddr=D8:3A:DD:84:FB:3A vc_mem.mem_base=0x3fc00000 vc_mem.mem_size=0x40000000  console=ttyAMA10,115200 console=tty1 root=PARTUUID=90e8829a-02 rootfstype=ext4 fsck.repair=yes rootwait quiet splash plymouth.ignore-serial-consoles cfg80211.ieee80211_regdom=US
[    0.384926] brcm-pcie 1000110000.pcie: host bridge /axi/pcie@110000 ranges:
[    0.384933] brcm-pcie 1000110000.pcie:   No bus range found for /axi/pcie@110000, using [bus 00-ff]
[    0.384944] brcm-pcie 1000110000.pcie:      MEM 0x1b80000000..0x1bffffffff -> 0x0080000000
[    0.384951] brcm-pcie 1000110000.pcie:      MEM 0x1800000000..0x1b7fffffff -> 0x0400000000
[    0.384957] brcm-pcie 1000110000.pcie:   IB MEM 0x0000000000..0x0fffffffff -> 0x1000000000
[    0.386324] brcm-pcie 1000110000.pcie: Forcing gen 3
[    0.386602] brcm-pcie 1000110000.pcie: PCI host bridge to bus 0000:00
[    0.386605] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.386609] pci_bus 0000:00: root bus resource [mem 0x1b80000000-0x1bffffffff] (bus address [0x80000000-0xffffffff])
[    0.386613] pci_bus 0000:00: root bus resource [mem 0x1800000000-0x1b7fffffff pref] (bus address [0x400000000-0x77fffffff])
[    0.386628] pci 0000:00:00.0: [14e4:2712] type 01 class 0x060400
[    0.386656] pci 0000:00:00.0: PME# supported from D0 D3hot
[    0.387577] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    0.495551] brcm-pcie 1000110000.pcie: link up, 8.0 GT/s PCIe x1 (!SSC)
[    0.495579] pci 0000:01:00.0: [1002:1478] type 01 class 0x060400
[    0.495594] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00003fff]
[    0.495712] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
[    0.495765] pci 0000:01:00.0: 7.876 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[    0.507556] pci 0000:01:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    0.507628] pci 0000:02:00.0: [1002:1479] type 01 class 0x060400
[    0.507756] pci 0000:02:00.0: PME# supported from D0 D3hot D3cold
[    0.508721] pci 0000:02:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    0.508793] pci 0000:03:00.0: [1002:7470] type 00 class 0x030000
[    0.508811] pci 0000:03:00.0: reg 0x10: [mem 0x00000000-0x0fffffff 64bit pref]
[    0.508823] pci 0000:03:00.0: reg 0x18: [mem 0x00000000-0x001fffff 64bit pref]
[    0.508830] pci 0000:03:00.0: reg 0x20: [io  0x0000-0x00ff]
[    0.508838] pci 0000:03:00.0: reg 0x24: [mem 0x00000000-0x000fffff]
[    0.508845] pci 0000:03:00.0: reg 0x30: [mem 0x00000000-0x0001ffff pref]
[    0.508949] pci 0000:03:00.0: PME# supported from D1 D2 D3hot D3cold
[    0.508996] pci 0000:03:00.0: 7.876 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[    0.509081] pci 0000:03:00.0: vgaarb: setting as boot VGA device
[    0.509084] pci 0000:03:00.0: vgaarb: bridge control possible
[    0.509086] pci 0000:03:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.509111] pci 0000:03:00.1: [1002:ab30] type 00 class 0x040300
[    0.509125] pci 0000:03:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
[    0.509226] pci 0000:03:00.1: PME# supported from D1 D2 D3hot D3cold
[    0.509358] pci_bus 0000:03: busn_res: [bus 03-ff] end is updated to 03
[    0.509364] pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 03
[    0.509369] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 03
[    0.509380] pci 0000:00:00.0: BAR 9: assigned [mem 0x1800000000-0x1817ffffff 64bit pref]
[    0.509383] pci 0000:00:00.0: BAR 8: assigned [mem 0x1b80000000-0x1b802fffff]
[    0.509387] pci 0000:01:00.0: BAR 9: assigned [mem 0x1800000000-0x1817ffffff 64bit pref]
[    0.509390] pci 0000:01:00.0: BAR 8: assigned [mem 0x1b80000000-0x1b801fffff]
[    0.509393] pci 0000:01:00.0: BAR 0: assigned [mem 0x1b80200000-0x1b80203fff]
[    0.509398] pci 0000:01:00.0: BAR 7: no space for [io  size 0x1000]
[    0.509401] pci 0000:01:00.0: BAR 7: failed to assign [io  size 0x1000]
[    0.509405] pci 0000:02:00.0: BAR 9: assigned [mem 0x1800000000-0x1817ffffff 64bit pref]
[    0.509408] pci 0000:02:00.0: BAR 8: assigned [mem 0x1b80000000-0x1b801fffff]
[    0.509410] pci 0000:02:00.0: BAR 7: no space for [io  size 0x1000]
[    0.509413] pci 0000:02:00.0: BAR 7: failed to assign [io  size 0x1000]
[    0.509417] pci 0000:03:00.0: BAR 0: assigned [mem 0x1800000000-0x180fffffff 64bit pref]
[    0.509427] pci 0000:03:00.0: BAR 2: assigned [mem 0x1810000000-0x18101fffff 64bit pref]
[    0.509437] pci 0000:03:00.0: BAR 5: assigned [mem 0x1b80000000-0x1b800fffff]
[    0.509442] pci 0000:03:00.0: BAR 6: assigned [mem 0x1b80100000-0x1b8011ffff pref]
[    0.509445] pci 0000:03:00.1: BAR 0: assigned [mem 0x1b80120000-0x1b80123fff]
[    0.509450] pci 0000:03:00.0: BAR 4: no space for [io  size 0x0100]
[    0.509453] pci 0000:03:00.0: BAR 4: failed to assign [io  size 0x0100]
[    0.509456] pci 0000:02:00.0: PCI bridge to [bus 03]
[    0.509461] pci 0000:02:00.0:   bridge window [mem 0x1b80000000-0x1b801fffff]
[    0.509465] pci 0000:02:00.0:   bridge window [mem 0x1800000000-0x1817ffffff 64bit pref]
[    0.509471] pci 0000:01:00.0: PCI bridge to [bus 02-03]
[    0.509476] pci 0000:01:00.0:   bridge window [mem 0x1b80000000-0x1b801fffff]
[    0.509480] pci 0000:01:00.0:   bridge window [mem 0x1800000000-0x1817ffffff 64bit pref]
[    0.509486] pci 0000:00:00.0: PCI bridge to [bus 01-03]
[    0.509489] pci 0000:00:00.0:   bridge window [mem 0x1b80000000-0x1b802fffff]
[    0.509491] pci 0000:00:00.0:   bridge window [mem 0x1800000000-0x1817ffffff 64bit pref]
[    0.509496] pci 0000:00:00.0: Max Payload Size set to  256/ 512 (was  128), Max Read Rq  512
[    0.509505] pci 0000:01:00.0: Max Payload Size set to  256/ 512 (was  128), Max Read Rq  512
[    0.509513] pci 0000:02:00.0: Max Payload Size set to  256/ 512 (was  128), Max Read Rq  512
[    0.509521] pci 0000:03:00.0: Max Payload Size set to  256/ 256 (was  128), Max Read Rq  512
[    0.509529] pci 0000:03:00.1: Max Payload Size set to  256/ 256 (was  128), Max Read Rq  512
[    0.509594] pcieport 0000:00:00.0: enabling device (0000 -> 0002)
[    0.509630] pcieport 0000:00:00.0: PME: Signaling with IRQ 38
[    0.509705] pcieport 0000:00:00.0: AER: enabled with IRQ 38
[    0.509782] pcieport 0000:01:00.0: enabling device (0000 -> 0002)
[    0.509884] pcieport 0000:02:00.0: enabling device (0000 -> 0002)
[    0.510067] pci 0000:03:00.1: D0 power state depends on 0000:03:00.0
[    0.510186] brcm-pcie 1000120000.pcie: host bridge /axi/pcie@120000 ranges:
[    0.510191] brcm-pcie 1000120000.pcie:   No bus range found for /axi/pcie@120000, using [bus 00-ff]
[    0.510200] brcm-pcie 1000120000.pcie:      MEM 0x1f00000000..0x1ffffffffb -> 0x0000000000
[    0.510206] brcm-pcie 1000120000.pcie:      MEM 0x1c00000000..0x1effffffff -> 0x0400000000
[    0.510213] brcm-pcie 1000120000.pcie:   IB MEM 0x1f00000000..0x1f003fffff -> 0x0000000000
[    0.510219] brcm-pcie 1000120000.pcie:   IB MEM 0x0000000000..0x0fffffffff -> 0x1000000000
[    0.511293] brcm-pcie 1000120000.pcie: Forcing gen 2
[    0.511327] brcm-pcie 1000120000.pcie: PCI host bridge to bus 0001:00
[    0.511329] pci_bus 0001:00: root bus resource [bus 00-ff]
[    0.511333] pci_bus 0001:00: root bus resource [mem 0x1f00000000-0x1ffffffffb] (bus address [0x00000000-0xfffffffb])
[    0.511336] pci_bus 0001:00: root bus resource [mem 0x1c00000000-0x1effffffff pref] (bus address [0x400000000-0x6ffffffff])
[    0.511345] pci 0001:00:00.0: [14e4:2712] type 01 class 0x060400
[    0.511366] pci 0001:00:00.0: PME# supported from D0 D3hot
[    0.512236] pci 0001:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    0.619547] brcm-pcie 1000120000.pcie: link up, 5.0 GT/s PCIe x4 (!SSC)
[    0.619567] pci 0001:01:00.0: [1de4:0001] type 00 class 0x020000
[    0.619581] pci 0001:01:00.0: reg 0x10: [mem 0xffffc000-0xffffffff]
[    0.619589] pci 0001:01:00.0: reg 0x14: [mem 0xffc00000-0xffffffff]
[    0.619596] pci 0001:01:00.0: reg 0x18: [mem 0xffff0000-0xffffffff]
[    0.619665] pci 0001:01:00.0: supports D1
[    0.619668] pci 0001:01:00.0: PME# supported from D0 D1 D3hot D3cold
[    0.631551] pci_bus 0001:01: busn_res: [bus 01-ff] end is updated to 01
[    0.631558] pci 0001:00:00.0: BAR 8: assigned [mem 0x1f00000000-0x1f005fffff]
[    0.631562] pci 0001:01:00.0: BAR 1: assigned [mem 0x1f00000000-0x1f003fffff]
[    0.631567] pci 0001:01:00.0: BAR 2: assigned [mem 0x1f00400000-0x1f0040ffff]
[    0.631572] pci 0001:01:00.0: BAR 0: assigned [mem 0x1f00410000-0x1f00413fff]
[    0.631577] pci 0001:00:00.0: PCI bridge to [bus 01]
[    0.631580] pci 0001:00:00.0:   bridge window [mem 0x1f00000000-0x1f005fffff]
[    0.631584] pci 0001:00:00.0: Max Payload Size set to  256/ 512 (was  128), Max Read Rq  512
[    0.631593] pci 0001:01:00.0: Max Payload Size set to  256/ 256 (was  128), Max Read Rq  512
[    0.631646] pcieport 0001:00:00.0: enabling device (0000 -> 0002)
[    0.631673] pcieport 0001:00:00.0: PME: Signaling with IRQ 40
[    0.631736] pcieport 0001:00:00.0: AER: enabled with IRQ 40
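
For anyone wondering where the "7.876 Gb/s available" and "capable of 252.048 Gb/s" figures come from: they can be reproduced by assuming 128b/130b line encoding (used by PCIe Gen 3 and later) and integer Mb/s per lane. This is a back-of-the-envelope sketch of that arithmetic, not the kernel's code; pcie_mbps is a hypothetical helper name, and it does not cover Gen 1/2 links, which use 8b/10b encoding.

```shell
# Hypothetical helper: usable bandwidth in Mb/s for a PCIe link that uses
# 128b/130b encoding. Arguments: transfer rate in GT/s, lane count.
# Integer division is applied per lane before multiplying by the lane count.
pcie_mbps() {
  gts=$1
  lanes=$2
  echo $(( gts * 1000 * 128 / 130 * lanes ))
}

# Usage:
#   pcie_mbps 8 1     # the Pi 5 link in this log: 8.0 GT/s, x1
#   pcie_mbps 16 16   # what the card is capable of: 16.0 GT/s, x16
```

The first call gives 7876 Mb/s and the second 252048 Mb/s, matching the two numbers in the log. In other words, the card is running with roughly 1/32 of the bandwidth it was designed for.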

@geerlingguy
Owner Author

Using the same kernel I was testing with the RX 6700 XT (using @Coreforge's patch, but from a week or so ago), I am getting:

[    6.815843] [drm] amdgpu kernel modesetting enabled.
[    6.816026] amdgpu 0000:03:00.0: enabling device (0000 -> 0002)
[    6.816034] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x7470 0x1002:0x0E0D 0x00).
[    6.816058] [drm] register mmio base: 0x80000000
[    6.816059] [drm] register mmio size: 1048576
[    6.820507] [drm] add ip block number 0 <soc21_common>
[    6.820512] [drm] add ip block number 1 <gmc_v11_0>
[    6.820513] [drm] add ip block number 2 <ih_v6_0>
[    6.820515] [drm] add ip block number 3 <psp>
[    6.820517] [drm] add ip block number 4 <smu>
[    6.820518] [drm] add ip block number 5 <dm>
[    6.820520] [drm] add ip block number 6 <gfx_v11_0>
[    6.820521] [drm] add ip block number 7 <sdma_v6_0>
[    6.820523] [drm] add ip block number 8 <vcn_v4_0>
[    6.820524] [drm] add ip block number 9 <jpeg_v4_0>
[    6.820526] [drm] add ip block number 10 <mes_v11_0>
[    6.836689] [drm] BIOS signature incorrect ff ff
[    6.875172] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from ROM BAR
[    6.875200] amdgpu: ATOM BIOS: 113-D7170100-100
[    6.884251] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/psp_13_0_10_sos.bin failed with error -2
[    6.884272] [drm:amdgpu_device_init [amdgpu]] *ERROR* early_init of IP block <psp> failed -19
[    6.887234] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/smu_13_0_10.bin failed with error -2
[    6.887248] [drm:amdgpu_device_init [amdgpu]] *ERROR* early_init of IP block <smu> failed -19
[    6.895607] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/gc_11_0_3_pfp.bin failed with error -2
[    6.895619] [drm:amdgpu_device_init [amdgpu]] *ERROR* early_init of IP block <gfx_v11_0> failed -19
[    6.896119] [drm] VCN(0) encode/decode are enabled in VM mode
[    6.896120] [drm] VCN(1) encode/decode are enabled in VM mode
[    6.899222] amdgpu 0000:03:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode
[    6.902434] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/gc_11_0_3_mes_2.bin failed with error -2
[    6.902444] [drm] try to fall back to amdgpu/gc_11_0_3_mes.bin
[    6.902471] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/gc_11_0_3_mes.bin failed with error -2
[    6.902475] [drm:amdgpu_device_init [amdgpu]] *ERROR* early_init of IP block <mes_v11_0> failed -19
[    6.902872] amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
[    6.902874] amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.

@Coreforge

Same thing as the 7700xt then. It'll also need the fixes transferred to gfx11 and the other blocks this card has that are a different version from the 6000 series. That should be fairly simple to do though.

@geerlingguy
Owner Author

All right, I've manually downloaded the firmware files (same as @martinx72 in this comment):

cd /usr/lib/firmware/amdgpu

sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/psp_13_0_10_sos.bin && \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/smu_13_0_10.bin && \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_pfp.bin && \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_mes_2.bin && \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_mes1.bin && \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/psp_13_0_10_ta.bin && \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_me.bin && \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_rlc.bin && \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_mec.bin && \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_imu.bin
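
For a longer list, the same downloads can be planned with a small loop instead of one line per file. This is only a sketch: `plan_firmware_downloads` and `FW_BASE_URL` are hypothetical names, the base URL is the one used in the commands above, and the function prints the `wget` commands rather than running them so the list can be checked first.

```shell
FW_BASE_URL="https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain"

# Print a wget command for every listed firmware file that is not already
# present under the firmware directory given as $1. Nothing is downloaded.
# File names are assumed to include their subdirectory, as in the kernel
# log (for example amdgpu/psp_13_0_10_sos.bin).
plan_firmware_downloads() {
    fw_dir=$1
    shift
    for fw in "$@"; do
        # Skip files that are already installed.
        [ -e "$fw_dir/$fw" ] && continue
        printf 'sudo wget -P %s %s/%s\n' "$fw_dir/${fw%/*}" "$FW_BASE_URL" "$fw"
    done
}

# Usage (prints one command per missing file):
#   plan_firmware_downloads /usr/lib/firmware amdgpu/psp_13_0_10_sos.bin amdgpu/smu_13_0_10.bin
```

Piping the output to `sh` would run the downloads once the list looks right.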

After a reboot:

[    6.663256] [drm] amdgpu kernel modesetting enabled.
[    6.663965] amdgpu 0000:03:00.0: enabling device (0000 -> 0002)
[    6.663978] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x7470 0x1002:0x0E0D 0x00).
[    6.664014] [drm] register mmio base: 0x80000000
[    6.664016] [drm] register mmio size: 1048576
[    6.715988] [drm] add ip block number 0 <soc21_common>
[    6.715998] [drm] add ip block number 1 <gmc_v11_0>
[    6.716001] [drm] add ip block number 2 <ih_v6_0>
[    6.716004] [drm] add ip block number 3 <psp>
[    6.716006] [drm] add ip block number 4 <smu>
[    6.716009] [drm] add ip block number 5 <dm>
[    6.716012] [drm] add ip block number 6 <gfx_v11_0>
[    6.716015] [drm] add ip block number 7 <sdma_v6_0>
[    6.716017] [drm] add ip block number 8 <vcn_v4_0>
[    6.716019] [drm] add ip block number 9 <jpeg_v4_0>
[    6.716021] [drm] add ip block number 10 <mes_v11_0>
[    6.760472] [drm] BIOS signature incorrect ff ff
[    6.772597] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from ROM BAR
[    6.772611] amdgpu: ATOM BIOS: 113-D7170100-100
[    6.799503] amdgpu 0000:03:00.0: amdgpu: CP RS64 enable
[    6.812814] [drm] VCN(0) encode/decode are enabled in VM mode
[    6.812822] [drm] VCN(1) encode/decode are enabled in VM mode
[    6.823368] amdgpu 0000:03:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode
[    6.841245] amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    6.841266] amdgpu 0000:03:00.0: amdgpu: PCIE atomic ops is not supported
[    6.841275] [drm] GPU posting now...
[    6.841327] amdgpu 0000:03:00.0: amdgpu: MEM ECC is active.
[    6.841330] amdgpu 0000:03:00.0: amdgpu: SRAM ECC is not presented.
[    6.841368] amdgpu 0000:03:00.0: amdgpu: DF poison setting is inconsistent(1:0:0:0)!
[    6.841372] amdgpu 0000:03:00.0: amdgpu: Poison setting is inconsistent in DF/UMC(0:1)!
[    6.841386] amdgpu 0000:03:00.0: amdgpu: RAS INFO: ras initialized successfully, hardware ability[101] ras_mask[101]
[    6.841402] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[    6.841432] amdgpu 0000:03:00.0: BAR 2: releasing [mem 0x1810000000-0x18101fffff 64bit pref]
[    6.841438] amdgpu 0000:03:00.0: BAR 0: releasing [mem 0x1800000000-0x180fffffff 64bit pref]
[    6.841464] pcieport 0000:02:00.0: BAR 9: releasing [mem 0x1800000000-0x1817ffffff 64bit pref]
[    6.841468] pcieport 0000:01:00.0: BAR 9: releasing [mem 0x1800000000-0x1817ffffff 64bit pref]
[    6.841472] pcieport 0000:00:00.0: BAR 9: releasing [mem 0x1800000000-0x1817ffffff 64bit pref]
[    6.841621] pcieport 0000:00:00.0: BAR 9: no space for [mem size 0x600000000 64bit pref]
[    6.841625] pcieport 0000:00:00.0: BAR 9: failed to assign [mem size 0x600000000 64bit pref]
[    6.841629] pcieport 0000:01:00.0: BAR 9: no space for [mem size 0x600000000 64bit pref]
[    6.841632] pcieport 0000:01:00.0: BAR 9: failed to assign [mem size 0x600000000 64bit pref]
[    6.841636] pcieport 0000:02:00.0: BAR 9: no space for [mem size 0x600000000 64bit pref]
[    6.841638] pcieport 0000:02:00.0: BAR 9: failed to assign [mem size 0x600000000 64bit pref]
[    6.841642] amdgpu 0000:03:00.0: BAR 0: no space for [mem size 0x400000000 64bit pref]
[    6.841644] amdgpu 0000:03:00.0: BAR 0: failed to assign [mem size 0x400000000 64bit pref]
[    6.841648] amdgpu 0000:03:00.0: BAR 2: no space for [mem size 0x00200000 64bit pref]
[    6.841651] amdgpu 0000:03:00.0: BAR 2: failed to assign [mem size 0x00200000 64bit pref]
[    6.841654] pcieport 0000:00:00.0: PCI bridge to [bus 01-03]
[    6.841659] pcieport 0000:00:00.0:   bridge window [mem 0x1b80000000-0x1b802fffff]
[    6.841664] pcieport 0000:00:00.0: PCI bridge to [bus 01-03]
[    6.841666] pcieport 0000:00:00.0:   bridge window [mem 0x1b80000000-0x1b802fffff]
[    6.841669] pcieport 0000:00:00.0:   bridge window [mem 0x1800000000-0x1817ffffff 64bit pref]
[    6.841673] pcieport 0000:01:00.0: PCI bridge to [bus 02-03]
[    6.841677] pcieport 0000:01:00.0:   bridge window [mem 0x1b80000000-0x1b801fffff]
[    6.841681] pcieport 0000:01:00.0:   bridge window [mem 0x1800000000-0x1817ffffff 64bit pref]
[    6.841688] pcieport 0000:02:00.0: PCI bridge to [bus 03]
[    6.841693] pcieport 0000:02:00.0:   bridge window [mem 0x1b80000000-0x1b801fffff]
[    6.841697] pcieport 0000:02:00.0:   bridge window [mem 0x1800000000-0x1817ffffff 64bit pref]
[    6.841711] [drm] Not enough PCI address space for a large BAR.
[    6.841714] amdgpu 0000:03:00.0: BAR 0: assigned [mem 0x1800000000-0x180fffffff 64bit pref]
[    6.841725] amdgpu 0000:03:00.0: BAR 2: assigned [mem 0x1810000000-0x18101fffff 64bit pref]
[    6.841738] amdgpu 0000:03:00.0: amdgpu: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
[    6.841742] amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[    6.841745] amdgpu 0000:03:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[    6.841752] [drm] Detected VRAM RAM=16368M, BAR=256M
[    6.841755] [drm] RAM width 256bits GDDR6
[    6.842878] [drm] amdgpu: 16368M of VRAM memory ready
[    6.842885] [drm] amdgpu: 3972M of GTT memory ready.
[    6.842920] [drm] GART: num cpu pages 131072, num gpu pages 131072
[    6.843028] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
[    6.860183] [drm] Loading DMUB firmware via PSP: version=0x07001900
[    6.861648] Unable to handle kernel paging request at virtual address ffffffc08093f000
[    6.872303] Mem abort info:
[    6.875111]   ESR = 0x0000000096000061
[    6.882140]   EC = 0x25: DABT (current EL), IL = 32 bits
[    6.890675]   SET = 0, FnV = 0
[    6.898507]   EA = 0, S1PTW = 0
[    6.904184]   FSC = 0x21: alignment fault
[    6.911308] Data abort info:
[    6.913331] brcmfmac: brcmf_cfg80211_set_power_mgmt: power save enabled
[    6.915035]   ISV = 0, ISS = 0x00000061, ISS2 = 0x00000000
[    6.922661]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[    6.928406]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    6.934635] swapper pgtable: 4k pages, 39-bit VAs, pgdp=000000000137e000
[    6.941680] [ffffffc08093f000] pgd=1000000100165003, p4d=1000000100165003, pud=1000000100165003, pmd=10000001014e9003, pte=006800011752cf0f
[    6.954737] Internal error: Oops: 0000000096000061 [#1] PREEMPT SMP
[    6.961032] Modules linked in: spidev amdgpu(+) aes_ce_blk brcmfmac_wcc aes_ce_cipher ghash_ce gf128mul hci_uart sha2_ce sha256_arm64 sha1_ce btbcm bluetooth ecdh_generic ecc libaes brcmfmac raspberrypi_hwmon brcmutil vc4 cfg80211 i2c_brcmstb snd_soc_hdmi_codec spi_bcm2835 cec amdxcp rfkill drm_dma_helper drm_exec rpivid_hevc(C) i2c_algo_bit pisp_be snd_soc_core drm_buddy hid_apple gpio_keys v4l2_mem2mem videobuf2_dma_contig snd_compress videobuf2_memops snd_pcm_dmaengine videobuf2_v4l2 drm_suballoc_helper drm_display_helper snd_usb_audio videodev snd_hwdep pwm_fan videobuf2_common v3d snd_usbmidi_lib gpu_sched snd_rawmidi drm_shmem_helper snd_seq_device drm_ttm_helper mc binfmt_misc snd_pcm ttm snd_timer snd drm_kms_helper raspberrypi_gpiomem rp1_adc nvmem_rmem joydev sg hid_multitouch uio_pdrv_genirq uio drm cuse i2c_dev fuse dm_mod drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
[    7.040926] CPU: 3 PID: 342 Comm: (udev-worker) Tainted: G         C         6.6.58-v8-16k+ #13
[    7.049663] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[    7.055515] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    7.062505] pc : __memset+0x16c/0x188
[    7.066182] lr : gfx_v11_0_sw_init+0x3e4/0x8e0 [amdgpu]
[    7.071851] sp : ffffffc0809c3680
[    7.075171] x29: ffffffc0809c3680 x28: ffffff8116e00000 x27: 0000000000000006
[    7.082337] x26: ffffff8116e27910 x25: 0000000000000001 x24: ffffff8116e44000
[    7.089502] x23: ffffff8116e16298 x22: ffffff8116e44000 x21: 0000000000004000
[    7.096667] x20: ffffff8116e10000 x19: ffffff8116e14000 x18: 0000000000000000
[    7.103832] x17: ffffffb179c55000 x16: ffffffd084cee680 x15: 0000000000000100
[    7.110996] x14: 0e61ca1e19683ac2 x13: 0000000000000000 x12: 0000000000102fff
[    7.118161] x11: 0000000000000000 x10: ffffff81fef855e0 x9 : 0000000000000000
[    7.125326] x8 : ffffffc08093f000 x7 : 0000000000000000 x6 : 000000000000003f
[    7.132490] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000000004
[    7.139654] x2 : 0000000000003fc0 x1 : 0000000000000000 x0 : ffffffc08093f000
[    7.146820] Call trace:
[    7.149269]  __memset+0x16c/0x188
[    7.152593]  amdgpu_device_init+0x106c/0x2208 [amdgpu]
[    7.158111]  amdgpu_driver_load_kms+0x20/0x1a8 [amdgpu]
[    7.163712]  amdgpu_pci_probe+0x154/0x420 [amdgpu]
[    7.168874]  local_pci_probe+0x48/0xb8
[    7.172635]  pci_device_probe+0xac/0x1c8
[    7.176568]  really_probe+0x150/0x2c0
[    7.180241]  __driver_probe_device+0x80/0x140
[    7.184611]  driver_probe_device+0x44/0x170
[    7.188805]  __driver_attach+0x9c/0x1b0
[    7.192651]  bus_for_each_dev+0x80/0xe8
[    7.196499]  driver_attach+0x2c/0x40
[    7.200083]  bus_add_driver+0xec/0x218
[    7.203841]  driver_register+0x68/0x138
[    7.207688]  __pci_register_driver+0x54/0x68
[    7.211970]  amdgpu_init+0x6c/0xff8 [amdgpu]
[    7.216610]  do_one_initcall+0x60/0x2c0
[    7.220457]  do_init_module+0x60/0x218
[    7.224218]  load_module+0x1dd0/0x2080
[    7.227977]  __do_sys_init_module+0x19c/0x1e0
[    7.232346]  __arm64_sys_init_module+0x24/0x38
[    7.236803]  invoke_syscall+0x50/0x128
[    7.240564]  el0_svc_common.constprop.0+0xc8/0xf0
[    7.245285]  do_el0_svc+0x24/0x38
[    7.248609]  el0_svc+0x40/0xe8
[    7.251670]  el0t_64_sync_handler+0x100/0x130
[    7.256039]  el0t_64_sync+0x190/0x198
[    7.259712] Code: 91010108 54ffff4a 8b040108 cb050042 (d50b7428) 
[    7.265829] ---[ end trace 0000000000000000 ]---

Let the memset detections commence!
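
The `Mem abort info` lines in that oops are just the ESR value split into fields, so they can be re-derived by hand when only the raw number is available. A minimal sketch, assuming the Arm ESR_ELx data-abort layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and within the ISS, WnR in bit 6 and the fault status code in bits 5:0); `esr_decode` is a hypothetical helper name:

```shell
# Split an ESR_ELx value for a data abort into the fields the kernel prints.
esr_decode() {
    esr=$(( $1 ))
    printf 'EC=0x%02x IL=%d ISS=0x%08x WnR=%d FSC=0x%02x\n' \
        $(( ($esr >> 26) & 0x3f )) \
        $(( ($esr >> 25) & 1 )) \
        $(( $esr & 0x1ffffff )) \
        $(( ($esr >> 6) & 1 )) \
        $(( $esr & 0x3f ))
}

# The value from the oops above:
esr_decode 0x0000000096000061
# EC=0x25 IL=1 ISS=0x00000061 WnR=1 FSC=0x21
```

That matches the log: EC 0x25 is a data abort from the current EL, FSC 0x21 is the alignment fault, and WnR=1 marks it as a write, which fits a fault inside `__memset`. The faulting instruction word on the `Code:` line, `d50b7428`, appears to decode as `dc zva, x8`, and x8 holds the faulting address; that instruction is documented as able to raise an alignment fault when used on Device memory.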

@geerlingguy
Owner Author

@Coreforge - what's the simplest way to get a debug loop going for these faults? I don't see the file that's hit in the kernel panic. It would help a lot to get a debugger going, but last time I ran into issues trying to get it set up.

@geerlingguy changed the title from "Test AMD Radeon Pro W7700 GPU" to "Test AMD Radeon Pro W7700 & RX 7700 XT GPUs" on Nov 5, 2024
@Coreforge

The easiest way I've found is to look up the relevant functions using something like elixir or cscope. Usually you'll want to look up the function in the link register. From there, it's either fairly obvious, or you'll have to look at other code to figure out which calls are causing issues (placing printks or making some educated guesses also works).
Another issue that pops up is accesses to structs. With those, the symbol name shown in the link register generally has .constprop in it. Changing those structs to be volatile has worked well so far, but there could be cases where that doesn't work.

Since the code for all of these cards is quite similar, it's probably also enough to just transfer the changes from gfx10 to gfx11. I can do that tomorrow if I remember. The general process should also work the same for other cards/drivers, but only if they cause faults (which it didn't look like nouveau does).

I haven't gotten addr2line working so far on kernel modules, but that might be another option.
On the Pi 5 I haven't needed it, but on the Pi 4, debugging with gdb over JTAG was also useful, as the Pi 4 didn't produce fault messages reliably.
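
To make the "look up the function in the link register" step concrete, here is a small sketch that pulls the `pc` and `lr` symbol names out of a saved oops so they can be searched in elixir or cscope. `oops_symbols` is a hypothetical helper name, and the sample lines are copied from the oops earlier in this thread:

```shell
# Print the symbol names from the "pc :" and "lr :" lines of an oops on stdin.
oops_symbols() {
    sed -n \
        -e 's/.* pc : \([A-Za-z0-9_.]*\)+0x.*/pc \1/p' \
        -e 's/.* lr : \([A-Za-z0-9_.]*\)+0x.*/lr \1/p'
}

oops_symbols <<'EOF'
[    7.062505] pc : __memset+0x16c/0x188
[    7.066182] lr : gfx_v11_0_sw_init+0x3e4/0x8e0 [amdgpu]
[    7.071851] sp : ffffffc0809c3680
EOF
# pc __memset
# lr gfx_v11_0_sw_init
```

In that oops the call trace only lists `__memset` and `amdgpu_device_init`, so the `lr` line is the only place that names `gfx_v11_0_sw_init` as the caller.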

@DanaGoyette

To debug the kernel, I'll usually use ./scripts/decode_stacktrace.sh in the kernel source tree. You supply the kernel image as a positional argument and the stacktrace text on stdin. You can even pipe dmesg directly into the script.
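
As a sketch of that workflow, assuming the kernel was built with debug info so that a `vmlinux` with symbols sits at the top of the build tree. `decode_oops` is a hypothetical wrapper name, and the `auto` base path and the modules-path argument are assumptions based on the script's usage text, so verify them against the script in your tree:

```shell
# Decode a saved oops with the kernel's own scripts/decode_stacktrace.sh.
#   $1: kernel source/build tree (must contain vmlinux and scripts/)
#   $2: text file holding the oops (for example, saved dmesg output)
decode_oops() {
    tree=$1
    oops=$2
    if [ ! -x "$tree/scripts/decode_stacktrace.sh" ]; then
        echo "decode_stacktrace.sh not found under $tree/scripts" >&2
        return 1
    fi
    if [ ! -f "$tree/vmlinux" ]; then
        echo "no vmlinux in $tree (build the kernel with debug info first)" >&2
        return 1
    fi
    # Assumed arguments: vmlinux, base path ("auto"), and a directory to
    # search for module .ko files such as amdgpu.ko.
    "$tree/scripts/decode_stacktrace.sh" "$tree/vmlinux" auto "$tree" < "$oops"
}

# Usage:
#   dmesg > oops.txt
#   decode_oops ~/linux oops.txt
```

Piping `dmesg` straight into the script works as well; saving the text first just makes it easier to re-run after rebuilding.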

@geerlingguy
Owner Author

@DanaGoyette - Oh cool! TIL, going to give that a go. But probably tomorrow now since it's the end of the day and I'm just seeing your message lol. @Coreforge thanks for the notes. If you get to it tomorrow great, otherwise there's a chance I can get to it later or next week (I have a video going up in the morning tomorrow—unrelated to this, and some other errands to run!).

@Coreforge

I've put the patch up in a gist.
I'm not sure if I got everything, so there might be a few places I missed.

@geerlingguy
Owner Author

geerlingguy commented Nov 8, 2024

@Coreforge - Patch applied cleanly, recompiled and tested.

Got some different faults (a bunch); one below then the rest in this gist: https://gist.github.com/geerlingguy/05c34678d2802af271635da3b794a8b3

[    5.864254] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/sdma_6_0_3.bin failed with error -2
[    5.864266] [drm:sdma_v6_0_sw_init [amdgpu]] *ERROR* Failed to load sdma firmware!
[    5.864786] [drm:amdgpu_device_init [amdgpu]] *ERROR* sw_init of IP block <sdma_v6_0> failed -19
[    5.865178] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_init failed
[    5.865182] amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
[    5.865185] amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
[    5.865339] ------------[ cut here ]------------
[    5.865341] WARNING: CPU: 0 PID: 327 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:623 amdgpu_irq_put+0xa8/0xc8 [amdgpu]
[    5.865715] Modules linked in: aes_ce_blk aes_ce_cipher ghash_ce gf128mul binfmt_misc brcmfmac_wcc hci_uart btbcm bluetooth amdgpu(+) ecdh_generic ecc sha2_ce sha256_arm64 sha1_ce brcmfmac libaes brcmutil raspberrypi_hwmon cfg80211 vc4 i2c_brcmstb spi_bcm2835 snd_soc_hdmi_codec cec drm_dma_helper gpio_keys rpivid_hevc(C) snd_soc_core pwm_fan amdxcp drm_exec snd_compress snd_pcm_dmaengine pisp_be v4l2_mem2mem i2c_algo_bit drm_buddy drm_suballoc_helper snd_pcm drm_display_helper rfkill videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videodev joydev v3d snd_timer snd drm_ttm_helper hid_apple gpu_sched ttm sg drm_shmem_helper videobuf2_common mc drm_kms_helper raspberrypi_gpiomem rp1_adc hid_multitouch nvmem_rmem uio_pdrv_genirq uio drm cuse i2c_dev fuse dm_mod drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
[    5.865808] CPU: 0 PID: 327 Comm: (udev-worker) Tainted: G         C         6.6.58-v8-16k+ #13
[    5.865813] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[    5.865815] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    5.865819] pc : amdgpu_irq_put+0xa8/0xc8 [amdgpu]
[    5.866183] lr : amdgpu_fence_driver_hw_fini+0x124/0x168 [amdgpu]
[    5.866543] sp : ffffffc08093b760
[    5.866545] x29: ffffffc08093b760 x28: ffffffd0417d17b8 x27: 0000000000000001
[    5.866551] x26: ffffff8103400010 x25: ffffff8103400000 x24: ffffff8103400000
[    5.866556] x23: ffffff8103427910 x22: ffffff8103400010 x21: ffffff81034104c8
[    5.866561] x20: ffffff81034100e8 x19: ffffff81034185a0 x18: 00000000fffffffe
[    5.866565] x17: 733c206b636f6c62 x16: ffffffd08412e698 x15: 696e695f7773202a
[    5.866570] x14: 0000000000000001 x13: 2e65636976656420 x12: 676e696873696e69
[    5.866575] x11: 66203a757067646d x10: ffffffd0856c3b40 x9 : ffffffd0412ec224
[    5.866580] x8 : ffffffc08093b788 x7 : 0000000000000000 x6 : 80000000fffff000
[    5.866585] x5 : 0000000000000000 x4 : ffffff8103400000 x3 : 0000000000000000
[    5.866589] x2 : 0000000000000000 x1 : ffffff8103427910 x0 : ffffff8101c27b40
[    5.866594] Call trace:
[    5.866596]  amdgpu_irq_put+0xa8/0xc8 [amdgpu]
[    5.866957]  amdgpu_device_fini_hw+0xb4/0x388 [amdgpu]
[    5.867316]  amdgpu_driver_load_kms+0x11c/0x1a8 [amdgpu]
[    5.867682]  amdgpu_pci_probe+0x154/0x420 [amdgpu]
[    5.868044]  local_pci_probe+0x48/0xb8
[    5.868051]  pci_device_probe+0xac/0x1c8
[    5.868054]  really_probe+0x150/0x2c0
[    5.868059]  __driver_probe_device+0x80/0x140
[    5.868062]  driver_probe_device+0x44/0x170
[    5.868065]  __driver_attach+0x9c/0x1b0
[    5.868068]  bus_for_each_dev+0x80/0xe8
[    5.868074]  driver_attach+0x2c/0x40
[    5.868077]  bus_add_driver+0xec/0x218
[    5.868080]  driver_register+0x68/0x138
[    5.868084]  __pci_register_driver+0x54/0x68
[    5.868087]  amdgpu_init+0x6c/0xff8 [amdgpu]
[    5.868447]  do_one_initcall+0x60/0x2c0
[    5.868451]  do_init_module+0x60/0x218
[    5.868457]  load_module+0x1dd0/0x2080
[    5.868460]  __do_sys_init_module+0x19c/0x1e0
[    5.868462]  __arm64_sys_init_module+0x24/0x38
[    5.868465]  invoke_syscall+0x50/0x128
[    5.868471]  el0_svc_common.constprop.0+0xc8/0xf0
[    5.868476]  do_el0_svc+0x24/0x38
[    5.868481]  el0_svc+0x40/0xe8
[    5.868485]  el0t_64_sync_handler+0x100/0x130
[    5.868488]  el0t_64_sync+0x190/0x198
[    5.868491] ---[ end trace 0000000000000000 ]---
...
[see the rest in the gist linked above]

@Coreforge

Looks like you're still missing the sdma firmware.
Those traces are just from warnings, not faults, by the looks of it. It seems to not be shutting down entirely cleanly, but I don't think that should be too relevant, as it doesn't happen during normal operation.
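
One way to tell the two apart in a long log, sketched with awk: warning splats are introduced by a `WARNING: CPU:` line, while the fatal fault earlier in this thread was reported as `Internal error: Oops`. `count_splats` is a hypothetical helper name:

```shell
# Count kernel WARNING splats versus Oops reports in dmesg-style text on stdin.
count_splats() {
    awk '
        /WARNING: CPU:/        { warn++ }
        /Internal error: Oops/ { oops++ }
        END { printf "warnings=%d oopses=%d\n", warn, oops }
    '
}

# Usage:
#   dmesg | count_splats
```

A log with warnings but zero oopses, like the one above, means the driver bailed out and complained during teardown rather than crashing.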

@martinx72

martinx72 commented Nov 9, 2024

> Looks like you're still missing the sdma firmware. Those traces are just from warnings, not faults, by the looks of it. It seems to not be shutting down entirely cleanly, but I don't think that should be too relevant, as it doesn't happen during normal operation.

With my RX 7700:

pi@raspberrypi:~ $ lspci
0000:00:00.0 PCI bridge: Broadcom Inc. and subsidiaries BCM2712 PCIe Bridge (rev 21)
0000:01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev 11)
0000:02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch (rev 11)
0000:03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 32 [Radeon RX 7700 XT / 7800 XT] (rev ff)
0000:03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio
0001:00:00.0 PCI bridge: Broadcom Inc. and subsidiaries BCM2712 PCIe Bridge (rev 21)
0001:01:00.0 Ethernet controller: Raspberry Pi Ltd RP1 PCIe 2.0 South Bridge
0000:01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev 11) (prog-if 00 [Normal decode])
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 38
        Region 0: Memory at 1b80200000 (32-bit, non-prefetchable) [size=16K]
        Bus: primary=01, secondary=02, subordinate=03, sec-latency=0
        I/O behind bridge: [disabled] [32-bit]
        Memory behind bridge: 80000000-801fffff [size=2M] [32-bit]
        Prefetchable memory behind bridge: 1800000000-1817ffffff [size=384M] [32-bit]
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Express (v2) Upstream Port, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ SlotPowerLimit 0W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L1, Exit Latency L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (downgraded), Width x1 (downgraded)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR+
                         10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 4
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-
                         AtomicOpsCap: Routing+ 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: EgressBlck-
                LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
                LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [270 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: LaneErr at lane: 0
        Capabilities: [320 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [400 v1] Data Link Feature <?>
        Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
        Capabilities: [440 v1] Lane Margining at the Receiver <?>
        Kernel driver in use: pcieport

0000:02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch (rev 11) (prog-if 00 [Normal decode])
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 39
        Bus: primary=02, secondary=03, subordinate=03, sec-latency=0
        I/O behind bridge: [disabled] [32-bit]
        Memory behind bridge: 80000000-801fffff [size=2M] [32-bit]
        Prefetchable memory behind bridge: 1800000000-1817ffffff [size=384M] [32-bit]
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Express (v2) Downstream Port (Slot-), MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0
                        ExtTag+ RBE+
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L1, Exit Latency L1 <1us
                        ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 16GT/s, Width x16
                        TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR+
                         10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 4
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- ARIFwd-
                         AtomicOpsCap: Routing+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled, ARIFwd-
                         AtomicOpsCtl: EgressBlck-
                LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
                LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -3.5dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 000000ffffffe000  Data: 0008
        Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
        Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [270 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Capabilities: [2a0 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [400 v1] Data Link Feature <?>
        Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
        Capabilities: [450 v1] Lane Margining at the Receiver <?>
        Kernel driver in use: pcieport

0000:03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 32 [Radeon RX 7700 XT / 7800 XT] (rev ff) (prog-if 00 [VGA controller])
        Subsystem: Sapphire Technology Limited Navi 32 [Radeon RX 7700 XT / 7800 XT]
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 189
        Region 0: Memory at 1800000000 (64-bit, prefetchable) [size=256M]
        Region 2: Memory at 1810000000 (64-bit, prefetchable) [size=2M]
        Region 5: Memory at 1b80000000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at 1b80100000 [disabled] [size=128K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [64] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L1, Exit Latency L1 <1us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 16GT/s, Width x16
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
                         10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
                         EmergencyPowerReduction Form Factor Dev Specific, EmergencyPowerReductionInit-
                         FRS-
                         AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
                LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 000000ffffffe000  Data: 0009
        Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [200 v1] Physical Resizable BAR
                BAR 0: current size: 256MB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB
                BAR 2: current size: 2MB, supported: 2MB 4MB 8MB 16MB 32MB 64MB 128MB 256MB
        Capabilities: [240 v1] Power Budgeting <?>
        Capabilities: [270 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Capabilities: [2a0 v1] Access Control Services
                ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [2d0 v1] Process Address Space ID (PASID)
                PASIDCap: Exec+ Priv+, Max PASID Width: 10
                PASIDCtl: Enable- Exec- Priv-
        Capabilities: [320 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
        Capabilities: [450 v1] Lane Margining at the Receiver <?>
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu

0000:03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin B routed to IRQ 255
        Region 0: Memory at 1b80120000 (32-bit, non-prefetchable) [disabled] [size=16K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [64] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L1, Exit Latency L1 <1us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 16GT/s, Width x16
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
                         10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
                         EmergencyPowerReduction Form Factor Dev Specific, EmergencyPowerReductionInit-
                         FRS-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
                         EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [2a0 v1] Access Control Services
                ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

Download the missing firmware via:

cd /usr/lib/firmware/amdgpu/
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/sdma_6_0_3.bin

Here is what it hits with the latest patch:

[    4.947050] [drm] VCN(1) encode/decode are enabled in VM mode
[    4.962648] amdgpu 0000:03:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode
[    4.978504] amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    4.978524] amdgpu 0000:03:00.0: amdgpu: PCIE atomic ops is not supported
[    4.978530] [drm] GPU posting now...
[    4.978564] amdgpu 0000:03:00.0: amdgpu: MEM ECC is not presented.
[    4.978567] amdgpu 0000:03:00.0: amdgpu: SRAM ECC is not presented.
[    4.978582] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[    4.978607] amdgpu 0000:03:00.0: BAR 2: releasing [mem 0x1810000000-0x18101fffff 64bit pref]
[    4.978612] amdgpu 0000:03:00.0: BAR 0: releasing [mem 0x1800000000-0x180fffffff 64bit pref]
[    4.978638] pcieport 0000:02:00.0: BAR 9: releasing [mem 0x1800000000-0x1817ffffff 64bit pref]
[    4.978642] pcieport 0000:01:00.0: BAR 9: releasing [mem 0x1800000000-0x1817ffffff 64bit pref]
[    4.978646] pcieport 0000:00:00.0: BAR 9: releasing [mem 0x1800000000-0x1817ffffff 64bit pref]
[    4.978657] pcieport 0000:00:00.0: BAR 9: no space for [mem size 0x600000000 64bit pref]
[    4.978660] pcieport 0000:00:00.0: BAR 9: failed to assign [mem size 0x600000000 64bit pref]
[    4.978664] pcieport 0000:01:00.0: BAR 9: no space for [mem size 0x600000000 64bit pref]
[    4.978667] pcieport 0000:01:00.0: BAR 9: failed to assign [mem size 0x600000000 64bit pref]
[    4.978670] pcieport 0000:02:00.0: BAR 9: no space for [mem size 0x600000000 64bit pref]
[    4.978673] pcieport 0000:02:00.0: BAR 9: failed to assign [mem size 0x600000000 64bit pref]
[    4.978676] amdgpu 0000:03:00.0: BAR 0: no space for [mem size 0x400000000 64bit pref]
[    4.978679] amdgpu 0000:03:00.0: BAR 0: failed to assign [mem size 0x400000000 64bit pref]
[    4.978682] amdgpu 0000:03:00.0: BAR 2: no space for [mem size 0x00200000 64bit pref]
[    4.978685] amdgpu 0000:03:00.0: BAR 2: failed to assign [mem size 0x00200000 64bit pref]
[    4.978688] pcieport 0000:00:00.0: PCI bridge to [bus 01-03]
[    4.978691] pcieport 0000:00:00.0:   bridge window [mem 0x1b80000000-0x1b802fffff]
[    4.978696] pcieport 0000:00:00.0: PCI bridge to [bus 01-03]
[    4.978699] pcieport 0000:00:00.0:   bridge window [mem 0x1b80000000-0x1b802fffff]
[    4.978702] pcieport 0000:00:00.0:   bridge window [mem 0x1800000000-0x1817ffffff 64bit pref]
[    4.978705] pcieport 0000:01:00.0: PCI bridge to [bus 02-03]
[    4.978710] pcieport 0000:01:00.0:   bridge window [mem 0x1b80000000-0x1b801fffff]
[    4.978713] pcieport 0000:01:00.0:   bridge window [mem 0x1800000000-0x1817ffffff 64bit pref]
[    4.978720] pcieport 0000:02:00.0: PCI bridge to [bus 03]
[    4.978725] pcieport 0000:02:00.0:   bridge window [mem 0x1b80000000-0x1b801fffff]
[    4.978728] pcieport 0000:02:00.0:   bridge window [mem 0x1800000000-0x1817ffffff 64bit pref]
[    4.978741] [drm] Not enough PCI address space for a large BAR.
[    4.978745] amdgpu 0000:03:00.0: BAR 0: assigned [mem 0x1800000000-0x180fffffff 64bit pref]
[    4.978756] amdgpu 0000:03:00.0: BAR 2: assigned [mem 0x1810000000-0x18101fffff 64bit pref]
[    4.978768] amdgpu 0000:03:00.0: amdgpu: VRAM: 12272M 0x0000008000000000 - 0x00000082FEFFFFFF (12272M used)
[    4.978772] amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[    4.978775] amdgpu 0000:03:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[    4.978782] [drm] Detected VRAM RAM=12272M, BAR=256M
[    4.978784] [drm] RAM width 192bits GDDR6
[    4.978955] [drm] amdgpu: 12272M of VRAM memory ready
[    4.978960] [drm] amdgpu: 2022M of GTT memory ready.
[    4.978989] [drm] GART: num cpu pages 32768, num gpu pages 131072
[    4.979084] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
[    5.011743] [drm] Loading DMUB firmware via PSP: version=0x07001900
[    5.038717] [drm] Found VCN firmware Version ENC: 1.11 DEC: 5 VEP: 0 Revision: 12
[    5.038749] amdgpu 0000:03:00.0: amdgpu: Will use PSP to load VCN firmware
[    5.041961] Unable to handle kernel paging request at virtual address ffffc00082618000
[    5.166094] Mem abort info:
[    5.169584]   ESR = 0x0000000096000061
[    5.175079]   EC = 0x25: DABT (current EL), IL = 32 bits
[    5.183432]   SET = 0, FnV = 0
[    5.186630]   EA = 0, S1PTW = 0
[    5.190463]   FSC = 0x21: alignment fault
[    5.195842] Data abort info:
[    5.199408]   ISV = 0, ISS = 0x00000061, ISS2 = 0x00000000
[    5.208680]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[    5.219150]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    5.226818] swapper pgtable: 16k pages, 47-bit VAs, pgdp=0000000001384000
[    5.236481] [ffffc00082618000] pgd=10000000161bc003, p4d=10000000161bc003, pud=10000000161bc003, pmd=100000001a8e8003, pte=006800001a440f0f
[    5.249124] Internal error: Oops: 0000000096000061 [#1] PREEMPT SMP
[    5.255422] Modules linked in: spidev amdgpu(+) hci_uart btbcm aes_ce_blk aes_ce_cipher bluetooth ghash_ce gf128mul joydev sha2_ce sha256_arm64 ecdh_generic sha1_ce brcmfmac_wcc ecc snd_soc_rpi_simple_soundcard libaes raspberrypi_hwmon brcmfmac i2c_algo_bit i2c_brcmstb brcmutil vc4 cfg80211 drm_exec snd_soc_hdmi_codec spi_bcm2835 drm_suballoc_helper amdxcp drm_buddy drm_ttm_helper cec drm_display_helper ttm snd_soc_pcm5102a drm_dma_helper gpio_keys v3d pisp_be rpivid_hevc(C) binfmt_misc rfkill drm_kms_helper v4l2_mem2mem videobuf2_dma_contig gpu_sched videobuf2_memops videobuf2_v4l2 pwm_fan videodev designware_i2s drm_shmem_helper videobuf2_common snd_soc_core mc snd_compress snd_pcm_dmaengine snd_pcm raspberrypi_gpiomem snd_timer snd rp1_adc hid_logitech_dj nvmem_rmem uio_pdrv_genirq uio drm uinput i2c_dev fuse dm_mod drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
[    5.333913] CPU: 0 PID: 319 Comm: (udev-worker) Tainted: G         C         6.6.51-v8-16k+ #1
[    5.342562] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[    5.348415] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    5.355404] pc : __memset+0x16c/0x188
[    5.359082] lr : mes_v11_0_sw_init+0xbc/0x370 [amdgpu]
[    5.364743] sp : ffffc000809e3740
[    5.368064] x29: ffffc000809e3740 x28: ffff80001ba8fd60 x27: ffff80001bac15d0
[    5.375230] x26: 000000000000000a x25: 0000000000000001 x24: ffff80001bac0000
[    5.382395] x23: ffffd0002154ced0 x22: ffff800016e8c000 x21: ffff80001bac1340
[    5.389560] x20: 0000000000000000 x19: ffff80001ba80000 x18: ffffffffffffffff
[    5.396725] x17: 6f69736976655220 x16: ffffd00084cd8ec0 x15: ffffc000809e3610
[    5.403889] x14: ffff80009bab3cb7 x13: ffff80001bab3cc4 x12: 204e43562064616f
[    5.411054] x11: 0000000000000220 x10: ffff8000ffd865b0 x9 : 0000000000000000
[    5.418219] x8 : ffffc00082618000 x7 : 0000000000000000 x6 : 000000000000003f
[    5.425384] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000000004
[    5.432548] x2 : 0000000000003fc0 x1 : 0000000000000000 x0 : ffffc00082618000
[    5.439714] Call trace:
[    5.442162]  __memset+0x16c/0x188
[    5.445488]  amdgpu_device_init+0x106c/0x2220 [amdgpu]
[    5.451037]  amdgpu_driver_load_kms+0x20/0x1a8 [amdgpu]
[    5.456669]  amdgpu_pci_probe+0x154/0x420 [amdgpu]
[    5.461863]  pci_device_probe+0xa0/0x148
[    5.465800]  really_probe+0x150/0x2c0
[    5.469474]  __driver_probe_device+0x80/0x140
[    5.473844]  driver_probe_device+0x44/0x170
[    5.478040]  __driver_attach+0x9c/0x1b0
[    5.481887]  bus_for_each_dev+0x80/0xe8
[    5.485733]  driver_attach+0x2c/0x40
[    5.489318]  bus_add_driver+0xec/0x218
[    5.493076]  driver_register+0x68/0x138
[    5.496924]  __pci_register_driver+0x54/0x68
[    5.501207]  amdgpu_init+0x6c/0x3ff8 [amdgpu]
[    5.505964]  do_one_initcall+0x60/0x2c0
[    5.509812]  do_init_module+0x60/0x218
[    5.513573]  load_module+0x1de0/0x2090
[    5.517333]  __do_sys_init_module+0x19c/0x1e0
[    5.521705]  __arm64_sys_init_module+0x24/0x38
[    5.526163]  invoke_syscall+0x50/0x128
[    5.529924]  el0_svc_common.constprop.0+0xc8/0xf0
[    5.534645]  do_el0_svc+0x24/0x38
[    5.537969]  el0_svc+0x40/0xe8
[    5.541030]  el0t_64_sync_handler+0x100/0x130
[    5.545399]  el0t_64_sync+0x190/0x198
[    5.549073] Code: 91010108 54ffff4a 8b040108 cb050042 (d50b7428)
[    5.555189] ---[ end trace 0000000000000000 ]---
[    5.881339] input: Logitech M315/M235 as /devices/platform/axi/1000120000.pcie/1f00300000.usb/xhci-hcd.1/usb3/3-1/3-1:1.2/0003:046D:C52B.0003/0003:046D:4009.0004/input/input20

What I noticed are these lines:

[    4.978676] amdgpu 0000:03:00.0: BAR 0: no space for [mem size 0x400000000 64bit pref]
[    4.978679] amdgpu 0000:03:00.0: BAR 0: failed to assign [mem size 0x400000000 64bit pref]
[    4.978682] amdgpu 0000:03:00.0: BAR 2: no space for [mem size 0x00200000 64bit pref]
[    4.978685] amdgpu 0000:03:00.0: BAR 2: failed to assign [mem size 0x00200000 64bit pref]

and

[    4.978741] [drm] Not enough PCI address space for a large BAR.

And the full dmesg log is attached here:
dmesg_rx7700_Nov09.txt
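
For anyone reading along, the `Mem abort info` block in the oops above can be reproduced from the raw ESR value. This is a minimal Python sketch, assuming the standard ARMv8-A ESR_ELx layout for data aborts (EC in bits 31:26, IL in bit 25, WnR in bit 6, fault status code in bits 5:0). The function name `decode_esr` is made up for illustration.

```python
def decode_esr(esr: int) -> dict:
    """Split an AArch64 ESR_ELx value into the fields printed in the oops."""
    iss = esr & 0x1FFFFFF          # bits 24:0, instruction-specific syndrome
    return {
        "EC": (esr >> 26) & 0x3F,  # exception class (0x25 = data abort, current EL)
        "IL": (esr >> 25) & 0x1,   # 1 = 32-bit instruction
        "ISS": iss,
        "WnR": (iss >> 6) & 0x1,   # 1 = the faulting access was a write
        "FSC": iss & 0x3F,         # fault status code (0x21 = alignment fault)
    }

fields = decode_esr(0x96000061)    # ESR value taken from the log above
for name, value in fields.items():
    print(f"{name} = {value:#x}")
```

The decoded fields (EC 0x25, WnR 1, FSC 0x21) agree with what the kernel printed: a write from kernel code that raised an alignment fault.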

@Coreforge

Looks like the MES wasn't being used on the 6700xt and 6600xt. I updated the gist, so it should get further now.

Those messages about not being able to get a large BAR are normal: the Pi 5 doesn't provide enough PCIe address space for one port to get a large BAR on 12GB cards (8GB cards seem to be able to get one). I don't know yet whether that is due to a hardware limitation or something that could be changed; so far I haven't been successful at getting it to work.
Other than a performance hit, it shouldn't matter much though.
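
To put numbers on that, here is a rough Python sketch using only the addresses and sizes printed in the dmesg output earlier in the thread. The helper name `window_size` is illustrative, and reading `0x400000000` as the size requested when resizing BAR 0 is an interpretation of the log, not something the log states outright.

```python
MIB = 1 << 20
GIB = 1 << 30

def window_size(start: int, end: int) -> int:
    """Size in bytes of an inclusive [start, end] address range."""
    return end - start + 1

# Prefetchable bridge window reported for pcieport 0000:00:00.0
bridge_window = window_size(0x1800000000, 0x1817FFFFFF)
# Sizes from the "no space for [mem size ...]" lines
requested_bar0 = 0x400000000
requested_bridge = 0x600000000
# BAR 0 as finally assigned
assigned_bar0 = window_size(0x1800000000, 0x180FFFFFFF)

print(f"bridge window:    {bridge_window // MIB} MiB")
print(f"requested BAR 0:  {requested_bar0 // GIB} GiB")
print(f"requested bridge: {requested_bridge // GIB} GiB")
print(f"assigned BAR 0:   {assigned_bar0 // MIB} MiB")
```

The prefetchable window assigned at that point is 384 MiB, while the resize attempt asks for 16 GiB for BAR 0 alone, so the driver ends up back at 256 MiB.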

@geerlingguy
Owner Author

Okay, grabbed the latest version of your gist, and installed the additional firmware bit:

cd /usr/lib/firmware/amdgpu/
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/sdma_6_0_3.bin

After a recompile and reboot, it gets further, and there's just one more memset triggering a fault here:

[    6.503657] [drm] Found VCN firmware Version ENC: 1.11 DEC: 5 VEP: 0 Revision: 12
[    6.503711] amdgpu 0000:03:00.0: amdgpu: Will use PSP to load VCN firmware
[    6.537244] Unable to handle kernel paging request at virtual address ffffffc081b8b000
[    6.545476] Mem abort info:
[    6.548906]   ESR = 0x0000000096000061
[    6.552701]   EC = 0x25: DABT (current EL), IL = 32 bits
[    6.558104]   SET = 0, FnV = 0
[    6.561180]   EA = 0, S1PTW = 0
[    6.564342]   FSC = 0x21: alignment fault
[    6.568372] Data abort info:
[    6.571256]   ISV = 0, ISS = 0x00000061, ISS2 = 0x00000000
[    6.576770]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[    6.581841]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    6.587178] swapper pgtable: 4k pages, 39-bit VAs, pgdp=000000000137e000
[    6.593914] [ffffffc081b8b000] pgd=1000000100165003, p4d=1000000100165003, pud=1000000100165003, pmd=10000001114dc003, pte=0068000113300f0f
[    6.606507] Internal error: Oops: 0000000096000061 [#1] PREEMPT SMP
[    6.612799] Modules linked in: spidev amdgpu(+) binfmt_misc hci_uart brcmfmac_wcc btbcm bluetooth vc4 aes_ce_blk aes_ce_cipher ghash_ce gf128mul sha2_ce sha256_arm64 sha1_ce raspberrypi_hwmon brcmfmac ecdh_generic ecc libaes brcmutil cfg80211 snd_soc_hdmi_codec rpivid_hevc(C) cec pisp_be amdxcp drm_dma_helper drm_exec snd_soc_core i2c_brcmstb v4l2_mem2mem videobuf2_dma_contig i2c_algo_bit videobuf2_memops videobuf2_v4l2 snd_compress spi_bcm2835 drm_buddy snd_pcm_dmaengine drm_suballoc_helper snd_pcm drm_display_helper videodev rfkill drm_ttm_helper gpio_keys snd_timer ttm v3d videobuf2_common mc snd gpu_sched drm_kms_helper pwm_fan hid_apple drm_shmem_helper joydev sg hid_multitouch raspberrypi_gpiomem rp1_adc nvmem_rmem uio_pdrv_genirq uio drm cuse i2c_dev dm_mod fuse drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
[    6.616218] Bluetooth: hci0: BCM: features 0x2f
[    6.686837] CPU: 1 PID: 323 Comm: (udev-worker) Tainted: G         C         6.6.58-v8-16k+ #14
[    6.686841] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[    6.686843] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    6.686846] pc : __memset+0x16c/0x188
[    6.686854] lr : psp_v13_0_bootloader_load_component+0x78/0x180 [amdgpu]
[    6.693281] Bluetooth: hci0: BCM43455 37.4MHz Raspberry Pi 3+-0190
[    6.700419] sp : ffffffc0809035f0
[    6.700421] x29: ffffffc0809035f0 x28: ffffff811050fc88 x27: 000000000000000b
[    6.700425] x26: ffffff8110541530 x25: 0000000000000001 x24: ffffff8110544000
[    6.700429] x23: ffffff8110544000 x22: 0000000000080000 x21: ffffff811053e930
[    6.700432] x20: ffffff8110500000 x19: ffffff811053e880 x18: ffffffffffffffff
[    6.700436] x17: 0000000000000000 x16: ffffffd084cee680 x15: 0000000000000100
[    6.700439] x14: ffffff8190541177 x13: 0000000010000000 x12: 000000000010bfff
[    6.700443] x11: fffffffffffff000 x10: 0000000000000fff x9 : 0000000000000000
[    6.700447] x8 : ffffffc081b8b000 x7 : 0000000000000000 x6 : 000000000000003f
[    6.700450] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000000004
[    6.700453] x2 : 00000000000fffc0 x1 : 0000000000000000 x0 : ffffffc081b8b000
[    6.700457] Call trace:
[    6.700459]  __memset+0x16c/0x188
[    6.700465]  psp_v13_0_bootloader_load_kdb+0x20/0x38 [amdgpu]
[    6.706529] Bluetooth: hci0: BCM4345C0 (003.001.025) build 0382
[    6.713381]  psp_hw_start+0x38c/0x530 [amdgpu]
[    6.717634] Bluetooth: hci0: BCM: Using default device address (43:45:c0:00:1f:ac)
[    6.723910]  psp_hw_init+0x84/0x300 [amdgpu]
[    6.810500] alsactl[914]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[    6.810886]  amdgpu_device_fw_loading+0x148/0x1a8 [amdgpu]
[    6.852472]  amdgpu_device_init+0x1ba0/0x2208 [amdgpu]
[    6.852980]  amdgpu_driver_load_kms+0x20/0x1a8 [amdgpu]
[    6.853342]  amdgpu_pci_probe+0x154/0x420 [amdgpu]
[    6.853699]  local_pci_probe+0x48/0xb8
[    6.853704]  pci_device_probe+0xac/0x1c8
[    6.853706]  really_probe+0x150/0x2c0
[    6.853710]  __driver_probe_device+0x80/0x140
[    6.853713]  driver_probe_device+0x44/0x170
[    6.853716]  __driver_attach+0x9c/0x1b0
[    6.853718]  bus_for_each_dev+0x80/0xe8
[    6.853724]  driver_attach+0x2c/0x40
[    6.853726]  bus_add_driver+0xec/0x218
[    6.853729]  driver_register+0x68/0x138
[    6.853732]  __pci_register_driver+0x54/0x68
[    6.853735]  amdgpu_init+0x6c/0xff8 [amdgpu]
[    6.854094]  do_one_initcall+0x60/0x2c0
[    6.854098]  do_init_module+0x60/0x218
[    6.854104]  load_module+0x1dd0/0x2080
[    6.854106]  __do_sys_init_module+0x19c/0x1e0
[    6.854108]  __arm64_sys_init_module+0x24/0x38
[    6.854111]  invoke_syscall+0x50/0x128
[    6.854117]  el0_svc_common.constprop.0+0xc8/0xf0
[    6.854121]  do_el0_svc+0x24/0x38
[    6.854125]  el0_svc+0x40/0xe8
[    6.854128]  el0t_64_sync_handler+0x100/0x130
[    6.854131]  el0t_64_sync+0x190/0x198
[    6.854135] Code: 91010108 54ffff4a 8b040108 cb050042 (d50b7428) 
[    6.854137] ---[ end trace 0000000000000000 ]---
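
The bracketed word on the `Code:` line is the instruction that faulted. A small Python sketch, assuming the standard AArch64 SYS instruction encoding (op1 in bits 18:16, CRn in 15:12, CRm in 11:8, op2 in 7:5, Rt in 4:0), suggests that `d50b7428` is `dc zva, x8`, the cache-zeroing instruction arm64's `memset` can use for large zero fills. The function name is hypothetical.

```python
def decode_sys_instruction(word: int):
    """Return (op1, CRn, CRm, op2, Rt) if word is an AArch64 SYS instruction, else None."""
    if (word & 0xFFF80000) != 0xD5080000:   # fixed bits of the SYS encoding
        return None
    op1 = (word >> 16) & 0x7
    crn = (word >> 12) & 0xF
    crm = (word >> 8) & 0xF
    op2 = (word >> 5) & 0x7
    rt = word & 0x1F
    return op1, crn, crm, op2, rt

DC_ZVA = (3, 7, 4, 1)   # op1, CRn, CRm, op2 for "dc zva"

word = 0xD50B7428       # from "(d50b7428)" in the oops above
op1, crn, crm, op2, rt = decode_sys_instruction(word)
is_dc_zva = (op1, crn, crm, op2) == DC_ZVA
print(f"dc zva, x{rt}" if is_dc_zva else "not dc zva")
```

Register x8 holds ffffffc081b8b000 in the dump, which is the faulting address. `dc zva` can raise an alignment fault when its target is mapped as Device memory, which would be consistent with `memset_io` getting past this point.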

@geerlingguy
Owner Author

geerlingguy commented Nov 10, 2024

I made one modification and recompiled...

diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c b/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c
index fe1995ed13be..ddf7c4b2b9e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c
@@ -215,7 +215,7 @@ static int psp_v13_0_bootloader_load_component(struct psp_context  	*psp,
 	if (ret)
 		return ret;
 
-	memset(psp->fw_pri_buf, 0, PSP_1_MEG);
+	memset_io(psp->fw_pri_buf, 0, PSP_1_MEG);
 
 	/* Copy PSP KDB binary to memory */
 	memcpy(psp->fw_pri_buf, bin_desc->start_addr, bin_desc->size_bytes);

However, when I rebooted, I think the Pi froze (it does this sometimes, it just hangs when I issue the reboot command), and I'm remote, so I won't be able to debug further until Monday. Maybe that fixed it, maybe not, ha!

Edit: caught another memset:

[    6.949097] Call trace:
[    6.951546]  __memset+0x16c/0x188
[    6.954871]  psp_hw_start+0x13c/0x530 [amdgpu]
[    6.959692]  psp_hw_init+0x84/0x300 [amdgpu]
[    6.964333]  amdgpu_device_fw_loading+0x148/0x1a8 [amdgpu]
[    6.970195]  amdgpu_device_init+0x1ba0/0x2208 [amdgpu]
[    6.975707]  amdgpu_driver_load_kms+0x20/0x1a8 [amdgpu]
[    6.981305]  amdgpu_pci_probe+0x154/0x420 [amdgpu]

Will debug later!

@Coreforge

There's a memcpy right after that memset that will also cause issues, and the same file has another memset/memcpy pair like it. I updated the gist to fix those as well.
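
One way to track down call sites like these is to scan the file for plain `memset`/`memcpy` calls that touch `psp->fw_pri_buf`. This is a rough Python sketch; the function name, the regex, and the embedded sample are illustrative only and do not reflect how the gist was actually produced.

```python
import re

# Plain memset/memcpy whose first argument is psp->fw_pri_buf
PATTERN = re.compile(r"\b(memset|memcpy)\s*\(\s*psp->fw_pri_buf\b")

def find_candidates(source: str):
    """Return (line_number, function_name) for each matching call."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = PATTERN.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits

# Sample based on the diff context quoted earlier, plus one already-fixed call
sample = """\
\tmemset(psp->fw_pri_buf, 0, PSP_1_MEG);

\t/* Copy PSP KDB binary to memory */
\tmemcpy(psp->fw_pri_buf, bin_desc->start_addr, bin_desc->size_bytes);
\tmemset_io(psp->fw_pri_buf, 0, PSP_1_MEG);
"""

for number, name in find_candidates(sample):
    print(f"line {number}: {name}() on fw_pri_buf")
```

In a kernel tree you would pass the contents of `drivers/gpu/drm/amd/amdgpu/psp_v13_0.c` instead of `sample`. Calls already converted to `memset_io` are not reported, and the kernel's I/O-safe counterpart for the copy is `memcpy_toio`.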

I've also had it hang on reboot when it couldn't initialize the GPU properly, though that hasn't been a big deal for me.

@geerlingguy
Owner Author

@Coreforge - Thanks; just applied and rebooted, and the same hang is going on. I forgot to mention, though, that at some point (who knows when, I closed out of my VPN session last night after 20 minutes or so) it did finally reboot. So I'll do the same and check in again in a few hours. 🤞

@geerlingguy
Owner Author

Woohoo!

[    6.959764] amdgpu 0000:03:00.0: amdgpu: GECC will be enabled in next boot cycle
[    6.977121] amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
[    6.977130] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[    6.977168] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x0000003d, smu fw if version = 0x00000040, smu fw program = 0, smu fw version = 0x00505100 (80.81.0)
[    6.977175] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
[    7.063908] amdgpu 0000:03:00.0: amdgpu: SMU is initialized successfully!
[    7.064437] [drm] Display Core v3.2.247 initialized on DCN 3.2
[    7.064442] [drm] DP-HDMI FRL PCON supported
[    7.067000] [drm] DMUB hardware initialized: version=0x07001900
[    7.343403] [drm] kiq ring mec 3 pipe 1 q 0
[    7.348950] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[    7.349203] amdgpu 0000:03:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[    7.351724] [drm] Creating a new EEPROM table
[    7.640083] amdgpu 0000:03:00.0: amdgpu: SE 3, SH per SE 2, CU per SH 10, active_cu_number 48
[    7.640797] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[    7.640800] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[    7.640802] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[    7.640803] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[    7.640805] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[    7.640806] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[    7.640808] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[    7.640809] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[    7.640811] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[    7.640812] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[    7.640814] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[    7.640815] amdgpu 0000:03:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[    7.640817] amdgpu 0000:03:00.0: amdgpu: ring vcn_unified_1 uses VM inv eng 1 on hub 8
[    7.640819] amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 4 on hub 8
[    7.640820] amdgpu 0000:03:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 14 on hub 0
[    7.644837] amdgpu 0000:03:00.0: amdgpu: Using BACO for runtime pm
[    7.645847] [drm] Initialized amdgpu 3.54.0 20150101 for 0000:03:00.0 on minor 2
[    7.689271] [drm] DSC precompute is not needed.
[    7.852634] Console: switching to colour frame buffer device 480x135
[    7.878185] amdgpu 0000:03:00.0: [drm] fb0: amdgpudrmfb frame buffer device

No clue what's on the display though, I didn't set up a remote camera to check lol. I'll see how it's working in the morning. @Coreforge thanks for all your help!

@Coreforge

Looks good so far.

@martinx72

Confirmed: I also just applied the latest patch, and it displays via HDMI now.
Cool!

image

image

@geerlingguy
Owner Author

I wonder if @6by9 or @pelwell might have some advice about increasing the BAR space on Pi 5? I haven't attempted tweaking it at all since the days of the CM4 (see my guide for CM4 BAR space).

Right now these higher-VRAM cards are all setting Resizable BAR to 256 MB (example below is 6700 XT):

[    4.978741] [drm] Not enough PCI address space for a large BAR.
[    4.978745] amdgpu 0000:03:00.0: BAR 0: assigned [mem 0x1800000000-0x180fffffff 64bit pref]
[    4.978756] amdgpu 0000:03:00.0: BAR 2: assigned [mem 0x1810000000-0x18101fffff 64bit pref]
[    4.978768] amdgpu 0000:03:00.0: amdgpu: VRAM: 12272M 0x0000008000000000 - 0x00000082FEFFFFFF (12272M used)
[    4.978772] amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[    4.978775] amdgpu 0000:03:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[    4.978782] [drm] Detected VRAM RAM=12272M, BAR=256M
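
As a quick sanity check on those numbers, this Python sketch works out how much of the card's VRAM the CPU can address directly with a 256 MB BAR. It assumes BAR 0 is the VRAM aperture, and the variable names are illustrative.

```python
MIB = 1 << 20

vram_start, vram_end = 0x8000000000, 0x82FEFFFFFF   # from the "VRAM:" line
vram_bytes = vram_end - vram_start + 1
bar_bytes = 256 * MIB                                # from "BAR=256M"

visible = bar_bytes / vram_bytes
print(f"VRAM: {vram_bytes // MIB} MiB")
print(f"CPU-visible through BAR 0: {visible:.1%}")
```

The address range works out to the same 12272M the driver reports, and only about 2% of it is reachable through the BAR at any one time.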

@geerlingguy
Owner Author

geerlingguy commented Nov 10, 2024

Indeed, it's working!

IMG_0508

glmark2-es2 800x600 windowed on a 4K display is getting 2356, which is right in the same range as on the RX 6700 XT. That means there's definitely a massive constraint on bandwidth; the CPU is not maxing out overall, so it's probably the x1 PCIe Gen 3 lane. The CPU might be maxing out a single thread though. Is glmark2 bound to a single core? I also noticed I can't get Tx/Rx bandwidth in nvtop on this card.
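
For a sense of scale, here is a back-of-the-envelope Python sketch comparing the Pi's Gen 3 x1 link with the 16GT/s x16 link the card itself advertises in the lspci output. It assumes 128b/130b line encoding (used from PCIe Gen 3 onward) and ignores packet and protocol overhead, so real throughput will be lower. The function name is made up for illustration.

```python
def pcie_bandwidth_gbytes(gt_per_s: float, lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s for PCIe Gen 3+ (128b/130b)."""
    return gt_per_s * (128 / 130) / 8 * lanes

pi5_link = pcie_bandwidth_gbytes(8.0, 1)      # Gen 3 x1, as on the Pi 5 here
card_link = pcie_bandwidth_gbytes(16.0, 16)   # 16GT/s x16, from the card's LnkCap

print(f"Gen 3 x1:    {pi5_link:.2f} GB/s")
print(f"16GT/s x16:  {card_link:.2f} GB/s")
print(f"ratio:       {card_link / pi5_link:.0f}x")
```

That is just under 1 GB/s each way on the Pi versus roughly 31.5 GB/s on a full-width slot, a 32x gap before any overhead is counted.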

OBS 30 (installed via Pi-Apps) didn't seem to know what to do with the GPU: there were no warnings in dmesg when I launched it, and only x264 was available as an encoder. Maybe I have to compile from source to see if it will pick up the GPU on arm64?

I installed Blender using sudo apt install blender, but it won't launch. I open it, it seems to do something for a second or so, then nothing. (And no logs in dmesg). Would be interesting if I could wire up Blender with arm64 and an external GPU, but I believe it might not have even worked on the Ampere, unless I installed Blender in Windows (and then, it doesn't have a GPU driver to work with regardless). Might have to revisit 'productivity' apps later.

@Coreforge

I'm not sure if the vaapi drivers are installed by default, so you might have to install those (although I expect the same issues I have on the 6700xt to be present on these cards too, I need to continue debugging those).

The 6700xt was a lot more CPU heavy than the 460 in some loads for me, which I think might be due to not having a large BAR. glmark2 shouldn't be so throughput constrained that a good Gen 3 x1 connection limits it that much, but it's still definitely not getting close to the maximum these cards can do (the 6700xt in my PC got something around 5000-6000).
Another thing I noticed though (I wrote about that in the 6000 series issue) is that different desktop environments perform differently. Regular LXDE performed the best for me, though I'm interested in whether glmark2-drm does better without the overhead of a desktop compositor.

@Coreforge

Blender currently isn't in the Debian testing repository, so I'd need to build it from source to test it, which might be a bit annoying since some libraries are apparently not available at the moment.

@KhazAkar

Regarding LLMs... I was reading up on Ollama's potential support for Vulkan, which is useful even for x86 because many AMD cards aren't supported through ROCm. Also saw mention of LM Studio potentially supporting Vulkan... but I don't see any support for arm64 on their downloads page.

I tried downloading and running the x86 AppImage anyway, but got:

Warning: Weak Symbol OPENSSL_memory_get_size not found, cannot apply R_X86_64_JUMP_SLOT @0x109c4aac0 (0x96d7e06)
munmap_chunk(): invalid pointer
NativeBT: /tmp/.mount_LM_StuGr5mZe/lm-studio() [0x34a59154]
NativeBT: linux-vdso.so.1(__kernel_rt_sigreturn+0) [0x7fb6bb07a0]
NativeBT: /lib/aarch64-linux-gnu/libc.so.6(+0x80a50) [0x7fb6a20a50]
NativeBT: /lib/aarch64-linux-gnu/libc.so.6(gsignal+0x1c) [0x7fb69da72c]
NativeBT: /lib/aarch64-linux-gnu/libc.so.6(abort+0xf0) [0x7fb69c747c]
NativeBT: /lib/aarch64-linux-gnu/libc.so.6(+0x74aac) [0x7fb6a14aac]
NativeBT: /lib/aarch64-linux-gnu/libc.so.6(+0x8aeac) [0x7fb6a2aeac]
NativeBT: /lib/aarch64-linux-gnu/libc.so.6(+0x8b0ac) [0x7fb6a2b0ac]
NativeBT: /lib/aarch64-linux-gnu/libc.so.6(__libc_free+0x70) [0x7fb6a2f704]
NativeBT: [0x7fb2469dac]
EmulatedBT: box64(free+0) [0x300000a0]
EmulatedBT: /tmp/.mount_LM_StuGr5mZe/lm-studio+94b6fe0 [0x1094b6fe0]
EmulatedBT: /tmp/.mount_LM_StuGr5mZe/lm-studio(__libc_csu_init+45) [0x1022d5fd5]
EmulatedBT: box64(ExitEmulation+0) [0x30000080]
EmulatedBT: /tmp/.mount_LM_StuGr5mZe/lm-studio(+2a) [0x101fa902a]
20859|SIGABRT @0x7fb6a20a50 (???(/lib/aarch64-linux-gnu/libc.so.6+0x80a50)) (x64pc=0x300000a0/"???", rsp=0x7fb64e7098, stack=0x7fb5ce8000:0x7fb64e8000 own=(nil) fp=0x7fb64e7190), for accessing 0x3e80000517b (code=-6/prot=0), db=(nil)((nil):(nil)/(nil):(nil)/???:clean, hash:0/0) handler=(nil)
RSP-0x20:0x0000000000000000 RSP-0x18:0x0000007fb64e7130 RSP-0x10:0x0000000109de2020 RSP-0x08:0x0000007fb64e7190
RSP+0x00:0x00000001094b6b57 RSP+0x08:0x0000000000000000 RSP+0x10:0x0000000000000000 RSP+0x18:0x0000000000000000
RAX:0x0000007fb64e70f1 RCX:0x0000000000000031 RDX:0x00000078002108d8 RBX:0x0000000109de1ed0 
RSP:0x0000007fb64e7098 RBP:0x0000007fb64e7190 RSI:0x0000007fb6a39e40 RDI:0x00000078002108d0 
 R8:0x0000007fb64e7110  R9:0x0000000000000000 R10:0x0000000000000000 R11:0x0000000000000060 
R12:0x0000000000000000 R13:0x0000000109de1908 R14:0x0000007fb64e7130 R15:0x0000000109de2020 
ES:0x002b CS:0x0033 SS:0x002b DS:0x002b FS:0x0043 GS:0x0053 
Aborted

Would be interesting to see if LLM offload could bypass AMD's lack of support for ROCm.

For that, you can test gpt4all.io, since Ollama Vulkan support is not there yet. Alternatively, you can check out the Ollama fork called ollama-for-amd - https://github.com/likelovewant/ollama-for-amd - I run it on my x86 laptop with an RX5500M dGPU.

@0cc4m

0cc4m commented Nov 17, 2024

Both ollama and LM Studio use my Vulkan backend for llama.cpp. You can try building that directly and running one of the example programs; it should work on the Pi. If it detects the Pi's internal GPU first, you can select the Radeon GPU by setting the environment variable GGML_VK_VISIBLE_DEVICES to the index of the device according to vulkaninfo --summary.
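A minimal sketch of that device selection, assuming a llama.cpp build under ./build. The index 0 and the MODEL variable are placeholders for illustration, not values taken from this thread:

```shell
# List the Vulkan devices and their indices (skipped if vulkaninfo is absent).
if command -v vulkaninfo >/dev/null 2>&1; then
  vulkaninfo --summary || true
fi

# Index 0 is a placeholder: use whichever index vulkaninfo reports for the Radeon.
export GGML_VK_VISIBLE_DEVICES=0

# MODEL is a hypothetical variable holding the path to a GGUF file;
# the run is skipped when it is unset or llama-cli has not been built yet.
if [ -n "${MODEL:-}" ] && [ -x ./build/bin/llama-cli ]; then
  ./build/bin/llama-cli -m "$MODEL" -p "Hello" -n 16 -ngl 33
fi
```

Exporting the variable keeps the choice for the rest of the shell session; prefixing a single command with it would limit the setting to that one run.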

@geerlingguy
Owner Author

geerlingguy commented Nov 18, 2024

@0cc4m - It looks like those Vulkan instructions are built off Windows x86. Do you have specific instructions for Linux? It looks like I could download the Vulkan SDK separately for Linux, but how can I build with the SDK dependencies on Linux (w64devkit.exe doesn't run there)?

Oh... read through the rest, and it looks like:

# Install Vulkan SDK and cmake
sudo apt install -y libvulkan-dev cmake

# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Build with Vulkan
cmake -B build -DGGML_VULKAN=1
cmake --build build --config Release
# Test the output binary (with "-ngl 33" to offload all layers to GPU)
./bin/llama-cli -m "PATH_TO_MODEL" -p "Hi you how are you" -n 50 -e -ngl 33 -t 4

# You should see in the output, ggml_vulkan detected your GPU. For example:
# ggml_vulkan: Using Intel(R) Graphics (ADL GT2) | uma: 1 | fp16: 1 | warp size: 32

However, I get the error:

-- Using runtime weight conversion of Q4_0 to Q4_0_x_x to enable optimized GEMM/GEMV kernels
-- Including CPU backend
CMake Warning at ggml/src/ggml-amx/CMakeLists.txt:106 (message):
  AMX requires x86 and gcc version > 11.0.  Turning off GGML_AMX.


CMake Error at /usr/share/cmake-3.25/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find Vulkan (missing: glslc) (found version "1.3.239")
Call Stack (most recent call first):
  /usr/share/cmake-3.25/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake-3.25/Modules/FindVulkan.cmake:597 (find_package_handle_standard_args)
  ggml/src/ggml-vulkan/CMakeLists.txt:1 (find_package)

So trying some more...

vulkaninfo does return a ton of information:

pi@pi5-pcie:~/Downloads/llama.cpp $ vulkaninfo
WARNING: [Loader Message] Code 0 : terminator_CreateInstance: Failed to CreateInstance in ICD 0.  Skipping ICD.
'DISPLAY' environment variable not set... skipping surface info
==========
VULKANINFO
==========

Vulkan Instance Version: 1.3.239


Instance Extensions: count = 22
===============================
	VK_EXT_acquire_drm_display             : extension revision 1
	VK_EXT_acquire_xlib_display            : extension revision 1
	VK_EXT_debug_report                    : extension revision 10
... [condensed for brevity] ...

Also, looking at the Vulkan SDK download site linked from those docs, the 1.3.296.0 (latest) version only has files under x86_64. Not sure if it's compatible with arm64?

@geerlingguy
Owner Author

Trying again with the Docker method...

# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Build the Docker image
docker build -t llama-cpp-vulkan -f .devops/llama-cli-vulkan.Dockerfile .

However, this also fails:

3.489 W: https://packages.lunarg.com/vulkan/dists/jammy/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.
3.538 Reading package lists...
4.625 Building dependency tree...
4.885 Reading state information...
4.925 E: Unable to locate package vulkan-sdk
------
llama-cli-vulkan.Dockerfile:9
--------------------
   8 |     # Install Vulkan SDK
   9 | >>> RUN wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add - && \
  10 | >>>     wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list && \
  11 | >>>     apt update -y && \
  12 | >>>     apt-get install -y vulkan-sdk
  13 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add - &&     wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list &&     apt update -y &&     apt-get install -y vulkan-sdk" did not complete successfully: exit code: 100

I'm guessing that package is only built for x86 (see: https://packages.lunarg.com/vulkan/1.3.296/dists/jammy/main/).

@0cc4m

0cc4m commented Nov 18, 2024

Could NOT find Vulkan (missing: glslc)

@geerlingguy You don't need the Vulkan SDK, the libvulkan-dev package just doesn't contain the shader compiler glslc that's also needed. That should be in a package called either glslc or shaderc.
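A small pre-flight check along these lines can catch the missing shader compiler before CMake does. This is only a sketch; missing_tools is a made-up helper name, and the tool list is just the two commands mentioned in this thread:

```shell
# Print each named tool that is not on PATH; prints nothing if all are present.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing=$(missing_tools cmake glslc)
if [ -n "$missing" ]; then
  echo "Install before configuring:" $missing
else
  echo "cmake and glslc found"
fi
```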

@geerlingguy
Owner Author

geerlingguy commented Nov 18, 2024

@0cc4m Ah, indeed! Just did a sudo apt install glslc and compile was successful:

pi@pi5-pcie:~/Downloads/llama.cpp $ cmake -B build -DGGML_VULKAN=1
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- OpenMP found
-- Using llamafile
-- ARM detected
-- Using runtime weight conversion of Q4_0 to Q4_0_x_x to enable optimized GEMM/GEMV kernels
-- Including CPU backend
CMake Warning at ggml/src/ggml-amx/CMakeLists.txt:106 (message):
  AMX requires x86 and gcc version > 11.0.  Turning off GGML_AMX.


-- Found Vulkan: /usr/lib/aarch64-linux-gnu/libvulkan.so (found version "1.3.239") found components: glslc missing components: glslangValidator
-- Vulkan found
-- Including Vulkan backend
-- Configuring done
-- Generating done
-- Build files have been written to: /home/pi/Downloads/llama.cpp/build

Then the Release build:

pi@pi5-pcie:~/Downloads/llama.cpp $ cmake --build build --config Release
[  1%] Building C object ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o
[  1%] Building C object ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o
[  2%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o
[  2%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml-opt.cpp.o
[  3%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o
[  4%] Building C object ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o
[  4%] Building C object ggml/src/CMakeFiles/ggml-base.dir/ggml-aarch64.c.o
...
[taking a nice long time, with a number of warnings but so far no errors]

@0cc4m

0cc4m commented Nov 18, 2024

Great! The initial run will probably be stuck for a while on the shader compile step due to the low CPU power, but (assuming it works just as it does on AMD64) the shaders should be cached afterwards. You can download a GGUF model from Hugging Face to try it, for example Llama 8B Instruct Q4_K_S.

@geerlingguy
Owner Author

Testing it on llama3.1:8b as you linked above:

# Download llama3.1:8b
cd models
wget https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf

# Test it
pi@pi5-pcie:~/Downloads/llama.cpp/build $ ./bin/llama-cli -m "../models/Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf" -p "Why is the blue sky blue?" -n 50 -e -ngl 33 -t 4
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6700 XT (RADV NAVI22) (radv) | uma: 0 | fp16: 1 | warp size: 64
build: 4125 (531cb1c2) with cc (Debian 12.2.0-14) 12.2.0 for aarch64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_load_model_from_file: using device Vulkan0 (AMD Radeon RX 6700 XT (RADV NAVI22)) - 12032 MiB free
llama_model_loader: loaded meta data with 33 key-value pairs and 292 tensors from ../models/Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf (version GGUF V3 (latest))
...
llm_load_print_meta: max token length = 256
ggml_vulkan: Compiling shaders..............................Done!
radv/amdgpu: Failed to allocate a buffer:
radv/amdgpu:    size      : 4290244608 bytes
radv/amdgpu:    alignment : 262144 bytes
radv/amdgpu:    domains   : 4
ggml_vulkan: Device memory allocation of size 4290240768 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
llama_model_load: error loading model: unable to allocate Vulkan0 buffer
llama_load_model_from_file: failed to load model
common_init_from_params: failed to load model '../models/Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf'
main: error: unable to load model

@geerlingguy
Owner Author

Trying with a smaller model (llama-3.2-1b-instruct-Q4_K_M.gguf), I get:

pi@pi5-pcie:~/Downloads/llama.cpp/build $ ./bin/llama-cli -m "../models/llama-3.2-1b-instruct-Q4_K_M.gguf" -p "Why is the blue sky blue?" -n 50 -e -ngl 33 -t 4
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6700 XT (RADV NAVI22) (radv) | uma: 0 | fp16: 1 | warp size: 64
build: 4125 (531cb1c2) with cc (Debian 12.2.0-14) 12.2.0 for aarch64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_load_model_from_file: using device Vulkan0 (AMD Radeon RX 6700 XT (RADV NAVI22)) - 12032 MiB free
llama_model_loader: loaded meta data with 31 key-value pairs and 147 tensors from ../models/llama-3.2-1b-instruct-Q4_K_M.gguf (version GGUF V3 (latest))
...
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 4

system_info: n_threads = 4 (n_threads_batch = 4) / 4 | AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | AMX_INT8 = 0 | FMA = 0 | NEON = 1 | SVE = 0 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | 

sampler seed: 38238577
sampler params: 
	repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
	dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = -1
	top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, temp = 0.800
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
generate: n_ctx = 4096, n_batch = 2048, n_predict = 50, n_keep = 1

Why is the blue sky blue? The answer lies in the way light behaves.

## Step 1: Understanding the concept of light scattering
The sky appears blue because of the way light behaves when it enters our atmosphere. When sunlight enters Earth's atmosphere, it encounters tiny molecules of gases

llama_perf_sampler_print:    sampling time =      11.01 ms /    58 runs   (    0.19 ms per token,  5269.37 tokens per second)
llama_perf_context_print:        load time =    4013.49 ms
llama_perf_context_print: prompt eval time =      63.48 ms /     8 tokens (    7.93 ms per token,   126.03 tokens per second)
llama_perf_context_print:        eval time =     759.89 ms /    49 runs   (   15.51 ms per token,    64.48 tokens per second)
llama_perf_context_print:       total time =     937.79 ms /    57 tokens

@geerlingguy
Owner Author

Looks like the memory allocation issue may be a problem on some other cards / models too: ggerganov/llama.cpp#5441

But it would be nice to run a larger model on this card with its 16 GB of RAM...

@KhazAkar

@geerlingguy it would be great to have radeontop output while running even a small model. Possibly also try tweaking the shared memory size between host and GPU, and check vulkaninfo, to see exactly what's going on.

@geerlingguy
Owner Author

geerlingguy commented Nov 18, 2024

Also testing Llama-3.2-3B-Instruct-Q4_K_M.gguf:

cd models && wget https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf
cd ../build

pi@pi5-pcie:~/Downloads/llama.cpp/build $ ./bin/llama-cli -m "../models/Llama-3.2-3B-Instruct-Q4_K_M.gguf" -p "Why is the blue sky blue?" -n 50 -e -ngl 33 -t 4
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6700 XT (RADV NAVI22) (radv) | uma: 0 | fp16: 1 | warp size: 64
...
sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
generate: n_ctx = 4096, n_batch = 2048, n_predict = 50, n_keep = 1

Why is the blue sky blue? The question has puzzled humans for centuries. The answer is simple yet profound: it is due to a phenomenon called Rayleigh scattering, named after the British physicist Lord Rayleigh, who first explained it in the late 19th century.

## Step 

llama_perf_sampler_print:    sampling time =      12.50 ms /    58 runs   (    0.22 ms per token,  4638.52 tokens per second)
llama_perf_context_print:        load time =    8654.67 ms
llama_perf_context_print: prompt eval time =     147.54 ms /     8 tokens (   18.44 ms per token,    54.22 tokens per second)
llama_perf_context_print:        eval time =    1023.49 ms /    49 runs   (   20.89 ms per token,    47.88 tokens per second)
llama_perf_context_print:       total time =    1290.36 ms /    57 tokens

Note: All the above tests are on my RX 6700 XT, I forgot I have that plugged in right now lol.

@0cc4m

0cc4m commented Nov 18, 2024

Can you upload the output of vulkaninfo on the Pi?

@geerlingguy
Owner Author

@0cc4m Here it is:

vulkaninfo.txt

@0cc4m

0cc4m commented Nov 18, 2024

@0cc4m Here it is:

vulkaninfo.txt

Looks good. There have been cases of allocations failing even though according to the driver they should be fine. The vulkaninfo output shows a max allocation size of 4294967292, but apparently that doesn't work.

I built a workaround for that, can you try setting the environment variable GGML_VK_FORCE_MAX_ALLOCATION_SIZE to something smaller, like 2147483648 for ~2GB or 1073741824 for ~1GB and then load the 8B model again?

@geerlingguy
Owner Author

geerlingguy commented Nov 18, 2024

@0cc4m I tried:

$ export GGML_VK_FORCE_MAX_ALLOCATION_SIZE=2147483648
$ ./bin/llama-cli -m "../models/Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf" -p "Why is the blue sky blue?" -n 50 -e -ngl 33 -t 4
...
llm_load_print_meta: EOG token        = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
llama_model_load: error loading model: stoi
llama_load_model_from_file: failed to load model
common_init_from_params: failed to load model '../models/Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf'
main: error: unable to load model
Aborted

Then at 1 GB:

$ export GGML_VK_FORCE_MAX_ALLOCATION_SIZE=1073741824
$ ./bin/llama-cli -m "../models/Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf" -p "Why is the blue sky blue?" -n 50 -e -ngl 33 -t 4
...
generate: n_ctx = 4096, n_batch = 2048, n_predict = 50, n_keep = 1

Why is the blue sky blue? | Science Quiz
The blue color of the sky is due to a phenomenon called Rayleigh scattering, named after the British physicist Lord Rayleigh, who first described it in the late 19th century. The idea behind Rayleigh scattering is that when

llama_perf_sampler_print:    sampling time =      13.72 ms /    58 runs   (    0.24 ms per token,  4228.64 tokens per second)
llama_perf_context_print:        load time =   17101.33 ms
llama_perf_context_print: prompt eval time =     253.23 ms /     8 tokens (   31.65 ms per token,    31.59 tokens per second)
llama_perf_context_print:        eval time =    1234.30 ms /    49 runs   (   25.19 ms per token,    39.70 tokens per second)
llama_perf_context_print:       total time =    1609.77 ms /    57 tokens

Yay, 1 GB seems to have worked!
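To compare the runs, the generation rate can be recomputed from the "eval time" lines in the logs above. A quick sketch (tokens_per_sec is a made-up helper; the millisecond and token figures are copied from the three llama_perf_context_print outputs in this thread):

```shell
# tokens_per_sec <eval_ms> <tokens>: prints throughput with two decimals.
tokens_per_sec() {
  awk -v ms="$1" -v n="$2" 'BEGIN { printf "%.2f\n", n / (ms / 1000) }'
}

echo "Llama 3.2 1B Q4_K_M: $(tokens_per_sec 759.89 49) tokens/s"
echo "Llama 3.2 3B Q4_K_M: $(tokens_per_sec 1023.49 49) tokens/s"
echo "Llama 3.1 8B Q4_K_S: $(tokens_per_sec 1234.30 49) tokens/s"
```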

@KhazAkar

What's the GPU memory usage when running such experiments? Does it allocate 1GB on GPU, or more, or less?

@geerlingguy
Owner Author

@KhazAkar - 4 GB while loading, 5 GB while running:

Screenshot 2024-11-18 at 12 04 56 PM

(Again, note this is the RX 6700 XT)

@0cc4m

0cc4m commented Nov 18, 2024

llama_model_load: error loading model: stoi

That's a bug, very odd that I didn't see that before. std::stoi returns an int, which doesn't have the capacity for the 2GB number (2147483648 is one more than the largest 32-bit signed value). I should have used std::stoul there. My bad, I'll fix it.
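The arithmetic behind that failure can be checked quickly. This sketch assumes int is 32 bits wide, as it is on arm64 Linux; fits_in_int32 is a made-up helper name:

```shell
# Largest value a 32-bit signed int can hold: 2^31 - 1.
int32_max=$(( (1 << 31) - 1 ))

# Succeeds when the given value is representable as a 32-bit signed int.
fits_in_int32() {
  [ "$1" -le "$int32_max" ]
}

for size in 1073741824 2147483647 2147483648; do
  if fits_in_int32 "$size"; then
    echo "$size fits in a 32-bit int"
  else
    echo "$size overflows a 32-bit int"
  fi
done
```

Only the exact 2 GiB value lands outside the range, which is consistent with the 1 GB setting loading while the 2147483648 setting aborted with the stoi error.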

What's the GPU memory usage when running such experiments? Does it allocate 1GB on GPU, or more, or less?

It doesn't mean that it only allocates 1GB, it means it allocates 1GB chunks. The maximum size buffer I can allocate on most devices with Vulkan is 4GB, but somehow that's not working here.
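As a rough illustration of what chunking means for the buffer that failed earlier (4290240768 bytes, per the log above), here is the ceiling division for the two cap values discussed. This is only arithmetic, with a made-up helper name; how llama.cpp actually lays out its buffers may not split this evenly.

```shell
# Ceiling division: how many chunks of at most $2 bytes cover $1 bytes.
chunks_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

buffer=4290240768   # size of the failed allocation in the log above
echo "1 GiB cap:     $(chunks_needed "$buffer" 1073741824) chunks"
echo "2 GiB - 1 cap: $(chunks_needed "$buffer" 2147483647) chunks"
```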

This is an ARM64 OS, right?

@geerlingguy
Owner Author

geerlingguy commented Nov 18, 2024

@0cc4m - Yes, Pi OS Bookworm (Based on Debian 12), arm64:

Linux pi5-pcie 6.6.51+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.51-1+rpt3 (2024-10-08) aarch64 GNU/Linux

@0cc4m

0cc4m commented Nov 18, 2024

Alright, then it's some driver thing again. 2GB should work if we just reduce it by one, try 2147483647, might give you a little more performance.

@geerlingguy
Owner Author

geerlingguy commented Nov 18, 2024

@0cc4m - Thanks for the help here. I'm doing a little benchmarking over here: geerlingguy/ollama-benchmark#1

Now I'm tempted to buy a couple more AMD GPUs to see how things go with lower priced cards...

Edit: Also, 2GB does indeed work when reducing it by one byte (2147483647) :)

This was referenced Nov 18, 2024
@geerlingguy
Owner Author

geerlingguy commented Nov 19, 2024

(I moved a conversation about kernel debugging over to #684)

@jordan4ibanez

jordan4ibanez commented Dec 1, 2024

Tested with an RPi 5 8GB and an OEM Radeon RX 6800 XT, on the stock 64-bit image of Raspberry Pi OS based on Debian 12. I installed KDE and messed up the auto login for LightDM. This had the side effect of kicking me back into the default desktop environment and turning on GPU acceleration, and it's smooth as butter! Running this with three 4K monitors. I will boot into KDE and test games now.

holyshmoly

@jordan4ibanez

I thought you'd enjoy the funny image of a Mac Pro with a Pi bolted into it running three 4K monitors.

mac_pi_pro

I did some more testing. Raspberry Pi OS keeps trying to open up random things like the accessibility menu while I'm typing. Sometimes the keyboard lags and I have no idea why. I couldn't get SDDM or KDE to launch; it said that $DISPLAY was not defined or that it could not connect to the X server. I couldn't get box64 or box86 (whichever the Pi-Apps Steam installer uses) to launch Steam. For some reason Firefox had a kind of "wind up" before GPU acceleration got going. I think this is due to the PCIe Gen 3 x1 bridge lol. But other than that, I used it, and it was nice. I couldn't really do any development because gfortran is stuck on version 12 on Debian 12. It was a fun project though!
