CPU lockup on XA61200 motherboard with RX550 discrete graphics card #83

Open
Fearyncess opened this issue Dec 13, 2023 · 8 comments
@Fearyncess

Fearyncess commented Dec 13, 2023

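Trigger condition: with no additional amdgpu parameters set, use the VAAPI hardware decoding interface provided by the amdgpu driver in Firefox to play any high-bitrate H.264 video (within the capacity of the RX550's hardware decoder) continuously for an extended period (anywhere from 3 to 10 minutes).
After adding the amdgpu.pcie_gen_cap=0x00020002 parameter to force the card to run at PCIe 2.0 link speed only, the problem no longer occurs.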
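The workaround value is a bitmask. A minimal sketch for reading it, assuming the CAIL_* bit layout from the Linux kernel's amd_pcie.h header (upper 16 bits: link speeds the platform supports; lower 16 bits: link speeds the GPU supports; bit 0 of each half is Gen1). The helper decode_pcie_gen_cap is hypothetical and not part of any driver API:

```python
# Hypothetical helper: decode an amdgpu.pcie_gen_cap value.
# Assumed layout (CAIL_* constants in the kernel's amd_pcie.h):
#   bits 16..20 -> PCIe Gen1..Gen5 supported by the platform (root port)
#   bits  0..4  -> PCIe Gen1..Gen5 supported by the ASIC (the GPU)

PLATFORM_SHIFT = 16  # assumed: CAIL_PCIE_LINK_SPEED_SUPPORT_SHIFT
MAX_GEN = 5


def gens_from_bits(bits: int) -> list:
    """List the PCIe generations whose bit is set (bit 0 = Gen1)."""
    return [gen for gen in range(1, MAX_GEN + 1) if bits & (1 << (gen - 1))]


def decode_pcie_gen_cap(value: int) -> dict:
    """Split a pcie_gen_cap mask into platform and ASIC link speeds."""
    platform_bits = (value >> PLATFORM_SHIFT) & 0xFFFF
    asic_bits = value & 0xFFFF
    return {
        "platform": gens_from_bits(platform_bits),
        "asic": gens_from_bits(asic_bits),
    }


if __name__ == "__main__":
    print(decode_pcie_gen_cap(0x00020002))
```

Under that assumption, 0x00020002 sets only the Gen2 bit in both halves, which is why it caps the link at PCIe 2.0; the module's default value 0 means autodetect.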

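Symptoms: the graphical session freezes, one CPU core locks up, the watchdog fires and the system hangs, and keyboard and mouse input get no response.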

Affected firmware: UDK2018_3A6000-7A2000_Desktop_EVB_V4.0.05636-stable202311_support_fastboot_rel.fd

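Steps to reproduce:

  • Use an XA61200 motherboard
  • Use the firmware version mentioned above
  • In a browser with VAAPI hardware decoding enabled (e.g. the latest Firefox Nightly), select the 4K60 or 1080p60 quality option, force the codec to AVC, and play this video continuously (with loop playback enabled): https://www.bilibili.com/video/BV1vN4y1e7LU

journalctl log at the time of the lockup: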


Dec 13 00:34:21 Misha kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1061579, emitted seq=1061580
Dec 13 00:34:21 Misha kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 917 thread Xorg:cs0 pid 924
Dec 13 00:34:21 Misha kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
Dec 13 00:34:32 Misha kernel: watchdog: Watchdog detected hard LOCKUP on cpu 2
Dec 13 00:34:32 Misha kernel: Modules linked in: qrtr snd_hda_codec_conexant snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep kvm spi_>
Dec 13 00:34:32 Misha kernel: Sending NMI from CPU 1 to CPUs 2:
Dec 13 00:34:32 Misha kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
Dec 13 00:34:32 Misha kernel: rcu:         2-...!: (0 ticks this GP) idle=6388/0/0x0 softirq=494920/494920 fqs=0 (false positive?)
Dec 13 00:34:32 Misha kernel: rcu:         3-...!: (1 ticks this GP) idle=bb6c/1/0x4000000000000000 softirq=493654/493655 fqs=0
Dec 13 00:34:32 Misha kernel: rcu:         (detected by 6, t=21025 jiffies, g=1352673, q=13 ncpus=8)
Dec 13 00:34:32 Misha kernel: rcu: rcu_preempt kthread timer wakeup didn't happen for 21032 jiffies! g1352673 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
Dec 13 00:34:32 Misha kernel: rcu:         Possible timer handling issue on cpu=2 timer-softirq=945830
Dec 13 00:34:32 Misha kernel: rcu: rcu_preempt kthread starved for 21055 jiffies! g1352673 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
Dec 13 00:34:32 Misha kernel: rcu:         Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
Dec 13 00:34:32 Misha kernel: rcu: RCU grace-period kthread stack dump:
Dec 13 00:34:32 Misha kernel: task:rcu_preempt     state:I stack:0     pid:18    tgid:18    ppid:2      flags:0x00000800
Dec 13 00:34:32 Misha kernel: Stack : 9000000005966930 0000000000000000 0000000000080000 900000000a801400
Dec 13 00:34:32 Misha kernel:         0000000000000402 900000010065e5f8 9000000004870b14 90000001006c3d08
Dec 13 00:34:32 Misha kernel:         0000000000000000 90000000049ca240 0000000000000000 9000000004879e98
Dec 13 00:34:32 Misha kernel:         90000000049c4398 90000000032c22a0 900000000a801400 900000010065df40
Dec 13 00:34:32 Misha kernel:         00000000000000b0 9000000000000004 90000000049d2008 0000000000000000
Dec 13 00:34:32 Misha kernel:         0000000000000002 86a32dec93ad0052 00000001009ffdda 86a32dec93ad0052
Dec 13 00:34:32 Misha kernel:         0000000000000001 9000000005976798 0000000000000001 90000001006c3d80
Dec 13 00:34:32 Misha kernel:         9000000005076000 9000000005080000 90000001006c3d08 900000010065df40
Dec 13 00:34:32 Misha kernel:         9000000005976000 9000000004870b14 00000001009ffdd9 9000000004878a08
Dec 13 00:34:32 Misha kernel:         0000000000000000 0000000000000000 900000000a801540 00000001009ffdd9
Dec 13 00:34:32 Misha kernel:         ...
Dec 13 00:34:32 Misha kernel: Call Trace:
Dec 13 00:34:32 Misha kernel: [<900000000486f858>] __schedule+0x5f8/0x1880
Dec 13 00:34:32 Misha kernel: [<9000000004870b14>] schedule+0x34/0x140
Dec 13 00:34:32 Misha kernel: [<9000000004878a08>] schedule_timeout+0x88/0x140
Dec 13 00:34:32 Misha kernel: [<900000000334624c>] rcu_gp_fqs_loop+0x14c/0x740
Dec 13 00:34:32 Misha kernel: [<90000000033497d8>] rcu_gp_kthread+0x238/0x280
Dec 13 00:34:32 Misha kernel: [<90000000032aec9c>] kthread+0x11c/0x140
Dec 13 00:34:32 Misha kernel: [<9000000003252208>] ret_from_kernel_thread+0xc/0xa4
Dec 13 00:34:32 Misha kernel:
Dec 13 00:34:52 Misha sshd[33192]: Accepted password for lain from 192.168.1.4 port 50826 ssh2
Dec 13 00:34:52 Misha audit[33192]: SYSCALL arch=c0000102 syscall=64 success=yes exit=4 a0=3 a1=7ffffb82e7c0 a2=4 a3=0 items=0 ppid=891 pid=33192 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgi>
Dec 13 00:34:52 Misha audit: PROCTITLE proctitle=737368643A206C61696E205B707269765D
Dec 13 00:34:52 Misha kernel: audit: type=1006 audit(1702398892.302:235): pid=33192 uid=0 subj=unconfined old-auid=4294967295 auid=1000 tty=(none) old-ses=4294967295 ses=7 res=1
Dec 13 00:34:52 Misha kernel: audit: type=1300 audit(1702398892.302:235): arch=c0000102 syscall=64 success=yes exit=4 a0=3 a1=7ffffb82e7c0 a2=4 a3=0 items=0 ppid=891 pid=33192 auid=1000 uid=0 gid=0 euid=0 sui>
Dec 13 00:34:52 Misha kernel: audit: type=1327 audit(1702398892.302:235): proctitle=737368643A206C61696E205B707269765D
Dec 13 00:34:52 Misha sshd[33192]: pam_unix(system-remote-login:session): session opened for user lain(uid=1000) by (uid=0)
Dec 13 00:34:52 Misha systemd-logind[842]: New session 7 of user lain.
Dec 13 00:35:14 Misha kernel: watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [gdbus:21686]
Dec 13 00:35:14 Misha kernel: Modules linked in: qrtr snd_hda_codec_conexant snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep kvm spi_>
Dec 13 00:35:14 Misha kernel: CPU: 5 PID: 21686 Comm: gdbus Not tainted 6.7.0-aosc-main #1
Dec 13 00:35:14 Misha kernel: Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05636-stable2
Dec 13 00:35:14 Misha kernel: pc 900000000338c9f4 ra 900000000338cb7c tp 9000000190584000 sp 9000000190587b40
Dec 13 00:35:14 Misha kernel: a0 0000000000000000 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000
Dec 13 00:35:14 Misha kernel: a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000000
Dec 13 00:35:14 Misha kernel: t0 0000000000000001 t1 900000000a831320 t2 0000000000000002 t3 0000000000000000
Dec 13 00:35:14 Misha kernel: t4 9000000005080000 t5 0000000000000040 t6 0000000000000001 t7 0000000000000000
Dec 13 00:35:14 Misha kernel: t8 0000000000000000 u0 900000010a440c00 s9 900000000bc31320 s0 0000000000000005
Dec 13 00:35:14 Misha kernel: s1 00000000000000b4 s2 9000000003261580 s3 0000000000000001 s4 900000000507ffd8
Dec 13 00:35:14 Misha kernel: s5 0000000000000004 s6 0000000000000001 s7 0000000000000000 s8 900000000b42b200
Dec 13 00:35:14 Misha kernel:    ra: 900000000338cb7c smp_call_function_many_cond+0x3fc/0x720
Dec 13 00:35:14 Misha kernel:   ERA: 900000000338c9f4 smp_call_function_many_cond+0x274/0x720
Dec 13 00:35:14 Misha kernel:  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
Dec 13 00:35:14 Misha kernel:  PRMD: 00000004 (PPLV0 +PIE -PWE)
Dec 13 00:35:14 Misha kernel:  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
Dec 13 00:35:14 Misha kernel:  ECFG: 00071c1c (LIE=2-4,10-12 VS=7)
Dec 13 00:35:14 Misha kernel: ESTAT: 00000800 [INT] (IS=11 ECode=0 EsubCode=0)
Dec 13 00:35:14 Misha kernel:  PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
Dec 13 00:35:14 Misha kernel: CPU: 5 PID: 21686 Comm: gdbus Not tainted 6.7.0-aosc-main #1
Dec 13 00:35:14 Misha kernel: Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05636-stable2
Dec 13 00:35:14 Misha kernel: Stack : 9000000004e391a0 900000010040bcb8 9000000004864898 9000000190584000
Dec 13 00:35:14 Misha kernel:         900000010040bc00 0000000000000000 900000010040bc08 9000000004e391a0
Dec 13 00:35:14 Misha kernel:         0000000000000000 0000000000000000 0000000000000000 0000000000000000
Dec 13 00:35:14 Misha kernel:         0000000000000000 86a32dec93ad0052 0000000000000000 0000000000000000
Dec 13 00:35:14 Misha kernel:         0000000000000000 0000000000000000 0000000000000000 0000000000000000
Dec 13 00:35:14 Misha kernel:         732d36333635302e 0000000000000000 0000000006a60000 900000010040bdf0
Dec 13 00:35:14 Misha kernel:         9000000005080000 9000000004e391a0 0000000000000000 0000000000000004
Dec 13 00:35:14 Misha kernel:         0000000000000000 0000000000000016 900000000507ffd8 90000000049a4058
Dec 13 00:35:14 Misha kernel:         900000000b403940 9000000005080580 9000000003254520 00007fffdf7fafc8
Dec 13 00:35:14 Misha kernel:         00000000000000b0 0000000000000004 0000000000000000 0000000000071c1c
Dec 13 00:35:14 Misha kernel:         ...
Dec 13 00:35:14 Misha kernel: Call Trace:
Dec 13 00:35:14 Misha kernel: [<9000000003254520>] show_stack+0x40/0x180
Dec 13 00:35:14 Misha kernel: [<9000000004864898>] dump_stack_lvl+0x78/0xc4
Dec 13 00:35:14 Misha kernel: [<90000000033ccf64>] watchdog_timer_fn+0x2c4/0x340
Dec 13 00:35:14 Misha kernel: [<900000000336da7c>] __hrtimer_run_queues+0x15c/0x400
Dec 13 00:35:14 Misha kernel: [<900000000336f088>] hrtimer_interrupt+0x128/0x2e0
Dec 13 00:35:14 Misha kernel: [<90000000032579fc>] constant_timer_interrupt+0x3c/0x60
Dec 13 00:35:14 Misha kernel: [<9000000003320010>] __handle_irq_event_percpu+0xb0/0x300
Dec 13 00:35:14 Misha kernel: [<9000000003320280>] handle_irq_event_percpu+0x20/0xa0
Dec 13 00:35:14 Misha kernel: [<9000000003327ff4>] handle_percpu_irq+0x74/0xc0
Dec 13 00:35:14 Misha kernel: [<900000000331ee50>] generic_handle_domain_irq+0x30/0x60
Dec 13 00:35:14 Misha kernel: [<9000000003e47ff0>] handle_cpu_irq+0x70/0xc0
Dec 13 00:35:14 Misha kernel: [<9000000004864c90>] handle_loongarch_irq+0x30/0x60
Dec 13 00:35:14 Misha kernel: [<9000000004864d60>] do_vint+0xa0/0x100
Dec 13 00:35:14 Misha kernel: [<900000000338c9f4>] smp_call_function_many_cond+0x274/0x720
Dec 13 00:35:14 Misha kernel: [<900000000338cff8>] on_each_cpu_cond_mask+0x58/0xe0
Dec 13 00:35:14 Misha kernel: [<9000000003261880>] flush_tlb_page+0x80/0x1e0
Dec 13 00:35:14 Misha kernel: [<900000000358b3a4>] ptep_set_access_flags+0x84/0xc0
Dec 13 00:35:14 Misha kernel: [<9000000003570fb4>] do_wp_page+0x114/0x1380
Dec 13 00:35:14 Misha kernel: [<900000000357629c>] __handle_mm_fault+0x8dc/0x15c0
Dec 13 00:35:14 Misha kernel: [<9000000003577120>] handle_mm_fault+0x1a0/0x320
Dec 13 00:35:14 Misha kernel: [<900000000487b558>] do_page_fault+0x158/0x3ec
Dec 13 00:35:14 Misha kernel: [<900000000326acb8>] tlb_do_page_fault_1+0x118/0x1b4
Dec 13 00:35:14 Misha kernel:
Dec 13 00:35:42 Misha kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
Dec 13 00:35:42 Misha kernel: rcu:         2-...!: (0 ticks this GP) idle=6388/0/0x0 softirq=494920/494920 fqs=0 (false positive?)
Dec 13 00:35:42 Misha kernel: rcu:         3-...!: (1 ticks this GP) idle=bb6c/1/0x4000000000000000 softirq=493654/493655 fqs=0
Dec 13 00:35:42 Misha kernel: rcu:         (detected by 4, t=84105 jiffies, g=1352673, q=376 ncpus=8)
Dec 13 00:35:42 Misha kernel: Sending NMI from CPU 4 to CPUs 2:
Dec 13 00:35:42 Misha kernel: Unable to send backtrace IPI to CPU2 - perhaps it hung?
Dec 13 00:35:42 Misha kernel: watchdog: BUG: soft lockup - CPU#5 stuck for 48s! [gdbus:21686]

@KatyushaScarlet

I've run into a similar problem:

  • Motherboard: XA612A0
  • Graphics card: AMD RX590
  • Firmware: 202311
  • OS: AOSC OS
  • Kernel: 6.7.0

Two logs attached:

3a6000-evb-rx590-aosc-6.7.0-amdgpu-dmesg.log
3a6000-evb-rx590-aosc-6.7.0-amdgpu-dmesg-2.log

@phorcys

phorcys commented Dec 19, 2023

Reference: https://bbs.loongarch.org/d/327-amdgpu/4

LiarOnce (https://bbs.loongarch.org/u/451), 19 days ago (edited):

After updating to the firmware at https://github.com/loongson/Firmware/tree/main/6000Series/PC/XA61200 and disabling DPM, it now runs normally.

Kernel parameters for reference:

GRUB_CMDLINE_LINUX="radeon.cik_support=0 radeon.si_support=0 amdgpu.cik_support=1 amdgpu.si_support=1 amdgpu.sg_display=0 amdgpu.runpm=0 amdgpu.dpm=0"
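Each token in this line has the form module.parameter=value. A minimal sketch (the helper group_module_params is hypothetical) that groups them by module: the radeon.* and amdgpu.* cik_support/si_support pairs only hand SI (GCN 1.0) and CIK (GCN 2.0) cards from the radeon driver to amdgpu, while sg_display, runpm and dpm are general amdgpu settings.

```python
import shlex
from collections import defaultdict

# Kernel parameters quoted from the forum post.
CMDLINE = ("radeon.cik_support=0 radeon.si_support=0 amdgpu.cik_support=1 "
           "amdgpu.si_support=1 amdgpu.sg_display=0 amdgpu.runpm=0 amdgpu.dpm=0")


def group_module_params(cmdline: str) -> dict:
    """Group 'module.param=value' tokens by module name."""
    grouped = defaultdict(dict)
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        module, dot, param = key.partition(".")
        if not dot:
            # Not a module parameter (e.g. 'quiet'); file it under a pseudo-module.
            grouped["<core>"][key] = value if sep else None
            continue
        grouped[module][param] = value
    return dict(grouped)


if __name__ == "__main__":
    for module, params in group_module_params(CMDLINE).items():
        print(module, params)
```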

@Fearyncess
Author

@phorcys If DPM is disabled, the GPU will no longer scale its clocks automatically, which lowers its operating frequency.

@LinuxResearcher

After adding the kernel parameters from the forum post quoted above, the display feels laggier.

@xry111

xry111 commented Feb 1, 2024

The faster and more high-end the card, the more likely this problem is to occur; anything that slows the card down lowers the probability.

@LiarOnce

LiarOnce commented Feb 10, 2024

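For Polaris-architecture cards such as the RX550, I don't really recommend using my kernel parameters: they take effect on GCN 1.0/2.0 architectures, since the card I use is an R5 340 (GCN 1.0, Oland).

I'll buy an RX560 in a few days to continue testing.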

MarsDoge added a commit to MarsDoge/Firmware that referenced this issue Mar 15, 2024
Fixes : loongson#83
        loongson#89

This is a test version, thanks!

Signed-off-by: Dongyan Qian <qiandongyan@loongson.cn>
Signed-off-by: Xiangdong Meng <mengxiangdong@loongson.cn>
@dg1vg4

dg1vg4 commented Jun 27, 2024

It has finally been settled now.

@Fearyncess
Author

Fearyncess commented Jun 28, 2024 via email

Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants