forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds HDD backplane and SAS expander board VPD to device tree #40
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
nkskjames
force-pushed
the
openbmc/add_vpd
branch
from
January 19, 2016 20:06
eaee790
to
4bbb78c
Compare
Signed-off-by: Norman James <nkskjames@gmail.com>
nkskjames
force-pushed
the
openbmc/add_vpd
branch
from
January 19, 2016 20:09
4bbb78c
to
674d7f9
Compare
shenki
pushed a commit
that referenced
this pull request
Sep 18, 2018
[ Upstream commit 78ee994 ] Because rfi_flush_fallback runs immediately before the return to userspace it currently runs with the user r1 (stack pointer). This means if we oops in there we will report a bad kernel stack pointer in the exception entry path, eg: Bad kernel stack pointer 7ffff7150e40 at c0000000000023b4 Oops: Bad kernel stack pointer, sig: 6 [#1] LE SMP NR_CPUS=32 NUMA PowerNV Modules linked in: CPU: 0 PID: 1246 Comm: klogd Not tainted 4.18.0-rc2-gcc-7.3.1-00175-g0443f8a69ba3 #7 NIP: c0000000000023b4 LR: 0000000010053e00 CTR: 0000000000000040 REGS: c0000000fffe7d40 TRAP: 4100 Not tainted (4.18.0-rc2-gcc-7.3.1-00175-g0443f8a69ba3) MSR: 9000000002803031 <SF,HV,VEC,VSX,FP,ME,IR,DR,LE> CR: 44000442 XER: 20000000 CFAR: c00000000000bac8 IRQMASK: c0000000f1e66a80 GPR00: 0000000002000000 00007ffff7150e40 00007fff93a99900 0000000000000020 ... NIP [c0000000000023b4] rfi_flush_fallback+0x34/0x80 LR [0000000010053e00] 0x10053e00 Although the NIP tells us where we were, and the TRAP number tells us what happened, it would still be nicer if we could report the actual exception rather than barfing about the stack pointer. We an do that fairly simply by loading the kernel stack pointer on entry and restoring the user value before returning. That way we see a regular oops such as: Unrecoverable exception 4100 at c00000000000239c Oops: Unrecoverable exception, sig: 6 [#1] LE SMP NR_CPUS=32 NUMA PowerNV Modules linked in: CPU: 0 PID: 1251 Comm: klogd Not tainted 4.18.0-rc3-gcc-7.3.1-00097-g4ebfcac65acd-dirty #40 NIP: c00000000000239c LR: 0000000010053e00 CTR: 0000000000000040 REGS: c0000000f1e17bb0 TRAP: 4100 Not tainted (4.18.0-rc3-gcc-7.3.1-00097-g4ebfcac65acd-dirty) MSR: 9000000002803031 <SF,HV,VEC,VSX,FP,ME,IR,DR,LE> CR: 44000442 XER: 20000000 CFAR: c00000000000bac8 IRQMASK: 0 ... NIP [c00000000000239c] rfi_flush_fallback+0x3c/0x80 LR [0000000010053e00] 0x10053e00 Call Trace: [c0000000f1e17e30] [c00000000000b9e4] system_call+0x5c/0x70 (unreliable) Note this shouldn't make the kernel stack pointer vulnerable to a meltdown attack, because it should be flushed from the cache before we return to userspace. The user r1 value will be in the cache, because we load it in the return path, but that is harmless. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
shenki
pushed a commit
that referenced
this pull request
Oct 21, 2018
[ Upstream commit 3fcbb82 ] In 4.19-rc1, Eugeniy reported weird boot and IO errors on ARC HSDK | INFO: task syslogd:77 blocked for more than 10 seconds. | Not tainted 4.19.0-rc1-00007-gf213acea4e88 #40 | "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this | message. | syslogd D 0 77 76 0x00000000 | | Stack Trace: | __switch_to+0x0/0xac | __schedule+0x1b2/0x730 | io_schedule+0x5c/0xc0 | __lock_page+0x98/0xdc | find_lock_entry+0x38/0x100 | shmem_getpage_gfp.isra.3+0x82/0xbfc | shmem_fault+0x46/0x138 | handle_mm_fault+0x5bc/0x924 | do_page_fault+0x100/0x2b8 | ret_from_exception+0x0/0x8 He bisected to 84c6591 ("locking/atomics, asm-generic/bitops/lock.h: Rewrite using atomic_fetch_*()") This commit however only unmasked the real issue introduced by commit 4aef66c ("locking/atomic, arch/arc: Fix build") which missed the retry-if-scond-failed branch in atomic_fetch_##op() macros. The bisected commit started using atomic_fetch_##op() macros for building the rest of atomics. Fixes: 4aef66c ("locking/atomic, arch/arc: Fix build") Reported-by: Eugeniy Paltsev <paltsev@synopsys.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> [vgupta: wrote changelog] Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
shenki
pushed a commit
that referenced
this pull request
Jun 12, 2019
…text commit 0c9e8b3 upstream. stub_probe() and stub_disconnect() call functions which could call sleeping function in invalid context whil holding busid_lock. Fix the problem by refining the lock holds to short critical sections to change the busid_priv fields. This fix restructures the code to limit the lock holds in stub_probe() and stub_disconnect(). stub_probe(): [15217.927028] BUG: sleeping function called from invalid context at mm/slab.h:418 [15217.927038] in_atomic(): 1, irqs_disabled(): 0, pid: 29087, name: usbip [15217.927044] 5 locks held by usbip/29087: [15217.927047] #0: 0000000091647f28 (sb_writers#6){....}, at: vfs_write+0x191/0x1c0 [15217.927062] #1: 000000008f9ba75b (&of->mutex){....}, at: kernfs_fop_write+0xf7/0x1b0 [15217.927072] #2: 00000000872e5b4b (&dev->mutex){....}, at: __device_driver_lock+0x3b/0x50 [15217.927082] #3: 00000000e74ececc (&dev->mutex){....}, at: __device_driver_lock+0x46/0x50 [15217.927090] #4: 00000000b20abbe0 (&(&busid_table[i].busid_lock)->rlock){....}, at: get_busid_priv+0x48/0x60 [usbip_host] [15217.927103] CPU: 3 PID: 29087 Comm: usbip Tainted: G W 5.1.0-rc6+ #40 [15217.927106] Hardware name: Dell Inc. OptiPlex 790/0HY9JP, BIOS A18 09/24/2013 [15217.927109] Call Trace: [15217.927118] dump_stack+0x63/0x85 [15217.927127] ___might_sleep+0xff/0x120 [15217.927133] __might_sleep+0x4a/0x80 [15217.927143] kmem_cache_alloc_trace+0x1aa/0x210 [15217.927156] stub_probe+0xe8/0x440 [usbip_host] [15217.927171] usb_probe_device+0x34/0x70 stub_disconnect(): [15279.182478] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:908 [15279.182487] in_atomic(): 1, irqs_disabled(): 0, pid: 29114, name: usbip [15279.182492] 5 locks held by usbip/29114: [15279.182494] #0: 0000000091647f28 (sb_writers#6){....}, at: vfs_write+0x191/0x1c0 [15279.182506] #1: 00000000702cf0f3 (&of->mutex){....}, at: kernfs_fop_write+0xf7/0x1b0 [15279.182514] #2: 00000000872e5b4b (&dev->mutex){....}, at: __device_driver_lock+0x3b/0x50 [15279.182522] #3: 00000000e74ececc (&dev->mutex){....}, at: __device_driver_lock+0x46/0x50 [15279.182529] #4: 00000000b20abbe0 (&(&busid_table[i].busid_lock)->rlock){....}, at: get_busid_priv+0x48/0x60 [usbip_host] [15279.182541] CPU: 0 PID: 29114 Comm: usbip Tainted: G W 5.1.0-rc6+ #40 [15279.182543] Hardware name: Dell Inc. OptiPlex 790/0HY9JP, BIOS A18 09/24/2013 [15279.182546] Call Trace: [15279.182554] dump_stack+0x63/0x85 [15279.182561] ___might_sleep+0xff/0x120 [15279.182566] __might_sleep+0x4a/0x80 [15279.182574] __mutex_lock+0x55/0x950 [15279.182582] ? get_busid_priv+0x48/0x60 [usbip_host] [15279.182587] ? reacquire_held_locks+0xec/0x1a0 [15279.182591] ? get_busid_priv+0x48/0x60 [usbip_host] [15279.182597] ? find_held_lock+0x94/0xa0 [15279.182609] mutex_lock_nested+0x1b/0x20 [15279.182614] ? mutex_lock_nested+0x1b/0x20 [15279.182618] kernfs_remove_by_name_ns+0x2a/0x90 [15279.182625] sysfs_remove_file_ns+0x15/0x20 [15279.182629] device_remove_file+0x19/0x20 [15279.182634] stub_disconnect+0x6d/0x180 [usbip_host] [15279.182643] usb_unbind_device+0x27/0x60 Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
shenki
pushed a commit
that referenced
this pull request
May 24, 2021
[ Upstream commit d5027ca ] Ritesh reported a bug [1] against UML, noting that it crashed on startup. The backtrace shows the following (heavily redacted): (gdb) bt ... #26 0x0000000060015b5d in sem_init () at ipc/sem.c:268 #27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2 #28 0x00007f8990ab8fb2 in call_init (...) at dl-init.c:72 ... #40 0x00007f89909bf3a6 in nss_load_library (...) at nsswitch.c:359 ... #44 0x00007f8990895e35 in _nss_compat_getgrnam_r (...) at nss_compat/compat-grp.c:486 #45 0x00007f8990968b85 in __getgrnam_r [...] #46 0x00007f89909d6b77 in grantpt [...] #47 0x00007f8990a9394e in __GI_openpty [...] #48 0x00000000604a1f65 in openpty_cb (...) at arch/um/os-Linux/sigio.c:407 #49 0x00000000604a58d0 in start_idle_thread (...) at arch/um/os-Linux/skas/process.c:598 #50 0x0000000060004a3d in start_uml () at arch/um/kernel/skas/process.c:45 #51 0x00000000600047b2 in linux_main (...) at arch/um/kernel/um_arch.c:334 #52 0x000000006000574f in main (...) at arch/um/os-Linux/main.c:144 indicating that the UML function openpty_cb() calls openpty(), which internally calls __getgrnam_r(), which causes the nsswitch machinery to get started. This loads, through lots of indirection that I snipped, the libcom_err.so.2 library, which (in an unknown function, "??") calls sem_init(). Now, of course it wants to get libpthread's sem_init(), since it's linked against libpthread. However, the dynamic linker looks up that symbol against the binary first, and gets the kernel's sem_init(). Hajime Tazaki noted that "objcopy -L" can localize a symbol, so the dynamic linker wouldn't do the lookup this way. I tried, but for some reason that didn't seem to work. Doing the same thing in the linker script instead does seem to work, though I cannot entirely explain - it *also* works if I just add "VERSION { { global: *; }; }" instead, indicating that something else is happening that I don't really understand. It may be that explicitly doing that marks them with some kind of empty version, and that's different from the default. Explicitly marking them with a version breaks kallsyms, so that doesn't seem to be possible. Marking all the symbols as local seems correct, and does seem to address the issue, so do that. Also do it for static link, nsswitch libraries could still be loaded there. [1] https://bugs.debian.org/983379 Reported-by: Ritesh Raj Sarraf <rrs@debian.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Tested-By: Ritesh Raj Sarraf <rrs@debian.org> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
shenki
pushed a commit
that referenced
this pull request
Jul 16, 2021
[ Upstream commit 1a45200 ] In 'bt878_irq', the driver calls 'tasklet_schedule', but this tasklet is set in 'dvb_bt8xx_load_card' of another driver 'dvb-bt8xx'. However, this two drivers are separate. The user may not load the 'dvb-bt8xx' driver when loading the 'bt8xx' driver, that is, the tasklet has not been initialized when 'tasklet_schedule' is called, so it is necessary to check whether the tasklet is initialized in 'bt878_probe'. Fix this by adding a check at the end of bt878_probe. The KASAN's report reveals it: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 800000006aab2067 P4D 800000006aab2067 PUD 6b2ea067 PMD 0 Oops: 0010 [#1] PREEMPT SMP KASAN PTI CPU: 2 PID: 8724 Comm: syz-executor.0 Not tainted 4.19.177- gdba4159c14ef-dirty #40 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59- gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010: (null) Code: Bad RIP value. RSP: 0018:ffff88806c287ea0 EFLAGS: 00010246 RAX: fffffbfff1b01774 RBX: dffffc0000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 1ffffffff1b01775 RDI: 0000000000000000 RBP: ffff88806c287f00 R08: fffffbfff1b01774 R09: fffffbfff1b01774 R10: 0000000000000001 R11: fffffbfff1b01773 R12: 0000000000000000 R13: ffff88806c29f530 R14: ffffffff8d80bb88 R15: ffffffff8d80bb90 FS: 00007f6b550e6700(0000) GS:ffff88806c280000(0000) knlGS: 0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 000000005ec98000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> tasklet_action_common.isra.17+0x141/0x420 kernel/softirq.c:522 tasklet_action+0x50/0x70 kernel/softirq.c:540 __do_softirq+0x224/0x92c kernel/softirq.c:292 invoke_softirq kernel/softirq.c:372 [inline] irq_exit+0x15a/0x180 kernel/softirq.c:412 exiting_irq arch/x86/include/asm/apic.h:535 [inline] do_IRQ+0x123/0x1e0 arch/x86/kernel/irq.c:260 common_interrupt+0xf/0xf arch/x86/entry/entry_64.S:670 </IRQ> RIP: 0010:__do_sys_interrupt kernel/sys.c:2593 [inline] RIP: 0010:__se_sys_interrupt kernel/sys.c:2584 [inline] RIP: 0010:__x64_sys_interrupt+0x5b/0x80 kernel/sys.c:2584 Code: ba 00 04 00 00 48 c7 c7 c0 99 31 8c e8 ae 76 5e 01 48 85 c0 75 21 e8 14 ae 24 00 48 c7 c3 c0 99 31 8c b8 0c 00 00 00 0f 01 c1 <31> db e8 fe ad 24 00 48 89 d8 5b 5d c3 48 c7 c3 ea ff ff ff eb ec RSP: 0018:ffff888054167f10 EFLAGS: 00000212 ORIG_RAX: ffffffffffffffde RAX: 000000000000000c RBX: ffffffff8c3199c0 RCX: ffffc90001ca6000 RDX: 000000000000001a RSI: ffffffff813478fc RDI: ffffffff8c319dc0 RBP: ffff888054167f18 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000080 R11: fffffbfff18633b7 R12: ffff888054167f58 R13: ffff88805f638000 R14: 0000000000000000 R15: 0000000000000000 do_syscall_64+0xb0/0x4e0 arch/x86/entry/common.c:293 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4692a9 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f6b550e5c48 EFLAGS: 00000246 ORIG_RAX: 000000000000014f RAX: ffffffffffffffda RBX: 000000000077bf60 RCX: 00000000004692a9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000140 RBP: 00000000004cf7eb R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000077bf60 R13: 0000000000000000 R14: 000000000077bf60 R15: 00007fff55a1dca0 Modules linked in: Dumping ftrace buffer: (ftrace buffer empty) CR2: 0000000000000000 ---[ end trace 68e5849c3f77cbb6 ]--- RIP: 0010: (null) Code: Bad RIP value. RSP: 0018:ffff88806c287ea0 EFLAGS: 00010246 RAX: fffffbfff1b01774 RBX: dffffc0000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 1ffffffff1b01775 RDI: 0000000000000000 RBP: ffff88806c287f00 R08: fffffbfff1b01774 R09: fffffbfff1b01774 R10: 0000000000000001 R11: fffffbfff1b01773 R12: 0000000000000000 R13: ffff88806c29f530 R14: ffffffff8d80bb88 R15: ffffffff8d80bb90 FS: 00007f6b550e6700(0000) GS:ffff88806c280000(0000) knlGS: 0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 000000005ec98000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Reported-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
shenki
pushed a commit
that referenced
this pull request
Jun 5, 2022
…frame() [ Upstream commit 9be4c88 ] The following KASAN warning is detected by QEMU. ================================================================== BUG: KASAN: stack-out-of-bounds in unwind_frame+0x508/0x870 Read of size 4 at addr c36bba90 by task cat/163 CPU: 1 PID: 163 Comm: cat Not tainted 5.10.0-rc1 #40 Hardware name: ARM-Versatile Express [<c0113fac>] (unwind_backtrace) from [<c010e71c>] (show_stack+0x10/0x14) [<c010e71c>] (show_stack) from [<c0b805b4>] (dump_stack+0x98/0xb0) [<c0b805b4>] (dump_stack) from [<c0b7d658>] (print_address_description.constprop.0+0x58/0x4bc) [<c0b7d658>] (print_address_description.constprop.0) from [<c031435c>] (kasan_report+0x154/0x170) [<c031435c>] (kasan_report) from [<c0113c44>] (unwind_frame+0x508/0x870) [<c0113c44>] (unwind_frame) from [<c010e298>] (__save_stack_trace+0x110/0x134) [<c010e298>] (__save_stack_trace) from [<c01ce0d8>] (stack_trace_save+0x8c/0xb4) [<c01ce0d8>] (stack_trace_save) from [<c0313520>] (kasan_set_track+0x38/0x60) [<c0313520>] (kasan_set_track) from [<c0314cb8>] (kasan_set_free_info+0x20/0x2c) [<c0314cb8>] (kasan_set_free_info) from [<c0313474>] (__kasan_slab_free+0xec/0x120) [<c0313474>] (__kasan_slab_free) from [<c0311e20>] (kmem_cache_free+0x7c/0x334) [<c0311e20>] (kmem_cache_free) from [<c01c35dc>] (rcu_core+0x390/0xccc) [<c01c35dc>] (rcu_core) from [<c01013a8>] (__do_softirq+0x180/0x518) [<c01013a8>] (__do_softirq) from [<c0135214>] (irq_exit+0x9c/0xe0) [<c0135214>] (irq_exit) from [<c01a40e4>] (__handle_domain_irq+0xb0/0x110) [<c01a40e4>] (__handle_domain_irq) from [<c0691248>] (gic_handle_irq+0xa0/0xb8) [<c0691248>] (gic_handle_irq) from [<c0100b0c>] (__irq_svc+0x6c/0x94) Exception stack(0xc36bb928 to 0xc36bb970) b920: c36bb9c0 00000000 c0126919 c0101228 c36bb9c0 b76d7730 b940: c36b8000 c36bb9a0 c3335b00 c01ce0d8 00000003 c36bba3c c36bb940 c36bb978 b960: c010e298 c011373c 60000013 ffffffff [<c0100b0c>] (__irq_svc) from [<c011373c>] (unwind_frame+0x0/0x870) [<c011373c>] (unwind_frame) from [<00000000>] (0x0) The buggy address belongs to the page: page:(ptrval) refcount:0 mapcount:0 mapping:00000000 index:0x0 pfn:0x636bb flags: 0x0() raw: 00000000 00000000 ef867764 00000000 00000000 00000000 ffffffff 00000000 page dumped because: kasan: bad access detected addr c36bba90 is located in stack of task cat/163 at offset 48 in frame: stack_trace_save+0x0/0xb4 this frame has 1 object: [32, 48) 'trace' Memory state around the buggy address: c36bb980: f1 f1 f1 f1 00 04 f2 f2 00 00 f3 f3 00 00 00 00 c36bba00: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 >c36bba80: 00 00 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 ^ c36bbb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c36bbb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== There is a same issue on x86 and has been resolved by the commit f7d27c3 ("x86/mm, kasan: Silence KASAN warnings in get_wchan()"). The solution could be applied to arm architecture too. Signed-off-by: Lin Yujun <linyujun809@huawei.com> Reported-by: He Ying <heying24@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
amboar
pushed a commit
that referenced
this pull request
Apr 29, 2024
commit 1e56086 upstream. A last minute revert in 6.7-final introduced a potential deadlock when enabling ASPM during probe of Qualcomm PCIe controllers as reported by lockdep: ============================================ WARNING: possible recursive locking detected 6.7.0 #40 Not tainted -------------------------------------------- kworker/u16:5/90 is trying to acquire lock: ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc but task is already holding lock: ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(pci_bus_sem); lock(pci_bus_sem); *** DEADLOCK *** Call trace: print_deadlock_bug+0x25c/0x348 __lock_acquire+0x10a4/0x2064 lock_acquire+0x1e8/0x318 down_read+0x60/0x184 pcie_aspm_pm_state_change+0x58/0xdc pci_set_full_power_state+0xa8/0x114 pci_set_power_state+0xc4/0x120 qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom] pci_walk_bus+0x64/0xbc qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom] The deadlock can easily be reproduced on machines like the Lenovo ThinkPad X13s by adding a delay to increase the race window during asynchronous probe where another thread can take a write lock. Add a new pci_set_power_state_locked() and associated helper functions that can be called with the PCI bus semaphore held to avoid taking the read lock twice. Link: https://lore.kernel.org/r/ZZu0qx2cmn7IwTyQ@hovoldconsulting.com Link: https://lore.kernel.org/r/20240130100243.11011-1-johan+linaro@kernel.org Fixes: f93e71a ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"") Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: <stable@vger.kernel.org> # 6.7 [bhelgaas: backported to v6.6.y, which contains 8cc22ba ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()""), a backport of f93e71a. This omits the drivers/pci/controller/dwc/pcie-qcom.c hunk that updates qcom_pcie_enable_aspm(), which was added by 9f4f3df ("PCI: qcom: Enable ASPM for platforms supporting 1.9.0 ops"), which is not present in v6.6.28.] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dkodihal
pushed a commit
to NVIDIA/linux
that referenced
this pull request
May 7, 2024
Introduce an AMD accumlated power reporting mechanism for the Family 15h, Model 60h processor that can be used to calculate the average power consumed by a processor during a measurement interval. The feature support is indicated by CPUID Fn8000_0007_EDX[12]. This feature will be implemented both in hwmon and perf. The current design provides one event to report per package/processor power consumption by counting each compute unit power value. Here the gory details of how the computation is done: * Tsample: compute unit power accumulator sample period * Tref: the PTSC counter period (PTSC: performance timestamp counter) * N: the ratio of compute unit power accumulator sample period to the PTSC period * Jmax: max compute unit accumulated power which is indicated by MSR_C001007b[MaxCpuSwPwrAcc] * Jx/Jy: compute unit accumulated power which is indicated by MSR_C001007a[CpuSwPwrAcc] * Tx/Ty: the value of performance timestamp counter which is indicated by CU_PTSC MSR_C0010280[PTSC] * PwrCPUave: CPU average power i. Determine the ratio of Tsample to Tref by executing CPUID Fn8000_0007. N = value of CPUID Fn8000_0007_ECX[CpuPwrSampleTimeRatio[15:0]]. ii. Read the full range of the cumulative energy value from the new MSR MaxCpuSwPwrAcc. Jmax = value returned. iii. At time x, software reads CpuSwPwrAcc and samples the PTSC. Jx = value read from CpuSwPwrAcc and Tx = value read from PTSC. iv. At time y, software reads CpuSwPwrAcc and samples the PTSC. Jy = value read from CpuSwPwrAcc and Ty = value read from PTSC. v. Calculate the average power consumption for a compute unit over time period (y-x). Unit of result is uWatt: if (Jy < Jx) // Rollover has occurred Jdelta = (Jy + Jmax) - Jx else Jdelta = Jy - Jx PwrCPUave = N * Jdelta * 1000 / (Ty - Tx) Simple example: root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' make -j4 CHK include/config/kernel.release CHK include/generated/uapi/linux/version.h CHK include/generated/utsrelease.h CHK include/generated/timeconst.h CHK include/generated/bounds.h CHK include/generated/asm-offsets.h CALL scripts/checksyscalls.sh CHK include/generated/compile.h SKIPPED include/generated/compile.h Building modules, stage 2. Kernel: arch/x86/boot/bzImage is ready (openbmc#40) MODPOST 4225 modules Performance counter stats for 'system wide': 183.44 mWatts power/power-pkg/ 341.837270111 seconds time elapsed root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' sleep 10 Performance counter stats for 'system wide': 0.18 mWatts power/power-pkg/ 10.012551815 seconds time elapsed Suggested-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Ingo Molnar <mingo@kernel.org> Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: jacob.w.shin@gmail.com Link: http://lkml.kernel.org/r/1457502306-2559-1-git-send-email-ray.huang@amd.com [ Fixed the modular build. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Signed-off-by: Norman James nkskjames@gmail.com