-
-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
4.11 does not load rootfs from mmc #12
Comments
Hmm... I remember I tried once to do such, though I don't remember results. When I have time to check I will inform you here. Not much time for this anymore, sorry. |
Yes, I believe you explained me before I need an initramfs, load some drivers, mount the real rootfs and then switch_root. |
It's my lucky day. I have yocto building your kernel and create .deb packages. It then installs these to the initramfs. I was missing - of course - the sdhci modules. The module packages installed into the rootfs now are: That changes everything:
So all I need to do now is fix the init script. After I clean up the mess a bit I will push the recipies to my repository. Closing this issue. |
I fixed my initramfs init script and my rootfs boots nicely (well with a small hickup as it expects nfsd, but the kernel module is not enabled in defconfig). Wow |
This patch closes a long standing race in configfs between the creation of a new symlink in create_link(), while the symlink target's config_item is being concurrently removed via configfs_rmdir(). This can happen because the symlink target's reference is obtained by config_item_get() in create_link() before the CONFIGFS_USET_DROPPING bit set by configfs_detach_prep() during configfs_rmdir() shutdown is actually checked.. This originally manifested itself on ppc64 on v4.8.y under heavy load using ibmvscsi target ports with Novalink API: [ 7877.289863] rpadlpar_io: slot U8247.22L.212A91A-V1-C8 added [ 7879.893760] ------------[ cut here ]------------ [ 7879.893768] WARNING: CPU: 15 PID: 17585 at ./include/linux/kref.h:46 config_item_get+0x7c/0x90 [configfs] [ 7879.893811] CPU: 15 PID: 17585 Comm: targetcli Tainted: G O 4.8.17-customv2.22 andy-shev#12 [ 7879.893812] task: c00000018a0d3400 task.stack: c0000001f3b40000 [ 7879.893813] NIP: d000000002c664ec LR: d000000002c60980 CTR: c000000000b70870 [ 7879.893814] REGS: c0000001f3b43810 TRAP: 0700 Tainted: G O (4.8.17-customv2.22) [ 7879.893815] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 28222242 XER: 00000000 [ 7879.893820] CFAR: d000000002c664bc SOFTE: 1 GPR00: d000000002c60980 c0000001f3b43a90 d000000002c70908 c0000000fbc06820 GPR04: c0000001ef1bd900 0000000000000004 0000000000000001 0000000000000000 GPR08: 0000000000000000 0000000000000001 d000000002c69560 d000000002c66d80 GPR12: c000000000b70870 c00000000e798700 c0000001f3b43ca0 c0000001d4949d40 GPR16: c00000014637e1c0 0000000000000000 0000000000000000 c0000000f2392940 GPR20: c0000001f3b43b98 0000000000000041 0000000000600000 0000000000000000 GPR24: fffffffffffff000 0000000000000000 d000000002c60be0 c0000001f1dac490 GPR28: 0000000000000004 0000000000000000 c0000001ef1bd900 c0000000f2392940 [ 7879.893839] NIP [d000000002c664ec] config_item_get+0x7c/0x90 [configfs] [ 7879.893841] LR [d000000002c60980] check_perm+0x80/0x2e0 [configfs] [ 7879.893842] Call Trace: [ 7879.893844] [c0000001f3b43ac0] [d000000002c60980] check_perm+0x80/0x2e0 [configfs] [ 7879.893847] [c0000001f3b43b10] [c000000000329770] do_dentry_open+0x2c0/0x460 [ 7879.893849] [c0000001f3b43b70] [c000000000344480] path_openat+0x210/0x1490 [ 7879.893851] [c0000001f3b43c80] [c00000000034708c] do_filp_open+0xfc/0x170 [ 7879.893853] [c0000001f3b43db0] [c00000000032b5bc] do_sys_open+0x1cc/0x390 [ 7879.893856] [c0000001f3b43e30] [c000000000009584] system_call+0x38/0xec [ 7879.893856] Instruction dump: [ 7879.893858] 409d0014 38210030 e8010010 7c0803a6 4e800020 3d220000 e94981e0 892a0000 [ 7879.893861] 2f890000 409effe0 39200001 992a0000 <0fe00000> 4bffffd0 60000000 60000000 [ 7879.893866] ---[ end trace 14078f0b3b5ad0aa ]--- To close this race, go ahead and obtain the symlink's target config_item reference only after the existing CONFIGFS_USET_DROPPING check succeeds. This way, if configfs_rmdir() wins create_link() will return -ENONET, and if create_link() wins configfs_rmdir() will return -EBUSY. Reported-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org
The following lockdep splat has been noticed during LTP testing ====================================================== WARNING: possible circular locking dependency detected 4.13.0-rc3-next-20170807 #12 Not tainted ------------------------------------------------------ a.out/4771 is trying to acquire lock: (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff812b4668>] drain_all_stock.part.35+0x18/0x140 but task is already holding lock: (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&mm->mmap_sem){++++++}: lock_acquire+0xc9/0x230 __might_fault+0x70/0xa0 _copy_to_user+0x23/0x70 filldir+0xa7/0x110 xfs_dir2_sf_getdents.isra.10+0x20c/0x2c0 [xfs] xfs_readdir+0x1fa/0x2c0 [xfs] xfs_file_readdir+0x30/0x40 [xfs] iterate_dir+0x17a/0x1a0 SyS_getdents+0xb0/0x160 entry_SYSCALL_64_fastpath+0x1f/0xbe -> #2 (&type->i_mutex_dir_key#3){++++++}: lock_acquire+0xc9/0x230 down_read+0x51/0xb0 lookup_slow+0xde/0x210 walk_component+0x160/0x250 link_path_walk+0x1a6/0x610 path_openat+0xe4/0xd50 do_filp_open+0x91/0x100 file_open_name+0xf5/0x130 filp_open+0x33/0x50 kernel_read_file_from_path+0x39/0x80 _request_firmware+0x39f/0x880 request_firmware_direct+0x37/0x50 request_microcode_fw+0x64/0xe0 reload_store+0xf7/0x180 dev_attr_store+0x18/0x30 sysfs_kf_write+0x44/0x60 kernfs_fop_write+0x113/0x1a0 __vfs_write+0x37/0x170 vfs_write+0xc7/0x1c0 SyS_write+0x58/0xc0 do_syscall_64+0x6c/0x1f0 return_from_SYSCALL_64+0x0/0x7a -> #1 (microcode_mutex){+.+.+.}: lock_acquire+0xc9/0x230 __mutex_lock+0x88/0x960 mutex_lock_nested+0x1b/0x20 microcode_init+0xbb/0x208 do_one_initcall+0x51/0x1a9 kernel_init_freeable+0x208/0x2a7 kernel_init+0xe/0x104 ret_from_fork+0x2a/0x40 -> #0 (cpu_hotplug_lock.rw_sem){++++++}: __lock_acquire+0x153c/0x1550 lock_acquire+0xc9/0x230 cpus_read_lock+0x4b/0x90 drain_all_stock.part.35+0x18/0x140 try_charge+0x3ab/0x6e0 mem_cgroup_try_charge+0x7f/0x2c0 shmem_getpage_gfp+0x25f/0x1050 shmem_fault+0x96/0x200 __do_fault+0x1e/0xa0 __handle_mm_fault+0x9c3/0xe00 handle_mm_fault+0x16e/0x380 __do_page_fault+0x24a/0x530 do_page_fault+0x30/0x80 page_fault+0x28/0x30 other info that might help us debug this: Chain exists of: cpu_hotplug_lock.rw_sem --> &type->i_mutex_dir_key#3 --> &mm->mmap_sem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_sem); lock(&type->i_mutex_dir_key#3); lock(&mm->mmap_sem); lock(cpu_hotplug_lock.rw_sem); *** DEADLOCK *** 2 locks held by a.out/4771: #0: (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530 #1: (percpu_charge_mutex){+.+...}, at: [<ffffffff812b4c97>] try_charge+0x397/0x6e0 The problem is very similar to the one fixed by commit a459eeb ("mm, page_alloc: do not depend on cpu hotplug locks inside the allocator"). We are taking hotplug locks while we can be sitting on top of basically arbitrary locks. This just calls for problems. We can get rid of {get,put}_online_cpus, fortunately. We do not have to be worried about races with memory hotplug because drain_local_stock, which is called from both the WQ draining and the memory hotplug contexts, is always operating on the local cpu stock with IRQs disabled. The only thing to be careful about is that the target memcg doesn't vanish while we are still in drain_all_stock so take a reference on it. Link: http://lkml.kernel.org/r/20170913090023.28322-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Artem Savkov <asavkov@redhat.com> Tested-by: Artem Savkov <asavkov@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
…ns[] When adding GP-1-28 port pin support it was forgotten to remove the CLKOUT pin from the list of pins that are not associated with a GPIO port in pinmux_pins[]. This results in a warning when reading the pinctrl files in sysfs as the CLKOUT pin is still added as a none GPIO pin. Fix this by removing the duplicated entry which is no longer needed. ~ # cat /sys/kernel/debug/pinctrl/e6060000.pin-controller/pinconf-pins [ 89.432081] ------------[ cut here ]------------ [ 89.436904] Pin 496 is not in bias info list [ 89.441252] WARNING: CPU: 1 PID: 456 at drivers/pinctrl/sh-pfc/core.c:408 sh_pfc_pin_to_bias_reg+0xb0/0xb8 [ 89.451002] CPU: 1 PID: 456 Comm: cat Not tainted 4.16.0-rc1-arm64-renesas-00048-gdfafc344a4f24dde andy-shev#12 [ 89.460394] Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT) [ 89.468910] pstate: 80000085 (Nzcv daIf -PAN -UAO) [ 89.473747] pc : sh_pfc_pin_to_bias_reg+0xb0/0xb8 [ 89.478495] lr : sh_pfc_pin_to_bias_reg+0xb0/0xb8 [ 89.483241] sp : ffff00000aff3ab0 [ 89.486587] x29: ffff00000aff3ab0 x28: ffff00000893c698 [ 89.491955] x27: ffff000008ad7d98 x26: 0000000000000000 [ 89.497323] x25: ffff8006fb3f5028 x24: ffff8006fb3f5018 [ 89.502690] x23: 0000000000000001 x22: 00000000000001f0 [ 89.508057] x21: ffff8006fb3f5018 x20: ffff000008bef000 [ 89.513423] x19: 0000000000000000 x18: ffffffffffffffff [ 89.518790] x17: 0000000000006c4a x16: ffff000008d67c98 [ 89.524157] x15: 0000000000000001 x14: ffff00000896ca98 [ 89.529524] x13: 00000000cce5f611 x12: ffff8006f8d3b5a8 [ 89.534891] x11: ffff00000981e000 x10: ffff000008befa08 [ 89.540258] x9 : ffff8006f9b987a0 x8 : ffff000008befa08 [ 89.545625] x7 : ffff000008137094 x6 : 0000000000000000 [ 89.550991] x5 : 0000000000000000 x4 : 0000000000000001 [ 89.556357] x3 : 0000000000000007 x2 : 0000000000000007 [ 89.561723] x1 : 1ff24f80f1818600 x0 : 0000000000000000 [ 89.567091] Call trace: [ 89.569561] sh_pfc_pin_to_bias_reg+0xb0/0xb8 [ 89.573960] r8a7795_pinmux_get_bias+0x30/0xc0 [ 89.578445] sh_pfc_pinconf_get+0x1e0/0x2d8 [ 89.582669] pin_config_get_for_pin+0x20/0x30 [ 89.587067] pinconf_generic_dump_one+0x180/0x1c8 [ 89.591815] pinconf_generic_dump_pins+0x84/0xd8 [ 89.596476] pinconf_pins_show+0xc8/0x130 [ 89.600528] seq_read+0xe4/0x510 [ 89.603789] full_proxy_read+0x60/0x90 [ 89.607576] __vfs_read+0x30/0x140 [ 89.611010] vfs_read+0x90/0x170 [ 89.614269] SyS_read+0x60/0xd8 [ 89.617443] __sys_trace_return+0x0/0x4 [ 89.621314] ---[ end trace 99c8d0d39c13e794 ]--- Fixes: 82d2de5 ("pinctrl: sh-pfc: r8a7795: Add GP-1-28 port pin support") Reviewed-and-tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Scheduler debug stats include newlines that display out of alignment when prefixed by timestamps. For example, the dmesg utility: % echo t > /proc/sysrq-trigger % dmesg ... [ 83.124251] runnable tasks: S task PID tree-key switches prio wait-time sum-exec sum-sleep ----------------------------------------------------------------------------------------------------------- At the same time, some syslog utilities (like rsyslog by default) don't like the additional newlines control characters, saving lines like this to /var/log/messages: Mar 16 16:02:29 localhost kernel: #012runnable tasks:andy-shev#12 S task PID tree-key ... ^^^^ ^^^^ Clean these up by moving newline characters to their own SEQ_printf invocation. This leaves the /proc/sched_debug unchanged, but brings the entire output into alignment when prefixed: % echo t > /proc/sysrq-trigger % dmesg ... [ 62.410368] runnable tasks: [ 62.410368] S task PID tree-key switches prio wait-time sum-exec sum-sleep [ 62.410369] ----------------------------------------------------------------------------------------------------------- [ 62.410369] I kworker/u12:0 5 1932.215593 332 120 0.000000 3.621252 0.000000 0 0 / and no escaped control characters from rsyslog in /var/log/messages: Mar 16 16:15:06 localhost kernel: runnable tasks: Mar 16 16:15:06 localhost kernel: S task PID tree-key ... Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1521484555-8620-3-git-send-email-joe.lawrence@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
The seg6_build_state() function is called with RCU read lock held, so we cannot use GFP_KERNEL. This patch uses GFP_ATOMIC instead. [ 92.770271] ============================= [ 92.770628] WARNING: suspicious RCU usage [ 92.770921] 4.16.0-rc4+ andy-shev#12 Not tainted [ 92.771277] ----------------------------- [ 92.771585] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section! [ 92.772279] [ 92.772279] other info that might help us debug this: [ 92.772279] [ 92.773067] [ 92.773067] rcu_scheduler_active = 2, debug_locks = 1 [ 92.773514] 2 locks held by ip/2413: [ 92.773765] #0: (rtnl_mutex){+.+.}, at: [<00000000e5461720>] rtnetlink_rcv_msg+0x441/0x4d0 [ 92.774377] andy-shev#1: (rcu_read_lock){....}, at: [<00000000df4f161e>] lwtunnel_build_state+0x59/0x210 [ 92.775065] [ 92.775065] stack backtrace: [ 92.775371] CPU: 0 PID: 2413 Comm: ip Not tainted 4.16.0-rc4+ andy-shev#12 [ 92.775791] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014 [ 92.776608] Call Trace: [ 92.776852] dump_stack+0x7d/0xbc [ 92.777130] __schedule+0x133/0xf00 [ 92.777393] ? unwind_get_return_address_ptr+0x50/0x50 [ 92.777783] ? __sched_text_start+0x8/0x8 [ 92.778073] ? rcu_is_watching+0x19/0x30 [ 92.778383] ? kernel_text_address+0x49/0x60 [ 92.778800] ? __kernel_text_address+0x9/0x30 [ 92.779241] ? unwind_get_return_address+0x29/0x40 [ 92.779727] ? pcpu_alloc+0x102/0x8f0 [ 92.780101] _cond_resched+0x23/0x50 [ 92.780459] __mutex_lock+0xbd/0xad0 [ 92.780818] ? pcpu_alloc+0x102/0x8f0 [ 92.781194] ? seg6_build_state+0x11d/0x240 [ 92.781611] ? save_stack+0x9b/0xb0 [ 92.781965] ? __ww_mutex_wakeup_for_backoff+0xf0/0xf0 [ 92.782480] ? seg6_build_state+0x11d/0x240 [ 92.782925] ? lwtunnel_build_state+0x1bd/0x210 [ 92.783393] ? ip6_route_info_create+0x687/0x1640 [ 92.783846] ? ip6_route_add+0x74/0x110 [ 92.784236] ? inet6_rtm_newroute+0x8a/0xd0 Fixes: 6c8702c ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Signed-off-by: David Lebrun <dlebrun@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The spinlock in the raw3270_view structure is used by con3270, tty3270 and fs3270 in different ways. For con3270 the lock can be acquired in irq context, for tty3270 and fs3270 the highest context is bh. Lockdep sees the view->lock as a single class and if the 3270 driver is used for the console the following message is generated: WARNING: inconsistent lock state 5.1.0-rc3-05157-g5c168033979d andy-shev#12 Not tainted -------------------------------- inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. swapper/0/1 [HC0[0]:SC1[1]:HE1:SE0] takes: (____ptrval____) (&(&view->lock)->rlock){?.-.}, at: tty3270_update+0x7c/0x330 Introduce a lockdep subclass for the view lock to distinguish bh from irq locks. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
By calling maps__insert() we assume to get 2 references on the map, which we relese within maps__remove call. However if there's already same map name, we currently don't bump the reference and can crash, like: Program received signal SIGABRT, Aborted. 0x00007ffff75e60f5 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x00007ffff75e60f5 in raise () from /lib64/libc.so.6 andy-shev#1 0x00007ffff75d0895 in abort () from /lib64/libc.so.6 andy-shev#2 0x00007ffff75d0769 in __assert_fail_base.cold () from /lib64/libc.so.6 andy-shev#3 0x00007ffff75de596 in __assert_fail () from /lib64/libc.so.6 andy-shev#4 0x00000000004fc006 in refcount_sub_and_test (i=1, r=0x1224e88) at tools/include/linux/refcount.h:131 andy-shev#5 refcount_dec_and_test (r=0x1224e88) at tools/include/linux/refcount.h:148 andy-shev#6 map__put (map=0x1224df0) at util/map.c:299 andy-shev#7 0x00000000004fdb95 in __maps__remove (map=0x1224df0, maps=0xb17d80) at util/map.c:953 andy-shev#8 maps__remove (maps=0xb17d80, map=0x1224df0) at util/map.c:959 andy-shev#9 0x00000000004f7d8a in map_groups__remove (map=<optimized out>, mg=<optimized out>) at util/map_groups.h:65 andy-shev#10 machine__process_ksymbol_unregister (sample=<optimized out>, event=0x7ffff7279670, machine=<optimized out>) at util/machine.c:728 andy-shev#11 machine__process_ksymbol (machine=<optimized out>, event=0x7ffff7279670, sample=<optimized out>) at util/machine.c:741 andy-shev#12 0x00000000004fffbb in perf_session__deliver_event (session=0xb11390, event=0x7ffff7279670, tool=0x7fffffffc7b0, file_offset=13936) at util/session.c:1362 andy-shev#13 0x00000000005039bb in do_flush (show_progress=false, oe=0xb17e80) at util/ordered-events.c:243 andy-shev#14 __ordered_events__flush (oe=0xb17e80, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:322 andy-shev#15 0x00000000005005e4 in perf_session__process_user_event (session=session@entry=0xb11390, event=event@entry=0x7ffff72a4af8, ... Add the map to the list and getting the reference event if we find the map with same name. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Saint-Etienne <eric.saint.etienne@oracle.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Fixes: 1e62856 ("perf symbols: Fix slowness due to -ffunction-section") Link: http://lkml.kernel.org/r/20190416160127.30203-10-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This fix is for a failure that occurred in the DWARF unwind perf test. Stack unwinders may probe memory when looking for frames. Memory sanitizer will poison and track uninitialized memory on the stack, and on the heap if the value is copied to the heap. This can lead to false memory sanitizer failures for the use of an uninitialized value. Avoid this problem by removing the poison on the copied stack. The full msan failure with track origins looks like: ==2168==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x559ceb10755b in handle_cfi elfutils/libdwfl/frame_unwind.c:648:8 andy-shev#1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 andy-shev#2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 andy-shev#3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 andy-shev#4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 andy-shev#5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 andy-shev#6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 andy-shev#7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 andy-shev#8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 andy-shev#9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 andy-shev#10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 andy-shev#11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) andy-shev#12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 andy-shev#13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 andy-shev#14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 andy-shev#15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 andy-shev#16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 andy-shev#17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 andy-shev#18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 andy-shev#19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 andy-shev#20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 andy-shev#21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 andy-shev#22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 andy-shev#23 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559ceb106acf in __libdwfl_frame_reg_set elfutils/libdwfl/frame_unwind.c:77:22 andy-shev#1 0x559ceb106acf in handle_cfi elfutils/libdwfl/frame_unwind.c:627:13 andy-shev#2 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 andy-shev#3 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 andy-shev#4 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 andy-shev#5 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 andy-shev#6 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 andy-shev#7 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 andy-shev#8 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 andy-shev#9 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 andy-shev#10 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 andy-shev#11 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 andy-shev#12 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) andy-shev#13 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 andy-shev#14 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 andy-shev#15 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 andy-shev#16 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 andy-shev#17 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 andy-shev#18 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 andy-shev#19 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 andy-shev#20 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 andy-shev#21 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 andy-shev#22 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 andy-shev#23 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 andy-shev#24 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559ceb106a54 in handle_cfi elfutils/libdwfl/frame_unwind.c:613:9 andy-shev#1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 andy-shev#2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 andy-shev#3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 andy-shev#4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 andy-shev#5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 andy-shev#6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 andy-shev#7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 andy-shev#8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 andy-shev#9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 andy-shev#10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 andy-shev#11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) andy-shev#12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 andy-shev#13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 andy-shev#14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 andy-shev#15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 andy-shev#16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 andy-shev#17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 andy-shev#18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 andy-shev#19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 andy-shev#20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 andy-shev#21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 andy-shev#22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 andy-shev#23 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559ceaff8800 in memory_read tools/perf/util/unwind-libdw.c:156:10 andy-shev#1 0x559ceb10f053 in expr_eval elfutils/libdwfl/frame_unwind.c:501:13 andy-shev#2 0x559ceb1060cc in handle_cfi elfutils/libdwfl/frame_unwind.c:603:18 andy-shev#3 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 andy-shev#4 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 andy-shev#5 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 andy-shev#6 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 andy-shev#7 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 andy-shev#8 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 andy-shev#9 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 andy-shev#10 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 andy-shev#11 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 andy-shev#12 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 andy-shev#13 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) andy-shev#14 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 andy-shev#15 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 andy-shev#16 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 andy-shev#17 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 andy-shev#18 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 andy-shev#19 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 andy-shev#20 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 andy-shev#21 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 andy-shev#22 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 andy-shev#23 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 andy-shev#24 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 andy-shev#25 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559cea9027d9 in __msan_memcpy llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1558:3 andy-shev#1 0x559cea9d2185 in sample_ustack tools/perf/arch/x86/tests/dwarf-unwind.c:41:2 andy-shev#2 0x559cea9d202c in test__arch_unwind_sample tools/perf/arch/x86/tests/dwarf-unwind.c:72:9 andy-shev#3 0x559ceabc9cbd in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:106:6 andy-shev#4 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 andy-shev#5 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) andy-shev#6 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 andy-shev#7 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 andy-shev#8 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 andy-shev#9 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 andy-shev#10 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 andy-shev#11 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 andy-shev#12 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 andy-shev#13 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 andy-shev#14 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 andy-shev#15 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 andy-shev#16 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 andy-shev#17 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was created by an allocation of 'bf' in the stack frame of function 'perf_event__synthesize_mmap_events' #0 0x559ceafc5f60 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:445 SUMMARY: MemorySanitizer: use-of-uninitialized-value elfutils/libdwfl/frame_unwind.c:648:8 in handle_cfi Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: clang-built-linux@googlegroups.com Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandeep Dasgupta <sdasgup@google.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201113182053.754625-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
crq->msgs could be NULL if the previous reset did not complete after freeing crq->msgs. Check for NULL before dereferencing them. Snippet of call trace: ... ibmvnic 30000003 env3 (unregistering): Releasing sub-CRQ ibmvnic 30000003 env3 (unregistering): Releasing CRQ BUG: Kernel NULL pointer dereference on read at 0x00000000 Faulting instruction address: 0xc0000000000c1a30 Oops: Kernel access of bad area, sig: 11 [andy-shev#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: ibmvnic(E-) rpadlpar_io rpaphp xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables xsk_diag tcp_diag udp_diag tun raw_diag inet_diag unix_diag bridge af_packet_diag netlink_diag stp llc rfkill sunrpc pseries_rng xts vmx_crypto uio_pdrv_genirq uio binfmt_misc ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ibmvnic] CPU: 20 PID: 8426 Comm: kworker/20:0 Tainted: G E 5.10.0-rc1+ andy-shev#12 Workqueue: events __ibmvnic_reset [ibmvnic] NIP: c0000000000c1a30 LR: c008000001b00c18 CTR: 0000000000000400 REGS: c00000000d05b7a0 TRAP: 0380 Tainted: G E (5.10.0-rc1+) MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 44002480 XER: 20040000 CFAR: c0000000000c19ec IRQMASK: 0 GPR00: 0000000000000400 c00000000d05ba30 c008000001b17c00 0000000000000000 GPR04: 0000000000000000 0000000000000000 0000000000000000 00000000000001e2 GPR08: 000000000001f400 ffffffffffffd950 0000000000000000 c008000001b0b280 GPR12: c0000000000c19c8 c00000001ec72e00 c00000000019a778 c00000002647b440 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000006 0000000000000001 0000000000000003 0000000000000002 GPR24: 0000000000001000 c008000001b0d570 0000000000000005 c00000007ab5d550 GPR28: c00000007ab5c000 c000000032fcf848 c00000007ab5cc00 c000000032fcf800 NIP [c0000000000c1a30] memset+0x68/0x104 LR [c008000001b00c18] ibmvnic_reset_crq+0x70/0x110 [ibmvnic] Call Trace: [c00000000d05ba30] [0000000000000800] 0x800 (unreliable) [c00000000d05bab0] [c008000001b0a930] do_reset.isra.40+0x224/0x634 [ibmvnic] [c00000000d05bb80] [c008000001b08574] __ibmvnic_reset+0x17c/0x3c0 [ibmvnic] [c00000000d05bc50] [c00000000018d9ac] process_one_work+0x2cc/0x800 [c00000000d05bd20] [c00000000018df58] worker_thread+0x78/0x520 [c00000000d05bdb0] [c00000000019a934] kthread+0x1c4/0x1d0 [c00000000d05be20] [c00000000000d5d0] ret_from_kernel_thread+0x5c/0x6c Fixes: 032c5e8 ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Prior to sanitizing the GGTT, the only operations allowed in intel_display_init_nogem() are those to reserve the preallocated (and active) regions in the GGTT leftover from the BIOS. Trying to allocate a GGTT vma (such as intel_pin_and_fence_fb_obj during the initial modeset) may then conflict with other preallocated regions that have not yet been protected. Move the initial modesetting from the end of init_nogem to the beginning of init so that any vma pinning (either framebuffers or DSB, for example), is after the GGTT is ready to handle it. This will prevent the DSB object from being destroyed too early: [ 53.449241] BUG: KASAN: use-after-free in i915_init_ggtt+0x324/0x9e0 [i915] [ 53.449309] Read of size 8 at addr ffff88811b1e8070 by task systemd-udevd/345 [ 53.449399] CPU: 1 PID: 345 Comm: systemd-udevd Tainted: G W 5.10.0-rc5+ andy-shev#12 [ 53.449409] Call Trace: [ 53.449418] dump_stack+0x9a/0xcc [ 53.449558] ? i915_init_ggtt+0x324/0x9e0 [i915] [ 53.449565] print_address_description.constprop.0+0x3e/0x60 [ 53.449577] ? _raw_spin_lock_irqsave+0x4e/0x50 [ 53.449718] ? i915_init_ggtt+0x324/0x9e0 [i915] [ 53.449849] ? i915_init_ggtt+0x324/0x9e0 [i915] [ 53.449857] kasan_report.cold+0x1f/0x37 [ 53.449993] ? i915_init_ggtt+0x324/0x9e0 [i915] [ 53.450130] i915_init_ggtt+0x324/0x9e0 [i915] [ 53.450273] ? i915_ggtt_suspend+0x1f0/0x1f0 [i915] [ 53.450281] ? static_obj+0x69/0x80 [ 53.450289] ? lockdep_init_map_waits+0xa9/0x310 [ 53.450431] ? intel_wopcm_init+0x96/0x3d0 [i915] [ 53.450581] ? i915_gem_init+0x75/0x2d0 [i915] [ 53.450720] i915_gem_init+0x75/0x2d0 [i915] [ 53.450852] i915_driver_probe+0x8c2/0x1210 [i915] [ 53.450993] ? i915_pm_prepare+0x630/0x630 [i915] [ 53.451006] ? check_chain_key+0x1e7/0x2e0 [ 53.451025] ? __pm_runtime_resume+0x58/0xb0 [ 53.451157] i915_pci_probe+0xa6/0x2b0 [i915] [ 53.451285] ? i915_pci_remove+0x40/0x40 [i915] [ 53.451295] ? lockdep_hardirqs_on_prepare+0x124/0x230 [ 53.451302] ? _raw_spin_unlock_irqrestore+0x42/0x50 [ 53.451309] ? lockdep_hardirqs_on+0xbf/0x130 [ 53.451315] ? preempt_count_sub+0xf/0xb0 [ 53.451321] ? _raw_spin_unlock_irqrestore+0x2f/0x50 [ 53.451335] pci_device_probe+0xf9/0x190 [ 53.451350] really_probe+0x17f/0x5b0 [ 53.451365] driver_probe_device+0x13a/0x1c0 [ 53.451376] device_driver_attach+0x82/0x90 [ 53.451386] ? device_driver_attach+0x90/0x90 [ 53.451391] __driver_attach+0xab/0x190 [ 53.451401] ? device_driver_attach+0x90/0x90 [ 53.451407] bus_for_each_dev+0xe4/0x140 [ 53.451414] ? subsys_dev_iter_exit+0x10/0x10 [ 53.451423] ? __list_add_valid+0x2b/0xa0 [ 53.451440] bus_add_driver+0x227/0x2e0 [ 53.451454] driver_register+0xd3/0x150 [ 53.451585] i915_init+0x92/0xac [i915] [ 53.451592] ? 0xffffffffa0a20000 [ 53.451598] do_one_initcall+0xb6/0x3b0 [ 53.451606] ? trace_event_raw_event_initcall_finish+0x150/0x150 [ 53.451614] ? __kasan_kmalloc.constprop.0+0xc2/0xd0 [ 53.451627] ? kmem_cache_alloc_trace+0x4a4/0x8e0 [ 53.451634] ? kasan_unpoison_shadow+0x33/0x40 [ 53.451649] do_init_module+0xf8/0x350 [ 53.451662] load_module+0x43de/0x47f0 [ 53.451716] ? module_frob_arch_sections+0x20/0x20 [ 53.451731] ? rw_verify_area+0x5f/0x130 [ 53.451780] ? __do_sys_finit_module+0x10d/0x1a0 [ 53.451785] __do_sys_finit_module+0x10d/0x1a0 [ 53.451792] ? __ia32_sys_init_module+0x40/0x40 [ 53.451800] ? seccomp_do_user_notification.isra.0+0x5c0/0x5c0 [ 53.451829] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 53.451835] ? mark_held_locks+0x24/0x90 [ 53.451856] do_syscall_64+0x33/0x80 [ 53.451863] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.451868] RIP: 0033:0x7fde09b4470d [ 53.451875] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 53 f7 0c 00 f7 d8 64 89 01 48 [ 53.451880] RSP: 002b:00007ffd6abc1718 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 53.451890] RAX: ffffffffffffffda RBX: 000056444e528150 RCX: 00007fde09b4470d [ 53.451895] RDX: 0000000000000000 RSI: 00007fde09a21ded RDI: 000000000000000f [ 53.451899] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000 [ 53.451904] R10: 000000000000000f R11: 0000000000000246 R12: 00007fde09a21ded [ 53.451909] R13: 0000000000000000 R14: 000056444e329200 R15: 000056444e528150 [ 53.451957] Allocated by task 345: [ 53.451995] kasan_save_stack+0x1b/0x40 [ 53.452001] __kasan_kmalloc.constprop.0+0xc2/0xd0 [ 53.452006] kmem_cache_alloc+0x1cd/0x8d0 [ 53.452146] i915_vma_instance+0x126/0xb70 [i915] [ 53.452304] i915_gem_object_ggtt_pin_ww+0x222/0x3f0 [i915] [ 53.452446] intel_dsb_prepare+0x14f/0x230 [i915] [ 53.452588] intel_atomic_commit+0x183/0x690 [i915] [ 53.452730] intel_initial_commit+0x2bc/0x2f0 [i915] [ 53.452871] intel_modeset_init_nogem+0xa02/0x2af0 [i915] [ 53.452995] i915_driver_probe+0x8af/0x1210 [i915] [ 53.453120] i915_pci_probe+0xa6/0x2b0 [i915] [ 53.453125] pci_device_probe+0xf9/0x190 [ 53.453131] really_probe+0x17f/0x5b0 [ 53.453136] driver_probe_device+0x13a/0x1c0 [ 53.453142] device_driver_attach+0x82/0x90 [ 53.453148] __driver_attach+0xab/0x190 [ 53.453153] bus_for_each_dev+0xe4/0x140 [ 53.453158] bus_add_driver+0x227/0x2e0 [ 53.453164] driver_register+0xd3/0x150 [ 53.453286] i915_init+0x92/0xac [i915] [ 53.453292] do_one_initcall+0xb6/0x3b0 [ 53.453297] do_init_module+0xf8/0x350 [ 53.453302] load_module+0x43de/0x47f0 [ 53.453307] __do_sys_finit_module+0x10d/0x1a0 [ 53.453312] do_syscall_64+0x33/0x80 [ 53.453318] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.453345] Freed by task 82: [ 53.453379] kasan_save_stack+0x1b/0x40 [ 53.453384] kasan_set_track+0x1c/0x30 [ 53.453389] kasan_set_free_info+0x1b/0x30 [ 53.453394] __kasan_slab_free+0x112/0x160 [ 53.453399] kmem_cache_free+0xb2/0x3f0 [ 53.453536] i915_gem_flush_free_objects+0x31a/0x3b0 [i915] [ 53.453542] process_one_work+0x519/0x9f0 [ 53.453547] worker_thread+0x75/0x5c0 [ 53.453552] kthread+0x1da/0x230 [ 53.453557] ret_from_fork+0x22/0x30 [ 53.453584] The buggy address belongs to the object at ffff88811b1e8040 which belongs to the cache i915_vma of size 968 [ 53.453692] The buggy address is located 48 bytes inside of 968-byte region [ffff88811b1e8040, ffff88811b1e8408) [ 53.453792] The buggy address belongs to the page: [ 53.453842] page:00000000b35f7048 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88811b1ef940 pfn:0x11b1e8 [ 53.453847] head:00000000b35f7048 order:3 compound_mapcount:0 compound_pincount:0 [ 53.453853] flags: 0x8000000000010200(slab|head) [ 53.453860] raw: 8000000000010200 ffff888115596248 ffff888115596248 ffff8881155b6340 [ 53.453866] raw: ffff88811b1ef940 0000000000170001 00000001ffffffff 0000000000000000 [ 53.453870] page dumped because: kasan: bad access detected [ 53.453895] Memory state around the buggy address: [ 53.453944] ffff88811b1e7f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 53.454011] ffff88811b1e7f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 53.454079] >ffff88811b1e8000: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb [ 53.454146] ^ [ 53.454211] ffff88811b1e8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.454279] ffff88811b1e8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.454347] ================================================================== [ 53.454414] Disabling lock debugging due to kernel taint [ 53.454434] general protection fault, probably for non-canonical address 0xdead0000000000d0: 0000 [andy-shev#1] PREEMPT SMP KASAN PTI [ 53.454446] CPU: 1 PID: 345 Comm: systemd-udevd Tainted: G B W 5.10.0-rc5+ andy-shev#12 [ 53.454592] RIP: 0010:i915_init_ggtt+0x26f/0x9e0 [i915] [ 53.454602] Code: 89 8d 48 ff ff ff 4c 8d 60 d0 49 39 c7 0f 84 37 02 00 00 4c 89 b5 40 ff ff ff 4d 8d bc 24 90 00 00 00 4c 89 ff e8 c1 97 f8 e0 <49> 83 bc 24 90 00 00 00 00 0f 84 0f 02 00 00 49 8d 7c 24 08 e8 a8 [ 53.454618] RSP: 0018:ffff88812247f430 EFLAGS: 00010286 [ 53.454625] RAX: 0000000000000000 RBX: ffff888136440000 RCX: ffffffffa03fb78f [ 53.454633] RDX: 0000000000000000 RSI: 0000000000000008 RDI: dead000000000160 [ 53.454641] RBP: ffff88812247f500 R08: ffffffff8113589f R09: 0000000000000000 [ 53.454648] R10: ffffffff83063843 R11: fffffbfff060c708 R12: dead0000000000d0 [ 53.454656] R13: ffff888136449ba0 R14: 0000000000002000 R15: dead000000000160 [ 53.454664] FS: 00007fde095c4880(0000) GS:ffff88840c880000(0000) knlGS:0000000000000000 [ 53.454672] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 53.454679] CR2: 00007fef132b4f28 CR3: 000000012245c002 CR4: 00000000003706e0 [ 53.454686] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 53.454693] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 53.454700] Call Trace: [ 53.454833] ? i915_ggtt_suspend+0x1f0/0x1f0 [i915] Reported-by: Matthew Auld <matthew.auld@intel.com> Fixes: afeda4f ("drm/i915/dsb: Pre allocate and late cleanup of cmd buffer") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Tested-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201125193032.29282-1-chris@chris-wilson.co.uk (cherry picked from commit b3bf99d) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[ Upstream commit c5c97ca ] The ubsan reported the following error. It was because sample's raw data missed u32 padding at the end. So it broke the alignment of the array after it. The raw data contains an u32 size prefix so the data size should have an u32 padding after 8-byte aligned data. 27: Sample parsing :util/synthetic-events.c:1539:4: runtime error: store to misaligned address 0x62100006b9bc for type '__u64' (aka 'unsigned long long'), which requires 8 byte alignment 0x62100006b9bc: note: pointer points here 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ #0 0x561532a9fc96 in perf_event__synthesize_sample util/synthetic-events.c:1539:13 andy-shev#1 0x5615327f4a4f in do_test tests/sample-parsing.c:284:8 andy-shev#2 0x5615327f3f50 in test__sample_parsing tests/sample-parsing.c:381:9 andy-shev#3 0x56153279d3a1 in run_test tests/builtin-test.c:424:9 andy-shev#4 0x56153279c836 in test_and_print tests/builtin-test.c:454:9 andy-shev#5 0x56153279b7eb in __cmd_test tests/builtin-test.c:675:4 andy-shev#6 0x56153279abf0 in cmd_test tests/builtin-test.c:821:9 andy-shev#7 0x56153264e796 in run_builtin perf.c:312:11 andy-shev#8 0x56153264cf03 in handle_internal_command perf.c:364:8 andy-shev#9 0x56153264e47d in run_argv perf.c:408:2 andy-shev#10 0x56153264c9a9 in main perf.c:538:3 andy-shev#11 0x7f137ab6fbbc in __libc_start_main (/lib64/libc.so.6+0x38bbc) andy-shev#12 0x561532596828 in _start ... SUMMARY: UndefinedBehaviorSanitizer: misaligned-pointer-use util/synthetic-events.c:1539:4 in Fixes: 045f8cd ("perf tests: Add a sample parsing test") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20210214091638.519643-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a2bd458 ] lanai_dev_open() can fail. When it fail, lanai->base is unmapped and the pci device is disabled. The caller, lanai_init_one(), then tries to run atm_dev_deregister(). This will subsequently call lanai_dev_close() and use the already released MMIO area. To fix this issue, set the lanai->base to NULL if open fail, and test the flag in lanai_dev_close(). [ 8.324153] lanai: lanai_start() failed, err=19 [ 8.324819] lanai(itf 0): shutting down interface [ 8.325211] BUG: unable to handle page fault for address: ffffc90000180024 [ 8.325781] #PF: supervisor write access in kernel mode [ 8.326215] #PF: error_code(0x0002) - not-present page [ 8.326641] PGD 100000067 P4D 100000067 PUD 100139067 PMD 10013a067 PTE 0 [ 8.327206] Oops: 0002 [andy-shev#1] SMP KASAN NOPTI [ 8.327557] CPU: 0 PID: 95 Comm: modprobe Not tainted 5.11.0-rc7-00090-gdcc0b49040c7 andy-shev#12 [ 8.328229] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-4 [ 8.329145] RIP: 0010:lanai_dev_close+0x4f/0xe5 [lanai] [ 8.329587] Code: 00 48 c7 c7 00 d3 01 c0 e8 49 4e 0a c2 48 8d bd 08 02 00 00 e8 6e 52 14 c1 48 80 [ 8.330917] RSP: 0018:ffff8881029ef680 EFLAGS: 00010246 [ 8.331196] RAX: 000000000003fffe RBX: ffff888102fb4800 RCX: ffffffffc001a98a [ 8.331572] RDX: ffffc90000180000 RSI: 0000000000000246 RDI: ffff888102fb4000 [ 8.331948] RBP: ffff888102fb4000 R08: ffffffff8115da8a R09: ffffed102053deaa [ 8.332326] R10: 0000000000000003 R11: ffffed102053dea9 R12: ffff888102fb48a4 [ 8.332701] R13: ffffffffc00123c0 R14: ffff888102fb4b90 R15: ffff888102fb4b88 [ 8.333077] FS: 00007f08eb9056a0(0000) GS:ffff88815b400000(0000) knlGS:0000000000000000 [ 8.333502] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8.333806] CR2: ffffc90000180024 CR3: 0000000102a28000 CR4: 00000000000006f0 [ 8.334182] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 8.334557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 8.334932] Call Trace: [ 8.335066] atm_dev_deregister+0x161/0x1a0 [atm] [ 8.335324] lanai_init_one.cold+0x20c/0x96d [lanai] [ 8.335594] ? lanai_send+0x2a0/0x2a0 [lanai] [ 8.335831] local_pci_probe+0x6f/0xb0 [ 8.336039] pci_device_probe+0x171/0x240 [ 8.336255] ? pci_device_remove+0xe0/0xe0 [ 8.336475] ? kernfs_create_link+0xb6/0x110 [ 8.336704] ? sysfs_do_create_link_sd.isra.0+0x76/0xe0 [ 8.336983] really_probe+0x161/0x420 [ 8.337181] driver_probe_device+0x6d/0xd0 [ 8.337401] device_driver_attach+0x82/0x90 [ 8.337626] ? device_driver_attach+0x90/0x90 [ 8.337859] __driver_attach+0x60/0x100 [ 8.338065] ? device_driver_attach+0x90/0x90 [ 8.338298] bus_for_each_dev+0xe1/0x140 [ 8.338511] ? subsys_dev_iter_exit+0x10/0x10 [ 8.338745] ? klist_node_init+0x61/0x80 [ 8.338956] bus_add_driver+0x254/0x2a0 [ 8.339164] driver_register+0xd3/0x150 [ 8.339370] ? 0xffffffffc0028000 [ 8.339550] do_one_initcall+0x84/0x250 [ 8.339755] ? trace_event_raw_event_initcall_finish+0x150/0x150 [ 8.340076] ? free_vmap_area_noflush+0x1a5/0x5c0 [ 8.340329] ? unpoison_range+0xf/0x30 [ 8.340532] ? ____kasan_kmalloc.constprop.0+0x84/0xa0 [ 8.340806] ? unpoison_range+0xf/0x30 [ 8.341014] ? unpoison_range+0xf/0x30 [ 8.341217] do_init_module+0xf8/0x350 [ 8.341419] load_module+0x3fe6/0x4340 [ 8.341621] ? vm_unmap_ram+0x1d0/0x1d0 [ 8.341826] ? ____kasan_kmalloc.constprop.0+0x84/0xa0 [ 8.342101] ? module_frob_arch_sections+0x20/0x20 [ 8.342358] ? __do_sys_finit_module+0x108/0x170 [ 8.342604] __do_sys_finit_module+0x108/0x170 [ 8.342841] ? __ia32_sys_init_module+0x40/0x40 [ 8.343083] ? file_open_root+0x200/0x200 [ 8.343298] ? do_sys_open+0x85/0xe0 [ 8.343491] ? filp_open+0x50/0x50 [ 8.343675] ? exit_to_user_mode_prepare+0xfc/0x130 [ 8.343935] do_syscall_64+0x33/0x40 [ 8.344132] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 8.344401] RIP: 0033:0x7f08eb887cf7 [ 8.344594] Code: 48 89 57 30 48 8b 04 24 48 89 47 38 e9 1d a0 02 00 48 89 f8 48 89 f7 48 89 d6 41 [ 8.345565] RSP: 002b:00007ffcd5c98ad8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 8.345962] RAX: ffffffffffffffda RBX: 00000000008fea70 RCX: 00007f08eb887cf7 [ 8.346336] RDX: 0000000000000000 RSI: 00000000008fd9e0 RDI: 0000000000000003 [ 8.346711] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000001 [ 8.347085] R10: 00007f08eb8eb300 R11: 0000000000000246 R12: 00000000008fd9e0 [ 8.347460] R13: 0000000000000000 R14: 00000000008fddd0 R15: 0000000000000001 [ 8.347836] Modules linked in: lanai(+) atm [ 8.348065] CR2: ffffc90000180024 [ 8.348244] ---[ end trace 7fdc1c668f2003e5 ]--- [ 8.348490] RIP: 0010:lanai_dev_close+0x4f/0xe5 [lanai] [ 8.348772] Code: 00 48 c7 c7 00 d3 01 c0 e8 49 4e 0a c2 48 8d bd 08 02 00 00 e8 6e 52 14 c1 48 80 [ 8.349745] RSP: 0018:ffff8881029ef680 EFLAGS: 00010246 [ 8.350022] RAX: 000000000003fffe RBX: ffff888102fb4800 RCX: ffffffffc001a98a [ 8.350397] RDX: ffffc90000180000 RSI: 0000000000000246 RDI: ffff888102fb4000 [ 8.350772] RBP: ffff888102fb4000 R08: ffffffff8115da8a R09: ffffed102053deaa [ 8.351151] R10: 0000000000000003 R11: ffffed102053dea9 R12: ffff888102fb48a4 [ 8.351525] R13: ffffffffc00123c0 R14: ffff888102fb4b90 R15: ffff888102fb4b88 [ 8.351918] FS: 00007f08eb9056a0(0000) GS:ffff88815b400000(0000) knlGS:0000000000000000 [ 8.352343] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8.352647] CR2: ffffc90000180024 CR3: 0000000102a28000 CR4: 00000000000006f0 [ 8.353022] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 8.353397] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 8.353958] modprobe (95) used greatest stack depth: 26216 bytes left Signed-off-by: Tong Zhang <ztong0001@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 9c3a16f upstream. Crypto engine (CAAM) on LS1046A platform is configured HW-coherent, mark accordingly the DT node. As reported by Greg and Sascha, and explained by Robin, lack of "dma-coherent" property for an IP that is configured HW-coherent can lead to problems, e.g. on v5.11: > kernel BUG at drivers/crypto/caam/jr.c:247! > Internal error: Oops - BUG: 0 [andy-shev#1] PREEMPT SMP > Modules linked in: > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-20210225-3-00039-g434215968816-dirty andy-shev#12 > Hardware name: TQ TQMLS1046A SoM on Arkona AT1130 (C300) board (DT) > pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) > pc : caam_jr_dequeue+0x98/0x57c > lr : caam_jr_dequeue+0x98/0x57c > sp : ffff800010003d50 > x29: ffff800010003d50 x28: ffff8000118d4000 > x27: ffff8000118d4328 x26: 00000000000001f0 > x25: ffff0008022be480 x24: ffff0008022c6410 > x23: 00000000000001f1 x22: ffff8000118d4329 > x21: 0000000000004d80 x20: 00000000000001f1 > x19: 0000000000000001 x18: 0000000000000020 > x17: 0000000000000000 x16: 0000000000000015 > x15: ffff800011690230 x14: 2e2e2e2e2e2e2e2e > x13: 2e2e2e2e2e2e2020 x12: 3030303030303030 > x11: ffff800011700a38 x10: 00000000fffff000 > x9 : ffff8000100ada30 x8 : ffff8000116a8a38 > x7 : 0000000000000001 x6 : 0000000000000000 > x5 : 0000000000000000 x4 : 0000000000000000 > x3 : 00000000ffffffff x2 : 0000000000000000 > x1 : 0000000000000000 x0 : 0000000000001800 > Call trace: > caam_jr_dequeue+0x98/0x57c > tasklet_action_common.constprop.0+0x164/0x18c > tasklet_action+0x44/0x54 > __do_softirq+0x160/0x454 > __irq_exit_rcu+0x164/0x16c > irq_exit+0x1c/0x30 > __handle_domain_irq+0xc0/0x13c > gic_handle_irq+0x5c/0xf0 > el1_irq+0xb4/0x180 > arch_cpu_idle+0x18/0x30 > default_idle_call+0x3c/0x1c0 > do_idle+0x23c/0x274 > cpu_startup_entry+0x34/0x70 > rest_init+0xdc/0xec > arch_call_rest_init+0x1c/0x28 > start_kernel+0x4ac/0x4e4 > Code: 91392021 912c2000 d377d8c6 97f24d96 (d4210000) Cc: <stable@vger.kernel.org> # v4.10+ Fixes: 8126d88 ("arm64: dts: add QorIQ LS1046A SoC support") Link: https://lore.kernel.org/linux-crypto/fe6faa24-d8f7-d18f-adfa-44fa0caa1598@arm.com Reported-by: Greg Ungerer <gerg@kernel.org> Reported-by: Sascha Hauer <s.hauer@pengutronix.de> Tested-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Acked-by: Greg Ungerer <gerg@kernel.org> Acked-by: Li Yang <leoyang.li@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The following deadlock is detected: truncate -> setattr path is waiting for pending direct IO to be done (inode->i_dio_count become zero) with inode->i_rwsem held (down_write). PID: 14827 TASK: ffff881686a9af80 CPU: 20 COMMAND: "ora_p005_hrltd9" #0 __schedule at ffffffff818667cc #1 schedule at ffffffff81866de6 #2 inode_dio_wait at ffffffff812a2d04 #3 ocfs2_setattr at ffffffffc05f322e [ocfs2] #4 notify_change at ffffffff812a5a09 #5 do_truncate at ffffffff812808f5 #6 do_sys_ftruncate.constprop.18 at ffffffff81280cf2 #7 sys_ftruncate at ffffffff81280d8e #8 do_syscall_64 at ffffffff81003949 #9 entry_SYSCALL_64_after_hwframe at ffffffff81a001ad dio completion path is going to complete one direct IO (decrement inode->i_dio_count), but before that it hung at locking inode->i_rwsem: #0 __schedule+700 at ffffffff818667cc #1 schedule+54 at ffffffff81866de6 #2 rwsem_down_write_failed+536 at ffffffff8186aa28 #3 call_rwsem_down_write_failed+23 at ffffffff8185a1b7 #4 down_write+45 at ffffffff81869c9d #5 ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2] #6 ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2] #7 dio_complete+140 at ffffffff812c873c #8 dio_aio_complete_work+25 at ffffffff812c89f9 #9 process_one_work+361 at ffffffff810b1889 #10 worker_thread+77 at ffffffff810b233d #11 kthread+261 at ffffffff810b7fd5 #12 ret_from_fork+62 at ffffffff81a0035e Thus above forms ABBA deadlock. The same deadlock was mentioned in upstream commit 28f5a8a ("ocfs2: should wait dio before inode lock in ocfs2_setattr()"). It seems that that commit only removed the cluster lock (the victim of above dead lock) from the ABBA deadlock party. End-user visible effects: Process hang in truncate -> ocfs2_setattr path and other processes hang at ocfs2_dio_end_io_write path. This is to fix the deadlock itself. It removes inode_lock() call from dio completion path to remove the deadlock and add ip_alloc_sem lock in setattr path to synchronize the inode modifications. [wen.gang.wang@oracle.com: remove the "had_alloc_lock" as suggested] Link: https://lkml.kernel.org/r/20210402171344.1605-1-wen.gang.wang@oracle.com Link: https://lkml.kernel.org/r/20210331203654.3911-1-wen.gang.wang@oracle.com Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit 90bd070 upstream. The following deadlock is detected: truncate -> setattr path is waiting for pending direct IO to be done (inode->i_dio_count become zero) with inode->i_rwsem held (down_write). PID: 14827 TASK: ffff881686a9af80 CPU: 20 COMMAND: "ora_p005_hrltd9" #0 __schedule at ffffffff818667cc andy-shev#1 schedule at ffffffff81866de6 andy-shev#2 inode_dio_wait at ffffffff812a2d04 andy-shev#3 ocfs2_setattr at ffffffffc05f322e [ocfs2] andy-shev#4 notify_change at ffffffff812a5a09 andy-shev#5 do_truncate at ffffffff812808f5 andy-shev#6 do_sys_ftruncate.constprop.18 at ffffffff81280cf2 andy-shev#7 sys_ftruncate at ffffffff81280d8e andy-shev#8 do_syscall_64 at ffffffff81003949 andy-shev#9 entry_SYSCALL_64_after_hwframe at ffffffff81a001ad dio completion path is going to complete one direct IO (decrement inode->i_dio_count), but before that it hung at locking inode->i_rwsem: #0 __schedule+700 at ffffffff818667cc andy-shev#1 schedule+54 at ffffffff81866de6 andy-shev#2 rwsem_down_write_failed+536 at ffffffff8186aa28 andy-shev#3 call_rwsem_down_write_failed+23 at ffffffff8185a1b7 andy-shev#4 down_write+45 at ffffffff81869c9d andy-shev#5 ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2] andy-shev#6 ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2] andy-shev#7 dio_complete+140 at ffffffff812c873c andy-shev#8 dio_aio_complete_work+25 at ffffffff812c89f9 andy-shev#9 process_one_work+361 at ffffffff810b1889 andy-shev#10 worker_thread+77 at ffffffff810b233d andy-shev#11 kthread+261 at ffffffff810b7fd5 andy-shev#12 ret_from_fork+62 at ffffffff81a0035e Thus above forms ABBA deadlock. The same deadlock was mentioned in upstream commit 28f5a8a ("ocfs2: should wait dio before inode lock in ocfs2_setattr()"). It seems that that commit only removed the cluster lock (the victim of above dead lock) from the ABBA deadlock party. End-user visible effects: Process hang in truncate -> ocfs2_setattr path and other processes hang at ocfs2_dio_end_io_write path. This is to fix the deadlock itself. It removes inode_lock() call from dio completion path to remove the deadlock and add ip_alloc_sem lock in setattr path to synchronize the inode modifications. [wen.gang.wang@oracle.com: remove the "had_alloc_lock" as suggested] Link: https://lkml.kernel.org/r/20210402171344.1605-1-wen.gang.wang@oracle.com Link: https://lkml.kernel.org/r/20210331203654.3911-1-wen.gang.wang@oracle.com Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f #3 [fffffe0000549eb8] do_nmi at ffffffff81079582 #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 #15 [ffff88aa841efb88] device_del at ffffffff82179d23 #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff #20 [ffff88aa841eff10] kthread at ffffffff811d87a0 #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The helper, cxl_dpa_resource_start(), snapshots the dpa-address of an endpoint-decoder after acquiring the cxl_dpa_rwsem. However, it is sufficient to assert that cxl_dpa_rwsem is held rather than acquire it in the helper. Otherwise, it triggers multiple lockdep reports: 1/ Tracing callbacks are in an atomic context that can not acquire sleeping locks: BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1525 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1288, name: bash preempt_count: 2, expected: 0 RCU nest depth: 0, expected: 0 [..] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023 Call Trace: <TASK> dump_stack_lvl+0x71/0x90 __might_resched+0x1b2/0x2c0 down_read+0x1a/0x190 cxl_dpa_resource_start+0x15/0x50 [cxl_core] cxl_trace_hpa+0x122/0x300 [cxl_core] trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core] 2/ The rwsem is already held in the inject poison path: WARNING: possible recursive locking detected 6.7.0-rc2+ #12 Tainted: G W OE N -------------------------------------------- bash/1288 is trying to acquire lock: ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_dpa_resource_start+0x15/0x50 [cxl_core] but task is already holding lock: ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_inject_poison+0x7d/0x1e0 [cxl_core] [..] Call Trace: <TASK> dump_stack_lvl+0x71/0x90 __might_resched+0x1b2/0x2c0 down_read+0x1a/0x190 cxl_dpa_resource_start+0x15/0x50 [cxl_core] cxl_trace_hpa+0x122/0x300 [cxl_core] trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core] __traceiter_cxl_poison+0x5c/0x80 [cxl_core] cxl_inject_poison+0x1bc/0x1e0 [cxl_core] This appears to have been an issue since the initial implementation and uncovered by the new cxl-poison.sh test [1]. That test is now passing with these changes. Fixes: 28a3ae4 ("cxl/trace: Add an HPA to cxl_poison trace events") Link: http://lore.kernel.org/r/e4f2716646918135ddbadf4146e92abb659de734.1700615159.git.alison.schofield@intel.com [1] Cc: <stable@vger.kernel.org> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
…SR-IOV When kdump kernel tries to copy dump data over SR-IOV, LPAR panics due to NULL pointer exception: Kernel attempted to read user page (0) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000000 Faulting instruction address: 0xc000000020847ad4 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: mlx5_core(+) vmx_crypto pseries_wdt papr_scm libnvdimm mlxfw tls psample sunrpc fuse overlay squashfs loop CPU: 12 PID: 315 Comm: systemd-udevd Not tainted 6.4.0-Test102+ #12 Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_008) hv:phyp pSeries NIP: c000000020847ad4 LR: c00000002083b2dc CTR: 00000000006cd18c REGS: c000000029162ca0 TRAP: 0300 Not tainted (6.4.0-Test102+) MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 48288244 XER: 00000008 CFAR: c00000002083b2d8 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1 ... NIP _find_next_zero_bit+0x24/0x110 LR bitmap_find_next_zero_area_off+0x5c/0xe0 Call Trace: dev_printk_emit+0x38/0x48 (unreliable) iommu_area_alloc+0xc4/0x180 iommu_range_alloc+0x1e8/0x580 iommu_alloc+0x60/0x130 iommu_alloc_coherent+0x158/0x2b0 dma_iommu_alloc_coherent+0x3c/0x50 dma_alloc_attrs+0x170/0x1f0 mlx5_cmd_init+0xc0/0x760 [mlx5_core] mlx5_function_setup+0xf0/0x510 [mlx5_core] mlx5_init_one+0x84/0x210 [mlx5_core] probe_one+0x118/0x2c0 [mlx5_core] local_pci_probe+0x68/0x110 pci_call_probe+0x68/0x200 pci_device_probe+0xbc/0x1a0 really_probe+0x104/0x540 __driver_probe_device+0xb4/0x230 driver_probe_device+0x54/0x130 __driver_attach+0x158/0x2b0 bus_for_each_dev+0xa8/0x130 driver_attach+0x34/0x50 bus_add_driver+0x16c/0x300 driver_register+0xa4/0x1b0 __pci_register_driver+0x68/0x80 mlx5_init+0xb8/0x100 [mlx5_core] do_one_initcall+0x60/0x300 do_init_module+0x7c/0x2b0 At the time of LPAR dump, before kexec hands over control to kdump kernel, DDWs (Dynamic DMA Windows) are scanned and added to the FDT. For the SR-IOV case, default DMA window "ibm,dma-window" is removed from the FDT and DDW added, for the device. Now, kexec hands over control to the kdump kernel. When the kdump kernel initializes, PCI busses are scanned and IOMMU group/tables created, in pci_dma_bus_setup_pSeriesLP(). For the SR-IOV case, there is no "ibm,dma-window". The original commit: b1fc44e, fixes the path where memory is pre-mapped (direct mapped) to the DDW. When TCEs are direct mapped, there is no need to initialize IOMMU tables. iommu_table_setparms_lpar() only considers "ibm,dma-window" property when initiallizing IOMMU table. In the scenario where TCEs are dynamically allocated for SR-IOV, newly created IOMMU table is not initialized. Later, when the device driver tries to enter TCEs for the SR-IOV device, NULL pointer execption is thrown from iommu_area_alloc(). The fix is to initialize the IOMMU table with DDW property stored in the FDT. There are 2 points to remember: 1. For the dedicated adapter, kdump kernel would encounter both default and DDW in FDT. In this case, DDW property is used to initialize the IOMMU table. 2. A DDW could be direct or dynamic mapped. kdump kernel would initialize IOMMU table and mark the existing DDW as "dynamic". This works fine since, at the time of table initialization, iommu_table_clear() makes some space in the DDW, for some predefined number of TCEs which are needed for kdump to succeed. Fixes: b1fc44e ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window") Signed-off-by: Gaurav Batra <gbatra@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240125203017.61014-1-gbatra@linux.ibm.com
vhost_worker will call tun call backs to receive packets. If too many illegal packets arrives, tun_do_read will keep dumping packet contents. When console is enabled, it will costs much more cpu time to dump packet and soft lockup will be detected. net_ratelimit mechanism can be used to limit the dumping rate. PID: 33036 TASK: ffff949da6f20000 CPU: 23 COMMAND: "vhost-32980" #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e #3 [fffffe00003fced0] do_nmi at ffffffff8922660d #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663 [exception RIP: io_serial_in+20] RIP: ffffffff89792594 RSP: ffffa655314979e8 RFLAGS: 00000002 RAX: ffffffff89792500 RBX: ffffffff8af428a0 RCX: 0000000000000000 RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff8af428a0 RBP: 0000000000002710 R8: 0000000000000004 R9: 000000000000000f R10: 0000000000000000 R11: ffffffff8acbf64f R12: 0000000000000020 R13: ffffffff8acbf698 R14: 0000000000000058 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07 #12 [ffffa65531497b68] printk at ffffffff89318306 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun] #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun] #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net] #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost] #18 [ffffa65531497f10] kthread at ffffffff892d2e72 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors") Signed-off-by: Lei Chen <lei.chen@smartx.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 1548036 ("nfs: make the rpc_stat per net namespace") added functionality to specify rpc_stats function but missed adding it to the TCP TLS functionality. As the result, mounting with xprtsec=tls lead to the following kernel oops. [ 128.984192] Unable to handle kernel NULL pointer dereference at virtual address 000000000000001c [ 128.985058] Mem abort info: [ 128.985372] ESR = 0x0000000096000004 [ 128.985709] EC = 0x25: DABT (current EL), IL = 32 bits [ 128.986176] SET = 0, FnV = 0 [ 128.986521] EA = 0, S1PTW = 0 [ 128.986804] FSC = 0x04: level 0 translation fault [ 128.987229] Data abort info: [ 128.987597] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 128.988169] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 128.988811] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 128.989302] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000106c84000 [ 128.990048] [000000000000001c] pgd=0000000000000000, p4d=0000000000000000 [ 128.990736] Internal error: Oops: 0000000096000004 [#1] SMP [ 128.991168] Modules linked in: nfs_layout_nfsv41_files rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace netfs uinput dm_mod nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink qrtr vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo videobuf2_vmalloc videobuf2_memops uvc videobuf2_v4l2 videodev videobuf2_common mc vmw_vmci xfs libcrc32c e1000e crct10dif_ce ghash_ce sha2_ce vmwgfx nvme sha256_arm64 nvme_core sr_mod cdrom sha1_ce drm_ttm_helper ttm drm_kms_helper drm sg fuse [ 128.996466] CPU: 0 PID: 179 Comm: kworker/u4:26 Kdump: loaded Not tainted 6.8.0-rc6+ #12 [ 128.997226] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.21805430.BA64.2305221830 05/22/2023 [ 128.998084] Workqueue: xprtiod xs_tcp_tls_setup_socket [sunrpc] [ 128.998701] pstate: 81400005 (Nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 128.999384] pc : call_start+0x74/0x138 [sunrpc] [ 128.999809] lr : __rpc_execute+0xb8/0x3e0 [sunrpc] [ 129.000244] sp : ffff8000832b3a00 [ 129.000508] x29: ffff8000832b3a00 x28: ffff800081ac79c0 x27: ffff800081ac7000 [ 129.001111] x26: 0000000004248060 x25: 0000000000000000 x24: ffff800081596008 [ 129.001757] x23: ffff80007b087240 x22: ffff00009a509d30 x21: 0000000000000000 [ 129.002345] x20: ffff000090075600 x19: ffff00009a509d00 x18: ffffffffffffffff [ 129.002912] x17: 733d4d4554535953 x16: 42555300312d746e x15: ffff8000832b3a88 [ 129.003464] x14: ffffffffffffffff x13: ffff8000832b3a7d x12: 0000000000000008 [ 129.004021] x11: 0101010101010101 x10: ffff8000150cb560 x9 : ffff80007b087c00 [ 129.004577] x8 : ffff00009a509de0 x7 : 0000000000000000 x6 : 00000000be8c4ee3 [ 129.005026] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff000094d56680 [ 129.005425] x2 : ffff80007b0637f8 x1 : ffff000090075600 x0 : ffff00009a509d00 [ 129.005824] Call trace: [ 129.005967] call_start+0x74/0x138 [sunrpc] [ 129.006233] __rpc_execute+0xb8/0x3e0 [sunrpc] [ 129.006506] rpc_execute+0x160/0x1d8 [sunrpc] [ 129.006778] rpc_run_task+0x148/0x1f8 [sunrpc] [ 129.007204] tls_probe+0x80/0xd0 [sunrpc] [ 129.007460] rpc_ping+0x28/0x80 [sunrpc] [ 129.007715] rpc_create_xprt+0x134/0x1a0 [sunrpc] [ 129.007999] rpc_create+0x128/0x2a0 [sunrpc] [ 129.008264] xs_tcp_tls_setup_socket+0xdc/0x508 [sunrpc] [ 129.008583] process_one_work+0x174/0x3c8 [ 129.008813] worker_thread+0x2c8/0x3e0 [ 129.009033] kthread+0x100/0x110 [ 129.009225] ret_from_fork+0x10/0x20 [ 129.009432] Code: f0ffffc2 911fe042 aa1403e1 aa1303e0 (b9401c83) Fixes: 1548036 ("nfs: make the rpc_stat per net namespace") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
syzkaller reported a warning [0] triggered while destroying immature netns. rpc_proc_register() was called in init_nfs_fs(), but its error has been ignored since at least the initial commit 1da177e4c3f4 ("Linux-2.6.12-rc2"). Recently, commit d47151b ("nfs: expose /proc/net/sunrpc/nfs in net namespaces") converted the procfs to per-netns and made the problem more visible. Even when rpc_proc_register() fails, nfs_net_init() could succeed, and thus nfs_net_exit() will be called while destroying the netns. Then, remove_proc_entry() will be called for non-existing proc directory and trigger the warning below. Let's handle the error of rpc_proc_register() properly in nfs_net_init(). [0]: name 'nfs' WARNING: CPU: 1 PID: 1710 at fs/proc/generic.c:711 remove_proc_entry+0x1bb/0x2d0 fs/proc/generic.c:711 Modules linked in: CPU: 1 PID: 1710 Comm: syz-executor.2 Not tainted 6.8.0-12822-gcd51db110a7e #12 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:remove_proc_entry+0x1bb/0x2d0 fs/proc/generic.c:711 Code: 41 5d 41 5e c3 e8 85 09 b5 ff 48 c7 c7 88 58 64 86 e8 09 0e 71 02 e8 74 09 b5 ff 4c 89 e6 48 c7 c7 de 1b 80 84 e8 c5 ad 97 ff <0f> 0b eb b1 e8 5c 09 b5 ff 48 c7 c7 88 58 64 86 e8 e0 0d 71 02 eb RSP: 0018:ffffc9000c6d7ce0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff8880422b8b00 RCX: ffffffff8110503c RDX: ffff888030652f00 RSI: ffffffff81105045 RDI: 0000000000000001 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: ffffffff81bb62cb R12: ffffffff84807ffc R13: ffff88804ad6fcc0 R14: ffffffff84807ffc R15: ffffffff85741ff8 FS: 00007f30cfba8640(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ff51afe8000 CR3: 000000005a60a005 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> rpc_proc_unregister+0x64/0x70 net/sunrpc/stats.c:310 nfs_net_exit+0x1c/0x30 fs/nfs/inode.c:2438 ops_exit_list+0x62/0xb0 net/core/net_namespace.c:170 setup_net+0x46c/0x660 net/core/net_namespace.c:372 copy_net_ns+0x244/0x590 net/core/net_namespace.c:505 create_new_namespaces+0x2ed/0x770 kernel/nsproxy.c:110 unshare_nsproxy_namespaces+0xae/0x160 kernel/nsproxy.c:228 ksys_unshare+0x342/0x760 kernel/fork.c:3322 __do_sys_unshare kernel/fork.c:3393 [inline] __se_sys_unshare kernel/fork.c:3391 [inline] __x64_sys_unshare+0x1f/0x30 kernel/fork.c:3391 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x46/0x4e RIP: 0033:0x7f30d0febe5d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48 RSP: 002b:00007f30cfba7cc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000110 RAX: ffffffffffffffda RBX: 00000000004bbf80 RCX: 00007f30d0febe5d RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000006c020600 RBP: 00000000004bbf80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002 R13: 000000000000000b R14: 00007f30d104c530 R15: 0000000000000000 </TASK> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
With BPF_PROBE_MEM, BPF allows de-referencing an untrusted pointer. To thwart invalid memory accesses, the JITs add an exception table entry for all such accesses. But in case the src_reg + offset is a userspace address, the BPF program might read that memory if the user has mapped it. Make the verifier add guard instructions around such memory accesses and skip the load if the address falls into the userspace region. The JITs need to implement bpf_arch_uaddress_limit() to define where the userspace addresses end for that architecture or TASK_SIZE is taken as default. The implementation is as follows: REG_AX = SRC_REG if(offset) REG_AX += offset; REG_AX >>= 32; if (REG_AX <= (uaddress_limit >> 32)) DST_REG = 0; else DST_REG = *(size *)(SRC_REG + offset); Comparing just the upper 32 bits of the load address with the upper 32 bits of uaddress_limit implies that the values are being aligned down to a 4GB boundary before comparison. The above means that all loads with address <= uaddress_limit + 4GB are skipped. This is acceptable because there is a large hole (much larger than 4GB) between userspace and kernel space memory, therefore a correctly functioning BPF program should not access this 4GB memory above the userspace. Let's analyze what this patch does to the following fentry program dereferencing an untrusted pointer: SEC("fentry/tcp_v4_connect") int BPF_PROG(fentry_tcp_v4_connect, struct sock *sk) { *(volatile long *)sk; return 0; } BPF Program before | BPF Program after ------------------ | ----------------- 0: (79) r1 = *(u64 *)(r1 +0) 0: (79) r1 = *(u64 *)(r1 +0) ----------------------------------------------------------------------- 1: (79) r1 = *(u64 *)(r1 +0) --\ 1: (bf) r11 = r1 ----------------------------\ \ 2: (77) r11 >>= 32 2: (b7) r0 = 0 \ \ 3: (b5) if r11 <= 0x8000 goto pc+2 3: (95) exit \ \-> 4: (79) r1 = *(u64 *)(r1 +0) \ 5: (05) goto pc+1 \ 6: (b7) r1 = 0 \-------------------------------------- 7: (b7) r0 = 0 8: (95) exit As you can see from above, in the best case (off=0), 5 extra instructions are emitted. Now, we analyze the same program after it has gone through the JITs of ARM64 and RISC-V architectures. We follow the single load instruction that has the untrusted pointer and see what instrumentation has been added around it. x86-64 JIT ========== JIT's Instrumentation (upstream) --------------------- 0: nopl 0x0(%rax,%rax,1) 5: xchg %ax,%ax 7: push %rbp 8: mov %rsp,%rbp b: mov 0x0(%rdi),%rdi --------------------------------- f: movabs $0x800000000000,%r11 19: cmp %r11,%rdi 1c: jb 0x000000000000002a 1e: mov %rdi,%r11 21: add $0x0,%r11 28: jae 0x000000000000002e 2a: xor %edi,%edi 2c: jmp 0x0000000000000032 2e: mov 0x0(%rdi),%rdi --------------------------------- 32: xor %eax,%eax 34: leave 35: ret The x86-64 JIT already emits some instructions to protect against user memory access. This patch doesn't make any changes for the x86-64 JIT. ARM64 JIT ========= No Intrumentation Verifier's Instrumentation (upstream) (This patch) ----------------- -------------------------- 0: add x9, x30, #0x0 0: add x9, x30, #0x0 4: nop 4: nop 8: paciasp 8: paciasp c: stp x29, x30, [sp, #-16]! c: stp x29, x30, [sp, #-16]! 10: mov x29, sp 10: mov x29, sp 14: stp x19, x20, [sp, #-16]! 14: stp x19, x20, [sp, #-16]! 18: stp x21, x22, [sp, #-16]! 18: stp x21, x22, [sp, #-16]! 1c: stp x25, x26, [sp, #-16]! 1c: stp x25, x26, [sp, #-16]! 20: stp x27, x28, [sp, #-16]! 20: stp x27, x28, [sp, #-16]! 24: mov x25, sp 24: mov x25, sp 28: mov x26, #0x0 28: mov x26, #0x0 2c: sub x27, x25, #0x0 2c: sub x27, x25, #0x0 30: sub sp, sp, #0x0 30: sub sp, sp, #0x0 34: ldr x0, [x0] 34: ldr x0, [x0] -------------------------------------------------------------------------------- 38: ldr x0, [x0] ----------\ 38: add x9, x0, #0x0 -----------------------------------\\ 3c: lsr x9, x9, #32 3c: mov x7, #0x0 \\ 40: cmp x9, #0x10, lsl #12 40: mov sp, sp \\ 44: b.ls 0x0000000000000050 44: ldp x27, x28, [sp], #16 \\--> 48: ldr x0, [x0] 48: ldp x25, x26, [sp], #16 \ 4c: b 0x0000000000000054 4c: ldp x21, x22, [sp], #16 \ 50: mov x0, #0x0 50: ldp x19, x20, [sp], #16 \--------------------------------------- 54: ldp x29, x30, [sp], #16 54: mov x7, #0x0 58: add x0, x7, #0x0 58: mov sp, sp 5c: autiasp 5c: ldp x27, x28, [sp], #16 60: ret 60: ldp x25, x26, [sp], #16 64: nop 64: ldp x21, x22, [sp], #16 68: ldr x10, 0x0000000000000070 68: ldp x19, x20, [sp], #16 6c: br x10 6c: ldp x29, x30, [sp], #16 70: add x0, x7, #0x0 74: autiasp 78: ret 7c: nop 80: ldr x10, 0x0000000000000088 84: br x10 There are 6 extra instructions added in ARM64 in the best case. This will become 7 in the worst case (off != 0). RISC-V JIT (RISCV_ISA_C Disabled) ========== No Intrumentation Verifier's Instrumentation (upstream) (This patch) ----------------- -------------------------- 0: nop 0: nop 4: nop 4: nop 8: li a6, 33 8: li a6, 33 c: addi sp, sp, -16 c: addi sp, sp, -16 10: sd s0, 8(sp) 10: sd s0, 8(sp) 14: addi s0, sp, 16 14: addi s0, sp, 16 18: ld a0, 0(a0) 18: ld a0, 0(a0) --------------------------------------------------------------- 1c: ld a0, 0(a0) --\ 1c: mv t0, a0 --------------------------\ \ 20: srli t0, t0, 32 20: li a5, 0 \ \ 24: lui t1, 4096 24: ld s0, 8(sp) \ \ 28: sext.w t1, t1 28: addi sp, sp, 16 \ \ 2c: bgeu t1, t0, 12 2c: sext.w a0, a5 \ \--> 30: ld a0, 0(a0) 30: ret \ 34: j 8 \ 38: li a0, 0 \------------------------------ 3c: li a5, 0 40: ld s0, 8(sp) 44: addi sp, sp, 16 48: sext.w a0, a5 4c: ret There are 7 extra instructions added in RISC-V. Fixes: 8008342 ("bpf, arm64: Add BPF exception tables") Reported-by: Breno Leitao <leitao@debian.org> Suggested-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Link: https://lore.kernel.org/r/20240424100210.11982-2-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
We shouldn't set real_dev to NULL because packets can be in transit and xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume real_dev is set. Example trace: kernel: BUG: unable to handle page fault for address: 0000000000001030 kernel: bond0: (slave eni0np1): making interface the new active one kernel: #PF: supervisor write access in kernel mode kernel: #PF: error_code(0x0002) - not-present page kernel: PGD 0 P4D 0 kernel: Oops: 0002 [#1] PREEMPT SMP kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12 kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014 kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim] kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA kernel: Code: e0 0f 0b 48 83 7f 38 00 74 de 0f 0b 48 8b 47 08 48 8b 37 48 8b 78 40 e9 b2 e5 9a d7 66 90 0f 1f 44 00 00 48 8b 86 80 02 00 00 <83> 80 30 10 00 00 01 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f kernel: bond0: (slave eni0np1): making interface the new active one kernel: RSP: 0018:ffffabde81553b98 EFLAGS: 00010246 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA kernel: kernel: RAX: 0000000000000000 RBX: ffff9eb404e74900 RCX: ffff9eb403d97c60 kernel: RDX: ffffffffc090de10 RSI: ffff9eb404e74900 RDI: ffff9eb3c5de9e00 kernel: RBP: ffff9eb3c0a42000 R08: 0000000000000010 R09: 0000000000000014 kernel: R10: 7974203030303030 R11: 3030303030303030 R12: 0000000000000000 kernel: R13: ffff9eb3c5de9e00 R14: ffffabde81553cc8 R15: ffff9eb404c53000 kernel: FS: 00007f2a77a3ad00(0000) GS:ffff9eb43bd00000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000001030 CR3: 00000001122ab000 CR4: 0000000000350ef0 kernel: bond0: (slave eni0np1): making interface the new active one kernel: Call Trace: kernel: <TASK> kernel: ? __die+0x1f/0x60 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA kernel: ? page_fault_oops+0x142/0x4c0 kernel: ? do_user_addr_fault+0x65/0x670 kernel: ? kvm_read_and_reset_apf_flags+0x3b/0x50 kernel: bond0: (slave eni0np1): making interface the new active one kernel: ? exc_page_fault+0x7b/0x180 kernel: ? asm_exc_page_fault+0x22/0x30 kernel: ? nsim_bpf_uninit+0x50/0x50 [netdevsim] kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA kernel: ? nsim_ipsec_offload_ok+0xc/0x20 [netdevsim] kernel: bond0: (slave eni0np1): making interface the new active one kernel: bond_ipsec_offload_ok+0x7b/0x90 [bonding] kernel: xfrm_output+0x61/0x3b0 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA kernel: ip_push_pending_frames+0x56/0x80 Fixes: 18cb261 ("bonding: support hardware encryption offload to slaves") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
A sysfs reader can race with a device reset or removal, attempting to read device state when the device is not actually present. eg: [exception RIP: qed_get_current_link+17] #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede] #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb crash> struct net_device.state ffff9a9d21336000 state = 5, state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100). The device is not present, note lack of __LINK_STATE_PRESENT (0b10). This is the same sort of panic as observed in commit 4224cfd ("net-sysfs: add check for netdevice being present to speed_show"). There are many other callers of __ethtool_get_link_ksettings() which don't have a device presence check. Move this check into ethtool to protect all callers. Fixes: d519e17 ("net: export device speed and duplex via sysfs") Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show") Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I have the u-boot edison-v2017.05 and built the 32bits 4.11 version of the kernel, loaded that to the FAT formated part 9 and used:
The kernel boots fine up to the point:
[ 2.432544] md: Waiting for all devices to be available before autodetect
[ 2.439398] md: If you don't use raid, use raid=noautodetect
[ 2.446107] md: Autodetecting RAID arrays.
[ 2.450279] md: autorun ...
[ 2.453122] md: ... autorun DONE.
[ 2.456551] Waiting for root device PARTUUID=012b3303-34ac-284d-99b4-34e03a2335f4...
And sits there (no watchdog reset). The partition is the original rootfs that is booted by the yocto kernel, on the mmc.
The text was updated successfully, but these errors were encountered: