forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
capebus: Add nixie cape support #6
Closed
mranostay
wants to merge
7
commits into
koenkooi:3.7-for-panto-rebase
from
mranostay:3.7-for-koen-rebase
Closed
capebus: Add nixie cape support #6
mranostay
wants to merge
7
commits into
koenkooi:3.7-for-panto-rebase
from
mranostay:3.7-for-koen-rebase
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Added nixie capebus PWM, leds, and spi-vfd definitions to am355x-bone-common Signed-off-by: Matt Ranostay <mranostay@gmail.com>
Supported added for SPI VFD shift register device for nixie cape Signed-off-by: Matt Ranostay <mranostay@gmail.com>
Add support for nixie cape breadboard prototype. Signed-off-by: Matt Ranostay <mranostay@gmail.com>
Added SPI VFD and Nixie cape entrys in Kconfig and Makefile Signed-off-by: Matt Ranostay <mranostay@gmail.com>
Add SPI VFD device to capebus pdevs for generic lookup. Signed-off-by: Matt Ranostay <mranostay@gmail.com>
When updating vfd_display cancel old delayed workqueue and trigger new event. Signed-off-by: Matt Ranostay <mranostay@gmail.com>
Only memset digits that aren't being used to reduce flicker Signed-off-by: Matt Ranostay <mranostay@gmail.com>
Can you squash the fixes into the main commits? And please do it against https://github.com/beagleboard/kernel/, this is a generated tree, so pull requests won't work. |
koenkooi
pushed a commit
that referenced
this pull request
Dec 6, 2012
A rescue thread exiting TASK_INTERRUPTIBLE can lead to a task scheduling off, never to be seen again. In the case where this occurred, an exiting thread hit reiserfs homebrew conditional resched while holding a mutex, bringing the box to its knees. PID: 18105 TASK: ffff8807fd412180 CPU: 5 COMMAND: "kdmflush" #0 [ffff8808157e7670] schedule at ffffffff8143f489 #1 [ffff8808157e77b8] reiserfs_get_block at ffffffffa038ab2d [reiserfs] #2 [ffff8808157e79a8] __block_write_begin at ffffffff8117fb14 #3 [ffff8808157e7a98] reiserfs_write_begin at ffffffffa0388695 [reiserfs] #4 [ffff8808157e7ad8] generic_perform_write at ffffffff810ee9e2 #5 [ffff8808157e7b58] generic_file_buffered_write at ffffffff810eeb41 #6 [ffff8808157e7ba8] __generic_file_aio_write at ffffffff810f1a3a #7 [ffff8808157e7c58] generic_file_aio_write at ffffffff810f1c88 #8 [ffff8808157e7cc8] do_sync_write at ffffffff8114f850 #9 [ffff8808157e7dd8] do_acct_process at ffffffff810a268f [exception RIP: kernel_thread_helper] RIP: ffffffff8144a5c0 RSP: ffff8808157e7f58 RFLAGS: 00000202 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff8107af60 RDI: ffff8803ee491d18 RBP: 0000000000000000 R8: 0000000000000000 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 Signed-off-by: Mike Galbraith <mgalbraith@suse.de> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
koenkooi
pushed a commit
that referenced
this pull request
Mar 21, 2013
[ Upstream commit 9cb6cb7 ] The following script will produce a kernel oops: sudo ip netns add v sudo ip netns exec v ip ad add 127.0.0.1/8 dev lo sudo ip netns exec v ip link set lo up sudo ip netns exec v ip ro add 224.0.0.0/4 dev lo sudo ip netns exec v ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev lo sudo ip netns exec v ip link set vxlan0 up sudo ip netns del v where inspect by gdb: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 107] 0xffffffffa0289e33 in ?? () (gdb) bt #0 vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533 #1 vxlan_stop (dev=0xffff88001bafa000) at drivers/net/vxlan.c:1087 #2 0xffffffff812cc498 in __dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1299 #3 0xffffffff812cd920 in dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1335 #4 0xffffffff812cef31 in rollback_registered_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:4851 #5 0xffffffff812cf040 in unregister_netdevice_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:5752 #6 0xffffffff812cf1ba in default_device_exit_batch (net_list=0xffff88001f2e7e18) at net/core/dev.c:6170 #7 0xffffffff812cab27 in cleanup_net (work=<optimized out>) at net/core/net_namespace.c:302 #8 0xffffffff810540ef in process_one_work (worker=0xffff88001ba9ed40, work=0xffffffff8167d020) at kernel/workqueue.c:2157 #9 0xffffffff810549d0 in worker_thread (__worker=__worker@entry=0xffff88001ba9ed40) at kernel/workqueue.c:2276 torvalds#10 0xffffffff8105870c in kthread (_create=0xffff88001f2e5d68) at kernel/kthread.c:168 torvalds#11 <signal handler called> torvalds#12 0x0000000000000000 in ?? () torvalds#13 0x0000000000000000 in ?? () (gdb) fr 0 #0 vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533 533 struct sock *sk = vn->sock->sk; (gdb) l 528 static int vxlan_leave_group(struct net_device *dev) 529 { 530 struct vxlan_dev *vxlan = netdev_priv(dev); 531 struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id); 532 int err = 0; 533 struct sock *sk = vn->sock->sk; 534 struct ip_mreqn mreq = { 535 .imr_multiaddr.s_addr = vxlan->gaddr, 536 .imr_ifindex = vxlan->link, 537 }; (gdb) p vn->sock $4 = (struct socket *) 0x0 The kernel calls `vxlan_exit_net` when deleting the netns before shutting down vxlan interfaces. Later the removal of all vxlan interfaces, where `vn->sock` is already gone causes the oops. so we should manually shutdown all interfaces before deleting `vn->sock` as the patch does. Signed-off-by: Zang MingJie <zealot0630@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi
pushed a commit
that referenced
this pull request
Apr 5, 2013
commit ff931c8 upstream. clk inits on OMAP happen quite early, even before slab is available. The dependency comes from the fact that the timer init code starts to use clocks and hwmod and we need clocks to be initialized by then. There are various problems doing clk inits this early, one is, not being able to do dynamic clk registrations and hence the dependency on clk-private.h. The other is, inability to debug early kernel crashes without enabling DEBUG_LL and earlyprintk. Doing early clk init also exposed another instance of a kernel panic due to a BUG() when CONFIG_DEBUG_SLAB is enabled. [ 0.000000] Kernel BUG at c01174f8 [verbose debug info unavailable] [ 0.000000] Internal error: Oops - BUG: 0 [#1] SMP ARM [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 Not tainted (3.9.0-rc1-12179-g72d48f9 #6) [ 0.000000] PC is at __kmalloc+0x1d4/0x248 [ 0.000000] LR is at __clk_init+0x2e0/0x364 [ 0.000000] pc : [<c01174f8>] lr : [<c0441f54>] psr: 600001d3 [ 0.000000] sp : c076ff28 ip : c065cefc fp : c0441f54 [ 0.000000] r10: 0000001c r9 : 000080d0 r8 : c076ffd4 [ 0.000000] r7 : c074b578 r6 : c0794d88 r5 : 00000040 r4 : 00000000 [ 0.000000] r3 : 00000000 r2 : c07cac70 r1 : 000080d0 r0 : 0000001c [ 0.000000] Flags: nZCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment kernel [ 0.000000] Control: 10c53c7d Table: 8000404a DAC: 00000017 [ 0.000000] Process swapper (pid: 0, stack limit = 0xc076e240) [ 0.000000] Stack: (0xc076ff28 to 0xc0770000) [ 0.000000] ff20: 22222222 c0794ec8 c06546e8 00000000 00000040 c0794d88 [ 0.000000] ff40: c074b578 c076ffd4 c07951c8 c076e000 00000000 c0441f54 c074b578 c076ffd4 [ 0.000000] ff60: c0793828 00000040 c0794d88 c074b578 c076ffd4 c0776900 c076e000 c07272ac [ 0.000000] ff80: 2f800000 c074c968 c07f93d0 c0719780 c076ffa0 c076ff98 00000000 00000000 [ 0.000000] ffa0: 00000000 00000000 00000000 00000001 c074cd6c c077b1ec 8000406a c0715724 [ 0.000000] ffc0: 00000000 00000000 00000000 00000000 00000000 c074c968 10c53c7d c0776974 [ 0.000000] ffe0: c074cd6c c077b1ec 8000406a 411fc092 00000000 80008074 00000000 00000000 [ 0.000000] [<c01174f8>] (__kmalloc+0x1d4/0x248) from [<c0441f54>] (__clk_init+0x2e0/0x364) [ 0.000000] [<c0441f54>] (__clk_init+0x2e0/0x364) from [<c07272ac>] (omap4xxx_clk_init+0xbc/0x140) [ 0.000000] [<c07272ac>] (omap4xxx_clk_init+0xbc/0x140) from [<c0719780>] (setup_arch+0x15c/0x284) [ 0.000000] [<c0719780>] (setup_arch+0x15c/0x284) from [<c0715724>] (start_kernel+0x7c/0x334) [ 0.000000] [<c0715724>] (start_kernel+0x7c/0x334) from [<80008074>] (0x80008074) [ 0.000000] Code: e5883004 e1a00006 e28dd00c e8bd8ff0 (e7f001f2) [ 0.000000] ---[ end trace 1b75b31a2719ed1c ]--- [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! It was a know issue, that slab allocations would fail when common clock core tries to cache parent pointers for mux clocks on OMAP, and hence a patch 'clk: Allow late cache allocation for clk->parents, commit 7975059' was added to work this problem around. A BUG() within kmalloc() with CONFIG_DEBUG_SLAB enabled was completely overlooked causing this regression. More details on the issue reported can be found here, http://www.mail-archive.com/linux-omap@vger.kernel.org/msg85932.html With all these issues around clk inits happening way too early, it makes sense to at least move them to a point where dynamic memory allocations are possible. So move them to a point just before the timer code starts using clocks and hwmod. This should at least pave way for clk inits on OMAP moving to dynamic clock registrations instead of using the static macros defined in clk-private.h. The issue with kernel panic while CONFIG_DEBUG_SLAB is enabled was reported by Piotr Haber and Tony Lindgren and this patch fixes the reported issue as well. Reported-by: Piotr Haber <phaber@broadcom.com> Reported-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Rajendra Nayak <rnayak@ti.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Reviewed-by: Mike Turquette <mturquette@linaro.org> Acked-by: Paul Walmsley <paul@pwsan.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi
pushed a commit
that referenced
this pull request
May 9, 2013
commit 8f294b5 upstream. The settimeofday01 test in the LTP testsuite effectively does gettimeofday(current time); settimeofday(Jan 1, 1970 + 100 seconds); settimeofday(current time); This test causes a stack trace to be displayed on the console during the setting of timeofday to Jan 1, 1970 + 100 seconds: [ 131.066751] ------------[ cut here ]------------ [ 131.096448] WARNING: at kernel/time/clockevents.c:209 clockevents_program_event+0x135/0x140() [ 131.104935] Hardware name: Dinar [ 131.108150] Modules linked in: sg nfsv3 nfs_acl nfsv4 auth_rpcgss nfs dns_resolver fscache lockd sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables kvm_amd kvm sp5100_tco bnx2 i2c_piix4 crc32c_intel k10temp fam15h_power ghash_clmulni_intel amd64_edac_mod pcspkr serio_raw edac_mce_amd edac_core microcode xfs libcrc32c sr_mod sd_mod cdrom ata_generic crc_t10dif pata_acpi radeon i2c_algo_bit drm_kms_helper ttm drm ahci pata_atiixp libahci libata usb_storage i2c_core dm_mirror dm_region_hash dm_log dm_mod [ 131.176784] Pid: 0, comm: swapper/28 Not tainted 3.8.0+ #6 [ 131.182248] Call Trace: [ 131.184684] <IRQ> [<ffffffff810612af>] warn_slowpath_common+0x7f/0xc0 [ 131.191312] [<ffffffff8106130a>] warn_slowpath_null+0x1a/0x20 [ 131.197131] [<ffffffff810b9fd5>] clockevents_program_event+0x135/0x140 [ 131.203721] [<ffffffff810bb584>] tick_program_event+0x24/0x30 [ 131.209534] [<ffffffff81089ab1>] hrtimer_interrupt+0x131/0x230 [ 131.215437] [<ffffffff814b9600>] ? cpufreq_p4_target+0x130/0x130 [ 131.221509] [<ffffffff81619119>] smp_apic_timer_interrupt+0x69/0x99 [ 131.227839] [<ffffffff8161805d>] apic_timer_interrupt+0x6d/0x80 [ 131.233816] <EOI> [<ffffffff81099745>] ? sched_clock_cpu+0xc5/0x120 [ 131.240267] [<ffffffff814b9ff0>] ? cpuidle_wrap_enter+0x50/0xa0 [ 131.246252] [<ffffffff814b9fe9>] ? cpuidle_wrap_enter+0x49/0xa0 [ 131.252238] [<ffffffff814ba050>] cpuidle_enter_tk+0x10/0x20 [ 131.257877] [<ffffffff814b9c89>] cpuidle_idle_call+0xa9/0x260 [ 131.263692] [<ffffffff8101c42f>] cpu_idle+0xaf/0x120 [ 131.268727] [<ffffffff815f8971>] start_secondary+0x255/0x257 [ 131.274449] ---[ end trace 1151a50552231615 ]--- When we change the system time to a low value like this, the value of timekeeper->offs_real will be a negative value. It seems that the WARN occurs because an hrtimer has been started in the time between the releasing of the timekeeper lock and the IPI call (via a call to on_each_cpu) in clock_was_set() in the do_settimeofday() code. The end result is that a REALTIME_CLOCK timer has been added with softexpires = expires = KTIME_MAX. The hrtimer_interrupt() fires/is called and the loop at kernel/hrtimer.c:1289 is executed. In this loop the code subtracts the clock base's offset (which was set to timekeeper->offs_real in do_settimeofday()) from the current hrtimer_cpu_base->expiry value (which was KTIME_MAX): KTIME_MAX - (a negative value) = overflow A simple check for an overflow can resolve this problem. Using KTIME_MAX instead of the overflow value will result in the hrtimer function being run, and the reprogramming of the timer after that. Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Prarit Bhargava <prarit@redhat.com> [jstultz: Tweaked commit subject] Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi
pushed a commit
that referenced
this pull request
Jul 2, 2013
The check in omap_musb_mailbox does not properly check if the module has been fully initialized. The patch fixes that, and the kernel panic below: $ modprobe twl4030-usb [ 13.924743] twl4030_usb twl4030-usb.33: HW_CONDITIONS 0xe0/224; link 3 [ 13.940307] Unable to handle kernel NULL pointer dereference at virtual address 00000004 [ 13.948883] pgd = ef27c000 [ 13.951751] [00000004] *pgd=af256831, *pte=00000000, *ppte=00000000 [ 13.958374] Internal error: Oops: 17 [#1] ARM [ 13.962921] Modules linked in: twl4030_usb(+) omap2430 libcomposite [ 13.969543] CPU: 0 Not tainted (3.8.0-rc1-n9xx-11758-ge37a37c-dirty #6) [ 13.976867] PC is at omap_musb_mailbox+0x18/0x54 [omap2430] [ 13.982727] LR is at twl4030_usb_probe+0x240/0x354 [twl4030_usb] [ 13.989013] pc : [<bf013b6c>] lr : [<bf018958>] psr: 60000013 [ 13.989013] sp : ef273cf0 ip : ef273d08 fp : ef273d04 [ 14.001068] r10: bf01b000 r9 : bf0191d8 r8 : 00000001 [ 14.006530] r7 : 00000000 r6 : ef140e10 r5 : 00000003 r4 : 00000000 [ 14.013397] r3 : bf0142dc r2 : 00000006 r1 : 00000000 r0 : 00000003 [ 14.020233] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 14.027740] Control: 10c5387d Table: af27c019 DAC: 00000015 [ 14.033752] Process modprobe (pid: 616, stack limit = 0xef272238) [ 14.040161] Stack: (0xef273cf0 to 0xef274000) [ 14.044708] 3ce0: ef254310 00000001 ef273d34 ef273d08 [ 14.053314] 3d00: bf018958 bf013b60 bf0190a4 ef254310 c0101550 c0c3a138 ef140e10 ef140e44 [ 14.061889] 3d20: bf019150 00000001 ef273d44 ef273d38 c019890c bf018724 ef273d64 ef273d48 [ 14.070495] 3d40: c01974fc c01988f8 ef140e10 bf019150 ef140e44 00000000 ef273d84 ef273d68 [ 14.079071] 3d60: c0197728 c019748c c0197694 00000000 bf019150 c0197694 ef273dac ef273d88 [ 14.087677] 3d80: c0195c38 c01976a0 ef03610c ef143eb0 c0128954 ef254780 bf019150 c0b19548 [ 14.096252] 3da0: ef273dbc ef273db0 c0197098 c0195bf0 ef273dec ef273dc0 c0196c98 c0197080 [ 14.104858] 3dc0: bf0190a4 c0b27bc0 ef273dec bf019150 bf019190 c0b27bc0 ef272000 00000001 [ 14.113433] 3de0: ef273e14 ef273df0 c0197c18 c0196b30 ef273f48 bf019190 c0b27bc0 ef272000 [ 14.122039] 3e00: 00000001 bf01b000 ef273e24 ef273e18 c0198b28 c0197ba4 ef273e34 ef273e28 [ 14.130615] 3e20: bf01b014 c0198ae8 ef273e8c ef273e38 c0008918 bf01b00c c004f730 c012ba1c [ 14.139221] 3e40: ef273e74 00000000 c00505b0 c004f72c 00000000 ef273e60 ef273f48 bf019190 [ 14.147796] 3e60: 00000001 ef273f48 bf019190 00000001 ef286340 00000001 bf0191d8 c0065414 [ 14.156402] 3e80: ef273f44 ef273e90 c0067754 c00087fc bf01919c 00007fff c0064794 00000000 [ 14.164978] 3ea0: ef273ecc f0064000 00000001 ef272000 ef272000 00067f39 bf0192b0 bf01919c [ 14.173583] 3ec0: ef273f0c ef273ed0 c00a6bf0 c00a53fc ff000000 000000d2 c0067dc8 00000000 [ 14.182159] 3ee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 14.190765] 3f00: 00000000 00000000 00000000 00000000 00000000 00000000 ffffffff 00002968 [ 14.199340] 3f20: 00080878 00067f39 00000080 c000e2e8 ef272000 00000000 ef273fa4 ef273f48 [ 14.207946] 3f40: c0067e54 c0066188 f0064000 00002968 f0065530 f0065463 f0065fb0 000012c4 [ 14.216522] 3f60: 00001664 00000000 00000000 00000000 00000014 00000015 0000000c 00000000 [ 14.225128] 3f80: 00000008 00000000 00000000 00080370 00080878 0007422c 00000000 ef273fa8 [ 14.233703] 3fa0: c000e140 c0067d80 00080370 00080878 00080878 00002968 00067f39 00000000 [ 14.242309] 3fc0: 00080370 00080878 0007422c 00000080 00074030 00067f39 bec7aef8 00000000 [ 14.250885] 3fe0: b6f05300 bec7ab68 0000e93c b6f05310 60000010 00080878 af7fe821 af7fec21 [ 14.259460] Backtrace: [ 14.262054] [<bf013b54>] (omap_musb_mailbox+0x0/0x54 [omap2430]) from [<bf018958>] (twl4030_usb_probe+0x240/0x354 [twl4030_usb]) [ 14.274200] r5:00000001 r4:ef254310 [ 14.277984] [<bf018718>] (twl4030_usb_probe+0x0/0x354 [twl4030_usb]) from [<c019890c>] (platform_drv_probe+0x20/0x24) [ 14.289123] r8:00000001 r7:bf019150 r6:ef140e44 r5:ef140e10 r4:c0c3a138 [ 14.296203] [<c01988ec>] (platform_drv_probe+0x0/0x24) from [<c01974fc>] (driver_probe_device+0x7c/0x214) [ 14.306243] [<c0197480>] (driver_probe_device+0x0/0x214) from [<c0197728>] (__driver_attach+0x94/0x98) [ 14.316009] r7:00000000 r6:ef140e44 r5:bf019150 r4:ef140e10 [ 14.321990] [<c0197694>] (__driver_attach+0x0/0x98) from [<c0195c38>] (bus_for_each_dev+0x54/0x88) [ 14.331390] r6:c0197694 r5:bf019150 r4:00000000 r3:c0197694 [ 14.337371] [<c0195be4>] (bus_for_each_dev+0x0/0x88) from [<c0197098>] (driver_attach+0x24/0x28) [ 14.346588] r6:c0b19548 r5:bf019150 r4:ef254780 [ 14.351440] [<c0197074>] (driver_attach+0x0/0x28) from [<c0196c98>] (bus_add_driver+0x174/0x244) [ 14.360687] [<c0196b24>] (bus_add_driver+0x0/0x244) from [<c0197c18>] (driver_register+0x80/0x154) [ 14.370086] r8:00000001 r7:ef272000 r6:c0b27bc0 r5:bf019190 r4:bf019150 [ 14.377136] [<c0197b98>] (driver_register+0x0/0x154) from [<c0198b28>] (platform_driver_register+0x4c/0x60) [ 14.387390] [<c0198adc>] (platform_driver_register+0x0/0x60) from [<bf01b014>] (twl4030_usb_init+0x14/0x1c [twl4030_usb]) [ 14.398895] [<bf01b000>] (twl4030_usb_init+0x0/0x1c [twl4030_usb]) from [<c0008918>] (do_one_initcall+0x128/0x1a8) [ 14.409790] [<c00087f0>] (do_one_initcall+0x0/0x1a8) from [<c0067754>] (load_module+0x15d8/0x1bf8) [ 14.419189] [<c006617c>] (load_module+0x0/0x1bf8) from [<c0067e54>] (sys_init_module+0xe0/0xf4) [ 14.428344] [<c0067d74>] (sys_init_module+0x0/0xf4) from [<c000e140>] (ret_fast_syscall+0x0/0x30) [ 14.437652] r6:0007422c r5:00080878 r4:00080370 [ 14.442504] Code: e24cb004 e59f3038 e1a05000 e593401c (e5940004) [ 14.448944] ---[ end trace dbf47e5bc5ba03c2 ]--- [ 14.453826] Kernel panic - not syncing: Fatal exception Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Felipe Balbi <balbi@ti.com>
koenkooi
pushed a commit
that referenced
this pull request
Jul 2, 2013
All the current platforms can work with 0x8000_0000 based dma_addr_t since the Bus Bridges typically ignore the top bit (the only excpetion was Angel4 PCI-AHB bridge which we no longer care for). That way we don't need plat-specific cpu-addr to bus-addr conversion. Hooks still provided - just in case a platform has an obscure device which say needs 0 based bus address. That way <asm/dma_mapping.h> no longer needs to unconditinally include <plat/dma_addr.h> Also verfied that on Angel4 board, other peripherals (IDE-disk / EMAC) work fine with 0x8000_0000 based dma addresses. Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Arnd Bergmann <arnd@arndb.de>
koenkooi
pushed a commit
that referenced
this pull request
Jul 2, 2013
…ernel/git/vgupta/arc Pull new ARC architecture from Vineet Gupta: "Initial ARC Linux port with some fixes on top for 3.9-rc1: I would like to introduce the Linux port to ARC Processors (from Synopsys) for 3.9-rc1. The patch-set has been discussed on the public lists since Nov and has received a fair bit of review, specially from Arnd, tglx, Al and other subsystem maintainers for DeviceTree, kgdb... The arch bits are in arch/arc, some asm-generic changes (acked by Arnd), a minor change to PARISC (acked by Helge). The series is a touch bigger for a new port for 2 main reasons: 1. It enables a basic kernel in first sub-series and adds ptrace/kgdb/.. later 2. Some of the fallout of review (DeviceTree support, multi-platform- image support) were added on top of orig series, primarily to record the revision history. This updated pull request additionally contains - fixes due to our GNU tools catching up with the new syscall/ptrace ABI - some (minor) cross-arch Kconfig updates." * tag 'arc-v3.9-rc1-late' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (82 commits) ARC: split elf.h into uapi and export it for userspace ARC: Fixup the current ABI version ARC: gdbserver using regset interface possibly broken ARC: Kconfig cleanup tracking cross-arch Kconfig pruning in merge window ARC: make a copy of flat DT ARC: [plat-arcfpga] DT arc-uart bindings change: "baud" => "current-speed" ARC: Ensure CONFIG_VIRT_TO_BUS is not enabled ARC: Fix pt_orig_r8 access ARC: [3.9] Fallout of hlist iterator update ARC: 64bit RTSC timestamp hardware issue ARC: Don't fiddle with non-existent caches ARC: Add self to MAINTAINERS ARC: Provide a default serial.h for uart drivers needing BASE_BAUD ARC: [plat-arcfpga] defconfig for fully loaded ARC Linux ARC: [Review] Multi-platform image #8: platform registers SMP callbacks ARC: [Review] Multi-platform image #7: SMP common code to use callbacks ARC: [Review] Multi-platform image #6: cpu-to-dma-addr optional ARC: [Review] Multi-platform image #5: NR_IRQS defined by ARC core ARC: [Review] Multi-platform image #4: Isolate platform headers ARC: [Review] Multi-platform image #3: switch to board callback ...
koenkooi
pushed a commit
that referenced
this pull request
Jul 2, 2013
The following script will produce a kernel oops: sudo ip netns add v sudo ip netns exec v ip ad add 127.0.0.1/8 dev lo sudo ip netns exec v ip link set lo up sudo ip netns exec v ip ro add 224.0.0.0/4 dev lo sudo ip netns exec v ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev lo sudo ip netns exec v ip link set vxlan0 up sudo ip netns del v where inspect by gdb: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 107] 0xffffffffa0289e33 in ?? () (gdb) bt #0 vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533 #1 vxlan_stop (dev=0xffff88001bafa000) at drivers/net/vxlan.c:1087 #2 0xffffffff812cc498 in __dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1299 #3 0xffffffff812cd920 in dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1335 #4 0xffffffff812cef31 in rollback_registered_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:4851 #5 0xffffffff812cf040 in unregister_netdevice_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:5752 #6 0xffffffff812cf1ba in default_device_exit_batch (net_list=0xffff88001f2e7e18) at net/core/dev.c:6170 #7 0xffffffff812cab27 in cleanup_net (work=<optimized out>) at net/core/net_namespace.c:302 #8 0xffffffff810540ef in process_one_work (worker=0xffff88001ba9ed40, work=0xffffffff8167d020) at kernel/workqueue.c:2157 #9 0xffffffff810549d0 in worker_thread (__worker=__worker@entry=0xffff88001ba9ed40) at kernel/workqueue.c:2276 torvalds#10 0xffffffff8105870c in kthread (_create=0xffff88001f2e5d68) at kernel/kthread.c:168 torvalds#11 <signal handler called> torvalds#12 0x0000000000000000 in ?? () torvalds#13 0x0000000000000000 in ?? () (gdb) fr 0 #0 vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533 533 struct sock *sk = vn->sock->sk; (gdb) l 528 static int vxlan_leave_group(struct net_device *dev) 529 { 530 struct vxlan_dev *vxlan = netdev_priv(dev); 531 struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id); 532 int err = 0; 533 struct sock *sk = vn->sock->sk; 534 struct ip_mreqn mreq = { 535 .imr_multiaddr.s_addr = vxlan->gaddr, 536 .imr_ifindex = vxlan->link, 537 }; (gdb) p vn->sock $4 = (struct socket *) 0x0 The kernel calls `vxlan_exit_net` when deleting the netns before shutting down vxlan interfaces. Later the removal of all vxlan interfaces, where `vn->sock` is already gone causes the oops. so we should manually shutdown all interfaces before deleting `vn->sock` as the patch does. Signed-off-by: Zang MingJie <zealot0630@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
koenkooi
pushed a commit
that referenced
this pull request
Jul 2, 2013
clk inits on OMAP happen quite early, even before slab is available. The dependency comes from the fact that the timer init code starts to use clocks and hwmod and we need clocks to be initialized by then. There are various problems doing clk inits this early, one is, not being able to do dynamic clk registrations and hence the dependency on clk-private.h. The other is, inability to debug early kernel crashes without enabling DEBUG_LL and earlyprintk. Doing early clk init also exposed another instance of a kernel panic due to a BUG() when CONFIG_DEBUG_SLAB is enabled. [ 0.000000] Kernel BUG at c01174f8 [verbose debug info unavailable] [ 0.000000] Internal error: Oops - BUG: 0 [#1] SMP ARM [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 Not tainted (3.9.0-rc1-12179-g72d48f9 #6) [ 0.000000] PC is at __kmalloc+0x1d4/0x248 [ 0.000000] LR is at __clk_init+0x2e0/0x364 [ 0.000000] pc : [<c01174f8>] lr : [<c0441f54>] psr: 600001d3 [ 0.000000] sp : c076ff28 ip : c065cefc fp : c0441f54 [ 0.000000] r10: 0000001c r9 : 000080d0 r8 : c076ffd4 [ 0.000000] r7 : c074b578 r6 : c0794d88 r5 : 00000040 r4 : 00000000 [ 0.000000] r3 : 00000000 r2 : c07cac70 r1 : 000080d0 r0 : 0000001c [ 0.000000] Flags: nZCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment kernel [ 0.000000] Control: 10c53c7d Table: 8000404a DAC: 00000017 [ 0.000000] Process swapper (pid: 0, stack limit = 0xc076e240) [ 0.000000] Stack: (0xc076ff28 to 0xc0770000) [ 0.000000] ff20: 22222222 c0794ec8 c06546e8 00000000 00000040 c0794d88 [ 0.000000] ff40: c074b578 c076ffd4 c07951c8 c076e000 00000000 c0441f54 c074b578 c076ffd4 [ 0.000000] ff60: c0793828 00000040 c0794d88 c074b578 c076ffd4 c0776900 c076e000 c07272ac [ 0.000000] ff80: 2f800000 c074c968 c07f93d0 c0719780 c076ffa0 c076ff98 00000000 00000000 [ 0.000000] ffa0: 00000000 00000000 00000000 00000001 c074cd6c c077b1ec 8000406a c0715724 [ 0.000000] ffc0: 00000000 00000000 00000000 00000000 00000000 c074c968 10c53c7d c0776974 [ 0.000000] ffe0: c074cd6c c077b1ec 8000406a 411fc092 00000000 80008074 00000000 00000000 [ 0.000000] [<c01174f8>] (__kmalloc+0x1d4/0x248) from [<c0441f54>] (__clk_init+0x2e0/0x364) [ 0.000000] [<c0441f54>] (__clk_init+0x2e0/0x364) from [<c07272ac>] (omap4xxx_clk_init+0xbc/0x140) [ 0.000000] [<c07272ac>] (omap4xxx_clk_init+0xbc/0x140) from [<c0719780>] (setup_arch+0x15c/0x284) [ 0.000000] [<c0719780>] (setup_arch+0x15c/0x284) from [<c0715724>] (start_kernel+0x7c/0x334) [ 0.000000] [<c0715724>] (start_kernel+0x7c/0x334) from [<80008074>] (0x80008074) [ 0.000000] Code: e5883004 e1a00006 e28dd00c e8bd8ff0 (e7f001f2) [ 0.000000] ---[ end trace 1b75b31a2719ed1c ]--- [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! It was a know issue, that slab allocations would fail when common clock core tries to cache parent pointers for mux clocks on OMAP, and hence a patch 'clk: Allow late cache allocation for clk->parents, commit 7975059' was added to work this problem around. A BUG() within kmalloc() with CONFIG_DEBUG_SLAB enabled was completely overlooked causing this regression. More details on the issue reported can be found here, http://www.mail-archive.com/linux-omap@vger.kernel.org/msg85932.html With all these issues around clk inits happening way too early, it makes sense to at least move them to a point where dynamic memory allocations are possible. So move them to a point just before the timer code starts using clocks and hwmod. This should at least pave way for clk inits on OMAP moving to dynamic clock registrations instead of using the static macros defined in clk-private.h. The issue with kernel panic while CONFIG_DEBUG_SLAB is enabled was reported by Piotr Haber and Tony Lindgren and this patch fixes the reported issue as well. Reported-by: Piotr Haber <phaber@broadcom.com> Reported-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Rajendra Nayak <rnayak@ti.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Reviewed-by: Mike Turquette <mturquette@linaro.org> Acked-by: Paul Walmsley <paul@pwsan.com> Cc: stable@vger.kernel.org # v3.8 Signed-off-by: Tony Lindgren <tony@atomide.com>
koenkooi
pushed a commit
that referenced
this pull request
Jul 2, 2013
The settimeofday01 test in the LTP testsuite effectively does gettimeofday(current time); settimeofday(Jan 1, 1970 + 100 seconds); settimeofday(current time); This test causes a stack trace to be displayed on the console during the setting of timeofday to Jan 1, 1970 + 100 seconds: [ 131.066751] ------------[ cut here ]------------ [ 131.096448] WARNING: at kernel/time/clockevents.c:209 clockevents_program_event+0x135/0x140() [ 131.104935] Hardware name: Dinar [ 131.108150] Modules linked in: sg nfsv3 nfs_acl nfsv4 auth_rpcgss nfs dns_resolver fscache lockd sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables kvm_amd kvm sp5100_tco bnx2 i2c_piix4 crc32c_intel k10temp fam15h_power ghash_clmulni_intel amd64_edac_mod pcspkr serio_raw edac_mce_amd edac_core microcode xfs libcrc32c sr_mod sd_mod cdrom ata_generic crc_t10dif pata_acpi radeon i2c_algo_bit drm_kms_helper ttm drm ahci pata_atiixp libahci libata usb_storage i2c_core dm_mirror dm_region_hash dm_log dm_mod [ 131.176784] Pid: 0, comm: swapper/28 Not tainted 3.8.0+ #6 [ 131.182248] Call Trace: [ 131.184684] <IRQ> [<ffffffff810612af>] warn_slowpath_common+0x7f/0xc0 [ 131.191312] [<ffffffff8106130a>] warn_slowpath_null+0x1a/0x20 [ 131.197131] [<ffffffff810b9fd5>] clockevents_program_event+0x135/0x140 [ 131.203721] [<ffffffff810bb584>] tick_program_event+0x24/0x30 [ 131.209534] [<ffffffff81089ab1>] hrtimer_interrupt+0x131/0x230 [ 131.215437] [<ffffffff814b9600>] ? cpufreq_p4_target+0x130/0x130 [ 131.221509] [<ffffffff81619119>] smp_apic_timer_interrupt+0x69/0x99 [ 131.227839] [<ffffffff8161805d>] apic_timer_interrupt+0x6d/0x80 [ 131.233816] <EOI> [<ffffffff81099745>] ? sched_clock_cpu+0xc5/0x120 [ 131.240267] [<ffffffff814b9ff0>] ? cpuidle_wrap_enter+0x50/0xa0 [ 131.246252] [<ffffffff814b9fe9>] ? cpuidle_wrap_enter+0x49/0xa0 [ 131.252238] [<ffffffff814ba050>] cpuidle_enter_tk+0x10/0x20 [ 131.257877] [<ffffffff814b9c89>] cpuidle_idle_call+0xa9/0x260 [ 131.263692] [<ffffffff8101c42f>] cpu_idle+0xaf/0x120 [ 131.268727] [<ffffffff815f8971>] start_secondary+0x255/0x257 [ 131.274449] ---[ end trace 1151a50552231615 ]--- When we change the system time to a low value like this, the value of timekeeper->offs_real will be a negative value. It seems that the WARN occurs because an hrtimer has been started in the time between the releasing of the timekeeper lock and the IPI call (via a call to on_each_cpu) in clock_was_set() in the do_settimeofday() code. The end result is that a REALTIME_CLOCK timer has been added with softexpires = expires = KTIME_MAX. The hrtimer_interrupt() fires/is called and the loop at kernel/hrtimer.c:1289 is executed. In this loop the code subtracts the clock base's offset (which was set to timekeeper->offs_real in do_settimeofday()) from the current hrtimer_cpu_base->expiry value (which was KTIME_MAX): KTIME_MAX - (a negative value) = overflow A simple check for an overflow can resolve this problem. Using KTIME_MAX instead of the overflow value will result in the hrtimer function being run, and the reprogramming of the timer after that. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Prarit Bhargava <prarit@redhat.com> [jstultz: Tweaked commit subject] Signed-off-by: John Stultz <john.stultz@linaro.org>
koenkooi
pushed a commit
that referenced
this pull request
Jul 2, 2013
Or Gerlitz says: ==================== This series adds support for the SRIOV ndo_set_vf callbacks to the mlx4 driver. Series done against the net-next tree as of commit 0c50134 "batman-adv: fix global protection fault during soft_iface destruction". We have successfully tested the series on net-next, except for getting the VF link info issue I have reported earlier today on netdev, we see the problem for both ixgbe and mlx4 VFs. Just to make sure get VF config is working OK with patch #6 - we have run it over 3.8.8 too. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
koenkooi
pushed a commit
that referenced
this pull request
Jul 2, 2013
Or Gerlitz says: ==================== This series adds support for the SRIOV ndo_set_vf callbacks to the mlx4 driver. Series done against the net-next tree as of commit 37fe066 "net: fix address check in rtnl_fdb_del" We have successfully tested the series on net-next, except for getting the VF link info issue I have reported earlier today on netdev, we see the problem for both ixgbe and mlx4 VFs. Just to make sure get VF config is working OK with patch #6 - we have run it over 3.8.8 too. We added to the V1 series two patches that disable HW timestamping when running over a VF, as this isn't supported yet. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
koenkooi
pushed a commit
that referenced
this pull request
Jul 2, 2013
Toralf reported the following oops to the linux-nfs mailing list: -----------------[snip]------------------ NFSD: unable to generate recoverydir name (-2). NFSD: disabling legacy clientid tracking. Reboot recovery will not function correctly! BUG: unable to handle kernel NULL pointer dereference at 000003c8 IP: [<f90a3d91>] nfsd4_client_tracking_exit+0x11/0x50 [nfsd] *pdpt = 000000002ba33001 *pde = 0000000000000000 Oops: 0000 [#1] SMP Modules linked in: loop nfsd auth_rpcgss ipt_MASQUERADE xt_owner xt_multiport ipt_REJECT xt_tcpudp xt_recent xt_conntrack nf_conntrack_ftp xt_limit xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables af_packet pppoe pppox ppp_generic slhc bridge stp llc tun arc4 iwldvm mac80211 coretemp kvm_intel uvcvideo sdhci_pci sdhci mmc_core videobuf2_vmalloc videobuf2_memops usblp videobuf2_core i915 iwlwifi psmouse videodev cfg80211 kvm fbcon bitblit cfbfillrect acpi_cpufreq mperf evdev softcursor font cfbimgblt i2c_algo_bit cfbcopyarea intel_agp intel_gtt drm_kms_helper snd_hda_codec_conexant drm agpgart fb fbdev tpm_tis thinkpad_acpi tpm nvram e1000e rfkill thermal ptp wmi pps_core tpm_bios 8250_pci processor 8250 ac snd_hda_intel snd_hda_codec snd_pcm battery video i2c_i801 snd_page_alloc snd_timer button serial_core i2c_core snd soundcore thermal_sys hwmon aesni_intel ablk_helper cryp td lrw aes_i586 xts gf128mul cbc fuse nfs lockd sunrpc dm_crypt dm_mod hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid hid sr_mod cdrom sg [last unloaded: microcode] Pid: 6374, comm: nfsd Not tainted 3.9.1 #6 LENOVO 4180F65/4180F65 EIP: 0060:[<f90a3d91>] EFLAGS: 00010202 CPU: 0 EIP is at nfsd4_client_tracking_exit+0x11/0x50 [nfsd] EAX: 00000000 EBX: fffffffe ECX: 00000007 EDX: 00000007 ESI: eb9dcb00 EDI: eb2991c0 EBP: eb2bde38 ESP: eb2bde34 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 80050033 CR2: 000003c8 CR3: 2ba80000 CR4: 000407f0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 Process nfsd (pid: 6374, ti=eb2bc000 task=eb2711c0 task.ti=eb2bc000) Stack: fffffffe eb2bde4c f90a3e0c f90a7754 fffffffe eb0a9c00 eb2bdea0 f90a41ed eb2991c0 1b270000 eb2991c0 eb2bde7c f9099ce9 eb2bde98 0129a020 eb29a020 eb2bdecc eb2991c0 eb2bdea8 f9099da5 00000000 eb9dcb00 00000001 67822f08 Call Trace: [<f90a3e0c>] legacy_recdir_name_error+0x3c/0x40 [nfsd] [<f90a41ed>] nfsd4_create_clid_dir+0x15d/0x1c0 [nfsd] [<f9099ce9>] ? nfsd4_lookup_stateid+0x99/0xd0 [nfsd] [<f9099da5>] ? nfs4_preprocess_seqid_op+0x85/0x100 [nfsd] [<f90a4287>] nfsd4_client_record_create+0x37/0x50 [nfsd] [<f909d6ce>] nfsd4_open_confirm+0xfe/0x130 [nfsd] [<f90980b1>] ? nfsd4_encode_operation+0x61/0x90 [nfsd] [<f909d5d0>] ? nfsd4_free_stateid+0xc0/0xc0 [nfsd] [<f908fd0b>] nfsd4_proc_compound+0x41b/0x530 [nfsd] [<f9081b7b>] nfsd_dispatch+0x8b/0x1a0 [nfsd] [<f857b85d>] svc_process+0x3dd/0x640 [sunrpc] [<f908165d>] nfsd+0xad/0x110 [nfsd] [<f90815b0>] ? nfsd_destroy+0x70/0x70 [nfsd] [<c1054824>] kthread+0x94/0xa0 [<c1486937>] ret_from_kernel_thread+0x1b/0x28 [<c1054790>] ? flush_kthread_work+0xd0/0xd0 Code: 86 b0 00 00 00 90 c5 0a f9 c7 04 24 70 76 0a f9 e8 74 a9 3d c8 eb ba 8d 76 00 55 89 e5 53 66 66 66 66 90 8b 15 68 c7 0a f9 85 d2 <8b> 88 c8 03 00 00 74 2c 3b 11 77 28 8b 5c 91 08 85 db 74 22 8b EIP: [<f90a3d91>] nfsd4_client_tracking_exit+0x11/0x50 [nfsd] SS:ESP 0068:eb2bde34 CR2: 00000000000003c8 ---[ end trace 09e54015d145c9c6 ]--- The problem appears to be a regression that was introduced in commit 9a9c647 "nfsd: make NFSv4 recovery client tracking options per net". Prior to that commit, it was safe to pass a NULL net pointer to nfsd4_client_tracking_exit in the legacy recdir case, and legacy_recdir_name_error did so. After that comit, the net pointer must be valid. This patch just fixes legacy_recdir_name_error to pass in a valid net pointer to that function. Cc: <stable@vger.kernel.org> # v3.8+ Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Reported-and-tested-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
koenkooi
pushed a commit
that referenced
this pull request
Jul 2, 2013
Daniel Petre reported crashes in icmp_dst_unreach() with following call graph: #3 [ffff88003fc03938] __stack_chk_fail at ffffffff81037f77 #4 [ffff88003fc03948] icmp_send at ffffffff814d5fec #5 [ffff88003fc03ae8] ipv4_link_failure at ffffffff814a1795 #6 [ffff88003fc03af8] ipgre_tunnel_xmit at ffffffff814e7965 #7 [ffff88003fc03b78] dev_hard_start_xmit at ffffffff8146e032 #8 [ffff88003fc03bc8] sch_direct_xmit at ffffffff81487d66 #9 [ffff88003fc03c08] __qdisc_run at ffffffff81487efd torvalds#10 [ffff88003fc03c48] dev_queue_xmit at ffffffff8146e5a7 torvalds#11 [ffff88003fc03c88] ip_finish_output at ffffffff814ab596 Daniel found a similar problem mentioned in http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html And indeed this is the root cause : skb->cb[] contains data fooling IP stack. We must clear IPCB in ip_tunnel_xmit() sooner in case dst_link_failure() is called. Or else skb->cb[] might contain garbage from GSO segmentation layer. A similar fix was tested on linux-3.9, but gre code was refactored in linux-3.10. I'll send patches for stable kernels as well. Many thanks to Daniel for providing reports, patches and testing ! Reported-by: Daniel Petre <daniel.petre@rcs-rds.ro> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
koenkooi
pushed a commit
that referenced
this pull request
Jul 2, 2013
The following backtrace is reported with CONFIG_PROVE_RCU: drivers/infiniband/hw/qib/qib_keys.c:64 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 4 locks held by kworker/0:1/56: #0: (events){.+.+.+}, at: [<ffffffff8107a4f5>] process_one_work+0x165/0x4a0 #1: ((&wfc.work)){+.+.+.}, at: [<ffffffff8107a4f5>] process_one_work+0x165/0x4a0 #2: (device_mutex){+.+.+.}, at: [<ffffffffa0148dd8>] ib_register_device+0x38/0x220 [ib_core] #3: (&(&dev->lk_table.lock)->rlock){......}, at: [<ffffffffa017e81c>] qib_alloc_lkey+0x3c/0x1b0 [ib_qib] stack backtrace: Pid: 56, comm: kworker/0:1 Not tainted 3.10.0-rc1+ #6 Call Trace: [<ffffffff810c0b85>] lockdep_rcu_suspicious+0xe5/0x130 [<ffffffffa017e8e1>] qib_alloc_lkey+0x101/0x1b0 [ib_qib] [<ffffffffa0184886>] qib_get_dma_mr+0xa6/0xd0 [ib_qib] [<ffffffffa01461aa>] ib_get_dma_mr+0x1a/0x50 [ib_core] [<ffffffffa01678dc>] ib_mad_port_open+0x12c/0x390 [ib_mad] [<ffffffff810c2c55>] ? trace_hardirqs_on_caller+0x105/0x190 [<ffffffffa0167b92>] ib_mad_init_device+0x52/0x110 [ib_mad] [<ffffffffa01917c0>] ? sl2vl_attr_show+0x30/0x30 [ib_qib] [<ffffffffa0148f49>] ib_register_device+0x1a9/0x220 [ib_core] [<ffffffffa01b1685>] qib_register_ib_device+0x735/0xa40 [ib_qib] [<ffffffff8106ba98>] ? mod_timer+0x118/0x220 [<ffffffffa017d425>] qib_init_one+0x1e5/0x400 [ib_qib] [<ffffffff812ce86e>] local_pci_probe+0x4e/0x90 [<ffffffff81078118>] work_for_cpu_fn+0x18/0x30 [<ffffffff8107a566>] process_one_work+0x1d6/0x4a0 [<ffffffff8107a4f5>] ? process_one_work+0x165/0x4a0 [<ffffffff8107c9c9>] worker_thread+0x119/0x370 [<ffffffff8107c8b0>] ? manage_workers+0x180/0x180 [<ffffffff8108294e>] kthread+0xee/0x100 [<ffffffff81082860>] ? __init_kthread_worker+0x70/0x70 [<ffffffff815c04ac>] ret_from_fork+0x7c/0xb0 [<ffffffff81082860>] ? __init_kthread_worker+0x70/0x70 Per Documentation/RCU/lockdep-splat.txt, the code now uses rcu_access_pointer() vs. rcu_dereference(). Reported-by: Jay Fenlason <fenlason@redhat.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
koenkooi
pushed a commit
that referenced
this pull request
Jul 2, 2013
In Steven Rostedt's words: > I've been debugging the last couple of days why my tests have been > locking up. One of my tracing tests, runs all available tracers. The > lockup always happened with the mmiotrace, which is used to trace > interactions between priority drivers and the kernel. But to do this > easily, when the tracer gets registered, it disables all but the boot > CPUs. The lockup always happened after it got done disabling the CPUs. > > Then I decided to try this: > > while :; do > for i in 1 2 3; do > echo 0 > /sys/devices/system/cpu/cpu$i/online > done > for i in 1 2 3; do > echo 1 > /sys/devices/system/cpu/cpu$i/online > done > done > > Well, sure enough, that locked up too, with the same users. Doing a > sysrq-w (showing all blocked tasks): > > [ 2991.344562] task PC stack pid father > [ 2991.344562] rcu_preempt D ffff88007986fdf8 0 10 2 0x00000000 > [ 2991.344562] ffff88007986fc98 0000000000000002 ffff88007986fc48 0000000000000908 > [ 2991.344562] ffff88007986c280 ffff88007986ffd8 ffff88007986ffd8 00000000001d3c80 > [ 2991.344562] ffff880079248a40 ffff88007986c280 0000000000000000 00000000fffd4295 > [ 2991.344562] Call Trace: > [ 2991.344562] [<ffffffff815437ba>] schedule+0x64/0x66 > [ 2991.344562] [<ffffffff81541750>] schedule_timeout+0xbc/0xf9 > [ 2991.344562] [<ffffffff8154bec0>] ? ftrace_call+0x5/0x2f > [ 2991.344562] [<ffffffff81049513>] ? cascade+0xa8/0xa8 > [ 2991.344562] [<ffffffff815417ab>] schedule_timeout_uninterruptible+0x1e/0x20 > [ 2991.344562] [<ffffffff810c980c>] rcu_gp_kthread+0x502/0x94b > [ 2991.344562] [<ffffffff81062791>] ? __init_waitqueue_head+0x50/0x50 > [ 2991.344562] [<ffffffff810c930a>] ? rcu_gp_fqs+0x64/0x64 > [ 2991.344562] [<ffffffff81061cdb>] kthread+0xb1/0xb9 > [ 2991.344562] [<ffffffff81091e31>] ? lock_release_holdtime.part.23+0x4e/0x55 > [ 2991.344562] [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58 > [ 2991.344562] [<ffffffff8154c1dc>] ret_from_fork+0x7c/0xb0 > [ 2991.344562] [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58 > [ 2991.344562] kworker/0:1 D ffffffff81a30680 0 47 2 0x00000000 > [ 2991.344562] Workqueue: events cpuset_hotplug_workfn > [ 2991.344562] ffff880078dbbb58 0000000000000002 0000000000000006 00000000000000d8 > [ 2991.344562] ffff880078db8100 ffff880078dbbfd8 ffff880078dbbfd8 00000000001d3c80 > [ 2991.344562] ffff8800779ca5c0 ffff880078db8100 ffffffff81541fcf 0000000000000000 > [ 2991.344562] Call Trace: > [ 2991.344562] [<ffffffff81541fcf>] ? __mutex_lock_common+0x3d4/0x609 > [ 2991.344562] [<ffffffff815437ba>] schedule+0x64/0x66 > [ 2991.344562] [<ffffffff81543a39>] schedule_preempt_disabled+0x18/0x24 > [ 2991.344562] [<ffffffff81541fcf>] __mutex_lock_common+0x3d4/0x609 > [ 2991.344562] [<ffffffff8103d11b>] ? get_online_cpus+0x3c/0x50 > [ 2991.344562] [<ffffffff8103d11b>] ? get_online_cpus+0x3c/0x50 > [ 2991.344562] [<ffffffff815422ff>] mutex_lock_nested+0x3b/0x40 > [ 2991.344562] [<ffffffff8103d11b>] get_online_cpus+0x3c/0x50 > [ 2991.344562] [<ffffffff810af7e6>] rebuild_sched_domains_locked+0x6e/0x3a8 > [ 2991.344562] [<ffffffff810b0ec6>] rebuild_sched_domains+0x1c/0x2a > [ 2991.344562] [<ffffffff810b109b>] cpuset_hotplug_workfn+0x1c7/0x1d3 > [ 2991.344562] [<ffffffff810b0ed9>] ? cpuset_hotplug_workfn+0x5/0x1d3 > [ 2991.344562] [<ffffffff81058e07>] process_one_work+0x2d4/0x4d1 > [ 2991.344562] [<ffffffff81058d3a>] ? process_one_work+0x207/0x4d1 > [ 2991.344562] [<ffffffff8105964c>] worker_thread+0x2e7/0x3b5 > [ 2991.344562] [<ffffffff81059365>] ? rescuer_thread+0x332/0x332 > [ 2991.344562] [<ffffffff81061cdb>] kthread+0xb1/0xb9 > [ 2991.344562] [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58 > [ 2991.344562] [<ffffffff8154c1dc>] ret_from_fork+0x7c/0xb0 > [ 2991.344562] [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58 > [ 2991.344562] bash D ffffffff81a4aa80 0 2618 2612 0x10000000 > [ 2991.344562] ffff8800379abb58 0000000000000002 0000000000000006 0000000000000c2c > [ 2991.344562] ffff880077fea140 ffff8800379abfd8 ffff8800379abfd8 00000000001d3c80 > [ 2991.344562] ffff8800779ca5c0 ffff880077fea140 ffffffff81541fcf 0000000000000000 > [ 2991.344562] Call Trace: > [ 2991.344562] [<ffffffff81541fcf>] ? __mutex_lock_common+0x3d4/0x609 > [ 2991.344562] [<ffffffff815437ba>] schedule+0x64/0x66 > [ 2991.344562] [<ffffffff81543a39>] schedule_preempt_disabled+0x18/0x24 > [ 2991.344562] [<ffffffff81541fcf>] __mutex_lock_common+0x3d4/0x609 > [ 2991.344562] [<ffffffff81530078>] ? rcu_cpu_notify+0x2f5/0x86e > [ 2991.344562] [<ffffffff81530078>] ? rcu_cpu_notify+0x2f5/0x86e > [ 2991.344562] [<ffffffff815422ff>] mutex_lock_nested+0x3b/0x40 > [ 2991.344562] [<ffffffff81530078>] rcu_cpu_notify+0x2f5/0x86e > [ 2991.344562] [<ffffffff81091c99>] ? __lock_is_held+0x32/0x53 > [ 2991.344562] [<ffffffff81548912>] notifier_call_chain+0x6b/0x98 > [ 2991.344562] [<ffffffff810671fd>] __raw_notifier_call_chain+0xe/0x10 > [ 2991.344562] [<ffffffff8103cf64>] __cpu_notify+0x20/0x32 > [ 2991.344562] [<ffffffff8103cf8d>] cpu_notify_nofail+0x17/0x36 > [ 2991.344562] [<ffffffff815225de>] _cpu_down+0x154/0x259 > [ 2991.344562] [<ffffffff81522710>] cpu_down+0x2d/0x3a > [ 2991.344562] [<ffffffff81526351>] store_online+0x4e/0xe7 > [ 2991.344562] [<ffffffff8134d764>] dev_attr_store+0x20/0x22 > [ 2991.344562] [<ffffffff811b3c5f>] sysfs_write_file+0x108/0x144 > [ 2991.344562] [<ffffffff8114c5ef>] vfs_write+0xfd/0x158 > [ 2991.344562] [<ffffffff8114c928>] SyS_write+0x5c/0x83 > [ 2991.344562] [<ffffffff8154c494>] tracesys+0xdd/0xe2 > > As well as held locks: > > [ 3034.728033] Showing all locks held in the system: > [ 3034.728033] 1 lock held by rcu_preempt/10: > [ 3034.728033] #0: (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff810c9471>] rcu_gp_kthread+0x167/0x94b > [ 3034.728033] 4 locks held by kworker/0:1/47: > [ 3034.728033] #0: (events){.+.+.+}, at: [<ffffffff81058d3a>] process_one_work+0x207/0x4d1 > [ 3034.728033] #1: (cpuset_hotplug_work){+.+.+.}, at: [<ffffffff81058d3a>] process_one_work+0x207/0x4d1 > [ 3034.728033] #2: (cpuset_mutex){+.+.+.}, at: [<ffffffff810b0ec1>] rebuild_sched_domains+0x17/0x2a > [ 3034.728033] #3: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8103d11b>] get_online_cpus+0x3c/0x50 > [ 3034.728033] 1 lock held by mingetty/2563: > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8 > [ 3034.728033] 1 lock held by mingetty/2565: > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8 > [ 3034.728033] 1 lock held by mingetty/2569: > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8 > [ 3034.728033] 1 lock held by mingetty/2572: > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8 > [ 3034.728033] 1 lock held by mingetty/2575: > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8 > [ 3034.728033] 7 locks held by bash/2618: > [ 3034.728033] #0: (sb_writers#5){.+.+.+}, at: [<ffffffff8114bc3f>] file_start_write+0x2a/0x2c > [ 3034.728033] #1: (&buffer->mutex#2){+.+.+.}, at: [<ffffffff811b3b93>] sysfs_write_file+0x3c/0x144 > [ 3034.728033] #2: (s_active#54){.+.+.+}, at: [<ffffffff811b3c3e>] sysfs_write_file+0xe7/0x144 > [ 3034.728033] #3: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff810217c2>] cpu_hotplug_driver_lock+0x17/0x19 > [ 3034.728033] #4: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8103d196>] cpu_maps_update_begin+0x17/0x19 > [ 3034.728033] #5: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8103cfd8>] cpu_hotplug_begin+0x2c/0x6d > [ 3034.728033] #6: (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff81530078>] rcu_cpu_notify+0x2f5/0x86e > [ 3034.728033] 1 lock held by bash/2980: > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8 > > Things looked a little weird. Also, this is a deadlock that lockdep did > not catch. But what we have here does not look like a circular lock > issue: > > Bash is blocked in rcu_cpu_notify(): > > 1961 /* Exclude any attempts to start a new grace period. */ > 1962 mutex_lock(&rsp->onoff_mutex); > > > kworker is blocked in get_online_cpus(), which makes sense as we are > currently taking down a CPU. > > But rcu_preempt is not blocked on anything. It is simply sleeping in > rcu_gp_kthread (really rcu_gp_init) here: > > 1453 #ifdef CONFIG_PROVE_RCU_DELAY > 1454 if ((prandom_u32() % (rcu_num_nodes * 8)) == 0 && > 1455 system_state == SYSTEM_RUNNING) > 1456 schedule_timeout_uninterruptible(2); > 1457 #endif /* #ifdef CONFIG_PROVE_RCU_DELAY */ > > And it does this while holding the onoff_mutex that bash is waiting for. > > Doing a function trace, it showed me where it happened: > > [ 125.940066] rcu_pree-10 3.... 28384115273: schedule_timeout_uninterruptible <-rcu_gp_kthread > [...] > [ 125.940066] rcu_pree-10 3d..3 28384202439: sched_switch: prev_comm=rcu_preempt prev_pid=10 prev_prio=120 prev_state=D ==> next_comm=watchdog/3 next_pid=38 next_prio=120 > > The watchdog ran, and then: > > [ 125.940066] watchdog-38 3d..3 28384692863: sched_switch: prev_comm=watchdog/3 prev_pid=38 prev_prio=120 prev_state=P ==> next_comm=modprobe next_pid=2848 next_prio=118 > > Not sure what modprobe was doing, but shortly after that: > > [ 125.940066] modprobe-2848 3d..3 28385041749: sched_switch: prev_comm=modprobe prev_pid=2848 prev_prio=118 prev_state=R+ ==> next_comm=migration/3 next_pid=40 next_prio=0 > > Where the migration thread took down the CPU: > > [ 125.940066] migratio-40 3d..3 28389148276: sched_switch: prev_comm=migration/3 prev_pid=40 prev_prio=0 prev_state=P ==> next_comm=swapper/3 next_pid=0 next_prio=120 > > which finally did: > > [ 125.940066] <idle>-0 3...1 28389282142: arch_cpu_idle_dead <-cpu_startup_entry > [ 125.940066] <idle>-0 3...1 28389282548: native_play_dead <-arch_cpu_idle_dead > [ 125.940066] <idle>-0 3...1 28389282924: play_dead_common <-native_play_dead > [ 125.940066] <idle>-0 3...1 28389283468: idle_task_exit <-play_dead_common > [ 125.940066] <idle>-0 3...1 28389284644: amd_e400_remove_cpu <-play_dead_common > > > CPU 3 is now offline, the rcu_preempt thread that ran on CPU 3 is still > doing a schedule_timeout_uninterruptible() and it registered it's > timeout to the timer base for CPU 3. You would think that it would get > migrated right? The issue here is that the timer migration happens at > the CPU notifier for CPU_DEAD. The problem is that the rcu notifier for > CPU_DOWN is blocked waiting for the onoff_mutex to be released, which is > held by the thread that just put itself into a uninterruptible sleep, > that wont wake up until the CPU_DEAD notifier of the timer > infrastructure is called, which wont happen until the rcu notifier > finishes. Here's our deadlock! This commit breaks this deadlock cycle by substituting a shorter udelay() for the previous schedule_timeout_uninterruptible(), while at the same time increasing the probability of the delay. This maintains the intensity of the testing. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Steven Rostedt <rostedt@goodmis.org>
koenkooi
pushed a commit
that referenced
this pull request
Sep 6, 2013
When running the LTP testsuite one may hit this kernel BUG() with the write06 testcase: kernel BUG at mm/filemap.c:2023! CPU: 1 PID: 8614 Comm: writev01 Not tainted 3.10.0-rc7-64bit-c3000+ #6 IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000401e6e84 00000000401e6e88 IIR: 03ffe01f ISR: 0000000010340000 IOR: 000001fbe0380820 CPU: 1 CR30: 00000000bef80000 CR31: ffffffffffffffff ORIG_R28: 00000000bdc192c0 IAOQ[0]: iov_iter_advance+0x3c/0xc0 IAOQ[1]: iov_iter_advance+0x40/0xc0 RP(r2): generic_file_buffered_write+0x204/0x3f0 Backtrace: [<00000000401e764c>] generic_file_buffered_write+0x204/0x3f0 [<00000000401eab24>] __generic_file_aio_write+0x244/0x448 [<00000000401eadc0>] generic_file_aio_write+0x98/0x150 [<000000004024f460>] do_sync_readv_writev+0xc0/0x130 [<000000004025037c>] compat_do_readv_writev+0x12c/0x340 [<00000000402505f8>] compat_writev+0x68/0xa0 [<0000000040251d88>] compat_SyS_writev+0x98/0xf8 Reason for this crash is a gcc miscompilation in the fault handlers of pa_memcpy() which return the fault address instead of the copied bytes. Since this seems to be a generic problem with gcc-4.7.x (and below), it's better to simplify the fault handlers in pa_memcpy to avoid this problem. Here is a simple reproducer for the problem: int main(int argc, char **argv) { int fd, nbytes; struct iovec wr_iovec[] = { { "TEST STRING ",32}, { (char*)0x40005000,32} }; // random memory. fd = open(DATA_FILE, O_RDWR | O_CREAT, 0666); nbytes = writev(fd, wr_iovec, 2); printf("return value = %d, errno %d (%s)\n", nbytes, errno, strerror(errno)); return 0; } In addition, John David Anglin wrote: There is no gcc PR as pa_memcpy is not legitimate C code. There is an implicit assumption that certain variables will contain correct values when an exception occurs and the code randomly jumps to one of the exception blocks. There is no guarantee of this. If a PR was filed, it would likely be marked as invalid. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: <stable@vger.kernel.org> # 3.8+ Signed-off-by: Helge Deller <deller@gmx.de>
koenkooi
pushed a commit
that referenced
this pull request
Sep 6, 2013
If the @fn call work_on_cpu() again, the lockdep will complain: > [ INFO: possible recursive locking detected ] > 3.11.0-rc1-lockdep-fix-a #6 Not tainted > --------------------------------------------- > kworker/0:1/142 is trying to acquire lock: > ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0 > > but task is already holding lock: > ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610 > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > ---- > lock((&wfc.work)); > lock((&wfc.work)); > > *** DEADLOCK *** It is false-positive lockdep report. In this sutiation, the two "wfc"s of the two work_on_cpu() are different, they are both on stack. flush_work() can't be deadlock. To fix this, we need to avoid the lockdep checking in this case, thus we instroduce a internal __flush_work() which skip the lockdep. tj: Minor comment adjustment. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Reported-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Reported-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Tejun Heo <tj@kernel.org>
koenkooi
pushed a commit
that referenced
this pull request
Sep 6, 2013
Commits 6a1c068 and 9356b53, respectively 'tty: Convert termios_mutex to termios_rwsem' and 'n_tty: Access termios values safely' introduced a circular lock dependency with console_lock and termios_rwsem. The lockdep report [1] shows that n_tty_write() will attempt to claim console_lock while holding the termios_rwsem, whereas tty_do_resize() may already hold the console_lock while claiming the termios_rwsem. Since n_tty_write() and tty_do_resize() do not contend over the same data -- the tty->winsize structure -- correct the lock dependency by introducing a new lock which specifically serializes access to tty->winsize only. [1] Lockdep report ====================================================== [ INFO: possible circular locking dependency detected ] 3.10.0-0+tip-xeon+lockdep #0+tip Not tainted ------------------------------------------------------- modprobe/277 is trying to acquire lock: (&tty->termios_rwsem){++++..}, at: [<ffffffff81452656>] tty_do_resize+0x36/0xe0 but task is already holding lock: ((fb_notifier_list).rwsem){.+.+.+}, at: [<ffffffff8107aac6>] __blocking_notifier_call_chain+0x56/0xc0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 ((fb_notifier_list).rwsem){.+.+.+}: [<ffffffff810b6d62>] lock_acquire+0x92/0x1f0 [<ffffffff8175b797>] down_read+0x47/0x5c [<ffffffff8107aac6>] __blocking_notifier_call_chain+0x56/0xc0 [<ffffffff8107ab46>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff813d7c0b>] fb_notifier_call_chain+0x1b/0x20 [<ffffffff813d95b2>] register_framebuffer+0x1e2/0x320 [<ffffffffa01043e1>] drm_fb_helper_initial_config+0x371/0x540 [drm_kms_helper] [<ffffffffa01bcb05>] nouveau_fbcon_init+0x105/0x140 [nouveau] [<ffffffffa01ad0af>] nouveau_drm_load+0x43f/0x610 [nouveau] [<ffffffffa008a79e>] drm_get_pci_dev+0x17e/0x2a0 [drm] [<ffffffffa01ad4da>] nouveau_drm_probe+0x25a/0x2a0 [nouveau] [<ffffffff813b13db>] local_pci_probe+0x4b/0x80 [<ffffffff813b1701>] pci_device_probe+0x111/0x120 [<ffffffff814977eb>] driver_probe_device+0x8b/0x3a0 [<ffffffff81497bab>] __driver_attach+0xab/0xb0 [<ffffffff814956ad>] bus_for_each_dev+0x5d/0xa0 [<ffffffff814971fe>] driver_attach+0x1e/0x20 [<ffffffff81496cc1>] bus_add_driver+0x111/0x290 [<ffffffff814982b7>] driver_register+0x77/0x170 [<ffffffff813b0454>] __pci_register_driver+0x64/0x70 [<ffffffffa008a9da>] drm_pci_init+0x11a/0x130 [drm] [<ffffffffa022a04d>] nouveau_drm_init+0x4d/0x1000 [nouveau] [<ffffffff810002ea>] do_one_initcall+0xea/0x1a0 [<ffffffff810c54cb>] load_module+0x123b/0x1bf0 [<ffffffff810c5f57>] SyS_init_module+0xd7/0x120 [<ffffffff817677c2>] system_call_fastpath+0x16/0x1b -> #1 (console_lock){+.+.+.}: [<ffffffff810b6d62>] lock_acquire+0x92/0x1f0 [<ffffffff810430a7>] console_lock+0x77/0x80 [<ffffffff8146b2a1>] con_flush_chars+0x31/0x50 [<ffffffff8145780c>] n_tty_write+0x1ec/0x4d0 [<ffffffff814541b9>] tty_write+0x159/0x2e0 [<ffffffff814543f5>] redirected_tty_write+0xb5/0xc0 [<ffffffff811ab9d5>] vfs_write+0xc5/0x1f0 [<ffffffff811abec5>] SyS_write+0x55/0xa0 [<ffffffff817677c2>] system_call_fastpath+0x16/0x1b -> #0 (&tty->termios_rwsem){++++..}: [<ffffffff810b65c3>] __lock_acquire+0x1c43/0x1d30 [<ffffffff810b6d62>] lock_acquire+0x92/0x1f0 [<ffffffff8175b724>] down_write+0x44/0x70 [<ffffffff81452656>] tty_do_resize+0x36/0xe0 [<ffffffff8146c841>] vc_do_resize+0x3e1/0x4c0 [<ffffffff8146c99f>] vc_resize+0x1f/0x30 [<ffffffff813e4535>] fbcon_init+0x385/0x5a0 [<ffffffff8146a4bc>] visual_init+0xbc/0x120 [<ffffffff8146cd13>] do_bind_con_driver+0x163/0x320 [<ffffffff8146cfa1>] do_take_over_console+0x61/0x70 [<ffffffff813e2b93>] do_fbcon_takeover+0x63/0xc0 [<ffffffff813e67a5>] fbcon_event_notify+0x715/0x820 [<ffffffff81762f9d>] notifier_call_chain+0x5d/0x110 [<ffffffff8107aadc>] __blocking_notifier_call_chain+0x6c/0xc0 [<ffffffff8107ab46>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff813d7c0b>] fb_notifier_call_chain+0x1b/0x20 [<ffffffff813d95b2>] register_framebuffer+0x1e2/0x320 [<ffffffffa01043e1>] drm_fb_helper_initial_config+0x371/0x540 [drm_kms_helper] [<ffffffffa01bcb05>] nouveau_fbcon_init+0x105/0x140 [nouveau] [<ffffffffa01ad0af>] nouveau_drm_load+0x43f/0x610 [nouveau] [<ffffffffa008a79e>] drm_get_pci_dev+0x17e/0x2a0 [drm] [<ffffffffa01ad4da>] nouveau_drm_probe+0x25a/0x2a0 [nouveau] [<ffffffff813b13db>] local_pci_probe+0x4b/0x80 [<ffffffff813b1701>] pci_device_probe+0x111/0x120 [<ffffffff814977eb>] driver_probe_device+0x8b/0x3a0 [<ffffffff81497bab>] __driver_attach+0xab/0xb0 [<ffffffff814956ad>] bus_for_each_dev+0x5d/0xa0 [<ffffffff814971fe>] driver_attach+0x1e/0x20 [<ffffffff81496cc1>] bus_add_driver+0x111/0x290 [<ffffffff814982b7>] driver_register+0x77/0x170 [<ffffffff813b0454>] __pci_register_driver+0x64/0x70 [<ffffffffa008a9da>] drm_pci_init+0x11a/0x130 [drm] [<ffffffffa022a04d>] nouveau_drm_init+0x4d/0x1000 [nouveau] [<ffffffff810002ea>] do_one_initcall+0xea/0x1a0 [<ffffffff810c54cb>] load_module+0x123b/0x1bf0 [<ffffffff810c5f57>] SyS_init_module+0xd7/0x120 [<ffffffff817677c2>] system_call_fastpath+0x16/0x1b other info that might help us debug this: Chain exists of: &tty->termios_rwsem --> console_lock --> (fb_notifier_list).rwsem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((fb_notifier_list).rwsem); lock(console_lock); lock((fb_notifier_list).rwsem); lock(&tty->termios_rwsem); *** DEADLOCK *** 7 locks held by modprobe/277: #0: (&__lockdep_no_validate__){......}, at: [<ffffffff81497b5b>] __driver_attach+0x5b/0xb0 #1: (&__lockdep_no_validate__){......}, at: [<ffffffff81497b69>] __driver_attach+0x69/0xb0 #2: (drm_global_mutex){+.+.+.}, at: [<ffffffffa008a6dd>] drm_get_pci_dev+0xbd/0x2a0 [drm] #3: (registration_lock){+.+.+.}, at: [<ffffffff813d93f5>] register_framebuffer+0x25/0x320 #4: (&fb_info->lock){+.+.+.}, at: [<ffffffff813d8116>] lock_fb_info+0x26/0x60 #5: (console_lock){+.+.+.}, at: [<ffffffff813d95a4>] register_framebuffer+0x1d4/0x320 #6: ((fb_notifier_list).rwsem){.+.+.+}, at: [<ffffffff8107aac6>] __blocking_notifier_call_chain+0x56/0xc0 stack backtrace: CPU: 0 PID: 277 Comm: modprobe Not tainted 3.10.0-0+tip-xeon+lockdep #0+tip Hardware name: Dell Inc. Precision WorkStation T5400 /0RW203, BIOS A11 04/30/2012 ffffffff8213e5e0 ffff8802aa2fb298 ffffffff81755f19 ffff8802aa2fb2e8 ffffffff8174f506 ffff8802aa2fa000 ffff8802aa2fb378 ffff8802aa2ea8e8 ffff8802aa2ea910 ffff8802aa2ea8e8 0000000000000006 0000000000000007 Call Trace: [<ffffffff81755f19>] dump_stack+0x19/0x1b [<ffffffff8174f506>] print_circular_bug+0x1fb/0x20c [<ffffffff810b65c3>] __lock_acquire+0x1c43/0x1d30 [<ffffffff810b775e>] ? mark_held_locks+0xae/0x120 [<ffffffff810b78d5>] ? trace_hardirqs_on_caller+0x105/0x1d0 [<ffffffff810b6d62>] lock_acquire+0x92/0x1f0 [<ffffffff81452656>] ? tty_do_resize+0x36/0xe0 [<ffffffff8175b724>] down_write+0x44/0x70 [<ffffffff81452656>] ? tty_do_resize+0x36/0xe0 [<ffffffff81452656>] tty_do_resize+0x36/0xe0 [<ffffffff8146c841>] vc_do_resize+0x3e1/0x4c0 [<ffffffff8146c99f>] vc_resize+0x1f/0x30 [<ffffffff813e4535>] fbcon_init+0x385/0x5a0 [<ffffffff8146a4bc>] visual_init+0xbc/0x120 [<ffffffff8146cd13>] do_bind_con_driver+0x163/0x320 [<ffffffff8146cfa1>] do_take_over_console+0x61/0x70 [<ffffffff813e2b93>] do_fbcon_takeover+0x63/0xc0 [<ffffffff813e67a5>] fbcon_event_notify+0x715/0x820 [<ffffffff81762f9d>] notifier_call_chain+0x5d/0x110 [<ffffffff8107aadc>] __blocking_notifier_call_chain+0x6c/0xc0 [<ffffffff8107ab46>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff813d7c0b>] fb_notifier_call_chain+0x1b/0x20 [<ffffffff813d95b2>] register_framebuffer+0x1e2/0x320 [<ffffffffa01043e1>] drm_fb_helper_initial_config+0x371/0x540 [drm_kms_helper] [<ffffffff8173cbcb>] ? kmemleak_alloc+0x5b/0xc0 [<ffffffff81198874>] ? kmem_cache_alloc_trace+0x104/0x290 [<ffffffffa01035e1>] ? drm_fb_helper_single_add_all_connectors+0x81/0xf0 [drm_kms_helper] [<ffffffffa01bcb05>] nouveau_fbcon_init+0x105/0x140 [nouveau] [<ffffffffa01ad0af>] nouveau_drm_load+0x43f/0x610 [nouveau] [<ffffffffa008a79e>] drm_get_pci_dev+0x17e/0x2a0 [drm] [<ffffffffa01ad4da>] nouveau_drm_probe+0x25a/0x2a0 [nouveau] [<ffffffff8175f162>] ? _raw_spin_unlock_irqrestore+0x42/0x80 [<ffffffff813b13db>] local_pci_probe+0x4b/0x80 [<ffffffff813b1701>] pci_device_probe+0x111/0x120 [<ffffffff814977eb>] driver_probe_device+0x8b/0x3a0 [<ffffffff81497bab>] __driver_attach+0xab/0xb0 [<ffffffff81497b00>] ? driver_probe_device+0x3a0/0x3a0 [<ffffffff814956ad>] bus_for_each_dev+0x5d/0xa0 [<ffffffff814971fe>] driver_attach+0x1e/0x20 [<ffffffff81496cc1>] bus_add_driver+0x111/0x290 [<ffffffffa022a000>] ? 0xffffffffa0229fff [<ffffffff814982b7>] driver_register+0x77/0x170 [<ffffffffa022a000>] ? 0xffffffffa0229fff [<ffffffff813b0454>] __pci_register_driver+0x64/0x70 [<ffffffffa008a9da>] drm_pci_init+0x11a/0x130 [drm] [<ffffffffa022a000>] ? 0xffffffffa0229fff [<ffffffffa022a000>] ? 0xffffffffa0229fff [<ffffffffa022a04d>] nouveau_drm_init+0x4d/0x1000 [nouveau] [<ffffffff810002ea>] do_one_initcall+0xea/0x1a0 [<ffffffff810c54cb>] load_module+0x123b/0x1bf0 [<ffffffff81399a50>] ? ddebug_proc_open+0xb0/0xb0 [<ffffffff813855ae>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff810c5f57>] SyS_init_module+0xd7/0x120 [<ffffffff817677c2>] system_call_fastpath+0x16/0x1b Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi
pushed a commit
that referenced
this pull request
Sep 6, 2013
We used to keep the port's char device structs and the /sys entries around till the last reference to the port was dropped. This is actually unnecessary, and resulted in buggy behaviour: 1. Open port in guest 2. Hot-unplug port 3. Hot-plug a port with the same 'name' property as the unplugged one This resulted in hot-plug being unsuccessful, as a port with the same name already exists (even though it was unplugged). This behaviour resulted in a warning message like this one: -------------------8<--------------------------------------- WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xc9/0x130() (Not tainted) Hardware name: KVM sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:04.0/virtio0/virtio-ports/vport0p1' Call Trace: [<ffffffff8106b607>] ? warn_slowpath_common+0x87/0xc0 [<ffffffff8106b6f6>] ? warn_slowpath_fmt+0x46/0x50 [<ffffffff811f2319>] ? sysfs_add_one+0xc9/0x130 [<ffffffff811f23e8>] ? create_dir+0x68/0xb0 [<ffffffff811f2469>] ? sysfs_create_dir+0x39/0x50 [<ffffffff81273129>] ? kobject_add_internal+0xb9/0x260 [<ffffffff812733d8>] ? kobject_add_varg+0x38/0x60 [<ffffffff812734b4>] ? kobject_add+0x44/0x70 [<ffffffff81349de4>] ? get_device_parent+0xf4/0x1d0 [<ffffffff8134b389>] ? device_add+0xc9/0x650 -------------------8<--------------------------------------- Instead of relying on guest applications to release all references to the ports, we should go ahead and unregister the port from all the core layers. Any open/read calls on the port will then just return errors, and an unplug/plug operation on the host will succeed as expected. This also caused buggy behaviour in case of the device removal (not just a port): when the device was removed (which means all ports on that device are removed automatically as well), the ports with active users would clean up only when the last references were dropped -- and it would be too late then to be referencing char device pointers, resulting in oopses: -------------------8<--------------------------------------- PID: 6162 TASK: ffff8801147ad500 CPU: 0 COMMAND: "cat" #0 [ffff88011b9d5a90] machine_kexec at ffffffff8103232b #1 [ffff88011b9d5af0] crash_kexec at ffffffff810b9322 #2 [ffff88011b9d5bc0] oops_end at ffffffff814f4a50 #3 [ffff88011b9d5bf0] die at ffffffff8100f26b #4 [ffff88011b9d5c20] do_general_protection at ffffffff814f45e2 #5 [ffff88011b9d5c50] general_protection at ffffffff814f3db5 [exception RIP: strlen+2] RIP: ffffffff81272ae2 RSP: ffff88011b9d5d00 RFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff880118901c18 RCX: 0000000000000000 RDX: ffff88011799982c RSI: 00000000000000d0 RDI: 3a303030302f3030 RBP: ffff88011b9d5d38 R8: 0000000000000006 R9: ffffffffa0134500 R10: 0000000000001000 R11: 0000000000001000 R12: ffff880117a1cc10 R13: 00000000000000d0 R14: 0000000000000017 R15: ffffffff81aff700 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #6 [ffff88011b9d5d00] kobject_get_path at ffffffff8126dc5d #7 [ffff88011b9d5d40] kobject_uevent_env at ffffffff8126e551 #8 [ffff88011b9d5dd0] kobject_uevent at ffffffff8126e9eb #9 [ffff88011b9d5de0] device_del at ffffffff813440c7 -------------------8<--------------------------------------- So clean up when we have all the context, and all that's left to do when the references to the port have dropped is to free up the port struct itself. CC: <stable@vger.kernel.org> Reported-by: chayang <chayang@redhat.com> Reported-by: YOGANANTH SUBRAMANIAN <anantyog@in.ibm.com> Reported-by: FuXiangChun <xfu@redhat.com> Reported-by: Qunfang Zhang <qzhang@redhat.com> Reported-by: Sibiao Luo <sluo@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
koenkooi
pushed a commit
that referenced
this pull request
Sep 6, 2013
In several places, this snippet is used when removing neigh entries: list_del(&neigh->list); ipoib_neigh_free(neigh); The list_del() removes neigh from the associated struct ipoib_path, while ipoib_neigh_free() removes neigh from the device's neigh entry lookup table. Both of these operations are protected by the priv->lock spinlock. The table however is also protected via RCU, and so naturally the lock is not held when doing reads. This leads to a race condition, in which a thread may successfully look up a neigh entry that has already been deleted from neigh->list. Since the previous deletion will have marked the entry with poison, a second list_del() on the object will cause a panic: #5 [ffff8802338c3c70] general_protection at ffffffff815108c5 [exception RIP: list_del+16] RIP: ffffffff81289020 RSP: ffff8802338c3d20 RFLAGS: 00010082 RAX: dead000000200200 RBX: ffff880433e60c88 RCX: 0000000000009e6c RDX: 0000000000000246 RSI: ffff8806012ca298 RDI: ffff880433e60c88 RBP: ffff8802338c3d30 R8: ffff8806012ca2e8 R9: 00000000ffffffff R10: 0000000000000001 R11: 0000000000000000 R12: ffff8804346b2020 R13: ffff88032a3e7540 R14: ffff8804346b26e0 R15: 0000000000000246 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 #6 [ffff8802338c3d38] ipoib_cm_tx_handler at ffffffffa066fe0a [ib_ipoib] #7 [ffff8802338c3d98] cm_process_work at ffffffffa05149a7 [ib_cm] #8 [ffff8802338c3de8] cm_work_handler at ffffffffa05161aa [ib_cm] #9 [ffff8802338c3e38] worker_thread at ffffffff81090e10 torvalds#10 [ffff8802338c3ee8] kthread at ffffffff81096c66 torvalds#11 [ffff8802338c3f48] kernel_thread at ffffffff8100c0ca We move the list_del() into ipoib_neigh_free(), so that deletion happens only once, after the entry has been successfully removed from the lookup table. This same behavior is already used in ipoib_del_neighs_by_gid() and __ipoib_reap_neigh(). Signed-off-by: Jim Foraker <foraker1@llnl.gov> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Reviewed-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
koenkooi
pushed a commit
that referenced
this pull request
Dec 10, 2015
It turns out that a PV domU also requires the "Xen PV" APIC driver. Otherwise, the flat driver is used and we get stuck in busy loops that never exit, such as in this stack trace: (gdb) target remote localhost:9999 Remote debugging using localhost:9999 __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56 56 while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY) (gdb) bt #0 __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56 #1 __default_send_IPI_shortcut (shortcut=<optimized out>, dest=<optimized out>, vector=<optimized out>) at ./arch/x86/include/asm/ipi.h:75 #2 apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54 #3 0xffffffff81011336 in arch_irq_work_raise () at arch/x86/kernel/irq_work.c:47 #4 0xffffffff8114990c in irq_work_queue (work=0xffff88000fc0e400) at kernel/irq_work.c:100 #5 0xffffffff8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633 #6 0xffffffff8110ca60 in vprintk_emit (facility=0, level=<optimized out>, dict=0x0 <irq_stack_union>, dictlen=<optimized out>, fmt=<optimized out>, args=<optimized out>) at kernel/printk/printk.c:1778 #7 0xffffffff816010c8 in printk (fmt=<optimized out>) at kernel/printk/printk.c:1868 #8 0xffffffffc00013ea in ?? () #9 0x0000000000000000 in ?? () Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
koenkooi
pushed a commit
that referenced
this pull request
Dec 10, 2015
A recent change to the cpu_cooling code introduced a AB-BA deadlock scenario between the cpufreq_policy_notifier_list rwsem and the cooling_cpufreq_lock. This is caused by cooling_cpufreq_lock being held before the registration/removal of the notifier block (an operation which takes the rwsem), and the notifier code itself which takes the locks in the reverse order: ====================================================== [ INFO: possible circular locking dependency detected ] 3.18.0+ #1453 Not tainted ------------------------------------------------------- rc.local/770 is trying to acquire lock: (cooling_cpufreq_lock){+.+.+.}, at: [<c04abfc4>] cpufreq_thermal_notifier+0x34/0xfc but task is already holding lock: ((cpufreq_policy_notifier_list).rwsem){++++.+}, at: [<c0042f04>] __blocking_notifier_call_chain+0x34/0x68 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 ((cpufreq_policy_notifier_list).rwsem){++++.+}: [<c06bc3b0>] down_write+0x44/0x9c [<c0043444>] blocking_notifier_chain_register+0x28/0xd8 [<c04ad610>] cpufreq_register_notifier+0x68/0x90 [<c04abe4c>] __cpufreq_cooling_register.part.1+0x120/0x180 [<c04abf44>] __cpufreq_cooling_register+0x98/0xa4 [<c04abf8c>] cpufreq_cooling_register+0x18/0x1c [<bf0046f8>] imx_thermal_probe+0x1c0/0x470 [imx_thermal] [<c037cef8>] platform_drv_probe+0x50/0xac [<c037b710>] driver_probe_device+0x114/0x234 [<c037b8cc>] __driver_attach+0x9c/0xa0 [<c0379d68>] bus_for_each_dev+0x5c/0x90 [<c037b204>] driver_attach+0x24/0x28 [<c037ae7c>] bus_add_driver+0xe0/0x1d8 [<c037c0cc>] driver_register+0x80/0xfc [<c037cd80>] __platform_driver_register+0x50/0x64 [<bf007018>] 0xbf007018 [<c0008a5c>] do_one_initcall+0x88/0x1d8 [<c0095da4>] load_module+0x1768/0x1ef8 [<c0096614>] SyS_init_module+0xe0/0xf4 [<c000ec00>] ret_fast_syscall+0x0/0x48 -> #0 (cooling_cpufreq_lock){+.+.+.}: [<c00619f8>] lock_acquire+0xb0/0x124 [<c06ba3b4>] mutex_lock_nested+0x5c/0x3d8 [<c04abfc4>] cpufreq_thermal_notifier+0x34/0xfc [<c0042bf4>] notifier_call_chain+0x4c/0x8c [<c0042f20>] __blocking_notifier_call_chain+0x50/0x68 [<c0042f58>] blocking_notifier_call_chain+0x20/0x28 [<c04ae62c>] cpufreq_set_policy+0x7c/0x1d0 [<c04af3cc>] store_scaling_governor+0x74/0x9c [<c04ad418>] store+0x90/0xc0 [<c0175384>] sysfs_kf_write+0x54/0x58 [<c01746b4>] kernfs_fop_write+0xdc/0x190 [<c010dcc0>] vfs_write+0xac/0x1b4 [<c010dfec>] SyS_write+0x44/0x90 [<c000ec00>] ret_fast_syscall+0x0/0x48 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((cpufreq_policy_notifier_list).rwsem); lock(cooling_cpufreq_lock); lock((cpufreq_policy_notifier_list).rwsem); lock(cooling_cpufreq_lock); *** DEADLOCK *** 7 locks held by rc.local/770: #0: (sb_writers#6){.+.+.+}, at: [<c010dda0>] vfs_write+0x18c/0x1b4 #1: (&of->mutex){+.+.+.}, at: [<c0174678>] kernfs_fop_write+0xa0/0x190 #2: (s_active#52){.+.+.+}, at: [<c0174680>] kernfs_fop_write+0xa8/0x190 #3: (cpu_hotplug.lock){++++++}, at: [<c0026a60>] get_online_cpus+0x34/0x90 #4: (cpufreq_rwsem){.+.+.+}, at: [<c04ad3e0>] store+0x58/0xc0 #5: (&policy->rwsem){+.+.+.}, at: [<c04ad3f8>] store+0x70/0xc0 #6: ((cpufreq_policy_notifier_list).rwsem){++++.+}, at: [<c0042f04>] __blocking_notifier_call_chain+0x34/0x68 stack backtrace: CPU: 0 PID: 770 Comm: rc.local Not tainted 3.18.0+ #1453 Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) Backtrace: [<c00121c8>] (dump_backtrace) from [<c0012360>] (show_stack+0x18/0x1c) r6:c0b85a80 r5:c0b75630 r4:00000000 r3:00000000 [<c0012348>] (show_stack) from [<c06b6c48>] (dump_stack+0x7c/0x98) [<c06b6bcc>] (dump_stack) from [<c06b42a4>] (print_circular_bug+0x28c/0x2d8) r4:c0b85a80 r3:d0071d40 [<c06b4018>] (print_circular_bug) from [<c00613b0>] (__lock_acquire+0x1acc/0x1bb0) r10:c0b50660 r8:c09e6d80 r7:d0071d40 r6:c11d0f0c r5:00000007 r4:d0072240 [<c005f8e4>] (__lock_acquire) from [<c00619f8>] (lock_acquire+0xb0/0x124) r10:00000000 r9:c04abfc4 r8:00000000 r7:00000000 r6:00000000 r5:c0a06f0c r4:00000000 [<c0061948>] (lock_acquire) from [<c06ba3b4>] (mutex_lock_nested+0x5c/0x3d8) r10:ec853800 r9:c0a06ed4 r8:d0071d40 r7:c0a06ed4 r6:c11d0f0c r5:00000000 r4:c04abfc4 [<c06ba358>] (mutex_lock_nested) from [<c04abfc4>] (cpufreq_thermal_notifier+0x34/0xfc) r10:ec853800 r9:ec85380c r8:d00d7d3c r7:c0a06ed4 r6:d00d7d3c r5:00000000 r4:fffffffe [<c04abf90>] (cpufreq_thermal_notifier) from [<c0042bf4>] (notifier_call_chain+0x4c/0x8c) r7:00000000 r6:00000000 r5:00000000 r4:fffffffe [<c0042ba8>] (notifier_call_chain) from [<c0042f20>] (__blocking_notifier_call_chain+0x50/0x68) r8:c0a072a4 r7:00000000 r6:d00d7d3c r5:ffffffff r4:c0a06fc8 r3:ffffffff [<c0042ed0>] (__blocking_notifier_call_chain) from [<c0042f58>] (blocking_notifier_call_chain+0x20/0x28) r7:ec98b540 r6:c13ebc80 r5:ed76e600 r4:d00d7d3c [<c0042f38>] (blocking_notifier_call_chain) from [<c04ae62c>] (cpufreq_set_policy+0x7c/0x1d0) [<c04ae5b0>] (cpufreq_set_policy) from [<c04af3cc>] (store_scaling_governor+0x74/0x9c) r7:ec98b540 r6:0000000c r5:ec98b540 r4:ed76e600 [<c04af358>] (store_scaling_governor) from [<c04ad418>] (store+0x90/0xc0) r6:0000000c r5:ed76e6d4 r4:ed76e600 [<c04ad388>] (store) from [<c0175384>] (sysfs_kf_write+0x54/0x58) r8:0000000c r7:d00d7f78 r6:ec98b540 r5:0000000c r4:ec853800 r3:0000000c [<c0175330>] (sysfs_kf_write) from [<c01746b4>] (kernfs_fop_write+0xdc/0x190) r6:ec98b540 r5:00000000 r4:00000000 r3:c0175330 [<c01745d8>] (kernfs_fop_write) from [<c010dcc0>] (vfs_write+0xac/0x1b4) r10:0162aa70 r9:d00d6000 r8:0000000c r7:d00d7f78 r6:0162aa70 r5:0000000c r4:eccde500 [<c010dc14>] (vfs_write) from [<c010dfec>] (SyS_write+0x44/0x90) r10:0162aa70 r8:0000000c r7:eccde500 r6:eccde500 r5:00000000 r4:00000000 [<c010dfa8>] (SyS_write) from [<c000ec00>] (ret_fast_syscall+0x0/0x48) r10:00000000 r8:c000edc4 r7:00000004 r6:000216cc r5:0000000c r4:0162aa70 Solve this by moving to finer grained locking - use one mutex to protect the cpufreq_dev_list as a whole, and a separate lock to ensure correct ordering of cpufreq notifier registration and removal. cooling_list_lock is taken within cooling_cpufreq_lock on (un)registration to preserve the behavior of the code, i.e. to atomically add/remove to the list and (un)register the notifier. Fixes: 2dcd851 ("thermal: cpu_cooling: Update always cpufreq policy with Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
koenkooi
pushed a commit
that referenced
this pull request
Dec 10, 2015
commit 1ab3638 upstream. Not all gpio banks are necessarily enabled, in the current code this can lead to null pointer dereferences. [ 51.130000] Unable to handle kernel NULL pointer dereference at virtual address 00000058 [ 51.130000] pgd = dee04000 [ 51.130000] [00000058] *pgd=3f66d831, *pte=00000000, *ppte=00000000 [ 51.140000] Internal error: Oops: 17 [#1] ARM [ 51.140000] Modules linked in: [ 51.140000] CPU: 0 PID: 1664 Comm: cat Not tainted 4.1.1+ #6 [ 51.140000] Hardware name: Atmel SAMA5 [ 51.140000] task: df6dd880 ti: dec60000 task.ti: dec60000 [ 51.140000] PC is at at91_pinconf_get+0xb4/0x200 [ 51.140000] LR is at at91_pinconf_get+0xb4/0x200 [ 51.140000] pc : [<c01e71a0>] lr : [<c01e71a0>] psr: 600f0013 sp : dec61e48 ip : 600f0013 fp : df522538 [ 51.140000] r10: df52250c r9 : 00000058 r8 : 00000068 [ 51.140000] r7 : 00000000 r6 : df53c910 r5 : 00000000 r4 : dec61e7c [ 51.140000] r3 : 00000000 r2 : c06746d4 r1 : 00000000 r0 : 00000003 [ 51.140000] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 51.140000] Control: 10c53c7d Table: 3ee04059 DAC: 00000015 [ 51.140000] Process cat (pid: 1664, stack limit = 0xdec60208) [ 51.140000] Stack: (0xdec61e48 to 0xdec62000) [ 51.140000] 1e40: 00000358 00000000 df522500 ded15f80 c05a9d08 ded15f80 [ 51.140000] 1e60: 0000048c 00000061 df522500 ded15f80 c05a9d08 c01e7304 ded15f80 00000000 [ 51.140000] 1e80: c01e6008 00000060 0000048c c01e6034 c01e5f6c ded15f80 dec61ec 00000000 [ 51.140000] 1ea0: 00020000 ded6f280 dec61f80 00000001 00000001 c00ae0b8 b6e80000 ded15fb0 [ 51.140000] 1ec0: 00000000 00000000 df4bc974 00000055 00000800 ded6f280 b6e80000 ded6f280 [ 51.140000] 1ee0: ded6f280 00020000 b6e80000 00000000 00020000 c0090dec c0671e1c dec61fb0 [ 51.140000] 1f00: b6f8b510 00000001 00004201 c000924c 00000000 00000003 00000003 00000000 [ 51.140000] 1f20: df4bc940 00022000 00000022 c066e188 b6e7f000 c00836f4 000b6e7f ded6f280 [ 51.140000] 1f40: ded6f280 b6e80000 dec61f80 ded6f280 00020000 c0091508 00000000 00000003 [ 51.140000] 1f60: 00022000 00000000 00000000 ded6f280 ded6f280 00020000 b6e80000 c0091d9c [ 51.140000] 1f80: 00000000 00000000 ffffffff 00020000 00020000 b6e80000 00000003 c000f124 [ 51.140000] 1fa0: dec60000 c000efa0 00020000 00020000 00000003 b6e80000 00020000 000271c4 [ 51.140000] 1fc0: 00020000 00020000 b6e80000 00000003 7fffe000 00000000 00000000 00020000 [ 51.140000] 1fe0: 00000000 bef50b64 00013835 b6f29c76 400f0030 00000003 00000000 00000000 [ 51.140000] [<c01e71a0>] (at91_pinconf_get) from [<c01e7304>] (at91_pinconf_dbg_show+0x18/0x2c0) [ 51.140000] [<c01e7304>] (at91_pinconf_dbg_show) from [<c01e6034>] (pinconf_pins_show+0xc8/0xf8) [ 51.140000] [<c01e6034>] (pinconf_pins_show) from [<c00ae0b8>] (seq_read+0x1a0/0x464) [ 51.140000] [<c00ae0b8>] (seq_read) from [<c0090dec>] (__vfs_read+0x20/0xd0) [ 51.140000] [<c0090dec>] (__vfs_read) from [<c0091508>] (vfs_read+0x7c/0x108) [ 51.140000] [<c0091508>] (vfs_read) from [<c0091d9c>] (SyS_read+0x40/0x94) [ 51.140000] [<c0091d9c>] (SyS_read) from [<c000efa0>] (ret_fast_syscall+0x0/0x3c) [ 51.140000] Code: eb010ec2 e30a0d08 e34c005a eb0ae5a7 (e5993000) [ 51.150000] ---[ end trace fb3c370da3ea4794 ]--- Fixes: a0b957f ("pinctrl: at91: allow to have disabled gpio bank") Signed-off-by: David Dueck <davidcdueck@googlemail.com> Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com> Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Boris Brezillon <boris.brezillon@free-electrons.com> Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi
pushed a commit
that referenced
this pull request
Dec 10, 2015
commit 1c2cb59 upstream. The EPOW interrupt handler uses rtas_get_sensor(), which in turn uses rtas_busy_delay() to wait for RTAS becoming ready in case it is necessary. But rtas_busy_delay() is annotated with might_sleep() and thus may not be used by interrupts handlers like the EPOW handler! This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is enabled: BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6 Call Trace: [c00000007ffe7b90] [c000000000807670] dump_stack+0xa0/0xdc (unreliable) [c00000007ffe7bc0] [c0000000000e1f14] ___might_sleep+0x134/0x180 [c00000007ffe7c20] [c00000000002aec0] rtas_busy_delay+0x30/0xd0 [c00000007ffe7c50] [c00000000002bde4] rtas_get_sensor+0x74/0xe0 [c00000007ffe7ce0] [c000000000083264] ras_epow_interrupt+0x44/0x450 [c00000007ffe7d90] [c000000000120260] handle_irq_event_percpu+0xa0/0x300 [c00000007ffe7e70] [c000000000120524] handle_irq_event+0x64/0xc0 [c00000007ffe7eb0] [c000000000124dbc] handle_fasteoi_irq+0xec/0x260 [c00000007ffe7ef0] [c00000000011f4f0] generic_handle_irq+0x50/0x80 [c00000007ffe7f20] [c000000000010f3c] __do_irq+0x8c/0x200 [c00000007ffe7f90] [c0000000000236cc] call_do_irq+0x14/0x24 [c00000007e6f39e0] [c000000000011144] do_IRQ+0x94/0x110 [c00000007e6f3a30] [c000000000002594] hardware_interrupt_common+0x114/0x180 Fix this issue by introducing a new rtas_get_sensor_fast() function that does not use rtas_busy_delay() - and thus can only be used for sensors that do not cause a BUSY condition - known as "fast" sensors. The EPOW sensor is defined to be "fast" in sPAPR - mpe. Fixes: 587f83e ("powerpc/pseries: Use rtas_get_sensor in RAS code") Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi
pushed a commit
that referenced
this pull request
Dec 10, 2015
commit e81107d upstream. My colleague ran into a program stall on a x86_64 server, where n_tty_read() was waiting for data even if there was data in the buffer in the pty. kernel stack for the stuck process looks like below. #0 [ffff88303d107b58] __schedule at ffffffff815c4b20 #1 [ffff88303d107bd0] schedule at ffffffff815c513e #2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818 #3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2 #4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23 #5 [ffff88303d107dd0] tty_read at ffffffff81368013 #6 [ffff88303d107e20] __vfs_read at ffffffff811a3704 #7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57 #8 [ffff88303d107f00] sys_read at ffffffff811a4306 #9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7 There seems to be two problems causing this issue. First, in drivers/tty/n_tty.c, __receive_buf() stores the data and updates ldata->commit_head using smp_store_release() and then checks the wait queue using waitqueue_active(). However, since there is no memory barrier, __receive_buf() could return without calling wake_up_interactive_poll(), and at the same time, n_tty_read() could start to wait in wait_woken() as in the following chart. __receive_buf() n_tty_read() ------------------------------------------------------------------------ if (waitqueue_active(&tty->read_wait)) /* Memory operations issued after the RELEASE may be completed before the RELEASE operation has completed */ add_wait_queue(&tty->read_wait, &wait); ... if (!input_available_p(tty, 0)) { smp_store_release(&ldata->commit_head, ldata->read_head); ... timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, timeout); ------------------------------------------------------------------------ The second problem is that n_tty_read() also lacks a memory barrier call and could also cause __receive_buf() to return without calling wake_up_interactive_poll(), and n_tty_read() to wait in wait_woken() as in the chart below. __receive_buf() n_tty_read() ------------------------------------------------------------------------ spin_lock_irqsave(&q->lock, flags); /* from add_wait_queue() */ ... if (!input_available_p(tty, 0)) { /* Memory operations issued after the RELEASE may be completed before the RELEASE operation has completed */ smp_store_release(&ldata->commit_head, ldata->read_head); if (waitqueue_active(&tty->read_wait)) __add_wait_queue(q, wait); spin_unlock_irqrestore(&q->lock,flags); /* from add_wait_queue() */ ... timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, timeout); ------------------------------------------------------------------------ There are also other places in drivers/tty/n_tty.c which have similar calls to waitqueue_active(), so instead of adding many memory barrier calls, this patch simply removes the call to waitqueue_active(), leaving just wake_up*() behind. This fixes both problems because, even though the memory access before or after the spinlocks in both wake_up*() and add_wait_queue() can sneak into the critical section, it cannot go past it and the critical section assures that they will be serialized (please see "INTER-CPU ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a better explanation). Moreover, the resulting code is much simpler. Latency measurement using a ping-pong test over a pty doesn't show any visible performance drop. Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi
pushed a commit
that referenced
this pull request
Jul 6, 2017
The register_vlan_device would invoke free_netdev directly, when register_vlan_dev failed. It would trigger the BUG_ON in free_netdev if the dev was already registered. In this case, the netdev would be freed in netdev_run_todo later. So add one condition check now. Only when dev is not registered, then free it directly. The following is the part coredump when netdev_upper_dev_link failed in register_vlan_dev. I removed the lines which are too long. [ 411.237457] ------------[ cut here ]------------ [ 411.237458] kernel BUG at net/core/dev.c:7998! [ 411.237484] invalid opcode: 0000 [#1] SMP [ 411.237705] [last unloaded: 8021q] [ 411.237718] CPU: 1 PID: 12845 Comm: vconfig Tainted: G E 4.12.0-rc5+ #6 [ 411.237737] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015 [ 411.237764] task: ffff9cbeb6685580 task.stack: ffffa7d2807d8000 [ 411.237782] RIP: 0010:free_netdev+0x116/0x120 [ 411.237794] RSP: 0018:ffffa7d2807dbdb0 EFLAGS: 00010297 [ 411.237808] RAX: 0000000000000002 RBX: ffff9cbeb6ba8fd8 RCX: 0000000000001878 [ 411.237826] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 0000000000000000 [ 411.237844] RBP: ffffa7d2807dbdc8 R08: 0002986100029841 R09: 0002982100029801 [ 411.237861] R10: 0004000100029980 R11: 0004000100029980 R12: ffff9cbeb6ba9000 [ 411.238761] R13: ffff9cbeb6ba9060 R14: ffff9cbe60f1a000 R15: ffff9cbeb6ba9000 [ 411.239518] FS: 00007fb690d81700(0000) GS:ffff9cbebb640000(0000) knlGS:0000000000000000 [ 411.239949] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 411.240454] CR2: 00007f7115624000 CR3: 0000000077cdf000 CR4: 00000000003406e0 [ 411.240936] Call Trace: [ 411.241462] vlan_ioctl_handler+0x3f1/0x400 [8021q] [ 411.241910] sock_ioctl+0x18b/0x2c0 [ 411.242394] do_vfs_ioctl+0xa1/0x5d0 [ 411.242853] ? sock_alloc_file+0xa6/0x130 [ 411.243465] SyS_ioctl+0x79/0x90 [ 411.243900] entry_SYSCALL_64_fastpath+0x1e/0xa9 [ 411.244425] RIP: 0033:0x7fb69089a357 [ 411.244863] RSP: 002b:00007ffcd04e0fc8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [ 411.245445] RAX: ffffffffffffffda RBX: 00007ffcd04e2884 RCX: 00007fb69089a357 [ 411.245903] RDX: 00007ffcd04e0fd0 RSI: 0000000000008983 RDI: 0000000000000003 [ 411.246527] RBP: 00007ffcd04e0fd0 R08: 0000000000000000 R09: 1999999999999999 [ 411.246976] R10: 000000000000053f R11: 0000000000000202 R12: 0000000000000004 [ 411.247414] R13: 00007ffcd04e1128 R14: 00007ffcd04e2888 R15: 0000000000000001 [ 411.249129] RIP: free_netdev+0x116/0x120 RSP: ffffa7d2807dbdb0 Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
koenkooi
pushed a commit
that referenced
this pull request
May 29, 2018
syzbot caught an infinite recursion in nsh_gso_segment(). Problem here is that we need to make sure the NSH header is of reasonable length. BUG: MAX_LOCK_DEPTH too low! turning off the locking correctness validator. depth: 48 max: 48! 48 locks held by syz-executor0/10189: #0: (ptrval) (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x30f/0x34c0 net/core/dev.c:3517 #1: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #1: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 #2: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #2: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 #3: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #3: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 #4: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #4: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 #5: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #5: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 #6: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #6: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 #7: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #7: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 #8: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #8: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 #9: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #9: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#10: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#10: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#11: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#11: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#12: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#12: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#13: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#13: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#14: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#14: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#15: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#15: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#16: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#16: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#17: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#17: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#18: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#18: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#19: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#19: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#20: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#20: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#21: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#21: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#22: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#22: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#23: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#23: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#24: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#24: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#25: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#25: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#26: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#26: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#27: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#27: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#28: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#28: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#29: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#29: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#30: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#30: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#31: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#31: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 dccp_close: ABORT with 65423 bytes unread torvalds#32: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#32: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#33: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#33: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#34: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#34: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#35: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#35: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#36: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#36: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#37: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#37: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#38: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#38: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#39: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#39: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#40: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#40: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#41: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#41: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#42: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#42: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#43: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#43: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#44: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#44: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#45: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#45: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#46: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#46: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#47: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#47: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 INFO: lockdep is turned off. CPU: 1 PID: 10189 Comm: syz-executor0 Not tainted 4.17.0-rc2+ torvalds#26 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 __lock_acquire+0x1788/0x5140 kernel/locking/lockdep.c:3449 lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3920 rcu_lock_acquire include/linux/rcupdate.h:246 [inline] rcu_read_lock include/linux/rcupdate.h:632 [inline] skb_mac_gso_segment+0x25b/0x720 net/core/dev.c:2789 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 __skb_gso_segment+0x3bb/0x870 net/core/dev.c:2865 skb_gso_segment include/linux/netdevice.h:4025 [inline] validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3118 validate_xmit_skb_list+0xbf/0x120 net/core/dev.c:3168 sch_direct_xmit+0x354/0x11e0 net/sched/sch_generic.c:312 qdisc_restart net/sched/sch_generic.c:399 [inline] __qdisc_run+0x741/0x1af0 net/sched/sch_generic.c:410 __dev_xmit_skb net/core/dev.c:3243 [inline] __dev_queue_xmit+0x28ea/0x34c0 net/core/dev.c:3551 dev_queue_xmit+0x17/0x20 net/core/dev.c:3616 packet_snd net/packet/af_packet.c:2951 [inline] packet_sendmsg+0x40f8/0x6070 net/packet/af_packet.c:2976 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:639 __sys_sendto+0x3d7/0x670 net/socket.c:1789 __do_sys_sendto net/socket.c:1801 [inline] __se_sys_sendto net/socket.c:1797 [inline] __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1797 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: c411ed8 ("nsh: add GSO support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jiri Benc <jbenc@redhat.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
koenkooi
pushed a commit
that referenced
this pull request
May 29, 2018
Petr Machata says: ==================== net: ip6_gre: Fixes in headroom handling This series mends some problems in headroom management in ip6_gre module. The current code base has the following three closely-related problems: - ip6gretap tunnels neglect to ensure there's enough writable headroom before pushing GRE headers. - ip6erspan does this, but assumes that dev->needed_headroom is primed. But that doesn't happen until ip6_tnl_xmit() is called later. Thus for the first packet, ip6erspan actually behaves like ip6gretap above. - ip6erspan shares some of the code with ip6gretap, including calculations of needed header length. While there is custom ERSPAN-specific code for calculating the headroom, the computed values are overwritten by the ip6gretap code. The first two issues lead to a kernel panic in situations where a packet is mirrored from a veth device to the device in question. They are fixed, respectively, in patches #1 and #2, which include the full panic trace and a reproducer. The rest of the patchset deals with the last issue. In patches #3 to #6, several functions are split up into reusable parts. Finally in patch #7 these blocks are used to compose ERSPAN-specific callbacks where necessary to fix the hlen calculation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
koenkooi
pushed a commit
that referenced
this pull request
Jun 6, 2018
commit 9f0a93d upstream. When the module is removed the led workqueue is destroyed in the remove callback, before the led device is unregistered from the led subsystem. This leads to a NULL pointer derefence when the led device is unregistered automatically later as part of the module removal cleanup. Bellow is the backtrace showing the problem. BUG: unable to handle kernel NULL pointer dereference at (null) IP: __queue_work+0x8c/0x410 PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI Modules linked in: ccm edac_mce_amd kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 joydev crypto_simd asus_nb_wmi glue_helper uvcvideo snd_hda_codec_conexant snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel asus_wmi snd_hda_codec cryptd snd_hda_core sparse_keymap videobuf2_vmalloc arc4 videobuf2_memops snd_hwdep input_leds videobuf2_v4l2 ath9k psmouse videobuf2_core videodev ath9k_common snd_pcm ath9k_hw media fam15h_power ath k10temp snd_timer mac80211 i2c_piix4 r8169 mii mac_hid cfg80211 asus_wireless(-) snd soundcore wmi shpchp 8250_dw ip_tables x_tables amdkfd amd_iommu_v2 amdgpu radeon chash i2c_algo_bit drm_kms_helper syscopyarea serio_raw sysfillrect sysimgblt fb_sys_fops ahci ttm libahci drm video CPU: 3 PID: 2177 Comm: rmmod Not tainted 4.15.0-5-generic #6+dev94.b4287e5bem1-Endless Hardware name: ASUSTeK COMPUTER INC. X555DG/X555DG, BIOS 5.011 05/05/2015 RIP: 0010:__queue_work+0x8c/0x410 RSP: 0018:ffffbe8cc249fcd8 EFLAGS: 00010086 RAX: ffff992ac6810800 RBX: 0000000000000000 RCX: 0000000000000008 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff992ac6400e18 RBP: ffffbe8cc249fd18 R08: ffff992ac6400db0 R09: 0000000000000000 R10: 0000000000000040 R11: ffff992ac6400dd8 R12: 0000000000002000 R13: ffff992abd762e00 R14: ffff992abd763e38 R15: 000000000001ebe0 FS: 00007f318203e700(0000) GS:ffff992aced80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001c720e000 CR4: 00000000001406e0 Call Trace: queue_work_on+0x38/0x40 led_state_set+0x2c/0x40 [asus_wireless] led_set_brightness_nopm+0x14/0x40 led_set_brightness+0x37/0x60 led_trigger_set+0xfc/0x1d0 led_classdev_unregister+0x32/0xd0 devm_led_classdev_release+0x11/0x20 release_nodes+0x109/0x1f0 devres_release_all+0x3c/0x50 device_release_driver_internal+0x16d/0x220 driver_detach+0x3f/0x80 bus_remove_driver+0x55/0xd0 driver_unregister+0x2c/0x40 acpi_bus_unregister_driver+0x15/0x20 asus_wireless_driver_exit+0x10/0xb7c [asus_wireless] SyS_delete_module+0x1da/0x2b0 entry_SYSCALL_64_fastpath+0x24/0x87 RIP: 0033:0x7f3181b65fd7 RSP: 002b:00007ffe74bcbe18 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3181b65fd7 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000555ea2559258 RBP: 0000555ea25591f0 R08: 00007ffe74bcad91 R09: 000000000000000a R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000003 R13: 00007ffe74bcae00 R14: 0000000000000000 R15: 0000555ea25591f0 Code: 01 00 00 02 0f 85 7d 01 00 00 48 63 45 d4 48 c7 c6 00 f4 fa 87 49 8b 9d 08 01 00 00 48 03 1c c6 4c 89 f7 e8 87 fb ff ff 48 85 c0 <48> 8b 3b 0f 84 c5 01 00 00 48 39 f8 0f 84 bc 01 00 00 48 89 c7 RIP: __queue_work+0x8c/0x410 RSP: ffffbe8cc249fcd8 CR2: 0000000000000000 ---[ end trace 7aa4f4a232e9c39c ]--- Unregistering the led device on the remove callback before destroying the workqueue avoids this problem. https://bugzilla.kernel.org/show_bug.cgi?id=196097 Reported-by: Dun Hum <bitter.taste@gmx.com> Cc: stable@vger.kernel.org Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi
pushed a commit
that referenced
this pull request
Jun 6, 2018
[ Upstream commit af50e4b ] syzbot caught an infinite recursion in nsh_gso_segment(). Problem here is that we need to make sure the NSH header is of reasonable length. BUG: MAX_LOCK_DEPTH too low! turning off the locking correctness validator. depth: 48 max: 48! 48 locks held by syz-executor0/10189: #0: (ptrval) (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x30f/0x34c0 net/core/dev.c:3517 #1: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #1: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 #2: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #2: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 #3: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #3: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 #4: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #4: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 #5: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #5: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 #6: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #6: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 #7: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #7: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 #8: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #8: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 #9: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #9: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#10: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#10: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#11: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#11: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#12: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#12: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#13: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#13: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#14: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#14: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#15: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#15: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#16: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#16: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#17: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#17: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#18: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#18: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#19: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#19: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#20: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#20: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#21: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#21: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#22: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#22: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#23: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#23: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#24: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#24: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#25: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#25: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#26: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#26: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#27: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#27: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#28: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#28: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#29: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#29: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#30: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#30: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#31: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#31: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 dccp_close: ABORT with 65423 bytes unread torvalds#32: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#32: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#33: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#33: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#34: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#34: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#35: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#35: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#36: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#36: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#37: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#37: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#38: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#38: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#39: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#39: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#40: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#40: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#41: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#41: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#42: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#42: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#43: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#43: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#44: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#44: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#45: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#45: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#46: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#46: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 torvalds#47: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] torvalds#47: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 INFO: lockdep is turned off. CPU: 1 PID: 10189 Comm: syz-executor0 Not tainted 4.17.0-rc2+ torvalds#26 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 __lock_acquire+0x1788/0x5140 kernel/locking/lockdep.c:3449 lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3920 rcu_lock_acquire include/linux/rcupdate.h:246 [inline] rcu_read_lock include/linux/rcupdate.h:632 [inline] skb_mac_gso_segment+0x25b/0x720 net/core/dev.c:2789 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 __skb_gso_segment+0x3bb/0x870 net/core/dev.c:2865 skb_gso_segment include/linux/netdevice.h:4025 [inline] validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3118 validate_xmit_skb_list+0xbf/0x120 net/core/dev.c:3168 sch_direct_xmit+0x354/0x11e0 net/sched/sch_generic.c:312 qdisc_restart net/sched/sch_generic.c:399 [inline] __qdisc_run+0x741/0x1af0 net/sched/sch_generic.c:410 __dev_xmit_skb net/core/dev.c:3243 [inline] __dev_queue_xmit+0x28ea/0x34c0 net/core/dev.c:3551 dev_queue_xmit+0x17/0x20 net/core/dev.c:3616 packet_snd net/packet/af_packet.c:2951 [inline] packet_sendmsg+0x40f8/0x6070 net/packet/af_packet.c:2976 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:639 __sys_sendto+0x3d7/0x670 net/socket.c:1789 __do_sys_sendto net/socket.c:1801 [inline] __se_sys_sendto net/socket.c:1797 [inline] __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1797 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: c411ed8 ("nsh: add GSO support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jiri Benc <jbenc@redhat.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi
pushed a commit
that referenced
this pull request
Jun 6, 2018
[ Upstream commit 626118b ] In the current implementation, `rmmod snd_bcm2835` does not release resources properly. It causes an oops when trying to list sound devices. This commit fixes it. The details WRT allocation / free are described below. Device structure WRT allocation: pdev \childdev[] \card \chip \pcm \ctl Allocation / register sequence: * childdev: devm_kzalloc - freed during driver detach * childdev: device_initialize - freed during device_unregister * pdev: devres_alloc - freed during driver detach * childdev: device_add - removed during device_unregister * pdev, childdev: devres_add - freed during driver detach * card: snd_card_new - freed during snd_card_free * chip: kzalloc - freed during kfree * card, chip: snd_device_new - freed during snd_device_free * chip: new_pcm - TODO: free pcm * chip: new_ctl - TODO: free ctl * card: snd_card_register - unregistered during snd_card_free Free / unregister sequence: * card: snd_card_free * card, chip: snd_device_free * childdev: device_unregister * chip: kfree Steps to reproduce the issue before this commit: ~~~~ $ rmmod snd_bcm2835 $ aplay -L [ 138.648130] Unable to handle kernel paging request at virtual address 7f1343c0 [ 138.660415] pgd = ad8f0000 [ 138.665567] [7f1343c0] *pgd=3864c811, *pte=00000000, *ppte=00000000 [ 138.674887] Internal error: Oops: 7 [#1] SMP ARM [ 138.683571] Modules linked in: sha256_generic cfg80211 rfkill snd_pcm snd_timer snd fixed uio_pdrv_genirq uio ip_tables x_tables ipv6 [last unloaded: snd_bcm2835 ] [ 138.706594] CPU: 3 PID: 463 Comm: aplay Tainted: G WC 4.15.0-rc1-v 7+ #6 [ 138.719833] Hardware name: BCM2835 [ 138.726016] task: b877ac00 task.stack: aebec000 [ 138.733408] PC is at try_module_get+0x38/0x24c [ 138.740813] LR is at snd_ctl_open+0x58/0x194 [snd] [ 138.748485] pc : [<801c4d5c>] lr : [<7f0e6b2c>] psr: 20000013 [ 138.757709] sp : aebedd60 ip : aebedd88 fp : aebedd84 [ 138.765884] r10: 00000000 r9 : 00000004 r8 : 7f0ed440 [ 138.774040] r7 : b7e469b0 r6 : 7f0e6b2c r5 : afd91900 r4 : 7f1343c0 [ 138.783571] r3 : aebec000 r2 : 00000001 r1 : b877ac00 r0 : 7f1343c0 [ 138.793084] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 138.803300] Control: 10c5387d Table: 2d8f006a DAC: 00000055 [ 138.812064] Process aplay (pid: 463, stack limit = 0xaebec210) [ 138.820868] Stack: (0xaebedd60 to 0xaebee000) [ 138.828207] dd60: 00000000 b848d000 afd91900 00000000 b7e469b0 7f0ed440 aebedda4 aebedd88 [ 138.842371] dd80: 7f0e6b2c 801c4d30 afd91900 7f0ea4dc 00000000 b7e469b0 aebeddcc aebedda8 [ 138.856611] dda0: 7f0e250c 7f0e6ae0 7f0e2464 b8478ec0 b7e469b0 afd91900 7f0ea388 00000000 [ 138.870864] ddc0: aebeddf4 aebeddd0 802ce590 7f0e2470 8090ab64 afd91900 afd91900 b7e469b0 [ 138.885301] dde0: afd91908 802ce4e4 aebede1c aebeddf8 802c57b4 802ce4f0 afd91900 aebedea8 [ 138.900110] de00: b7fa4c00 00000000 00000000 00000004 aebede3c aebede20 802c6ba8 802c56b4 [ 138.915260] de20: aebedea8 00000000 aebedf5c 00000000 aebedea4 aebede40 802d9a68 802c6b58 [ 138.930661] de40: b874ddd0 00000000 00000000 00000001 00000041 00000000 afd91900 aebede70 [ 138.946402] de60: 00000000 00000000 00000002 b7e469b0 b8a87610 b8d6ab80 801852f8 00080000 [ 138.962314] de80: aebedf5c aebedea8 00000001 80108464 aebec000 00000000 aebedf4c aebedea8 [ 138.978414] dea0: 802dacd4 802d970c b8a87610 b8d6ab80 a7982bc6 00000009 af363019 b9231480 [ 138.994617] dec0: 00000000 b8c038a0 b7e469b0 00000101 00000002 00000238 00000000 00000000 [ 139.010823] dee0: 00000000 aebedee8 00080000 0000000f aebedf3c aebedf00 802ed7e4 80843f94 [ 139.027025] df00: 00000003 00080000 b9231490 b9231480 00000000 00080000 af363000 00000000 [ 139.043229] df20: 00000005 00000002 ffffff9c 00000000 00080000 ffffff9c af363000 00000003 [ 139.059430] df40: aebedf94 aebedf50 802c6f70 802dac70 aebec000 00000000 00000001 00000000 [ 139.075629] df60: 00020000 00000004 00000100 00000001 7ebe577c 0002e038 00000000 00000005 [ 139.091828] df80: 80108464 aebec000 aebedfa4 aebedf98 802c7060 802c6e6c 00000000 aebedfa8 [ 139.108025] dfa0: 801082c0 802c7040 7ebe577c 0002e038 7ebe577c 00080000 00000b98 e81c8400 [ 139.124222] dfc0: 7ebe577c 0002e038 00000000 00000005 7ebe57e4 00a20af8 7ebe57f0 76f87394 [ 139.140419] dfe0: 00000000 7ebe55c4 76ec88e8 76df1d9c 60000010 7ebe577c 00000000 00000000 [ 139.156715] [<801c4d5c>] (try_module_get) from [<7f0e6b2c>] (snd_ctl_open+0x58/0x194 [snd]) [ 139.173222] [<7f0e6b2c>] (snd_ctl_open [snd]) from [<7f0e250c>] (snd_open+0xa8/0x14c [snd]) [ 139.189683] [<7f0e250c>] (snd_open [snd]) from [<802ce590>] (chrdev_open+0xac/0x188) [ 139.205465] [<802ce590>] (chrdev_open) from [<802c57b4>] (do_dentry_open+0x10c/0x314) [ 139.221347] [<802c57b4>] (do_dentry_open) from [<802c6ba8>] (vfs_open+0x5c/0x88) [ 139.236788] [<802c6ba8>] (vfs_open) from [<802d9a68>] (path_openat+0x368/0x944) [ 139.248270] [<802d9a68>] (path_openat) from [<802dacd4>] (do_filp_open+0x70/0xc4) [ 139.263731] [<802dacd4>] (do_filp_open) from [<802c6f70>] (do_sys_open+0x110/0x1d4) [ 139.279378] [<802c6f70>] (do_sys_open) from [<802c7060>] (SyS_open+0x2c/0x30) [ 139.290647] [<802c7060>] (SyS_open) from [<801082c0>] (ret_fast_syscall+0x0/0x28) [ 139.306021] Code: e3c3303f e5932004 e2822001 e5832004 (e5943000) [ 139.316265] ---[ end trace 7f3f7f6193b663ed ]--- [ 139.324956] note: aplay[463] exited with preempt_count 1 ~~~~ Signed-off-by: Kirill Marinushkin <k.marinushkin@gmail.com> Cc: Eric Anholt <eric@anholt.net> Cc: Stefan Wahren <stefan.wahren@i2se.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Ray Jui <rjui@broadcom.com> Cc: Scott Branden <sbranden@broadcom.com> Cc: bcm-kernel-feedback-list@broadcom.com Cc: Michael Zoran <mzoran@crowfest.net> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: linux-rpi-kernel@lists.infradead.org Cc: linux-arm-kernel@lists.infradead.org Cc: devel@driverdev.osuosl.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi
pushed a commit
that referenced
this pull request
Jun 6, 2018
[ Upstream commit fca3234 ] Executing command 'perf stat -T -- ls' dumps core on x86 and s390. Here is the call back chain (done on x86): # gdb ./perf .... (gdb) r stat -T -- ls ... Program received signal SIGSEGV, Segmentation fault. 0x00007ffff56d1963 in vasprintf () from /lib64/libc.so.6 (gdb) where #0 0x00007ffff56d1963 in vasprintf () from /lib64/libc.so.6 #1 0x00007ffff56ae484 in asprintf () from /lib64/libc.so.6 #2 0x00000000004f1982 in __parse_events_add_pmu (parse_state=0x7fffffffd580, list=0xbfb970, name=0xbf3ef0 "cpu", head_config=0xbfb930, auto_merge_stats=false) at util/parse-events.c:1233 #3 0x00000000004f1c8e in parse_events_add_pmu (parse_state=0x7fffffffd580, list=0xbfb970, name=0xbf3ef0 "cpu", head_config=0xbfb930) at util/parse-events.c:1288 #4 0x0000000000537ce3 in parse_events_parse (_parse_state=0x7fffffffd580, scanner=0xbf4210) at util/parse-events.y:234 #5 0x00000000004f2c7a in parse_events__scanner (str=0x6b66c0 "task-clock,{instructions,cycles,cpu/cycles-t/,cpu/tx-start/}", parse_state=0x7fffffffd580, start_token=258) at util/parse-events.c:1673 #6 0x00000000004f2e23 in parse_events (evlist=0xbe9990, str=0x6b66c0 "task-clock,{instructions,cycles,cpu/cycles-t/,cpu/tx-start/}", err=0x0) at util/parse-events.c:1713 #7 0x000000000044e137 in add_default_attributes () at builtin-stat.c:2281 #8 0x000000000044f7b5 in cmd_stat (argc=1, argv=0x7fffffffe3b0) at builtin-stat.c:2828 #9 0x00000000004c8b0f in run_builtin (p=0xab01a0 <commands+288>, argc=4, argv=0x7fffffffe3b0) at perf.c:297 torvalds#10 0x00000000004c8d7c in handle_internal_command (argc=4, argv=0x7fffffffe3b0) at perf.c:349 torvalds#11 0x00000000004c8ece in run_argv (argcp=0x7fffffffe20c, argv=0x7fffffffe200) at perf.c:393 torvalds#12 0x00000000004c929c in main (argc=4, argv=0x7fffffffe3b0) at perf.c:537 (gdb) It turns out that a NULL pointer is referenced. Here are the function calls: ... cmd_stat() +---> add_default_attributes() +---> parse_events(evsel_list, transaction_attrs, NULL); 3rd parameter set to NULL Function parse_events(xx, xx, struct parse_events_error *err) dives into a bison generated scanner and creates parser state information for it first: struct parse_events_state parse_state = { .list = LIST_HEAD_INIT(parse_state.list), .idx = evlist->nr_entries, .error = err, <--- NULL POINTER !!! .evlist = evlist, }; Now various functions inside the bison scanner are called to end up in __parse_events_add_pmu(struct parse_events_state *parse_state, ..) with first parameter being a pointer to above structure definition. Now the PMU event name is not found (because being executed in a VM) and this function tries to create an error message with asprintf(&parse_state->error.str, ....) which references a NULL pointer and dumps core. Fix this by providing a pointer to the necessary error information instead of NULL. Technically only the else part is needed to avoid the core dump, just lets be safe... Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/20180308145735.64717-1-tmricht@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi
pushed a commit
that referenced
this pull request
Jun 6, 2018
[ Upstream commit 2bbea6e ] when mounting an ISO filesystem sometimes (very rarely) the system hangs because of a race condition between two tasks. PID: 6766 TASK: ffff88007b2a6dd0 CPU: 0 COMMAND: "mount" #0 [ffff880078447ae0] __schedule at ffffffff8168d605 #1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49 #2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995 #3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef #4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod] #5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50 #6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3 #7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs] #8 [ffff880078447da8] mount_bdev at ffffffff81202570 #9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs] torvalds#10 [ffff880078447e28] mount_fs at ffffffff81202d09 torvalds#11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f torvalds#12 [ffff880078447ea8] do_mount at ffffffff81220fee torvalds#13 [ffff880078447f28] sys_mount at ffffffff812218d6 torvalds#14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49 RIP: 00007fd9ea914e9a RSP: 00007ffd5d9bf648 RFLAGS: 00010246 RAX: 00000000000000a5 RBX: ffffffff81698c49 RCX: 0000000000000010 RDX: 00007fd9ec2bc210 RSI: 00007fd9ec2bc290 RDI: 00007fd9ec2bcf30 RBP: 0000000000000000 R8: 0000000000000000 R9: 0000000000000010 R10: 00000000c0ed0001 R11: 0000000000000206 R12: 00007fd9ec2bc040 R13: 00007fd9eb6b2380 R14: 00007fd9ec2bc210 R15: 00007fd9ec2bcf30 ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b This task was trying to mount the cdrom. It allocated and configured a super_block struct and owned the write-lock for the super_block->s_umount rwsem. While exclusively owning the s_umount lock, it called sr_block_ioctl and waited to acquire the global sr_mutex lock. PID: 6785 TASK: ffff880078720fb0 CPU: 0 COMMAND: "systemd-udevd" #0 [ffff880078417898] __schedule at ffffffff8168d605 #1 [ffff880078417900] schedule at ffffffff8168dc59 #2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605 #3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838 #4 [ffff8800784179d0] down_read at ffffffff8168cde0 #5 [ffff8800784179e8] get_super at ffffffff81201cc7 #6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de #7 [ffff880078417a40] flush_disk at ffffffff8123a94b #8 [ffff880078417a88] check_disk_change at ffffffff8123ab50 #9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom] torvalds#10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod] torvalds#11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86 torvalds#12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65 torvalds#13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b torvalds#14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7 torvalds#15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf torvalds#16 [ffff880078417d00] do_last at ffffffff8120d53d torvalds#17 [ffff880078417db0] path_openat at ffffffff8120e6b2 torvalds#18 [ffff880078417e48] do_filp_open at ffffffff8121082b torvalds#19 [ffff880078417f18] do_sys_open at ffffffff811fdd33 torvalds#20 [ffff880078417f70] sys_open at ffffffff811fde4e torvalds#21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49 RIP: 00007f29438b0c20 RSP: 00007ffc76624b78 RFLAGS: 00010246 RAX: 0000000000000002 RBX: ffffffff81698c49 RCX: 0000000000000000 RDX: 00007f2944a5fa70 RSI: 00000000000a0800 RDI: 00007f2944a5fa70 RBP: 00007f2944a5f540 R8: 0000000000000000 R9: 0000000000000020 R10: 00007f2943614c40 R11: 0000000000000246 R12: ffffffff811fde4e R13: ffff880078417f78 R14: 000000000000000c R15: 00007f2944a4b010 ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b This task tried to open the cdrom device, the sr_block_open function acquired the global sr_mutex lock. The call to check_disk_change() then saw an event flag indicating a possible media change and tried to flush any cached data for the device. As part of the flush, it tried to acquire the super_block->s_umount lock associated with the cdrom device. This was the same super_block as created and locked by the previous task. The first task acquires the s_umount lock and then the sr_mutex_lock; the second task acquires the sr_mutex_lock and then the s_umount lock. This patch fixes the issue by moving check_disk_change() out of cdrom_open() and let the caller take care of it. Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi
pushed a commit
that referenced
this pull request
Nov 22, 2018
[ Upstream commit f5e2848 ] When enumerating page size definitions to check hardware support, we construct a constant which is (1U << (def->shift - 10)). However, the array of page size definitions is only initalised for various MMU_PAGE_* constants, so it contains a number of 0-initialised elements with def->shift == 0. This means we end up shifting by a very large number, which gives the following UBSan splat: ================================================================================ UBSAN: Undefined behaviour in /home/dja/dev/linux/linux/arch/powerpc/mm/tlb_nohash.c:506:21 shift exponent 4294967286 is too large for 32-bit type 'unsigned int' CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc3-00045-ga604f927b012-dirty #6 Call Trace: [c00000000101bc20] [c000000000a13d54] .dump_stack+0xa8/0xec (unreliable) [c00000000101bcb0] [c0000000004f20a8] .ubsan_epilogue+0x18/0x64 [c00000000101bd30] [c0000000004f2b10] .__ubsan_handle_shift_out_of_bounds+0x110/0x1a4 [c00000000101be20] [c000000000d21760] .early_init_mmu+0x1b4/0x5a0 [c00000000101bf10] [c000000000d1ba28] .early_setup+0x100/0x130 [c00000000101bf90] [c000000000000528] start_here_multiplatform+0x68/0x80 ================================================================================ Fix this by first checking if the element exists (shift != 0) before constructing the constant. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi
pushed a commit
that referenced
this pull request
Feb 7, 2020
commit 314eed3 upstream. When running on a system with >512MB RAM with a 32-bit kernel built with: CONFIG_DEBUG_VIRTUAL=y CONFIG_HIGHMEM=y CONFIG_HARDENED_USERCOPY=y all execve()s will fail due to argv copying into kmap()ed pages, and on usercopy checking the calls ultimately of virt_to_page() will be looking for "bad" kmap (highmem) pointers due to CONFIG_DEBUG_VIRTUAL=y: ------------[ cut here ]------------ kernel BUG at ../arch/x86/mm/physaddr.c:83! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc8 #6 Hardware name: Dell Inc. Inspiron 1318/0C236D, BIOS A04 01/15/2009 EIP: __phys_addr+0xaf/0x100 ... Call Trace: __check_object_size+0xaf/0x3c0 ? __might_sleep+0x80/0xa0 copy_strings+0x1c2/0x370 copy_strings_kernel+0x2b/0x40 __do_execve_file+0x4ca/0x810 ? kmem_cache_alloc+0x1c7/0x370 do_execve+0x1b/0x20 ... The check is from arch/x86/mm/physaddr.c: VIRTUAL_BUG_ON((phys_addr >> PAGE_SHIFT) > max_low_pfn); Due to the kmap() in fs/exec.c: kaddr = kmap(kmapped_page); ... if (copy_from_user(kaddr+offset, str, bytes_to_copy)) ... Now we can fetch the correct page to avoid the pfn check. In both cases, hardened usercopy will need to walk the page-span checker (if enabled) to do sanity checking. Reported-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Fixes: f5509cc ("mm: Hardened usercopy") Cc: Matthew Wilcox <willy@infradead.org> Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/201909171056.7F2FFD17@keescook Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi
pushed a commit
that referenced
this pull request
Feb 7, 2020
commit 0216234 upstream. We release wrong pointer on error path in cpu_cache_level__read function, leading to segfault: (gdb) r record ls Starting program: /root/perf/tools/perf/perf record ls ... [ perf record: Woken up 1 times to write data ] double free or corruption (out) Thread 1 "perf" received signal SIGABRT, Aborted. 0x00007ffff7463798 in raise () from /lib64/power9/libc.so.6 (gdb) bt #0 0x00007ffff7463798 in raise () from /lib64/power9/libc.so.6 #1 0x00007ffff7443bac in abort () from /lib64/power9/libc.so.6 #2 0x00007ffff74af8bc in __libc_message () from /lib64/power9/libc.so.6 #3 0x00007ffff74b92b8 in malloc_printerr () from /lib64/power9/libc.so.6 #4 0x00007ffff74bb874 in _int_free () from /lib64/power9/libc.so.6 #5 0x0000000010271260 in __zfree (ptr=0x7fffffffa0b0) at ../../lib/zalloc.. #6 0x0000000010139340 in cpu_cache_level__read (cache=0x7fffffffa090, cac.. #7 0x0000000010143c90 in build_caches (cntp=0x7fffffffa118, size=<optimiz.. ... Releasing the proper pointer. Fixes: 720e98b ("perf tools: Add perf data cache feature") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org: # v4.6+ Link: http://lore.kernel.org/lkml/20190912105235.10689-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi
pushed a commit
that referenced
this pull request
Feb 7, 2020
commit 443f2d5 upstream. Observe a segmentation fault when 'perf stat' is asked to repeat forever with the interval option. Without fix: # perf stat -r 0 -I 5000 -e cycles -a sleep 10 # time counts unit events 5.000211692 3,13,89,82,34,157 cycles 10.000380119 1,53,98,52,22,294 cycles 10.040467280 17,16,79,265 cycles Segmentation fault This problem was only observed when we use forever option aka -r 0 and works with limited repeats. Calling print_counter with ts being set to NULL, is not a correct option when interval is set. Hence avoid print_counter(NULL,..) if interval is set. With fix: # perf stat -r 0 -I 5000 -e cycles -a sleep 10 # time counts unit events 5.019866622 3,15,14,43,08,697 cycles 10.039865756 3,15,16,31,95,261 cycles 10.059950628 1,26,05,47,158 cycles 5.009902655 3,14,52,62,33,932 cycles 10.019880228 3,14,52,22,89,154 cycles 10.030543876 66,90,18,333 cycles 5.009848281 3,14,51,98,25,437 cycles 10.029854402 3,15,14,93,04,918 cycles 5.009834177 3,14,51,95,92,316 cycles Committer notes: Did the 'git bisect' to find the cset introducing the problem to add the Fixes tag below, and at that time the problem reproduced as: (gdb) run stat -r0 -I500 sleep 1 <SNIP> Program received signal SIGSEGV, Segmentation fault. print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866 866 sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep); (gdb) bt #0 print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866 #1 0x000000000041860a in print_counters (ts=ts@entry=0x0, argc=argc@entry=2, argv=argv@entry=0x7fffffffd640) at builtin-stat.c:938 #2 0x0000000000419a7f in cmd_stat (argc=2, argv=0x7fffffffd640, prefix=<optimized out>) at builtin-stat.c:1411 #3 0x000000000045c65a in run_builtin (p=p@entry=0x6291b8 <commands+216>, argc=argc@entry=5, argv=argv@entry=0x7fffffffd640) at perf.c:370 #4 0x000000000045c893 in handle_internal_command (argc=5, argv=0x7fffffffd640) at perf.c:429 #5 0x000000000045c8f1 in run_argv (argcp=argcp@entry=0x7fffffffd4ac, argv=argv@entry=0x7fffffffd4a0) at perf.c:473 #6 0x000000000045cac9 in main (argc=<optimized out>, argv=<optimized out>) at perf.c:588 (gdb) Mostly the same as just before this patch: Program received signal SIGSEGV, Segmentation fault. 0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964 964 sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, config->csv_sep); (gdb) bt #0 0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964 #1 0x0000000000588047 in perf_evlist__print_counters (evlist=0xbc9b90, config=0xa1f2a0 <stat_config>, _target=0xa1f0c0 <target>, ts=0x0, argc=2, argv=0x7fffffffd670) at util/stat-display.c:1172 #2 0x000000000045390f in print_counters (ts=0x0, argc=2, argv=0x7fffffffd670) at builtin-stat.c:656 #3 0x0000000000456bb5 in cmd_stat (argc=2, argv=0x7fffffffd670) at builtin-stat.c:1960 #4 0x00000000004dd2e0 in run_builtin (p=0xa30e00 <commands+288>, argc=5, argv=0x7fffffffd670) at perf.c:310 #5 0x00000000004dd54d in handle_internal_command (argc=5, argv=0x7fffffffd670) at perf.c:362 #6 0x00000000004dd694 in run_argv (argcp=0x7fffffffd4cc, argv=0x7fffffffd4c0) at perf.c:406 #7 0x00000000004dda11 in main (argc=5, argv=0x7fffffffd670) at perf.c:531 (gdb) Fixes: d4f63a4 ("perf stat: Introduce print_counters function") Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: stable@vger.kernel.org # v4.2+ Link: http://lore.kernel.org/lkml/20190904094738.9558-3-srikar@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi
pushed a commit
that referenced
this pull request
Nov 4, 2020
commit 18c850f upstream. There's long existed a lockdep splat because we open our bdev's under the ->device_list_mutex at mount time, which acquires the bd_mutex. Usually this goes unnoticed, but if you do loopback devices at all suddenly the bd_mutex comes with a whole host of other dependencies, which results in the splat when you mount a btrfs file system. ====================================================== WARNING: possible circular locking dependency detected 5.8.0-0.rc3.1.fc33.x86_64+debug #1 Not tainted ------------------------------------------------------ systemd-journal/509 is trying to acquire lock: ffff970831f84db0 (&fs_info->reloc_mutex){+.+.}-{3:3}, at: btrfs_record_root_in_trans+0x44/0x70 [btrfs] but task is already holding lock: ffff97083144d598 (sb_pagefaults){.+.+}-{0:0}, at: btrfs_page_mkwrite+0x59/0x560 [btrfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #6 (sb_pagefaults){.+.+}-{0:0}: __sb_start_write+0x13e/0x220 btrfs_page_mkwrite+0x59/0x560 [btrfs] do_page_mkwrite+0x4f/0x130 do_wp_page+0x3b0/0x4f0 handle_mm_fault+0xf47/0x1850 do_user_addr_fault+0x1fc/0x4b0 exc_page_fault+0x88/0x300 asm_exc_page_fault+0x1e/0x30 -> #5 (&mm->mmap_lock#2){++++}-{3:3}: __might_fault+0x60/0x80 _copy_from_user+0x20/0xb0 get_sg_io_hdr+0x9a/0xb0 scsi_cmd_ioctl+0x1ea/0x2f0 cdrom_ioctl+0x3c/0x12b4 sr_block_ioctl+0xa4/0xd0 block_ioctl+0x3f/0x50 ksys_ioctl+0x82/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #4 (&cd->lock){+.+.}-{3:3}: __mutex_lock+0x7b/0x820 sr_block_open+0xa2/0x180 __blkdev_get+0xdd/0x550 blkdev_get+0x38/0x150 do_dentry_open+0x16b/0x3e0 path_openat+0x3c9/0xa00 do_filp_open+0x75/0x100 do_sys_openat2+0x8a/0x140 __x64_sys_openat+0x46/0x70 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #3 (&bdev->bd_mutex){+.+.}-{3:3}: __mutex_lock+0x7b/0x820 __blkdev_get+0x6a/0x550 blkdev_get+0x85/0x150 blkdev_get_by_path+0x2c/0x70 btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs] open_fs_devices+0x88/0x240 [btrfs] btrfs_open_devices+0x92/0xa0 [btrfs] btrfs_mount_root+0x250/0x490 [btrfs] legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x119/0x380 [btrfs] legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 do_mount+0x8c6/0xca0 __x64_sys_mount+0x8e/0xd0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #2 (&fs_devs->device_list_mutex){+.+.}-{3:3}: __mutex_lock+0x7b/0x820 btrfs_run_dev_stats+0x36/0x420 [btrfs] commit_cowonly_roots+0x91/0x2d0 [btrfs] btrfs_commit_transaction+0x4e6/0x9f0 [btrfs] btrfs_sync_file+0x38a/0x480 [btrfs] __x64_sys_fdatasync+0x47/0x80 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (&fs_info->tree_log_mutex){+.+.}-{3:3}: __mutex_lock+0x7b/0x820 btrfs_commit_transaction+0x48e/0x9f0 [btrfs] btrfs_sync_file+0x38a/0x480 [btrfs] __x64_sys_fdatasync+0x47/0x80 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&fs_info->reloc_mutex){+.+.}-{3:3}: __lock_acquire+0x1241/0x20c0 lock_acquire+0xb0/0x400 __mutex_lock+0x7b/0x820 btrfs_record_root_in_trans+0x44/0x70 [btrfs] start_transaction+0xd2/0x500 [btrfs] btrfs_dirty_inode+0x44/0xd0 [btrfs] file_update_time+0xc6/0x120 btrfs_page_mkwrite+0xda/0x560 [btrfs] do_page_mkwrite+0x4f/0x130 do_wp_page+0x3b0/0x4f0 handle_mm_fault+0xf47/0x1850 do_user_addr_fault+0x1fc/0x4b0 exc_page_fault+0x88/0x300 asm_exc_page_fault+0x1e/0x30 other info that might help us debug this: Chain exists of: &fs_info->reloc_mutex --> &mm->mmap_lock#2 --> sb_pagefaults Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sb_pagefaults); lock(&mm->mmap_lock#2); lock(sb_pagefaults); lock(&fs_info->reloc_mutex); *** DEADLOCK *** 3 locks held by systemd-journal/509: #0: ffff97083bdec8b8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x12e/0x4b0 #1: ffff97083144d598 (sb_pagefaults){.+.+}-{0:0}, at: btrfs_page_mkwrite+0x59/0x560 [btrfs] #2: ffff97083144d6a8 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x3f8/0x500 [btrfs] stack backtrace: CPU: 0 PID: 509 Comm: systemd-journal Not tainted 5.8.0-0.rc3.1.fc33.x86_64+debug #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack+0x92/0xc8 check_noncircular+0x134/0x150 __lock_acquire+0x1241/0x20c0 lock_acquire+0xb0/0x400 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs] ? lock_acquire+0xb0/0x400 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs] __mutex_lock+0x7b/0x820 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs] ? kvm_sched_clock_read+0x14/0x30 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0xc/0xb0 btrfs_record_root_in_trans+0x44/0x70 [btrfs] start_transaction+0xd2/0x500 [btrfs] btrfs_dirty_inode+0x44/0xd0 [btrfs] file_update_time+0xc6/0x120 btrfs_page_mkwrite+0xda/0x560 [btrfs] ? sched_clock+0x5/0x10 do_page_mkwrite+0x4f/0x130 do_wp_page+0x3b0/0x4f0 handle_mm_fault+0xf47/0x1850 do_user_addr_fault+0x1fc/0x4b0 exc_page_fault+0x88/0x300 ? asm_exc_page_fault+0x8/0x30 asm_exc_page_fault+0x1e/0x30 RIP: 0033:0x7fa3972fdbfe Code: Bad RIP value. Fix this by not holding the ->device_list_mutex at this point. The device_list_mutex exists to protect us from modifying the device list while the file system is running. However it can also be modified by doing a scan on a device. But this action is specifically protected by the uuid_mutex, which we are holding here. We cannot race with opening at this point because we have the ->s_mount lock held during the mount. Not having the ->device_list_mutex here is perfectly safe as we're not going to change the devices at this point. CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add some comments ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi
pushed a commit
that referenced
this pull request
Nov 4, 2020
… set [ Upstream commit 1101c87 ] We received an error report that perf-record caused 'Segmentation fault' on a newly system (e.g. on the new installed ubuntu). (gdb) backtrace #0 __read_once_size (size=4, res=<synthetic pointer>, p=0x14) at /root/0-jinyao/acme/tools/include/linux/compiler.h:139 #1 atomic_read (v=0x14) at /root/0-jinyao/acme/tools/include/asm/../../arch/x86/include/asm/atomic.h:28 #2 refcount_read (r=0x14) at /root/0-jinyao/acme/tools/include/linux/refcount.h:65 #3 perf_mmap__read_init (map=map@entry=0x0) at mmap.c:177 #4 0x0000561ce5c0de39 in perf_evlist__poll_thread (arg=0x561ce68584d0) at util/sideband_evlist.c:62 #5 0x00007fad78491609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #6 0x00007fad7823c103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 The root cause is, evlist__add_bpf_sb_event() just returns 0 if HAVE_LIBBPF_SUPPORT is not defined (inline function path). So it will not create a valid evsel for side-band event. But perf-record still creates BPF side band thread to process the side-band event, then the error happpens. We can reproduce this issue by removing the libelf-dev. e.g. 1. apt-get remove libelf-dev 2. perf record -a -- sleep 1 root@test:~# ./perf record -a -- sleep 1 perf: Segmentation fault Obtained 6 stack frames. ./perf(+0x28eee8) [0x5562d6ef6ee8] /lib/x86_64-linux-gnu/libc.so.6(+0x46210) [0x7fbfdc65f210] ./perf(+0x342e74) [0x5562d6faae74] ./perf(+0x257e39) [0x5562d6ebfe39] /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7fbfdc990609] /lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7fbfdc73b103] Segmentation fault (core dumped) To fix this issue, 1. We either install the missing libraries to let HAVE_LIBBPF_SUPPORT be defined. e.g. apt-get install libelf-dev and install other related libraries. 2. Use this patch to skip the side-band event setup if HAVE_LIBBPF_SUPPORT is not set. Committer notes: The side band thread is not used just with BPF, it is also used with --switch-output-event, so narrow the ifdef to the BPF specific part. Fixes: 23cbb41 ("perf record: Move side band evlist setup to separate routine") Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200805022937.29184-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
koenkooi
pushed a commit
that referenced
this pull request
Nov 4, 2020
[ Upstream commit b0f3b87 ] https://bugzilla.kernel.org/show_bug.cgi?id=208565 PID: 257 TASK: ecdd0000 CPU: 0 COMMAND: "init" #0 [<c0b420ec>] (__schedule) from [<c0b423c8>] #1 [<c0b423c8>] (schedule) from [<c0b459d4>] #2 [<c0b459d4>] (rwsem_down_read_failed) from [<c0b44fa0>] #3 [<c0b44fa0>] (down_read) from [<c044233c>] #4 [<c044233c>] (f2fs_truncate_blocks) from [<c0442890>] #5 [<c0442890>] (f2fs_truncate) from [<c044d408>] #6 [<c044d408>] (f2fs_evict_inode) from [<c030be18>] #7 [<c030be18>] (evict) from [<c030a558>] #8 [<c030a558>] (iput) from [<c047c600>] #9 [<c047c600>] (f2fs_sync_node_pages) from [<c0465414>] torvalds#10 [<c0465414>] (f2fs_write_checkpoint) from [<c04575f4>] torvalds#11 [<c04575f4>] (f2fs_sync_fs) from [<c0441918>] torvalds#12 [<c0441918>] (f2fs_do_sync_file) from [<c0441098>] torvalds#13 [<c0441098>] (f2fs_sync_file) from [<c0323fa0>] torvalds#14 [<c0323fa0>] (vfs_fsync_range) from [<c0324294>] torvalds#15 [<c0324294>] (do_fsync) from [<c0324014>] torvalds#16 [<c0324014>] (sys_fsync) from [<c0108bc0>] This can be caused by flush_dirty_inode() in f2fs_sync_node_pages() where iput() requires f2fs_lock_op() again resulting in livelock. Reported-by: Zhiguo Niu <Zhiguo.Niu@unisoc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
koenkooi
pushed a commit
that referenced
this pull request
Nov 4, 2020
[ Upstream commit e679654 ] In our production system, we observed rcu stalls when 'bpftool prog` is running. rcu: INFO: rcu_sched self-detected stall on CPU rcu: \x097-....: (20999 ticks this GP) idle=302/1/0x4000000000000000 softirq=1508852/1508852 fqs=4913 \x09(t=21031 jiffies g=2534773 q=179750) NMI backtrace for cpu 7 CPU: 7 PID: 184195 Comm: bpftool Kdump: loaded Tainted: G W 5.8.0-00004-g68bfc7f8c1b4 #6 Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019 Call Trace: <IRQ> dump_stack+0x57/0x70 nmi_cpu_backtrace.cold+0x14/0x53 ? lapic_can_unplug_cpu.cold+0x39/0x39 nmi_trigger_cpumask_backtrace+0xb7/0xc7 rcu_dump_cpu_stacks+0xa2/0xd0 rcu_sched_clock_irq.cold+0x1ff/0x3d9 ? tick_nohz_handler+0x100/0x100 update_process_times+0x5b/0x90 tick_sched_timer+0x5e/0xf0 __hrtimer_run_queues+0x12a/0x2a0 hrtimer_interrupt+0x10e/0x280 __sysvec_apic_timer_interrupt+0x51/0xe0 asm_call_on_stack+0xf/0x20 </IRQ> sysvec_apic_timer_interrupt+0x6f/0x80 asm_sysvec_apic_timer_interrupt+0x12/0x20 RIP: 0010:task_file_seq_get_next+0x71/0x220 Code: 00 00 8b 53 1c 49 8b 7d 00 89 d6 48 8b 47 20 44 8b 18 41 39 d3 76 75 48 8b 4f 20 8b 01 39 d0 76 61 41 89 d1 49 39 c1 48 19 c0 <48> 8b 49 08 21 d0 48 8d 04 c1 4c 8b 08 4d 85 c9 74 46 49 8b 41 38 RSP: 0018:ffffc90006223e10 EFLAGS: 00000297 RAX: ffffffffffffffff RBX: ffff888f0d172388 RCX: ffff888c8c07c1c0 RDX: 00000000000f017b RSI: 00000000000f017b RDI: ffff888c254702c0 RBP: ffffc90006223e68 R08: ffff888be2a1c140 R09: 00000000000f017b R10: 0000000000000002 R11: 0000000000100000 R12: ffff888f23c24118 R13: ffffc90006223e60 R14: ffffffff828509a0 R15: 00000000ffffffff task_file_seq_next+0x52/0xa0 bpf_seq_read+0xb9/0x320 vfs_read+0x9d/0x180 ksys_read+0x5f/0xe0 do_syscall_64+0x38/0x60 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f8815f4f76e Code: c0 e9 f6 fe ff ff 55 48 8d 3d 76 70 0a 00 48 89 e5 e8 36 06 02 00 66 0f 1f 44 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 52 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 RSP: 002b:00007fff8f9df578 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 000000000170b9c0 RCX: 00007f8815f4f76e RDX: 0000000000001000 RSI: 00007fff8f9df5b0 RDI: 0000000000000007 RBP: 00007fff8f9e05f0 R08: 0000000000000049 R09: 0000000000000010 R10: 00007f881601fa40 R11: 0000000000000246 R12: 00007fff8f9e05a8 R13: 00007fff8f9e05a8 R14: 0000000001917f90 R15: 000000000000e22e Note that `bpftool prog` actually calls a task_file bpf iterator program to establish an association between prog/map/link/btf anon files and processes. In the case where the above rcu stall occured, we had a process having 1587 tasks and each task having roughly 81305 files. This implied 129 million bpf prog invocations. Unfortunwtely none of these files are prog/map/link/btf files so bpf iterator/prog needs to traverse all these files and not able to return to user space since there are no seq_file buffer overflow. This patch fixed the issue in bpf_seq_read() to limit the number of visited objects. If the maximum number of visited objects is reached, no more objects will be visited in the current syscall. If there is nothing written in the seq_file buffer, -EAGAIN will return to the user so user can try again. The maximum number of visited objects is set at 1 million. In our Intel Xeon D-2191 2.3GHZ 18-core server, bpf_seq_read() visiting 1 million files takes around 0.18 seconds. We did not use cond_resched() since for some iterators, e.g., netlink iterator, where rcu read_lock critical section spans between consecutive seq_ops->next(), which makes impossible to do cond_resched() in the key while loop of function bpf_seq_read(). Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/bpf/20200818222309.2181348-1-yhs@fb.com Signed-off-by: Sasha Levin <sashal@kernel.org>
koenkooi
pushed a commit
that referenced
this pull request
Nov 4, 2020
commit e89c4a9 upstream. I got the following lockdep splat while testing: ====================================================== WARNING: possible circular locking dependency detected 5.8.0-rc7-00172-g021118712e59 torvalds#932 Not tainted ------------------------------------------------------ btrfs/229626 is trying to acquire lock: ffffffff828513f0 (cpu_hotplug_lock){++++}-{0:0}, at: alloc_workqueue+0x378/0x450 but task is already holding lock: ffff889dd3889518 (&fs_info->scrub_lock){+.+.}-{3:3}, at: btrfs_scrub_dev+0x11c/0x630 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #7 (&fs_info->scrub_lock){+.+.}-{3:3}: __mutex_lock+0x9f/0x930 btrfs_scrub_dev+0x11c/0x630 btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4 btrfs_ioctl+0x2799/0x30a0 ksys_ioctl+0x83/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x50/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #6 (&fs_devs->device_list_mutex){+.+.}-{3:3}: __mutex_lock+0x9f/0x930 btrfs_run_dev_stats+0x49/0x480 commit_cowonly_roots+0xb5/0x2a0 btrfs_commit_transaction+0x516/0xa60 sync_filesystem+0x6b/0x90 generic_shutdown_super+0x22/0x100 kill_anon_super+0xe/0x30 btrfs_kill_super+0x12/0x20 deactivate_locked_super+0x29/0x60 cleanup_mnt+0xb8/0x140 task_work_run+0x6d/0xb0 __prepare_exit_to_usermode+0x1cc/0x1e0 do_syscall_64+0x5c/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #5 (&fs_info->tree_log_mutex){+.+.}-{3:3}: __mutex_lock+0x9f/0x930 btrfs_commit_transaction+0x4bb/0xa60 sync_filesystem+0x6b/0x90 generic_shutdown_super+0x22/0x100 kill_anon_super+0xe/0x30 btrfs_kill_super+0x12/0x20 deactivate_locked_super+0x29/0x60 cleanup_mnt+0xb8/0x140 task_work_run+0x6d/0xb0 __prepare_exit_to_usermode+0x1cc/0x1e0 do_syscall_64+0x5c/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #4 (&fs_info->reloc_mutex){+.+.}-{3:3}: __mutex_lock+0x9f/0x930 btrfs_record_root_in_trans+0x43/0x70 start_transaction+0xd1/0x5d0 btrfs_dirty_inode+0x42/0xd0 touch_atime+0xa1/0xd0 btrfs_file_mmap+0x3f/0x60 mmap_region+0x3a4/0x640 do_mmap+0x376/0x580 vm_mmap_pgoff+0xd5/0x120 ksys_mmap_pgoff+0x193/0x230 do_syscall_64+0x50/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #3 (&mm->mmap_lock#2){++++}-{3:3}: __might_fault+0x68/0x90 _copy_to_user+0x1e/0x80 perf_read+0x141/0x2c0 vfs_read+0xad/0x1b0 ksys_read+0x5f/0xe0 do_syscall_64+0x50/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #2 (&cpuctx_mutex){+.+.}-{3:3}: __mutex_lock+0x9f/0x930 perf_event_init_cpu+0x88/0x150 perf_event_init+0x1db/0x20b start_kernel+0x3ae/0x53c secondary_startup_64+0xa4/0xb0 -> #1 (pmus_lock){+.+.}-{3:3}: __mutex_lock+0x9f/0x930 perf_event_init_cpu+0x4f/0x150 cpuhp_invoke_callback+0xb1/0x900 _cpu_up.constprop.26+0x9f/0x130 cpu_up+0x7b/0xc0 bringup_nonboot_cpus+0x4f/0x60 smp_init+0x26/0x71 kernel_init_freeable+0x110/0x258 kernel_init+0xa/0x103 ret_from_fork+0x1f/0x30 -> #0 (cpu_hotplug_lock){++++}-{0:0}: __lock_acquire+0x1272/0x2310 lock_acquire+0x9e/0x360 cpus_read_lock+0x39/0xb0 alloc_workqueue+0x378/0x450 __btrfs_alloc_workqueue+0x15d/0x200 btrfs_alloc_workqueue+0x51/0x160 scrub_workers_get+0x5a/0x170 btrfs_scrub_dev+0x18c/0x630 btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4 btrfs_ioctl+0x2799/0x30a0 ksys_ioctl+0x83/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x50/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Chain exists of: cpu_hotplug_lock --> &fs_devs->device_list_mutex --> &fs_info->scrub_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&fs_info->scrub_lock); lock(&fs_devs->device_list_mutex); lock(&fs_info->scrub_lock); lock(cpu_hotplug_lock); *** DEADLOCK *** 2 locks held by btrfs/229626: #0: ffff88bfe8bb86e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_scrub_dev+0xbd/0x630 #1: ffff889dd3889518 (&fs_info->scrub_lock){+.+.}-{3:3}, at: btrfs_scrub_dev+0x11c/0x630 stack backtrace: CPU: 15 PID: 229626 Comm: btrfs Kdump: loaded Not tainted 5.8.0-rc7-00172-g021118712e59 torvalds#932 Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018 Call Trace: dump_stack+0x78/0xa0 check_noncircular+0x165/0x180 __lock_acquire+0x1272/0x2310 lock_acquire+0x9e/0x360 ? alloc_workqueue+0x378/0x450 cpus_read_lock+0x39/0xb0 ? alloc_workqueue+0x378/0x450 alloc_workqueue+0x378/0x450 ? rcu_read_lock_sched_held+0x52/0x80 __btrfs_alloc_workqueue+0x15d/0x200 btrfs_alloc_workqueue+0x51/0x160 scrub_workers_get+0x5a/0x170 btrfs_scrub_dev+0x18c/0x630 ? start_transaction+0xd1/0x5d0 btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4 btrfs_ioctl+0x2799/0x30a0 ? do_sigaction+0x102/0x250 ? lockdep_hardirqs_on_prepare+0xca/0x160 ? _raw_spin_unlock_irq+0x24/0x30 ? trace_hardirqs_on+0x1c/0xe0 ? _raw_spin_unlock_irq+0x24/0x30 ? do_sigaction+0x102/0x250 ? ksys_ioctl+0x83/0xc0 ksys_ioctl+0x83/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x50/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This happens because we're allocating the scrub workqueues under the scrub and device list mutex, which brings in a whole host of other dependencies. Because the work queue allocation is done with GFP_KERNEL, it can trigger reclaim, which can lead to a transaction commit, which in turns needs the device_list_mutex, it can lead to a deadlock. A different problem for which this fix is a solution. Fix this by moving the actual allocation outside of the scrub lock, and then only take the lock once we're ready to actually assign them to the fs_info. We'll now have to cleanup the workqueues in a few more places, so I've added a helper to do the refcount dance to safely free the workqueues. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi
pushed a commit
that referenced
this pull request
Nov 4, 2020
[ Upstream commit 2cd896a ] If we hit the UINT_MAX limit of bio->bi_iter.bi_size and so we are anyway not merging this page in this bio, then it make sense to make same_page also as false before returning. Without this patch, we hit below WARNING in iomap. This mostly happens with very large memory system and / or after tweaking vm dirty threshold params to delay writeback of dirty data. WARNING: CPU: 18 PID: 5130 at fs/iomap/buffered-io.c:74 iomap_page_release+0x120/0x150 CPU: 18 PID: 5130 Comm: fio Kdump: loaded Tainted: G W 5.8.0-rc3 #6 Call Trace: __remove_mapping+0x154/0x320 (unreliable) iomap_releasepage+0x80/0x180 try_to_release_page+0x94/0xe0 invalidate_inode_page+0xc8/0x110 invalidate_mapping_pages+0x1dc/0x540 generic_fadvise+0x3c8/0x450 xfs_file_fadvise+0x2c/0xe0 [xfs] vfs_fadvise+0x3c/0x60 ksys_fadvise64_64+0x68/0xe0 sys_fadvise64+0x28/0x40 system_call_exception+0xf8/0x1c0 system_call_common+0xf0/0x278 Fixes: cc90bc6 ("block: fix "check bi_size overflow before merge"") Reported-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
koenkooi
pushed a commit
that referenced
this pull request
Nov 4, 2020
[ Upstream commit 22fe5a2 ] The aliases were never released causing the following leaks: Indirect leak of 1224 byte(s) in 9 object(s) allocated from: #0 0x7feefb830628 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x107628) #1 0x56332c8f1b62 in __perf_pmu__new_alias util/pmu.c:322 #2 0x56332c8f401f in pmu_add_cpu_aliases_map util/pmu.c:778 #3 0x56332c792ce9 in __test__pmu_event_aliases tests/pmu-events.c:295 #4 0x56332c792ce9 in test_aliases tests/pmu-events.c:367 #5 0x56332c76a09b in run_test tests/builtin-test.c:410 #6 0x56332c76a09b in test_and_print tests/builtin-test.c:440 #7 0x56332c76ce69 in __cmd_test tests/builtin-test.c:695 #8 0x56332c76ce69 in cmd_test tests/builtin-test.c:807 #9 0x56332c7d2214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 torvalds#10 0x56332c6701a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 torvalds#11 0x56332c6701a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 torvalds#12 0x56332c6701a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 torvalds#13 0x7feefb359cc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: 956a783 ("perf test: Test pmu-events aliases") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200915031819.386559-11-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
koenkooi
pushed a commit
that referenced
this pull request
Nov 4, 2020
[ Upstream commit b12eea5 ] The evsel->unit borrows a pointer of pmu event or alias instead of owns a string. But tool event (duration_time) passes a result of strdup() caused a leak. It was found by ASAN during metric test: Direct leak of 210 byte(s) in 70 object(s) allocated from: #0 0x7fe366fca0b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5) #1 0x559fbbcc6ea3 in add_event_tool util/parse-events.c:414 #2 0x559fbbcc6ea3 in parse_events_add_tool util/parse-events.c:1414 #3 0x559fbbd8474d in parse_events_parse util/parse-events.y:439 #4 0x559fbbcc95da in parse_events__scanner util/parse-events.c:2096 #5 0x559fbbcc95da in __parse_events util/parse-events.c:2141 #6 0x559fbbc28555 in check_parse_id tests/pmu-events.c:406 #7 0x559fbbc28555 in check_parse_id tests/pmu-events.c:393 #8 0x559fbbc28555 in check_parse_cpu tests/pmu-events.c:415 #9 0x559fbbc28555 in test_parsing tests/pmu-events.c:498 torvalds#10 0x559fbbc0109b in run_test tests/builtin-test.c:410 torvalds#11 0x559fbbc0109b in test_and_print tests/builtin-test.c:440 torvalds#12 0x559fbbc03e69 in __cmd_test tests/builtin-test.c:695 torvalds#13 0x559fbbc03e69 in cmd_test tests/builtin-test.c:807 torvalds#14 0x559fbbc691f4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 torvalds#15 0x559fbbb071a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 torvalds#16 0x559fbbb071a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 torvalds#17 0x559fbbb071a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 torvalds#18 0x7fe366b68cc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: f0fbb11 ("perf stat: Implement duration_time as a proper event") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200915031819.386559-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
koenkooi
pushed a commit
that referenced
this pull request
Nov 4, 2020
[ Upstream commit d26383d ] The following leaks were detected by ASAN: Indirect leak of 360 byte(s) in 9 object(s) allocated from: #0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e) #1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333 #2 0x560578f752fc in perf_pmu_parse util/pmu.y:59 #3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73 #4 0x560578e07045 in test__pmu tests/pmu.c:155 #5 0x560578de109b in run_test tests/builtin-test.c:410 #6 0x560578de109b in test_and_print tests/builtin-test.c:440 #7 0x560578de401a in __cmd_test tests/builtin-test.c:661 #8 0x560578de401a in cmd_test tests/builtin-test.c:807 #9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 torvalds#10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 torvalds#11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 torvalds#12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 torvalds#13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: cff7f95 ("perf tests: Move pmu tests into separate object") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200915031819.386559-12-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
koenkooi
pushed a commit
that referenced
this pull request
Nov 4, 2020
[ Upstream commit 1253935 ] Current neigh update event handler implementation takes reference to neighbour structure, assigns it to nhe->n, tries to schedule workqueue task and releases the reference if task was already enqueued. This results potentially overwriting existing nhe->n pointer with another neighbour instance, which causes double release of the instance (once in neigh update handler that failed to enqueue to workqueue and another one in neigh update workqueue task that processes updated nhe->n pointer instead of original one): [ 3376.512806] ------------[ cut here ]------------ [ 3376.513534] refcount_t: underflow; use-after-free. [ 3376.521213] Modules linked in: act_skbedit act_mirred act_tunnel_key vxlan ip6_udp_tunnel udp_tunnel nfnetlink act_gact cls_flower sch_ingress openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 mlx5_ib mlx5_core mlxfw pci_hyperv_intf ptp pps_core nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp rpcrdma rdma_ucm ib_umad ib_ipoib ib_iser rdma_cm ib_cm iw_cm rfkill ib_uverbs ib_core sunrpc kvm_intel kvm iTCO_wdt iTCO_vendor_support virtio_net irqbypass net_failover crc32_pclmul lpc_ich i2c_i801 failover pcspkr i2c_smbus mfd_core ghash_clmulni_intel sch_fq_codel drm i2c _core ip_tables crc32c_intel serio_raw [last unloaded: mlxfw] [ 3376.529468] CPU: 8 PID: 22756 Comm: kworker/u20:5 Not tainted 5.9.0-rc5+ #6 [ 3376.530399] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 3376.531975] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core] [ 3376.532820] RIP: 0010:refcount_warn_saturate+0xd8/0xe0 [ 3376.533589] Code: ff 48 c7 c7 e0 b8 27 82 c6 05 0b b6 09 01 01 e8 94 93 c1 ff 0f 0b c3 48 c7 c7 88 b8 27 82 c6 05 f7 b5 09 01 01 e8 7e 93 c1 ff <0f> 0b c3 0f 1f 44 00 00 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13 [ 3376.536017] RSP: 0018:ffffc90002a97e30 EFLAGS: 00010286 [ 3376.536793] RAX: 0000000000000000 RBX: ffff8882de30d648 RCX: 0000000000000000 [ 3376.537718] RDX: ffff8882f5c28f20 RSI: ffff8882f5c18e40 RDI: ffff8882f5c18e40 [ 3376.538654] RBP: ffff8882cdf56c00 R08: 000000000000c580 R09: 0000000000001a4d [ 3376.539582] R10: 0000000000000731 R11: ffffc90002a97ccd R12: 0000000000000000 [ 3376.540519] R13: ffff8882de30d600 R14: ffff8882de30d640 R15: ffff88821e000900 [ 3376.541444] FS: 0000000000000000(0000) GS:ffff8882f5c00000(0000) knlGS:0000000000000000 [ 3376.542732] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3376.543545] CR2: 0000556e5504b248 CR3: 00000002c6f10005 CR4: 0000000000770ee0 [ 3376.544483] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3376.545419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3376.546344] PKRU: 55555554 [ 3376.546911] Call Trace: [ 3376.547479] mlx5e_rep_neigh_update.cold+0x33/0xe2 [mlx5_core] [ 3376.548299] process_one_work+0x1d8/0x390 [ 3376.548977] worker_thread+0x4d/0x3e0 [ 3376.549631] ? rescuer_thread+0x3e0/0x3e0 [ 3376.550295] kthread+0x118/0x130 [ 3376.550914] ? kthread_create_worker_on_cpu+0x70/0x70 [ 3376.551675] ret_from_fork+0x1f/0x30 [ 3376.552312] ---[ end trace d84e8f46d2a77eec ]--- Fix the bug by moving work_struct to dedicated dynamically-allocated structure. This enabled every event handler to work on its own private neighbour pointer and removes the need for handling the case when task is already enqueued. Fixes: 232c001 ("net/mlx5e: Add support to neighbour update flow") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
koenkooi
pushed a commit
that referenced
this pull request
Dec 17, 2020
[ Upstream commit 05ca530 ] This patch avoid the warning in vkms_get_vblank_timestamp when vblanks aren't enabled. When running igt test kms_cursor_crc just after vkms module, the warning raised like below. Initial value of vblank time is zero and hrtimer.node.expires is also zero if vblank aren't enabled before. vkms module isn't real hardware but just virtual hardware module. so vkms can't generate a resonable timestamp when hrtimer is off. it's best to grab the current time. [106444.464503] [IGT] kms_cursor_crc: starting subtest pipe-A-cursor-size-change [106444.471475] WARNING: CPU: 0 PID: 10109 at vkms_get_vblank_timestamp+0x42/0x50 [vkms] [106444.471511] CPU: 0 PID: 10109 Comm: kms_cursor_crc Tainted: G W OE 5.9.0-rc1+ #6 [106444.471514] RIP: 0010:vkms_get_vblank_timestamp+0x42/0x50 [vkms] [106444.471528] Call Trace: [106444.471551] drm_get_last_vbltimestamp+0xb9/0xd0 [drm] [106444.471566] drm_reset_vblank_timestamp+0x63/0xe0 [drm] [106444.471579] drm_crtc_vblank_on+0x85/0x150 [drm] [106444.471582] vkms_crtc_atomic_enable+0xe/0x10 [vkms] [106444.471592] drm_atomic_helper_commit_modeset_enables+0x1db/0x230 [drm_kms_helper] [106444.471594] vkms_atomic_commit_tail+0x38/0xc0 [vkms] [106444.471601] commit_tail+0x97/0x130 [drm_kms_helper] [106444.471608] drm_atomic_helper_commit+0x117/0x140 [drm_kms_helper] [106444.471622] drm_atomic_commit+0x4a/0x50 [drm] [106444.471629] drm_atomic_helper_set_config+0x63/0xb0 [drm_kms_helper] [106444.471642] drm_mode_setcrtc+0x1d9/0x7b0 [drm] [106444.471654] ? drm_mode_getcrtc+0x1a0/0x1a0 [drm] [106444.471666] drm_ioctl_kernel+0xb6/0x100 [drm] [106444.471677] drm_ioctl+0x3ad/0x470 [drm] [106444.471688] ? drm_mode_getcrtc+0x1a0/0x1a0 [drm] [106444.471692] ? tomoyo_file_ioctl+0x19/0x20 [106444.471694] __x64_sys_ioctl+0x96/0xd0 [106444.471697] do_syscall_64+0x37/0x80 [106444.471699] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com> Cc: Haneen Mohammed <hamohammed.sa@gmail.com> Cc: Melissa Wen <melissa.srw@gmail.com> Signed-off-by: Sidong Yang <realwakka@gmail.com> Reviewed-by: Melissa Wen <melissa.srw@gmail.com> Signed-off-by: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200828124553.2178-1-realwakka@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
koenkooi
pushed a commit
that referenced
this pull request
Dec 17, 2020
commit 66d204a upstream. Very sporadically I had test case btrfs/069 from fstests hanging (for years, it is not a recent regression), with the following traces in dmesg/syslog: [162301.160628] BTRFS info (device sdc): dev_replace from /dev/sdd (devid 2) to /dev/sdg started [162301.181196] BTRFS info (device sdc): scrub: finished on devid 4 with status: 0 [162301.287162] BTRFS info (device sdc): dev_replace from /dev/sdd (devid 2) to /dev/sdg finished [162513.513792] INFO: task btrfs-transacti:1356167 blocked for more than 120 seconds. [162513.514318] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.514522] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.514747] task:btrfs-transacti state:D stack: 0 pid:1356167 ppid: 2 flags:0x00004000 [162513.514751] Call Trace: [162513.514761] __schedule+0x5ce/0xd00 [162513.514765] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.514771] schedule+0x46/0xf0 [162513.514844] wait_current_trans+0xde/0x140 [btrfs] [162513.514850] ? finish_wait+0x90/0x90 [162513.514864] start_transaction+0x37c/0x5f0 [btrfs] [162513.514879] transaction_kthread+0xa4/0x170 [btrfs] [162513.514891] ? btrfs_cleanup_transaction+0x660/0x660 [btrfs] [162513.514894] kthread+0x153/0x170 [162513.514897] ? kthread_stop+0x2c0/0x2c0 [162513.514902] ret_from_fork+0x22/0x30 [162513.514916] INFO: task fsstress:1356184 blocked for more than 120 seconds. [162513.515192] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.515431] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.515680] task:fsstress state:D stack: 0 pid:1356184 ppid:1356177 flags:0x00004000 [162513.515682] Call Trace: [162513.515688] __schedule+0x5ce/0xd00 [162513.515691] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.515697] schedule+0x46/0xf0 [162513.515712] wait_current_trans+0xde/0x140 [btrfs] [162513.515716] ? finish_wait+0x90/0x90 [162513.515729] start_transaction+0x37c/0x5f0 [btrfs] [162513.515743] btrfs_attach_transaction_barrier+0x1f/0x50 [btrfs] [162513.515753] btrfs_sync_fs+0x61/0x1c0 [btrfs] [162513.515758] ? __ia32_sys_fdatasync+0x20/0x20 [162513.515761] iterate_supers+0x87/0xf0 [162513.515765] ksys_sync+0x60/0xb0 [162513.515768] __do_sys_sync+0xa/0x10 [162513.515771] do_syscall_64+0x33/0x80 [162513.515774] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.515781] RIP: 0033:0x7f5238f50bd7 [162513.515782] Code: Bad RIP value. [162513.515784] RSP: 002b:00007fff67b978e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2 [162513.515786] RAX: ffffffffffffffda RBX: 000055b1fad2c560 RCX: 00007f5238f50bd7 [162513.515788] RDX: 00000000ffffffff RSI: 000000000daf0e74 RDI: 000000000000003a [162513.515789] RBP: 0000000000000032 R08: 000000000000000a R09: 00007f5239019be0 [162513.515791] R10: fffffffffffff24f R11: 0000000000000206 R12: 000000000000003a [162513.515792] R13: 00007fff67b97950 R14: 00007fff67b97906 R15: 000055b1fad1a340 [162513.515804] INFO: task fsstress:1356185 blocked for more than 120 seconds. [162513.516064] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.516329] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.516617] task:fsstress state:D stack: 0 pid:1356185 ppid:1356177 flags:0x00000000 [162513.516620] Call Trace: [162513.516625] __schedule+0x5ce/0xd00 [162513.516628] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.516634] schedule+0x46/0xf0 [162513.516647] wait_current_trans+0xde/0x140 [btrfs] [162513.516650] ? finish_wait+0x90/0x90 [162513.516662] start_transaction+0x4d7/0x5f0 [btrfs] [162513.516679] btrfs_setxattr_trans+0x3c/0x100 [btrfs] [162513.516686] __vfs_setxattr+0x66/0x80 [162513.516691] __vfs_setxattr_noperm+0x70/0x200 [162513.516697] vfs_setxattr+0x6b/0x120 [162513.516703] setxattr+0x125/0x240 [162513.516709] ? lock_acquire+0xb1/0x480 [162513.516712] ? mnt_want_write+0x20/0x50 [162513.516721] ? rcu_read_lock_any_held+0x8e/0xb0 [162513.516723] ? preempt_count_add+0x49/0xa0 [162513.516725] ? __sb_start_write+0x19b/0x290 [162513.516727] ? preempt_count_add+0x49/0xa0 [162513.516732] path_setxattr+0xba/0xd0 [162513.516739] __x64_sys_setxattr+0x27/0x30 [162513.516741] do_syscall_64+0x33/0x80 [162513.516743] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.516745] RIP: 0033:0x7f5238f56d5a [162513.516746] Code: Bad RIP value. [162513.516748] RSP: 002b:00007fff67b97868 EFLAGS: 00000202 ORIG_RAX: 00000000000000bc [162513.516750] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f5238f56d5a [162513.516751] RDX: 000055b1fbb0d5a0 RSI: 00007fff67b978a0 RDI: 000055b1fbb0d470 [162513.516753] RBP: 000055b1fbb0d5a0 R08: 0000000000000001 R09: 00007fff67b97700 [162513.516754] R10: 0000000000000004 R11: 0000000000000202 R12: 0000000000000004 [162513.516756] R13: 0000000000000024 R14: 0000000000000001 R15: 00007fff67b978a0 [162513.516767] INFO: task fsstress:1356196 blocked for more than 120 seconds. [162513.517064] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.517365] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.517763] task:fsstress state:D stack: 0 pid:1356196 ppid:1356177 flags:0x00004000 [162513.517780] Call Trace: [162513.517786] __schedule+0x5ce/0xd00 [162513.517789] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.517796] schedule+0x46/0xf0 [162513.517810] wait_current_trans+0xde/0x140 [btrfs] [162513.517814] ? finish_wait+0x90/0x90 [162513.517829] start_transaction+0x37c/0x5f0 [btrfs] [162513.517845] btrfs_attach_transaction_barrier+0x1f/0x50 [btrfs] [162513.517857] btrfs_sync_fs+0x61/0x1c0 [btrfs] [162513.517862] ? __ia32_sys_fdatasync+0x20/0x20 [162513.517865] iterate_supers+0x87/0xf0 [162513.517869] ksys_sync+0x60/0xb0 [162513.517872] __do_sys_sync+0xa/0x10 [162513.517875] do_syscall_64+0x33/0x80 [162513.517878] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.517881] RIP: 0033:0x7f5238f50bd7 [162513.517883] Code: Bad RIP value. [162513.517885] RSP: 002b:00007fff67b978e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2 [162513.517887] RAX: ffffffffffffffda RBX: 000055b1fad2c560 RCX: 00007f5238f50bd7 [162513.517889] RDX: 0000000000000000 RSI: 000000007660add2 RDI: 0000000000000053 [162513.517891] RBP: 0000000000000032 R08: 0000000000000067 R09: 00007f5239019be0 [162513.517893] R10: fffffffffffff24f R11: 0000000000000206 R12: 0000000000000053 [162513.517895] R13: 00007fff67b97950 R14: 00007fff67b97906 R15: 000055b1fad1a340 [162513.517908] INFO: task fsstress:1356197 blocked for more than 120 seconds. [162513.518298] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.518672] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.519157] task:fsstress state:D stack: 0 pid:1356197 ppid:1356177 flags:0x00000000 [162513.519160] Call Trace: [162513.519165] __schedule+0x5ce/0xd00 [162513.519168] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.519174] schedule+0x46/0xf0 [162513.519190] wait_current_trans+0xde/0x140 [btrfs] [162513.519193] ? finish_wait+0x90/0x90 [162513.519206] start_transaction+0x4d7/0x5f0 [btrfs] [162513.519222] btrfs_create+0x57/0x200 [btrfs] [162513.519230] lookup_open+0x522/0x650 [162513.519246] path_openat+0x2b8/0xa50 [162513.519270] do_filp_open+0x91/0x100 [162513.519275] ? find_held_lock+0x32/0x90 [162513.519280] ? lock_acquired+0x33b/0x470 [162513.519285] ? do_raw_spin_unlock+0x4b/0xc0 [162513.519287] ? _raw_spin_unlock+0x29/0x40 [162513.519295] do_sys_openat2+0x20d/0x2d0 [162513.519300] do_sys_open+0x44/0x80 [162513.519304] do_syscall_64+0x33/0x80 [162513.519307] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.519309] RIP: 0033:0x7f5238f4a903 [162513.519310] Code: Bad RIP value. [162513.519312] RSP: 002b:00007fff67b97758 EFLAGS: 00000246 ORIG_RAX: 0000000000000055 [162513.519314] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f5238f4a903 [162513.519316] RDX: 0000000000000000 RSI: 00000000000001b6 RDI: 000055b1fbb0d470 [162513.519317] RBP: 00007fff67b978c0 R08: 0000000000000001 R09: 0000000000000002 [162513.519319] R10: 00007fff67b974f7 R11: 0000000000000246 R12: 0000000000000013 [162513.519320] R13: 00000000000001b6 R14: 00007fff67b97906 R15: 000055b1fad1c620 [162513.519332] INFO: task btrfs:1356211 blocked for more than 120 seconds. [162513.519727] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.520115] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.520508] task:btrfs state:D stack: 0 pid:1356211 ppid:1356178 flags:0x00004002 [162513.520511] Call Trace: [162513.520516] __schedule+0x5ce/0xd00 [162513.520519] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.520525] schedule+0x46/0xf0 [162513.520544] btrfs_scrub_pause+0x11f/0x180 [btrfs] [162513.520548] ? finish_wait+0x90/0x90 [162513.520562] btrfs_commit_transaction+0x45a/0xc30 [btrfs] [162513.520574] ? start_transaction+0xe0/0x5f0 [btrfs] [162513.520596] btrfs_dev_replace_finishing+0x6d8/0x711 [btrfs] [162513.520619] btrfs_dev_replace_by_ioctl.cold+0x1cc/0x1fd [btrfs] [162513.520639] btrfs_ioctl+0x2a25/0x36f0 [btrfs] [162513.520643] ? do_sigaction+0xf3/0x240 [162513.520645] ? find_held_lock+0x32/0x90 [162513.520648] ? do_sigaction+0xf3/0x240 [162513.520651] ? lock_acquired+0x33b/0x470 [162513.520655] ? _raw_spin_unlock_irq+0x24/0x50 [162513.520657] ? lockdep_hardirqs_on+0x7d/0x100 [162513.520660] ? _raw_spin_unlock_irq+0x35/0x50 [162513.520662] ? do_sigaction+0xf3/0x240 [162513.520671] ? __x64_sys_ioctl+0x83/0xb0 [162513.520672] __x64_sys_ioctl+0x83/0xb0 [162513.520677] do_syscall_64+0x33/0x80 [162513.520679] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.520681] RIP: 0033:0x7fc3cd307d87 [162513.520682] Code: Bad RIP value. [162513.520684] RSP: 002b:00007ffe30a56bb8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [162513.520686] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fc3cd307d87 [162513.520687] RDX: 00007ffe30a57a30 RSI: 00000000ca289435 RDI: 0000000000000003 [162513.520689] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [162513.520690] R10: 0000000000000008 R11: 0000000000000202 R12: 0000000000000003 [162513.520692] R13: 0000557323a212e0 R14: 00007ffe30a5a520 R15: 0000000000000001 [162513.520703] Showing all locks held in the system: [162513.520712] 1 lock held by khungtaskd/54: [162513.520713] #0: ffffffffb40a91a0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x15/0x197 [162513.520728] 1 lock held by in:imklog/596: [162513.520729] #0: ffff8f3f0d781400 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x4d/0x60 [162513.520782] 1 lock held by btrfs-transacti/1356167: [162513.520784] #0: ffff8f3d810cc848 (&fs_info->transaction_kthread_mutex){+.+.}-{3:3}, at: transaction_kthread+0x4a/0x170 [btrfs] [162513.520798] 1 lock held by btrfs/1356190: [162513.520800] #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write_file+0x22/0x60 [162513.520805] 1 lock held by fsstress/1356184: [162513.520806] #0: ffff8f3d576440e8 (&type->s_umount_key#62){++++}-{3:3}, at: iterate_supers+0x6f/0xf0 [162513.520811] 3 locks held by fsstress/1356185: [162513.520812] #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write+0x20/0x50 [162513.520815] #1: ffff8f3d80a650b8 (&type->i_mutex_dir_key#10){++++}-{3:3}, at: vfs_setxattr+0x50/0x120 [162513.520820] #2: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs] [162513.520833] 1 lock held by fsstress/1356196: [162513.520834] #0: ffff8f3d576440e8 (&type->s_umount_key#62){++++}-{3:3}, at: iterate_supers+0x6f/0xf0 [162513.520838] 3 locks held by fsstress/1356197: [162513.520839] #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write+0x20/0x50 [162513.520843] #1: ffff8f3d506465e8 (&type->i_mutex_dir_key#10){++++}-{3:3}, at: path_openat+0x2a7/0xa50 [162513.520846] #2: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs] [162513.520858] 2 locks held by btrfs/1356211: [162513.520859] #0: ffff8f3d810cde30 (&fs_info->dev_replace.lock_finishing_cancel_unmount){+.+.}-{3:3}, at: btrfs_dev_replace_finishing+0x52/0x711 [btrfs] [162513.520877] #1: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs] This was weird because the stack traces show that a transaction commit, triggered by a device replace operation, is blocking trying to pause any running scrubs but there are no stack traces of blocked tasks doing a scrub. After poking around with drgn, I noticed there was a scrub task that was constantly running and blocking for shorts periods of time: >>> t = find_task(prog, 1356190) >>> prog.stack_trace(t) #0 __schedule+0x5ce/0xcfc #1 schedule+0x46/0xe4 #2 schedule_timeout+0x1df/0x475 #3 btrfs_reada_wait+0xda/0x132 #4 scrub_stripe+0x2a8/0x112f #5 scrub_chunk+0xcd/0x134 #6 scrub_enumerate_chunks+0x29e/0x5ee #7 btrfs_scrub_dev+0x2d5/0x91b #8 btrfs_ioctl+0x7f5/0x36e7 #9 __x64_sys_ioctl+0x83/0xb0 torvalds#10 do_syscall_64+0x33/0x77 torvalds#11 entry_SYSCALL_64+0x7c/0x156 Which corresponds to: int btrfs_reada_wait(void *handle) { struct reada_control *rc = handle; struct btrfs_fs_info *fs_info = rc->fs_info; while (atomic_read(&rc->elems)) { if (!atomic_read(&fs_info->reada_works_cnt)) reada_start_machine(fs_info); wait_event_timeout(rc->wait, atomic_read(&rc->elems) == 0, (HZ + 9) / 10); } (...) So the counter "rc->elems" was set to 1 and never decreased to 0, causing the scrub task to loop forever in that function. Then I used the following script for drgn to check the readahead requests: $ cat dump_reada.py import sys import drgn from drgn import NULL, Object, cast, container_of, execscript, \ reinterpret, sizeof from drgn.helpers.linux import * mnt_path = b"/home/fdmanana/btrfs-tests/scratch_1" mnt = None for mnt in for_each_mount(prog, dst = mnt_path): pass if mnt is None: sys.stderr.write(f'Error: mount point {mnt_path} not found\n') sys.exit(1) fs_info = cast('struct btrfs_fs_info *', mnt.mnt.mnt_sb.s_fs_info) def dump_re(re): nzones = re.nzones.value_() print(f're at {hex(re.value_())}') print(f'\t logical {re.logical.value_()}') print(f'\t refcnt {re.refcnt.value_()}') print(f'\t nzones {nzones}') for i in range(nzones): dev = re.zones[i].device name = dev.name.str.string_() print(f'\t\t dev id {dev.devid.value_()} name {name}') print() for _, e in radix_tree_for_each(fs_info.reada_tree): re = cast('struct reada_extent *', e) dump_re(re) $ drgn dump_reada.py re at 0xffff8f3da9d25ad8 logical 38928384 refcnt 1 nzones 1 dev id 0 name b'/dev/sdd' $ So there was one readahead extent with a single zone corresponding to the source device of that last device replace operation logged in dmesg/syslog. Also the ID of that zone's device was 0 which is a special value set in the source device of a device replace operation when the operation finishes (constant BTRFS_DEV_REPLACE_DEVID set at btrfs_dev_replace_finishing()), confirming again that device /dev/sdd was the source of a device replace operation. Normally there should be as many zones in the readahead extent as there are devices, and I wasn't expecting the extent to be in a block group with a 'single' profile, so I went and confirmed with the following drgn script that there weren't any single profile block groups: $ cat dump_block_groups.py import sys import drgn from drgn import NULL, Object, cast, container_of, execscript, \ reinterpret, sizeof from drgn.helpers.linux import * mnt_path = b"/home/fdmanana/btrfs-tests/scratch_1" mnt = None for mnt in for_each_mount(prog, dst = mnt_path): pass if mnt is None: sys.stderr.write(f'Error: mount point {mnt_path} not found\n') sys.exit(1) fs_info = cast('struct btrfs_fs_info *', mnt.mnt.mnt_sb.s_fs_info) BTRFS_BLOCK_GROUP_DATA = (1 << 0) BTRFS_BLOCK_GROUP_SYSTEM = (1 << 1) BTRFS_BLOCK_GROUP_METADATA = (1 << 2) BTRFS_BLOCK_GROUP_RAID0 = (1 << 3) BTRFS_BLOCK_GROUP_RAID1 = (1 << 4) BTRFS_BLOCK_GROUP_DUP = (1 << 5) BTRFS_BLOCK_GROUP_RAID10 = (1 << 6) BTRFS_BLOCK_GROUP_RAID5 = (1 << 7) BTRFS_BLOCK_GROUP_RAID6 = (1 << 8) BTRFS_BLOCK_GROUP_RAID1C3 = (1 << 9) BTRFS_BLOCK_GROUP_RAID1C4 = (1 << 10) def bg_flags_string(bg): flags = bg.flags.value_() ret = '' if flags & BTRFS_BLOCK_GROUP_DATA: ret = 'data' if flags & BTRFS_BLOCK_GROUP_METADATA: if len(ret) > 0: ret += '|' ret += 'meta' if flags & BTRFS_BLOCK_GROUP_SYSTEM: if len(ret) > 0: ret += '|' ret += 'system' if flags & BTRFS_BLOCK_GROUP_RAID0: ret += ' raid0' elif flags & BTRFS_BLOCK_GROUP_RAID1: ret += ' raid1' elif flags & BTRFS_BLOCK_GROUP_DUP: ret += ' dup' elif flags & BTRFS_BLOCK_GROUP_RAID10: ret += ' raid10' elif flags & BTRFS_BLOCK_GROUP_RAID5: ret += ' raid5' elif flags & BTRFS_BLOCK_GROUP_RAID6: ret += ' raid6' elif flags & BTRFS_BLOCK_GROUP_RAID1C3: ret += ' raid1c3' elif flags & BTRFS_BLOCK_GROUP_RAID1C4: ret += ' raid1c4' else: ret += ' single' return ret def dump_bg(bg): print() print(f'block group at {hex(bg.value_())}') print(f'\t start {bg.start.value_()} length {bg.length.value_()}') print(f'\t flags {bg.flags.value_()} - {bg_flags_string(bg)}') bg_root = fs_info.block_group_cache_tree.address_of_() for bg in rbtree_inorder_for_each_entry('struct btrfs_block_group', bg_root, 'cache_node'): dump_bg(bg) $ drgn dump_block_groups.py block group at 0xffff8f3d673b0400 start 22020096 length 16777216 flags 258 - system raid6 block group at 0xffff8f3d53ddb400 start 38797312 length 536870912 flags 260 - meta raid6 block group at 0xffff8f3d5f4d9c00 start 575668224 length 2147483648 flags 257 - data raid6 block group at 0xffff8f3d08189000 start 2723151872 length 67108864 flags 258 - system raid6 block group at 0xffff8f3db70ff000 start 2790260736 length 1073741824 flags 260 - meta raid6 block group at 0xffff8f3d5f4dd800 start 3864002560 length 67108864 flags 258 - system raid6 block group at 0xffff8f3d67037000 start 3931111424 length 2147483648 flags 257 - data raid6 $ So there were only 2 reasons left for having a readahead extent with a single zone: reada_find_zone(), called when creating a readahead extent, returned NULL either because we failed to find the corresponding block group or because a memory allocation failed. With some additional and custom tracing I figured out that on every further ocurrence of the problem the block group had just been deleted when we were looping to create the zones for the readahead extent (at reada_find_extent()), so we ended up with only one zone in the readahead extent, corresponding to a device that ends up getting replaced. So after figuring that out it became obvious why the hang happens: 1) Task A starts a scrub on any device of the filesystem, except for device /dev/sdd; 2) Task B starts a device replace with /dev/sdd as the source device; 3) Task A calls btrfs_reada_add() from scrub_stripe() and it is currently starting to scrub a stripe from block group X. This call to btrfs_reada_add() is the one for the extent tree. When btrfs_reada_add() calls reada_add_block(), it passes the logical address of the extent tree's root node as its 'logical' argument - a value of 38928384; 4) Task A then enters reada_find_extent(), called from reada_add_block(). It finds there isn't any existing readahead extent for the logical address 38928384, so it proceeds to the path of creating a new one. It calls btrfs_map_block() to find out which stripes exist for the block group X. On the first iteration of the for loop that iterates over the stripes, it finds the stripe for device /dev/sdd, so it creates one zone for that device and adds it to the readahead extent. Before getting into the second iteration of the loop, the cleanup kthread deletes block group X because it was empty. So in the iterations for the remaining stripes it does not add more zones to the readahead extent, because the calls to reada_find_zone() returned NULL because they couldn't find block group X anymore. As a result the new readahead extent has a single zone, corresponding to the device /dev/sdd; 4) Before task A returns to btrfs_reada_add() and queues the readahead job for the readahead work queue, task B finishes the device replace and at btrfs_dev_replace_finishing() swaps the device /dev/sdd with the new device /dev/sdg; 5) Task A returns to reada_add_block(), which increments the counter "->elems" of the reada_control structure allocated at btrfs_reada_add(). Then it returns back to btrfs_reada_add() and calls reada_start_machine(). This queues a job in the readahead work queue to run the function reada_start_machine_worker(), which calls __reada_start_machine(). At __reada_start_machine() we take the device list mutex and for each device found in the current device list, we call reada_start_machine_dev() to start the readahead work. However at this point the device /dev/sdd was already freed and is not in the device list anymore. This means the corresponding readahead for the extent at 38928384 is never started, and therefore the "->elems" counter of the reada_control structure allocated at btrfs_reada_add() never goes down to 0, causing the call to btrfs_reada_wait(), done by the scrub task, to wait forever. Note that the readahead request can be made either after the device replace started or before it started, however in pratice it is very unlikely that a device replace is able to start after a readahead request is made and is able to complete before the readahead request completes - maybe only on a very small and nearly empty filesystem. This hang however is not the only problem we can have with readahead and device removals. When the readahead extent has other zones other than the one corresponding to the device that is being removed (either by a device replace or a device remove operation), we risk having a use-after-free on the device when dropping the last reference of the readahead extent. For example if we create a readahead extent with two zones, one for the device /dev/sdd and one for the device /dev/sde: 1) Before the readahead worker starts, the device /dev/sdd is removed, and the corresponding btrfs_device structure is freed. However the readahead extent still has the zone pointing to the device structure; 2) When the readahead worker starts, it only finds device /dev/sde in the current device list of the filesystem; 3) It starts the readahead work, at reada_start_machine_dev(), using the device /dev/sde; 4) Then when it finishes reading the extent from device /dev/sde, it calls __readahead_hook() which ends up dropping the last reference on the readahead extent through the last call to reada_extent_put(); 5) At reada_extent_put() it iterates over each zone of the readahead extent and attempts to delete an element from the device's 'reada_extents' radix tree, resulting in a use-after-free, as the device pointer of the zone for /dev/sdd is now stale. We can also access the device after dropping the last reference of a zone, through reada_zone_release(), also called by reada_extent_put(). And a device remove suffers the same problem, however since it shrinks the device size down to zero before removing the device, it is very unlikely to still have readahead requests not completed by the time we free the device, the only possibility is if the device has a very little space allocated. While the hang problem is exclusive to scrub, since it is currently the only user of btrfs_reada_add() and btrfs_reada_wait(), the use-after-free problem affects any path that triggers readhead, which includes btree_readahead_hook() and __readahead_hook() (a readahead worker can trigger readahed for the children of a node) for example - any path that ends up calling reada_add_block() can trigger the use-after-free after a device is removed. So fix this by waiting for any readahead requests for a device to complete before removing a device, ensuring that while waiting for existing ones no new ones can be made. This problem has been around for a very long time - the readahead code was added in 2011, device remove exists since 2008 and device replace was introduced in 2013, hard to pick a specific commit for a git Fixes tag. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
koenkooi
pushed a commit
that referenced
this pull request
Dec 17, 2020
[ Upstream commit e773ca7 ] Actually, burst size is equal to '1 << desc->rqcfg.brst_size'. we should use burst size, not desc->rqcfg.brst_size. dma memcpy performance on Rockchip RV1126 @ 1512MHz A7, 1056MHz LPDDR3, 200MHz DMA: dmatest: /# echo dma0chan0 > /sys/module/dmatest/parameters/channel /# echo 4194304 > /sys/module/dmatest/parameters/test_buf_size /# echo 8 > /sys/module/dmatest/parameters/iterations /# echo y > /sys/module/dmatest/parameters/norandom /# echo y > /sys/module/dmatest/parameters/verbose /# echo 1 > /sys/module/dmatest/parameters/run dmatest: dma0chan0-copy0: result #1: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000 dmatest: dma0chan0-copy0: result #2: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000 dmatest: dma0chan0-copy0: result #3: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000 dmatest: dma0chan0-copy0: result #4: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000 dmatest: dma0chan0-copy0: result #5: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000 dmatest: dma0chan0-copy0: result #6: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000 dmatest: dma0chan0-copy0: result #7: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000 dmatest: dma0chan0-copy0: result #8: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000 Before: dmatest: dma0chan0-copy0: summary 8 tests, 0 failures 48 iops 200338 KB/s (0) After this patch: dmatest: dma0chan0-copy0: summary 8 tests, 0 failures 179 iops 734873 KB/s (0) After this patch and increase dma clk to 400MHz: dmatest: dma0chan0-copy0: summary 8 tests, 0 failures 259 iops 1062929 KB/s (0) Signed-off-by: Sugar Zhang <sugar.zhang@rock-chips.com> Link: https://lore.kernel.org/r/1605326106-55681-1-git-send-email-sugar.zhang@rock-chips.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Added initial nixie cape prototype board support.