Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drm/i915/hsw: Disable L3 caching of atomic memory operations. #2

Merged
merged 1 commit into from
Jun 28, 2014

Conversation

courtney-lunarg
Copy link

Back ported this patch to work around bug on Haswell w.r.t. shader atomic counters. System will lock up and require power cycle if run any shader using atomic operations.

Otherwise using any atomic memory operation will lock up the GPU due
to a Haswell hardware bug.

v2: Use the _MASKED_BIT_ENABLE macro. Drop drm parameter definition.

Signed-off-by: Francisco Jerez currojerez@riseup.net
Reviewed-by: Ben Widawsky ben@bwidawsk.net
Cc: Daniel Vetter daniel@ffwll.ch
Cc: stable@vger.kernel.org
[danvet: Fix checkpatch fail.]
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch

Conflicts:
drivers/gpu/drm/i915/intel_pm.c

Otherwise using any atomic memory operation will lock up the GPU due
to a Haswell hardware bug.

v2: Use the _MASKED_BIT_ENABLE macro.  Drop drm parameter definition.

Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: <stable@vger.kernel.org>
[danvet: Fix checkpatch fail.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Conflicts:
	drivers/gpu/drm/i915/intel_pm.c
Plagman added a commit that referenced this pull request Jun 28, 2014
drm/i915/hsw: Disable L3 caching of atomic memory operations.
@Plagman Plagman merged commit 5309cda into ValveSoftware:alchemist-3.10 Jun 28, 2014
Plagman pushed a commit that referenced this pull request Nov 20, 2014
commit 0b462c8 upstream.

While a queue is being destroyed, all the blkgs are destroyed and its
->root_blkg pointer is set to NULL.  If someone else starts to drain
while the queue is in this state, the following oops happens.

  NULL pointer dereference at 0000000000000028
  IP: [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
  PGD e4a1067 PUD b773067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched]
  CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000
  RIP: 0010:[<ffffffff8144e944>]  [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
  RSP: 0018:ffff88000efd7bf0  EFLAGS: 00010046
  RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001
  R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450
  R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28
  FS:  00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0
  Stack:
   ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80
   ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58
   ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450
  Call Trace:
   [<ffffffff8144ae2f>] blkcg_drain_queue+0x1f/0x60
   [<ffffffff81427641>] __blk_drain_queue+0x71/0x180
   [<ffffffff81429b3e>] blk_queue_bypass_start+0x6e/0xb0
   [<ffffffff814498b8>] blkcg_deactivate_policy+0x38/0x120
   [<ffffffff8144ec44>] blk_throtl_exit+0x34/0x50
   [<ffffffff8144aea5>] blkcg_exit_queue+0x35/0x40
   [<ffffffff8142d476>] blk_release_queue+0x26/0xd0
   [<ffffffff81454968>] kobject_cleanup+0x38/0x70
   [<ffffffff81454848>] kobject_put+0x28/0x60
   [<ffffffff81427505>] blk_put_queue+0x15/0x20
   [<ffffffff817d07bb>] scsi_device_dev_release_usercontext+0x16b/0x1c0
   [<ffffffff810bc339>] execute_in_process_context+0x89/0xa0
   [<ffffffff817d064c>] scsi_device_dev_release+0x1c/0x20
   [<ffffffff817930e2>] device_release+0x32/0xa0
   [<ffffffff81454968>] kobject_cleanup+0x38/0x70
   [<ffffffff81454848>] kobject_put+0x28/0x60
   [<ffffffff817934d7>] put_device+0x17/0x20
   [<ffffffff817d11b9>] __scsi_remove_device+0xa9/0xe0
   [<ffffffff817d121b>] scsi_remove_device+0x2b/0x40
   [<ffffffff817d1257>] sdev_store_delete+0x27/0x30
   [<ffffffff81792ca8>] dev_attr_store+0x18/0x30
   [<ffffffff8126f75e>] sysfs_kf_write+0x3e/0x50
   [<ffffffff8126ea87>] kernfs_fop_write+0xe7/0x170
   [<ffffffff811f5e9f>] vfs_write+0xaf/0x1d0
   [<ffffffff811f69bd>] SyS_write+0x4d/0xc0
   [<ffffffff81d24692>] system_call_fastpath+0x16/0x1b

776687b ("block, blk-mq: draining can't be skipped even if
bypass_depth was non-zero") made it easier to trigger this bug by
making blk_queue_bypass_start() drain even when it loses the first
bypass test to blk_cleanup_queue(); however, the bug has always been
there even before the commit as blk_queue_bypass_start() could race
against queue destruction, win the initial bypass test but perform the
actual draining after blk_cleanup_queue() already destroyed all blkgs.

Fix it by skippping calling into policy draining if all the blkgs are
already gone.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Shirish Pargaonkar <spargaonkar@suse.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Jet Chen <jet.chen@intel.com>
Tested-by: Shirish Pargaonkar <spargaonkar@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Plagman pushed a commit that referenced this pull request Nov 20, 2014
commit 504d587 upstream.

clockevents_increase_min_delta() calls printk() from under
hrtimer_bases.lock. That causes lock inversion on scheduler locks because
printk() can call into the scheduler. Lockdep puts it as:

======================================================
[ INFO: possible circular locking dependency detected ]
3.15.0-rc8-06195-g939f04b #2 Not tainted
-------------------------------------------------------
trinity-main/74 is trying to acquire lock:
 (&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c

but task is already holding lock:
 (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #5 (hrtimer_bases.lock){-.-...}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<8103c918>] __hrtimer_start_range_ns+0x1c/0x197
       [<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85
       [<81080792>] task_clock_event_start+0x3a/0x3f
       [<810807a4>] task_clock_event_add+0xd/0x14
       [<8108259a>] event_sched_in+0xb6/0x17a
       [<810826a2>] group_sched_in+0x44/0x122
       [<81082885>] ctx_sched_in.isra.67+0x105/0x11f
       [<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b
       [<81082bf6>] __perf_install_in_context+0x8b/0xa3
       [<8107eb8e>] remote_function+0x12/0x2a
       [<8105f5af>] smp_call_function_single+0x2d/0x53
       [<8107e17d>] task_function_call+0x30/0x36
       [<8107fb82>] perf_install_in_context+0x87/0xbb
       [<810852c9>] SYSC_perf_event_open+0x5c6/0x701
       [<810856f9>] SyS_perf_event_open+0x17/0x19
       [<8142f8ee>] syscall_call+0x7/0xb

-> #4 (&ctx->lock){......}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f04c>] _raw_spin_lock+0x21/0x30
       [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
       [<8142cacc>] __schedule+0x4c6/0x4cb
       [<8142cae0>] schedule+0xf/0x11
       [<8142f9a6>] work_resched+0x5/0x30

-> #3 (&rq->lock){-.-.-.}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f04c>] _raw_spin_lock+0x21/0x30
       [<81040873>] __task_rq_lock+0x33/0x3a
       [<8104184c>] wake_up_new_task+0x25/0xc2
       [<8102474b>] do_fork+0x15c/0x2a0
       [<810248a9>] kernel_thread+0x1a/0x1f
       [<814232a2>] rest_init+0x1a/0x10e
       [<817af949>] start_kernel+0x303/0x308
       [<817af2ab>] i386_start_kernel+0x79/0x7d

-> #2 (&p->pi_lock){-.-...}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<810413dd>] try_to_wake_up+0x1d/0xd6
       [<810414cd>] default_wake_function+0xb/0xd
       [<810461f3>] __wake_up_common+0x39/0x59
       [<81046346>] __wake_up+0x29/0x3b
       [<811b8733>] tty_wakeup+0x49/0x51
       [<811c3568>] uart_write_wakeup+0x17/0x19
       [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
       [<811c5f28>] serial8250_handle_irq+0x54/0x6a
       [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
       [<811c56d8>] serial8250_interrupt+0x38/0x9e
       [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
       [<81051296>] handle_irq_event+0x2c/0x43
       [<81052cee>] handle_level_irq+0x57/0x80
       [<81002a72>] handle_irq+0x46/0x5c
       [<810027df>] do_IRQ+0x32/0x89
       [<8143036e>] common_interrupt+0x2e/0x33
       [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
       [<811c25a4>] uart_start+0x2d/0x32
       [<811c2c04>] uart_write+0xc7/0xd6
       [<811bc6f6>] n_tty_write+0xb8/0x35e
       [<811b9beb>] tty_write+0x163/0x1e4
       [<811b9cd9>] redirected_tty_write+0x6d/0x75
       [<810b6ed6>] vfs_write+0x75/0xb0
       [<810b7265>] SyS_write+0x44/0x77
       [<8142f8ee>] syscall_call+0x7/0xb

-> #1 (&tty->write_wait){-.....}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<81046332>] __wake_up+0x15/0x3b
       [<811b8733>] tty_wakeup+0x49/0x51
       [<811c3568>] uart_write_wakeup+0x17/0x19
       [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
       [<811c5f28>] serial8250_handle_irq+0x54/0x6a
       [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
       [<811c56d8>] serial8250_interrupt+0x38/0x9e
       [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
       [<81051296>] handle_irq_event+0x2c/0x43
       [<81052cee>] handle_level_irq+0x57/0x80
       [<81002a72>] handle_irq+0x46/0x5c
       [<810027df>] do_IRQ+0x32/0x89
       [<8143036e>] common_interrupt+0x2e/0x33
       [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
       [<811c25a4>] uart_start+0x2d/0x32
       [<811c2c04>] uart_write+0xc7/0xd6
       [<811bc6f6>] n_tty_write+0xb8/0x35e
       [<811b9beb>] tty_write+0x163/0x1e4
       [<811b9cd9>] redirected_tty_write+0x6d/0x75
       [<810b6ed6>] vfs_write+0x75/0xb0
       [<810b7265>] SyS_write+0x44/0x77
       [<8142f8ee>] syscall_call+0x7/0xb

-> #0 (&port_lock_key){-.....}:
       [<8104a62d>] __lock_acquire+0x9ea/0xc6d
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<811c60be>] serial8250_console_write+0x8c/0x10c
       [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
       [<8104f5d5>] console_unlock+0x1d7/0x398
       [<8104fb70>] vprintk_emit+0x3da/0x3e4
       [<81425f76>] printk+0x17/0x19
       [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
       [<8105c548>] clockevents_program_event+0xe7/0xf3
       [<8105cc1c>] tick_program_event+0x1e/0x23
       [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
       [<8103c49e>] __remove_hrtimer+0x5b/0x79
       [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
       [<8103cb4b>] hrtimer_cancel+0xd/0x18
       [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
       [<81080705>] task_clock_event_stop+0x20/0x64
       [<81080756>] task_clock_event_del+0xd/0xf
       [<81081350>] event_sched_out+0xab/0x11e
       [<810813e0>] group_sched_out+0x1d/0x66
       [<81081682>] ctx_sched_out+0xaf/0xbf
       [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
       [<8142cacc>] __schedule+0x4c6/0x4cb
       [<8142cae0>] schedule+0xf/0x11
       [<8142f9a6>] work_resched+0x5/0x30

other info that might help us debug this:

Chain exists of:
  &port_lock_key --> &ctx->lock --> hrtimer_bases.lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(hrtimer_bases.lock);
                               lock(&ctx->lock);
                               lock(hrtimer_bases.lock);
  lock(&port_lock_key);

 *** DEADLOCK ***

4 locks held by trinity-main/74:
 #0:  (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb
 #1:  (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
 #2:  (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66
 #3:  (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4

stack backtrace:
CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04b #2
 00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570
 8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0
 8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003
Call Trace:
 [<81426f69>] dump_stack+0x16/0x18
 [<81425a99>] print_circular_bug+0x18f/0x19c
 [<8104a62d>] __lock_acquire+0x9ea/0xc6d
 [<8104a942>] lock_acquire+0x92/0x101
 [<811c60be>] ? serial8250_console_write+0x8c/0x10c
 [<811c6032>] ? wait_for_xmitr+0x76/0x76
 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
 [<811c60be>] ? serial8250_console_write+0x8c/0x10c
 [<811c60be>] serial8250_console_write+0x8c/0x10c
 [<8104af87>] ? lock_release+0x191/0x223
 [<811c6032>] ? wait_for_xmitr+0x76/0x76
 [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
 [<8104f5d5>] console_unlock+0x1d7/0x398
 [<8104fb70>] vprintk_emit+0x3da/0x3e4
 [<81425f76>] printk+0x17/0x19
 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
 [<8105cc1c>] tick_program_event+0x1e/0x23
 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
 [<8103c49e>] __remove_hrtimer+0x5b/0x79
 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
 [<8103cb4b>] hrtimer_cancel+0xd/0x18
 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
 [<81080705>] task_clock_event_stop+0x20/0x64
 [<81080756>] task_clock_event_del+0xd/0xf
 [<81081350>] event_sched_out+0xab/0x11e
 [<810813e0>] group_sched_out+0x1d/0x66
 [<81081682>] ctx_sched_out+0xaf/0xbf
 [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
 [<8104416d>] ? __dequeue_entity+0x23/0x27
 [<81044505>] ? pick_next_task_fair+0xb1/0x120
 [<8142cacc>] __schedule+0x4c6/0x4cb
 [<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108
 [<810475b0>] ? trace_hardirqs_off+0xb/0xd
 [<81056346>] ? rcu_irq_exit+0x64/0x77

Fix the problem by using printk_deferred() which does not call into the
scheduler.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Plagman pushed a commit that referenced this pull request Nov 20, 2014
commit 2a1b4cf upstream.

While a queue is being destroyed, all the blkgs are destroyed and its
->root_blkg pointer is set to NULL.  If someone else starts to drain
while the queue is in this state, the following oops happens.

  NULL pointer dereference at 0000000000000028
  IP: [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
  PGD e4a1067 PUD b773067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched]
  CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000
  RIP: 0010:[<ffffffff8144e944>]  [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
  RSP: 0018:ffff88000efd7bf0  EFLAGS: 00010046
  RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001
  R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450
  R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28
  FS:  00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0
  Stack:
   ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80
   ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58
   ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450
  Call Trace:
   [<ffffffff8144ae2f>] blkcg_drain_queue+0x1f/0x60
   [<ffffffff81427641>] __blk_drain_queue+0x71/0x180
   [<ffffffff81429b3e>] blk_queue_bypass_start+0x6e/0xb0
   [<ffffffff814498b8>] blkcg_deactivate_policy+0x38/0x120
   [<ffffffff8144ec44>] blk_throtl_exit+0x34/0x50
   [<ffffffff8144aea5>] blkcg_exit_queue+0x35/0x40
   [<ffffffff8142d476>] blk_release_queue+0x26/0xd0
   [<ffffffff81454968>] kobject_cleanup+0x38/0x70
   [<ffffffff81454848>] kobject_put+0x28/0x60
   [<ffffffff81427505>] blk_put_queue+0x15/0x20
   [<ffffffff817d07bb>] scsi_device_dev_release_usercontext+0x16b/0x1c0
   [<ffffffff810bc339>] execute_in_process_context+0x89/0xa0
   [<ffffffff817d064c>] scsi_device_dev_release+0x1c/0x20
   [<ffffffff817930e2>] device_release+0x32/0xa0
   [<ffffffff81454968>] kobject_cleanup+0x38/0x70
   [<ffffffff81454848>] kobject_put+0x28/0x60
   [<ffffffff817934d7>] put_device+0x17/0x20
   [<ffffffff817d11b9>] __scsi_remove_device+0xa9/0xe0
   [<ffffffff817d121b>] scsi_remove_device+0x2b/0x40
   [<ffffffff817d1257>] sdev_store_delete+0x27/0x30
   [<ffffffff81792ca8>] dev_attr_store+0x18/0x30
   [<ffffffff8126f75e>] sysfs_kf_write+0x3e/0x50
   [<ffffffff8126ea87>] kernfs_fop_write+0xe7/0x170
   [<ffffffff811f5e9f>] vfs_write+0xaf/0x1d0
   [<ffffffff811f69bd>] SyS_write+0x4d/0xc0
   [<ffffffff81d24692>] system_call_fastpath+0x16/0x1b

776687b ("block, blk-mq: draining can't be skipped even if
bypass_depth was non-zero") made it easier to trigger this bug by
making blk_queue_bypass_start() drain even when it loses the first
bypass test to blk_cleanup_queue(); however, the bug has always been
there even before the commit as blk_queue_bypass_start() could race
against queue destruction, win the initial bypass test but perform the
actual draining after blk_cleanup_queue() already destroyed all blkgs.

Fix it by skippping calling into policy draining if all the blkgs are
already gone.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Shirish Pargaonkar <spargaonkar@suse.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Jet Chen <jet.chen@intel.com>
Tested-by: Shirish Pargaonkar <spargaonkar@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Plagman pushed a commit that referenced this pull request Nov 20, 2014
commit 35425ea upstream.

Christopher Head 2014-06-28 05:26:20 UTC described:
"I tried to reproduce this on 3.12.21. Instead, when I do "echo hello > foo"
in an ecryptfs mount with ecryptfs_xattr specified, I get a kernel crash:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
PGD d7840067 PUD b2c3c067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: nvidia(PO)
CPU: 3 PID: 3566 Comm: bash Tainted: P           O 3.12.21-gentoo-r1 #2
Hardware name: ASUSTek Computer Inc. G60JX/G60JX, BIOS 206 03/15/2010
task: ffff8801948944c0 ti: ffff8800bad70000 task.ti: ffff8800bad70000
RIP: 0010:[<ffffffff8110eb39>]  [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
RSP: 0018:ffff8800bad71c10  EFLAGS: 00010246
RAX: 00000000000181a4 RBX: ffff880198648480 RCX: 0000000000000000
RDX: 0000000000000004 RSI: ffff880172010450 RDI: 0000000000000000
RBP: ffff880198490e40 R08: 0000000000000000 R09: 0000000000000000
R10: ffff880172010450 R11: ffffea0002c51e80 R12: 0000000000002000
R13: 000000000000001a R14: 0000000000000000 R15: ffff880198490e40
FS:  00007ff224caa700(0000) GS:ffff88019fcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000bb07f000 CR4: 00000000000007e0
Stack:
ffffffff811826e8 ffff8800a39d8000 0000000000000000 000000000000001a
ffff8800a01d0000 ffff8800a39d8000 ffffffff81185fd5 ffffffff81082c2c
00000001a39d8000 53d0abbc98490e40 0000000000000037 ffff8800a39d8220
Call Trace:
[<ffffffff811826e8>] ? ecryptfs_setxattr+0x40/0x52
[<ffffffff81185fd5>] ? ecryptfs_write_metadata+0x1b3/0x223
[<ffffffff81082c2c>] ? should_resched+0x5/0x23
[<ffffffff8118322b>] ? ecryptfs_initialize_file+0xaf/0xd4
[<ffffffff81183344>] ? ecryptfs_create+0xf4/0x142
[<ffffffff810f8c0d>] ? vfs_create+0x48/0x71
[<ffffffff810f9c86>] ? do_last.isra.68+0x559/0x952
[<ffffffff810f7ce7>] ? link_path_walk+0xbd/0x458
[<ffffffff810fa2a3>] ? path_openat+0x224/0x472
[<ffffffff810fa7bd>] ? do_filp_open+0x2b/0x6f
[<ffffffff81103606>] ? __alloc_fd+0xd6/0xe7
[<ffffffff810ee6ab>] ? do_sys_open+0x65/0xe9
[<ffffffff8157d022>] ? system_call_fastpath+0x16/0x1b
RIP  [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
RSP <ffff8800bad71c10>
CR2: 0000000000000000
---[ end trace df9dba5f1ddb8565 ]---"

If we create a file when we mount with ecryptfs_xattr_metadata option, we will
encounter a crash in this path:
->ecryptfs_create
  ->ecryptfs_initialize_file
    ->ecryptfs_write_metadata
      ->ecryptfs_write_metadata_to_xattr
        ->ecryptfs_setxattr
          ->fsstack_copy_attr_all
It's because our dentry->d_inode used in fsstack_copy_attr_all is NULL, and it
will be initialized when ecryptfs_initialize_file finish.

So we should skip copying attr from lower inode when the value of ->d_inode is
invalid.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fledermaus pushed a commit to fledermaus/steamos_kernel that referenced this pull request Jul 24, 2015
[ Upstream commit d496f78 ]

A ROSE socket doesn't necessarily always have a neighbour pointer so check
if the neighbour pointer is valid before dereferencing it.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Tested-by: Bernard Pidoux <f6bvp@free.fr>
Cc: stable@vger.kernel.org ValveSoftware#2.6.11+
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
fledermaus pushed a commit to fledermaus/steamos_kernel that referenced this pull request Jul 24, 2015
[ Upstream commit 8a8c35f ]

Beginning at commit d52d399 ("ipv6: Create percpu rt6_info"), the
following INFO splat is logged:

  ===============================
  [ INFO: suspicious RCU usage. ]
  4.1.0-rc7-next-20150612 ValveSoftware#1 Not tainted
  -------------------------------
  kernel/sched/core.c:7318 Illegal context switch in RCU-bh read-side critical section!
  other info that might help us debug this:
  rcu_scheduler_active = 1, debug_locks = 0
   3 locks held by systemd/1:
   #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff815f0c8f>] rtnetlink_rcv+0x1f/0x40
   ValveSoftware#1:  (rcu_read_lock_bh){......}, at: [<ffffffff816a34e2>] ipv6_add_addr+0x62/0x540
   ValveSoftware#2:  (addrconf_hash_lock){+...+.}, at: [<ffffffff816a3604>] ipv6_add_addr+0x184/0x540
  stack backtrace:
  CPU: 0 PID: 1 Comm: systemd Not tainted 4.1.0-rc7-next-20150612 ValveSoftware#1
  Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.20   04/17/2014
  Call Trace:
    dump_stack+0x4c/0x6e
    lockdep_rcu_suspicious+0xe7/0x120
    ___might_sleep+0x1d5/0x1f0
    __might_sleep+0x4d/0x90
    kmem_cache_alloc+0x47/0x250
    create_object+0x39/0x2e0
    kmemleak_alloc_percpu+0x61/0xe0
    pcpu_alloc+0x370/0x630

Additional backtrace lines are truncated.  In addition, the above splat
is followed by several "BUG: sleeping function called from invalid
context at mm/slub.c:1268" outputs.  As suggested by Martin KaFai Lau,
these are the clue to the fix.  Routine kmemleak_alloc_percpu() always
uses GFP_KERNEL for its allocations, whereas it should follow the gfp
from its callers.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: <stable@vger.kernel.org>	[3.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
fledermaus pushed a commit to fledermaus/steamos_kernel that referenced this pull request Jul 24, 2015
[ Upstream commit db2efec ]

Jean reported another crash, similar to the one fixed by feaff8e:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000148
    IP: [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
    PGD 0
    Oops: 0000 [ValveSoftware#1] SMP
    Modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache vmw_vsock_vmci_transport vsock cfg80211 rfkill coretemp crct10dif_pclmul ppdev vmw_balloon crc32_pclmul crc32c_intel ghash_clmulni_intel pcspkr vmxnet3 parport_pc i2c_piix4 microcode serio_raw parport nfsd floppy vmw_vmci acpi_cpufreq auth_rpcgss shpchp nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi scsi_transport_spi mptscsih ata_generic mptbase i2c_core pata_acpi
    CPU: 0 PID: 329 Comm: kworker/0:1H Not tainted 4.1.0-rc7+ ValveSoftware#2
    Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
    Workqueue: rpciod rpc_async_schedule [sunrpc]
    30ec000
    RIP: 0010:[<ffffffff8124ef7f>]  [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
    RSP: 0018:ffff8802330efc08  EFLAGS: 00010296
    RAX: ffff8802330efc58 RBX: ffff880097187c80 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
    RBP: ffff8802330efc18 R08: ffff88023fc173d8 R09: 3038b7bf00000000
    R10: 00002f1a02000000 R11: 3038b7bf00000000 R12: 0000000000000000
    R13: 0000000000000000 R14: ffff8802337a2300 R15: 0000000000000020
    FS:  0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000148 CR3: 000000003680f000 CR4: 00000000001407f0
    Stack:
     ffff880097187c80 ffff880097187cd8 ffff8802330efc98 ffffffff81250281
     ffff8802330efc68 ffffffffa013e7df ffff8802330efc98 0000000000000246
     ffff8801f6901c00 ffff880233d2b8d8 ffff8802330efc58 ffff8802330efc58
    Call Trace:
     [<ffffffff81250281>] __posix_lock_file+0x31/0x5e0
     [<ffffffffa013e7df>] ? rpc_wake_up_task_queue_locked.part.35+0xcf/0x240 [sunrpc]
     [<ffffffff8125088b>] posix_lock_file_wait+0x3b/0xd0
     [<ffffffffa03890b2>] ? nfs41_wake_and_assign_slot+0x32/0x40 [nfsv4]
     [<ffffffffa0365808>] ? nfs41_sequence_done+0xd8/0x300 [nfsv4]
     [<ffffffffa0367525>] do_vfs_lock+0x35/0x40 [nfsv4]
     [<ffffffffa03690c1>] nfs4_locku_done+0x81/0x120 [nfsv4]
     [<ffffffffa013e310>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
     [<ffffffffa013e310>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
     [<ffffffffa013e33c>] rpc_exit_task+0x2c/0x90 [sunrpc]
     [<ffffffffa0134400>] ? call_refreshresult+0x170/0x170 [sunrpc]
     [<ffffffffa013ece4>] __rpc_execute+0x84/0x410 [sunrpc]
     [<ffffffffa013f085>] rpc_async_schedule+0x15/0x20 [sunrpc]
     [<ffffffff810add67>] process_one_work+0x147/0x400
     [<ffffffff810ae42b>] worker_thread+0x11b/0x460
     [<ffffffff810ae310>] ? rescuer_thread+0x2f0/0x2f0
     [<ffffffff810b35d9>] kthread+0xc9/0xe0
     [<ffffffff81010000>] ? perf_trace_xen_mmu_set_pmd+0xa0/0x160
     [<ffffffff810b3510>] ? kthread_create_on_node+0x170/0x170
     [<ffffffff8173c222>] ret_from_fork+0x42/0x70
     [<ffffffff810b3510>] ? kthread_create_on_node+0x170/0x170
    Code: a5 81 e8 85 75 e4 ff c6 05 31 ee aa 00 01 eb 98 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc 53 <48> 8b 9f 48 01 00 00 48 85 db 74 08 48 89 d8 5b 41 5c 5d c3 83
    RIP  [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
     RSP <ffff8802330efc08>
    CR2: 0000000000000148
    ---[ end trace 64484f16250de7ef ]---

The problem is almost exactly the same as the one fixed by feaff8e.
We must take a reference to the struct file when running the LOCKU
compound to prevent the final fput from running until the operation is
complete.

Reported-by: Jean Spector <jean@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
fledermaus pushed a commit to fledermaus/steamos_kernel that referenced this pull request Aug 6, 2015
[ Upstream commit d496f78 ]

A ROSE socket doesn't necessarily always have a neighbour pointer so check
if the neighbour pointer is valid before dereferencing it.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Tested-by: Bernard Pidoux <f6bvp@free.fr>
Cc: stable@vger.kernel.org ValveSoftware#2.6.11+
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
fledermaus pushed a commit to fledermaus/steamos_kernel that referenced this pull request Aug 6, 2015
[ Upstream commit 8a8c35f ]

Beginning at commit d52d399 ("ipv6: Create percpu rt6_info"), the
following INFO splat is logged:

  ===============================
  [ INFO: suspicious RCU usage. ]
  4.1.0-rc7-next-20150612 ValveSoftware#1 Not tainted
  -------------------------------
  kernel/sched/core.c:7318 Illegal context switch in RCU-bh read-side critical section!
  other info that might help us debug this:
  rcu_scheduler_active = 1, debug_locks = 0
   3 locks held by systemd/1:
   #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff815f0c8f>] rtnetlink_rcv+0x1f/0x40
   ValveSoftware#1:  (rcu_read_lock_bh){......}, at: [<ffffffff816a34e2>] ipv6_add_addr+0x62/0x540
   ValveSoftware#2:  (addrconf_hash_lock){+...+.}, at: [<ffffffff816a3604>] ipv6_add_addr+0x184/0x540
  stack backtrace:
  CPU: 0 PID: 1 Comm: systemd Not tainted 4.1.0-rc7-next-20150612 ValveSoftware#1
  Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.20   04/17/2014
  Call Trace:
    dump_stack+0x4c/0x6e
    lockdep_rcu_suspicious+0xe7/0x120
    ___might_sleep+0x1d5/0x1f0
    __might_sleep+0x4d/0x90
    kmem_cache_alloc+0x47/0x250
    create_object+0x39/0x2e0
    kmemleak_alloc_percpu+0x61/0xe0
    pcpu_alloc+0x370/0x630

Additional backtrace lines are truncated.  In addition, the above splat
is followed by several "BUG: sleeping function called from invalid
context at mm/slub.c:1268" outputs.  As suggested by Martin KaFai Lau,
these are the clue to the fix.  Routine kmemleak_alloc_percpu() always
uses GFP_KERNEL for its allocations, whereas it should follow the gfp
from its callers.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: <stable@vger.kernel.org>	[3.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
fledermaus pushed a commit to fledermaus/steamos_kernel that referenced this pull request Aug 6, 2015
[ Upstream commit db2efec ]

Jean reported another crash, similar to the one fixed by feaff8e:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000148
    IP: [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
    PGD 0
    Oops: 0000 [ValveSoftware#1] SMP
    Modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache vmw_vsock_vmci_transport vsock cfg80211 rfkill coretemp crct10dif_pclmul ppdev vmw_balloon crc32_pclmul crc32c_intel ghash_clmulni_intel pcspkr vmxnet3 parport_pc i2c_piix4 microcode serio_raw parport nfsd floppy vmw_vmci acpi_cpufreq auth_rpcgss shpchp nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi scsi_transport_spi mptscsih ata_generic mptbase i2c_core pata_acpi
    CPU: 0 PID: 329 Comm: kworker/0:1H Not tainted 4.1.0-rc7+ ValveSoftware#2
    Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
    Workqueue: rpciod rpc_async_schedule [sunrpc]
    30ec000
    RIP: 0010:[<ffffffff8124ef7f>]  [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
    RSP: 0018:ffff8802330efc08  EFLAGS: 00010296
    RAX: ffff8802330efc58 RBX: ffff880097187c80 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
    RBP: ffff8802330efc18 R08: ffff88023fc173d8 R09: 3038b7bf00000000
    R10: 00002f1a02000000 R11: 3038b7bf00000000 R12: 0000000000000000
    R13: 0000000000000000 R14: ffff8802337a2300 R15: 0000000000000020
    FS:  0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000148 CR3: 000000003680f000 CR4: 00000000001407f0
    Stack:
     ffff880097187c80 ffff880097187cd8 ffff8802330efc98 ffffffff81250281
     ffff8802330efc68 ffffffffa013e7df ffff8802330efc98 0000000000000246
     ffff8801f6901c00 ffff880233d2b8d8 ffff8802330efc58 ffff8802330efc58
    Call Trace:
     [<ffffffff81250281>] __posix_lock_file+0x31/0x5e0
     [<ffffffffa013e7df>] ? rpc_wake_up_task_queue_locked.part.35+0xcf/0x240 [sunrpc]
     [<ffffffff8125088b>] posix_lock_file_wait+0x3b/0xd0
     [<ffffffffa03890b2>] ? nfs41_wake_and_assign_slot+0x32/0x40 [nfsv4]
     [<ffffffffa0365808>] ? nfs41_sequence_done+0xd8/0x300 [nfsv4]
     [<ffffffffa0367525>] do_vfs_lock+0x35/0x40 [nfsv4]
     [<ffffffffa03690c1>] nfs4_locku_done+0x81/0x120 [nfsv4]
     [<ffffffffa013e310>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
     [<ffffffffa013e310>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
     [<ffffffffa013e33c>] rpc_exit_task+0x2c/0x90 [sunrpc]
     [<ffffffffa0134400>] ? call_refreshresult+0x170/0x170 [sunrpc]
     [<ffffffffa013ece4>] __rpc_execute+0x84/0x410 [sunrpc]
     [<ffffffffa013f085>] rpc_async_schedule+0x15/0x20 [sunrpc]
     [<ffffffff810add67>] process_one_work+0x147/0x400
     [<ffffffff810ae42b>] worker_thread+0x11b/0x460
     [<ffffffff810ae310>] ? rescuer_thread+0x2f0/0x2f0
     [<ffffffff810b35d9>] kthread+0xc9/0xe0
     [<ffffffff81010000>] ? perf_trace_xen_mmu_set_pmd+0xa0/0x160
     [<ffffffff810b3510>] ? kthread_create_on_node+0x170/0x170
     [<ffffffff8173c222>] ret_from_fork+0x42/0x70
     [<ffffffff810b3510>] ? kthread_create_on_node+0x170/0x170
    Code: a5 81 e8 85 75 e4 ff c6 05 31 ee aa 00 01 eb 98 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc 53 <48> 8b 9f 48 01 00 00 48 85 db 74 08 48 89 d8 5b 41 5c 5d c3 83
    RIP  [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
     RSP <ffff8802330efc08>
    CR2: 0000000000000148
    ---[ end trace 64484f16250de7ef ]---

The problem is almost exactly the same as the one fixed by feaff8e.
We must take a reference to the struct file when running the LOCKU
compound to prevent the final fput from running until the operation is
complete.

Reported-by: Jean Spector <jean@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Plagman pushed a commit that referenced this pull request Sep 16, 2015
[ Upstream commit ecf5fc6 ]

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem.  Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Cc: stable@vger.kernel.org # 3.9+
[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Plagman pushed a commit that referenced this pull request Sep 16, 2015
[ Upstream commit 75a6f82 ]

	Normally opening a file, unlinking it and then closing will have
the inode freed upon close() (provided that it's not otherwise busy and
has no remaining links, of course).  However, there's one case where that
does *not* happen.  Namely, if you open it by fhandle with cold dcache,
then unlink() and close().

	In normal case you get d_delete() in unlink(2) notice that dentry
is busy and unhash it; on the final dput() it will be forcibly evicted from
dcache, triggering iput() and inode removal.  In this case, though, we end
up with *two* dentries - disconnected (created by open-by-fhandle) and
regular one (used by unlink()).  The latter will have its reference to inode
dropped just fine, but the former will not - it's considered hashed (it
is on the ->s_anon list), so it will stay around until the memory pressure
will finally do it in.  As the result, we have the final iput() delayed
indefinitely.  It's trivial to reproduce -

void flush_dcache(void)
{
        system("mount -o remount,rw /");
}

static char buf[20 * 1024 * 1024];

main()
{
        int fd;
        union {
                struct file_handle f;
                char buf[MAX_HANDLE_SZ];
        } x;
        int m;

        x.f.handle_bytes = sizeof(x);
        chdir("/root");
        mkdir("foo", 0700);
        fd = open("foo/bar", O_CREAT | O_RDWR, 0600);
        close(fd);
        name_to_handle_at(AT_FDCWD, "foo/bar", &x.f, &m, 0);
        flush_dcache();
        fd = open_by_handle_at(AT_FDCWD, &x.f, O_RDWR);
        unlink("foo/bar");
        write(fd, buf, sizeof(buf));
        system("df .");			/* 20Mb eaten */
        close(fd);
        system("df .");			/* should've freed those 20Mb */
        flush_dcache();
        system("df .");			/* should be the same as #2 */
}

will spit out something like
Filesystem     1K-blocks   Used Available Use% Mounted on
/dev/root         322023 303843      1131 100% /
Filesystem     1K-blocks   Used Available Use% Mounted on
/dev/root         322023 303843      1131 100% /
Filesystem     1K-blocks   Used Available Use% Mounted on
/dev/root         322023 283282     21692  93% /
- inode gets freed only when dentry is finally evicted (here we trigger
than by remount; normally it would've happened in response to memory
pressure hell knows when).

Cc: stable@vger.kernel.org # v2.6.38+; earlier ones need s/kill_it/unhash_it/
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Plagman pushed a commit that referenced this pull request Aug 15, 2016
…_BH() in preemptible context.

[ Upstream commit 44f49dd ]

Fixes the following kernel BUG :

BUG: using __this_cpu_add() in preemptible [00000000] code: bash/2758
caller is __this_cpu_preempt_check+0x13/0x15
CPU: 0 PID: 2758 Comm: bash Tainted: P           O   3.18.19 #2
 ffffffff8170eaca ffff880110d1b788 ffffffff81482b2a 0000000000000000
 0000000000000000 ffff880110d1b7b8 ffffffff812010ae ffff880007cab800
 ffff88001a060800 ffff88013a899108 ffff880108b84240 ffff880110d1b7c8
Call Trace:
[<ffffffff81482b2a>] dump_stack+0x52/0x80
[<ffffffff812010ae>] check_preemption_disabled+0xce/0xe1
[<ffffffff812010d4>] __this_cpu_preempt_check+0x13/0x15
[<ffffffff81419d60>] ipmr_queue_xmit+0x647/0x70c
[<ffffffff8141a154>] ip_mr_forward+0x32f/0x34e
[<ffffffff8141af76>] ip_mroute_setsockopt+0xe03/0x108c
[<ffffffff810553fc>] ? get_parent_ip+0x11/0x42
[<ffffffff810e6974>] ? pollwake+0x4d/0x51
[<ffffffff81058ac0>] ? default_wake_function+0x0/0xf
[<ffffffff810553fc>] ? get_parent_ip+0x11/0x42
[<ffffffff810613d9>] ? __wake_up_common+0x45/0x77
[<ffffffff81486ea9>] ? _raw_spin_unlock_irqrestore+0x1d/0x32
[<ffffffff810618bc>] ? __wake_up_sync_key+0x4a/0x53
[<ffffffff8139a519>] ? sock_def_readable+0x71/0x75
[<ffffffff813dd226>] do_ip_setsockopt+0x9d/0xb55
[<ffffffff81429818>] ? unix_seqpacket_sendmsg+0x3f/0x41
[<ffffffff813963fe>] ? sock_sendmsg+0x6d/0x86
[<ffffffff813959d4>] ? sockfd_lookup_light+0x12/0x5d
[<ffffffff8139650a>] ? SyS_sendto+0xf3/0x11b
[<ffffffff810d5738>] ? new_sync_read+0x82/0xaa
[<ffffffff813ddd19>] compat_ip_setsockopt+0x3b/0x99
[<ffffffff813fb24a>] compat_raw_setsockopt+0x11/0x32
[<ffffffff81399052>] compat_sock_common_setsockopt+0x18/0x1f
[<ffffffff813c4d05>] compat_SyS_setsockopt+0x1a9/0x1cf
[<ffffffff813c4149>] compat_SyS_socketcall+0x180/0x1e3
[<ffffffff81488ea1>] cstar_dispatch+0x7/0x1e

Signed-off-by: Ani Sinha <ani@arista.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Plagman pushed a commit that referenced this pull request Aug 15, 2016
commit 1f9c6e1 upstream.

There were several bugs here.

1)  The done label was in the wrong place so we didn't copy any
    information out when there was no command given.

2)  We were using PAGE_SIZE as the size of the buffer instead of
    "PAGE_SIZE - pos".

3)  snprintf() returns the number of characters that would have been
    printed if there were enough space.  If there was not enough space
    (and we had fixed the memory corruption bug #2) then it would result
    in an information leak when we do simple_read_from_buffer().  I've
    changed it to use scnprintf() instead.

I also removed the initialization at the start of the function, because
I thought it made the code a little more clear.

Fixes: 5e6e3a9 ('wireless: mwifiex: initial commit for Marvell mwifiex driver')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Plagman pushed a commit that referenced this pull request Aug 15, 2016
[ Upstream commit b4fe85f ]

Drivers like vxlan use the recently introduced
udp_tunnel_xmit_skb/udp_tunnel6_xmit_skb APIs. udp_tunnel6_xmit_skb
makes use of ip6tunnel_xmit, and ip6tunnel_xmit, after sending the
packet, updates the struct stats using the usual
u64_stats_update_begin/end calls on this_cpu_ptr(dev->tstats).
udp_tunnel_xmit_skb makes use of iptunnel_xmit, which doesn't touch
tstats, so drivers like vxlan, immediately after, call
iptunnel_xmit_stats, which does the same thing - calls
u64_stats_update_begin/end on this_cpu_ptr(dev->tstats).

While vxlan is probably fine (I don't know?), calling a similar function
from, say, an unbound workqueue, on a fully preemptable kernel causes
real issues:

[  188.434537] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u8:0/6
[  188.435579] caller is debug_smp_processor_id+0x17/0x20
[  188.435583] CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.2.6 #2
[  188.435607] Call Trace:
[  188.435611]  [<ffffffff8234e936>] dump_stack+0x4f/0x7b
[  188.435615]  [<ffffffff81915f3d>] check_preemption_disabled+0x19d/0x1c0
[  188.435619]  [<ffffffff81915f77>] debug_smp_processor_id+0x17/0x20

The solution would be to protect the whole
this_cpu_ptr(dev->tstats)/u64_stats_update_begin/end blocks with
disabling preemption and then reenabling it.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Plagman pushed a commit that referenced this pull request Aug 15, 2016
[ Upstream commit 1b8e6a0 ]

When a passive TCP is created, we eventually call tcp_md5_do_add()
with sk pointing to the child. It is not owner by the user yet (we
will add this socket into listener accept queue a bit later anyway)

But we do own the spinlock, so amend the lockdep annotation to avoid
following splat :

[ 8451.090932] net/ipv4/tcp_ipv4.c:923 suspicious rcu_dereference_protected() usage!
[ 8451.090932]
[ 8451.090932] other info that might help us debug this:
[ 8451.090932]
[ 8451.090934]
[ 8451.090934] rcu_scheduler_active = 1, debug_locks = 1
[ 8451.090936] 3 locks held by socket_sockopt_/214795:
[ 8451.090936]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff855c6ac1>] __netif_receive_skb_core+0x151/0xe90
[ 8451.090947]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff85618143>] ip_local_deliver_finish+0x43/0x2b0
[ 8451.090952]  #2:  (slock-AF_INET){+.-...}, at: [<ffffffff855acda5>] sk_clone_lock+0x1c5/0x500
[ 8451.090958]
[ 8451.090958] stack backtrace:
[ 8451.090960] CPU: 7 PID: 214795 Comm: socket_sockopt_

[ 8451.091215] Call Trace:
[ 8451.091216]  <IRQ>  [<ffffffff856fb29c>] dump_stack+0x55/0x76
[ 8451.091229]  [<ffffffff85123b5b>] lockdep_rcu_suspicious+0xeb/0x110
[ 8451.091235]  [<ffffffff8564544f>] tcp_md5_do_add+0x1bf/0x1e0
[ 8451.091239]  [<ffffffff85645751>] tcp_v4_syn_recv_sock+0x1f1/0x4c0
[ 8451.091242]  [<ffffffff85642b27>] ? tcp_v4_md5_hash_skb+0x167/0x190
[ 8451.091246]  [<ffffffff85647c78>] tcp_check_req+0x3c8/0x500
[ 8451.091249]  [<ffffffff856451ae>] ? tcp_v4_inbound_md5_hash+0x11e/0x190
[ 8451.091253]  [<ffffffff85647170>] tcp_v4_rcv+0x3c0/0x9f0
[ 8451.091256]  [<ffffffff85618143>] ? ip_local_deliver_finish+0x43/0x2b0
[ 8451.091260]  [<ffffffff856181b6>] ip_local_deliver_finish+0xb6/0x2b0
[ 8451.091263]  [<ffffffff85618143>] ? ip_local_deliver_finish+0x43/0x2b0
[ 8451.091267]  [<ffffffff85618d38>] ip_local_deliver+0x48/0x80
[ 8451.091270]  [<ffffffff85618510>] ip_rcv_finish+0x160/0x700
[ 8451.091273]  [<ffffffff8561900e>] ip_rcv+0x29e/0x3d0
[ 8451.091277]  [<ffffffff855c74b7>] __netif_receive_skb_core+0xb47/0xe90

Fixes: a8afca0 ("tcp: md5: protects md5sig_info with RCU")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Plagman pushed a commit that referenced this pull request Aug 15, 2016
commit e46e31a upstream.

When using the Promise TX2+ SATA controller on PA-RISC, the system often
crashes with kernel panic, for example just writing data with the dd
utility will make it crash.

Kernel panic - not syncing: drivers/parisc/sba_iommu.c: I/O MMU @ 000000000000a000 is out of mapping resources

CPU: 0 PID: 18442 Comm: mkspadfs Not tainted 4.4.0-rc2 #2
Backtrace:
 [<000000004021497c>] show_stack+0x14/0x20
 [<0000000040410bf0>] dump_stack+0x88/0x100
 [<000000004023978c>] panic+0x124/0x360
 [<0000000040452c18>] sba_alloc_range+0x698/0x6a0
 [<0000000040453150>] sba_map_sg+0x260/0x5b8
 [<000000000c18dbb4>] ata_qc_issue+0x264/0x4a8 [libata]
 [<000000000c19535c>] ata_scsi_translate+0xe4/0x220 [libata]
 [<000000000c19a93c>] ata_scsi_queuecmd+0xbc/0x320 [libata]
 [<0000000040499bbc>] scsi_dispatch_cmd+0xfc/0x130
 [<000000004049da34>] scsi_request_fn+0x6e4/0x970
 [<00000000403e95a8>] __blk_run_queue+0x40/0x60
 [<00000000403e9d8c>] blk_run_queue+0x3c/0x68
 [<000000004049a534>] scsi_run_queue+0x2a4/0x360
 [<000000004049be68>] scsi_end_request+0x1a8/0x238
 [<000000004049de84>] scsi_io_completion+0xfc/0x688
 [<0000000040493c74>] scsi_finish_command+0x17c/0x1d0

The cause of the crash is not exhaustion of the IOMMU space, there is
plenty of free pages. The function sba_alloc_range is called with size
0x11000, thus the pages_needed variable is 0x11. The function
sba_search_bitmap is called with bits_wanted 0x11 and boundary size is
0x10 (because dma_get_seg_boundary(dev) returns 0xffff).

The function sba_search_bitmap attempts to allocate 17 pages that must not
cross 16-page boundary - it can't satisfy this requirement
(iommu_is_span_boundary always returns true) and fails even if there are
many free entries in the IOMMU space.

How did it happen that we try to allocate 17 pages that don't cross
16-page boundary? The cause is in the function iommu_coalesce_chunks. This
function tries to coalesce adjacent entries in the scatterlist. The
function does several checks if it may coalesce one entry with the next,
one of those checks is this:

	if (startsg->length + dma_len > max_seg_size)
		break;

When it finishes coalescing adjacent entries, it allocates the mapping:

sg_dma_len(contig_sg) = dma_len;
dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
sg_dma_address(contig_sg) =
	PIDE_FLAG
	| (iommu_alloc_range(ioc, dev, dma_len) << IOVP_SHIFT)
	| dma_offset;

It is possible that (startsg->length + dma_len > max_seg_size) is false
(we are just near the 0x10000 max_seg_size boundary), so the funcion
decides to coalesce this entry with the next entry. When the coalescing
succeeds, the function performs
	dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
And now, because of non-zero dma_offset, dma_len is greater than 0x10000.
iommu_alloc_range (a pointer to sba_alloc_range) is called and it attempts
to allocate 17 pages for a device that must not cross 16-page boundary.

To fix the bug, we must make sure that dma_len after addition of
dma_offset and alignment doesn't cross the segment boundary. I.e. change
	if (startsg->length + dma_len > max_seg_size)
		break;
to
	if (ALIGN(dma_len + dma_offset + startsg->length, IOVP_SIZE) > max_seg_size)
		break;

This patch makes this change (it precalculates max_seg_boundary at the
beginning of the function iommu_coalesce_chunks). I also added a check
that the mapping length doesn't exceed dma_get_seg_boundary(dev) (it is
not needed for Promise TX2+ SATA, but it may be needed for other devices
that have dma_get_seg_boundary lower than dma_get_max_seg_size).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Plagman pushed a commit that referenced this pull request Aug 15, 2016
[ Upstream commit 361cad3 ]

We've seen this in a packet capture - I've intermixed what I
think was going on. The fix here is to grab the so_lock sooner.

1964379 -> #1 open (for write) reply seqid=1
1964393 -> #2 open (for read) reply seqid=2

  __nfs4_close(), state->n_wronly--
  nfs4_state_set_mode_locked(), changes state->state = [R]
  state->flags is [RW]
  state->state is [R], state->n_wronly == 0, state->n_rdonly == 1

1964398 -> #3 open (for write) call -> because close is already running
1964399 -> downgrade (to read) call seqid=2 (close of #1)
1964402 -> #3 open (for write) reply seqid=3

 __update_open_stateid()
   nfs_set_open_stateid_locked(), changes state->flags
   state->flags is [RW]
   state->state is [R], state->n_wronly == 0, state->n_rdonly == 1
   new sequence number is exposed now via nfs4_stateid_copy()

   next step would be update_open_stateflags(), pending so_lock

1964403 -> downgrade reply seqid=2, fails with OLD_STATEID (close of #1)

   nfs4_close_prepare() gets so_lock and recalcs flags -> send close

1964405 -> downgrade (to read) call seqid=3 (close of #1 retry)

   __update_open_stateid() gets so_lock
 * update_open_stateflags() updates state->n_wronly.
   nfs4_state_set_mode_locked() updates state->state

   state->flags is [RW]
   state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1

 * should have suppressed the preceding nfs4_close_prepare() from
   sending open_downgrade

1964406 -> write call
1964408 -> downgrade (to read) reply seqid=4 (close of #1 retry)

   nfs_clear_open_stateid_locked()
   state->flags is [R]
   state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1

1964409 -> write reply (fails, openmode)

Signed-off-by: Andrew Elble <aweits@rit.edu>
Cc: stable@vger,kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Plagman pushed a commit that referenced this pull request Aug 15, 2016
[ Upstream commit 3d5fe03 ]

We can end up allocating a new compression stream with GFP_KERNEL from
within the IO path, which may result is nested (recursive) IO
operations.  That can introduce problems if the IO path in question is a
reclaimer, holding some locks that will deadlock nested IOs.

Allocate streams and working memory using GFP_NOIO flag, forbidding
recursive IO and FS operations.

An example:

  inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
  git/20158 [HC0[0]:SC0[0]:HE1:SE1] takes:
   (jbd2_handle){+.+.?.}, at:  start_this_handle+0x4ca/0x555
  {IN-RECLAIM_FS-W} state was registered at:
     __lock_acquire+0x8da/0x117b
     lock_acquire+0x10c/0x1a7
     start_this_handle+0x52d/0x555
     jbd2__journal_start+0xb4/0x237
     __ext4_journal_start_sb+0x108/0x17e
     ext4_dirty_inode+0x32/0x61
     __mark_inode_dirty+0x16b/0x60c
     iput+0x11e/0x274
     __dentry_kill+0x148/0x1b8
     shrink_dentry_list+0x274/0x44a
     prune_dcache_sb+0x4a/0x55
     super_cache_scan+0xfc/0x176
     shrink_slab.part.14.constprop.25+0x2a2/0x4d3
     shrink_zone+0x74/0x140
     kswapd+0x6b7/0x930
     kthread+0x107/0x10f
     ret_from_fork+0x3f/0x70
  irq event stamp: 138297
  hardirqs last  enabled at (138297):  debug_check_no_locks_freed+0x113/0x12f
  hardirqs last disabled at (138296):  debug_check_no_locks_freed+0x33/0x12f
  softirqs last  enabled at (137818):  __do_softirq+0x2d3/0x3e9
  softirqs last disabled at (137813):  irq_exit+0x41/0x95

               other info that might help us debug this:
   Possible unsafe locking scenario:
         CPU0
         ----
    lock(jbd2_handle);
    <Interrupt>
      lock(jbd2_handle);

                *** DEADLOCK ***
  5 locks held by git/20158:
   #0:  (sb_writers#7){.+.+.+}, at: [<ffffffff81155411>] mnt_want_write+0x24/0x4b
   #1:  (&type->i_mutex_dir_key#2/1){+.+.+.}, at: [<ffffffff81145087>] lock_rename+0xd9/0xe3
   #2:  (&sb->s_type->i_mutex_key#11){+.+.+.}, at: [<ffffffff8114f8e2>] lock_two_nondirectories+0x3f/0x6b
   #3:  (&sb->s_type->i_mutex_key#11/4){+.+.+.}, at: [<ffffffff8114f909>] lock_two_nondirectories+0x66/0x6b
   #4:  (jbd2_handle){+.+.?.}, at: [<ffffffff811e31db>] start_this_handle+0x4ca/0x555

               stack backtrace:
  CPU: 2 PID: 20158 Comm: git Not tainted 4.1.0-rc7-next-20150615-dbg-00016-g8bdf555-dirty #211
  Call Trace:
    dump_stack+0x4c/0x6e
    mark_lock+0x384/0x56d
    mark_held_locks+0x5f/0x76
    lockdep_trace_alloc+0xb2/0xb5
    kmem_cache_alloc_trace+0x32/0x1e2
    zcomp_strm_alloc+0x25/0x73 [zram]
    zcomp_strm_multi_find+0xe7/0x173 [zram]
    zcomp_strm_find+0xc/0xe [zram]
    zram_bvec_rw+0x2ca/0x7e0 [zram]
    zram_make_request+0x1fa/0x301 [zram]
    generic_make_request+0x9c/0xdb
    submit_bio+0xf7/0x120
    ext4_io_submit+0x2e/0x43
    ext4_bio_write_page+0x1b7/0x300
    mpage_submit_page+0x60/0x77
    mpage_map_and_submit_buffers+0x10f/0x21d
    ext4_writepages+0xc8c/0xe1b
    do_writepages+0x23/0x2c
    __filemap_fdatawrite_range+0x84/0x8b
    filemap_flush+0x1c/0x1e
    ext4_alloc_da_blocks+0xb8/0x117
    ext4_rename+0x132/0x6dc
    ? mark_held_locks+0x5f/0x76
    ext4_rename2+0x29/0x2b
    vfs_rename+0x540/0x636
    SyS_renameat2+0x359/0x44d
    SyS_rename+0x1e/0x20
    entry_SYSCALL_64_fastpath+0x12/0x6f

[minchan@kernel.org: add stable mark]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Kyeongdon Kim <kyeongdon.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Plagman pushed a commit that referenced this pull request Aug 15, 2016
[ Upstream commit c2d6cb1 ]

While running a stress test I ran into a deadlock when running the delayed
iputs at transaction time, which produced the following report and trace:

[  886.399989] =============================================
[  886.400871] [ INFO: possible recursive locking detected ]
[  886.401663] 4.4.0-rc6-btrfs-next-18+ #1 Not tainted
[  886.402384] ---------------------------------------------
[  886.403182] fio/8277 is trying to acquire lock:
[  886.403568]  (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffffa0538823>] btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
[  886.403568]
[  886.403568] but task is already holding lock:
[  886.403568]  (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffffa0538823>] btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
[  886.403568]
[  886.403568] other info that might help us debug this:
[  886.403568]  Possible unsafe locking scenario:
[  886.403568]
[  886.403568]        CPU0
[  886.403568]        ----
[  886.403568]   lock(&fs_info->delayed_iput_sem);
[  886.403568]   lock(&fs_info->delayed_iput_sem);
[  886.403568]
[  886.403568]  *** DEADLOCK ***
[  886.403568]
[  886.403568]  May be due to missing lock nesting notation
[  886.403568]
[  886.403568] 3 locks held by fio/8277:
[  886.403568]  #0:  (sb_writers#11){.+.+.+}, at: [<ffffffff81174c4c>] __sb_start_write+0x5f/0xb0
[  886.403568]  #1:  (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffffa054620d>] btrfs_file_write_iter+0x73/0x408 [btrfs]
[  886.403568]  #2:  (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffffa0538823>] btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
[  886.403568]
[  886.403568] stack backtrace:
[  886.403568] CPU: 6 PID: 8277 Comm: fio Not tainted 4.4.0-rc6-btrfs-next-18+ #1
[  886.403568] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[  886.403568]  0000000000000000 ffff88009f80f770 ffffffff8125d4fd ffffffff82af1fc0
[  886.403568]  ffff88009f80f830 ffffffff8108e5f9 0000000200000000 ffff88009fd92290
[  886.403568]  0000000000000000 ffffffff82af1fc0 ffffffff829cfb01 00042b216d008804
[  886.403568] Call Trace:
[  886.403568]  [<ffffffff8125d4fd>] dump_stack+0x4e/0x79
[  886.403568]  [<ffffffff8108e5f9>] __lock_acquire+0xd42/0xf0b
[  886.403568]  [<ffffffff810c22db>] ? __module_address+0xdf/0x108
[  886.403568]  [<ffffffff8108eb77>] lock_acquire+0x10d/0x194
[  886.403568]  [<ffffffff8108eb77>] ? lock_acquire+0x10d/0x194
[  886.403568]  [<ffffffffa0538823>] ? btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
[  886.489542]  [<ffffffff8148556b>] down_read+0x3e/0x4d
[  886.489542]  [<ffffffffa0538823>] ? btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
[  886.489542]  [<ffffffffa0538823>] btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
[  886.489542]  [<ffffffffa0533953>] btrfs_commit_transaction+0x8f5/0x96e [btrfs]
[  886.489542]  [<ffffffffa0521d7a>] flush_space+0x435/0x44a [btrfs]
[  886.489542]  [<ffffffffa052218b>] ? reserve_metadata_bytes+0x26a/0x384 [btrfs]
[  886.489542]  [<ffffffffa05221ae>] reserve_metadata_bytes+0x28d/0x384 [btrfs]
[  886.489542]  [<ffffffffa052256c>] ? btrfs_block_rsv_refill+0x58/0x96 [btrfs]
[  886.489542]  [<ffffffffa0522584>] btrfs_block_rsv_refill+0x70/0x96 [btrfs]
[  886.489542]  [<ffffffffa053d747>] btrfs_evict_inode+0x394/0x55a [btrfs]
[  886.489542]  [<ffffffff81188e31>] evict+0xa7/0x15c
[  886.489542]  [<ffffffff81189878>] iput+0x1d3/0x266
[  886.489542]  [<ffffffffa053887c>] btrfs_run_delayed_iputs+0x8f/0xbf [btrfs]
[  886.489542]  [<ffffffffa0533953>] btrfs_commit_transaction+0x8f5/0x96e [btrfs]
[  886.489542]  [<ffffffff81085096>] ? signal_pending_state+0x31/0x31
[  886.489542]  [<ffffffffa0521191>] btrfs_alloc_data_chunk_ondemand+0x1d7/0x288 [btrfs]
[  886.489542]  [<ffffffffa0521282>] btrfs_check_data_free_space+0x40/0x59 [btrfs]
[  886.489542]  [<ffffffffa05228f5>] btrfs_delalloc_reserve_space+0x1e/0x4e [btrfs]
[  886.489542]  [<ffffffffa053620a>] btrfs_direct_IO+0x10c/0x27e [btrfs]
[  886.489542]  [<ffffffff8111d9a1>] generic_file_direct_write+0xb3/0x128
[  886.489542]  [<ffffffffa05463c3>] btrfs_file_write_iter+0x229/0x408 [btrfs]
[  886.489542]  [<ffffffff8108ae38>] ? __lock_is_held+0x38/0x50
[  886.489542]  [<ffffffff8117279e>] __vfs_write+0x7c/0xa5
[  886.489542]  [<ffffffff81172cda>] vfs_write+0xa0/0xe4
[  886.489542]  [<ffffffff811734cc>] SyS_write+0x50/0x7e
[  886.489542]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[ 1081.852335] INFO: task fio:8244 blocked for more than 120 seconds.
[ 1081.854348]       Not tainted 4.4.0-rc6-btrfs-next-18+ #1
[ 1081.857560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1081.863227] fio        D ffff880213f9bb28     0  8244   8240 0x00000000
[ 1081.868719]  ffff880213f9bb28 00ffffff810fc6b0 ffffffff0000000a ffff88023ed55240
[ 1081.872499]  ffff880206b5d400 ffff880213f9c000 ffff88020a4d5318 ffff880206b5d400
[ 1081.876834]  ffffffff00000001 ffff880206b5d400 ffff880213f9bb40 ffffffff81482ba4
[ 1081.880782] Call Trace:
[ 1081.881793]  [<ffffffff81482ba4>] schedule+0x7f/0x97
[ 1081.883340]  [<ffffffff81485eb5>] rwsem_down_write_failed+0x2d5/0x325
[ 1081.895525]  [<ffffffff8108d48d>] ? trace_hardirqs_on_caller+0x16/0x1ab
[ 1081.897419]  [<ffffffff81269723>] call_rwsem_down_write_failed+0x13/0x20
[ 1081.899251]  [<ffffffff81269723>] ? call_rwsem_down_write_failed+0x13/0x20
[ 1081.901063]  [<ffffffff81089fae>] ? __down_write_nested.isra.0+0x1f/0x21
[ 1081.902365]  [<ffffffff814855bd>] down_write+0x43/0x57
[ 1081.903846]  [<ffffffffa05211b0>] ? btrfs_alloc_data_chunk_ondemand+0x1f6/0x288 [btrfs]
[ 1081.906078]  [<ffffffffa05211b0>] btrfs_alloc_data_chunk_ondemand+0x1f6/0x288 [btrfs]
[ 1081.908846]  [<ffffffff8108d461>] ? mark_held_locks+0x56/0x6c
[ 1081.910409]  [<ffffffffa0521282>] btrfs_check_data_free_space+0x40/0x59 [btrfs]
[ 1081.912482]  [<ffffffffa05228f5>] btrfs_delalloc_reserve_space+0x1e/0x4e [btrfs]
[ 1081.914597]  [<ffffffffa053620a>] btrfs_direct_IO+0x10c/0x27e [btrfs]
[ 1081.919037]  [<ffffffff8111d9a1>] generic_file_direct_write+0xb3/0x128
[ 1081.920754]  [<ffffffffa05463c3>] btrfs_file_write_iter+0x229/0x408 [btrfs]
[ 1081.922496]  [<ffffffff8108ae38>] ? __lock_is_held+0x38/0x50
[ 1081.923922]  [<ffffffff8117279e>] __vfs_write+0x7c/0xa5
[ 1081.925275]  [<ffffffff81172cda>] vfs_write+0xa0/0xe4
[ 1081.926584]  [<ffffffff811734cc>] SyS_write+0x50/0x7e
[ 1081.927968]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[ 1081.985293] INFO: lockdep is turned off.
[ 1081.986132] INFO: task fio:8249 blocked for more than 120 seconds.
[ 1081.987434]       Not tainted 4.4.0-rc6-btrfs-next-18+ #1
[ 1081.988534] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1081.990147] fio        D ffff880218febbb8     0  8249   8240 0x00000000
[ 1081.991626]  ffff880218febbb8 00ffffff81486b8e ffff88020000000b ffff88023ed75240
[ 1081.993258]  ffff8802120a9a00 ffff880218fec000 ffff88020a4d5318 ffff8802120a9a00
[ 1081.994850]  ffffffff00000001 ffff8802120a9a00 ffff880218febbd0 ffffffff81482ba4
[ 1081.996485] Call Trace:
[ 1081.997037]  [<ffffffff81482ba4>] schedule+0x7f/0x97
[ 1081.998017]  [<ffffffff81485eb5>] rwsem_down_write_failed+0x2d5/0x325
[ 1081.999241]  [<ffffffff810852a5>] ? finish_wait+0x6d/0x76
[ 1082.000306]  [<ffffffff81269723>] call_rwsem_down_write_failed+0x13/0x20
[ 1082.001533]  [<ffffffff81269723>] ? call_rwsem_down_write_failed+0x13/0x20
[ 1082.002776]  [<ffffffff81089fae>] ? __down_write_nested.isra.0+0x1f/0x21
[ 1082.003995]  [<ffffffff814855bd>] down_write+0x43/0x57
[ 1082.005000]  [<ffffffffa05211b0>] ? btrfs_alloc_data_chunk_ondemand+0x1f6/0x288 [btrfs]
[ 1082.007403]  [<ffffffffa05211b0>] btrfs_alloc_data_chunk_ondemand+0x1f6/0x288 [btrfs]
[ 1082.008988]  [<ffffffffa0545064>] btrfs_fallocate+0x7c1/0xc2f [btrfs]
[ 1082.010193]  [<ffffffff8108a1ba>] ? percpu_down_read+0x4e/0x77
[ 1082.011280]  [<ffffffff81174c4c>] ? __sb_start_write+0x5f/0xb0
[ 1082.012265]  [<ffffffff81174c4c>] ? __sb_start_write+0x5f/0xb0
[ 1082.013021]  [<ffffffff811712e4>] vfs_fallocate+0x170/0x1ff
[ 1082.013738]  [<ffffffff81181ebb>] ioctl_preallocate+0x89/0x9b
[ 1082.014778]  [<ffffffff811822d7>] do_vfs_ioctl+0x40a/0x4ea
[ 1082.015778]  [<ffffffff81176ea7>] ? SYSC_newfstat+0x25/0x2e
[ 1082.016806]  [<ffffffff8118b4de>] ? __fget_light+0x4d/0x71
[ 1082.017789]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
[ 1082.018706]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f

This happens because we can recursively acquire the semaphore
fs_info->delayed_iput_sem when attempting to allocate space to satisfy
a file write request as shown in the first trace above - when committing
a transaction we acquire (down_read) the semaphore before running the
delayed iputs, and when running a delayed iput() we can end up calling
an inode's eviction handler, which in turn commits another transaction
and attempts to acquire (down_read) again the semaphore to run more
delayed iput operations.
This results in a deadlock because if a task acquires multiple times a
semaphore it should invoke down_read_nested() with a different lockdep
class for each level of recursion.

Fix this by simplifying the implementation and use a mutex instead that
is acquired by the cleaner kthread before it runs the delayed iputs
instead of always acquiring a semaphore before delayed references are
run from anywhere.

Fixes: d7c1517 (btrfs: Fix NO_SPACE bug caused by delayed-iput)
Cc: stable@vger.kernel.org   # 4.1+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Plagman pushed a commit that referenced this pull request Aug 15, 2016
[ Upstream commit 57adec8 ]

Calling apply_to_page_range with an empty range results in a BUG_ON
from the core code. This can be triggered by trying to load the st_drv
module with CONFIG_DEBUG_SET_MODULE_RONX enabled:

  kernel BUG at mm/memory.c:1874!
  Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 3 PID: 1764 Comm: insmod Not tainted 4.5.0-rc1+ #2
  Hardware name: ARM Juno development board (r0) (DT)
  task: ffffffc9763b8000 ti: ffffffc975af8000 task.ti: ffffffc975af8000
  PC is at apply_to_page_range+0x2cc/0x2d0
  LR is at change_memory_common+0x80/0x108

This patch fixes the issue by making change_memory_common (called by the
set_memory_* functions) a NOP when numpages == 0, therefore avoiding the
erroneous call to apply_to_page_range and bringing us into line with x86
and s390.

Cc: <stable@vger.kernel.org>
Reviewed-by: Laura Abbott <labbott@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Mika Penttilä <mika.penttila@nextfour.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Plagman pushed a commit that referenced this pull request Aug 15, 2016
[ Upstream commit 00cd29b ]

The starting node for a klist iteration is often passed in from
somewhere way above the klist infrastructure, meaning there's no
guarantee the node is still on the list.  We've seen this in SCSI where
we use bus_find_device() to iterate through a list of devices.  In the
face of heavy hotplug activity, the last device returned by
bus_find_device() can be removed before the next call.  This leads to

Dec  3 13:22:02 localhost kernel: WARNING: CPU: 2 PID: 28073 at include/linux/kref.h:47 klist_iter_init_node+0x3d/0x50()
Dec  3 13:22:02 localhost kernel: Modules linked in: scsi_debug x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel joydev iTCO_wdt dcdbas ipmi_devintf acpi_power_meter iTCO_vendor_support ipmi_si imsghandler pcspkr wmi acpi_cpufreq tpm_tis tpm shpchp lpc_ich mfd_core nfsd nfs_acl lockd grace sunrpc tg3 ptp pps_core
Dec  3 13:22:02 localhost kernel: CPU: 2 PID: 28073 Comm: cat Not tainted 4.4.0-rc1+ #2
Dec  3 13:22:02 localhost kernel: Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.0.22 11/19/2013
Dec  3 13:22:02 localhost kernel: ffffffff81a20e77 ffff880613acfd18 ffffffff81321eef 0000000000000000
Dec  3 13:22:02 localhost kernel: ffff880613acfd50 ffffffff8107ca52 ffff88061176b198 0000000000000000
Dec  3 13:22:02 localhost kernel: ffffffff814542b0 ffff880610cfb100 ffff88061176b198 ffff880613acfd60
Dec  3 13:22:02 localhost kernel: Call Trace:
Dec  3 13:22:02 localhost kernel: [<ffffffff81321eef>] dump_stack+0x44/0x55
Dec  3 13:22:02 localhost kernel: [<ffffffff8107ca52>] warn_slowpath_common+0x82/0xc0
Dec  3 13:22:02 localhost kernel: [<ffffffff814542b0>] ? proc_scsi_show+0x20/0x20
Dec  3 13:22:02 localhost kernel: [<ffffffff8107cb4a>] warn_slowpath_null+0x1a/0x20
Dec  3 13:22:02 localhost kernel: [<ffffffff8167225d>] klist_iter_init_node+0x3d/0x50
Dec  3 13:22:02 localhost kernel: [<ffffffff81421d41>] bus_find_device+0x51/0xb0
Dec  3 13:22:02 localhost kernel: [<ffffffff814545ad>] scsi_seq_next+0x2d/0x40
[...]

And an eventual crash. It can actually occur in any hotplug system
which has a device finder and a starting device.

We can fix this globally by making sure the starting node for
klist_iter_init_node() is actually a member of the list before using it
(and by starting from the beginning if it isn't).

Reported-by: Ewan D. Milne <emilne@redhat.com>
Tested-by: Ewan D. Milne <emilne@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Plagman pushed a commit that referenced this pull request Aug 15, 2016
[ Upstream commit d144dfe ]

If we use USB ID pin as wakeup source, and there is a USB block
device on this USB OTG (ID) cable, the system will be deadlock
after system resume.

The root cause for this problem is: the workqueue ci_otg may try
to remove hcd before the driver resume has finished, and hcd will
disconnect the device on it, then, it will call device_release_driver,
and holds the device lock "dev->mutex", but it is never unlocked since
it waits workqueue writeback to run to flush the block information, but
the workqueue writeback is freezable, it is not thawed before driver
resume has finished.

When the driver (device: sd 0:0:0:0:) resume goes to dpm_complete, it
tries to get its device lock "dev->mutex", but it can't get it forever,
then the deadlock occurs. Below call stacks show the situation.

So, in order to fix this problem, we need to change workqueue ci_otg
as freezable, then the work item in this workqueue will be run after
driver's resume, this workqueue will not be blocked forever like above
case since the workqueue writeback has been thawed too.

Tested at: i.mx6qdl-sabresd and i.mx6sx-sdb.

[  555.178869] kworker/u2:13   D c07de74c     0   826      2 0x00000000
[  555.185310] Workqueue: ci_otg ci_otg_work
[  555.189353] Backtrace:
[  555.191849] [<c07de4fc>] (__schedule) from [<c07dec6c>] (schedule+0x48/0xa0)
[  555.198912]  r10:ee471ba0 r9:00000000 r8:00000000 r7:00000002 r6:ee470000 r5:ee471ba4
[  555.206867]  r4:ee470000
[  555.209453] [<c07dec24>] (schedule) from [<c07e2fc4>] (schedule_timeout+0x15c/0x1e0)
[  555.217212]  r4:7fffffff r3:edc2b000
[  555.220862] [<c07e2e68>] (schedule_timeout) from [<c07df6c8>] (wait_for_common+0x94/0x144)
[  555.229140]  r8:00000000 r7:00000002 r6:ee470000 r5:ee471ba4 r4:7fffffff
[  555.235980] [<c07df634>] (wait_for_common) from [<c07df790>] (wait_for_completion+0x18/0x1c)
[  555.244430]  r10:00000001 r9:c0b5563c r8:c0042e48 r7:ef086000 r6:eea4372c r5:ef131b00
[  555.252383]  r4:00000000
[  555.254970] [<c07df778>] (wait_for_completion) from [<c0043cb8>] (flush_work+0x19c/0x234)
[  555.263177] [<c0043b1c>] (flush_work) from [<c0043fac>] (flush_delayed_work+0x48/0x4c)
[  555.271106]  r8:ed5b5000 r7:c0b38a3c r6:eea439cc r5:eea4372c r4:eea4372c
[  555.277958] [<c0043f64>] (flush_delayed_work) from [<c00eae18>] (bdi_unregister+0x84/0xec)
[  555.286236]  r4:eea43520 r3:20000153
[  555.289885] [<c00ead94>] (bdi_unregister) from [<c02c2154>] (blk_cleanup_queue+0x180/0x29c)
[  555.298250]  r5:eea43808 r4:eea43400
[  555.301909] [<c02c1fd4>] (blk_cleanup_queue) from [<c0417914>] (__scsi_remove_device+0x48/0xb8)
[  555.310623]  r7:00000000 r6:20000153 r5:ededa950 r4:ededa800
[  555.316403] [<c04178cc>] (__scsi_remove_device) from [<c0415e90>] (scsi_forget_host+0x64/0x68)
[  555.325028]  r5:ededa800 r4:ed5b5000
[  555.328689] [<c0415e2c>] (scsi_forget_host) from [<c0409828>] (scsi_remove_host+0x78/0x104)
[  555.337054]  r5:ed5b5068 r4:ed5b5000
[  555.340709] [<c04097b0>] (scsi_remove_host) from [<c04cdfcc>] (usb_stor_disconnect+0x50/0xb4)
[  555.349247]  r6:ed5b56e4 r5:ed5b5818 r4:ed5b5690 r3:00000008
[  555.355025] [<c04cdf7c>] (usb_stor_disconnect) from [<c04b3bc8>] (usb_unbind_interface+0x78/0x25c)
[  555.363997]  r8:c13919b4 r7:edd3c000 r6:edd3c020 r5:ee551c68 r4:ee551c00 r3:c04cdf7c
[  555.371892] [<c04b3b50>] (usb_unbind_interface) from [<c03dc248>] (__device_release_driver+0x8c/0x118)
[  555.381213]  r10:00000001 r9:edd90c00 r8:c13919b4 r7:ee551c68 r6:c0b546e0 r5:c0b5563c
[  555.389167]  r4:edd3c020
[  555.391752] [<c03dc1bc>] (__device_release_driver) from [<c03dc2fc>] (device_release_driver+0x28/0x34)
[  555.401071]  r5:edd3c020 r4:edd3c054
[  555.404721] [<c03dc2d4>] (device_release_driver) from [<c03db304>] (bus_remove_device+0xe0/0x110)
[  555.413607]  r5:edd3c020 r4:ef17f04c
[  555.417253] [<c03db224>] (bus_remove_device) from [<c03d8128>] (device_del+0x114/0x21c)
[  555.425270]  r6:edd3c028 r5:edd3c020 r4:ee551c00 r3:00000000
[  555.431045] [<c03d8014>] (device_del) from [<c04b1560>] (usb_disable_device+0xa4/0x1e8)
[  555.439061]  r8:edd3c000 r7:eded8000 r6:00000000 r5:00000001 r4:ee551c00
[  555.445906] [<c04b14bc>] (usb_disable_device) from [<c04a8e54>] (usb_disconnect+0x74/0x224)
[  555.454271]  r9:edd90c00 r8:ee551000 r7:ee551c68 r6:ee551c9c r5:ee551c00 r4:00000001
[  555.462156] [<c04a8de0>] (usb_disconnect) from [<c04a8fb8>] (usb_disconnect+0x1d8/0x224)
[  555.470259]  r10:00000001 r9:edd90000 r8:ee471e2c r7:ee551468 r6:ee55149c r5:ee551400
[  555.478213]  r4:00000001
[  555.480797] [<c04a8de0>] (usb_disconnect) from [<c04ae5ec>] (usb_remove_hcd+0xa0/0x1ac)
[  555.488813]  r10:00000001 r9:ee471eb0 r8:00000000 r7:ef3d9500 r6:eded810c r5:eded80b0
[  555.496765]  r4:eded8000
[  555.499351] [<c04ae54c>] (usb_remove_hcd) from [<c04d4158>] (host_stop+0x28/0x64)
[  555.506847]  r6:eeb50010 r5:eded8000 r4:eeb51010
[  555.511563] [<c04d4130>] (host_stop) from [<c04d09b8>] (ci_otg_work+0xc4/0x124)
[  555.518885]  r6:00000001 r5:eeb50010 r4:eeb502a0 r3:c04d4130
[  555.524665] [<c04d08f4>] (ci_otg_work) from [<c00454f0>] (process_one_work+0x194/0x420)
[  555.532682]  r6:ef086000 r5:eeb502a0 r4:edc44480
[  555.537393] [<c004535c>] (process_one_work) from [<c00457b0>] (worker_thread+0x34/0x514)
[  555.545496]  r10:edc44480 r9:ef086000 r8:c0b1a100 r7:ef086034 r6:00000088 r5:edc44498
[  555.553450]  r4:ef086000
[  555.556032] [<c004577c>] (worker_thread) from [<c004bab4>] (kthread+0xdc/0xf8)
[  555.563268]  r10:00000000 r9:00000000 r8:00000000 r7:c004577c r6:edc44480 r5:eddc15c0
[  555.571221]  r4:00000000
[  555.573804] [<c004b9d8>] (kthread) from [<c000fef0>] (ret_from_fork+0x14/0x24)
[  555.581040]  r7:00000000 r6:00000000 r5:c004b9d8 r4:eddc15c0

[  553.429383] sh              D c07de74c     0   694    691 0x00000000
[  553.435801] Backtrace:
[  553.438295] [<c07de4fc>] (__schedule) from [<c07dec6c>] (schedule+0x48/0xa0)
[  553.445358]  r10:edd3c054 r9:edd3c078 r8:edddbd50 r7:edcbbc00 r6:c1377c34 r5:60000153
[  553.453313]  r4:eddda000
[  553.455896] [<c07dec24>] (schedule) from [<c07deff8>] (schedule_preempt_disabled+0x10/0x14)
[  553.464261]  r4:edd3c058 r3:0000000a
[  553.467910] [<c07defe8>] (schedule_preempt_disabled) from [<c07e0bbc>] (mutex_lock_nested+0x1a0/0x3e8)
[  553.477254] [<c07e0a1c>] (mutex_lock_nested) from [<c03e927c>] (dpm_complete+0xc0/0x1b0)
[  553.485358]  r10:00561408 r9:edd3c054 r8:c0b4863c r7:edddbd90 r6:c0b485d8 r5:edd3c020
[  553.493313]  r4:edd3c0d0
[  553.495896] [<c03e91bc>] (dpm_complete) from [<c03e9388>] (dpm_resume_end+0x1c/0x20)
[  553.503652]  r9:00000000 r8:c0b1a9d0 r7:c1334ec0 r6:c1334edc r5:00000003 r4:00000010
[  553.511544] [<c03e936c>] (dpm_resume_end) from [<c0079894>] (suspend_devices_and_enter+0x158/0x504)
[  553.520604]  r4:00000000 r3:c1334efc
[  553.524250] [<c007973c>] (suspend_devices_and_enter) from [<c0079e74>] (pm_suspend+0x234/0x2cc)
[  553.532961]  r10:00561408 r9:ed6b7300 r8:00000004 r7:c1334eec r6:00000000 r5:c1334ee8
[  553.540914]  r4:00000003
[  553.543493] [<c0079c40>] (pm_suspend) from [<c0078a6c>] (state_store+0x6c/0xc0)

[  555.703684] 7 locks held by kworker/u2:13/826:
[  555.708140]  #0:  ("%s""ci_otg"){++++.+}, at: [<c0045484>] process_one_work+0x128/0x420
[  555.716277]  #1:  ((&ci->work)){+.+.+.}, at: [<c0045484>] process_one_work+0x128/0x420
[  555.724317]  #2:  (usb_bus_list_lock){+.+.+.}, at: [<c04ae5e4>] usb_remove_hcd+0x98/0x1ac
[  555.732626]  #3:  (&dev->mutex){......}, at: [<c04a8e28>] usb_disconnect+0x48/0x224
[  555.740403]  #4:  (&dev->mutex){......}, at: [<c04a8e28>] usb_disconnect+0x48/0x224
[  555.748179]  #5:  (&dev->mutex){......}, at: [<c03dc2f4>] device_release_driver+0x20/0x34
[  555.756487]  #6:  (&shost->scan_mutex){+.+.+.}, at: [<c04097d0>] scsi_remove_host+0x20/0x104

Cc: <stable@vger.kernel.org> #v3.14+
Cc: Jun Li <jun.li@nxp.com>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Plagman pushed a commit that referenced this pull request Aug 15, 2016
[ Upstream commit cf0a254 ]

BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
in_atomic(): 1, irqs_disabled(): 128, pid: 383, name: sh
Preemption disabled at:[<ffff800000124c18>] kgdb_cpu_enter+0x158/0x6b8

CPU: 3 PID: 383 Comm: sh Tainted: G        W       4.1.13-rt13 #2
Hardware name: Freescale Layerscape 2085a RDB Board (DT)
Call trace:
[<ffff8000000885e8>] dump_backtrace+0x0/0x128
[<ffff800000088734>] show_stack+0x24/0x30
[<ffff80000079a7c4>] dump_stack+0x80/0xa0
[<ffff8000000bd324>] ___might_sleep+0x18c/0x1a0
[<ffff8000007a20ac>] __rt_spin_lock+0x2c/0x40
[<ffff8000007a2268>] rt_read_lock+0x40/0x58
[<ffff800000085328>] single_step_handler+0x38/0xd8
[<ffff800000082368>] do_debug_exception+0x58/0xb8
Exception stack(0xffff80834a1e7c80 to 0xffff80834a1e7da0)
7c80: ffffff9c ffffffff 92c23ba0 0000ffff 4a1e7e40 ffff8083 001bfcc4 ffff8000
7ca0: f2000400 00000000 00000000 00000000 4a1e7d80 ffff8083 0049501c ffff8000
7cc0: 00005402 00000000 00aaa210 ffff8000 4a1e7ea0 ffff8083 000833f4 ffff8000
7ce0: ffffff9c ffffffff 92c23ba0 0000ffff 4a1e7ea0 ffff8083 001bfcc0 ffff8000
7d00: 4a0fc400 ffff8083 00005402 00000000 4a1e7d40 ffff8083 00490324 ffff8000
7d20: ffffff9c 00000000 92c23ba0 0000ffff 000a0000 00000000 00000000 00000000
7d40: 00000008 00000000 00080000 00000000 92c23b8b 0000ffff 92c23b8e 0000ffff
7d60: 00000038 00000000 00001cb2 00000000 00000005 00000000 92d7b498 0000ffff
7d80: 01010101 01010101 92be9000 0000ffff 00000000 00000000 00000030 00000000
[<ffff8000000833f4>] el1_dbg+0x18/0x6c

This issue is similar with 62c6c61("arm64: replace read_lock to rcu lock in
call_break_hook"), but comes to single_step_handler.

This also solves kgdbts boot test silent hang issue on 4.4 -rt kernel.

Signed-off-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Plagman pushed a commit that referenced this pull request Aug 15, 2016
… and subvolume roots

[ Upstream commit f32e48e ]

The following call trace is seen when btrfs/031 test is executed in a loop,

[  158.661848] ------------[ cut here ]------------
[  158.662634] WARNING: CPU: 2 PID: 890 at /home/chandan/repos/linux/fs/btrfs/ioctl.c:558 create_subvol+0x3d1/0x6ea()
[  158.664102] BTRFS: Transaction aborted (error -2)
[  158.664774] Modules linked in:
[  158.665266] CPU: 2 PID: 890 Comm: btrfs Not tainted 4.4.0-rc6-g511711a #2
[  158.666251] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[  158.667392]  ffffffff81c0a6b0 ffff8806c7c4f8e8 ffffffff81431fc8 ffff8806c7c4f930
[  158.668515]  ffff8806c7c4f920 ffffffff81051aa1 ffff880c85aff000 ffff8800bb44d000
[  158.669647]  ffff8808863b5c98 0000000000000000 00000000fffffffe ffff8806c7c4f980
[  158.670769] Call Trace:
[  158.671153]  [<ffffffff81431fc8>] dump_stack+0x44/0x5c
[  158.671884]  [<ffffffff81051aa1>] warn_slowpath_common+0x81/0xc0
[  158.672769]  [<ffffffff81051b27>] warn_slowpath_fmt+0x47/0x50
[  158.673620]  [<ffffffff813bc98d>] create_subvol+0x3d1/0x6ea
[  158.674440]  [<ffffffff813777c9>] btrfs_mksubvol.isra.30+0x369/0x520
[  158.675376]  [<ffffffff8108a4aa>] ? percpu_down_read+0x1a/0x50
[  158.676235]  [<ffffffff81377a81>] btrfs_ioctl_snap_create_transid+0x101/0x180
[  158.677268]  [<ffffffff81377b52>] btrfs_ioctl_snap_create+0x52/0x70
[  158.678183]  [<ffffffff8137afb4>] btrfs_ioctl+0x474/0x2f90
[  158.678975]  [<ffffffff81144b8e>] ? vma_merge+0xee/0x300
[  158.679751]  [<ffffffff8115be31>] ? alloc_pages_vma+0x91/0x170
[  158.680599]  [<ffffffff81123f62>] ? lru_cache_add_active_or_unevictable+0x22/0x70
[  158.681686]  [<ffffffff813d99cf>] ? selinux_file_ioctl+0xff/0x1d0
[  158.682581]  [<ffffffff8117b791>] do_vfs_ioctl+0x2c1/0x490
[  158.683399]  [<ffffffff813d3cde>] ? security_file_ioctl+0x3e/0x60
[  158.684297]  [<ffffffff8117b9d4>] SyS_ioctl+0x74/0x80
[  158.685051]  [<ffffffff819b2bd7>] entry_SYSCALL_64_fastpath+0x12/0x6a
[  158.685958] ---[ end trace 4b63312de5a2cb76 ]---
[  158.686647] BTRFS: error (device loop0) in create_subvol:558: errno=-2 No such entry
[  158.709508] BTRFS info (device loop0): forced readonly
[  158.737113] BTRFS info (device loop0): disk space caching is enabled
[  158.738096] BTRFS error (device loop0): Remounting read-write after error is not allowed
[  158.851303] BTRFS error (device loop0): cleaner transaction attach returned -30

This occurs because,

Mount filesystem
Create subvol with ID 257
Unmount filesystem
Mount filesystem
Delete subvol with ID 257
  btrfs_drop_snapshot()
    Add root corresponding to subvol 257 into
    btrfs_transaction->dropped_roots list
Create new subvol (i.e. create_subvol())
  257 is returned as the next free objectid
  btrfs_read_fs_root_no_name()
    Finds the btrfs_root instance corresponding to the old subvol with ID 257
    in btrfs_fs_info->fs_roots_radix.
    Returns error since btrfs_root_item->refs has the value of 0.

To fix the issue the commit initializes tree root's and subvolume root's
highest_objectid when loading the roots from disk.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Plagman pushed a commit that referenced this pull request Aug 15, 2016
[ Upstream commit 6acfe68 ]

Request-based DM's blk-mq support (dm-mq) was reported to be 50% slower
than if an underlying null_blk device were used directly.  One of the
reasons for this drop in performance is that blk_insert_clone_request()
was calling blk_mq_insert_request() with @async=true.  This forced the
use of kblockd_schedule_delayed_work_on() to run the blk-mq hw queues
which ushered in ping-ponging between process context (fio in this case)
and kblockd's kworker to submit the cloned request.  The ftrace
function_graph tracer showed:

  kworker-2013  =>   fio-12190
  fio-12190    =>  kworker-2013
  ...
  kworker-2013  =>   fio-12190
  fio-12190    =>  kworker-2013
  ...

Fixing blk_insert_clone_request()'s blk_mq_insert_request() call to
_not_ use kblockd to submit the cloned requests isn't enough to
eliminate the observed context switches.

In addition to this dm-mq specific blk-core fix, there are 2 DM core
fixes to dm-mq that (when paired with the blk-core fix) completely
eliminate the observed context switching:

1)  don't blk_mq_run_hw_queues in blk-mq request completion

    Motivated by desire to reduce overhead of dm-mq, punting to kblockd
    just increases context switches.

    In my testing against a really fast null_blk device there was no benefit
    to running blk_mq_run_hw_queues() on completion (and no other blk-mq
    driver does this).  So hopefully this change doesn't induce the need for
    yet another revert like commit 621739b !

2)  use blk_mq_complete_request() in dm_complete_request()

    blk_complete_request() doesn't offer the traditional q->mq_ops vs
    .request_fn branching pattern that other historic block interfaces
    do (e.g. blk_get_request).  Using blk_mq_complete_request() for
    blk-mq requests is important for performance.  It should be noted
    that, like blk_complete_request(), blk_mq_complete_request() doesn't
    natively handle partial completions -- but the request-based
    DM-multipath target does provide the required partial completion
    support by dm.c:end_clone_bio() triggering requeueing of the request
    via dm-mpath.c:multipath_end_io()'s return of DM_ENDIO_REQUEUE.

dm-mq fix #2 is _much_ more important than #1 for eliminating the
context switches.
Before: cpu          : usr=15.10%, sys=59.39%, ctx=7905181, majf=0, minf=475
After:  cpu          : usr=20.60%, sys=79.35%, ctx=2008, majf=0, minf=472

With these changes multithreaded async read IOPs improved from ~950K
to ~1350K for this dm-mq stacked on null_blk test-case.  The raw read
IOPs of the underlying null_blk device for the same workload is ~1950K.

Fixes: 7fb4898 ("block: add blk-mq support to blk_insert_cloned_request()")
Fixes: bfebd1c ("dm: add full blk-mq support to request-based DM")
Cc: stable@vger.kernel.org # 4.1+
Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Plagman pushed a commit that referenced this pull request Aug 15, 2016
[ Upstream commit ad6b1d9 ]

The problem seems to be that if a new device is detected
while we have already removed the shared HCD, then many of the
xhci operations (e.g.  xhci_alloc_dev(), xhci_setup_device())
hang as command never completes.

I don't think XHCI can operate without the shared HCD as we've
already called xhci_halt() in xhci_only_stop_hcd() when shared HCD
goes away. We need to prevent new commands from being queued
not only when HCD is dying but also when HCD is halted.

The following lockup was detected while testing the otg state
machine.

[  178.199951] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[  178.205799] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1
[  178.214458] xhci-hcd xhci-hcd.0.auto: hcc params 0x0220f04c hci version 0x100 quirks 0x00010010
[  178.223619] xhci-hcd xhci-hcd.0.auto: irq 400, io mem 0x48890000
[  178.230677] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[  178.237796] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[  178.245358] usb usb1: Product: xHCI Host Controller
[  178.250483] usb usb1: Manufacturer: Linux 4.0.0-rc1-00024-g6111320 xhci-hcd
[  178.257783] usb usb1: SerialNumber: xhci-hcd.0.auto
[  178.267014] hub 1-0:1.0: USB hub found
[  178.272108] hub 1-0:1.0: 1 port detected
[  178.278371] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[  178.284171] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 2
[  178.294038] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003
[  178.301183] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[  178.308776] usb usb2: Product: xHCI Host Controller
[  178.313902] usb usb2: Manufacturer: Linux 4.0.0-rc1-00024-g6111320 xhci-hcd
[  178.321222] usb usb2: SerialNumber: xhci-hcd.0.auto
[  178.329061] hub 2-0:1.0: USB hub found
[  178.333126] hub 2-0:1.0: 1 port detected
[  178.567585] dwc3 48890000.usb: usb_otg_start_host 0
[  178.572707] xhci-hcd xhci-hcd.0.auto: remove, state 4
[  178.578064] usb usb2: USB disconnect, device number 1
[  178.586565] xhci-hcd xhci-hcd.0.auto: USB bus 2 deregistered
[  178.592585] xhci-hcd xhci-hcd.0.auto: remove, state 1
[  178.597924] usb usb1: USB disconnect, device number 1
[  178.603248] usb 1-1: new high-speed USB device number 2 using xhci-hcd
[  190.597337] INFO: task kworker/u4:0:6 blocked for more than 10 seconds.
[  190.604273]       Not tainted 4.0.0-rc1-00024-g6111320 #1058
[  190.610228] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  190.618443] kworker/u4:0    D c05c0ac0     0     6      2 0x00000000
[  190.625120] Workqueue: usb_otg usb_otg_work
[  190.629533] [<c05c0ac0>] (__schedule) from [<c05c10ac>] (schedule+0x34/0x98)
[  190.636915] [<c05c10ac>] (schedule) from [<c05c1318>] (schedule_preempt_disabled+0xc/0x10)
[  190.645591] [<c05c1318>] (schedule_preempt_disabled) from [<c05c23d0>] (mutex_lock_nested+0x1ac/0x3fc)
[  190.655353] [<c05c23d0>] (mutex_lock_nested) from [<c046cf8c>] (usb_disconnect+0x3c/0x208)
[  190.664043] [<c046cf8c>] (usb_disconnect) from [<c0470cf0>] (_usb_remove_hcd+0x98/0x1d8)
[  190.672535] [<c0470cf0>] (_usb_remove_hcd) from [<c0485da8>] (usb_otg_start_host+0x50/0xf4)
[  190.681299] [<c0485da8>] (usb_otg_start_host) from [<c04849a4>] (otg_set_protocol+0x5c/0xd0)
[  190.690153] [<c04849a4>] (otg_set_protocol) from [<c0484b88>] (otg_set_state+0x170/0xbfc)
[  190.698735] [<c0484b88>] (otg_set_state) from [<c0485740>] (otg_statemachine+0x12c/0x470)
[  190.707326] [<c0485740>] (otg_statemachine) from [<c0053c84>] (process_one_work+0x1b4/0x4a0)
[  190.716162] [<c0053c84>] (process_one_work) from [<c00540f8>] (worker_thread+0x154/0x44c)
[  190.724742] [<c00540f8>] (worker_thread) from [<c0058f88>] (kthread+0xd4/0xf0)
[  190.732328] [<c0058f88>] (kthread) from [<c000e810>] (ret_from_fork+0x14/0x24)
[  190.739898] 5 locks held by kworker/u4:0/6:
[  190.744274]  #0:  ("%s""usb_otg"){.+.+.+}, at: [<c0053bf4>] process_one_work+0x124/0x4a0
[  190.752799]  #1:  ((&otgd->work)){+.+.+.}, at: [<c0053bf4>] process_one_work+0x124/0x4a0
[  190.761326]  #2:  (&otgd->fsm.lock){+.+.+.}, at: [<c048562c>] otg_statemachine+0x18/0x470
[  190.769934]  #3:  (usb_bus_list_lock){+.+.+.}, at: [<c0470ce8>] _usb_remove_hcd+0x90/0x1d8
[  190.778635]  #4:  (&dev->mutex){......}, at: [<c046cf8c>] usb_disconnect+0x3c/0x208
[  190.786700] INFO: task kworker/1:0:14 blocked for more than 10 seconds.
[  190.793633]       Not tainted 4.0.0-rc1-00024-g6111320 #1058
[  190.799567] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  190.807783] kworker/1:0     D c05c0ac0     0    14      2 0x00000000
[  190.814457] Workqueue: usb_hub_wq hub_event
[  190.818866] [<c05c0ac0>] (__schedule) from [<c05c10ac>] (schedule+0x34/0x98)
[  190.826252] [<c05c10ac>] (schedule) from [<c05c4e40>] (schedule_timeout+0x13c/0x1ec)
[  190.834377] [<c05c4e40>] (schedule_timeout) from [<c05c19f0>] (wait_for_common+0xbc/0x150)
[  190.843062] [<c05c19f0>] (wait_for_common) from [<bf068a3c>] (xhci_setup_device+0x164/0x5cc [xhci_hcd])
[  190.852986] [<bf068a3c>] (xhci_setup_device [xhci_hcd]) from [<c046b7f4>] (hub_port_init+0x3f4/0xb10)
[  190.862667] [<c046b7f4>] (hub_port_init) from [<c046eb64>] (hub_event+0x704/0x1018)
[  190.870704] [<c046eb64>] (hub_event) from [<c0053c84>] (process_one_work+0x1b4/0x4a0)
[  190.878919] [<c0053c84>] (process_one_work) from [<c00540f8>] (worker_thread+0x154/0x44c)
[  190.887503] [<c00540f8>] (worker_thread) from [<c0058f88>] (kthread+0xd4/0xf0)
[  190.895076] [<c0058f88>] (kthread) from [<c000e810>] (ret_from_fork+0x14/0x24)
[  190.902650] 5 locks held by kworker/1:0/14:
[  190.907023]  #0:  ("usb_hub_wq"){.+.+.+}, at: [<c0053bf4>] process_one_work+0x124/0x4a0
[  190.915454]  #1:  ((&hub->events)){+.+.+.}, at: [<c0053bf4>] process_one_work+0x124/0x4a0
[  190.924070]  #2:  (&dev->mutex){......}, at: [<c046e490>] hub_event+0x30/0x1018
[  190.931768]  #3:  (&port_dev->status_lock){+.+.+.}, at: [<c046eb50>] hub_event+0x6f0/0x1018
[  190.940558]  #4:  (&bus->usb_address0_mutex){+.+.+.}, at: [<c046b458>] hub_port_init+0x58/0xb10

Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Plagman pushed a commit that referenced this pull request Aug 15, 2016
[ Upstream commit b49b927 ]

We shouldn't be calling clk_prepare_enable()/clk_prepare_disable()
in an atomic context.

Fixes the following issue:

[    5.830970] ehci-omap: OMAP-EHCI Host Controller driver
[    5.830974] driver_register 'ehci-omap'
[    5.895849] driver_register 'wl1271_sdio'
[    5.896870] BUG: scheduling while atomic: udevd/994/0x00000002
[    5.896876] 4 locks held by udevd/994:
[    5.896904]  #0:  (&dev->mutex){......}, at: [<c049597c>] __driver_attach+0x60/0xac
[    5.896923]  #1:  (&dev->mutex){......}, at: [<c049598c>] __driver_attach+0x70/0xac
[    5.896946]  #2:  (tll_lock){+.+...}, at: [<c04c2630>] omap_tll_enable+0x2c/0xd0
[    5.896966]  #3:  (prepare_lock){+.+...}, at: [<c05ce9c8>] clk_prepare_lock+0x48/0xe0
[    5.897042] Modules linked in: wlcore_sdio(+) ehci_omap(+) dwc3_omap snd_soc_ts3a225e leds_is31fl319x bq27xxx_battery_i2c tsc2007 bq27xxx_battery bq2429x_charger ina2xx tca8418_keypad as5013 leds_tca6507 twl6040_vibra gpio_twl6040 bmp085_i2c(+) palmas_gpadc usb3503 palmas_pwrbutton bmg160_i2c(+) bmp085 bma150(+) bmg160_core bmp280 input_polldev snd_soc_omap_mcbsp snd_soc_omap_mcpdm snd_soc_omap snd_pcm_dmaengine
[    5.897048] Preemption disabled at:[<  (null)>]   (null)
[    5.897051]
[    5.897059] CPU: 0 PID: 994 Comm: udevd Not tainted 4.6.0-rc5-letux+ #233
[    5.897062] Hardware name: Generic OMAP5 (Flattened Device Tree)
[    5.897076] [<c010e714>] (unwind_backtrace) from [<c010af34>] (show_stack+0x10/0x14)
[    5.897087] [<c010af34>] (show_stack) from [<c040aa7c>] (dump_stack+0x88/0xc0)
[    5.897099] [<c040aa7c>] (dump_stack) from [<c020c558>] (__schedule_bug+0xac/0xd0)
[    5.897111] [<c020c558>] (__schedule_bug) from [<c06f3d44>] (__schedule+0x88/0x7e4)
[    5.897120] [<c06f3d44>] (__schedule) from [<c06f46d8>] (schedule+0x9c/0xc0)
[    5.897129] [<c06f46d8>] (schedule) from [<c06f4904>] (schedule_preempt_disabled+0x14/0x20)
[    5.897140] [<c06f4904>] (schedule_preempt_disabled) from [<c06f64e4>] (mutex_lock_nested+0x258/0x43c)
[    5.897150] [<c06f64e4>] (mutex_lock_nested) from [<c05ce9c8>] (clk_prepare_lock+0x48/0xe0)
[    5.897160] [<c05ce9c8>] (clk_prepare_lock) from [<c05d0e7c>] (clk_prepare+0x10/0x28)
[    5.897169] [<c05d0e7c>] (clk_prepare) from [<c04c2668>] (omap_tll_enable+0x64/0xd0)
[    5.897180] [<c04c2668>] (omap_tll_enable) from [<c04c1728>] (usbhs_runtime_resume+0x18/0x17c)
[    5.897192] [<c04c1728>] (usbhs_runtime_resume) from [<c049d404>] (pm_generic_runtime_resume+0x2c/0x40)
[    5.897202] [<c049d404>] (pm_generic_runtime_resume) from [<c049f180>] (__rpm_callback+0x38/0x68)
[    5.897210] [<c049f180>] (__rpm_callback) from [<c049f220>] (rpm_callback+0x70/0x88)
[    5.897218] [<c049f220>] (rpm_callback) from [<c04a0a00>] (rpm_resume+0x4ec/0x7ec)
[    5.897227] [<c04a0a00>] (rpm_resume) from [<c04a0f48>] (__pm_runtime_resume+0x4c/0x64)
[    5.897236] [<c04a0f48>] (__pm_runtime_resume) from [<c04958dc>] (driver_probe_device+0x30/0x70)
[    5.897246] [<c04958dc>] (driver_probe_device) from [<c04959a4>] (__driver_attach+0x88/0xac)
[    5.897256] [<c04959a4>] (__driver_attach) from [<c04940f8>] (bus_for_each_dev+0x50/0x84)
[    5.897267] [<c04940f8>] (bus_for_each_dev) from [<c0494e40>] (bus_add_driver+0xcc/0x1e4)
[    5.897276] [<c0494e40>] (bus_add_driver) from [<c0496914>] (driver_register+0xac/0xf4)
[    5.897286] [<c0496914>] (driver_register) from [<c01018e0>] (do_one_initcall+0x100/0x1b8)
[    5.897296] [<c01018e0>] (do_one_initcall) from [<c01c7a54>] (do_init_module+0x58/0x1c0)
[    5.897304] [<c01c7a54>] (do_init_module) from [<c01c8a3c>] (SyS_finit_module+0x88/0x90)
[    5.897313] [<c01c8a3c>] (SyS_finit_module) from [<c0107120>] (ret_fast_syscall+0x0/0x1c)
[    5.912697] ------------[ cut here ]------------
[    5.912711] WARNING: CPU: 0 PID: 994 at kernel/sched/core.c:2996 _raw_spin_unlock+0x28/0x58
[    5.912717] DEBUG_LOCKS_WARN_ON(val > preempt_count())

Cc: <stable@vger.kernel.org>
Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
Tested-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Plagman pushed a commit that referenced this pull request Aug 15, 2016
[ Upstream commit 8e96a87 ]

Userspace can quite legitimately perform an exec() syscall with a
suspended transaction. exec() does not return to the old process, rather
it load a new one and starts that, the expectation therefore is that the
new process starts not in a transaction. Currently exec() is not treated
any differently to any other syscall which creates problems.

Firstly it could allow a new process to start with a suspended
transaction for a binary that no longer exists. This means that the
checkpointed state won't be valid and if the suspended transaction were
ever to be resumed and subsequently aborted (a possibility which is
exceedingly likely as exec()ing will likely doom the transaction) the
new process will jump to invalid state.

Secondly the incorrect attempt to keep the transactional state while
still zeroing state for the new process creates at least two TM Bad
Things. The first triggers on the rfid to return to userspace as
start_thread() has given the new process a 'clean' MSR but the suspend
will still be set in the hardware MSR. The second TM Bad Thing triggers
in __switch_to() as the processor is still transactionally suspended but
__switch_to() wants to zero the TM sprs for the new process.

This is an example of the outcome of calling exec() with a suspended
transaction. Note the first 700 is likely the first TM bad thing
decsribed earlier only the kernel can't report it as we've loaded
userspace registers. c000000000009980 is the rfid in
fast_exception_return()

  Bad kernel stack pointer 3fffcfa1a370 at c000000000009980
  Oops: Bad kernel stack pointer, sig: 6 [#1]
  CPU: 0 PID: 2006 Comm: tm-execed Not tainted
  NIP: c000000000009980 LR: 0000000000000000 CTR: 0000000000000000
  REGS: c00000003ffefd40 TRAP: 0700   Not tainted
  MSR: 8000000300201031 <SF,ME,IR,DR,LE,TM[SE]>  CR: 00000000  XER: 00000000
  CFAR: c0000000000098b4 SOFTE: 0
  PACATMSCRATCH: b00000010000d033
  GPR00: 0000000000000000 00003fffcfa1a370 0000000000000000 0000000000000000
  GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR12: 00003fff966611c0 0000000000000000 0000000000000000 0000000000000000
  NIP [c000000000009980] fast_exception_return+0xb0/0xb8
  LR [0000000000000000]           (null)
  Call Trace:
  Instruction dump:
  f84d0278 e9a100d8 7c7b03a6 e84101a0 7c4ff120 e8410170 7c5a03a6 e8010070
  e8410080 e8610088 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed023b

  Kernel BUG at c000000000043e80 [verbose debug info unavailable]
  Unexpected TM Bad Thing exception at c000000000043e80 (msr 0x201033)
  Oops: Unrecoverable exception, sig: 6 [#2]
  CPU: 0 PID: 2006 Comm: tm-execed Tainted: G      D
  task: c0000000fbea6d80 ti: c00000003ffec000 task.ti: c0000000fb7ec000
  NIP: c000000000043e80 LR: c000000000015a24 CTR: 0000000000000000
  REGS: c00000003ffef7e0 TRAP: 0700   Tainted: G      D
  MSR: 8000000300201033 <SF,ME,IR,DR,RI,LE,TM[SE]>  CR: 28002828  XER: 00000000
  CFAR: c000000000015a20 SOFTE: 0
  PACATMSCRATCH: b00000010000d033
  GPR00: 0000000000000000 c00000003ffefa60 c000000000db5500 c0000000fbead000
  GPR04: 8000000300001033 2222222222222222 2222222222222222 00000000ff160000
  GPR08: 0000000000000000 800000010000d033 c0000000fb7e3ea0 c00000000fe00004
  GPR12: 0000000000002200 c00000000fe00000 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 c0000000fbea7410 00000000ff160000
  GPR24: c0000000ffe1f600 c0000000fbea8700 c0000000fbea8700 c0000000fbead000
  GPR28: c000000000e20198 c0000000fbea6d80 c0000000fbeab680 c0000000fbea6d80
  NIP [c000000000043e80] tm_restore_sprs+0xc/0x1c
  LR [c000000000015a24] __switch_to+0x1f4/0x420
  Call Trace:
  Instruction dump:
  7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8
  4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020

This fixes CVE-2016-5828.

Fixes: bc2a940 ("powerpc: Hook in new transactional memory code")
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Plagman pushed a commit that referenced this pull request Aug 15, 2016
[ Upstream commit ab2a4bf ]

The USB core contains a bug that can show up when a USB-3 host
controller is removed.  If the primary (USB-2) hcd structure is
released before the shared (USB-3) hcd, the core will try to do a
double-free of the common bandwidth_mutex.

The problem was described in graphical form by Chung-Geol Kim, who
first reported it:

=================================================
     At *remove USB(3.0) Storage
     sequence <1> --> <5> ((Problem Case))
=================================================
                                  VOLD
------------------------------------|------------
                                 (uevent)
                            ________|_________
                           |<1>               |
                           |dwc3_otg_sm_work  |
                           |usb_put_hcd       |
                           |peer_hcd(kref=2)|
                           |__________________|
                            ________|_________
                           |<2>               |
                           |New USB BUS #2    |
                           |                  |
                           |peer_hcd(kref=1)  |
                           |                  |
                         --(Link)-bandXX_mutex|
                         | |__________________|
                         |
    ___________________  |
   |<3>                | |
   |dwc3_otg_sm_work   | |
   |usb_put_hcd        | |
   |primary_hcd(kref=1)| |
   |___________________| |
    _________|_________  |
   |<4>                | |
   |New USB BUS #1     | |
   |hcd_release        | |
   |primary_hcd(kref=0)| |
   |                   | |
   |bandXX_mutex(free) |<-
   |___________________|
                               (( VOLD ))
                            ______|___________
                           |<5>               |
                           |      SCSI        |
                           |usb_put_hcd       |
                           |peer_hcd(kref=0)  |
                           |*hcd_release      |
                           |bandXX_mutex(free*)|<- double free
                           |__________________|

=================================================

This happens because hcd_release() frees the bandwidth_mutex whenever
it sees a primary hcd being released (which is not a very good idea
in any case), but in the course of releasing the primary hcd, it
changes the pointers in the shared hcd in such a way that the shared
hcd will appear to be primary when it gets released.

This patch fixes the problem by changing hcd_release() so that it
deallocates the bandwidth_mutex only when the _last_ hcd structure
referencing it is released.  The patch also removes an unnecessary
test, so that when an hcd is released, both the shared_hcd and
primary_hcd pointers in the hcd's peer will be cleared.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Chung-Geol Kim <chunggeol.kim@samsung.com>
Tested-by: Chung-Geol Kim <chunggeol.kim@samsung.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
johnv-valve pushed a commit that referenced this pull request Nov 18, 2016
Backport of caaee62 ("ptrace: use fsuid,
fsgid, effective creds for fs access checks") to v4.1 failed to update the
mode parameter in the mm_access() call in pagemap_read() to have one of the
new PTRACE_MODE_*CREDS flags.

Attempting to read any other process' pagemap results in a WARN()

WARNING: CPU: 0 PID: 883 at kernel/ptrace.c:229 __ptrace_may_access+0x14a/0x160()
denying ptrace access check without PTRACE_MODE_*CREDS
Modules linked in: loop sg e1000 i2c_piix4 ppdev virtio_balloon virtio_pci parport_pc i2c_core virtio_ring ata_generic serio_raw pata_acpi virtio parport pcspkr floppy acpi_cpufreq ip_tables ext3 mbcache jbd sd_mod ata_piix crc32c_intel libata
CPU: 0 PID: 883 Comm: cat Tainted: G        W       4.1.12-51.el7uek.x86_64 #2
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  0000000000000286 00000000619f225a ffff88003b6fbc18 ffffffff81717021
  ffff88003b6fbc70 ffffffff819be870 ffff88003b6fbc58 ffffffff8108477a
  000000003b6fbc58 0000000000000001 ffff88003d287000 0000000000000001
Call Trace:
  [<ffffffff81717021>] dump_stack+0x63/0x81
  [<ffffffff8108477a>] warn_slowpath_common+0x8a/0xc0
  [<ffffffff81084805>] warn_slowpath_fmt+0x55/0x70
  [<ffffffff8108e57a>] __ptrace_may_access+0x14a/0x160
  [<ffffffff8108f372>] ptrace_may_access+0x32/0x50
  [<ffffffff81081bad>] mm_access+0x6d/0xb0
  [<ffffffff81278c81>] pagemap_read+0xe1/0x360
  [<ffffffff811a046b>] ? lru_cache_add_active_or_unevictable+0x2b/0xa0
  [<ffffffff8120d2e7>] __vfs_read+0x37/0x100
  [<ffffffff812b9ab4>] ? security_file_permission+0x84/0xa0
  [<ffffffff8120d8b6>] ? rw_verify_area+0x56/0xe0
  [<ffffffff8120d9c6>] vfs_read+0x86/0x140
  [<ffffffff8120e945>] SyS_read+0x55/0xd0
  [<ffffffff8171eb6e>] system_call_fastpath+0x12/0x71

Fixes: ab88ce5 (ptrace: use fsuid, fsgid, effective creds for fs access checks)
Signed-off-by: Kenny Keslar <kenny.keslar@oracle.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
johnv-valve pushed a commit that referenced this pull request Nov 18, 2016
[ Upstream commit bd975d1 ]

The secmech hmac(md5) structures are present in the TCP_Server_Info
struct and can be shared among multiple CIFS sessions.  However, the
server mutex is not currently held when these structures are allocated
and used, which can lead to a kernel crashes, as in the scenario below:

mount.cifs(8) #1				mount.cifs(8) #2

Is secmech.sdeschmaccmd5 allocated?
// false

						Is secmech.sdeschmaccmd5 allocated?
						// false

secmech.hmacmd = crypto_alloc_shash..
secmech.sdeschmaccmd5 = kzalloc..
sdeschmaccmd5->shash.tfm = &secmec.hmacmd;

						secmech.sdeschmaccmd5 = kzalloc
						// sdeschmaccmd5->shash.tfm
						// not yet assigned

crypto_shash_update()
 deref NULL sdeschmaccmd5->shash.tfm

 Unable to handle kernel paging request at virtual address 00000030
 epc   : 8027ba34 crypto_shash_update+0x38/0x158
 ra    : 8020f2e8 setup_ntlmv2_rsp+0x4bc/0xa84
 Call Trace:
  crypto_shash_update+0x38/0x158
  setup_ntlmv2_rsp+0x4bc/0xa84
  build_ntlmssp_auth_blob+0xbc/0x34c
  sess_auth_rawntlmssp_authenticate+0xac/0x248
  CIFS_SessSetup+0xf0/0x178
  cifs_setup_session+0x4c/0x84
  cifs_get_smb_ses+0x2c8/0x314
  cifs_mount+0x38c/0x76c
  cifs_do_mount+0x98/0x440
  mount_fs+0x20/0xc0
  vfs_kern_mount+0x58/0x138
  do_mount+0x1e8/0xccc
  SyS_mount+0x88/0xd4
  syscall_common+0x30/0x54

Fix this by locking the srv_mutex around the code which uses these
hmac(md5) structures.  All the other secmech algos already have similar
locking.

Fixes: 95dc8dd ("Limit allocation of crypto mechanisms to dialect which requires")
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
johnv-valve pushed a commit that referenced this pull request Nov 18, 2016
[ Upstream commit adc8a43 ]

Done, because line6_stream_stop() locks and calls line6_unlink_audio_urbs(),
which in turn invokes audio_out_callback(), which tries to lock 2nd time.

Fixes:

=============================================
[ INFO: possible recursive locking detected ]
4.4.15+ #15 Not tainted
---------------------------------------------
mplayer/3591 is trying to acquire lock:
 (&(&line6pcm->out.lock)->rlock){-.-...}, at: [<bfa27655>] audio_out_callback+0x70/0x110 [snd_usb_line6]

but task is already holding lock:
 (&(&line6pcm->out.lock)->rlock){-.-...}, at: [<bfa26aad>] line6_stream_stop+0x24/0x5c [snd_usb_line6]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&line6pcm->out.lock)->rlock);
  lock(&(&line6pcm->out.lock)->rlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by mplayer/3591:
 #0:  (snd_pcm_link_rwlock){.-.-..}, at: [<bf8d49a7>] snd_pcm_stream_lock+0x1e/0x40 [snd_pcm]
 #1:  (&(&substream->self_group.lock)->rlock){-.-...}, at: [<bf8d49af>] snd_pcm_stream_lock+0x26/0x40 [snd_pcm]
 #2:  (&(&line6pcm->out.lock)->rlock){-.-...}, at: [<bfa26aad>] line6_stream_stop+0x24/0x5c [snd_usb_line6]

stack backtrace:
CPU: 0 PID: 3591 Comm: mplayer Not tainted 4.4.15+ #15
Hardware name: Generic AM33XX (Flattened Device Tree)
[<c0015d85>] (unwind_backtrace) from [<c001253d>] (show_stack+0x11/0x14)
[<c001253d>] (show_stack) from [<c02f1bdf>] (dump_stack+0x8b/0xac)
[<c02f1bdf>] (dump_stack) from [<c0076f43>] (__lock_acquire+0xc8b/0x1780)
[<c0076f43>] (__lock_acquire) from [<c007810d>] (lock_acquire+0x99/0x1c0)
[<c007810d>] (lock_acquire) from [<c06171e7>] (_raw_spin_lock_irqsave+0x3f/0x4c)
[<c06171e7>] (_raw_spin_lock_irqsave) from [<bfa27655>] (audio_out_callback+0x70/0x110 [snd_usb_line6])
[<bfa27655>] (audio_out_callback [snd_usb_line6]) from [<c04294db>] (__usb_hcd_giveback_urb+0x53/0xd0)
[<c04294db>] (__usb_hcd_giveback_urb) from [<c046388d>] (musb_giveback+0x3d/0x98)
[<c046388d>] (musb_giveback) from [<c04647f5>] (musb_urb_dequeue+0x6d/0x114)
[<c04647f5>] (musb_urb_dequeue) from [<c042ac11>] (usb_hcd_unlink_urb+0x39/0x98)
[<c042ac11>] (usb_hcd_unlink_urb) from [<bfa26a87>] (line6_unlink_audio_urbs+0x6a/0x6c [snd_usb_line6])
[<bfa26a87>] (line6_unlink_audio_urbs [snd_usb_line6]) from [<bfa26acb>] (line6_stream_stop+0x42/0x5c [snd_usb_line6])
[<bfa26acb>] (line6_stream_stop [snd_usb_line6]) from [<bfa26fe7>] (snd_line6_trigger+0xb6/0xf4 [snd_usb_line6])
[<bfa26fe7>] (snd_line6_trigger [snd_usb_line6]) from [<bf8d47b7>] (snd_pcm_do_stop+0x36/0x38 [snd_pcm])
[<bf8d47b7>] (snd_pcm_do_stop [snd_pcm]) from [<bf8d462f>] (snd_pcm_action_single+0x22/0x40 [snd_pcm])
[<bf8d462f>] (snd_pcm_action_single [snd_pcm]) from [<bf8d46f9>] (snd_pcm_action+0xac/0xb0 [snd_pcm])
[<bf8d46f9>] (snd_pcm_action [snd_pcm]) from [<bf8d4b61>] (snd_pcm_drop+0x38/0x64 [snd_pcm])
[<bf8d4b61>] (snd_pcm_drop [snd_pcm]) from [<bf8d6233>] (snd_pcm_common_ioctl1+0x7fe/0xbe8 [snd_pcm])
[<bf8d6233>] (snd_pcm_common_ioctl1 [snd_pcm]) from [<bf8d6779>] (snd_pcm_playback_ioctl1+0x15c/0x51c [snd_pcm])
[<bf8d6779>] (snd_pcm_playback_ioctl1 [snd_pcm]) from [<bf8d6b59>] (snd_pcm_playback_ioctl+0x20/0x28 [snd_pcm])
[<bf8d6b59>] (snd_pcm_playback_ioctl [snd_pcm]) from [<c016714b>] (do_vfs_ioctl+0x3af/0x5c8)

Fixes: 63e20df ('ALSA: line6: Reorganize PCM stream handling')
Cc: <stable@vger.kernel.org> # v4.0+
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Andrej Krutak <dev@andree.sk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
johnv-valve pushed a commit that referenced this pull request Nov 18, 2016
[ Upstream commit 7e95630 ]

In the mipsr2_decoder() function, used to emulate pre-MIPSr6
instructions that were removed in MIPSr6, the init_fpu() function is
called if a removed pre-MIPSr6 floating point instruction is the first
floating point instruction used by the task. However, init_fpu()
performs varous actions that rely upon not being migrated. For example
in the most basic case it sets the coprocessor 0 Status.CU1 bit to
enable the FPU & then loads FP register context into the FPU registers.
If the task were to migrate during this time, it may end up attempting
to load FP register context on a different CPU where it hasn't set the
CU1 bit, leading to errors such as:

    do_cpu invoked from kernel context![#2]:
    CPU: 2 PID: 7338 Comm: fp-prctl Tainted: G      D         4.7.0-00424-g49b0c82 #2
    task: 838e4000 ti: 88d38000 task.ti: 88d38000
    $ 0   : 00000000 00000001 ffffffff 88d3fef8
    $ 4   : 838e4000 88d38004 00000000 00000001
    $ 8   : 3400fc01 801f8020 808e9100 24000000
    $12   : dbffffff 807b69d8 807b0000 00000000
    $16   : 00000000 80786150 00400fc4 809c0398
    $20   : 809c0338 0040273c 88d3ff28 808e9d30
    $24   : 808e9d30 00400fb4
    $28   : 88d38000 88d3fe88 00000000 8011a2ac
    Hi    : 0040273c
    Lo    : 88d3ff28
    epc   : 80114178 _restore_fp+0x10/0xa0
    ra    : 8011a2ac mipsr2_decoder+0xd5c/0x1660
    Status: 1400fc03	KERNEL EXL IE
    Cause : 1080002c (ExcCode 0b)
    PrId  : 0001a920 (MIPS I6400)
    Modules linked in:
    Process fp-prctl (pid: 7338, threadinfo=88d38000, task=838e4000, tls=766527d0)
    Stack : 00000000 00000000 00000000 88d3fe98 00000000 00000000 809c0398 809c0338
    	  808e9100 00000000 88d3ff28 00400fc4 00400fc4 0040273c 7fb69e18 004a0000
    	  004a0000 004a0000 7664add0 8010de18 00000000 00000000 88d3fef8 88d3ff28
    	  808e9100 00000000 766527d0 8010e534 000c0000 85755000 8181d580 00000000
    	  00000000 00000000 004a0000 00000000 766527d0 7fb69e18 004a0000 80105c20
    	  ...
    Call Trace:
    [<80114178>] _restore_fp+0x10/0xa0
    [<8011a2ac>] mipsr2_decoder+0xd5c/0x1660
    [<8010de18>] do_ri+0x90/0x6b8
    [<80105c20>] ret_from_exception+0x0/0x10

Fix this by disabling preemption around the call to init_fpu(), ensuring
that it starts & completes on one CPU.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: b0a668f ("MIPS: kernel: mips-r2-to-r6-emul: Add R2 emulator for MIPS R6")
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.0+
Patchwork: https://patchwork.linux-mips.org/patch/14305/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
johnv-valve pushed a commit that referenced this pull request Jun 7, 2017
[ Upstream commit ddc665a ]

When the instruction right before the branch destination is
a 64 bit load immediate, we currently calculate the wrong
jump offset in the ctx->offset[] array as we only account
one instruction slot for the 64 bit load immediate although
it uses two BPF instructions. Fix it up by setting the offset
into the right slot after we incremented the index.

Before (ldimm64 test 1):

  [...]
  00000020:  52800007  mov w7, #0x0 // #0
  00000024:  d2800060  mov x0, #0x3 // #3
  00000028:  d2800041  mov x1, #0x2 // #2
  0000002c:  eb01001f  cmp x0, x1
  00000030:  54ffff82  b.cs 0x00000020
  00000034:  d29fffe7  mov x7, #0xffff // #65535
  00000038:  f2bfffe7  movk x7, #0xffff, lsl #16
  0000003c:  f2dfffe7  movk x7, #0xffff, lsl #32
  00000040:  f2ffffe7  movk x7, #0xffff, lsl #48
  00000044:  d29dddc7  mov x7, #0xeeee // #61166
  00000048:  f2bdddc7  movk x7, #0xeeee, lsl #16
  0000004c:  f2ddddc7  movk x7, #0xeeee, lsl #32
  00000050:  f2fdddc7  movk x7, #0xeeee, lsl #48
  [...]

After (ldimm64 test 1):

  [...]
  00000020:  52800007  mov w7, #0x0 // #0
  00000024:  d2800060  mov x0, #0x3 // #3
  00000028:  d2800041  mov x1, #0x2 // #2
  0000002c:  eb01001f  cmp x0, x1
  00000030:  540000a2  b.cs 0x00000044
  00000034:  d29fffe7  mov x7, #0xffff // #65535
  00000038:  f2bfffe7  movk x7, #0xffff, lsl #16
  0000003c:  f2dfffe7  movk x7, #0xffff, lsl #32
  00000040:  f2ffffe7  movk x7, #0xffff, lsl #48
  00000044:  d29dddc7  mov x7, #0xeeee // #61166
  00000048:  f2bdddc7  movk x7, #0xeeee, lsl #16
  0000004c:  f2ddddc7  movk x7, #0xeeee, lsl #32
  00000050:  f2fdddc7  movk x7, #0xeeee, lsl #48
  [...]

Also, add a couple of test cases to make sure JITs pass
this test. Tested on Cavium ThunderX ARMv8. The added
test cases all pass after the fix.

Fixes: 8eee539 ("arm64: bpf: fix out-of-bounds read in bpf2a64_offset()")
Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
johnv-valve pushed a commit that referenced this pull request Jul 3, 2017
…nt()

commit 4eecbe8 upstream.

In case oldtrig == trig == NULL (which happens when we set none
trigger, when there is already none set) there is a NULL pointer
dereference during iio_trigger_put(trig). Below is kernel output when
this occurs:

[   26.741790] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[   26.750179] pgd = cacc0000
[   26.752936] [00000000] *pgd=8adc6835, *pte=00000000, *ppte=00000000
[   26.759531] Internal error: Oops: 17 [#1] SMP ARM
[   26.764261] Modules linked in: usb_f_ncm u_ether usb_f_acm u_serial usb_f_fs libcomposite configfs evbug
[   26.773844] CPU: 0 PID: 152 Comm: synchro Not tainted 4.12.0-rc1 #2
[   26.780128] Hardware name: Freescale i.MX6 Ultralite (Device Tree)
[   26.786329] task: cb1de200 task.stack: cac92000
[   26.790892] PC is at iio_trigger_write_current+0x188/0x1f4
[   26.796403] LR is at lock_release+0xf8/0x20c
[   26.800696] pc : [<c0736f34>]    lr : [<c016efb0>]    psr: 600d0013
[   26.800696] sp : cac93e30  ip : cac93db0  fp : cac93e5c
[   26.812193] r10: c0e64fe8  r9 : 00000000  r8 : 00000001
[   26.817436] r7 : cb190810  r6 : 00000010  r5 : 00000001  r4 : 00000000
[   26.823982] r3 : 00000000  r2 : 00000000  r1 : cb1de200  r0 : 00000000
[   26.830528] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[   26.837683] Control: 10c5387d  Table: 8acc006a  DAC: 00000051
[   26.843448] Process synchro (pid: 152, stack limit = 0xcac92210)
[   26.849475] Stack: (0xcac93e30 to 0xcac94000)
[   26.853857] 3e20:                                     00000001 c0736dac c054033c cae6b680
[   26.862060] 3e40: cae6b680 00000000 00000001 cb3f8610 cac93e74 cac93e60 c054035c c0736db8
[   26.870264] 3e60: 00000001 c054033c cac93e94 cac93e78 c029bf34 c0540348 00000000 00000000
[   26.878469] 3e80: cb3f8600 cae6b680 cac93ed4 cac93e98 c029b320 c029bef0 00000000 00000000
[   26.886672] 3ea0: 00000000 cac93f78 cb2d41fc caed3280 c029b214 cac93f78 00000001 000e20f8
[   26.894874] 3ec0: 00000001 00000000 cac93f44 cac93ed8 c0221dcc c029b220 c0e1ca39 cb2d41fc
[   26.903079] 3ee0: cac93f04 cac93ef0 c0183ef0 c0183ab0 cb2d41fc 00000000 cac93f44 cac93f08
[   26.911282] 3f00: c0225eec c0183ebc 00000001 00000000 c0223728 00000000 c0245454 00000001
[   26.919485] 3f20: 00000001 caed3280 000e20f8 cac93f78 000e20f8 00000001 cac93f74 cac93f48
[   26.927690] 3f40: c0223680 c0221da4 c0246520 c0245460 caed3283 caed3280 00000000 00000000
[   26.935893] 3f60: 000e20f8 00000001 cac93fa4 cac93f78 c0224520 c02235e4 00000000 00000000
[   26.944096] 3f80: 00000001 000e20f8 00000001 00000004 c0107f84 cac92000 00000000 cac93fa8
[   26.952299] 3fa0: c0107de0 c02244e8 00000001 000e20f8 0000000e 000e20f8 00000001 fbad2484
[   26.960502] 3fc0: 00000001 000e20f8 00000001 00000004 beb6b698 00064260 0006421c beb6b4b4
[   26.968705] 3fe0: 00000000 beb6b450 b6f219a0 b6e2f268 800d0010 0000000e cac93ff4 cac93ffc
[   26.976896] Backtrace:
[   26.979388] [<c0736dac>] (iio_trigger_write_current) from [<c054035c>] (dev_attr_store+0x20/0x2c)
[   26.988289]  r10:cb3f8610 r9:00000001 r8:00000000 r7:cae6b680 r6:cae6b680 r5:c054033c
[   26.996138]  r4:c0736dac r3:00000001
[   26.999747] [<c054033c>] (dev_attr_store) from [<c029bf34>] (sysfs_kf_write+0x50/0x54)
[   27.007686]  r5:c054033c r4:00000001
[   27.011290] [<c029bee4>] (sysfs_kf_write) from [<c029b320>] (kernfs_fop_write+0x10c/0x224)
[   27.019579]  r7:cae6b680 r6:cb3f8600 r5:00000000 r4:00000000
[   27.025271] [<c029b214>] (kernfs_fop_write) from [<c0221dcc>] (__vfs_write+0x34/0x120)
[   27.033214]  r10:00000000 r9:00000001 r8:000e20f8 r7:00000001 r6:cac93f78 r5:c029b214
[   27.041059]  r4:caed3280
[   27.043622] [<c0221d98>] (__vfs_write) from [<c0223680>] (vfs_write+0xa8/0x170)
[   27.050959]  r9:00000001 r8:000e20f8 r7:cac93f78 r6:000e20f8 r5:caed3280 r4:00000001
[   27.058731] [<c02235d8>] (vfs_write) from [<c0224520>] (SyS_write+0x44/0x98)
[   27.065806]  r9:00000001 r8:000e20f8 r7:00000000 r6:00000000 r5:caed3280 r4:caed3283
[   27.073582] [<c02244dc>] (SyS_write) from [<c0107de0>] (ret_fast_syscall+0x0/0x1c)
[   27.081179]  r9:cac92000 r8:c0107f84 r7:00000004 r6:00000001 r5:000e20f8 r4:00000001
[   27.088947] Code: 1a000009 e1a04009 e3a06010 e1a05008 (e5943000)
[   27.095244] ---[ end trace 06d1dab86d6e6bab ]---

To fix that problem call iio_trigger_put(trig) only when trig is not
NULL.

Fixes: d5d24bc ("iio: trigger: close race condition in acquiring trigger reference")
Signed-off-by: Marcin Niestroj <m.niestroj@grinn-global.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
johnv-valve pushed a commit that referenced this pull request Jul 3, 2017
commit b425e50 upstream.

Since the introduction of .init_rq_fn() and .exit_rq_fn() it is
essential that the memory allocated for struct request_queue
stays around until all blk_exit_rl() calls have finished. Hence
make blk_init_rl() take a reference on struct request_queue.

This patch fixes the following crash:

general protection fault: 0000 [#2] SMP
CPU: 3 PID: 28 Comm: ksoftirqd/3 Tainted: G      D         4.12.0-rc2-dbg+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
task: ffff88013a108040 task.stack: ffffc9000071c000
RIP: 0010:free_request_size+0x1a/0x30
RSP: 0018:ffffc9000071fd38 EFLAGS: 00010202
RAX: 6b6b6b6b6b6b6b6b RBX: ffff880067362a88 RCX: 0000000000000003
RDX: ffff880067464178 RSI: ffff880067362a88 RDI: ffff880135ea4418
RBP: ffffc9000071fd40 R08: 0000000000000000 R09: 0000000100180009
R10: ffffc9000071fd38 R11: ffffffff81110800 R12: ffff88006752d3d8
R13: ffff88006752d3d8 R14: ffff88013a108040 R15: 000000000000000a
FS:  0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa8ec1edb00 CR3: 0000000138ee8000 CR4: 00000000001406e0
Call Trace:
 mempool_destroy.part.10+0x21/0x40
 mempool_destroy+0xe/0x10
 blk_exit_rl+0x12/0x20
 blkg_free+0x4d/0xa0
 __blkg_release_rcu+0x59/0x170
 rcu_process_callbacks+0x260/0x4e0
 __do_softirq+0x116/0x250
 smpboot_thread_fn+0x123/0x1e0
 kthread+0x109/0x140
 ret_from_fork+0x31/0x40

Fixes: commit e9c787e ("scsi: allocate scsi_cmnd structures as part of struct request")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
johnv-valve pushed a commit that referenced this pull request Nov 30, 2017
commit ff16567 upstream.

acpi_remove_pm_notifier() ends up calling flush_workqueue() while
holding acpi_pm_notifier_lock, and that same lock is taken by
by the work via acpi_pm_notify_handler(). This can deadlock.

To fix the problem let's split the single lock into two: one to
protect the dev->wakeup between the work vs. add/remove, and
another one to handle notifier installation vs. removal.

After commit a1d1493 "workqueue/lockdep: 'Fix' flush_work()
annotation" I was able to kill the machine (Intel Braswell)
very easily with 'powertop --auto-tune', runtime suspending i915,
and trying to wake it up via the USB keyboard. The cases when
it didn't die are presumably explained by lockdep getting disabled
by something else (cpu hotplug locking issues usually).

Fortunately I still got a lockdep report over netconsole
(trickling in very slowly), even though the machine was
otherwise practically dead:

[  112.179806] ======================================================
[  114.670858] WARNING: possible circular locking dependency detected
[  117.155663] 4.13.0-rc6-bsw-bisect-00169-ga1d14934ea4b #119 Not tainted
[  119.658101] ------------------------------------------------------
[  121.310242] xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command.
[  121.313294] xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead
[  121.313346] xhci_hcd 0000:00:14.0: HC died; cleaning up
[  121.313485] usb 1-6: USB disconnect, device number 3
[  121.313501] usb 1-6.2: USB disconnect, device number 4
[  134.747383] kworker/0:2/47 is trying to acquire lock:
[  137.220790]  (acpi_pm_notifier_lock){+.+.}, at: [<ffffffff813cafdf>] acpi_pm_notify_handler+0x2f/0x80
[  139.721524]
[  139.721524] but task is already holding lock:
[  144.672922]  ((&dpc->work)){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720
[  147.184450]
[  147.184450] which lock already depends on the new lock.
[  147.184450]
[  154.604711]
[  154.604711] the existing dependency chain (in reverse order) is:
[  159.447888]
[  159.447888] -> #2 ((&dpc->work)){+.+.}:
[  164.183486]        __lock_acquire+0x1255/0x13f0
[  166.504313]        lock_acquire+0xb5/0x210
[  168.778973]        process_one_work+0x1b9/0x720
[  171.030316]        worker_thread+0x4c/0x440
[  173.257184]        kthread+0x154/0x190
[  175.456143]        ret_from_fork+0x27/0x40
[  177.624348]
[  177.624348] -> #1 ("kacpi_notify"){+.+.}:
[  181.850351]        __lock_acquire+0x1255/0x13f0
[  183.941695]        lock_acquire+0xb5/0x210
[  186.046115]        flush_workqueue+0xdd/0x510
[  190.408153]        acpi_os_wait_events_complete+0x31/0x40
[  192.625303]        acpi_remove_notify_handler+0x133/0x188
[  194.820829]        acpi_remove_pm_notifier+0x56/0x90
[  196.989068]        acpi_dev_pm_detach+0x5f/0xa0
[  199.145866]        dev_pm_domain_detach+0x27/0x30
[  201.285614]        i2c_device_probe+0x100/0x210
[  203.411118]        driver_probe_device+0x23e/0x310
[  205.522425]        __driver_attach+0xa3/0xb0
[  207.634268]        bus_for_each_dev+0x69/0xa0
[  209.714797]        driver_attach+0x1e/0x20
[  211.778258]        bus_add_driver+0x1bc/0x230
[  213.837162]        driver_register+0x60/0xe0
[  215.868162]        i2c_register_driver+0x42/0x70
[  217.869551]        0xffffffffa0172017
[  219.863009]        do_one_initcall+0x45/0x170
[  221.843863]        do_init_module+0x5f/0x204
[  223.817915]        load_module+0x225b/0x29b0
[  225.757234]        SyS_finit_module+0xc6/0xd0
[  227.661851]        do_syscall_64+0x5c/0x120
[  229.536819]        return_from_SYSCALL_64+0x0/0x7a
[  231.392444]
[  231.392444] -> #0 (acpi_pm_notifier_lock){+.+.}:
[  235.124914]        check_prev_add+0x44e/0x8a0
[  237.024795]        __lock_acquire+0x1255/0x13f0
[  238.937351]        lock_acquire+0xb5/0x210
[  240.840799]        __mutex_lock+0x75/0x940
[  242.709517]        mutex_lock_nested+0x1c/0x20
[  244.551478]        acpi_pm_notify_handler+0x2f/0x80
[  246.382052]        acpi_ev_notify_dispatch+0x44/0x5c
[  248.194412]        acpi_os_execute_deferred+0x14/0x30
[  250.003925]        process_one_work+0x1ec/0x720
[  251.803191]        worker_thread+0x4c/0x440
[  253.605307]        kthread+0x154/0x190
[  255.387498]        ret_from_fork+0x27/0x40
[  257.153175]
[  257.153175] other info that might help us debug this:
[  257.153175]
[  262.324392] Chain exists of:
[  262.324392]   acpi_pm_notifier_lock --> "kacpi_notify" --> (&dpc->work)
[  262.324392]
[  267.391997]  Possible unsafe locking scenario:
[  267.391997]
[  270.758262]        CPU0                    CPU1
[  272.431713]        ----                    ----
[  274.060756]   lock((&dpc->work));
[  275.646532]                                lock("kacpi_notify");
[  277.260772]                                lock((&dpc->work));
[  278.839146]   lock(acpi_pm_notifier_lock);
[  280.391902]
[  280.391902]  *** DEADLOCK ***
[  280.391902]
[  284.986385] 2 locks held by kworker/0:2/47:
[  286.524895]  #0:  ("kacpi_notify"){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720
[  288.112927]  #1:  ((&dpc->work)){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720
[  289.727725]

Fixes: c072530 (ACPI / PM: Revork the handling of ACPI device wakeup notifications)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
johnv-valve pushed a commit that referenced this pull request Nov 30, 2017
commit 8653188 upstream.

Avoid that the following is reported while loading the qla2xxx
kernel module:

BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/783
caller is debug_smp_processor_id+0x17/0x20
CPU: 7 PID: 783 Comm: modprobe Not tainted 4.14.0-rc8-dbg+ #2
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
 dump_stack+0x8e/0xce
 check_preemption_disabled+0xe3/0xf0
 debug_smp_processor_id+0x17/0x20
 qla2x00_probe_one+0xf43/0x26c0 [qla2xxx]
 pci_device_probe+0xca/0x140
 driver_probe_device+0x2e2/0x440
 __driver_attach+0xa3/0xe0
 bus_for_each_dev+0x5f/0x90
 driver_attach+0x19/0x20
 bus_add_driver+0x1c0/0x260
 driver_register+0x5b/0xd0
 __pci_register_driver+0x63/0x70
 qla2x00_module_init+0x1d6/0x222 [qla2xxx]
 do_one_initcall+0x3c/0x163
 do_init_module+0x55/0x1eb
 load_module+0x20a2/0x2890
 SYSC_finit_module+0xd7/0xf0
 SyS_finit_module+0x9/0x10
 entry_SYSCALL_64_fastpath+0x23/0xc2

Fixes: commit 8abfa9e ("scsi: qla2xxx: Add function call to qpair for door bell")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Quinn Tran <quinn.tran@cavium.com>
Cc: Himanshu Madhani <himanshu.madhani@cavium.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
johnv-valve pushed a commit that referenced this pull request Jan 5, 2018
[ Upstream commit 87943db ]

Sai reported a warning during some MBA tests:

[  236.755559] ======================================================
[  236.762443] WARNING: possible circular locking dependency detected
[  236.769328] 4.14.0-rc4-yocto-standard #8 Not tainted
[  236.774857] ------------------------------------------------------
[  236.781738] mount/10091 is trying to acquire lock:
[  236.787071]  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8117f892>] static_key_enable+0x12/0x30
[  236.797058]
               but task is already holding lock:
[  236.803552]  (&type->s_umount_key#37/1){+.+.}, at: [<ffffffff81208b2f>] sget_userns+0x32f/0x520
[  236.813247]
               which lock already depends on the new lock.

[  236.822353]
               the existing dependency chain (in reverse order) is:
[  236.830686]
               -> #4 (&type->s_umount_key#37/1){+.+.}:
[  236.837756]        __lock_acquire+0x1100/0x11a0
[  236.842799]        lock_acquire+0xdf/0x1d0
[  236.847363]        down_write_nested+0x46/0x80
[  236.852310]        sget_userns+0x32f/0x520
[  236.856873]        kernfs_mount_ns+0x7e/0x1f0
[  236.861728]        rdt_mount+0x30c/0x440
[  236.866096]        mount_fs+0x38/0x150
[  236.870262]        vfs_kern_mount+0x67/0x150
[  236.875015]        do_mount+0x1df/0xd50
[  236.879286]        SyS_mount+0x95/0xe0
[  236.883464]        entry_SYSCALL_64_fastpath+0x18/0xad
[  236.889183]
               -> #3 (rdtgroup_mutex){+.+.}:
[  236.895292]        __lock_acquire+0x1100/0x11a0
[  236.900337]        lock_acquire+0xdf/0x1d0
[  236.904899]        __mutex_lock+0x80/0x8f0
[  236.909459]        mutex_lock_nested+0x1b/0x20
[  236.914407]        intel_rdt_online_cpu+0x3b/0x4a0
[  236.919745]        cpuhp_invoke_callback+0xce/0xb80
[  236.925177]        cpuhp_thread_fun+0x1c5/0x230
[  236.930222]        smpboot_thread_fn+0x11a/0x1e0
[  236.935362]        kthread+0x152/0x190
[  236.939536]        ret_from_fork+0x27/0x40
[  236.944097]
               -> #2 (cpuhp_state-up){+.+.}:
[  236.950199]        __lock_acquire+0x1100/0x11a0
[  236.955241]        lock_acquire+0xdf/0x1d0
[  236.959800]        cpuhp_issue_call+0x12e/0x1c0
[  236.964845]        __cpuhp_setup_state_cpuslocked+0x13b/0x2f0
[  236.971242]        __cpuhp_setup_state+0xa7/0x120
[  236.976483]        page_writeback_init+0x43/0x67
[  236.981623]        pagecache_init+0x38/0x3b
[  236.986281]        start_kernel+0x3c6/0x41a
[  236.990931]        x86_64_start_reservations+0x2a/0x2c
[  236.996650]        x86_64_start_kernel+0x72/0x75
[  237.001793]        verify_cpu+0x0/0xfb
[  237.005966]
               -> #1 (cpuhp_state_mutex){+.+.}:
[  237.012364]        __lock_acquire+0x1100/0x11a0
[  237.017408]        lock_acquire+0xdf/0x1d0
[  237.021969]        __mutex_lock+0x80/0x8f0
[  237.026527]        mutex_lock_nested+0x1b/0x20
[  237.031475]        __cpuhp_setup_state_cpuslocked+0x54/0x2f0
[  237.037777]        __cpuhp_setup_state+0xa7/0x120
[  237.043013]        page_alloc_init+0x28/0x30
[  237.047769]        start_kernel+0x148/0x41a
[  237.052425]        x86_64_start_reservations+0x2a/0x2c
[  237.058145]        x86_64_start_kernel+0x72/0x75
[  237.063284]        verify_cpu+0x0/0xfb
[  237.067456]
               -> #0 (cpu_hotplug_lock.rw_sem){++++}:
[  237.074436]        check_prev_add+0x401/0x800
[  237.079286]        __lock_acquire+0x1100/0x11a0
[  237.084330]        lock_acquire+0xdf/0x1d0
[  237.088890]        cpus_read_lock+0x42/0x90
[  237.093548]        static_key_enable+0x12/0x30
[  237.098496]        rdt_mount+0x406/0x440
[  237.102862]        mount_fs+0x38/0x150
[  237.107035]        vfs_kern_mount+0x67/0x150
[  237.111787]        do_mount+0x1df/0xd50
[  237.116058]        SyS_mount+0x95/0xe0
[  237.120233]        entry_SYSCALL_64_fastpath+0x18/0xad
[  237.125952]
               other info that might help us debug this:

[  237.134867] Chain exists of:
                 cpu_hotplug_lock.rw_sem --> rdtgroup_mutex --> &type->s_umount_key#37/1

[  237.148425]  Possible unsafe locking scenario:

[  237.155015]        CPU0                    CPU1
[  237.160057]        ----                    ----
[  237.165100]   lock(&type->s_umount_key#37/1);
[  237.169952]                                lock(rdtgroup_mutex);
[  237.176641]
lock(&type->s_umount_key#37/1);
[  237.184287]   lock(cpu_hotplug_lock.rw_sem);
[  237.189041]
                *** DEADLOCK ***

When the resctrl filesystem is mounted the locks must be acquired in the
same order as was done when the cpus came online:

     cpu_hotplug_lock before rdtgroup_mutex.

This also requires to switch the static_branch_enable() calls to the
_cpulocked variant because now cpu hotplug lock is held already.

[ tglx: Switched to cpus_read_[un]lock ]

Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Acked-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Link: https://lkml.kernel.org/r/9c41b91bc2f47d9e95b62b213ecdb45623c47a9f.1508490116.git.reinette.chatre@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
johnv-valve pushed a commit that referenced this pull request Jan 5, 2018
commit 5c3de77 upstream.

In the function brcmf_sdio_firmware_callback() the driver is
unbound from the sdio function devices in the error path.
However, the order in which it is done resulted in a use-after-free
issue (see brcmf_ops_sdio_remove() in bcmsdh.c). Hence change
the order and first unbind sdio function #2 device and then
unbind sdio function #1 device.

Fixes: 7a51461 ("brcmfmac: unbind all devices upon failure in firmware callback")
Reported-by: Stefan Wahren <stefan.wahren@i2se.com>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
johnv-valve pushed a commit that referenced this pull request Jan 5, 2018
[ Upstream commit f3069c6 ]

This is a fix for syzkaller719569, where memory registration was
attempted without any underlying transport being loaded.

Analysis of the case reveals that it is the setsockopt() RDS_GET_MR
(2) and RDS_GET_MR_FOR_DEST (7) that are vulnerable.

Here is an example stack trace when the bug is hit:

BUG: unable to handle kernel NULL pointer dereference at 00000000000000c0
IP: __rds_rdma_map+0x36/0x440 [rds]
PGD 2f93d03067 P4D 2f93d03067 PUD 2f93d02067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: bridge stp llc tun rpcsec_gss_krb5 nfsv4
dns_resolver nfs fscache rds binfmt_misc sb_edac intel_powerclamp
coretemp kvm_intel kvm irqbypass crct10dif_pclmul c rc32_pclmul
ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd
iTCO_wdt mei_me sg iTCO_vendor_support ipmi_si mei ipmi_devintf nfsd
shpchp pcspkr i2c_i801 ioatd ma ipmi_msghandler wmi lpc_ich mfd_core
auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2
mgag200 i2c_algo_bit drm_kms_helper ixgbe syscopyarea ahci sysfillrect
sysimgblt libahci mdio fb_sys_fops ttm ptp libata sd_mod mlx4_core drm
crc32c_intel pps_core megaraid_sas i2c_core dca dm_mirror
dm_region_hash dm_log dm_mod
CPU: 48 PID: 45787 Comm: repro_set2 Not tainted 4.14.2-3.el7uek.x86_64 #2
Hardware name: Oracle Corporation ORACLE SERVER X5-2L/ASM,MOBO TRAY,2U, BIOS 31110000 03/03/2017
task: ffff882f9190db00 task.stack: ffffc9002b994000
RIP: 0010:__rds_rdma_map+0x36/0x440 [rds]
RSP: 0018:ffffc9002b997df0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff882fa2182580 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffc9002b997e40 RDI: ffff882fa2182580
RBP: ffffc9002b997e30 R08: 0000000000000000 R09: 0000000000000002
R10: ffff885fb29e3838 R11: 0000000000000000 R12: ffff882fa2182580
R13: ffff882fa2182580 R14: 0000000000000002 R15: 0000000020000ffc
FS:  00007fbffa20b700(0000) GS:ffff882fbfb80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000c0 CR3: 0000002f98a66006 CR4: 00000000001606e0
Call Trace:
 rds_get_mr+0x56/0x80 [rds]
 rds_setsockopt+0x172/0x340 [rds]
 ? __fget_light+0x25/0x60
 ? __fdget+0x13/0x20
 SyS_setsockopt+0x80/0xe0
 do_syscall_64+0x67/0x1b0
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7fbff9b117f9
RSP: 002b:00007fbffa20aed8 EFLAGS: 00000293 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00000000000c84a4 RCX: 00007fbff9b117f9
RDX: 0000000000000002 RSI: 0000400000000114 RDI: 000000000000109b
RBP: 00007fbffa20af10 R08: 0000000000000020 R09: 00007fbff9dd7860
R10: 0000000020000ffc R11: 0000000000000293 R12: 0000000000000000
R13: 00007fbffa20b9c0 R14: 00007fbffa20b700 R15: 0000000000000021

Code: 41 56 41 55 49 89 fd 41 54 53 48 83 ec 18 8b 87 f0 02 00 00 48
89 55 d0 48 89 4d c8 85 c0 0f 84 2d 03 00 00 48 8b 87 00 03 00 00 <48>
83 b8 c0 00 00 00 00 0f 84 25 03 00 0 0 48 8b 06 48 8b 56 08

The fix is to check the existence of an underlying transport in
__rds_rdma_map().

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
johnv-valve pushed a commit that referenced this pull request Jan 5, 2018
[ Upstream commit 36b6f9f ]

Lockdep warns about a potential deadlock:

[   66.782842] ======================================================
[   66.782888] WARNING: possible circular locking dependency detected
[   66.782937] 4.14.0-rc2-test-test+ #48 Not tainted
[   66.782983] ------------------------------------------------------
[   66.783052] umount/336 is trying to acquire lock:
[   66.783117]  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff81032395>] rdt_kill_sb+0x215/0x390
[   66.783193]
               but task is already holding lock:
[   66.783244]  (rdtgroup_mutex){+.+.}, at: [<ffffffff810321b6>] rdt_kill_sb+0x36/0x390
[   66.783305]
               which lock already depends on the new lock.

[   66.783364]
               the existing dependency chain (in reverse order) is:
[   66.783419]
               -> #3 (rdtgroup_mutex){+.+.}:
[   66.783467]        __lock_acquire+0x1293/0x13f0
[   66.783509]        lock_acquire+0xaf/0x220
[   66.783543]        __mutex_lock+0x71/0x9b0
[   66.783575]        mutex_lock_nested+0x1b/0x20
[   66.783610]        intel_rdt_online_cpu+0x3b/0x430
[   66.783649]        cpuhp_invoke_callback+0xab/0x8e0
[   66.783687]        cpuhp_thread_fun+0x7a/0x150
[   66.783722]        smpboot_thread_fn+0x1cc/0x270
[   66.783764]        kthread+0x16e/0x190
[   66.783794]        ret_from_fork+0x27/0x40
[   66.783825]
               -> #2 (cpuhp_state){+.+.}:
[   66.783870]        __lock_acquire+0x1293/0x13f0
[   66.783906]        lock_acquire+0xaf/0x220
[   66.783938]        cpuhp_issue_call+0x102/0x170
[   66.783974]        __cpuhp_setup_state_cpuslocked+0x154/0x2a0
[   66.784023]        __cpuhp_setup_state+0xc7/0x170
[   66.784061]        page_writeback_init+0x43/0x67
[   66.784097]        pagecache_init+0x43/0x4a
[   66.784131]        start_kernel+0x3ad/0x3f7
[   66.784165]        x86_64_start_reservations+0x2a/0x2c
[   66.784204]        x86_64_start_kernel+0x72/0x75
[   66.784241]        verify_cpu+0x0/0xfb
[   66.784270]
               -> #1 (cpuhp_state_mutex){+.+.}:
[   66.784319]        __lock_acquire+0x1293/0x13f0
[   66.784355]        lock_acquire+0xaf/0x220
[   66.784387]        __mutex_lock+0x71/0x9b0
[   66.784419]        mutex_lock_nested+0x1b/0x20
[   66.784454]        __cpuhp_setup_state_cpuslocked+0x52/0x2a0
[   66.784497]        __cpuhp_setup_state+0xc7/0x170
[   66.784535]        page_alloc_init+0x28/0x30
[   66.784569]        start_kernel+0x148/0x3f7
[   66.784602]        x86_64_start_reservations+0x2a/0x2c
[   66.784642]        x86_64_start_kernel+0x72/0x75
[   66.784678]        verify_cpu+0x0/0xfb
[   66.784707]
               -> #0 (cpu_hotplug_lock.rw_sem){++++}:
[   66.784759]        check_prev_add+0x32f/0x6e0
[   66.784794]        __lock_acquire+0x1293/0x13f0
[   66.784830]        lock_acquire+0xaf/0x220
[   66.784863]        cpus_read_lock+0x3d/0xb0
[   66.784896]        rdt_kill_sb+0x215/0x390
[   66.784930]        deactivate_locked_super+0x3e/0x70
[   66.784968]        deactivate_super+0x40/0x60
[   66.785003]        cleanup_mnt+0x3f/0x80
[   66.785034]        __cleanup_mnt+0x12/0x20
[   66.785070]        task_work_run+0x8b/0xc0
[   66.785103]        exit_to_usermode_loop+0x94/0xa0
[   66.786804]        syscall_return_slowpath+0xe8/0x150
[   66.788502]        entry_SYSCALL_64_fastpath+0xab/0xad
[   66.790194]
               other info that might help us debug this:

[   66.795139] Chain exists of:
                 cpu_hotplug_lock.rw_sem --> cpuhp_state --> rdtgroup_mutex

[   66.800035]  Possible unsafe locking scenario:

[   66.803267]        CPU0                    CPU1
[   66.804867]        ----                    ----
[   66.806443]   lock(rdtgroup_mutex);
[   66.808002]                                lock(cpuhp_state);
[   66.809565]                                lock(rdtgroup_mutex);
[   66.811110]   lock(cpu_hotplug_lock.rw_sem);
[   66.812608]
                *** DEADLOCK ***

[   66.816983] 2 locks held by umount/336:
[   66.818418]  #0:  (&type->s_umount_key#35){+.+.}, at: [<ffffffff81229738>] deactivate_super+0x38/0x60
[   66.819922]  #1:  (rdtgroup_mutex){+.+.}, at: [<ffffffff810321b6>] rdt_kill_sb+0x36/0x390

When the resctrl filesystem is unmounted the locks should be obtain in the
locks in the same order as was done when the cpus came online:

      cpu_hotplug_lock before rdtgroup_mutex.

This also requires to switch the static_branch_disable() calls to the
_cpulocked variant because now cpu hotplug lock is held already.

[ tglx: Switched to cpus_read_[un]lock ]

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Acked-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/cc292e76be073f7260604651711c47b09fd0dc81.1508490116.git.reinette.chatre@intel.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants