Minor improvement for smsc95xx netusb driver performance. #139

Open: wants to merge 1 commit into master

SiarheiVolkau (Contributor)

Reduce the number of memcpy calls by 1-2, which improves transmit performance by 2-4% or reduces CPU usage by a comparable amount.

miaoxie pushed a commit to miaoxie/linux-btrfs that referenced this pull request Nov 26, 2014
Corrects the following checkpatch gripes:

    WARNING: quoted string split across lines
    #95: FILE: drivers/mfd/ab3100-core.c:95:
    +			"write error (write register) "
    +			"%d bytes transferred (expected 2)\n",

    WARNING: quoted string split across lines
    #139: FILE: drivers/mfd/ab3100-core.c:139:
    +			"write error (write test register) "
    +			"%d bytes transferred (expected 2)\n",

    WARNING: quoted string split across lines
    #175: FILE: drivers/mfd/ab3100-core.c:175:
    +			"write error (send register address) "
    +			"%d bytes transferred (expected 1)\n",

    WARNING: quoted string split across lines
    #193: FILE: drivers/mfd/ab3100-core.c:193:
    +			"write error (read register) "
    +			"%d bytes transferred (expected 1)\n",

    WARNING: quoted string split across lines
    #241: FILE: drivers/mfd/ab3100-core.c:241:
    +			"write error (send first register address) "
    +			"%d bytes transferred (expected 1)\n",

    WARNING: quoted string split across lines
    #256: FILE: drivers/mfd/ab3100-core.c:256:
    +			"write error (read register page) "
    +			"%d bytes transferred (expected %d)\n",

    WARNING: quoted string split across lines
    #299: FILE: drivers/mfd/ab3100-core.c:299:
    +			"write error (maskset send address) "
    +			"%d bytes transferred (expected 1)\n",

    WARNING: quoted string split across lines
    #314: FILE: drivers/mfd/ab3100-core.c:314:
    +			"write error (maskset read register) "
    +			"%d bytes transferred (expected 1)\n",

    WARNING: quoted string split across lines
    #334: FILE: drivers/mfd/ab3100-core.c:334:
    +			"write error (write register) "
    +			"%d bytes transferred (expected 2)\n",

    WARNING: please, no spaces at the start of a line
    #374: FILE: drivers/mfd/ab3100-core.c:374:
    +  return blocking_notifier_chain_unregister(&ab3100->event_subscribers,$

    WARNING: Prefer seq_puts to seq_printf
    #458: FILE: drivers/mfd/ab3100-core.c:458:
    +	seq_printf(s, "AB3100 registers:\n");

    WARNING: quoted string split across lines
    #564: FILE: drivers/mfd/ab3100-core.c:564:
    +			 "debug write reg[0x%02x] with 0x%02x, "
    +			 "after readback: 0x%02x\n",

    WARNING: quoted string split across lines
    #723: FILE: drivers/mfd/ab3100-core.c:723:
    +			 "AB3100 P1E variant detected, "
    +			 "forcing chip to 32KHz\n");

    WARNING: quoted string split across lines
    #882: FILE: drivers/mfd/ab3100-core.c:882:
    +			"could not communicate with the AB3100 analog "
    +			"baseband chip\n");

    WARNING: quoted string split across lines
    #906: FILE: drivers/mfd/ab3100-core.c:906:
    +		dev_err(&client->dev, "accepting it anyway. Please update "
    +			"the driver.\n");

    total: 0 errors, 15 warnings, 999 lines checked

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
miaoxie pushed a commit to miaoxie/linux-btrfs that referenced this pull request Nov 27, 2014
If EIO happens after we have dropped j_state_lock, we won't notice
that the journal has been aborted.  So it is reasonable to move this
check after we have grabbed the j_checkpoint_mutex and re-grabbed the
j_state_lock.  This patch helps to prevent a false-positive complaint
after EIO.

#DMESG:
__jbd2_log_wait_for_space: needed 8448 blocks and only had 8386 space available
__jbd2_log_wait_for_space: no way to get more journal space in ram1-8
------------[ cut here ]------------
WARNING: CPU: 15 PID: 6739 at fs/jbd2/checkpoint.c:168 __jbd2_log_wait_for_space+0x188/0x200()
Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod
CPU: 15 PID: 6739 Comm: fsstress Tainted: G        W      3.17.0-rc2-00429-g684de57 #139
Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
 00000000000000a8 ffff88077aaab878 ffffffff815c1a8c 00000000000000a8
 0000000000000000 ffff88077aaab8b8 ffffffff8106ce8c ffff88077aaab898
 ffff8807c57e6000 ffff8807c57e6028 0000000000002100 ffff8807c57e62f0
Call Trace:
 [<ffffffff815c1a8c>] dump_stack+0x51/0x6d
 [<ffffffff8106ce8c>] warn_slowpath_common+0x8c/0xc0
 [<ffffffff8106ceda>] warn_slowpath_null+0x1a/0x20
 [<ffffffff812419f8>] __jbd2_log_wait_for_space+0x188/0x200
 [<ffffffff8123be9a>] start_this_handle+0x4da/0x7b0
 [<ffffffff810990e5>] ? local_clock+0x25/0x30
 [<ffffffff810aba87>] ? lockdep_init_map+0xe7/0x180
 [<ffffffff8123c5bc>] jbd2__journal_start+0xdc/0x1d0
 [<ffffffff811f2414>] ? __ext4_new_inode+0x7f4/0x1330
 [<ffffffff81222a38>] __ext4_journal_start_sb+0xf8/0x110
 [<ffffffff811f2414>] __ext4_new_inode+0x7f4/0x1330
 [<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190
 [<ffffffff812025bb>] ext4_create+0x8b/0x150
 [<ffffffff8117fe3b>] vfs_create+0x7b/0xb0
 [<ffffffff8118097b>] do_last+0x7db/0xcf0
 [<ffffffff8117e31d>] ? inode_permission+0x4d/0x50
 [<ffffffff811845d2>] path_openat+0x242/0x590
 [<ffffffff81191a76>] ? __alloc_fd+0x36/0x140
 [<ffffffff81184a6a>] do_filp_open+0x4a/0xb0
 [<ffffffff81191b61>] ? __alloc_fd+0x121/0x140
 [<ffffffff81172f20>] do_sys_open+0x170/0x220
 [<ffffffff8117300e>] SyS_open+0x1e/0x20
 [<ffffffff811715d6>] SyS_creat+0x16/0x20
 [<ffffffff815c7e12>] system_call_fastpath+0x16/0x1b
---[ end trace cd71c831f82059db ]---

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Elizafox commented Jan 8, 2015

Linus doesn't accept pull requests from GitHub. Consult https://github.com/torvalds/linux/blob/master/Documentation/HOWTO instead.

nhoriguchi pushed a commit to nhoriguchi/linux that referenced this pull request May 11, 2015
Stress testing that injects soft offlining events into a process iterating an
"mmap-pagefault-munmap" loop can trigger a BUG_ON in __free_one_page due to
the PageHWPoison flag.
If page migration succeeds, the source page is supposed to be freed after
migration. But it seems that a page can end up in a strange state where it is
almost free yet still reachable via drain_pages_zone(), maybe due to a race
between __pagevec_lru_add_fn and putback_lru_page. The result can be:

  [   14.025761] Soft offlining page 0x70fe1 at 0x70100008d000
  [   14.029400] Soft offlining page 0x705fb at 0x70300008d000
  [   14.030208] page:ffffea0001c3f840 count:0 mapcount:0 mapping:          (null) index:0x2
  [   14.031186] flags: 0x1fffff80800000(hwpoison)
  [   14.031186] page dumped because: VM_BUG_ON_PAGE(page->flags & ((1 << 25) - 1))
  [   14.031186] ------------[ cut here ]------------
  [   14.031186] kernel BUG at /src/linux-dev/mm/page_alloc.c:585!
  [   14.031186] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
  [   14.031186] Modules linked in: cfg80211 rfkill crc32c_intel microcode ppdev parport_pc pcspkr serio_raw virtio_balloon parport i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy
  [   14.031186] CPU: 3 PID: 1779 Comm: test_base_madv_ Not tainted 4.0.0-v4.0-150511-1451-00009-g82360a3730e6 #139
  [   14.031186] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  [   14.031186] task: ffff88007d33bae0 ti: ffff88007a114000 task.ti: ffff88007a114000
  [   14.031186] RIP: 0010:[<ffffffff811a806a>]  [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0
  [   14.031186] RSP: 0018:ffff88007a117d28  EFLAGS: 00010096
  [   14.031186] RAX: 0000000000000042 RBX: ffffea0001c3f860 RCX: 0000000000000006
  [   14.031186] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88011f50d3d0
  [   14.031186] RBP: ffff88007a117da8 R08: 000000000000000a R09: 00000000fffffffe
  [   14.031186] R10: 0000000000001d3e R11: 0000000000000002 R12: 0000000000070fe1
  [   14.031186] R13: 0000000000000000 R14: 0000000000000000 R15: ffffea0001c3f840
  [   14.031186] FS:  00007f8a8e3e1740(0000) GS:ffff88011f500000(0000) knlGS:0000000000000000
  [   14.031186] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  [   14.031186] CR2: 00007f78c7341258 CR3: 000000007bb08000 CR4: 00000000000007e0
  [   14.031186] Stack:
  [   14.031186]  ffff88011f5189c8 ffff88011f5189b8 ffffea0001c3f840 ffff88011f518998
  [   14.031186]  ffffea0001d30cc0 0000001200000002 0000000200000012 0000000000000003
  [   14.031186]  ffff88007ffda6c0 000000000000000a ffff88007a117dd8 ffff88011f518998
  [   14.031186] Call Trace:
  [   14.031186]  [<ffffffff811a8380>] ? page_alloc_cpu_notify+0x50/0x50
  [   14.031186]  [<ffffffff811a82bd>] drain_pages_zone+0x3d/0x50
  [   14.031186]  [<ffffffff811a839d>] drain_local_pages+0x1d/0x30
  [   14.031186]  [<ffffffff81122a96>] on_each_cpu_mask+0x46/0x80
  [   14.031186]  [<ffffffff811a5e8b>] drain_all_pages+0x14b/0x1e0
  [   14.031186]  [<ffffffff812151a2>] soft_offline_page+0x432/0x6e0
  [   14.031186]  [<ffffffff811e2dac>] SyS_madvise+0x73c/0x780
  [   14.031186]  [<ffffffff810dcb3f>] ? put_prev_task_fair+0x2f/0x50
  [   14.031186]  [<ffffffff81143f74>] ? __audit_syscall_entry+0xc4/0x120
  [   14.031186]  [<ffffffff8105bdac>] ? do_audit_syscall_entry+0x6c/0x70
  [   14.031186]  [<ffffffff8105cc63>] ? syscall_trace_enter_phase1+0x103/0x170
  [   14.031186]  [<ffffffff816f908e>] ? int_check_syscall_exit_work+0x34/0x3d
  [   14.031186]  [<ffffffff816f8e72>] system_call_fastpath+0x12/0x17
  [   14.031186] Code: ff 89 45 b4 48 8b 45 c0 48 83 b8 a8 00 00 00 00 0f 85 e3 fb ff ff 0f 1f 00 0f 0b 48 8b 7d 90 48 c7 c6 e8 95 a6 81 e8 e6 32 02 00 <0f> 0b 8b 45 cc 49 89 47 30 41 8b 47 18 83 f8 ff 0f 85 10 ff ff
  [   14.031186] RIP  [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0
  [   14.031186]  RSP <ffff88007a117d28>
  [   14.031186] ---[ end trace 53926436e76d1f35 ]---

ate_pages()
This looks strange to me because soft offline frees the target page first and
then marks it PageHWPoison. We call drain_all_pages() ...

Currently hard offline (memory_failure()) and soft offline isolate the in-use
target page differently. Hard offline keeps its refcount (so it intentionally
leaks error pages), but soft offline frees it and then marks PageHWPoison.
This "free then mark PageHWPoison" behavior sometimes fails because of the
pcplist (commit 9ab3b59 "mm: hwpoison: drop lru_add_drain_all() in
__soft_offline_page()" refers to this, but unfortunately it didn't solve the
problem). That's because page freeing from drain_pages_zone() can't handle a
hwpoisoned page, and the page is still on a pcplist after page migration with
"freed" status (so drain_all_pages() doesn't help). I don't have a clear idea
for a fix in the freeing code, but I think we should avoid freeing hwpoisoned
pages, as the hard offline code does. This patch does that.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
nhoriguchi pushed a commit to nhoriguchi/linux that referenced this pull request May 11, 2015
Stress testing showed that soft offline events for a process iterating an
"mmap-pagefault-munmap" loop can trigger VM_BUG_ON(PAGE_FLAGS_CHECK_AT_PREP)
in __free_one_page():

  [   14.025761] Soft offlining page 0x70fe1 at 0x70100008d000
  [   14.029400] Soft offlining page 0x705fb at 0x70300008d000
  [   14.030208] page:ffffea0001c3f840 count:0 mapcount:0 mapping:          (null) index:0x2
  [   14.031186] flags: 0x1fffff80800000(hwpoison)
  [   14.031186] page dumped because: VM_BUG_ON_PAGE(page->flags & ((1 << 25) - 1))
  [   14.031186] ------------[ cut here ]------------
  [   14.031186] kernel BUG at /src/linux-dev/mm/page_alloc.c:585!
  [   14.031186] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
  [   14.031186] Modules linked in: cfg80211 rfkill crc32c_intel microcode ppdev parport_pc pcspkr serio_raw virtio_balloon parport i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy
  [   14.031186] CPU: 3 PID: 1779 Comm: test_base_madv_ Not tainted 4.0.0-v4.0-150511-1451-00009-g82360a3730e6 #139
  [   14.031186] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  [   14.031186] task: ffff88007d33bae0 ti: ffff88007a114000 task.ti: ffff88007a114000
  [   14.031186] RIP: 0010:[<ffffffff811a806a>]  [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0
  [   14.031186] RSP: 0018:ffff88007a117d28  EFLAGS: 00010096
  [   14.031186] RAX: 0000000000000042 RBX: ffffea0001c3f860 RCX: 0000000000000006
  [   14.031186] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88011f50d3d0
  [   14.031186] RBP: ffff88007a117da8 R08: 000000000000000a R09: 00000000fffffffe
  [   14.031186] R10: 0000000000001d3e R11: 0000000000000002 R12: 0000000000070fe1
  [   14.031186] R13: 0000000000000000 R14: 0000000000000000 R15: ffffea0001c3f840
  [   14.031186] FS:  00007f8a8e3e1740(0000) GS:ffff88011f500000(0000) knlGS:0000000000000000
  [   14.031186] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  [   14.031186] CR2: 00007f78c7341258 CR3: 000000007bb08000 CR4: 00000000000007e0
  [   14.031186] Stack:
  [   14.031186]  ffff88011f5189c8 ffff88011f5189b8 ffffea0001c3f840 ffff88011f518998
  [   14.031186]  ffffea0001d30cc0 0000001200000002 0000000200000012 0000000000000003
  [   14.031186]  ffff88007ffda6c0 000000000000000a ffff88007a117dd8 ffff88011f518998
  [   14.031186] Call Trace:
  [   14.031186]  [<ffffffff811a8380>] ? page_alloc_cpu_notify+0x50/0x50
  [   14.031186]  [<ffffffff811a82bd>] drain_pages_zone+0x3d/0x50
  [   14.031186]  [<ffffffff811a839d>] drain_local_pages+0x1d/0x30
  [   14.031186]  [<ffffffff81122a96>] on_each_cpu_mask+0x46/0x80
  [   14.031186]  [<ffffffff811a5e8b>] drain_all_pages+0x14b/0x1e0
  [   14.031186]  [<ffffffff812151a2>] soft_offline_page+0x432/0x6e0
  [   14.031186]  [<ffffffff811e2dac>] SyS_madvise+0x73c/0x780
  [   14.031186]  [<ffffffff810dcb3f>] ? put_prev_task_fair+0x2f/0x50
  [   14.031186]  [<ffffffff81143f74>] ? __audit_syscall_entry+0xc4/0x120
  [   14.031186]  [<ffffffff8105bdac>] ? do_audit_syscall_entry+0x6c/0x70
  [   14.031186]  [<ffffffff8105cc63>] ? syscall_trace_enter_phase1+0x103/0x170
  [   14.031186]  [<ffffffff816f908e>] ? int_check_syscall_exit_work+0x34/0x3d
  [   14.031186]  [<ffffffff816f8e72>] system_call_fastpath+0x12/0x17
  [   14.031186] Code: ff 89 45 b4 48 8b 45 c0 48 83 b8 a8 00 00 00 00 0f 85 e3 fb ff ff 0f 1f 00 0f 0b 48 8b 7d 90 48 c7 c6 e8 95 a6 81 e8 e6 32 02 00 <0f> 0b 8b 45 cc 49 89 47 30 41 8b 47 18 83 f8 ff 0f 85 10 ff ff
  [   14.031186] RIP  [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0
  [   14.031186]  RSP <ffff88007a117d28>
  [   14.031186] ---[ end trace 53926436e76d1f35 ]---

When soft offline successfully migrates a page, the source page is supposed to
be freed. But there is a race condition where a source page looks isolated
(i.e. its refcount is 0 and PageHWPoison is set) but is still somehow linked to
a pcplist. Another soft offline event then calls drain_all_pages() and tries to
free such a hwpoisoned page, which is forbidden.

This odd page state seems to result from a race between put_page() in
putback_lru_page() and __pagevec_lru_add_fn(). But I don't want to tweak the
drain code, as was done in commit 9ab3b59 "mm: hwpoison: drop
lru_add_drain_all() in __soft_offline_page()", or to change the page freeing
code just for soft offline's purposes.

Instead, let's consider the difference between hard offline and soft offline.
They isolate the in-use page in interestingly different ways: hard offline
first marks the target page PageHWPoison and avoids freeing it by keeping its
refcount at 1. OTOH, soft offline tries to free the target page and then marks
it PageHWPoison. This difference may be the source of complexity and of bugs
like the one above. So making soft offline isolate the page while keeping its
refcount can solve this problem.

We can pass the page migration code a "reason" that identifies the caller, so
skipping putback_lru_page() when called from soft offline does what we need.
With this change, soft offline target pages are never reused, even without
changing their migratetype, so this patch also removes the migratetype change.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
nhoriguchi pushed a commit to nhoriguchi/linux that referenced this pull request May 12, 2015
Stress testing showed that soft offline events for a process iterating an
"mmap-pagefault-munmap" loop can trigger VM_BUG_ON(PAGE_FLAGS_CHECK_AT_PREP)
in __free_one_page():

  [   14.025761] Soft offlining page 0x70fe1 at 0x70100008d000
  [   14.029400] Soft offlining page 0x705fb at 0x70300008d000
  [   14.030208] page:ffffea0001c3f840 count:0 mapcount:0 mapping:          (null) index:0x2
  [   14.031186] flags: 0x1fffff80800000(hwpoison)
  [   14.031186] page dumped because: VM_BUG_ON_PAGE(page->flags & ((1 << 25) - 1))
  [   14.031186] ------------[ cut here ]------------
  [   14.031186] kernel BUG at /src/linux-dev/mm/page_alloc.c:585!
  [   14.031186] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
  [   14.031186] Modules linked in: cfg80211 rfkill crc32c_intel microcode ppdev parport_pc pcspkr serio_raw virtio_balloon parport i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy
  [   14.031186] CPU: 3 PID: 1779 Comm: test_base_madv_ Not tainted 4.0.0-v4.0-150511-1451-00009-g82360a3730e6 #139
  [   14.031186] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  [   14.031186] task: ffff88007d33bae0 ti: ffff88007a114000 task.ti: ffff88007a114000
  [   14.031186] RIP: 0010:[<ffffffff811a806a>]  [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0
  [   14.031186] RSP: 0018:ffff88007a117d28  EFLAGS: 00010096
  [   14.031186] RAX: 0000000000000042 RBX: ffffea0001c3f860 RCX: 0000000000000006
  [   14.031186] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88011f50d3d0
  [   14.031186] RBP: ffff88007a117da8 R08: 000000000000000a R09: 00000000fffffffe
  [   14.031186] R10: 0000000000001d3e R11: 0000000000000002 R12: 0000000000070fe1
  [   14.031186] R13: 0000000000000000 R14: 0000000000000000 R15: ffffea0001c3f840
  [   14.031186] FS:  00007f8a8e3e1740(0000) GS:ffff88011f500000(0000) knlGS:0000000000000000
  [   14.031186] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  [   14.031186] CR2: 00007f78c7341258 CR3: 000000007bb08000 CR4: 00000000000007e0
  [   14.031186] Stack:
  [   14.031186]  ffff88011f5189c8 ffff88011f5189b8 ffffea0001c3f840 ffff88011f518998
  [   14.031186]  ffffea0001d30cc0 0000001200000002 0000000200000012 0000000000000003
  [   14.031186]  ffff88007ffda6c0 000000000000000a ffff88007a117dd8 ffff88011f518998
  [   14.031186] Call Trace:
  [   14.031186]  [<ffffffff811a8380>] ? page_alloc_cpu_notify+0x50/0x50
  [   14.031186]  [<ffffffff811a82bd>] drain_pages_zone+0x3d/0x50
  [   14.031186]  [<ffffffff811a839d>] drain_local_pages+0x1d/0x30
  [   14.031186]  [<ffffffff81122a96>] on_each_cpu_mask+0x46/0x80
  [   14.031186]  [<ffffffff811a5e8b>] drain_all_pages+0x14b/0x1e0
  [   14.031186]  [<ffffffff812151a2>] soft_offline_page+0x432/0x6e0
  [   14.031186]  [<ffffffff811e2dac>] SyS_madvise+0x73c/0x780
  [   14.031186]  [<ffffffff810dcb3f>] ? put_prev_task_fair+0x2f/0x50
  [   14.031186]  [<ffffffff81143f74>] ? __audit_syscall_entry+0xc4/0x120
  [   14.031186]  [<ffffffff8105bdac>] ? do_audit_syscall_entry+0x6c/0x70
  [   14.031186]  [<ffffffff8105cc63>] ? syscall_trace_enter_phase1+0x103/0x170
  [   14.031186]  [<ffffffff816f908e>] ? int_check_syscall_exit_work+0x34/0x3d
  [   14.031186]  [<ffffffff816f8e72>] system_call_fastpath+0x12/0x17
  [   14.031186] Code: ff 89 45 b4 48 8b 45 c0 48 83 b8 a8 00 00 00 00 0f 85 e3 fb ff ff 0f 1f 00 0f 0b 48 8b 7d 90 48 c7 c6 e8 95 a6 81 e8 e6 32 02 00 <0f> 0b 8b 45 cc 49 89 47 30 41 8b 47 18 83 f8 ff 0f 85 10 ff ff
  [   14.031186] RIP  [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0
  [   14.031186]  RSP <ffff88007a117d28>
  [   14.031186] ---[ end trace 53926436e76d1f35 ]---

When soft offline successfully migrates a page, the source page is supposed to
be freed. But there is a race condition where a source page looks isolated
(i.e. its refcount is 0 and PageHWPoison is set) but is still somehow linked to
a pcplist. Another soft offline event then calls drain_all_pages() and tries to
free such a hwpoisoned page, which is forbidden.

This odd page state seems to result from a race between put_page() in
putback_lru_page() and __pagevec_lru_add_fn(). But I don't want to tweak the
drain code, as was done in commit 9ab3b59 "mm: hwpoison: drop
lru_add_drain_all() in __soft_offline_page()", or to change the page freeing
code just for soft offline's purposes.

Instead, let's consider the difference between hard offline and soft offline.
They isolate the in-use page in interestingly different ways: hard offline
first marks the target page PageHWPoison and avoids freeing it by keeping its
refcount at 1. OTOH, soft offline tries to free the target page and then marks
it PageHWPoison. This difference may be the source of complexity and of bugs
like the one above. So making soft offline isolate the page while keeping its
refcount can solve this problem.

We can pass the page migration code a "reason" that identifies the caller, so
let's make more use of it and skip putback_lru_page() when called from soft
offline, which effectively isolates the page for soft offline. With this
change, soft offline target pages are never reused, even without changing
their migratetype, so this patch also removes the related code.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
nhoriguchi pushed a commit to nhoriguchi/linux that referenced this pull request May 14, 2015
Stress testing showed that soft offline events for a process iterating an
"mmap-pagefault-munmap" loop can trigger VM_BUG_ON(PAGE_FLAGS_CHECK_AT_PREP)
in __free_one_page():

  [   14.025761] Soft offlining page 0x70fe1 at 0x70100008d000
  [   14.029400] Soft offlining page 0x705fb at 0x70300008d000
  [   14.030208] page:ffffea0001c3f840 count:0 mapcount:0 mapping:          (null) index:0x2
  [   14.031186] flags: 0x1fffff80800000(hwpoison)
  [   14.031186] page dumped because: VM_BUG_ON_PAGE(page->flags & ((1 << 25) - 1))
  [   14.031186] ------------[ cut here ]------------
  [   14.031186] kernel BUG at /src/linux-dev/mm/page_alloc.c:585!
  [   14.031186] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
  [   14.031186] Modules linked in: cfg80211 rfkill crc32c_intel microcode ppdev parport_pc pcspkr serio_raw virtio_balloon parport i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy
  [   14.031186] CPU: 3 PID: 1779 Comm: test_base_madv_ Not tainted 4.0.0-v4.0-150511-1451-00009-g82360a3730e6 #139
  [   14.031186] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  [   14.031186] task: ffff88007d33bae0 ti: ffff88007a114000 task.ti: ffff88007a114000
  [   14.031186] RIP: 0010:[<ffffffff811a806a>]  [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0
  [   14.031186] RSP: 0018:ffff88007a117d28  EFLAGS: 00010096
  [   14.031186] RAX: 0000000000000042 RBX: ffffea0001c3f860 RCX: 0000000000000006
  [   14.031186] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88011f50d3d0
  [   14.031186] RBP: ffff88007a117da8 R08: 000000000000000a R09: 00000000fffffffe
  [   14.031186] R10: 0000000000001d3e R11: 0000000000000002 R12: 0000000000070fe1
  [   14.031186] R13: 0000000000000000 R14: 0000000000000000 R15: ffffea0001c3f840
  [   14.031186] FS:  00007f8a8e3e1740(0000) GS:ffff88011f500000(0000) knlGS:0000000000000000
  [   14.031186] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  [   14.031186] CR2: 00007f78c7341258 CR3: 000000007bb08000 CR4: 00000000000007e0
  [   14.031186] Stack:
  [   14.031186]  ffff88011f5189c8 ffff88011f5189b8 ffffea0001c3f840 ffff88011f518998
  [   14.031186]  ffffea0001d30cc0 0000001200000002 0000000200000012 0000000000000003
  [   14.031186]  ffff88007ffda6c0 000000000000000a ffff88007a117dd8 ffff88011f518998
  [   14.031186] Call Trace:
  [   14.031186]  [<ffffffff811a8380>] ? page_alloc_cpu_notify+0x50/0x50
  [   14.031186]  [<ffffffff811a82bd>] drain_pages_zone+0x3d/0x50
  [   14.031186]  [<ffffffff811a839d>] drain_local_pages+0x1d/0x30
  [   14.031186]  [<ffffffff81122a96>] on_each_cpu_mask+0x46/0x80
  [   14.031186]  [<ffffffff811a5e8b>] drain_all_pages+0x14b/0x1e0
  [   14.031186]  [<ffffffff812151a2>] soft_offline_page+0x432/0x6e0
  [   14.031186]  [<ffffffff811e2dac>] SyS_madvise+0x73c/0x780
  [   14.031186]  [<ffffffff810dcb3f>] ? put_prev_task_fair+0x2f/0x50
  [   14.031186]  [<ffffffff81143f74>] ? __audit_syscall_entry+0xc4/0x120
  [   14.031186]  [<ffffffff8105bdac>] ? do_audit_syscall_entry+0x6c/0x70
  [   14.031186]  [<ffffffff8105cc63>] ? syscall_trace_enter_phase1+0x103/0x170
  [   14.031186]  [<ffffffff816f908e>] ? int_check_syscall_exit_work+0x34/0x3d
  [   14.031186]  [<ffffffff816f8e72>] system_call_fastpath+0x12/0x17
  [   14.031186] Code: ff 89 45 b4 48 8b 45 c0 48 83 b8 a8 00 00 00 00 0f 85 e3 fb ff ff 0f 1f 00 0f 0b 48 8b 7d 90 48 c7 c6 e8 95 a6 81 e8 e6 32 02 00 <0f> 0b 8b 45 cc 49 89 47 30 41 8b 47 18 83 f8 ff 0f 85 10 ff ff
  [   14.031186] RIP  [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0
  [   14.031186]  RSP <ffff88007a117d28>
  [   14.031186] ---[ end trace 53926436e76d1f35 ]---

When soft offline successfully migrates a page, the source page is supposed to
be freed. But there is a race condition where a source page looks isolated
(i.e. its refcount is 0 and PageHWPoison is set) but is still somehow linked to
a pcplist. Another soft offline event then calls drain_all_pages() and tries to
free such a hwpoisoned page, which is forbidden.

This odd page state seems to result from a race between put_page() in
putback_lru_page() and __pagevec_lru_add_fn(). But I don't want to tweak the
drain code, as was done in commit 9ab3b59 "mm: hwpoison: drop
lru_add_drain_all() in __soft_offline_page()", or to change the page freeing
code just for soft offline's purposes.

Instead, let's consider the difference between hard offline and soft offline.
They isolate the in-use page in interestingly different ways: hard offline
first marks the target page PageHWPoison and avoids freeing it by keeping its
refcount at 1. OTOH, soft offline tries to free the target page and then marks
it PageHWPoison. This difference may be the source of complexity and of bugs
like the one above. So making soft offline isolate the page while keeping its
refcount can solve this problem.

We can pass the page migration code a "reason" that identifies the caller, so
let's make more use of it and skip putback_lru_page() when called from soft
offline, which effectively isolates the page for soft offline. With this
change, soft offline target pages are never reused, even without changing
their migratetype, so this patch also removes the related code.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
ddstreet pushed a commit to ddstreet/linux that referenced this pull request Jun 4, 2015
Stress testing showed that soft offline events for a process iterating an
"mmap-pagefault-munmap" loop can trigger
VM_BUG_ON(PAGE_FLAGS_CHECK_AT_PREP) in __free_one_page():

  [   14.025761] Soft offlining page 0x70fe1 at 0x70100008d000
  [   14.029400] Soft offlining page 0x705fb at 0x70300008d000
  [   14.030208] page:ffffea0001c3f840 count:0 mapcount:0 mapping:          (null) index:0x2
  [   14.031186] flags: 0x1fffff80800000(hwpoison)
  [   14.031186] page dumped because: VM_BUG_ON_PAGE(page->flags & ((1 << 25) - 1))
  [   14.031186] ------------[ cut here ]------------
  [   14.031186] kernel BUG at /src/linux-dev/mm/page_alloc.c:585!
  [   14.031186] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
  [   14.031186] Modules linked in: cfg80211 rfkill crc32c_intel microcode ppdev parport_pc pcspkr serio_raw virtio_balloon parport i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy
  [   14.031186] CPU: 3 PID: 1779 Comm: test_base_madv_ Not tainted 4.0.0-v4.0-150511-1451-00009-g82360a3730e6 #139
  [   14.031186] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  [   14.031186] task: ffff88007d33bae0 ti: ffff88007a114000 task.ti: ffff88007a114000
  [   14.031186] RIP: 0010:[<ffffffff811a806a>]  [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0
  [   14.031186] RSP: 0018:ffff88007a117d28  EFLAGS: 00010096
  [   14.031186] RAX: 0000000000000042 RBX: ffffea0001c3f860 RCX: 0000000000000006
  [   14.031186] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88011f50d3d0
  [   14.031186] RBP: ffff88007a117da8 R08: 000000000000000a R09: 00000000fffffffe
  [   14.031186] R10: 0000000000001d3e R11: 0000000000000002 R12: 0000000000070fe1
  [   14.031186] R13: 0000000000000000 R14: 0000000000000000 R15: ffffea0001c3f840
  [   14.031186] FS:  00007f8a8e3e1740(0000) GS:ffff88011f500000(0000) knlGS:0000000000000000
  [   14.031186] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  [   14.031186] CR2: 00007f78c7341258 CR3: 000000007bb08000 CR4: 00000000000007e0
  [   14.031186] Stack:
  [   14.031186]  ffff88011f5189c8 ffff88011f5189b8 ffffea0001c3f840 ffff88011f518998
  [   14.031186]  ffffea0001d30cc0 0000001200000002 0000000200000012 0000000000000003
  [   14.031186]  ffff88007ffda6c0 000000000000000a ffff88007a117dd8 ffff88011f518998
  [   14.031186] Call Trace:
  [   14.031186]  [<ffffffff811a8380>] ? page_alloc_cpu_notify+0x50/0x50
  [   14.031186]  [<ffffffff811a82bd>] drain_pages_zone+0x3d/0x50
  [   14.031186]  [<ffffffff811a839d>] drain_local_pages+0x1d/0x30
  [   14.031186]  [<ffffffff81122a96>] on_each_cpu_mask+0x46/0x80
  [   14.031186]  [<ffffffff811a5e8b>] drain_all_pages+0x14b/0x1e0
  [   14.031186]  [<ffffffff812151a2>] soft_offline_page+0x432/0x6e0
  [   14.031186]  [<ffffffff811e2dac>] SyS_madvise+0x73c/0x780
  [   14.031186]  [<ffffffff810dcb3f>] ? put_prev_task_fair+0x2f/0x50
  [   14.031186]  [<ffffffff81143f74>] ? __audit_syscall_entry+0xc4/0x120
  [   14.031186]  [<ffffffff8105bdac>] ? do_audit_syscall_entry+0x6c/0x70
  [   14.031186]  [<ffffffff8105cc63>] ? syscall_trace_enter_phase1+0x103/0x170
  [   14.031186]  [<ffffffff816f908e>] ? int_check_syscall_exit_work+0x34/0x3d
  [   14.031186]  [<ffffffff816f8e72>] system_call_fastpath+0x12/0x17
  [   14.031186] Code: ff 89 45 b4 48 8b 45 c0 48 83 b8 a8 00 00 00 00 0f 85 e3 fb ff ff 0f 1f 00 0f 0b 48 8b 7d 90 48 c7 c6 e8 95 a6 81 e8 e6 32 02 00 <0f> 0b 8b 45 cc 49 89 47 30 41 8b 47 18 83 f8 ff 0f 85 10 ff ff
  [   14.031186] RIP  [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0
  [   14.031186]  RSP <ffff88007a117d28>
  [   14.031186] ---[ end trace 53926436e76d1f35 ]---

When soft offline successfully migrates a page, the source page is supposed
to be freed.  But there is a race condition where a source page looks
isolated (i.e. its refcount is 0 and PageHWPoison is set) but is still
somehow linked to a pcplist.  Another soft offline event then calls
drain_all_pages() and tries to free such a hwpoisoned page, which is
forbidden.

This odd page state seems to result from a race between put_page() in
putback_lru_page() and __pagevec_lru_add_fn().  But I don't want to tweak
the drain code, as was done in commit 9ab3b59 "mm: hwpoison: drop
lru_add_drain_all() in __soft_offline_page()", or to change the page
freeing code just for soft offline's purposes.

Instead, let's think about the difference between hard offline and soft
offline.  They isolate an in-use page in different ways: hard offline
first marks the target page PageHWPoison and never frees it, keeping its
refcount at 1, whereas soft offline first tries to free the target page
and only then marks it PageHWPoison.  This difference might be the source
of complexity and result in bugs like the one above.  So making soft
offline isolate the page while keeping its refcount may solve this
problem.

The page migration code is already passed a "reason" that identifies the
caller, so let's make further use of it to avoid calling
putback_lru_page() when called from soft offline, which effectively
isolates the page for soft offline.  With this change, target pages of
soft offline are never reused, even without changing their migratetype,
so this patch also removes the related code.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ddstreet pushed a commit to ddstreet/linux that referenced this pull request Jun 8, 2015
nhoriguchi pushed a commit to nhoriguchi/linux that referenced this pull request Jun 9, 2015
ddstreet pushed a commit to ddstreet/linux that referenced this pull request Jun 10, 2015
ddstreet pushed a commit to ddstreet/linux that referenced this pull request Jun 20, 2015
ddstreet pushed a commit to ddstreet/linux that referenced this pull request Jun 20, 2015
torvalds pushed a commit that referenced this pull request Jun 25, 2015
Stress testing showed that soft offline events for a process running an
"mmap-pagefault-munmap" loop can trigger
VM_BUG_ON(PAGE_FLAGS_CHECK_AT_PREP) in __free_one_page():

  Soft offlining page 0x70fe1 at 0x70100008d000
  Soft offlining page 0x705fb at 0x70300008d000
  page:ffffea0001c3f840 count:0 mapcount:0 mapping:          (null) index:0x2
  flags: 0x1fffff80800000(hwpoison)
  page dumped because: VM_BUG_ON_PAGE(page->flags & ((1 << 25) - 1))
  ------------[ cut here ]------------
  kernel BUG at /src/linux-dev/mm/page_alloc.c:585!
  invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
  Modules linked in: cfg80211 rfkill crc32c_intel microcode ppdev parport_pc pcspkr serio_raw virtio_balloon parport i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy
  CPU: 3 PID: 1779 Comm: test_base_madv_ Not tainted 4.0.0-v4.0-150511-1451-00009-g82360a3730e6 #139
  RIP: free_pcppages_bulk+0x52a/0x6f0
  Call Trace:
    drain_pages_zone+0x3d/0x50
    drain_local_pages+0x1d/0x30
    on_each_cpu_mask+0x46/0x80
    drain_all_pages+0x14b/0x1e0
    soft_offline_page+0x432/0x6e0
    SyS_madvise+0x73c/0x780
    system_call_fastpath+0x12/0x17
  Code: ff 89 45 b4 48 8b 45 c0 48 83 b8 a8 00 00 00 00 0f 85 e3 fb ff ff 0f 1f 00 0f 0b 48 8b 7d 90 48 c7 c6 e8 95 a6 81 e8 e6 32 02 00 <0f> 0b 8b 45 cc 49 89 47 30 41 8b 47 18 83 f8 ff 0f 85 10 ff ff
  RIP  [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0
   RSP <ffff88007a117d28>
  ---[ end trace 53926436e76d1f35 ]---

When soft offline successfully migrates a page, the source page is supposed
to be freed.  But there is a race condition where a source page looks
isolated (i.e. the refcount is 0 and PageHWPoison is set) but is still
somehow linked to a pcplist.  Then another soft offline event calls
drain_all_pages() and tries to free such a hwpoisoned page, which is
forbidden.

This odd page state seems to be caused by a race between put_page() in
putback_lru_page() and __pagevec_lru_add_fn().  But I don't want to play
with tweaking the drain code, as was done in commit 9ab3b59 ("mm: hwpoison:
drop lru_add_drain_all() in __soft_offline_page()"), or to change the page
freeing code for soft offline's purposes.

Instead, let's think about the difference between hard offline and soft
offline.  They isolate an in-use page in different ways: hard offline
first marks the target page PageHWPoison and never frees it, keeping its
refcount at 1, whereas soft offline first tries to free the target page
and only then marks it PageHWPoison.  This difference might be the source
of complexity and result in bugs like the one above.  So making soft
offline isolate the page while keeping its refcount may solve this
problem.

The page migration code is already passed a "reason" that identifies the
caller, so let's make further use of it to avoid calling
putback_lru_page() when called from soft offline, which effectively
isolates the page for soft offline.  With this change, target pages of
soft offline are never reused, even without changing their migratetype,
so this patch also removes the related code.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
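A toy userspace model of the ordering difference described above, in C. It assumes a hypothetical `struct toy_page` (not the kernel's `struct page`) in which `queued` stands in for "still linked to a pcplist" and `toy_drain()` stands in for drain_all_pages(); it only illustrates why keeping the reference avoids freeing a poisoned page and is not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified page model; not the kernel's struct page. */
struct toy_page {
	int refcount;
	bool hwpoison;	/* stands in for PageHWPoison */
	bool queued;	/* stands in for "still linked to a pcplist" */
};

/*
 * Stand-in for drain_all_pages(): frees a queued page. Returns false where
 * the real kernel would hit the VM_BUG_ON in __free_one_page(), i.e. when
 * the page being freed carries the hwpoison flag.
 */
static bool toy_drain(struct toy_page *p)
{
	if (!p->queued)
		return true;		/* nothing to free */
	if (p->hwpoison)
		return false;		/* freeing a poisoned page is forbidden */
	p->queued = false;
	return true;
}

/* Old soft-offline ordering: drop the last reference, then poison. */
static void soft_offline_free_then_poison(struct toy_page *p)
{
	assert(p->refcount >= 1);
	if (--p->refcount == 0)
		p->queued = true;	/* page lingers on a per-cpu list */
	p->hwpoison = true;		/* marked only after it was released */
}

/* New ordering, like hard offline: keep the reference, then poison. */
static void soft_offline_keep_ref(struct toy_page *p)
{
	assert(p->refcount >= 1);
	p->hwpoison = true;		/* refcount untouched, page never queued */
}
```

Starting both pages at refcount 1, the free-then-poison path leaves a queued, poisoned page that `toy_drain()` refuses to free, while the keep-reference path leaves nothing queued and the refcount at 1.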
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Oct 18, 2015
WARNING: line over 80 characters
torvalds#110: FILE: drivers/block/drbd/drbd_bitmap.c:1010:
+		page = mempool_alloc(drbd_md_io_page_pool, __GFP_HIGHMEM|__GFP_RECLAIM);

WARNING: line over 80 characters
torvalds#139: FILE: drivers/block/nvme-core.c:1039:
+		ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen, __GFP_RECLAIM);

WARNING: line over 80 characters
torvalds#466: FILE: include/linux/gfp.h:110:
+#define __GFP_RECLAIM ((__force gfp_t)(___GFP_DIRECT_RECLAIM|___GFP_KSWAPD_RECLAIM))

ERROR: code indent should use tabs where possible
torvalds#547: FILE: kernel/power/swap.c:978:
+^I^I           __get_free_page(__GFP_RECLAIM | __GFP_HIGH);$

ERROR: code indent should use tabs where possible
torvalds#557: FILE: kernel/power/swap.c:1245:
+^I^I                                  __GFP_RECLAIM | __GFP_HIGH :$

ERROR: code indent should use tabs where possible
torvalds#558: FILE: kernel/power/swap.c:1246:
+^I^I                                  __GFP_RECLAIM | __GFP_NOWARN |$

WARNING: line over 80 characters
torvalds#570: FILE: lib/percpu_ida.c:138:
+ * used for internal memory allocations); thus if passed __GFP_RECLAIM we may sleep

ERROR: code indent should use tabs where possible
torvalds#596: FILE: mm/failslab.c:19:
+        if (failslab.ignore_gfp_reclaim && (gfpflags & __GFP_RECLAIM))$

WARNING: please, no spaces at the start of a line
torvalds#596: FILE: mm/failslab.c:19:
+        if (failslab.ignore_gfp_reclaim && (gfpflags & __GFP_RECLAIM))$

WARNING: line over 80 characters
torvalds#617: FILE: mm/filemap.c:2717:
+ * this page (__GFP_IO), and whether the call may block (__GFP_RECLAIM & __GFP_FS).

total: 4 errors, 6 warnings, 463 lines checked

NOTE: Whitespace errors detected.
      You may wish to use scripts/cleanpatch or scripts/cleanfile

./patches/mm-page_alloc-rename-__gfp_wait-to-__gfp_reclaim.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
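The two most frequent complaints in the checkpatch output above can be approximated in a few lines of C. This is not checkpatch itself (a Perl script with many exemptions); the sketch assumes the line is passed without the leading `+` diff marker, that tabs expand to 8 columns as the kernel coding style prescribes, and an 80-column limit as in the output above. `display_width()`, `line_too_long()` and `indent_wants_tabs()` are hypothetical helpers, and the indent check covers only the "run of 8 spaces" case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TAB_WIDTH	8	/* kernel coding style: tabs are 8 columns */
#define MAX_COLUMNS	80	/* limit reported in the output above */

/* Display width of a line with each tab advancing to the next multiple of 8. */
static size_t display_width(const char *line)
{
	size_t col = 0;

	assert(line != NULL);
	for (; *line != '\0' && *line != '\n'; line++) {
		if (*line == '\t')
			col += TAB_WIDTH - (col % TAB_WIDTH);
		else
			col++;
	}
	return col;
}

/* Approximates "WARNING: line over 80 characters". */
static bool line_too_long(const char *line)
{
	return display_width(line) > MAX_COLUMNS;
}

/*
 * Approximates "ERROR: code indent should use tabs where possible":
 * flags leading whitespace containing a run of 8 or more spaces, which
 * could have been written as a tab.
 */
static bool indent_wants_tabs(const char *line)
{
	int run = 0;

	assert(line != NULL);
	for (; *line == ' ' || *line == '\t'; line++) {
		run = (*line == ' ') ? run + 1 : 0;
		if (run >= TAB_WIDTH)
			return true;
	}
	return false;
}
```

For example, the drbd_bitmap.c line flagged above is 88 columns wide (two tabs account for 16 columns and the text for 72), so `line_too_long()` returns true for it, and the swap.c continuation line with eleven spaces after two tabs trips `indent_wants_tabs()`.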
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Oct 21, 2015
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request Oct 22, 2015
nhoriguchi pushed a commit to nhoriguchi/linux that referenced this pull request Oct 30, 2015
Noltari pushed a commit to Noltari/linux that referenced this pull request May 23, 2016
[ Upstream commit add05ce ]

Stress testing showed that soft offline events for a process iterating
an "mmap-pagefault-munmap" loop can trigger
VM_BUG_ON(PAGE_FLAGS_CHECK_AT_PREP) in __free_one_page():

  Soft offlining page 0x70fe1 at 0x70100008d000
  Soft offlining page 0x705fb at 0x70300008d000
  page:ffffea0001c3f840 count:0 mapcount:0 mapping:          (null) index:0x2
  flags: 0x1fffff80800000(hwpoison)
  page dumped because: VM_BUG_ON_PAGE(page->flags & ((1 << 25) - 1))
  ------------[ cut here ]------------
  kernel BUG at /src/linux-dev/mm/page_alloc.c:585!
  invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
  Modules linked in: cfg80211 rfkill crc32c_intel microcode ppdev parport_pc pcspkr serio_raw virtio_balloon parport i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy
  CPU: 3 PID: 1779 Comm: test_base_madv_ Not tainted 4.0.0-v4.0-150511-1451-00009-g82360a3730e6 torvalds#139
  RIP: free_pcppages_bulk+0x52a/0x6f0
  Call Trace:
    drain_pages_zone+0x3d/0x50
    drain_local_pages+0x1d/0x30
    on_each_cpu_mask+0x46/0x80
    drain_all_pages+0x14b/0x1e0
    soft_offline_page+0x432/0x6e0
    SyS_madvise+0x73c/0x780
    system_call_fastpath+0x12/0x17
  Code: ff 89 45 b4 48 8b 45 c0 48 83 b8 a8 00 00 00 00 0f 85 e3 fb ff ff 0f 1f 00 0f 0b 48 8b 7d 90 48 c7 c6 e8 95 a6 81 e8 e6 32 02 00 <0f> 0b 8b 45 cc 49 89 47 30 41 8b 47 18 83 f8 ff 0f 85 10 ff ff
  RIP  [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0
   RSP <ffff88007a117d28>
  ---[ end trace 53926436e76d1f35 ]---

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
JoonsooKim pushed a commit to JoonsooKim/linux that referenced this pull request Mar 27, 2017
Stress testing showed that soft offline events for a process iterating
an "mmap-pagefault-munmap" loop can trigger
VM_BUG_ON(PAGE_FLAGS_CHECK_AT_PREP) in __free_one_page():

  [   14.025761] Soft offlining page 0x70fe1 at 0x70100008d000
  [   14.029400] Soft offlining page 0x705fb at 0x70300008d000
  [   14.030208] page:ffffea0001c3f840 count:0 mapcount:0 mapping:          (null) index:0x2
  [   14.031186] flags: 0x1fffff80800000(hwpoison)
  [   14.031186] page dumped because: VM_BUG_ON_PAGE(page->flags & ((1 << 25) - 1))
  [   14.031186] ------------[ cut here ]------------
  [   14.031186] kernel BUG at /src/linux-dev/mm/page_alloc.c:585!
  [   14.031186] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
  [   14.031186] Modules linked in: cfg80211 rfkill crc32c_intel microcode ppdev parport_pc pcspkr serio_raw virtio_balloon parport i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy
  [   14.031186] CPU: 3 PID: 1779 Comm: test_base_madv_ Not tainted 4.0.0-v4.0-150511-1451-00009-g82360a3730e6 torvalds#139
  [   14.031186] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  [   14.031186] task: ffff88007d33bae0 ti: ffff88007a114000 task.ti: ffff88007a114000
  [   14.031186] RIP: 0010:[<ffffffff811a806a>]  [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0
  [   14.031186] RSP: 0018:ffff88007a117d28  EFLAGS: 00010096
  [   14.031186] RAX: 0000000000000042 RBX: ffffea0001c3f860 RCX: 0000000000000006
  [   14.031186] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88011f50d3d0
  [   14.031186] RBP: ffff88007a117da8 R08: 000000000000000a R09: 00000000fffffffe
  [   14.031186] R10: 0000000000001d3e R11: 0000000000000002 R12: 0000000000070fe1
  [   14.031186] R13: 0000000000000000 R14: 0000000000000000 R15: ffffea0001c3f840
  [   14.031186] FS:  00007f8a8e3e1740(0000) GS:ffff88011f500000(0000) knlGS:0000000000000000
  [   14.031186] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  [   14.031186] CR2: 00007f78c7341258 CR3: 000000007bb08000 CR4: 00000000000007e0
  [   14.031186] Stack:
  [   14.031186]  ffff88011f5189c8 ffff88011f5189b8 ffffea0001c3f840 ffff88011f518998
  [   14.031186]  ffffea0001d30cc0 0000001200000002 0000000200000012 0000000000000003
  [   14.031186]  ffff88007ffda6c0 000000000000000a ffff88007a117dd8 ffff88011f518998
  [   14.031186] Call Trace:
  [   14.031186]  [<ffffffff811a8380>] ? page_alloc_cpu_notify+0x50/0x50
  [   14.031186]  [<ffffffff811a82bd>] drain_pages_zone+0x3d/0x50
  [   14.031186]  [<ffffffff811a839d>] drain_local_pages+0x1d/0x30
  [   14.031186]  [<ffffffff81122a96>] on_each_cpu_mask+0x46/0x80
  [   14.031186]  [<ffffffff811a5e8b>] drain_all_pages+0x14b/0x1e0
  [   14.031186]  [<ffffffff812151a2>] soft_offline_page+0x432/0x6e0
  [   14.031186]  [<ffffffff811e2dac>] SyS_madvise+0x73c/0x780
  [   14.031186]  [<ffffffff810dcb3f>] ? put_prev_task_fair+0x2f/0x50
  [   14.031186]  [<ffffffff81143f74>] ? __audit_syscall_entry+0xc4/0x120
  [   14.031186]  [<ffffffff8105bdac>] ? do_audit_syscall_entry+0x6c/0x70
  [   14.031186]  [<ffffffff8105cc63>] ? syscall_trace_enter_phase1+0x103/0x170
  [   14.031186]  [<ffffffff816f908e>] ? int_check_syscall_exit_work+0x34/0x3d
  [   14.031186]  [<ffffffff816f8e72>] system_call_fastpath+0x12/0x17
  [   14.031186] Code: ff 89 45 b4 48 8b 45 c0 48 83 b8 a8 00 00 00 00 0f 85 e3 fb ff ff 0f 1f 00 0f 0b 48 8b 7d 90 48 c7 c6 e8 95 a6 81 e8 e6 32 02 00 <0f> 0b 8b 45 cc 49 89 47 30 41 8b 47 18 83 f8 ff 0f 85 10 ff ff
  [   14.031186] RIP  [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0
  [   14.031186]  RSP <ffff88007a117d28>
  [   14.031186] ---[ end trace 53926436e76d1f35 ]---

fengguang pushed a commit to 0day-ci/linux that referenced this pull request Jun 11, 2018
While hacking on kTLS, I ran into the following panic from an
unprivileged netserver / netperf TCP session:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  PGD 800000037f378067 P4D 800000037f378067 PUD 3c0e61067 PMD 0
  Oops: 0010 [#1] SMP KASAN PTI
  CPU: 1 PID: 2289 Comm: netserver Not tainted 4.17.0+ torvalds#139
  Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
  RIP: 0010:          (null)
  Code: Bad RIP value.
  RSP: 0018:ffff88036abcf740 EFLAGS: 00010246
  RAX: dffffc0000000000 RBX: ffff88036f5f6800 RCX: 1ffff1006debed26
  RDX: ffff88036abcf920 RSI: ffff8803cb1a4f00 RDI: ffff8803c258c280
  RBP: ffff8803c258c280 R08: ffff8803c258c280 R09: ffffed006f559d48
  R10: ffff88037aacea43 R11: ffffed006f559d49 R12: ffff8803c258c280
  R13: ffff8803cb1a4f20 R14: 00000000000000db R15: ffffffffc168a350
  FS:  00007f7e631f4700(0000) GS:ffff8803d1c80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffffffffffffd6 CR3: 00000003ccf64005 CR4: 00000000003606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   ? tls_sw_poll+0xa4/0x160 [tls]
   ? sock_poll+0x20a/0x680
   ? do_select+0x77b/0x11a0
   ? poll_schedule_timeout.constprop.12+0x130/0x130
   ? pick_link+0xb00/0xb00
   ? read_word_at_a_time+0x13/0x20
   ? vfs_poll+0x270/0x270
   ? deref_stack_reg+0xad/0xe0
   ? __read_once_size_nocheck.constprop.6+0x10/0x10
  [...]

Debugging further, it turns out that calling into ctx->sk_poll() is
invalid, since sk_poll itself is NULL; it was saved from the original
TCP socket so that tls_sw_poll() could invoke it.

It looks like the recent conversion from the poll to the poll_mask
callback, started in 1525242 ("net: add support for ->poll_mask in
proto_ops"), did not convert kTLS as well: TCP's ->poll was converted
to ->poll_mask in commit 2c7d3da ("net/tcp: convert to ->poll_mask"),
and therefore kTLS wrongly saved the old ->poll, which is now NULL.

Convert kTLS to use ->poll_mask instead. Also, instead of POLLIN |
POLLRDNORM, use the proper EPOLLIN | EPOLLRDNORM bits, as tcp_poll_mask()
does, since it is tcp_poll_mask()'s result that is mangled here.

Fixes: 2c7d3da ("net/tcp: convert to ->poll_mask")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Watson <davejwatson@fb.com>
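The failure mode and the fix can be illustrated with a small userspace C model. Everything here is hypothetical: `struct toy_ops`, `struct toy_tls_ctx`, `toy_tls_poll_mask()` and the `TOY_EPOLL*` values are toy stand-ins, not kernel APIs. The sketch assumes, as the message describes, that the wrapper saves the underlying socket's `->poll_mask`, keeps its other bits, and derives EPOLLIN | EPOLLRDNORM from its own record state.

```c
#include <assert.h>
#include <stddef.h>

/* Toy readiness bits; kernel code uses its own EPOLL* definitions. */
#define TOY_EPOLLIN	0x001u
#define TOY_EPOLLOUT	0x004u
#define TOY_EPOLLRDNORM	0x040u

struct toy_sock;

/* Hypothetical ops table, loosely modeled on proto_ops after the conversion. */
struct toy_ops {
	unsigned int (*poll)(struct toy_sock *sk);	/* legacy callback */
	unsigned int (*poll_mask)(struct toy_sock *sk, unsigned int events);
};

struct toy_sock {
	const struct toy_ops *ops;
	int record_ready;	/* stands in for kTLS having a full record */
};

/* Pretend TCP has raw bytes to read and room to write. */
static unsigned int toy_tcp_poll_mask(struct toy_sock *sk, unsigned int events)
{
	(void)sk;
	(void)events;
	return TOY_EPOLLIN | TOY_EPOLLRDNORM | TOY_EPOLLOUT;
}

/* After the conversion, TCP fills in ->poll_mask and leaves ->poll NULL. */
static const struct toy_ops toy_tcp_ops = {
	.poll = NULL,
	.poll_mask = toy_tcp_poll_mask,
};

struct toy_tls_ctx {
	unsigned int (*sk_poll_mask)(struct toy_sock *sk, unsigned int events);
};

/* Save the underlying callback. Saving sk->ops->poll here would store NULL. */
static void toy_tls_init(struct toy_tls_ctx *ctx, const struct toy_sock *sk)
{
	ctx->sk_poll_mask = sk->ops->poll_mask;
}

/* Keep TCP's other bits, but report readability only for a full record. */
static unsigned int toy_tls_poll_mask(const struct toy_tls_ctx *ctx,
				      struct toy_sock *sk, unsigned int events)
{
	unsigned int mask;

	assert(ctx->sk_poll_mask != NULL);
	mask = ctx->sk_poll_mask(sk, events);
	mask &= ~(TOY_EPOLLIN | TOY_EPOLLRDNORM);
	if (sk->record_ready)
		mask |= TOY_EPOLLIN | TOY_EPOLLRDNORM;
	return mask;
}
```

With `toy_tcp_ops`, `toy_tls_poll_mask()` returns only `TOY_EPOLLOUT` until `record_ready` is set, and then adds the two read bits back. Calling the saved legacy `->poll` instead would jump through a NULL pointer, which is the kind of oops shown above.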
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Jun 12, 2018
Tested-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Jul 4, 2018
WARNING: line over 80 characters
torvalds#34: FILE: fs/ocfs2/alloc.c:1484:
+			status = ocfs2_error(ocfs2_metadata_cache_get_super(et->et_ci),

ERROR: code indent should use tabs where possible
torvalds#35: FILE: fs/ocfs2/alloc.c:1485:
+^I^I^I^I             "Owner %llu has empty extent list (next_free_rec == 0)\n",$

WARNING: line over 80 characters
torvalds#36: FILE: fs/ocfs2/alloc.c:1486:
+				             (unsigned long long)ocfs2_metadata_cache_owner(et->et_ci));

ERROR: code indent should use tabs where possible
torvalds#36: FILE: fs/ocfs2/alloc.c:1486:
+^I^I^I^I             (unsigned long long)ocfs2_metadata_cache_owner(et->et_ci));$

WARNING: line over 80 characters
torvalds#46: FILE: fs/ocfs2/alloc.c:1492:
+			status = ocfs2_error(ocfs2_metadata_cache_get_super(et->et_ci),

ERROR: code indent should use tabs where possible
torvalds#47: FILE: fs/ocfs2/alloc.c:1493:
+^I^I^I^I             "Owner %llu has extent list where extent # %d has no physical block start\n",$

WARNING: line over 80 characters
torvalds#48: FILE: fs/ocfs2/alloc.c:1494:
+				             (unsigned long long)ocfs2_metadata_cache_owner(et->et_ci), i);

ERROR: code indent should use tabs where possible
torvalds#48: FILE: fs/ocfs2/alloc.c:1494:
+^I^I^I^I             (unsigned long long)ocfs2_metadata_cache_owner(et->et_ci), i);$

WARNING: line over 80 characters
torvalds#61: FILE: fs/ocfs2/alloc.c:3215:
+			ret = ocfs2_error(ocfs2_metadata_cache_get_super(et->et_ci),

ERROR: code indent should use tabs where possible
torvalds#62: FILE: fs/ocfs2/alloc.c:3216:
+^I^I^I^I          "Owner %llu has empty extent block at %llu\n",$

WARNING: line over 80 characters
torvalds#63: FILE: fs/ocfs2/alloc.c:3217:
+				          (unsigned long long)ocfs2_metadata_cache_owner(et->et_ci),

ERROR: code indent should use tabs where possible
torvalds#63: FILE: fs/ocfs2/alloc.c:3217:
+^I^I^I^I          (unsigned long long)ocfs2_metadata_cache_owner(et->et_ci),$

WARNING: line over 80 characters
torvalds#64: FILE: fs/ocfs2/alloc.c:3218:
+				          (unsigned long long)le64_to_cpu(eb->h_blkno));

ERROR: code indent should use tabs where possible
torvalds#64: FILE: fs/ocfs2/alloc.c:3218:
+^I^I^I^I          (unsigned long long)le64_to_cpu(eb->h_blkno));$

ERROR: code indent should use tabs where possible
torvalds#79: FILE: fs/ocfs2/alloc.c:4412:
+^I^I^I^I^I             "Extent block #%llu has an invalid l_next_free_rec of %d.  It should have matched the l_count of %d\n",$

WARNING: line over 80 characters
torvalds#80: FILE: fs/ocfs2/alloc.c:4413:
+					             (unsigned long long)le64_to_cpu(eb->h_blkno),

ERROR: code indent should use tabs where possible
torvalds#80: FILE: fs/ocfs2/alloc.c:4413:
+^I^I^I^I^I             (unsigned long long)le64_to_cpu(eb->h_blkno),$

WARNING: line over 80 characters
torvalds#81: FILE: fs/ocfs2/alloc.c:4414:
+					             le16_to_cpu(new_el->l_next_free_rec),

ERROR: code indent should use tabs where possible
torvalds#81: FILE: fs/ocfs2/alloc.c:4414:
+^I^I^I^I^I             le16_to_cpu(new_el->l_next_free_rec),$

WARNING: line over 80 characters
torvalds#82: FILE: fs/ocfs2/alloc.c:4415:
+					             le16_to_cpu(new_el->l_count));

ERROR: code indent should use tabs where possible
torvalds#82: FILE: fs/ocfs2/alloc.c:4415:
+^I^I^I^I^I             le16_to_cpu(new_el->l_count));$

ERROR: code indent should use tabs where possible
torvalds#96: FILE: fs/ocfs2/alloc.c:4466:
+^I^I^I^I^I             "Extent block #%llu has an invalid l_next_free_rec of %d\n",$

WARNING: line over 80 characters
torvalds#97: FILE: fs/ocfs2/alloc.c:4467:
+					             (unsigned long long)le64_to_cpu(eb->h_blkno),

ERROR: code indent should use tabs where possible
torvalds#97: FILE: fs/ocfs2/alloc.c:4467:
+^I^I^I^I^I             (unsigned long long)le64_to_cpu(eb->h_blkno),$

WARNING: line over 80 characters
torvalds#98: FILE: fs/ocfs2/alloc.c:4468:
+					             le16_to_cpu(new_el->l_next_free_rec));

ERROR: code indent should use tabs where possible
torvalds#98: FILE: fs/ocfs2/alloc.c:4468:
+^I^I^I^I^I             le16_to_cpu(new_el->l_next_free_rec));$

WARNING: line over 80 characters
torvalds#114: FILE: fs/ocfs2/localalloc.c:666:
+		status = ocfs2_error(osb->sb, "local alloc inode %llu says it has %u used bits, but a count shows %u\n",

WARNING: line over 80 characters
torvalds#115: FILE: fs/ocfs2/localalloc.c:667:
+			             (unsigned long long)le64_to_cpu(alloc->i_blkno),

ERROR: code indent should use tabs where possible
torvalds#115: FILE: fs/ocfs2/localalloc.c:667:
+^I^I^I             (unsigned long long)le64_to_cpu(alloc->i_blkno),$

ERROR: code indent should use tabs where possible
torvalds#116: FILE: fs/ocfs2/localalloc.c:668:
+^I^I^I             le32_to_cpu(alloc->id1.bitmap1.i_used),$

ERROR: code indent should use tabs where possible
torvalds#117: FILE: fs/ocfs2/localalloc.c:669:
+^I^I^I             ocfs2_local_alloc_count_bits(alloc));$

ERROR: code indent should use tabs where possible
torvalds#138: FILE: fs/ocfs2/quota_local.c:142:
+^I^I^I           "Quota file %llu is probably corrupted! Requested to read block %Lu but file has size only %Lu\n",$

WARNING: %Lu is non-standard C, use %llu
torvalds#138: FILE: fs/ocfs2/quota_local.c:142:
+			           "Quota file %llu is probably corrupted! Requested to read block %Lu but file has size only %Lu\n",

ERROR: code indent should use tabs where possible
torvalds#139: FILE: fs/ocfs2/quota_local.c:143:
+^I^I^I           (unsigned long long)OCFS2_I(inode)->ip_blkno,$

ERROR: code indent should use tabs where possible
torvalds#140: FILE: fs/ocfs2/quota_local.c:144:
+^I^I^I           (unsigned long long)v_block,$

ERROR: code indent should use tabs where possible
torvalds#141: FILE: fs/ocfs2/quota_local.c:145:
+^I^I^I           (unsigned long long)i_size_read(inode));$

total: 21 errors, 15 warnings, 108 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

NOTE: Whitespace errors detected.
      You may wish to use scripts/cleanpatch or scripts/cleanfile

./patches/ocfs2-return-erofs-when-filesystem-becomes-read-only.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
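The `%Lu` warning above is usually resolved the way the surrounding ocfs2 arguments already are: cast the value to `unsigned long long` and print it with `%llu`, the length modifier standard C defines for that type. A userspace sketch of the pattern, using a hypothetical helper `format_quota_msg()` that reuses the message text from the warning:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats the flagged message with the standard
 * %llu conversion. Each 64-bit value is cast to unsigned long long so
 * the argument type matches %llu on every architecture.
 * Returns the snprintf() result (the length that would have been written).
 */
static int format_quota_msg(char *buf, size_t len, uint64_t ino,
			    uint64_t block, uint64_t size)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len,
			"Quota file %llu is probably corrupted! Requested to read block %llu but file has size only %llu\n",
			(unsigned long long)ino,
			(unsigned long long)block,
			(unsigned long long)size);
}
```

For example, `format_quota_msg(buf, sizeof(buf), 12, 34, 5)` produces the message with 12, 34 and 5 substituted. The format string is kept on one line so the user-visible text stays greppable.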
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Jul 11, 2018

ERROR: code indent should use tabs where possible
torvalds#98: FILE: fs/ocfs2/alloc.c:4468:
+^I^I^I^I^I             le16_to_cpu(new_el->l_next_free_rec));$

WARNING: line over 80 characters
torvalds#114: FILE: fs/ocfs2/localalloc.c:666:
+		status = ocfs2_error(osb->sb, "local alloc inode %llu says it has %u used bits, but a count shows %u\n",

WARNING: line over 80 characters
torvalds#115: FILE: fs/ocfs2/localalloc.c:667:
+			             (unsigned long long)le64_to_cpu(alloc->i_blkno),

ERROR: code indent should use tabs where possible
torvalds#115: FILE: fs/ocfs2/localalloc.c:667:
+^I^I^I             (unsigned long long)le64_to_cpu(alloc->i_blkno),$

ERROR: code indent should use tabs where possible
torvalds#116: FILE: fs/ocfs2/localalloc.c:668:
+^I^I^I             le32_to_cpu(alloc->id1.bitmap1.i_used),$

ERROR: code indent should use tabs where possible
torvalds#117: FILE: fs/ocfs2/localalloc.c:669:
+^I^I^I             ocfs2_local_alloc_count_bits(alloc));$

ERROR: code indent should use tabs where possible
torvalds#138: FILE: fs/ocfs2/quota_local.c:142:
+^I^I^I           "Quota file %llu is probably corrupted! Requested to read block %Lu but file has size only %Lu\n",$

WARNING: %Lu is non-standard C, use %llu
torvalds#138: FILE: fs/ocfs2/quota_local.c:142:
+			           "Quota file %llu is probably corrupted! Requested to read block %Lu but file has size only %Lu\n",

ERROR: code indent should use tabs where possible
torvalds#139: FILE: fs/ocfs2/quota_local.c:143:
+^I^I^I           (unsigned long long)OCFS2_I(inode)->ip_blkno,$

ERROR: code indent should use tabs where possible
torvalds#140: FILE: fs/ocfs2/quota_local.c:144:
+^I^I^I           (unsigned long long)v_block,$

ERROR: code indent should use tabs where possible
torvalds#141: FILE: fs/ocfs2/quota_local.c:145:
+^I^I^I           (unsigned long long)i_size_read(inode));$

total: 21 errors, 15 warnings, 108 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

NOTE: Whitespace errors detected.
      You may wish to use scripts/cleanpatch or scripts/cleanfile

./patches/ocfs2-return-erofs-when-filesystem-becomes-read-only.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Jul 16, 2018
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Jul 24, 2018
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Jul 28, 2018
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 26, 2024
The per-netns IP tunnel hash table is protected by the RTNL mutex and
ip_tunnel_find() is only called from the control path where the mutex is
taken.

Add a lockdep expression to hlist_for_each_entry_rcu() in
ip_tunnel_find() in order to validate that the mutex is held and to
silence the suspicious RCU usage warning [1].

[1]
WARNING: suspicious RCU usage
6.12.0-rc3-custom-gd95d9a31aceb torvalds#139 Not tainted
-----------------------------
net/ipv4/ip_tunnel.c:221 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by ip/362:
 #0: ffffffff86fc7cb0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x377/0xf60

stack backtrace:
CPU: 12 UID: 0 PID: 362 Comm: ip Not tainted 6.12.0-rc3-custom-gd95d9a31aceb torvalds#139
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
 <TASK>
 dump_stack_lvl+0xba/0x110
 lockdep_rcu_suspicious.cold+0x4f/0xd6
 ip_tunnel_find+0x435/0x4d0
 ip_tunnel_newlink+0x517/0x7a0
 ipgre_newlink+0x14c/0x170
 __rtnl_newlink+0x1173/0x19c0
 rtnl_newlink+0x6c/0xa0
 rtnetlink_rcv_msg+0x3cc/0xf60
 netlink_rcv_skb+0x171/0x450
 netlink_unicast+0x539/0x7f0
 netlink_sendmsg+0x8c1/0xd80
 ____sys_sendmsg+0x8f9/0xc20
 ___sys_sendmsg+0x197/0x1e0
 __sys_sendmsg+0x122/0x1f0
 do_syscall_64+0xbb/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: c544193 ("GRE: Refactor GRE tunneling code.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 26, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 27, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 27, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 27, 2024
The per-netns IP tunnel hash table is protected by the RTNL mutex and
ip_tunnel_find() is only called from the control path where the mutex is
taken.

Add a lockdep expression to hlist_for_each_entry_rcu() in
ip_tunnel_find() in order to validate that the mutex is held and to
silence the suspicious RCU usage warning [1].

[1]
WARNING: suspicious RCU usage
6.12.0-rc3-custom-gd95d9a31aceb torvalds#139 Not tainted
-----------------------------
net/ipv4/ip_tunnel.c:221 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by ip/362:
 #0: ffffffff86fc7cb0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x377/0xf60

stack backtrace:
CPU: 12 UID: 0 PID: 362 Comm: ip Not tainted 6.12.0-rc3-custom-gd95d9a31aceb #139
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
 <TASK>
 dump_stack_lvl+0xba/0x110
 lockdep_rcu_suspicious.cold+0x4f/0xd6
 ip_tunnel_find+0x435/0x4d0
 ip_tunnel_newlink+0x517/0x7a0
 ipgre_newlink+0x14c/0x170
 __rtnl_newlink+0x1173/0x19c0
 rtnl_newlink+0x6c/0xa0
 rtnetlink_rcv_msg+0x3cc/0xf60
 netlink_rcv_skb+0x171/0x450
 netlink_unicast+0x539/0x7f0
 netlink_sendmsg+0x8c1/0xd80
 ____sys_sendmsg+0x8f9/0xc20
 ___sys_sendmsg+0x197/0x1e0
 __sys_sendmsg+0x122/0x1f0
 do_syscall_64+0xbb/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: c544193 ("GRE: Refactor GRE tunneling code.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 27, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 27, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 27, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 27, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 27, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 28, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 28, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 28, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 28, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 28, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 28, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 28, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 28, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 29, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 29, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 29, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 29, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 29, 2024
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 29, 2024
The per-netns IP tunnel hash table is protected by the RTNL mutex and
ip_tunnel_find() is only called from the control path where the mutex is
taken.

Add a lockdep expression to hlist_for_each_entry_rcu() in
ip_tunnel_find() in order to validate that the mutex is held and to
silence the suspicious RCU usage warning [1].

[1]
WARNING: suspicious RCU usage
6.12.0-rc3-custom-gd95d9a31aceb torvalds#139 Not tainted
-----------------------------
net/ipv4/ip_tunnel.c:221 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by ip/362:
 #0: ffffffff86fc7cb0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x377/0xf60

stack backtrace:
CPU: 12 UID: 0 PID: 362 Comm: ip Not tainted 6.12.0-rc3-custom-gd95d9a31aceb #139
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
 <TASK>
 dump_stack_lvl+0xba/0x110
 lockdep_rcu_suspicious.cold+0x4f/0xd6
 ip_tunnel_find+0x435/0x4d0
 ip_tunnel_newlink+0x517/0x7a0
 ipgre_newlink+0x14c/0x170
 __rtnl_newlink+0x1173/0x19c0
 rtnl_newlink+0x6c/0xa0
 rtnetlink_rcv_msg+0x3cc/0xf60
 netlink_rcv_skb+0x171/0x450
 netlink_unicast+0x539/0x7f0
 netlink_sendmsg+0x8c1/0xd80
 ____sys_sendmsg+0x8f9/0xc20
 ___sys_sendmsg+0x197/0x1e0
 __sys_sendmsg+0x122/0x1f0
 do_syscall_64+0xbb/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: c544193 ("GRE: Refactor GRE tunneling code.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing that referenced this pull request Oct 29, 2024
The per-netns IP tunnel hash table is protected by the RTNL mutex and
ip_tunnel_find() is only called from the control path where the mutex is
taken.

Add a lockdep expression to hlist_for_each_entry_rcu() in
ip_tunnel_find() in order to validate that the mutex is held and to
silence the suspicious RCU usage warning [1].

[1]
WARNING: suspicious RCU usage
6.12.0-rc3-custom-gd95d9a31aceb #139 Not tainted
-----------------------------
net/ipv4/ip_tunnel.c:221 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by ip/362:
 #0: ffffffff86fc7cb0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x377/0xf60

stack backtrace:
CPU: 12 UID: 0 PID: 362 Comm: ip Not tainted 6.12.0-rc3-custom-gd95d9a31aceb #139
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
 <TASK>
 dump_stack_lvl+0xba/0x110
 lockdep_rcu_suspicious.cold+0x4f/0xd6
 ip_tunnel_find+0x435/0x4d0
 ip_tunnel_newlink+0x517/0x7a0
 ipgre_newlink+0x14c/0x170
 __rtnl_newlink+0x1173/0x19c0
 rtnl_newlink+0x6c/0xa0
 rtnetlink_rcv_msg+0x3cc/0xf60
 netlink_rcv_skb+0x171/0x450
 netlink_unicast+0x539/0x7f0
 netlink_sendmsg+0x8c1/0xd80
 ____sys_sendmsg+0x8f9/0xc20
 ___sys_sendmsg+0x197/0x1e0
 __sys_sendmsg+0x122/0x1f0
 do_syscall_64+0xbb/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: c544193 ("GRE: Refactor GRE tunneling code.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241023123009.749764-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Nov 6, 2024
[ Upstream commit 90e0569 ]

The per-netns IP tunnel hash table is protected by the RTNL mutex and
ip_tunnel_find() is only called from the control path where the mutex is
taken.

Add a lockdep expression to hlist_for_each_entry_rcu() in
ip_tunnel_find() in order to validate that the mutex is held and to
silence the suspicious RCU usage warning [1].

[1]
WARNING: suspicious RCU usage
6.12.0-rc3-custom-gd95d9a31aceb #139 Not tainted
-----------------------------
net/ipv4/ip_tunnel.c:221 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by ip/362:
 #0: ffffffff86fc7cb0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x377/0xf60

stack backtrace:
CPU: 12 UID: 0 PID: 362 Comm: ip Not tainted 6.12.0-rc3-custom-gd95d9a31aceb #139
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
 <TASK>
 dump_stack_lvl+0xba/0x110
 lockdep_rcu_suspicious.cold+0x4f/0xd6
 ip_tunnel_find+0x435/0x4d0
 ip_tunnel_newlink+0x517/0x7a0
 ipgre_newlink+0x14c/0x170
 __rtnl_newlink+0x1173/0x19c0
 rtnl_newlink+0x6c/0xa0
 rtnetlink_rcv_msg+0x3cc/0xf60
 netlink_rcv_skb+0x171/0x450
 netlink_unicast+0x539/0x7f0
 netlink_sendmsg+0x8c1/0xd80
 ____sys_sendmsg+0x8f9/0xc20
 ___sys_sendmsg+0x197/0x1e0
 __sys_sendmsg+0x122/0x1f0
 do_syscall_64+0xbb/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: c544193 ("GRE: Refactor GRE tunneling code.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241023123009.749764-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Nov 7, 2024
[ Upstream commit 90e0569 ]

The per-netns IP tunnel hash table is protected by the RTNL mutex and
ip_tunnel_find() is only called from the control path where the mutex is
taken.

Add a lockdep expression to hlist_for_each_entry_rcu() in
ip_tunnel_find() in order to validate that the mutex is held and to
silence the suspicious RCU usage warning [1].

[1]
WARNING: suspicious RCU usage
6.12.0-rc3-custom-gd95d9a31aceb #139 Not tainted
-----------------------------
net/ipv4/ip_tunnel.c:221 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by ip/362:
 #0: ffffffff86fc7cb0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x377/0xf60

stack backtrace:
CPU: 12 UID: 0 PID: 362 Comm: ip Not tainted 6.12.0-rc3-custom-gd95d9a31aceb #139
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
 <TASK>
 dump_stack_lvl+0xba/0x110
 lockdep_rcu_suspicious.cold+0x4f/0xd6
 ip_tunnel_find+0x435/0x4d0
 ip_tunnel_newlink+0x517/0x7a0
 ipgre_newlink+0x14c/0x170
 __rtnl_newlink+0x1173/0x19c0
 rtnl_newlink+0x6c/0xa0
 rtnetlink_rcv_msg+0x3cc/0xf60
 netlink_rcv_skb+0x171/0x450
 netlink_unicast+0x539/0x7f0
 netlink_sendmsg+0x8c1/0xd80
 ____sys_sendmsg+0x8f9/0xc20
 ___sys_sendmsg+0x197/0x1e0
 __sys_sendmsg+0x122/0x1f0
 do_syscall_64+0xbb/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: c544193 ("GRE: Refactor GRE tunneling code.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241023123009.749764-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
hellsgod pushed a commit to hellsgod/linux that referenced this pull request Nov 8, 2024
[ Upstream commit 90e0569 ]

The per-netns IP tunnel hash table is protected by the RTNL mutex and
ip_tunnel_find() is only called from the control path where the mutex is
taken.

Add a lockdep expression to hlist_for_each_entry_rcu() in
ip_tunnel_find() in order to validate that the mutex is held and to
silence the suspicious RCU usage warning [1].

[1]
WARNING: suspicious RCU usage
6.12.0-rc3-custom-gd95d9a31aceb #139 Not tainted
-----------------------------
net/ipv4/ip_tunnel.c:221 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by ip/362:
 #0: ffffffff86fc7cb0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x377/0xf60

stack backtrace:
CPU: 12 UID: 0 PID: 362 Comm: ip Not tainted 6.12.0-rc3-custom-gd95d9a31aceb #139
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
 <TASK>
 dump_stack_lvl+0xba/0x110
 lockdep_rcu_suspicious.cold+0x4f/0xd6
 ip_tunnel_find+0x435/0x4d0
 ip_tunnel_newlink+0x517/0x7a0
 ipgre_newlink+0x14c/0x170
 __rtnl_newlink+0x1173/0x19c0
 rtnl_newlink+0x6c/0xa0
 rtnetlink_rcv_msg+0x3cc/0xf60
 netlink_rcv_skb+0x171/0x450
 netlink_unicast+0x539/0x7f0
 netlink_sendmsg+0x8c1/0xd80
 ____sys_sendmsg+0x8f9/0xc20
 ___sys_sendmsg+0x197/0x1e0
 __sys_sendmsg+0x122/0x1f0
 do_syscall_64+0xbb/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: c544193 ("GRE: Refactor GRE tunneling code.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241023123009.749764-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2 participants