Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorporate Testing 1.8.0 into upstreamed branch #1

Closed
wants to merge 243 commits into from

Conversation

Jimbo77
Copy link
Owner

@Jimbo77 Jimbo77 commented Jan 17, 2021

No description provided.

kerneltoast and others added 30 commits December 13, 2020 06:59
Userspace parses this and sets the ro.boot.verifiedbootstate prop
according to the value that this flag has. When ro.boot.verifiedbootstate
is not 'green', SafetyNet is tripped and fails the CTS test.

Hide verifiedbootstate from /proc/cmdline in order to fix the failed
SafetyNet CTS check.
SafetyNet checks androidboot.veritymode in Nougat, so remove it.

Additionally, remove androidboot.enable_dm_verity and androidboot.secboot
in case SafetyNet will check them in the future.

Signed-off-by: Sultanxda <sultanxda@gmail.com>
Signed-off-by: flar2 <asegaert@gmail.com>
Signed-off-by: flar2 <asegaert@gmail.com>
Signed-off-by: AndroidGX <subut3x@gmail.com>
This driver boosts enumerated devfreq devices upon input, and allows for
boosting specific devfreq devices on other custom events. The boost
frequencies for this driver should be set so that frame drops are
near-zero at the boosted frequencies and power consumption is minimized
at said frequencies. The goal of this driver is to provide an interface
to achieve optimal device performance by requesting boosts on key
events, such as when a frame is ready to rendered to the display.

Currently, support is only present for boosting the cpubw devfreq
device, but the driver is structured in a way that makes it easy to add
support for new boostable devfreq devices in the future.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: msdx321 <msdx321@gmail.com>
This scales up way too easily, it needs to be tuned.
Values range from 0-3, 0 off, 3 most aggressive.
Scales back nicely in 2d to 133mhz.
Only scale up and count if MIN_VALUE crossed
It does not make sense to run kgsl on high and devfreq on regular
priority.

Change-Id: Ie5e6c9353a4e1324a6a49278e5ad3638462f551c

Signed-off-by: flar2 <asegaert@gmail.com>
On devices with a CPU that contains heterogeneous cores (e.g., big.LITTLE),
it can be beneficial to place some performance-critical IRQs and kthreads
onto the performance CPU cluster in order to improve performance.

This commit adds the following APIs:
-kthread_run_perf_critical() to create and start a perf-critical kthread
-IRQF_PERF_CRITICAL to schedule an IRQ and any threads it may have onto
 performance CPUs
-PF_PERF_CRITICAL to mark a process (mainly a kthread) as performance
 critical (this is used by kthread_run_perf_critical())

In order to accommodate this new API, the following changes are made:
-Performance-critical IRQs have their affinities reaffined upon exit from
 suspend (since the affinities are broken when non-boot CPUs are disabled)
-Performance-critical IRQs and their threads have their affinities reset
 upon entering suspend, so that upon immediate suspend exit (when only the
 boot CPU is online), interrupts can be processed and interrupt threads
 can be scheduled onto an online CPU (otherwise we'd hit a kernel BUG)
-do_set_cpus_allowed() and __set_cpus_allowed_ptr() are modified to enforce
 a performance-critical kthread's affinity
-Mark perf-critical IRQs with IRQD_AFFINITY_MANAGED so userspace can't mess
 with their affinity

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Some unimportant kthreads do not need high-performance and can safely be
bound to low-power CPUs for energy efficiency.

This commit adds an API to create kthreads bound to low-power CPUs. This
affinity cannot be altered by the kernel or userspace.

This extends the performance-critical kthread support introduced in
commit c4a0ef7e1d1e ("kernel: Add API to mark IRQs and kthreads as
performance critical").

Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Add cpu_lp_mask and cpu_perf_mask to represent the CPUs that belong to
each cluster in a dual-cluster, heterogeneous system.

Note that this assumes that CONFIG_NR_CPUS is set to exatly the number of
CPUs present, and that the number of CPUs per cluster is split evenly in
half.
Jimbo77 pushed a commit that referenced this pull request Mar 31, 2021
commit 3a5ca857079ea022e0b1b17fc154f7ad7dbc150f upstream.

When a non-initial netns is destroyed, the usual policy is to delete
all virtual network interfaces contained, but move physical interfaces
back to the initial netns. This keeps the physical interface visible
on the system.

CAN devices are somewhat special, as they define rtnl_link_ops even
if they are physical devices. If a CAN interface is moved into a
non-initial netns, destroying that netns lets the interface vanish
instead of moving it back to the initial netns. default_device_exit()
skips CAN interfaces due to having rtnl_link_ops set. Reproducer:

  ip netns add foo
  ip link set can0 netns foo
  ip netns delete foo

WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60
CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1
Workqueue: netns cleanup_net
[<c010e700>] (unwind_backtrace) from [<c010a1d8>] (show_stack+0x10/0x14)
[<c010a1d8>] (show_stack) from [<c086dc10>] (dump_stack+0x94/0xa8)
[<c086dc10>] (dump_stack) from [<c086b938>] (__warn+0xb8/0x114)
[<c086b938>] (__warn) from [<c086ba10>] (warn_slowpath_fmt+0x7c/0xac)
[<c086ba10>] (warn_slowpath_fmt) from [<c0629f20>] (ops_exit_list+0x38/0x60)
[<c0629f20>] (ops_exit_list) from [<c062a5c4>] (cleanup_net+0x230/0x380)
[<c062a5c4>] (cleanup_net) from [<c0142c20>] (process_one_work+0x1d8/0x438)
[<c0142c20>] (process_one_work) from [<c0142ee4>] (worker_thread+0x64/0x5a8)
[<c0142ee4>] (worker_thread) from [<c0148a98>] (kthread+0x148/0x14c)
[<c0148a98>] (kthread) from [<c0100148>] (ret_from_fork+0x14/0x2c)

To properly restore physical CAN devices to the initial netns on owning
netns exit, introduce a flag on rtnl_link_ops that can be set by drivers.
For CAN devices setting this flag, default_device_exit() considers them
non-virtual, applying the usual namespace move.

The issue was introduced in the commit mentioned below, as at that time
CAN devices did not have a dellink() operation.

Fixes: e008b5f ("net: Simplfy default_device_exit and improve batching.")
Link: https://lore.kernel.org/r/20210302122423.872326-1-martin@strongswan.org
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jimbo77 pushed a commit that referenced this pull request Apr 10, 2021
…tint

commit 6069e3e927c8fb3a1947b07d1a561644ea960248 upstream.

We have a cycle of callbacks scheduling works which submit
URBs with thos callbacks. This needs to be blocked, stopped
and unblocked to untangle the circle.

The issue leads to faults like:

[   55.068392] Unable to handle kernel paging request at virtual address 6b6b6c03
[   55.075624] pgd = be866494
[   55.078335] [6b6b6c03] *pgd=00000000
[   55.081924] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[   55.087238] Modules linked in: ppp_async crc_ccitt ppp_generic slhc
xt_TCPMSS xt_tcpmss xt_hl nf_log_ipv6 nf_log_ipv4 nf_log_common
xt_policy xt_limit xt_conntrack xt_tcpudp xt_pkttype ip6table_mangle
iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
iptable_mangle ip6table_filter ip6_tables iptable_filter ip_tables
des_generic md5 sch_fq_codel cdc_mbim cdc_wdm cdc_ncm usbnet mii
cdc_acm usb_storage ip_tunnel xfrm_user xfrm6_tunnel tunnel6
xfrm4_tunnel tunnel4 esp6 esp4 ah6 ah4 xfrm_algo xt_LOG xt_LED
xt_comment x_tables ipv6
[   55.134954] CPU: 0 PID: 82 Comm: kworker/0:2 Tainted: G
   T 5.8.17 #1
[   55.142526] Hardware name: Freescale i.MX7 Dual (Device Tree)
[   55.148304] Workqueue: events acm_softint [cdc_acm]
[   55.153196] PC is at kobject_get+0x10/0xa4
[   55.157302] LR is at usb_get_dev+0x14/0x1c
[   55.161402] pc : [<8047c06c>]    lr : [<80560448>]    psr: 20000193
[   55.167671] sp : bca39ea8  ip : 00007374  fp : bf6cbd80
[   55.172899] r10: 00000000  r9 : bdd92284  r8 : bdd92008
[   55.178128] r7 : 6b6b6b6b  r6 : fffffffe  r5 : 60000113  r4 : 6b6b6be3
[   55.184658] r3 : 6b6b6b6b  r2 : 00000111  r1 : 00000000  r0 : 6b6b6be3
[   55.191191] Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM Segment none
[   55.198417] Control: 10c5387d  Table: bcf0c06a  DAC: 00000051
[   55.204168] Process kworker/0:2 (pid: 82, stack limit = 0x9bdd2a89)
[   55.210439] Stack: (0xbca39ea8 to 0xbca3a000)
[   55.214805] 9ea0:                   bf6cbd80 80769a50 6b6b6b6b 80560448 bdeb0500 8056bfe8
[   55.222991] 9ec0: 00000002 b76da000 00000000 bdeb0500 bdd92448 bca38000 bdeb0510 8056d69c
[   55.231177] 9ee0: bca38000 00000000 80c050fc 00000000 bca39f44 09d42015 00000000 00000001
[   55.239363] 9f00: bdd92448 bdd92438 bdd92000 7f1158c4 bdd92448 bca2ee00 bf6cbd80 bf6cef00
[   55.247549] 9f20: 00000000 00000000 00000000 801412d8 bf6cbd98 80c03d00 bca2ee00 bf6cbd80
[   55.255735] 9f40: bca2ee14 bf6cbd98 80c03d00 00000008 bca38000 80141568 00000000 80c446ae
[   55.263921] 9f60: 00000000 bc9ed880 bc9f0700 bca38000 bc117eb4 80141524 bca2ee00 bc9ed8a4
[   55.272107] 9f80: 00000000 80147cc8 00000000 bc9f0700 80147b84 00000000 00000000 00000000
[   55.280292] 9fa0: 00000000 00000000 00000000 80100148 00000000 00000000 00000000 00000000
[   55.288477] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   55.296662] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[   55.304860] [<8047c06c>] (kobject_get) from [<80560448>] (usb_get_dev+0x14/0x1c)
[   55.312271] [<80560448>] (usb_get_dev) from [<8056bfe8>] (usb_hcd_unlink_urb+0x50/0xd8)
[   55.320286] [<8056bfe8>] (usb_hcd_unlink_urb) from [<8056d69c>] (usb_kill_urb.part.0+0x44/0xd0)
[   55.329004] [<8056d69c>] (usb_kill_urb.part.0) from [<7f1158c4>] (acm_softint+0x4c/0x10c [cdc_acm])
[   55.338082] [<7f1158c4>] (acm_softint [cdc_acm]) from [<801412d8>] (process_one_work+0x19c/0x3e8)
[   55.346969] [<801412d8>] (process_one_work) from [<80141568>] (worker_thread+0x44/0x4dc)
[   55.355072] [<80141568>] (worker_thread) from [<80147cc8>] (kthread+0x144/0x180)
[   55.362481] [<80147cc8>] (kthread) from [<80100148>] (ret_from_fork+0x14/0x2c)
[   55.369706] Exception stack(0xbca39fb0 to 0xbca39ff8)

Tested-by: Bruno Thomsen <bruno.thomsen@gmail.com>
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210311130126.15972-1-oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jimbo77 pushed a commit that referenced this pull request Apr 17, 2021
[ Upstream commit a9f81244d2e33e6dfcef120fefd30c96b3f7cdb0 ]

setup_fritz() in avmfritz.c might fail with -EIO and in this case the
isac.type and isac.write_reg is not initialized and remains 0(NULL).
A subsequent call to isac_release() will dereference isac->write_reg and
crash.

[    1.737444] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    1.737809] #PF: supervisor instruction fetch in kernel mode
[    1.738106] #PF: error_code(0x0010) - not-present page
[    1.738378] PGD 0 P4D 0
[    1.738515] Oops: 0010 [#1] SMP NOPTI
[    1.738711] CPU: 0 PID: 180 Comm: systemd-udevd Not tainted 5.12.0-rc2+ #78
[    1.739077] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-p
rebuilt.qemu.org 04/01/2014
[    1.739664] RIP: 0010:0x0
[    1.739807] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[    1.740200] RSP: 0018:ffffc9000027ba10 EFLAGS: 00010202
[    1.740478] RAX: 0000000000000000 RBX: ffff888102f41840 RCX: 0000000000000027
[    1.740853] RDX: 00000000000000ff RSI: 0000000000000020 RDI: ffff888102f41800
[    1.741226] RBP: ffffc9000027ba20 R08: ffff88817bc18440 R09: ffffc9000027b808
[    1.741600] R10: 0000000000000001 R11: 0000000000000001 R12: ffff888102f41840
[    1.741976] R13: 00000000fffffffb R14: ffff888102f41800 R15: ffff8881008b0000
[    1.742351] FS:  00007fda3a38a8c0(0000) GS:ffff88817bc00000(0000) knlGS:0000000000000000
[    1.742774] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.743076] CR2: ffffffffffffffd6 CR3: 00000001021ec000 CR4: 00000000000006f0
[    1.743452] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    1.743828] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    1.744206] Call Trace:
[    1.744339]  isac_release+0xcc/0xe0 [mISDNipac]
[    1.744582]  fritzpci_probe.cold+0x282/0x739 [avmfritz]
[    1.744861]  local_pci_probe+0x48/0x80
[    1.745063]  pci_device_probe+0x10f/0x1c0
[    1.745278]  really_probe+0xfb/0x420
[    1.745471]  driver_probe_device+0xe9/0x160
[    1.745693]  device_driver_attach+0x5d/0x70
[    1.745917]  __driver_attach+0x8f/0x150
[    1.746123]  ? device_driver_attach+0x70/0x70
[    1.746354]  bus_for_each_dev+0x7e/0xc0
[    1.746560]  driver_attach+0x1e/0x20
[    1.746751]  bus_add_driver+0x152/0x1f0
[    1.746957]  driver_register+0x74/0xd0
[    1.747157]  ? 0xffffffffc00d8000
[    1.747334]  __pci_register_driver+0x54/0x60
[    1.747562]  AVM_init+0x36/0x1000 [avmfritz]
[    1.747791]  do_one_initcall+0x48/0x1d0
[    1.747997]  ? __cond_resched+0x19/0x30
[    1.748206]  ? kmem_cache_alloc_trace+0x390/0x440
[    1.748458]  ? do_init_module+0x28/0x250
[    1.748669]  do_init_module+0x62/0x250
[    1.748870]  load_module+0x23ee/0x26a0
[    1.749073]  __do_sys_finit_module+0xc2/0x120
[    1.749307]  ? __do_sys_finit_module+0xc2/0x120
[    1.749549]  __x64_sys_finit_module+0x1a/0x20
[    1.749782]  do_syscall_64+0x38/0x90

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jimbo77 pushed a commit that referenced this pull request Apr 17, 2021
commit 90bd070aae6c4fb5d302f9c4b9c88be60c8197ec upstream.

The following deadlock is detected:

  truncate -> setattr path is waiting for pending direct IO to be done (inode->i_dio_count become zero) with inode->i_rwsem held (down_write).

  PID: 14827  TASK: ffff881686a9af80  CPU: 20  COMMAND: "ora_p005_hrltd9"
   #0  __schedule at ffffffff818667cc
   #1  schedule at ffffffff81866de6
   #2  inode_dio_wait at ffffffff812a2d04
   #3  ocfs2_setattr at ffffffffc05f322e [ocfs2]
   #4  notify_change at ffffffff812a5a09
   #5  do_truncate at ffffffff812808f5
   #6  do_sys_ftruncate.constprop.18 at ffffffff81280cf2
   #7  sys_ftruncate at ffffffff81280d8e
   #8  do_syscall_64 at ffffffff81003949
   #9  entry_SYSCALL_64_after_hwframe at ffffffff81a001ad

dio completion path is going to complete one direct IO (decrement
inode->i_dio_count), but before that it hung at locking inode->i_rwsem:

   #0  __schedule+700 at ffffffff818667cc
   #1  schedule+54 at ffffffff81866de6
   #2  rwsem_down_write_failed+536 at ffffffff8186aa28
   #3  call_rwsem_down_write_failed+23 at ffffffff8185a1b7
   #4  down_write+45 at ffffffff81869c9d
   #5  ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2]
   #6  ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2]
   #7  dio_complete+140 at ffffffff812c873c
   #8  dio_aio_complete_work+25 at ffffffff812c89f9
   #9  process_one_work+361 at ffffffff810b1889
  #10  worker_thread+77 at ffffffff810b233d
  #11  kthread+261 at ffffffff810b7fd5
  #12  ret_from_fork+62 at ffffffff81a0035e

Thus above forms ABBA deadlock.  The same deadlock was mentioned in
upstream commit 28f5a8a ("ocfs2: should wait dio before inode lock
in ocfs2_setattr()").  It seems that that commit only removed the
cluster lock (the victim of above dead lock) from the ABBA deadlock
party.

End-user visible effects: Process hang in truncate -> ocfs2_setattr path
and other processes hang at ocfs2_dio_end_io_write path.

This is to fix the deadlock itself.  It removes inode_lock() call from
dio completion path to remove the deadlock and add ip_alloc_sem lock in
setattr path to synchronize the inode modifications.

[wen.gang.wang@oracle.com: remove the "had_alloc_lock" as suggested]
  Link: https://lkml.kernel.org/r/20210402171344.1605-1-wen.gang.wang@oracle.com

Link: https://lkml.kernel.org/r/20210331203654.3911-1-wen.gang.wang@oracle.com
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jimbo77 pushed a commit that referenced this pull request Apr 17, 2021
commit 1ffbc7ea91606e4abd10eb60de5367f1c86daf5e upstream.

Reproduce:

  modprobe sch_teql
  tc qdisc add dev teql0 root teql0

This leads to (for instance in Centos 7 VM) OOPS:

[  532.366633] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[  532.366733] IP: [<ffffffffc06124a8>] teql_destroy+0x18/0x100 [sch_teql]
[  532.366825] PGD 80000001376d5067 PUD 137e37067 PMD 0
[  532.366906] Oops: 0000 [#1] SMP
[  532.366987] Modules linked in: sch_teql ...
[  532.367945] CPU: 1 PID: 3026 Comm: tc Kdump: loaded Tainted: G               ------------ T 3.10.0-1062.7.1.el7.x86_64 #1
[  532.368041] Hardware name: Virtuozzo KVM, BIOS 1.11.0-2.vz7.2 04/01/2014
[  532.368125] task: ffff8b7d37d31070 ti: ffff8b7c9fdbc000 task.ti: ffff8b7c9fdbc000
[  532.368224] RIP: 0010:[<ffffffffc06124a8>]  [<ffffffffc06124a8>] teql_destroy+0x18/0x100 [sch_teql]
[  532.368320] RSP: 0018:ffff8b7c9fdbf8e0  EFLAGS: 00010286
[  532.368394] RAX: ffffffffc0612490 RBX: ffff8b7cb1565e00 RCX: ffff8b7d35ba2000
[  532.368476] RDX: ffff8b7d35ba2000 RSI: 0000000000000000 RDI: ffff8b7cb1565e00
[  532.368557] RBP: ffff8b7c9fdbf8f8 R08: ffff8b7d3fd1f140 R09: ffff8b7d3b001600
[  532.368638] R10: ffff8b7d3b001600 R11: ffffffff84c7d65b R12: 00000000ffffffd8
[  532.368719] R13: 0000000000008000 R14: ffff8b7d35ba2000 R15: ffff8b7c9fdbf9a8
[  532.368800] FS:  00007f6a4e872740(0000) GS:ffff8b7d3fd00000(0000) knlGS:0000000000000000
[  532.368885] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  532.368961] CR2: 00000000000000a8 CR3: 00000001396ee000 CR4: 00000000000206e0
[  532.369046] Call Trace:
[  532.369159]  [<ffffffff84c8192e>] qdisc_create+0x36e/0x450
[  532.369268]  [<ffffffff846a9b49>] ? ns_capable+0x29/0x50
[  532.369366]  [<ffffffff849afde2>] ? nla_parse+0x32/0x120
[  532.369442]  [<ffffffff84c81b4c>] tc_modify_qdisc+0x13c/0x610
[  532.371508]  [<ffffffff84c693e7>] rtnetlink_rcv_msg+0xa7/0x260
[  532.372668]  [<ffffffff84907b65>] ? sock_has_perm+0x75/0x90
[  532.373790]  [<ffffffff84c69340>] ? rtnl_newlink+0x890/0x890
[  532.374914]  [<ffffffff84c8da7b>] netlink_rcv_skb+0xab/0xc0
[  532.376055]  [<ffffffff84c63708>] rtnetlink_rcv+0x28/0x30
[  532.377204]  [<ffffffff84c8d400>] netlink_unicast+0x170/0x210
[  532.378333]  [<ffffffff84c8d7a8>] netlink_sendmsg+0x308/0x420
[  532.379465]  [<ffffffff84c2f3a6>] sock_sendmsg+0xb6/0xf0
[  532.380710]  [<ffffffffc034a56e>] ? __xfs_filemap_fault+0x8e/0x1d0 [xfs]
[  532.381868]  [<ffffffffc034a75c>] ? xfs_filemap_fault+0x2c/0x30 [xfs]
[  532.383037]  [<ffffffff847ec23a>] ? __do_fault.isra.61+0x8a/0x100
[  532.384144]  [<ffffffff84c30269>] ___sys_sendmsg+0x3e9/0x400
[  532.385268]  [<ffffffff847f3fad>] ? handle_mm_fault+0x39d/0x9b0
[  532.386387]  [<ffffffff84d88678>] ? __do_page_fault+0x238/0x500
[  532.387472]  [<ffffffff84c31921>] __sys_sendmsg+0x51/0x90
[  532.388560]  [<ffffffff84c31972>] SyS_sendmsg+0x12/0x20
[  532.389636]  [<ffffffff84d8dede>] system_call_fastpath+0x25/0x2a
[  532.390704]  [<ffffffff84d8de21>] ? system_call_after_swapgs+0xae/0x146
[  532.391753] Code: 00 00 00 00 00 00 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 8b b7 48 01 00 00 48 89 fb <48> 8b 8e a8 00 00 00 48 85 c9 74 43 48 89 ca eb 0f 0f 1f 80 00
[  532.394036] RIP  [<ffffffffc06124a8>] teql_destroy+0x18/0x100 [sch_teql]
[  532.395127]  RSP <ffff8b7c9fdbf8e0>
[  532.396179] CR2: 00000000000000a8

Null pointer dereference happens on master->slaves dereference in
teql_destroy() as master is null-pointer.

When qdisc_create() calls teql_qdisc_init() it imediately fails after
check "if (m->dev == dev)" because both devices are teql0, and it does
not set qdisc_priv(sch)->m leaving it zero on error path, then
qdisc_create() imediately calls teql_destroy() which does not expect
zero master pointer and we get OOPS.

Fixes: 87b60cf ("net_sched: fix error recovery at qdisc creation")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jimbo77 pushed a commit that referenced this pull request Apr 17, 2021
[ Upstream commit 7c0d6e482062eb5c06ecccfab340abc523bdca00 ]

card->owner is a required property and since commit 81033c6b584b ("ALSA:
core: Warn on empty module") a warning is issued if it is empty. Add it.
This fixes following warning observed on Lamobo R1:

WARNING: CPU: 1 PID: 190 at sound/core/init.c:207 snd_card_new+0x430/0x480 [snd]
Modules linked in: sun4i_codec(E+) sun4i_backend(E+) snd_soc_core(E) ...
CPU: 1 PID: 190 Comm: systemd-udevd Tainted: G         C  E     5.10.0-1-armmp #1 Debian 5.10.4-1
Hardware name: Allwinner sun7i (A20) Family
Call trace:
 (snd_card_new [snd])
 (snd_soc_bind_card [snd_soc_core])
 (snd_soc_register_card [snd_soc_core])
 (sun4i_codec_probe [sun4i_codec])

Fixes: 45fb6b6 ("ASoC: sunxi: add support for the on-chip codec on early Allwinner SoCs")
Related: commit 3c27ea23ffb4 ("ASoC: qcom: Set card->owner to avoid warnings")
Related: commit ec653df2a0cb ("drm/vc4/vc4_hdmi: fill ASoC card owner")
Cc: linux-arm-kernel@lists.infradead.org
Cc: alsa-devel@alsa-project.org
Signed-off-by: Bastian Germann <bage@linutronix.de>
Link: https://lore.kernel.org/r/20210331151843.30583-1-bage@linutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jimbo77 pushed a commit that referenced this pull request Apr 17, 2021
[ Upstream commit 8d3c0c01cb2e36b2bf3c06a82b18b228d0c8f5d0 ]

Fix invalid usage of a list_for_each_entry cursor in
clk_notifier_register(). When list is empty or if the list
is completely traversed (without breaking from the loop on one
of the entries) then the list cursor does not point to a valid
entry and therefore should not be used.

The issue was dicovered when running 5.12-rc1 kernel on x86_64
with KASAN enabled:
BUG: KASAN: global-out-of-bounds in clk_notifier_register+0xab/0x230
Read of size 8 at addr ffffffffa0d10588 by task swapper/0/1

CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc1 #1
Hardware name: Google Caroline/Caroline,
BIOS Google_Caroline.7820.430.0 07/20/2018
Call Trace:
 dump_stack+0xee/0x15c
 print_address_description+0x1e/0x2dc
 kasan_report+0x188/0x1ce
 ? clk_notifier_register+0xab/0x230
 ? clk_prepare_lock+0x15/0x7b
 ? clk_notifier_register+0xab/0x230
 clk_notifier_register+0xab/0x230
 dw8250_probe+0xc01/0x10d4
...
Memory state around the buggy address:
 ffffffffa0d10480: 00 00 00 00 00 03 f9 f9 f9 f9 f9 f9 00 00 00 00
 ffffffffa0d10500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f9 f9
>ffffffffa0d10580: f9 f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00
                      ^
 ffffffffa0d10600: 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 00 00 00 00
 ffffffffa0d10680: 00 00 00 00 00 00 00 00 f9 f9 f9 f9 00 00 00 00
 ==================================================================

Fixes: b247649 ("clk: introduce the common clock framework")
Reported-by: Lukasz Majczak <lma@semihalf.com>
Signed-off-by: Lukasz Bartosik <lb@semihalf.com>
Link: https://lore.kernel.org/r/20210401225149.18826-1-lb@semihalf.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jimbo77 pushed a commit that referenced this pull request Apr 17, 2021
[ Upstream commit 7045465500e465b09f09d6e5bdc260a9f1aab97b ]

Fix invalid usage of a list_for_each_entry cursor in
clk_notifier_unregister(). When list is empty or if the list
is completely traversed (without breaking from the loop on one
of the entries) then the list cursor does not point to a valid
entry and therefore should not be used. The patch fixes a logical
bug that hasn't been seen in pratice however it is analogus
to the bug fixed in clk_notifier_register().

The issue was dicovered when running 5.12-rc1 kernel on x86_64
with KASAN enabled:
BUG: KASAN: global-out-of-bounds in clk_notifier_register+0xab/0x230
Read of size 8 at addr ffffffffa0d10588 by task swapper/0/1

CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc1 #1
Hardware name: Google Caroline/Caroline,
BIOS Google_Caroline.7820.430.0 07/20/2018
Call Trace:
 dump_stack+0xee/0x15c
 print_address_description+0x1e/0x2dc
 kasan_report+0x188/0x1ce
 ? clk_notifier_register+0xab/0x230
 ? clk_prepare_lock+0x15/0x7b
 ? clk_notifier_register+0xab/0x230
 clk_notifier_register+0xab/0x230
 dw8250_probe+0xc01/0x10d4
 ...
 Memory state around the buggy address:
  ffffffffa0d10480: 00 00 00 00 00 03 f9 f9 f9 f9 f9 f9 00 00 00 00
  ffffffffa0d10500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f9 f9
 >ffffffffa0d10580: f9 f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00
                          ^
  ffffffffa0d10600: 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 00 00 00 00
  ffffffffa0d10680: 00 00 00 00 00 00 00 00 f9 f9 f9 f9 00 00 00 00
  ==================================================================

Fixes: b247649 ("clk: introduce the common clock framework")
Reported-by: Lukasz Majczak <lma@semihalf.com>
Signed-off-by: Lukasz Bartosik <lb@semihalf.com>
Link: https://lore.kernel.org/r/20210401225149.18826-2-lb@semihalf.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jimbo77 pushed a commit that referenced this pull request Apr 17, 2021
[ Upstream commit a9f81244d2e33e6dfcef120fefd30c96b3f7cdb0 ]

setup_fritz() in avmfritz.c might fail with -EIO and in this case the
isac.type and isac.write_reg is not initialized and remains 0(NULL).
A subsequent call to isac_release() will dereference isac->write_reg and
crash.

[    1.737444] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    1.737809] #PF: supervisor instruction fetch in kernel mode
[    1.738106] #PF: error_code(0x0010) - not-present page
[    1.738378] PGD 0 P4D 0
[    1.738515] Oops: 0010 [#1] SMP NOPTI
[    1.738711] CPU: 0 PID: 180 Comm: systemd-udevd Not tainted 5.12.0-rc2+ #78
[    1.739077] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-p
rebuilt.qemu.org 04/01/2014
[    1.739664] RIP: 0010:0x0
[    1.739807] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[    1.740200] RSP: 0018:ffffc9000027ba10 EFLAGS: 00010202
[    1.740478] RAX: 0000000000000000 RBX: ffff888102f41840 RCX: 0000000000000027
[    1.740853] RDX: 00000000000000ff RSI: 0000000000000020 RDI: ffff888102f41800
[    1.741226] RBP: ffffc9000027ba20 R08: ffff88817bc18440 R09: ffffc9000027b808
[    1.741600] R10: 0000000000000001 R11: 0000000000000001 R12: ffff888102f41840
[    1.741976] R13: 00000000fffffffb R14: ffff888102f41800 R15: ffff8881008b0000
[    1.742351] FS:  00007fda3a38a8c0(0000) GS:ffff88817bc00000(0000) knlGS:0000000000000000
[    1.742774] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.743076] CR2: ffffffffffffffd6 CR3: 00000001021ec000 CR4: 00000000000006f0
[    1.743452] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    1.743828] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    1.744206] Call Trace:
[    1.744339]  isac_release+0xcc/0xe0 [mISDNipac]
[    1.744582]  fritzpci_probe.cold+0x282/0x739 [avmfritz]
[    1.744861]  local_pci_probe+0x48/0x80
[    1.745063]  pci_device_probe+0x10f/0x1c0
[    1.745278]  really_probe+0xfb/0x420
[    1.745471]  driver_probe_device+0xe9/0x160
[    1.745693]  device_driver_attach+0x5d/0x70
[    1.745917]  __driver_attach+0x8f/0x150
[    1.746123]  ? device_driver_attach+0x70/0x70
[    1.746354]  bus_for_each_dev+0x7e/0xc0
[    1.746560]  driver_attach+0x1e/0x20
[    1.746751]  bus_add_driver+0x152/0x1f0
[    1.746957]  driver_register+0x74/0xd0
[    1.747157]  ? 0xffffffffc00d8000
[    1.747334]  __pci_register_driver+0x54/0x60
[    1.747562]  AVM_init+0x36/0x1000 [avmfritz]
[    1.747791]  do_one_initcall+0x48/0x1d0
[    1.747997]  ? __cond_resched+0x19/0x30
[    1.748206]  ? kmem_cache_alloc_trace+0x390/0x440
[    1.748458]  ? do_init_module+0x28/0x250
[    1.748669]  do_init_module+0x62/0x250
[    1.748870]  load_module+0x23ee/0x26a0
[    1.749073]  __do_sys_finit_module+0xc2/0x120
[    1.749307]  ? __do_sys_finit_module+0xc2/0x120
[    1.749549]  __x64_sys_finit_module+0x1a/0x20
[    1.749782]  do_syscall_64+0x38/0x90

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jimbo77 pushed a commit that referenced this pull request Apr 17, 2021
commit 90bd070aae6c4fb5d302f9c4b9c88be60c8197ec upstream.

The following deadlock is detected:

  truncate -> setattr path is waiting for pending direct IO to be done (inode->i_dio_count become zero) with inode->i_rwsem held (down_write).

  PID: 14827  TASK: ffff881686a9af80  CPU: 20  COMMAND: "ora_p005_hrltd9"
   #0  __schedule at ffffffff818667cc
   #1  schedule at ffffffff81866de6
   #2  inode_dio_wait at ffffffff812a2d04
   #3  ocfs2_setattr at ffffffffc05f322e [ocfs2]
   #4  notify_change at ffffffff812a5a09
   #5  do_truncate at ffffffff812808f5
   #6  do_sys_ftruncate.constprop.18 at ffffffff81280cf2
   #7  sys_ftruncate at ffffffff81280d8e
   #8  do_syscall_64 at ffffffff81003949
   #9  entry_SYSCALL_64_after_hwframe at ffffffff81a001ad

dio completion path is going to complete one direct IO (decrement
inode->i_dio_count), but before that it hung at locking inode->i_rwsem:

   #0  __schedule+700 at ffffffff818667cc
   #1  schedule+54 at ffffffff81866de6
   #2  rwsem_down_write_failed+536 at ffffffff8186aa28
   #3  call_rwsem_down_write_failed+23 at ffffffff8185a1b7
   #4  down_write+45 at ffffffff81869c9d
   #5  ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2]
   #6  ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2]
   #7  dio_complete+140 at ffffffff812c873c
   #8  dio_aio_complete_work+25 at ffffffff812c89f9
   #9  process_one_work+361 at ffffffff810b1889
  #10  worker_thread+77 at ffffffff810b233d
  #11  kthread+261 at ffffffff810b7fd5
  #12  ret_from_fork+62 at ffffffff81a0035e

Thus above forms ABBA deadlock.  The same deadlock was mentioned in
upstream commit 28f5a8a ("ocfs2: should wait dio before inode lock
in ocfs2_setattr()").  It seems that that commit only removed the
cluster lock (the victim of above dead lock) from the ABBA deadlock
party.

End-user visible effects: Process hang in truncate -> ocfs2_setattr path
and other processes hang at ocfs2_dio_end_io_write path.

This is to fix the deadlock itself.  It removes inode_lock() call from
dio completion path to remove the deadlock and add ip_alloc_sem lock in
setattr path to synchronize the inode modifications.

[wen.gang.wang@oracle.com: remove the "had_alloc_lock" as suggested]
  Link: https://lkml.kernel.org/r/20210402171344.1605-1-wen.gang.wang@oracle.com

Link: https://lkml.kernel.org/r/20210331203654.3911-1-wen.gang.wang@oracle.com
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jimbo77 pushed a commit that referenced this pull request Apr 17, 2021
commit 1ffbc7ea91606e4abd10eb60de5367f1c86daf5e upstream.

Reproduce:

  modprobe sch_teql
  tc qdisc add dev teql0 root teql0

This leads to (for instance in Centos 7 VM) OOPS:

[  532.366633] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[  532.366733] IP: [<ffffffffc06124a8>] teql_destroy+0x18/0x100 [sch_teql]
[  532.366825] PGD 80000001376d5067 PUD 137e37067 PMD 0
[  532.366906] Oops: 0000 [#1] SMP
[  532.366987] Modules linked in: sch_teql ...
[  532.367945] CPU: 1 PID: 3026 Comm: tc Kdump: loaded Tainted: G               ------------ T 3.10.0-1062.7.1.el7.x86_64 #1
[  532.368041] Hardware name: Virtuozzo KVM, BIOS 1.11.0-2.vz7.2 04/01/2014
[  532.368125] task: ffff8b7d37d31070 ti: ffff8b7c9fdbc000 task.ti: ffff8b7c9fdbc000
[  532.368224] RIP: 0010:[<ffffffffc06124a8>]  [<ffffffffc06124a8>] teql_destroy+0x18/0x100 [sch_teql]
[  532.368320] RSP: 0018:ffff8b7c9fdbf8e0  EFLAGS: 00010286
[  532.368394] RAX: ffffffffc0612490 RBX: ffff8b7cb1565e00 RCX: ffff8b7d35ba2000
[  532.368476] RDX: ffff8b7d35ba2000 RSI: 0000000000000000 RDI: ffff8b7cb1565e00
[  532.368557] RBP: ffff8b7c9fdbf8f8 R08: ffff8b7d3fd1f140 R09: ffff8b7d3b001600
[  532.368638] R10: ffff8b7d3b001600 R11: ffffffff84c7d65b R12: 00000000ffffffd8
[  532.368719] R13: 0000000000008000 R14: ffff8b7d35ba2000 R15: ffff8b7c9fdbf9a8
[  532.368800] FS:  00007f6a4e872740(0000) GS:ffff8b7d3fd00000(0000) knlGS:0000000000000000
[  532.368885] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  532.368961] CR2: 00000000000000a8 CR3: 00000001396ee000 CR4: 00000000000206e0
[  532.369046] Call Trace:
[  532.369159]  [<ffffffff84c8192e>] qdisc_create+0x36e/0x450
[  532.369268]  [<ffffffff846a9b49>] ? ns_capable+0x29/0x50
[  532.369366]  [<ffffffff849afde2>] ? nla_parse+0x32/0x120
[  532.369442]  [<ffffffff84c81b4c>] tc_modify_qdisc+0x13c/0x610
[  532.371508]  [<ffffffff84c693e7>] rtnetlink_rcv_msg+0xa7/0x260
[  532.372668]  [<ffffffff84907b65>] ? sock_has_perm+0x75/0x90
[  532.373790]  [<ffffffff84c69340>] ? rtnl_newlink+0x890/0x890
[  532.374914]  [<ffffffff84c8da7b>] netlink_rcv_skb+0xab/0xc0
[  532.376055]  [<ffffffff84c63708>] rtnetlink_rcv+0x28/0x30
[  532.377204]  [<ffffffff84c8d400>] netlink_unicast+0x170/0x210
[  532.378333]  [<ffffffff84c8d7a8>] netlink_sendmsg+0x308/0x420
[  532.379465]  [<ffffffff84c2f3a6>] sock_sendmsg+0xb6/0xf0
[  532.380710]  [<ffffffffc034a56e>] ? __xfs_filemap_fault+0x8e/0x1d0 [xfs]
[  532.381868]  [<ffffffffc034a75c>] ? xfs_filemap_fault+0x2c/0x30 [xfs]
[  532.383037]  [<ffffffff847ec23a>] ? __do_fault.isra.61+0x8a/0x100
[  532.384144]  [<ffffffff84c30269>] ___sys_sendmsg+0x3e9/0x400
[  532.385268]  [<ffffffff847f3fad>] ? handle_mm_fault+0x39d/0x9b0
[  532.386387]  [<ffffffff84d88678>] ? __do_page_fault+0x238/0x500
[  532.387472]  [<ffffffff84c31921>] __sys_sendmsg+0x51/0x90
[  532.388560]  [<ffffffff84c31972>] SyS_sendmsg+0x12/0x20
[  532.389636]  [<ffffffff84d8dede>] system_call_fastpath+0x25/0x2a
[  532.390704]  [<ffffffff84d8de21>] ? system_call_after_swapgs+0xae/0x146
[  532.391753] Code: 00 00 00 00 00 00 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 8b b7 48 01 00 00 48 89 fb <48> 8b 8e a8 00 00 00 48 85 c9 74 43 48 89 ca eb 0f 0f 1f 80 00
[  532.394036] RIP  [<ffffffffc06124a8>] teql_destroy+0x18/0x100 [sch_teql]
[  532.395127]  RSP <ffff8b7c9fdbf8e0>
[  532.396179] CR2: 00000000000000a8

Null pointer dereference happens on master->slaves dereference in
teql_destroy() as master is null-pointer.

When qdisc_create() calls teql_qdisc_init() it imediately fails after
check "if (m->dev == dev)" because both devices are teql0, and it does
not set qdisc_priv(sch)->m leaving it zero on error path, then
qdisc_create() imediately calls teql_destroy() which does not expect
zero master pointer and we get OOPS.

Fixes: 87b60cf ("net_sched: fix error recovery at qdisc creation")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jimbo77 pushed a commit that referenced this pull request Apr 17, 2021
[ Upstream commit 7c0d6e482062eb5c06ecccfab340abc523bdca00 ]

card->owner is a required property and since commit 81033c6b584b ("ALSA:
core: Warn on empty module") a warning is issued if it is empty. Add it.
This fixes following warning observed on Lamobo R1:

WARNING: CPU: 1 PID: 190 at sound/core/init.c:207 snd_card_new+0x430/0x480 [snd]
Modules linked in: sun4i_codec(E+) sun4i_backend(E+) snd_soc_core(E) ...
CPU: 1 PID: 190 Comm: systemd-udevd Tainted: G         C  E     5.10.0-1-armmp #1 Debian 5.10.4-1
Hardware name: Allwinner sun7i (A20) Family
Call trace:
 (snd_card_new [snd])
 (snd_soc_bind_card [snd_soc_core])
 (snd_soc_register_card [snd_soc_core])
 (sun4i_codec_probe [sun4i_codec])

Fixes: 45fb6b6 ("ASoC: sunxi: add support for the on-chip codec on early Allwinner SoCs")
Related: commit 3c27ea23ffb4 ("ASoC: qcom: Set card->owner to avoid warnings")
Related: commit ec653df2a0cb ("drm/vc4/vc4_hdmi: fill ASoC card owner")
Cc: linux-arm-kernel@lists.infradead.org
Cc: alsa-devel@alsa-project.org
Signed-off-by: Bastian Germann <bage@linutronix.de>
Link: https://lore.kernel.org/r/20210331151843.30583-1-bage@linutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jimbo77 pushed a commit that referenced this pull request Apr 17, 2021
[ Upstream commit 8d3c0c01cb2e36b2bf3c06a82b18b228d0c8f5d0 ]

Fix invalid usage of a list_for_each_entry cursor in
clk_notifier_register(). When list is empty or if the list
is completely traversed (without breaking from the loop on one
of the entries) then the list cursor does not point to a valid
entry and therefore should not be used.

The issue was dicovered when running 5.12-rc1 kernel on x86_64
with KASAN enabled:
BUG: KASAN: global-out-of-bounds in clk_notifier_register+0xab/0x230
Read of size 8 at addr ffffffffa0d10588 by task swapper/0/1

CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc1 #1
Hardware name: Google Caroline/Caroline,
BIOS Google_Caroline.7820.430.0 07/20/2018
Call Trace:
 dump_stack+0xee/0x15c
 print_address_description+0x1e/0x2dc
 kasan_report+0x188/0x1ce
 ? clk_notifier_register+0xab/0x230
 ? clk_prepare_lock+0x15/0x7b
 ? clk_notifier_register+0xab/0x230
 clk_notifier_register+0xab/0x230
 dw8250_probe+0xc01/0x10d4
...
Memory state around the buggy address:
 ffffffffa0d10480: 00 00 00 00 00 03 f9 f9 f9 f9 f9 f9 00 00 00 00
 ffffffffa0d10500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f9 f9
>ffffffffa0d10580: f9 f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00
                      ^
 ffffffffa0d10600: 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 00 00 00 00
 ffffffffa0d10680: 00 00 00 00 00 00 00 00 f9 f9 f9 f9 00 00 00 00
 ==================================================================

Fixes: b247649 ("clk: introduce the common clock framework")
Reported-by: Lukasz Majczak <lma@semihalf.com>
Signed-off-by: Lukasz Bartosik <lb@semihalf.com>
Link: https://lore.kernel.org/r/20210401225149.18826-1-lb@semihalf.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jimbo77 pushed a commit that referenced this pull request Apr 17, 2021
[ Upstream commit 7045465500e465b09f09d6e5bdc260a9f1aab97b ]

Fix invalid usage of a list_for_each_entry cursor in
clk_notifier_unregister(). When list is empty or if the list
is completely traversed (without breaking from the loop on one
of the entries) then the list cursor does not point to a valid
entry and therefore should not be used. The patch fixes a logical
bug that hasn't been seen in pratice however it is analogus
to the bug fixed in clk_notifier_register().

The issue was dicovered when running 5.12-rc1 kernel on x86_64
with KASAN enabled:
BUG: KASAN: global-out-of-bounds in clk_notifier_register+0xab/0x230
Read of size 8 at addr ffffffffa0d10588 by task swapper/0/1

CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc1 #1
Hardware name: Google Caroline/Caroline,
BIOS Google_Caroline.7820.430.0 07/20/2018
Call Trace:
 dump_stack+0xee/0x15c
 print_address_description+0x1e/0x2dc
 kasan_report+0x188/0x1ce
 ? clk_notifier_register+0xab/0x230
 ? clk_prepare_lock+0x15/0x7b
 ? clk_notifier_register+0xab/0x230
 clk_notifier_register+0xab/0x230
 dw8250_probe+0xc01/0x10d4
 ...
 Memory state around the buggy address:
  ffffffffa0d10480: 00 00 00 00 00 03 f9 f9 f9 f9 f9 f9 00 00 00 00
  ffffffffa0d10500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f9 f9
 >ffffffffa0d10580: f9 f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00
                          ^
  ffffffffa0d10600: 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 00 00 00 00
  ffffffffa0d10680: 00 00 00 00 00 00 00 00 f9 f9 f9 f9 00 00 00 00
  ==================================================================

Fixes: b247649 ("clk: introduce the common clock framework")
Reported-by: Lukasz Majczak <lma@semihalf.com>
Signed-off-by: Lukasz Bartosik <lb@semihalf.com>
Link: https://lore.kernel.org/r/20210401225149.18826-2-lb@semihalf.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jimbo77 pushed a commit that referenced this pull request May 22, 2021
commit b895bdf5d643b6feb7c60856326dd4feb6981560 upstream.

div_u64() divides u64 by u32.

nft_limit_init() wants to divide u64 by u64, use the appropriate
math function (div64_u64)

divide error: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 8390 Comm: syz-executor188 Not tainted 5.12.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:div_u64_rem include/linux/math64.h:28 [inline]
RIP: 0010:div_u64 include/linux/math64.h:127 [inline]
RIP: 0010:nft_limit_init+0x2a2/0x5e0 net/netfilter/nft_limit.c:85
Code: ef 4c 01 eb 41 0f 92 c7 48 89 de e8 38 a5 22 fa 4d 85 ff 0f 85 97 02 00 00 e8 ea 9e 22 fa 4c 0f af f3 45 89 ed 31 d2 4c 89 f0 <49> f7 f5 49 89 c6 e8 d3 9e 22 fa 48 8d 7d 48 48 b8 00 00 00 00 00
RSP: 0018:ffffc90009447198 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000200000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff875152e6 RDI: 0000000000000003
RBP: ffff888020f80908 R08: 0000200000000000 R09: 0000000000000000
R10: ffffffff875152d8 R11: 0000000000000000 R12: ffffc90009447270
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  000000000097a300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000200001c4 CR3: 0000000026a52000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 nf_tables_newexpr net/netfilter/nf_tables_api.c:2675 [inline]
 nft_expr_init+0x145/0x2d0 net/netfilter/nf_tables_api.c:2713
 nft_set_elem_expr_alloc+0x27/0x280 net/netfilter/nf_tables_api.c:5160
 nf_tables_newset+0x1997/0x3150 net/netfilter/nf_tables_api.c:4321
 nfnetlink_rcv_batch+0x85a/0x21b0 net/netfilter/nfnetlink.c:456
 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:580 [inline]
 nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:598
 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338
 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927
 sock_sendmsg_nosec net/socket.c:654 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:674
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: c26844e ("netfilter: nf_tables: Fix nft limit burst handling")
Fixes: 3e0f64b ("netfilter: nft_limit: fix packet ratelimiting")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Diagnosed-by: Luigi Rizzo <lrizzo@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jimbo77 pushed a commit that referenced this pull request May 22, 2021
commit 1a1c130ab7575498eed5bcf7220037ae09cd1f8a upstream.

The following problem has been reported by George Kennedy:

 Since commit 7fef431be9c9 ("mm/page_alloc: place pages to tail
 in __free_pages_core()") the following use after free occurs
 intermittently when ACPI tables are accessed.

 BUG: KASAN: use-after-free in ibft_init+0x134/0xc49
 Read of size 4 at addr ffff8880be453004 by task swapper/0/1
 CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc1-7a7fd0d #1
 Call Trace:
  dump_stack+0xf6/0x158
  print_address_description.constprop.9+0x41/0x60
  kasan_report.cold.14+0x7b/0xd4
  __asan_report_load_n_noabort+0xf/0x20
  ibft_init+0x134/0xc49
  do_one_initcall+0xc4/0x3e0
  kernel_init_freeable+0x5af/0x66b
  kernel_init+0x16/0x1d0
  ret_from_fork+0x22/0x30

 ACPI tables mapped via kmap() do not have their mapped pages
 reserved and the pages can be "stolen" by the buddy allocator.

Apparently, on the affected system, the ACPI table in question is
not located in "reserved" memory, like ACPI NVS or ACPI Data, that
will not be used by the buddy allocator, so the memory occupied by
that table has to be explicitly reserved to prevent the buddy
allocator from using it.

In order to address this problem, rearrange the initialization of the
ACPI tables on x86 to locate the initial tables earlier and reserve
the memory occupied by them.

The other architectures using ACPI should not be affected by this
change.

Link: https://lore.kernel.org/linux-acpi/1614802160-29362-1-git-send-email-george.kennedy@oracle.com/
Reported-by: George Kennedy <george.kennedy@oracle.com>
Tested-by: George Kennedy <george.kennedy@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: 5.10+ <stable@vger.kernel.org> # 5.10+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jimbo77 pushed a commit that referenced this pull request Jun 4, 2021
[ Upstream commit a2b2cc660822cae08c351c7f6b452bfd1330a4f7 ]

This patch fixes the following Coverity warning:

    CID 361199 (#1 of 1): Unchecked return value (CHECKED_RETURN)
    3. check_return: Calling qla24xx_get_isp_stats without checking return
    value (as is done elsewhere 4 out of 5 times).

Link: https://lore.kernel.org/r/20210320232359.941-7-bvanassche@acm.org
Cc: Quinn Tran <qutran@marvell.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: Lee Duncan <lduncan@suse.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jimbo77 pushed a commit that referenced this pull request Jun 4, 2021
[ Upstream commit 2ce35c0821afc2acd5ee1c3f60d149f8b2520ce8 ]

On bsg command completion, bsg_job_done() was called while qla driver
continued to access the bsg_job buffer. bsg_job_done() would free up
resources that ended up being reused by other task while the driver
continued to access the buffers. As a result, driver was reading garbage
data.

localhost kernel: BUG: KASAN: use-after-free in sg_next+0x64/0x80
localhost kernel: Read of size 8 at addr ffff8883228a3330 by task swapper/26/0
localhost kernel:
localhost kernel: CPU: 26 PID: 0 Comm: swapper/26 Kdump:
loaded Tainted: G          OE    --------- -  - 4.18.0-193.el8.x86_64+debug #1
localhost kernel: Hardware name: HP ProLiant DL360
Gen9/ProLiant DL360 Gen9, BIOS P89 08/12/2016
localhost kernel: Call Trace:
localhost kernel: <IRQ>
localhost kernel: dump_stack+0x9a/0xf0
localhost kernel: print_address_description.cold.3+0x9/0x23b
localhost kernel: kasan_report.cold.4+0x65/0x95
localhost kernel: debug_dma_unmap_sg.part.12+0x10d/0x2d0
localhost kernel: qla2x00_bsg_sp_free+0xaf6/0x1010 [qla2xxx]

Link: https://lore.kernel.org/r/20210329085229.4367-6-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jimbo77 pushed a commit that referenced this pull request Jun 4, 2021
… root

[ Upstream commit f9690f426b2134cc3e74bfc5d9dfd6a4b2ca5281 ]

Commit dbcc7d57bffc0c ("btrfs: fix race when cloning extent buffer during
rewind of an old root"), fixed a race when we need to rewind the extent
buffer of an old root. It was caused by picking a new mod log operation
for the extent buffer while getting a cloned extent buffer with an outdated
number of items (off by -1), because we cloned the extent buffer without
locking it first.

However there is still another similar race, but in the opposite direction.
The cloned extent buffer has a number of items that does not match the
number of tree mod log operations that are going to be replayed. This is
because right after we got the last (most recent) tree mod log operation to
replay and before locking and cloning the extent buffer, another task adds
a new pointer to the extent buffer, which results in adding a new tree mod
log operation and incrementing the number of items in the extent buffer.
So after cloning we have mismatch between the number of items in the extent
buffer and the number of mod log operations we are going to apply to it.
This results in hitting a BUG_ON() that produces the following stack trace:

   ------------[ cut here ]------------
   kernel BUG at fs/btrfs/tree-mod-log.c:675!
   invalid opcode: 0000 [#1] SMP KASAN PTI
   CPU: 3 PID: 4811 Comm: crawl_1215 Tainted: G        W         5.12.0-7d1efdf501f8-misc-next+ #99
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
   RIP: 0010:tree_mod_log_rewind+0x3b1/0x3c0
   Code: 05 48 8d 74 10 (...)
   RSP: 0018:ffffc90001027090 EFLAGS: 00010293
   RAX: 0000000000000000 RBX: ffff8880a8514600 RCX: ffffffffaa9e59b6
   RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8880a851462c
   RBP: ffffc900010270e0 R08: 00000000000000c0 R09: ffffed1004333417
   R10: ffff88802199a0b7 R11: ffffed1004333416 R12: 000000000000000e
   R13: ffff888135af8748 R14: ffff88818766ff00 R15: ffff8880a851462c
   FS:  00007f29acf62700(0000) GS:ffff8881f2200000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: 00007f0e6013f718 CR3: 000000010d42e003 CR4: 0000000000170ee0
   Call Trace:
    btrfs_get_old_root+0x16a/0x5c0
    ? lock_downgrade+0x400/0x400
    btrfs_search_old_slot+0x192/0x520
    ? btrfs_search_slot+0x1090/0x1090
    ? free_extent_buffer.part.61+0xd7/0x140
    ? free_extent_buffer+0x13/0x20
    resolve_indirect_refs+0x3e9/0xfc0
    ? lock_downgrade+0x400/0x400
    ? __kasan_check_read+0x11/0x20
    ? add_prelim_ref.part.11+0x150/0x150
    ? lock_downgrade+0x400/0x400
    ? __kasan_check_read+0x11/0x20
    ? lock_acquired+0xbb/0x620
    ? __kasan_check_write+0x14/0x20
    ? do_raw_spin_unlock+0xa8/0x140
    ? rb_insert_color+0x340/0x360
    ? prelim_ref_insert+0x12d/0x430
    find_parent_nodes+0x5c3/0x1830
    ? stack_trace_save+0x87/0xb0
    ? resolve_indirect_refs+0xfc0/0xfc0
    ? fs_reclaim_acquire+0x67/0xf0
    ? __kasan_check_read+0x11/0x20
    ? lockdep_hardirqs_on_prepare+0x210/0x210
    ? fs_reclaim_acquire+0x67/0xf0
    ? __kasan_check_read+0x11/0x20
    ? ___might_sleep+0x10f/0x1e0
    ? __kasan_kmalloc+0x9d/0xd0
    ? trace_hardirqs_on+0x55/0x120
    btrfs_find_all_roots_safe+0x142/0x1e0
    ? find_parent_nodes+0x1830/0x1830
    ? trace_hardirqs_on+0x55/0x120
    ? ulist_free+0x1f/0x30
    ? btrfs_inode_flags_to_xflags+0x50/0x50
    iterate_extent_inodes+0x20e/0x580
    ? tree_backref_for_extent+0x230/0x230
    ? release_extent_buffer+0x225/0x280
    ? read_extent_buffer+0xdd/0x110
    ? lock_downgrade+0x400/0x400
    ? __kasan_check_read+0x11/0x20
    ? lock_acquired+0xbb/0x620
    ? __kasan_check_write+0x14/0x20
    ? do_raw_spin_unlock+0xa8/0x140
    ? _raw_spin_unlock+0x22/0x30
    ? release_extent_buffer+0x225/0x280
    iterate_inodes_from_logical+0x129/0x170
    ? iterate_inodes_from_logical+0x129/0x170
    ? btrfs_inode_flags_to_xflags+0x50/0x50
    ? iterate_extent_inodes+0x580/0x580
    ? __vmalloc_node+0x92/0xb0
    ? init_data_container+0x34/0xb0
    ? init_data_container+0x34/0xb0
    ? kvmalloc_node+0x60/0x80
    btrfs_ioctl_logical_to_ino+0x158/0x230
    btrfs_ioctl+0x2038/0x4360
    ? __kasan_check_write+0x14/0x20
    ? mmput+0x3b/0x220
    ? btrfs_ioctl_get_supported_features+0x30/0x30
    ? __kasan_check_read+0x11/0x20
    ? __kasan_check_read+0x11/0x20
    ? lock_release+0xc8/0x650
    ? __might_fault+0x64/0xd0
    ? __kasan_check_read+0x11/0x20
    ? lock_downgrade+0x400/0x400
    ? lockdep_hardirqs_on_prepare+0x210/0x210
    ? lockdep_hardirqs_on_prepare+0x13/0x210
    ? _raw_spin_unlock_irqrestore+0x51/0x63
    ? __kasan_check_read+0x11/0x20
    ? do_vfs_ioctl+0xfc/0x9d0
    ? ioctl_file_clone+0xe0/0xe0
    ? lock_downgrade+0x400/0x400
    ? lockdep_hardirqs_on_prepare+0x210/0x210
    ? __kasan_check_read+0x11/0x20
    ? lock_release+0xc8/0x650
    ? __task_pid_nr_ns+0xd3/0x250
    ? __kasan_check_read+0x11/0x20
    ? __fget_files+0x160/0x230
    ? __fget_light+0xf2/0x110
    __x64_sys_ioctl+0xc3/0x100
    do_syscall_64+0x37/0x80
    entry_SYSCALL_64_after_hwframe+0x44/0xae
   RIP: 0033:0x7f29ae85b427
   Code: 00 00 90 48 8b (...)
   RSP: 002b:00007f29acf5fcf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
   RAX: ffffffffffffffda RBX: 00007f29acf5ff40 RCX: 00007f29ae85b427
   RDX: 00007f29acf5ff48 RSI: 00000000c038943b RDI: 0000000000000003
   RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f29acf60120
   R10: 00005640d5fc7b00 R11: 0000000000000246 R12: 0000000000000003
   R13: 00007f29acf5ff48 R14: 00007f29acf5ff40 R15: 00007f29acf5fef8
   Modules linked in:
   ---[ end trace 85e5fce078dfbe04 ]---

  (gdb) l *(tree_mod_log_rewind+0x3b1)
  0xffffffff819e5b21 is in tree_mod_log_rewind (fs/btrfs/tree-mod-log.c:675).
  670                      * the modification. As we're going backwards, we do the
  671                      * opposite of each operation here.
  672                      */
  673                     switch (tm->op) {
  674                     case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING:
  675                             BUG_ON(tm->slot < n);
  676                             fallthrough;
  677                     case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_MOVING:
  678                     case BTRFS_MOD_LOG_KEY_REMOVE:
  679                             btrfs_set_node_key(eb, &tm->key, tm->slot);
  (gdb) quit

The following steps explain in more detail how it happens:

1) We have one tree mod log user (through fiemap or the logical ino ioctl),
   with a sequence number of 1, so we have fs_info->tree_mod_seq == 1.
   This is task A;

2) Another task is at ctree.c:balance_level() and we have eb X currently as
   the root of the tree, and we promote its single child, eb Y, as the new
   root.

   Then, at ctree.c:balance_level(), we call:

      ret = btrfs_tree_mod_log_insert_root(root->node, child, true);

3) At btrfs_tree_mod_log_insert_root() we create a tree mod log operation
   of type BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING, with a ->logical field
   pointing to ebX->start. We only have one item in eb X, so we create
   only one tree mod log operation, and store in the "tm_list" array;

4) Then, still at btrfs_tree_mod_log_insert_root(), we create a tree mod
   log element of operation type BTRFS_MOD_LOG_ROOT_REPLACE, ->logical set
   to ebY->start, ->old_root.logical set to ebX->start, ->old_root.level
   set to the level of eb X and ->generation set to the generation of eb X;

5) Then btrfs_tree_mod_log_insert_root() calls tree_mod_log_free_eb() with
   "tm_list" as argument. After that, tree_mod_log_free_eb() calls
   tree_mod_log_insert(). This inserts the mod log operation of type
   BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING from step 3 into the rbtree
   with a sequence number of 2 (and fs_info->tree_mod_seq set to 2);

6) Then, after inserting the "tm_list" single element into the tree mod
   log rbtree, the BTRFS_MOD_LOG_ROOT_REPLACE element is inserted, which
   gets the sequence number 3 (and fs_info->tree_mod_seq set to 3);

7) Back to ctree.c:balance_level(), we free eb X by calling
   btrfs_free_tree_block() on it. Because eb X was created in the current
   transaction, has no other references and writeback did not happen for
   it, we add it back to the free space cache/tree;

8) Later some other task B allocates the metadata extent from eb X, since
   it is marked as free space in the space cache/tree, and uses it as a
   node for some other btree;

9) The tree mod log user task calls btrfs_search_old_slot(), which calls
   btrfs_get_old_root(), and finally that calls tree_mod_log_oldest_root()
   with time_seq == 1 and eb_root == eb Y;

10) The first iteration of the while loop finds the tree mod log element
    with sequence number 3, for the logical address of eb Y and of type
    BTRFS_MOD_LOG_ROOT_REPLACE;

11) Because the operation type is BTRFS_MOD_LOG_ROOT_REPLACE, we don't
    break out of the loop, and set root_logical to point to
    tm->old_root.logical, which corresponds to the logical address of
    eb X;

12) On the next iteration of the while loop, the call to
    tree_mod_log_search_oldest() returns the smallest tree mod log element
    for the logical address of eb X, which has a sequence number of 2, an
    operation type of BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING and
    corresponds to the old slot 0 of eb X (eb X had only 1 item in it
    before being freed at step 7);

13) We then break out of the while loop and return the tree mod log
    operation of type BTRFS_MOD_LOG_ROOT_REPLACE (eb Y), and not the one
    for slot 0 of eb X, to btrfs_get_old_root();

14) At btrfs_get_old_root(), we process the BTRFS_MOD_LOG_ROOT_REPLACE
    operation and set "logical" to the logical address of eb X, which was
    the old root. We then call tree_mod_log_search() passing it the logical
    address of eb X and time_seq == 1;

15) But before calling tree_mod_log_search(), task B locks eb X, adds a
    key to eb X, which results in adding a tree mod log operation of type
    BTRFS_MOD_LOG_KEY_ADD, with a sequence number of 4, to the tree mod
    log, and increments the number of items in eb X from 0 to 1.
    Now fs_info->tree_mod_seq has a value of 4;

16) Task A then calls tree_mod_log_search(), which returns the most recent
    tree mod log operation for eb X, which is the one just added by task B
    at the previous step, with a sequence number of 4, a type of
    BTRFS_MOD_LOG_KEY_ADD and for slot 0;

17) Before task A locks and clones eb X, task A adds another key to eb X,
    which results in adding a new BTRFS_MOD_LOG_KEY_ADD mod log operation,
    with a sequence number of 5, for slot 1 of eb X, increments the
    number of items in eb X from 1 to 2, and unlocks eb X.
    Now fs_info->tree_mod_seq has a value of 5;

18) Task A then locks eb X and clones it. The clone has a value of 2 for
    the number of items and the pointer "tm" points to the tree mod log
    operation with sequence number 4, not the most recent one with a
    sequence number of 5, so there is mismatch between the number of
    mod log operations that are going to be applied to the cloned version
    of eb X and the number of items in the clone;

19) Task A then calls tree_mod_log_rewind() with the clone of eb X, the
    tree mod log operation with sequence number 4 and a type of
    BTRFS_MOD_LOG_KEY_ADD, and time_seq == 1;

20) At tree_mod_log_rewind(), we set the local variable "n" with a value
    of 2, which is the number of items in the clone of eb X.

    Then in the first iteration of the while loop, we process the mod log
    operation with sequence number 4, which is targeted at slot 0 and has
    a type of BTRFS_MOD_LOG_KEY_ADD. This results in decrementing "n" from
    2 to 1.

    Then we pick the next tree mod log operation for eb X, which is the
    tree mod log operation with a sequence number of 2, a type of
    BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING and for slot 0, it is the one
    added in step 5 to the tree mod log tree.

    We go back to the top of the loop to process this mod log operation,
    and because its slot is 0 and "n" has a value of 1, we hit the BUG_ON:

        (...)
        switch (tm->op) {
        case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING:
                BUG_ON(tm->slot < n);
                fallthrough;
	(...)

Fix this by checking for a more recent tree mod log operation after locking
and cloning the extent buffer of the old root node, and use it as the first
operation to apply to the cloned extent buffer when rewinding it.

Stable backport notes: due to moved code and renames, in =< 5.11 the
change should be applied to ctree.c:get_old_root.

Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Link: https://lore.kernel.org/linux-btrfs/20210404040732.GZ32440@hungrycats.org/
Fixes: 834328a ("Btrfs: tree mod log's old roots could still be part of the tree")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jimbo77 pushed a commit that referenced this pull request Jun 4, 2021
commit 90ada91f4610c5ef11bc52576516d96c496fc3f1 upstream.

KASAN reports a BUG when download file in jffs2 filesystem.It is
because when dstlen == 1, cpage_out will write array out of bounds.
Actually, data will not be compressed in jffs2_zlib_compress() if
data's length less than 4.

[  393.799778] BUG: KASAN: slab-out-of-bounds in jffs2_rtime_compress+0x214/0x2f0 at addr ffff800062e3b281
[  393.809166] Write of size 1 by task tftp/2918
[  393.813526] CPU: 3 PID: 2918 Comm: tftp Tainted: G    B           4.9.115-rt93-EMBSYS-CGEL-6.1.R6-dirty #1
[  393.823173] Hardware name: LS1043A RDB Board (DT)
[  393.827870] Call trace:
[  393.830322] [<ffff20000808c700>] dump_backtrace+0x0/0x2f0
[  393.835721] [<ffff20000808ca04>] show_stack+0x14/0x20
[  393.840774] [<ffff2000086ef700>] dump_stack+0x90/0xb0
[  393.845829] [<ffff20000827b19c>] kasan_object_err+0x24/0x80
[  393.851402] [<ffff20000827b404>] kasan_report_error+0x1b4/0x4d8
[  393.857323] [<ffff20000827bae8>] kasan_report+0x38/0x40
[  393.862548] [<ffff200008279d44>] __asan_store1+0x4c/0x58
[  393.867859] [<ffff2000084ce2ec>] jffs2_rtime_compress+0x214/0x2f0
[  393.873955] [<ffff2000084bb3b0>] jffs2_selected_compress+0x178/0x2a0
[  393.880308] [<ffff2000084bb530>] jffs2_compress+0x58/0x478
[  393.885796] [<ffff2000084c5b34>] jffs2_write_inode_range+0x13c/0x450
[  393.892150] [<ffff2000084be0b8>] jffs2_write_end+0x2a8/0x4a0
[  393.897811] [<ffff2000081f3008>] generic_perform_write+0x1c0/0x280
[  393.903990] [<ffff2000081f5074>] __generic_file_write_iter+0x1c4/0x228
[  393.910517] [<ffff2000081f5210>] generic_file_write_iter+0x138/0x288
[  393.916870] [<ffff20000829ec1c>] __vfs_write+0x1b4/0x238
[  393.922181] [<ffff20000829ff00>] vfs_write+0xd0/0x238
[  393.927232] [<ffff2000082a1ba8>] SyS_write+0xa0/0x110
[  393.932283] [<ffff20000808429c>] __sys_trace_return+0x0/0x4
[  393.937851] Object at ffff800062e3b280, in cache kmalloc-64 size: 64
[  393.944197] Allocated:
[  393.946552] PID = 2918
[  393.948913]  save_stack_trace_tsk+0x0/0x220
[  393.953096]  save_stack_trace+0x18/0x20
[  393.956932]  kasan_kmalloc+0xd8/0x188
[  393.960594]  __kmalloc+0x144/0x238
[  393.963994]  jffs2_selected_compress+0x48/0x2a0
[  393.968524]  jffs2_compress+0x58/0x478
[  393.972273]  jffs2_write_inode_range+0x13c/0x450
[  393.976889]  jffs2_write_end+0x2a8/0x4a0
[  393.980810]  generic_perform_write+0x1c0/0x280
[  393.985251]  __generic_file_write_iter+0x1c4/0x228
[  393.990040]  generic_file_write_iter+0x138/0x288
[  393.994655]  __vfs_write+0x1b4/0x238
[  393.998228]  vfs_write+0xd0/0x238
[  394.001543]  SyS_write+0xa0/0x110
[  394.004856]  __sys_trace_return+0x0/0x4
[  394.008684] Freed:
[  394.010691] PID = 2918
[  394.013051]  save_stack_trace_tsk+0x0/0x220
[  394.017233]  save_stack_trace+0x18/0x20
[  394.021069]  kasan_slab_free+0x88/0x188
[  394.024902]  kfree+0x6c/0x1d8
[  394.027868]  jffs2_sum_write_sumnode+0x2c4/0x880
[  394.032486]  jffs2_do_reserve_space+0x198/0x598
[  394.037016]  jffs2_reserve_space+0x3f8/0x4d8
[  394.041286]  jffs2_write_inode_range+0xf0/0x450
[  394.045816]  jffs2_write_end+0x2a8/0x4a0
[  394.049737]  generic_perform_write+0x1c0/0x280
[  394.054179]  __generic_file_write_iter+0x1c4/0x228
[  394.058968]  generic_file_write_iter+0x138/0x288
[  394.063583]  __vfs_write+0x1b4/0x238
[  394.067157]  vfs_write+0xd0/0x238
[  394.070470]  SyS_write+0xa0/0x110
[  394.073783]  __sys_trace_return+0x0/0x4
[  394.077612] Memory state around the buggy address:
[  394.082404]  ffff800062e3b180: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
[  394.089623]  ffff800062e3b200: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
[  394.096842] >ffff800062e3b280: 01 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  394.104056]                    ^
[  394.107283]  ffff800062e3b300: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[  394.114502]  ffff800062e3b380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[  394.121718] ==================================================================

Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: Joel Stanley <joel@jms.id.au>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jimbo77 pushed a commit that referenced this pull request Jun 4, 2021
…ad fails

commit 8e947c8f4a5620df77e43c9c75310dc510250166 upstream.

When loading a device-mapper table for a request-based mapped device,
and the allocation/initialization of the blk_mq_tag_set for the device
fails, a following device remove will cause a double free.

E.g. (dmesg):
  device-mapper: core: Cannot initialize queue for request-based dm-mq mapped device
  device-mapper: ioctl: unable to set up device queue for new table.
  Unable to handle kernel pointer dereference in virtual kernel address space
  Failing address: 0305e098835de000 TEID: 0305e098835de803
  Fault in home space mode while using kernel ASCE.
  AS:000000025efe0007 R3:0000000000000024
  Oops: 0038 ilc:3 [#1] SMP
  Modules linked in: ... lots of modules ...
  Supported: Yes, External
  CPU: 0 PID: 7348 Comm: multipathd Kdump: loaded Tainted: G        W      X    5.3.18-53-default #1 SLE15-SP3
  Hardware name: IBM 8561 T01 7I2 (LPAR)
  Krnl PSW : 0704e00180000000 000000025e368eca (kfree+0x42/0x330)
             R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
  Krnl GPRS: 000000000000004a 000000025efe5230 c1773200d779968d 0000000000000000
             000000025e520270 000000025e8d1b40 0000000000000003 00000007aae10000
             000000025e5202a2 0000000000000001 c1773200d779968d 0305e098835de640
             00000007a8170000 000003ff80138650 000000025e5202a2 000003e00396faa8
  Krnl Code: 000000025e368eb8: c4180041e100       lgrl    %r1,25eba50b8
             000000025e368ebe: ecba06b93a55       risbg   %r11,%r10,6,185,58
            #000000025e368ec4: e3b010000008       ag      %r11,0(%r1)
            >000000025e368eca: e310b0080004       lg      %r1,8(%r11)
             000000025e368ed0: a7110001           tmll    %r1,1
             000000025e368ed4: a7740129           brc     7,25e369126
             000000025e368ed8: e320b0080004       lg      %r2,8(%r11)
             000000025e368ede: b904001b           lgr     %r1,%r11
  Call Trace:
   [<000000025e368eca>] kfree+0x42/0x330
   [<000000025e5202a2>] blk_mq_free_tag_set+0x72/0xb8
   [<000003ff801316a8>] dm_mq_cleanup_mapped_device+0x38/0x50 [dm_mod]
   [<000003ff80120082>] free_dev+0x52/0xd0 [dm_mod]
   [<000003ff801233f0>] __dm_destroy+0x150/0x1d0 [dm_mod]
   [<000003ff8012bb9a>] dev_remove+0x162/0x1c0 [dm_mod]
   [<000003ff8012a988>] ctl_ioctl+0x198/0x478 [dm_mod]
   [<000003ff8012ac8a>] dm_ctl_ioctl+0x22/0x38 [dm_mod]
   [<000000025e3b11ee>] ksys_ioctl+0xbe/0xe0
   [<000000025e3b127a>] __s390x_sys_ioctl+0x2a/0x40
   [<000000025e8c15ac>] system_call+0xd8/0x2c8
  Last Breaking-Event-Address:
   [<000000025e52029c>] blk_mq_free_tag_set+0x6c/0xb8
  Kernel panic - not syncing: Fatal exception: panic_on_oops

When allocation/initialization of the blk_mq_tag_set fails in
dm_mq_init_request_queue(), it is uninitialized/freed, but the pointer
is not reset to NULL; so when dev_remove() later gets into
dm_mq_cleanup_mapped_device() it sees the pointer and tries to
uninitialize and free it again.

Fix this by setting the pointer to NULL in dm_mq_init_request_queue()
error-handling. Also set it to NULL in dm_mq_cleanup_mapped_device().

Cc: <stable@vger.kernel.org> # 4.6+
Fixes: 1c357a1 ("dm: allocate blk_mq_tag_set rather than embed in mapped_device")
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jimbo77 pushed a commit that referenced this pull request Jun 4, 2021
commit f7c7a2f9a23e5b6e0f5251f29648d0238bb7757e upstream.

md_kick_rdev_from_array will remove rdev, so we should
use rdev_for_each_safe to search list.

How to trigger:

env: Two nodes on kvm-qemu x86_64 VMs (2C2G with 2 iscsi luns).

```
node2=192.168.0.3

for i in {1..20}; do
    echo ==== $i `date` ====;

    mdadm -Ss && ssh ${node2} "mdadm -Ss"
    wipefs -a /dev/sda /dev/sdb

    mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l 1 /dev/sda \
       /dev/sdb --assume-clean
    ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
    mdadm --wait /dev/md0
    ssh ${node2} "mdadm --wait /dev/md0"

    mdadm --manage /dev/md0 --fail /dev/sda --remove /dev/sda
    sleep 1
done
```

Crash stack:

```
stack segment: 0000 [#1] SMP
... ...
RIP: 0010:md_check_recovery+0x1e8/0x570 [md_mod]
... ...
RSP: 0018:ffffb149807a7d68 EFLAGS: 00010207
RAX: 0000000000000000 RBX: ffff9d494c180800 RCX: ffff9d490fc01e50
RDX: fffff047c0ed8308 RSI: 0000000000000246 RDI: 0000000000000246
RBP: 6b6b6b6b6b6b6b6b R08: ffff9d490fc01e40 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
R13: ffff9d494c180818 R14: ffff9d493399ef38 R15: ffff9d4933a1d800
FS:  0000000000000000(0000) GS:ffff9d494f700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe68cab9010 CR3: 000000004c6be001 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 raid1d+0x5c/0xd40 [raid1]
 ? finish_task_switch+0x75/0x2a0
 ? lock_timer_base+0x67/0x80
 ? try_to_del_timer_sync+0x4d/0x80
 ? del_timer_sync+0x41/0x50
 ? schedule_timeout+0x254/0x2d0
 ? md_start_sync+0xe0/0xe0 [md_mod]
 ? md_thread+0x127/0x160 [md_mod]
 md_thread+0x127/0x160 [md_mod]
 ? wait_woken+0x80/0x80
 kthread+0x10d/0x130
 ? kthread_park+0xa0/0xa0
 ret_from_fork+0x1f/0x40
```

Fixes: dbb64f8 ("md-cluster: Fix adding of new disk with new reload code")
Fixes: 659b254 ("md-cluster: remove a disk asynchronously from cluster environment")
Cc: stable@vger.kernel.org
Reviewed-by: Gang He <ghe@suse.com>
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jimbo77 pushed a commit that referenced this pull request Jun 4, 2021
[ Upstream commit b8b2f37cf632434456182e9002d63cbc4cccc50c ]

When adding a PTE a ptesync is needed to order the update of the PTE
with subsequent accesses otherwise a spurious fault may be raised.

radix__set_pte_at() does not do this for performance gains. For
non-kernel memory this is not an issue as any faults of this kind are
corrected by the page fault handler. For kernel memory these faults
are not handled. The current solution is that there is a ptesync in
flush_cache_vmap() which should be called when mapping from the
vmalloc region.

However, map_kernel_page() does not call flush_cache_vmap(). This is
troublesome in particular for code patching with Strict RWX on radix.
In do_patch_instruction() the page frame that contains the instruction
to be patched is mapped and then immediately patched. With no ordering
or synchronization between setting up the PTE and writing to the page
it is possible for faults.

As the code patching is done using __put_user_asm_goto() the resulting
fault is obscured - but using a normal store instead it can be seen:

  BUG: Unable to handle kernel data access on write at 0xc008000008f24a3c
  Faulting instruction address: 0xc00000000008bd74
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in: nop_module(PO+) [last unloaded: nop_module]
  CPU: 4 PID: 757 Comm: sh Tainted: P           O      5.10.0-rc5-01361-ge3c1b78c8440-dirty #43
  NIP:  c00000000008bd74 LR: c00000000008bd50 CTR: c000000000025810
  REGS: c000000016f634a0 TRAP: 0300   Tainted: P           O       (5.10.0-rc5-01361-ge3c1b78c8440-dirty)
  MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 44002884  XER: 00000000
  CFAR: c00000000007c68c DAR: c008000008f24a3c DSISR: 42000000 IRQMASK: 1

This results in the kind of issue reported here:
  https://lore.kernel.org/linuxppc-dev/15AC5B0E-A221-4B8C-9039-FA96B8EF7C88@lca.pw/

Chris Riedl suggested a reliable way to reproduce the issue:
  $ mount -t debugfs none /sys/kernel/debug
  $ (while true; do echo function > /sys/kernel/debug/tracing/current_tracer ; echo nop > /sys/kernel/debug/tracing/current_tracer ; done) &

Turning ftrace on and off does a large amount of code patching which
in usually less then 5min will crash giving a trace like:

   ftrace-powerpc: (____ptrval____): replaced (4b473b11) != old (60000000)
   ------------[ ftrace bug ]------------
   ftrace failed to modify
   [<c000000000bf8e5c>] napi_busy_loop+0xc/0x390
    actual:   11:3b:47:4b
   Setting ftrace call site to call ftrace function
   ftrace record flags: 80000001
    (1)
    expected tramp: c00000000006c96c
   ------------[ cut here ]------------
   WARNING: CPU: 4 PID: 809 at kernel/trace/ftrace.c:2065 ftrace_bug+0x28c/0x2e8
   Modules linked in: nop_module(PO-) [last unloaded: nop_module]
   CPU: 4 PID: 809 Comm: sh Tainted: P           O      5.10.0-rc5-01360-gf878ccaf250a #1
   NIP:  c00000000024f334 LR: c00000000024f330 CTR: c0000000001a5af0
   REGS: c000000004c8b760 TRAP: 0700   Tainted: P           O       (5.10.0-rc5-01360-gf878ccaf250a)
   MSR:  900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28008848  XER: 20040000
   CFAR: c0000000001a9c98 IRQMASK: 0
   GPR00: c00000000024f330 c000000004c8b9f0 c000000002770600 0000000000000022
   GPR04: 00000000ffff7fff c000000004c8b6d0 0000000000000027 c0000007fe9bcdd8
   GPR08: 0000000000000023 ffffffffffffffd8 0000000000000027 c000000002613118
   GPR12: 0000000000008000 c0000007fffdca00 0000000000000000 0000000000000000
   GPR16: 0000000023ec37c5 0000000000000000 0000000000000000 0000000000000008
   GPR20: c000000004c8bc90 c0000000027a2d20 c000000004c8bcd0 c000000002612fe8
   GPR24: 0000000000000038 0000000000000030 0000000000000028 0000000000000020
   GPR28: c000000000ff1b68 c000000000bf8e5c c00000000312f700 c000000000fbb9b0
   NIP ftrace_bug+0x28c/0x2e8
   LR  ftrace_bug+0x288/0x2e8
   Call Trace:
     ftrace_bug+0x288/0x2e8 (unreliable)
     ftrace_modify_all_code+0x168/0x210
     arch_ftrace_update_code+0x18/0x30
     ftrace_run_update_code+0x44/0xc0
     ftrace_startup+0xf8/0x1c0
     register_ftrace_function+0x4c/0xc0
     function_trace_init+0x80/0xb0
     tracing_set_tracer+0x2a4/0x4f0
     tracing_set_trace_write+0xd4/0x130
     vfs_write+0xf0/0x330
     ksys_write+0x84/0x140
     system_call_exception+0x14c/0x230
     system_call_common+0xf0/0x27c

To fix this when updating kernel memory PTEs using ptesync.

Fixes: f1cb8f9 ("powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags")
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Tidy up change log slightly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210208032957.1232102-1-jniethe5@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jimbo77 pushed a commit that referenced this pull request Jun 4, 2021
[ Upstream commit ed8029d7b472369a010a1901358567ca3b6dbb0d ]

RCU complains about us calling printk() from an offline CPU:

  =============================
  WARNING: suspicious RCU usage
  5.12.0-rc7-02874-g7cf90e481cb8 #1 Not tainted
  -----------------------------
  kernel/locking/lockdep.c:3568 RCU-list traversed in non-reader section!!

  other info that might help us debug this:

  RCU used illegally from offline CPU!
  rcu_scheduler_active = 2, debug_locks = 1
  no locks held by swapper/0/0.

  stack backtrace:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc7-02874-g7cf90e481cb8 #1
  Call Trace:
    dump_stack+0xec/0x144 (unreliable)
    lockdep_rcu_suspicious+0x124/0x144
    __lock_acquire+0x1098/0x28b0
    lock_acquire+0x128/0x600
    _raw_spin_lock_irqsave+0x6c/0xc0
    down_trylock+0x2c/0x70
    __down_trylock_console_sem+0x60/0x140
    vprintk_emit+0x1a8/0x4b0
    vprintk_func+0xcc/0x200
    printk+0x40/0x54
    pseries_cpu_offline_self+0xc0/0x120
    arch_cpu_idle_dead+0x54/0x70
    do_idle+0x174/0x4a0
    cpu_startup_entry+0x38/0x40
    rest_init+0x268/0x388
    start_kernel+0x748/0x790
    start_here_common+0x1c/0x614

Which happens because by the time we get to rtas_stop_self() we are
already offline. In addition the message can be spammy, and is not that
helpful for users, so remove it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210418135413.1204031-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jimbo77 pushed a commit that referenced this pull request Jun 4, 2021
[ Upstream commit 35b4f24415c854cd718ccdf38dbea6297f010aae ]

There's a panic that occurs in a few of envs, the call trace is as below:

  [] general protection fault, ... 0x29acd70f1000a: 0000 [#1] SMP PTI
  [] RIP: 0010:sctp_ulpevent_notify_peer_addr_change+0x4b/0x1fa [sctp]
  []  sctp_assoc_control_transport+0x1b9/0x210 [sctp]
  []  sctp_do_8_2_transport_strike.isra.16+0x15c/0x220 [sctp]
  []  sctp_cmd_interpreter.isra.21+0x1231/0x1a10 [sctp]
  []  sctp_do_sm+0xc3/0x2a0 [sctp]
  []  sctp_generate_timeout_event+0x81/0xf0 [sctp]

This is caused by a transport use-after-free issue. When processing a
duplicate COOKIE-ECHO chunk in sctp_sf_do_dupcook_a(), both COOKIE-ACK
and SHUTDOWN chunks are allocated with the transort from the new asoc.
However, later in the sideeffect machine, the old asoc is used to send
them out and old asoc's shutdown_last_sent_to is set to the transport
that SHUTDOWN chunk attached to in sctp_cmd_setup_t2(), which actually
belongs to the new asoc. After the new_asoc is freed and the old asoc
T2 timeout, the old asoc's shutdown_last_sent_to that is already freed
would be accessed in sctp_sf_t2_timer_expire().

Thanks Alexander and Jere for helping dig into this issue.

To fix it, this patch is to do the asoc update first, then allocate
the COOKIE-ACK and SHUTDOWN chunks with the 'updated' old asoc. This
would make more sense, as a chunk from an asoc shouldn't be sent out
with another asoc. We had fixed quite a few issues caused by this.

Fixes: 145cb2f7177d ("sctp: Fix bundling of SHUTDOWN with COOKIE-ACK")
Reported-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Reported-by: syzbot+bbe538efd1046586f587@syzkaller.appspotmail.com
Reported-by: Michal Tesar <mtesar@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jimbo77 pushed a commit that referenced this pull request Jun 4, 2021
[ Upstream commit 5bbf219328849e83878bddb7c226d8d42e84affc ]

An out of bounds write happens when setting the default power state.
KASAN sees this as:

[drm] radeon: 512M of GTT memory ready.
[drm] GART: num cpu pages 131072, num gpu pages 131072
==================================================================
BUG: KASAN: slab-out-of-bounds in
radeon_atombios_parse_power_table_1_3+0x1837/0x1998 [radeon]
Write of size 4 at addr ffff88810178d858 by task systemd-udevd/157

CPU: 0 PID: 157 Comm: systemd-udevd Not tainted 5.12.0-E620 #50
Hardware name: eMachines        eMachines E620  /Nile       , BIOS V1.03 09/30/2008
Call Trace:
 dump_stack+0xa5/0xe6
 print_address_description.constprop.0+0x18/0x239
 kasan_report+0x170/0x1a8
 radeon_atombios_parse_power_table_1_3+0x1837/0x1998 [radeon]
 radeon_atombios_get_power_modes+0x144/0x1888 [radeon]
 radeon_pm_init+0x1019/0x1904 [radeon]
 rs690_init+0x76e/0x84a [radeon]
 radeon_device_init+0x1c1a/0x21e5 [radeon]
 radeon_driver_load_kms+0xf5/0x30b [radeon]
 drm_dev_register+0x255/0x4a0 [drm]
 radeon_pci_probe+0x246/0x2f6 [radeon]
 pci_device_probe+0x1aa/0x294
 really_probe+0x30e/0x850
 driver_probe_device+0xe6/0x135
 device_driver_attach+0xc1/0xf8
 __driver_attach+0x13f/0x146
 bus_for_each_dev+0xfa/0x146
 bus_add_driver+0x2b3/0x447
 driver_register+0x242/0x2c1
 do_one_initcall+0x149/0x2fd
 do_init_module+0x1ae/0x573
 load_module+0x4dee/0x5cca
 __do_sys_finit_module+0xf1/0x140
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Without KASAN, this will manifest later when the kernel attempts to
allocate memory that was stomped, since it collides with the inline slab
freelist pointer:

invalid opcode: 0000 [#1] SMP NOPTI
CPU: 0 PID: 781 Comm: openrc-run.sh Tainted: G        W 5.10.12-gentoo-E620 #2
Hardware name: eMachines        eMachines E620  /Nile , BIOS V1.03       09/30/2008
RIP: 0010:kfree+0x115/0x230
Code: 89 c5 e8 75 ea ff ff 48 8b 00 0f ba e0 09 72 63 e8 1f f4 ff ff 41 89 c4 48 8b 45 00 0f ba e0 10 72 0a 48 8b 45 08 a8 01 75 02 <0f> 0b 44 89 e1 48 c7 c2 00 f0 ff ff be 06 00 00 00 48 d3 e2 48 c7
RSP: 0018:ffffb42f40267e10 EFLAGS: 00010246
RAX: ffffd61280ee8d88 RBX: 0000000000000004 RCX: 000000008010000d
RDX: 4000000000000000 RSI: ffffffffba1360b0 RDI: ffffd61280ee8d80
RBP: ffffd61280ee8d80 R08: ffffffffb91bebdf R09: 0000000000000000
R10: ffff8fe2c1047ac8 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000100
FS:  00007fe80eff6b68(0000) GS:ffff8fe339c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe80eec7bc0 CR3: 0000000038012000 CR4: 00000000000006f0
Call Trace:
 __free_fdtable+0x16/0x1f
 put_files_struct+0x81/0x9b
 do_exit+0x433/0x94d
 do_group_exit+0xa6/0xa6
 __x64_sys_exit_group+0xf/0xf
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fe80ef64bea
Code: Unable to access opcode bytes at RIP 0x7fe80ef64bc0.
RSP: 002b:00007ffdb1c47528 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fe80ef64bea
RDX: 00007fe80ef64f60 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 00007fe80ee2c620 R11: 0000000000000246 R12: 00007fe80eff41e0
R13: 00000000ffffffff R14: 0000000000000024 R15: 00007fe80edf9cd0
Modules linked in: radeon(+) ath5k(+) snd_hda_codec_realtek ...

Use a valid power_state index when initializing the "flags" and "misc"
and "misc2" fields.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211537
Reported-by: Erhard F. <erhard_f@mailbox.org>
Fixes: a48b9b4 ("drm/radeon/kms/pm: add asic specific callbacks for getting power state (v2)")
Fixes: 79daedc ("drm/radeon/kms: minor pm cleanups")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jimbo77 pushed a commit that referenced this pull request Jun 4, 2021
commit 016002848c82eeb5d460489ce392d91fe18c475c upstream.

In 'rp2_probe', the driver registers 'rp2_uart_interrupt' then calls
'rp2_fw_cb' through 'request_firmware_nowait'. In 'rp2_fw_cb', if the
firmware don't exists, function just return without initializing ports
of 'rp2_card'. But now the interrupt handler function has been
registered, and when an interrupt comes, 'rp2_uart_interrupt' may access
those ports then causing NULL pointer dereference or other bugs.

Because the driver does some initialization work in 'rp2_fw_cb', in
order to make the driver ready to handle interrupts, 'request_firmware'
should be used instead of asynchronous 'request_firmware_nowait'.

This report reveals it:

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.19.177-gdba4159c14ef-dirty #45
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-
gc9ba5276e321-prebuilt.qemu.org 04/01/2014
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xec/0x156 lib/dump_stack.c:118
 assign_lock_key kernel/locking/lockdep.c:727 [inline]
 register_lock_class+0x14e5/0x1ba0 kernel/locking/lockdep.c:753
 __lock_acquire+0x187/0x3750 kernel/locking/lockdep.c:3303
 lock_acquire+0x124/0x340 kernel/locking/lockdep.c:3907
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
 _raw_spin_lock+0x32/0x50 kernel/locking/spinlock.c:144
 spin_lock include/linux/spinlock.h:329 [inline]
 rp2_ch_interrupt drivers/tty/serial/rp2.c:466 [inline]
 rp2_asic_interrupt.isra.9+0x15d/0x990 drivers/tty/serial/rp2.c:493
 rp2_uart_interrupt+0x49/0xe0 drivers/tty/serial/rp2.c:504
 __handle_irq_event_percpu+0xfb/0x770 kernel/irq/handle.c:149
 handle_irq_event_percpu+0x79/0x150 kernel/irq/handle.c:189
 handle_irq_event+0xac/0x140 kernel/irq/handle.c:206
 handle_fasteoi_irq+0x232/0x5c0 kernel/irq/chip.c:725
 generic_handle_irq_desc include/linux/irqdesc.h:155 [inline]
 handle_irq+0x230/0x3a0 arch/x86/kernel/irq_64.c:87
 do_IRQ+0xa7/0x1e0 arch/x86/kernel/irq.c:247
 common_interrupt+0xf/0xf arch/x86/entry/entry_64.S:670
 </IRQ>
RIP: 0010:native_safe_halt+0x28/0x30 arch/x86/include/asm/irqflags.h:61
Code: 00 00 55 be 04 00 00 00 48 c7 c7 00 c2 2f 8c 48 89 e5 e8 fb 31 e7 f8
8b 05 75 af 8d 03 85 c0 7e 07 0f 00 2d 8a 61 65 00 fb f4 <5d> c3 90 90 90
90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41
RSP: 0018:ffff88806b71fcc8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
RAX: 0000000000000000 RBX: ffffffff8bde7e48 RCX: ffffffff88a21285
RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff8c2fc200
RBP: ffff88806b71fcc8 R08: fffffbfff185f840 R09: fffffbfff185f840
R10: 0000000000000001 R11: fffffbfff185f840 R12: 0000000000000002
R13: ffffffff8bea18a0 R14: 0000000000000000 R15: 0000000000000000
 arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
 default_idle+0x6f/0x360 arch/x86/kernel/process.c:557
 arch_cpu_idle+0xf/0x20 arch/x86/kernel/process.c:548
 default_idle_call+0x3b/0x60 kernel/sched/idle.c:93
 cpuidle_idle_call kernel/sched/idle.c:153 [inline]
 do_idle+0x2ab/0x3c0 kernel/sched/idle.c:263
 cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:369
 start_secondary+0x3b8/0x4e0 arch/x86/kernel/smpboot.c:271
 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
PGD 8000000056d27067 P4D 8000000056d27067 PUD 56d28067 PMD 0
Oops: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.19.177-gdba4159c14ef-dirty #45
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-
gc9ba5276e321-prebuilt.qemu.org 04/01/2014
RIP: 0010:readl arch/x86/include/asm/io.h:59 [inline]
RIP: 0010:rp2_ch_interrupt drivers/tty/serial/rp2.c:472 [inline]
RIP: 0010:rp2_asic_interrupt.isra.9+0x181/0x990 drivers/tty/serial/rp2.c:
493
Code: df e8 43 5d c2 05 48 8d 83 e8 01 00 00 48 89 85 60 ff ff ff 48 c1 e8
03 42 80 3c 30 00 0f 85 aa 07 00 00 48 8b 83 e8 01 00 00 <8b> 40 10 89 c1
89 85 68 ff ff ff 48 8b 83 e8 01 00 00 89 48 10 83
RSP: 0018:ffff88806c287cd0 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff88806ade6820 RCX: ffffffff814300b1
RDX: 1ffff1100d5bcd06 RSI: 0000000000000004 RDI: ffff88806ade6820
RBP: ffff88806c287db8 R08: ffffed100d5bcd05 R09: ffffed100d5bcd05
R10: 0000000000000001 R11: ffffed100d5bcd04 R12: ffffc90001e00000
R13: ffff888069654e10 R14: dffffc0000000000 R15: ffff888069654df0
FS:  0000000000000000(0000) GS:ffff88806c280000(0000) knlGS:
0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000006892c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <IRQ>
 rp2_uart_interrupt+0x49/0xe0 drivers/tty/serial/rp2.c:504
 __handle_irq_event_percpu+0xfb/0x770 kernel/irq/handle.c:149
 handle_irq_event_percpu+0x79/0x150 kernel/irq/handle.c:189
 handle_irq_event+0xac/0x140 kernel/irq/handle.c:206
 handle_fasteoi_irq+0x232/0x5c0 kernel/irq/chip.c:725
 generic_handle_irq_desc include/linux/irqdesc.h:155 [inline]
 handle_irq+0x230/0x3a0 arch/x86/kernel/irq_64.c:87
 do_IRQ+0xa7/0x1e0 arch/x86/kernel/irq.c:247
 common_interrupt+0xf/0xf arch/x86/entry/entry_64.S:670
 </IRQ>
RIP: 0010:native_safe_halt+0x28/0x30 arch/x86/include/asm/irqflags.h:61
Code: 00 00 55 be 04 00 00 00 48 c7 c7 00 c2 2f 8c 48 89 e5 e8 fb 31 e7
f8 8b 05 75 af 8d 03 85 c0 7e 07 0f 00 2d 8a 61 65 00 fb f4 <5d> c3 90
90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41
RSP: 0018:ffff88806b71fcc8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
RAX: 0000000000000000 RBX: ffffffff8bde7e48 RCX: ffffffff88a21285
RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff8c2fc200
RBP: ffff88806b71fcc8 R08: fffffbfff185f840 R09: fffffbfff185f840
R10: 0000000000000001 R11: fffffbfff185f840 R12: 0000000000000002
R13: ffffffff8bea18a0 R14: 0000000000000000 R15: 0000000000000000
 arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
 default_idle+0x6f/0x360 arch/x86/kernel/process.c:557
 arch_cpu_idle+0xf/0x20 arch/x86/kernel/process.c:548
 default_idle_call+0x3b/0x60 kernel/sched/idle.c:93
 cpuidle_idle_call kernel/sched/idle.c:153 [inline]
 do_idle+0x2ab/0x3c0 kernel/sched/idle.c:263
 cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:369
 start_secondary+0x3b8/0x4e0 arch/x86/kernel/smpboot.c:271
 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
Modules linked in:
Dumping ftrace buffer:
   (ftrace buffer empty)
CR2: 0000000000000010
---[ end trace 11804dbb55cb1a64 ]---
RIP: 0010:readl arch/x86/include/asm/io.h:59 [inline]
RIP: 0010:rp2_ch_interrupt drivers/tty/serial/rp2.c:472 [inline]
RIP: 0010:rp2_asic_interrupt.isra.9+0x181/0x990 drivers/tty/serial/rp2.c:
493
Code: df e8 43 5d c2 05 48 8d 83 e8 01 00 00 48 89 85 60 ff ff ff 48 c1
e8 03 42 80 3c 30 00 0f 85 aa 07 00 00 48 8b 83 e8 01 00 00 <8b> 40 10 89
c1 89 85 68 ff ff ff 48 8b 83 e8 01 00 00 89 48 10 83
RSP: 0018:ffff88806c287cd0 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff88806ade6820 RCX: ffffffff814300b1
RDX: 1ffff1100d5bcd06 RSI: 0000000000000004 RDI: ffff88806ade6820
RBP: ffff88806c287db8 R08: ffffed100d5bcd05 R09: ffffed100d5bcd05
R10: 0000000000000001 R11: ffffed100d5bcd04 R12: ffffc90001e00000
R13: ffff888069654e10 R14: dffffc0000000000 R15: ffff888069654df0
FS:  0000000000000000(0000) GS:ffff88806c280000(0000) knlGS:
0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000006892c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Reported-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Link: https://lore.kernel.org/r/1621577323-1541-1-git-send-email-zheyuma97@gmail.com
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jimbo77 pushed a commit that referenced this pull request Jun 4, 2021
[ Upstream commit 91df99a6eb50d5a1bc70fff4a09a0b7ae6aab96d ]

While doing error injection testing I got the following panic

  kernel BUG at fs/btrfs/tree-log.c:1862!
  invalid opcode: 0000 [#1] SMP NOPTI
  CPU: 1 PID: 7836 Comm: mount Not tainted 5.13.0-rc1+ #305
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
  RIP: 0010:link_to_fixup_dir+0xd5/0xe0
  RSP: 0018:ffffb5800180fa30 EFLAGS: 00010216
  RAX: fffffffffffffffb RBX: 00000000fffffffb RCX: ffff8f595287faf0
  RDX: ffffb5800180fa37 RSI: ffff8f5954978800 RDI: 0000000000000000
  RBP: ffff8f5953af9450 R08: 0000000000000019 R09: 0000000000000001
  R10: 000151f408682970 R11: 0000000120021001 R12: ffff8f5954978800
  R13: ffff8f595287faf0 R14: ffff8f5953c77dd0 R15: 0000000000000065
  FS:  00007fc5284c8c40(0000) GS:ffff8f59bbd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fc5287f47c0 CR3: 000000011275e002 CR4: 0000000000370ee0
  Call Trace:
   replay_one_buffer+0x409/0x470
   ? btree_read_extent_buffer_pages+0xd0/0x110
   walk_up_log_tree+0x157/0x1e0
   walk_log_tree+0xa6/0x1d0
   btrfs_recover_log_trees+0x1da/0x360
   ? replay_one_extent+0x7b0/0x7b0
   open_ctree+0x1486/0x1720
   btrfs_mount_root.cold+0x12/0xea
   ? __kmalloc_track_caller+0x12f/0x240
   legacy_get_tree+0x24/0x40
   vfs_get_tree+0x22/0xb0
   vfs_kern_mount.part.0+0x71/0xb0
   btrfs_mount+0x10d/0x380
   ? vfs_parse_fs_string+0x4d/0x90
   legacy_get_tree+0x24/0x40
   vfs_get_tree+0x22/0xb0
   path_mount+0x433/0xa10
   __x64_sys_mount+0xe3/0x120
   do_syscall_64+0x3d/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae

We can get -EIO or any number of legitimate errors from
btrfs_search_slot(), panicing here is not the appropriate response.  The
error path for this code handles errors properly, simply return the
error.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jimbo77 pushed a commit that referenced this pull request Jun 4, 2021
[ Upstream commit 1e5c37385097c35911b0f8a0c67ffd10ee1af9a2 ]

looks like we forget to set ttm->sg to NULL.
Hit panic below

[ 1235.844104] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b7b4b: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI
[ 1235.989074] Call Trace:
[ 1235.991751]  sg_free_table+0x17/0x20
[ 1235.995667]  amdgpu_ttm_backend_unbind.cold+0x4d/0xf7 [amdgpu]
[ 1236.002288]  amdgpu_ttm_backend_destroy+0x29/0x130 [amdgpu]
[ 1236.008464]  ttm_tt_destroy+0x1e/0x30 [ttm]
[ 1236.013066]  ttm_bo_cleanup_memtype_use+0x51/0xa0 [ttm]
[ 1236.018783]  ttm_bo_release+0x262/0xa50 [ttm]
[ 1236.023547]  ttm_bo_put+0x82/0xd0 [ttm]
[ 1236.027766]  amdgpu_bo_unref+0x26/0x50 [amdgpu]
[ 1236.032809]  amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x7aa/0xd90 [amdgpu]
[ 1236.040400]  kfd_ioctl_alloc_memory_of_gpu+0xe2/0x330 [amdgpu]
[ 1236.046912]  kfd_ioctl+0x463/0x690 [amdgpu]

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jimbo77 pushed a commit that referenced this pull request Jun 4, 2021
[ Upstream commit 020ef930b826d21c5446fdc9db80fd72a791bc21 ]

mld_newpack() doesn't allow to allocate high order page,
only order-0 allocation is allowed.
If headroom size is too large, a kernel panic could occur in skb_put().

Test commands:
    ip netns del A
    ip netns del B
    ip netns add A
    ip netns add B
    ip link add veth0 type veth peer name veth1
    ip link set veth0 netns A
    ip link set veth1 netns B

    ip netns exec A ip link set lo up
    ip netns exec A ip link set veth0 up
    ip netns exec A ip -6 a a 2001:db8:0::1/64 dev veth0
    ip netns exec B ip link set lo up
    ip netns exec B ip link set veth1 up
    ip netns exec B ip -6 a a 2001:db8:0::2/64 dev veth1
    for i in {1..99}
    do
        let A=$i-1
        ip netns exec A ip link add ip6gre$i type ip6gre \
	local 2001:db8:$A::1 remote 2001:db8:$A::2 encaplimit 100
        ip netns exec A ip -6 a a 2001:db8:$i::1/64 dev ip6gre$i
        ip netns exec A ip link set ip6gre$i up

        ip netns exec B ip link add ip6gre$i type ip6gre \
	local 2001:db8:$A::2 remote 2001:db8:$A::1 encaplimit 100
        ip netns exec B ip -6 a a 2001:db8:$i::2/64 dev ip6gre$i
        ip netns exec B ip link set ip6gre$i up
    done

Splat looks like:
kernel BUG at net/core/skbuff.c:110!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.12.0+ #891
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:skb_panic+0x15d/0x15f
Code: 92 fe 4c 8b 4c 24 10 53 8b 4d 70 45 89 e0 48 c7 c7 00 ae 79 83
41 57 41 56 41 55 48 8b 54 24 a6 26 f9 ff <0f> 0b 48 8b 6c 24 20 89
34 24 e8 4a 4e 92 fe 8b 34 24 48 c7 c1 20
RSP: 0018:ffff88810091f820 EFLAGS: 00010282
RAX: 0000000000000089 RBX: ffff8881086e9000 RCX: 0000000000000000
RDX: 0000000000000089 RSI: 0000000000000008 RDI: ffffed1020123efb
RBP: ffff888005f6eac0 R08: ffffed1022fc0031 R09: ffffed1022fc0031
R10: ffff888117e00187 R11: ffffed1022fc0030 R12: 0000000000000028
R13: ffff888008284eb0 R14: 0000000000000ed8 R15: 0000000000000ec0
FS:  0000000000000000(0000) GS:ffff888117c00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8b801c5640 CR3: 0000000033c2c006 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ? ip6_mc_hdr.isra.26.constprop.46+0x12a/0x600
 ? ip6_mc_hdr.isra.26.constprop.46+0x12a/0x600
 skb_put.cold.104+0x22/0x22
 ip6_mc_hdr.isra.26.constprop.46+0x12a/0x600
 ? rcu_read_lock_sched_held+0x91/0xc0
 mld_newpack+0x398/0x8f0
 ? ip6_mc_hdr.isra.26.constprop.46+0x600/0x600
 ? lock_contended+0xc40/0xc40
 add_grhead.isra.33+0x280/0x380
 add_grec+0x5ca/0xff0
 ? mld_sendpack+0xf40/0xf40
 ? lock_downgrade+0x690/0x690
 mld_send_initial_cr.part.34+0xb9/0x180
 ipv6_mc_dad_complete+0x15d/0x1b0
 addrconf_dad_completed+0x8d2/0xbb0
 ? lock_downgrade+0x690/0x690
 ? addrconf_rs_timer+0x660/0x660
 ? addrconf_dad_work+0x73c/0x10e0
 addrconf_dad_work+0x73c/0x10e0

Allowing high order page allocation could fix this problem.

Fixes: 72e09ad ("ipv6: avoid high order allocations")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.