kmemleak: 6 new suspected memory leaks in -net #319

Closed
matttbe opened this issue Dec 1, 2022 · 3 comments

@matttbe
Member

matttbe commented Dec 1, 2022

This morning, the CI found 6 suspected memory leaks when validating export-net/20221201T055227 (commit a864fdf)

https://cirrus-ci.com/task/5950451166740480

unreferenced object 0xffff888015d22800 (size 512):
  comm "ip", pid 28924, jiffies 4299672355 (age 84.902s)
  hex dump (first 32 bytes):
    00 c0 60 0a 80 88 ff ff 00 00 00 00 00 00 00 00  ..`.............
    00 00 00 00 00 00 00 00 ea ff ff ff ff ff ff ff  ................
  backtrace:
    __kmalloc (include/linux/kasan.h:211) 
    __register_sysctl_table (include/linux/slab.h:558) 
    mptcp_net_init (net/mptcp/ctrl.c:154) 
    ops_init (net/core/net_namespace.c:135) 
    setup_net (net/core/net_namespace.c:332) 
    copy_net_ns (net/core/net_namespace.c:480) 
    create_new_namespaces (kernel/nsproxy.c:110) 
    unshare_nsproxy_namespaces (kernel/nsproxy.c:226 (discriminator 4)) 
    ksys_unshare (kernel/fork.c:3188) 
    __x64_sys_unshare (kernel/fork.c:3257) 
    do_syscall_64 (arch/x86/entry/common.c:50) 
    entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) 

unreferenced object 0xffff88800ef05c00 (size 256):
  comm "ip", pid 28924, jiffies 4299672355 (age 84.902s)
  hex dump (first 32 bytes):
    78 5c f0 0e 80 88 ff ff 00 00 00 00 00 00 00 00  x..............
    00 00 00 00 00 00 00 00 ea ff ff ff ff ff ff ff  ................
  backtrace:
    __kmalloc (include/linux/kasan.h:211) 
    __register_sysctl_table (fs/proc/proc_sysctl.c:974) 
    mptcp_net_init (net/mptcp/ctrl.c:154) 
    ops_init (net/core/net_namespace.c:135) 
    setup_net (net/core/net_namespace.c:332) 
    copy_net_ns (net/core/net_namespace.c:480) 
    create_new_namespaces (kernel/nsproxy.c:110) 
    unshare_nsproxy_namespaces (kernel/nsproxy.c:226 (discriminator 4)) 
    ksys_unshare (kernel/fork.c:3188) 
    __x64_sys_unshare (kernel/fork.c:3257) 
    do_syscall_64 (arch/x86/entry/common.c:50) 
    entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) 

unreferenced object 0xffff888007b49f80 (size 64):
  comm "ip", pid 28924, jiffies 4299672355 (age 84.902s)
  hex dump (first 32 bytes):
    80 00 00 00 00 00 00 00 80 b5 99 0d 80 88 ff ff  ................
    22 01 00 00 00 00 ad de 60 a7 31 92 ff ff ff ff  ".......`.1.....
  backtrace:
    __kmalloc_node_track_caller (include/linux/kasan.h:211) 
    kmemdup (mm/util.c:129) 
    fib_notifier_ops_register (net/core/fib_notifier.c:149) 
    ipmr_net_init (net/ipv4/ipmr.c:3049) 
    ops_init (net/core/net_namespace.c:135) 
    setup_net (net/core/net_namespace.c:332) 
    copy_net_ns (net/core/net_namespace.c:480) 
    create_new_namespaces (kernel/nsproxy.c:110) 
    unshare_nsproxy_namespaces (kernel/nsproxy.c:226 (discriminator 4)) 
    ksys_unshare (kernel/fork.c:3188) 
    __x64_sys_unshare (kernel/fork.c:3257) 
    do_syscall_64 (arch/x86/entry/common.c:50) 
    entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) 

unreferenced object 0xffff88800ef05e00 (size 256):
  comm "ip", pid 28924, jiffies 4299672356 (age 84.901s)
  hex dump (first 32 bytes):
    00 38 d2 15 80 88 ff ff 00 00 00 00 00 00 00 00  .8..............
    00 00 00 00 00 00 00 00 ea ff ff ff ff ff ff ff  ................
  backtrace:
    __kmalloc (include/linux/kasan.h:211) 
    __register_sysctl_table (include/linux/slab.h:558) 
    ipv4_frags_init_net (net/ipv4/ip_fragment.c:612) 
    ops_init (net/core/net_namespace.c:135) 
    setup_net (net/core/net_namespace.c:332) 
    copy_net_ns (net/core/net_namespace.c:480) 
    create_new_namespaces (kernel/nsproxy.c:110) 
    unshare_nsproxy_namespaces (kernel/nsproxy.c:226 (discriminator 4)) 
    ksys_unshare (kernel/fork.c:3188) 
    __x64_sys_unshare (kernel/fork.c:3257) 
    do_syscall_64 (arch/x86/entry/common.c:50) 
    entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) 

unreferenced object 0xffff8880035ae900 (size 128):
  comm "ip", pid 28924, jiffies 4299672356 (age 84.911s)
  hex dump (first 32 bytes):
    00 ea 5a 03 80 88 ff ff 00 00 00 00 00 00 00 00  ..Z.............
    00 00 00 00 00 00 00 00 ea ff ff ff ff ff ff ff  ................
  backtrace:
    __kmalloc (include/linux/kasan.h:211) 
    __register_sysctl_table (include/linux/slab.h:558) 
    unix_sysctl_register (net/unix/sysctl_net_unix.c:39) 
    unix_net_init (net/unix/af_unix.c:3597) 
    ops_init (net/core/net_namespace.c:135) 
    setup_net (net/core/net_namespace.c:332) 
    copy_net_ns (net/core/net_namespace.c:480) 
    create_new_namespaces (kernel/nsproxy.c:110) 
    unshare_nsproxy_namespaces (kernel/nsproxy.c:226 (discriminator 4)) 
    ksys_unshare (kernel/fork.c:3188) 
    __x64_sys_unshare (kernel/fork.c:3257) 
    do_syscall_64 (arch/x86/entry/common.c:50) 
    entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) 

unreferenced object 0xffff88800ef04e00 (size 256):
  comm "ip", pid 28924, jiffies 4299672356 (age 84.911s)
  hex dump (first 32 bytes):
    78 4e f0 0e 80 88 ff ff 00 00 00 00 00 00 00 00  xN..............
    00 00 00 00 00 00 00 00 ea ff ff ff ff ff ff ff  ................
  backtrace:
    __kmalloc (include/linux/kasan.h:211) 
    __register_sysctl_table (fs/proc/proc_sysctl.c:974) 
    unix_sysctl_register (net/unix/sysctl_net_unix.c:39) 
    unix_net_init (net/unix/af_unix.c:3597) 
    ops_init (net/core/net_namespace.c:135) 
    setup_net (net/core/net_namespace.c:332) 
    copy_net_ns (net/core/net_namespace.c:480) 
    create_new_namespaces (kernel/nsproxy.c:110) 
    unshare_nsproxy_namespaces (kernel/nsproxy.c:226 (discriminator 4)) 
    ksys_unshare (kernel/fork.c:3188) 
    __x64_sys_unshare (kernel/fork.c:3257) 
    do_syscall_64 (arch/x86/entry/common.c:50) 
    entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) 
@matttbe matttbe added the bug label Dec 1, 2022
@matttbe
Member Author

matttbe commented Dec 1, 2022

The only diff compared to yesterday (export-net/20221130T171638...export-net/20221201T055227):

  • 421f866 net: broadcom: Add PTP_1588_CLOCK_OPTIONAL dependency for BCMGENET under ARCH_BCM2835

Not related.

@matttbe
Member Author

matttbe commented Dec 1, 2022

As discussed on IRC with @pabeni and because the mentioned code in MPTCP didn't change recently (especially in -net), the issue is very likely outside MPTCP code.

Yet, nobody else reported a similar issue on netdev.

@matttbe
Member Author

matttbe commented Dec 20, 2022

We didn't manage to reproduce it, and it looks like it is not due to MPTCP.
We can close this ticket.

@matttbe matttbe closed this as completed Dec 20, 2022
matttbe pushed a commit that referenced this issue Mar 27, 2024
When is64 == 1 in emit(A64_REV32(is64, dst, dst), ctx), the
generated insn reverses the byte order of both the high and low 32-bit
words, resulting in an incorrect swap, as indicated by the jit test:

[ 9757.262607] test_bpf: #312 BSWAP 16: 0x0123456789abcdef -> 0xefcd jited:1 8 PASS
[ 9757.264435] test_bpf: #313 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 jited:1 ret 1460850314 != -271733879 (0x5712ce8a != 0xefcdab89)FAIL (1 times)
[ 9757.266260] test_bpf: #314 BSWAP 64: 0x0123456789abcdef -> 0x67452301 jited:1 8 PASS
[ 9757.268000] test_bpf: #315 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89 jited:1 8 PASS
[ 9757.269686] test_bpf: #316 BSWAP 16: 0xfedcba9876543210 -> 0x1032 jited:1 8 PASS
[ 9757.271380] test_bpf: #317 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 jited:1 ret -1460850316 != 271733878 (0xa8ed3174 != 0x10325476)FAIL (1 times)
[ 9757.273022] test_bpf: #318 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe jited:1 7 PASS
[ 9757.274721] test_bpf: #319 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476 jited:1 9 PASS

Fix this by forcing the 32-bit variant of rev32.
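
To make the difference between the two variants concrete, here is a minimal userspace sketch in C. It only models the behaviour described above; it is not the JIT code, and the helper names (bswap32, rev32_64bit, rev32_32bit, rev32_self_check) are hypothetical. The model assumes that the 32-bit form writes a zero-extended result, which is how writes to 32-bit registers behave on arm64.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a single 32-bit word. */
uint32_t bswap32(uint32_t w)
{
	return (w >> 24) | ((w >> 8) & 0x0000ff00u) |
	       ((w << 8) & 0x00ff0000u) | (w << 24);
}

/* Model of the 64-bit form: both 32-bit halves are byte-reversed in place. */
uint64_t rev32_64bit(uint64_t x)
{
	uint64_t hi = bswap32((uint32_t)(x >> 32));
	uint64_t lo = bswap32((uint32_t)x);

	return (hi << 32) | lo;
}

/* Model of the 32-bit form: only the low word is reversed, upper bits are zero. */
uint64_t rev32_32bit(uint64_t x)
{
	return (uint64_t)bswap32((uint32_t)x);
}

/* Check the model against the inputs and expected values from the test log. */
void rev32_self_check(void)
{
	/* The 64-bit form leaves a swapped high word in the upper bits. */
	assert(rev32_64bit(0x0123456789abcdefULL) == 0x67452301efcdab89ULL);
	/* The 32-bit form yields the value that "BSWAP 32" expects. */
	assert(rev32_32bit(0x0123456789abcdefULL) == 0xefcdab89ULL);
	assert(rev32_32bit(0xfedcba9876543210ULL) == 0x10325476ULL);
}
```

With the 64-bit form the upper 32 bits of the result are non-zero, so any 64-bit comparison against the expected BSWAP 32 value fails, which is consistent with the FAIL lines for tests #313 and #317.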

Fixes: 1104247 ("bpf, arm64: Support unconditional bswap")
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Tested-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Message-ID: <20240321081809.158803-1-asavkov@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
matttbe pushed a commit that referenced this issue May 20, 2024
With recent additions in BPF such as the cpu v4 instructions, the
test_bpf module exhibits the following failures:

  test_bpf: #82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)

  test_bpf: #165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)
  test_bpf: #166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)

  test_bpf: #169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
  test_bpf: #170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)

  test_bpf: #172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)

  test_bpf: #313 BSWAP 16: 0x0123456789abcdef -> 0xefcd
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 301 PASS
  test_bpf: #314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 555 PASS
  test_bpf: #315 BSWAP 64: 0x0123456789abcdef -> 0x67452301
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 268 PASS
  test_bpf: #316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 269 PASS
  test_bpf: #317 BSWAP 16: 0xfedcba9876543210 -> 0x1032
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 460 PASS
  test_bpf: #318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 320 PASS
  test_bpf: #319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 222 PASS
  test_bpf: #320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 273 PASS

  test_bpf: #344 BPF_LDX_MEMSX | BPF_B
  eBPF filter opcode 0091 (@5) unsupported
  jited:0 432 PASS
  test_bpf: #345 BPF_LDX_MEMSX | BPF_H
  eBPF filter opcode 0089 (@5) unsupported
  jited:0 381 PASS
  test_bpf: #346 BPF_LDX_MEMSX | BPF_W
  eBPF filter opcode 0081 (@5) unsupported
  jited:0 505 PASS

  test_bpf: #490 JMP32_JA: Unconditional jump: if (true) return 1
  eBPF filter opcode 0006 (@1) unsupported
  jited:0 261 PASS

  test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed]

Fix them by adding the missing processing.
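
The wrong values reported for the SDIV and SMOD tests match what you get when the signed operation is carried out as an unsigned one. That reading is an inference from the numbers in the log, not something the commit message states. The following C sketch reproduces the arithmetic; the function names are hypothetical and the code has nothing to do with the JIT itself.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit division and modulo with the operands treated as unsigned. */
uint32_t div32_unsigned(uint32_t a, uint32_t b)
{
	assert(b != 0);
	return a / b;
}

uint32_t mod32_unsigned(uint32_t a, uint32_t b)
{
	assert(b != 0);
	return a % b;
}

/*
 * Same bit patterns, but interpreted as signed values. The casts assume
 * two's complement conversion, which mainstream compilers provide.
 */
uint32_t div32_signed(uint32_t a, uint32_t b)
{
	assert(b != 0);
	/* INT32_MIN / -1 overflows in C, so it is excluded here. */
	assert(!((int32_t)a == INT32_MIN && (int32_t)b == -1));
	return (uint32_t)((int32_t)a / (int32_t)b);
}

uint32_t mod32_signed(uint32_t a, uint32_t b)
{
	assert(b != 0);
	assert(!((int32_t)a == INT32_MIN && (int32_t)b == -1));
	return (uint32_t)((int32_t)a % (int32_t)b);
}
```

For -6 / 2, the unsigned path gives 0x7ffffffd and the signed path gives 0xfffffffd. For -7 % 2, the unsigned path gives 0x1 and the signed path gives 0xffffffff. These are the "got" and "expected" values printed for tests #165, #166, #169 and #170.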

Fixes: daabb2b ("bpf/tests: add tests for cpuv4 instructions")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu
matttbe pushed a commit that referenced this issue Jun 3, 2024
When the following snippet is run, lockdep will report a deadlock[1].

  /* Acquire all queues dim_locks */
  for (i = 0; i < vi->max_queue_pairs; i++)
          mutex_lock(&vi->rq[i].dim_lock);

There's no deadlock here because the vq locks are always taken
in the same order, but lockdep cannot figure it out. So refactor
the code to alleviate the problem.
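
For illustration, here is a userspace sketch in C of the two locking patterns, using pthread mutexes instead of kernel mutexes. The struct and function names are hypothetical, and the second pattern is only one possible way to avoid holding several locks of the same kind at once; the commit message does not say how the driver was actually refactored.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a receive queue with its own lock. */
struct rx_queue {
	pthread_mutex_t dim_lock;
	unsigned int coalesce_usecs;
};

void rx_queues_init(struct rx_queue *rq, size_t n)
{
	size_t i;

	assert(rq != NULL);
	for (i = 0; i < n; i++) {
		pthread_mutex_init(&rq[i].dim_lock, NULL);
		rq[i].coalesce_usecs = 0;
	}
}

/*
 * Pattern A: hold every lock at once. Taking them in ascending index
 * order everywhere rules out a real deadlock, but a checker that only
 * knows the lock class sees the same kind of lock taken while held.
 */
void set_all_nested(struct rx_queue *rq, size_t n, unsigned int usecs)
{
	size_t i;

	for (i = 0; i < n; i++)
		pthread_mutex_lock(&rq[i].dim_lock);
	for (i = 0; i < n; i++)
		rq[i].coalesce_usecs = usecs;
	for (i = n; i-- > 0; )
		pthread_mutex_unlock(&rq[i].dim_lock);
}

/* Pattern B: never hold more than one queue lock at a time. */
void set_all_one_at_a_time(struct rx_queue *rq, size_t n, unsigned int usecs)
{
	size_t i;

	for (i = 0; i < n; i++) {
		pthread_mutex_lock(&rq[i].dim_lock);
		rq[i].coalesce_usecs = usecs;
		pthread_mutex_unlock(&rq[i].dim_lock);
	}
}
```

Pattern B trades atomicity across queues for simpler locking: another thread can observe some queues updated and others not yet, so it only fits when each queue's state can be changed independently.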

[1]
========================================================
WARNING: possible recursive locking detected
6.9.0-rc7+ #319 Not tainted
--------------------------------------------
ethtool/962 is trying to acquire lock:

but task is already holding lock:

other info that might help us debug this:
Possible unsafe locking scenario:

      CPU0
      ----
 lock(&vi->rq[i].dim_lock);
 lock(&vi->rq[i].dim_lock);

*** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by ethtool/962:
 #0: ffffffff82dbaab0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40
 #1: ffffffff82dad0a8 (rtnl_mutex){+.+.}-{3:3}, at:
				ethnl_default_set_doit+0xbe/0x1e0

stack backtrace:
CPU: 6 PID: 962 Comm: ethtool Not tainted 6.9.0-rc7+ #319
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
	   rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x79/0xb0
 check_deadlock+0x130/0x220
 __lock_acquire+0x861/0x990
 lock_acquire.part.0+0x72/0x1d0
 ? lock_acquire+0xf8/0x130
 __mutex_lock+0x71/0xd50
 virtnet_set_coalesce+0x151/0x190
 __ethnl_set_coalesce.isra.0+0x3f8/0x4d0
 ethnl_set_coalesce+0x34/0x90
 ethnl_default_set_doit+0xdd/0x1e0
 genl_family_rcv_msg_doit+0xdc/0x130
 genl_family_rcv_msg+0x154/0x230
 ? __pfx_ethnl_default_set_doit+0x10/0x10
 genl_rcv_msg+0x4b/0xa0
 ? __pfx_genl_rcv_msg+0x10/0x10
 netlink_rcv_skb+0x5a/0x110
 genl_rcv+0x28/0x40
 netlink_unicast+0x1af/0x280
 netlink_sendmsg+0x20e/0x460
 __sys_sendto+0x1fe/0x210
 ? find_held_lock+0x2b/0x80
 ? do_user_addr_fault+0x3a2/0x8a0
 ? __lock_release+0x5e/0x160
 ? do_user_addr_fault+0x3a2/0x8a0
 ? lock_release+0x72/0x140
 ? do_user_addr_fault+0x3a7/0x8a0
 __x64_sys_sendto+0x29/0x30
 do_syscall_64+0x78/0x180
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: 4d4ac2e ("virtio_net: Add a lock for per queue RX coalesce")
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20240528134116.117426-3-hengqi@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
matttbe pushed a commit that referenced this issue Oct 28, 2024
Commit 76d54bf ("nvme-tcp: don't access released socket during
error recovery") added a mutex_lock() call for the queue->queue_lock
in nvme_tcp_get_address(). However, the mutex_lock() races with
mutex_destroy() in nvme_tcp_free_queue(), and causes the WARN below.

DEBUG_LOCKS_WARN_ON(lock->magic != lock)
WARNING: CPU: 3 PID: 34077 at kernel/locking/mutex.c:587 __mutex_lock+0xcf0/0x1220
Modules linked in: nvmet_tcp nvmet nvme_tcp nvme_fabrics iw_cm ib_cm ib_core pktcdvd nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables qrtr sunrpc ppdev 9pnet_virtio 9pnet pcspkr netfs parport_pc parport e1000 i2c_piix4 i2c_smbus loop fuse nfnetlink zram bochs drm_vram_helper drm_ttm_helper ttm drm_kms_helper xfs drm sym53c8xx floppy nvme scsi_transport_spi nvme_core nvme_auth serio_raw ata_generic pata_acpi dm_multipath qemu_fw_cfg [last unloaded: ib_uverbs]
CPU: 3 UID: 0 PID: 34077 Comm: udisksd Not tainted 6.11.0-rc7 #319
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
RIP: 0010:__mutex_lock+0xcf0/0x1220
Code: 08 84 d2 0f 85 c8 04 00 00 8b 15 ef b6 c8 01 85 d2 0f 85 78 f4 ff ff 48 c7 c6 20 93 ee af 48 c7 c7 60 91 ee af e8 f0 a7 6d fd <0f> 0b e9 5e f4 ff ff 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1
RSP: 0018:ffff88811305f760 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff88812c652058 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
RBP: ffff88811305f8b0 R08: 0000000000000001 R09: ffffed1075c36341
R10: ffff8883ae1b1a0b R11: 0000000000010498 R12: 0000000000000000
R13: 0000000000000000 R14: dffffc0000000000 R15: ffff88812c652058
FS:  00007f9713ae4980(0000) GS:ffff8883ae180000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcd78483c7c CR3: 0000000122c38000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? __warn.cold+0x5b/0x1af
 ? __mutex_lock+0xcf0/0x1220
 ? report_bug+0x1ec/0x390
 ? handle_bug+0x3c/0x80
 ? exc_invalid_op+0x13/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? __mutex_lock+0xcf0/0x1220
 ? nvme_tcp_get_address+0xc2/0x1e0 [nvme_tcp]
 ? __pfx___mutex_lock+0x10/0x10
 ? __lock_acquire+0xd6a/0x59e0
 ? nvme_tcp_get_address+0xc2/0x1e0 [nvme_tcp]
 nvme_tcp_get_address+0xc2/0x1e0 [nvme_tcp]
 ? __pfx_nvme_tcp_get_address+0x10/0x10 [nvme_tcp]
 nvme_sysfs_show_address+0x81/0xc0 [nvme_core]
 dev_attr_show+0x42/0x80
 ? __asan_memset+0x1f/0x40
 sysfs_kf_seq_show+0x1f0/0x370
 seq_read_iter+0x2cb/0x1130
 ? rw_verify_area+0x3b1/0x590
 ? __mutex_lock+0x433/0x1220
 vfs_read+0x6a6/0xa20
 ? lockdep_hardirqs_on+0x78/0x100
 ? __pfx_vfs_read+0x10/0x10
 ksys_read+0xf7/0x1d0
 ? __pfx_ksys_read+0x10/0x10
 ? __x64_sys_openat+0x105/0x1d0
 do_syscall_64+0x93/0x180
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x180
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x180
 ? __pfx_ksys_read+0x10/0x10
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x180
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x180
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x180
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x180
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x180
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x180
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x180
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x180
 ? do_syscall_64+0x9f/0x180
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f9713f55cfa
Code: 55 48 89 e5 48 83 ec 20 48 89 55 e8 48 89 75 f0 89 7d f8 e8 e8 74 f8 ff 48 8b 55 e8 48 8b 75 f0 41 89 c0 8b 7d f8 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 2e 44 89 c7 48 89 45 f8 e8 42 75 f8 ff 48 8b
RSP: 002b:00007ffd7f512e70 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 000055c38f316859 RCX: 00007f9713f55cfa
RDX: 0000000000000fff RSI: 00007ffd7f512eb0 RDI: 0000000000000011
RBP: 00007ffd7f512e90 R08: 0000000000000000 R09: 00000000ffffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 000055c38f317148
R13: 0000000000000000 R14: 00007f96f4004f30 R15: 000055c3b6b623c0
 </TASK>

The WARN is observed when the blktests test case nvme/014 is repeated
with tcp transport. It is rare: in some test environments, 200
repetitions are required to reproduce it.

To avoid the WARN, check the NVME_TCP_Q_LIVE flag before locking
queue->queue_lock. The flag is cleared a long time before the lock gets
destroyed.
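
The check-before-lock idea can be sketched in userspace C as follows. This is not the nvme-tcp code: demo_queue, demo_init, demo_get_address and demo_teardown are hypothetical names, an atomic boolean stands in for the NVME_TCP_Q_LIVE bit, and a pthread mutex stands in for queue->queue_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a queue with a "live" flag and a lock. */
struct demo_queue {
	atomic_bool live;
	pthread_mutex_t lock;
	char addr[32];
};

void demo_init(struct demo_queue *q, const char *addr)
{
	assert(q != NULL && addr != NULL);
	pthread_mutex_init(&q->lock, NULL);
	snprintf(q->addr, sizeof(q->addr), "%s", addr);
	atomic_init(&q->live, true);
}

/* Returns the number of characters written, or 0 if the queue is not live. */
int demo_get_address(struct demo_queue *q, char *buf, size_t len)
{
	int n;

	assert(q != NULL && buf != NULL && len > 0);

	/* Bail out before touching a lock that may be about to go away. */
	if (!atomic_load(&q->live))
		return 0;

	pthread_mutex_lock(&q->lock);
	n = snprintf(buf, len, "%s", q->addr);
	pthread_mutex_unlock(&q->lock);
	return n;
}

void demo_teardown(struct demo_queue *q)
{
	/* Clear the flag first; the lock is destroyed only afterwards. */
	atomic_store(&q->live, false);
	pthread_mutex_destroy(&q->lock);
}
```

Note that in this sketch the flag check narrows the race window rather than closing it completely: a reader that passed the check just before teardown could still reach the lock. The commit relies on the flag being cleared well ahead of the destroy, which the sketch does not model.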

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>