Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

./mptcp_connect.sh -m mmap test blocks #67

Closed
matttbe opened this issue Jul 28, 2020 · 4 comments
Closed

./mptcp_connect.sh -m mmap test blocks #67

matttbe opened this issue Jul 28, 2020 · 4 comments
Labels

Comments

@matttbe
Copy link
Member

matttbe commented Jul 28, 2020

I didn't manage to reproduce it yet but my CI noticed that:

00:31:43.802 + ./mptcp_connect.sh -m mmap
00:31:44.166 [   88.586604] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
00:31:44.291 [   88.711431] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
00:31:44.415 [   88.835321] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth4: link becomes ready
00:31:44.459 # INFO: set ns3-5f1fb1eb-wxHFDU dev ns3eth2: ethtool -K  gso off
00:31:44.476 # INFO: set ns4-5f1fb1eb-wxHFDU dev ns4eth3: ethtool -K tso off gro off
00:31:44.560 # Created /tmp/tmp.wf8fIAAJnJ (size 4960284	/tmp/tmp.wf8fIAAJnJ) containing data sent by client
00:31:44.585 # Created /tmp/tmp.ZBZTsPaQar (size 237596	/tmp/tmp.ZBZTsPaQar) containing data sent by server
00:31:44.674 # New MPTCP socket can be blocked via sysctl		[ OK ]
00:31:44.722 # setsockopt(..., TCP_ULP, "mptcp", ...) blocked	[ OK ]
00:31:44.735 # INFO: validating network environment with pings
00:31:45.052 [   89.472312] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
00:31:46.866 # INFO: Using loss of 0.99% delay 263 ms reorder 94% 67% on ns3eth4
00:31:46.880 # INFO: extra options:  -m mmap
00:31:47.228 # ns1 MPTCP -> ns1 (10.0.1.1:10000      ) MPTCP	(duration   288ms) [ OK ]
00:31:47.586 # ns1 MPTCP -> ns1 (10.0.1.1:10001      ) TCP  	(duration   284ms) [ OK ]
00:31:47.693 # ns1 TCP   -> ns1 (10.0.1.1:10002      ) MPTCP	(duration    37ms) [ OK ]
00:31:48.052 # ns1 MPTCP -> ns1 (dead:beef:1::1:10003) MPTCP	(duration   290ms) [ OK ]
00:31:48.413 # ns1 MPTCP -> ns1 (dead:beef:1::1:10004) TCP  	(duration   288ms) [ OK ]
00:31:48.520 # ns1 TCP   -> ns1 (dead:beef:1::1:10005) MPTCP	(duration    37ms) [ OK ]
00:31:48.633 # ns1 MPTCP -> ns2 (10.0.1.2:10006      ) MPTCP	(duration    44ms) [ OK ]
00:31:48.947 # ns1 MPTCP -> ns2 (dead:beef:1::2:10007) MPTCP	(duration   248ms) [ OK ]
00:31:49.058 # ns1 MPTCP -> ns2 (10.0.2.1:10008      ) MPTCP	(duration    44ms) [ OK ]
00:31:49.373 # ns1 MPTCP -> ns2 (dead:beef:2::1:10009) MPTCP	(duration   243ms) [ OK ]
00:31:49.495 # ns1 MPTCP -> ns3 (10.0.2.2:10010      ) MPTCP	(duration    52ms) [ OK ]
00:31:49.611 # ns1 MPTCP -> ns3 (dead:beef:2::2:10011) MPTCP	(duration    47ms) [ OK ]
00:31:49.730 # ns1 MPTCP -> ns3 (10.0.3.2:10012      ) MPTCP	(duration    52ms) [ OK ]
00:31:49.859 # ns1 MPTCP -> ns3 (dead:beef:3::2:10013) MPTCP	(duration    58ms) [ OK ]
00:31:49.998 # ns1 MPTCP -> ns4 (10.0.3.1:10014      ) MPTCP	(duration    64ms) [ OK ]
00:45:18.000 # ns1 MPTCP -> ns4 (dead:beef:3::1:10015) MPTCP	/usr/lib/klibc/bin/poweroff
01:00:18.001 + clean

I didn't investigate more but I think there is a timeout with mptcp_connect. It can gives a clue maybe.
Using the last export branch: https://github.com/multipath-tcp/mptcp_net-next/commits/export/20200728T045328

@matttbe matttbe added the bug label Jul 28, 2020
@matttbe matttbe changed the title ./mptcp_connect.sh -m mmap timesout ./mptcp_connect.sh -m mmap test blocks Aug 7, 2020
@pabeni
Copy link

pabeni commented Aug 9, 2020

With multiple xmit substream support, this now happen in a deterministic way very early on MPTCP loopback test.

I think the problem is the receiver side can't send MPTCP-level ack if the stream is unidirectional (or receiver side has already completed the xmit part) and the reception/mptcp ack increment happens in the workqueue.

@matttbe
Copy link
Member Author

matttbe commented Aug 29, 2020

Fixed thanks to Florian's: mptcp: free acked data before waiting for more memory in commit 1cec170

@matttbe matttbe closed this as completed Aug 29, 2020
matttbe pushed a commit that referenced this issue Oct 24, 2020
NXP Layerscape (ls1028a, ls2088a), dra7xxx and imx6 platforms are either
programmed or statically configured to forward the error triggered by a
link-down state (eg no connected endpoint device) on the system bus for
PCI configuration transactions; these errors are reported as an SError
at system level, which is fatal.

Enumerating a PCI tree when the PCIe link is down is not sensible
either, so even if the link-up check is racy (link can go down after
map_bus() is called) add a link-up check in map_bus() to prevent issuing
configuration transactions when the link is down.

SError report:

 SError Interrupt on CPU2, code 0xbf000002 -- SError
 CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc5-next-20200914-00001-gf965d3ec86fa #67
 Hardware name: LS1046A RDB Board (DT)
 pstate: 20000085 (nzCv daIf -PAN -UAO BTYPE=--)
 pc : pci_generic_config_read+0x3c/0xe0
 lr : pci_generic_config_read+0x24/0xe0
 sp : ffff80001003b7b0
 x29: ffff80001003b7b0 x28: ffff80001003ba74
 x27: ffff000971d96800 x26: ffff00096e77e0a8
 x25: ffff80001003b874 x24: ffff80001003b924
 x23: 0000000000000004 x22: 0000000000000000
 x21: 0000000000000000 x20: ffff80001003b874
 x19: 0000000000000004 x18: ffffffffffffffff
 x17: 00000000000000c0 x16: fffffe0025981840
 x15: ffffb94c75b69948 x14: 62203a383634203a
 x13: 666e6f635f726568 x12: 202c31203d207265
 x11: 626d756e3e2d7375 x10: 656877202c307830
 x9 : 203d206e66766564 x8 : 0000000000000908
 x7 : 0000000000000908 x6 : ffff800010900000
 x5 : ffff00096e77e080 x4 : 0000000000000000
 x3 : 0000000000000003 x2 : 84fa3440ff7e7000
 x1 : 0000000000000000 x0 : ffff800010034000
 Kernel panic - not syncing: Asynchronous SError Interrupt
 CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc5-next-20200914-00001-gf965d3ec86fa #67
 Hardware name: LS1046A RDB Board (DT)
 Call trace:
  dump_backtrace+0x0/0x1c0
  show_stack+0x18/0x28
  dump_stack+0xd8/0x134
  panic+0x180/0x398
  add_taint+0x0/0xb0
  arm64_serror_panic+0x78/0x88
  do_serror+0x68/0x180
  el1_error+0x84/0x100
  pci_generic_config_read+0x3c/0xe0
  dw_pcie_rd_other_conf+0x78/0x110
  pci_bus_read_config_dword+0x88/0xe8
  pci_bus_generic_read_dev_vendor_id+0x30/0x1b0
  pci_bus_read_dev_vendor_id+0x4c/0x78
  pci_scan_single_device+0x80/0x100

Link: https://lore.kernel.org/r/20200916054130.8685-1-Zhiqiang.Hou@nxp.com
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
[lorenzo.pieralisi@arm.com: rewrote the commit log, remove Fixes tag]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
@matttbe
Copy link
Member Author

matttbe commented Dec 10, 2020

With the latest export branch, I got a similar issue:

00:32:12.889 + ./mptcp_connect.sh -m mmap
00:32:13.203 [  297.723793] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
00:32:13.324 [  297.845668] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
00:32:13.449 [  297.970239] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth4: link becomes ready
00:32:13.501 # INFO: set ns3-5fd1f6c4-HuMyRo dev ns3eth2: ethtool -K  gso off
00:32:13.517 # INFO: set ns4-5fd1f6c4-HuMyRo dev ns4eth3: ethtool -K  gro off
00:32:13.588 # Created /tmp/tmp.KQFpLf0eh7 (size 6059036	/tmp/tmp.KQFpLf0eh7) containing data sent by client
00:32:13.608 # Created /tmp/tmp.j4b69lb8mi (size 90140	/tmp/tmp.j4b69lb8mi) containing data sent by server
00:32:13.703 # New MPTCP socket can be blocked via sysctl		[ OK ]
00:32:13.738 # setsockopt(..., TCP_ULP, "mptcp", ...) blocked	[ OK ]
00:32:13.750 # INFO: validating network environment with pings
00:32:16.736 # INFO: Using loss of 0.96% delay 38 ms reorder 91% 96% with delay 9ms on ns3eth4
00:32:16.749 # INFO: extra options:  -m mmap
00:32:17.044 # ns1 MPTCP -> ns1 (10.0.1.1:10000      ) MPTCP	(duration    75ms) [ OK ]
00:32:17.288 # ns1 MPTCP -> ns1 (10.0.1.1:10001      ) TCP  	(duration    29ms) [ OK ]
00:32:17.543 # ns1 TCP   -> ns1 (10.0.1.1:10002      ) MPTCP	(duration    28ms) [ OK ]
00:32:17.837 # ns1 MPTCP -> ns1 (dead:beef:1::1:10003) MPTCP	(duration    68ms) [ OK ]
00:32:18.070 # ns1 MPTCP -> ns1 (dead:beef:1::1:10004) TCP  	(duration    29ms) [ OK ]
00:32:18.310 # ns1 TCP   -> ns1 (dead:beef:1::1:10005) MPTCP	(duration    28ms) [ OK ]
00:32:18.565 # ns1 MPTCP -> ns2 (10.0.1.2:10006      ) MPTCP	(duration    34ms) [ OK ]
00:32:18.817 # ns1 MPTCP -> ns2 (dead:beef:1::2:10007) MPTCP	(duration    29ms) [ OK ]
00:47:17.818 Timeout: sending Ctrl+C
00:47:17.819 # ns1 MPTCP -> ns2 (10.0.2.1:10008      ) MPTCP	^C/usr/lib/klibc/bin/poweroff

I still need to instrument my CI to output more info in case of timeout...

The CI is re-running the tests.

@matttbe
Copy link
Member Author

matttbe commented Dec 10, 2020

It looks like it is a new regression, see #124.

@matttbe matttbe closed this as completed Dec 10, 2020
jenkins-tessares pushed a commit that referenced this issue Feb 12, 2021
The current implementation of L2CAP options negotiation will continue
the negotiation when a device responds with L2CAP_CONF_UNACCEPT ("unaccepted
options"), but not when the device replies with L2CAP_CONF_UNKNOWN ("unknown
options").

Trying to continue the negotiation without ERTM support will allow
Bluetooth-capable XBox One controllers (notably models 1708 and 1797)
to connect.

btmon before patch:
> ACL Data RX: Handle 256 flags 0x02 dlen 16                            #64 [hci0] 59.182702
      L2CAP: Connection Response (0x03) ident 2 len 8
        Destination CID: 64
        Source CID: 64
        Result: Connection successful (0x0000)
        Status: No further information available (0x0000)
< ACL Data TX: Handle 256 flags 0x00 dlen 23                            #65 [hci0] 59.182744
      L2CAP: Configure Request (0x04) ident 3 len 15
        Destination CID: 64
        Flags: 0x0000
        Option: Retransmission and Flow Control (0x04) [mandatory]
          Mode: Basic (0x00)
          TX window size: 0
          Max transmit: 0
          Retransmission timeout: 0
          Monitor timeout: 0
          Maximum PDU size: 0
> ACL Data RX: Handle 256 flags 0x02 dlen 16                            #66 [hci0] 59.183948
      L2CAP: Configure Request (0x04) ident 1 len 8
        Destination CID: 64
        Flags: 0x0000
        Option: Maximum Transmission Unit (0x01) [mandatory]
          MTU: 1480
< ACL Data TX: Handle 256 flags 0x00 dlen 18                            #67 [hci0] 59.183994
      L2CAP: Configure Response (0x05) ident 1 len 10
        Source CID: 64
        Flags: 0x0000
        Result: Success (0x0000)
        Option: Maximum Transmission Unit (0x01) [mandatory]
          MTU: 1480
> ACL Data RX: Handle 256 flags 0x02 dlen 15                            #69 [hci0] 59.187676
      L2CAP: Configure Response (0x05) ident 3 len 7
        Source CID: 64
        Flags: 0x0000
        Result: Failure - unknown options (0x0003)
        04                                               .
< ACL Data TX: Handle 256 flags 0x00 dlen 12                            #70 [hci0] 59.187722
      L2CAP: Disconnection Request (0x06) ident 4 len 4
        Destination CID: 64
        Source CID: 64
> ACL Data RX: Handle 256 flags 0x02 dlen 12                            #73 [hci0] 59.192714
      L2CAP: Disconnection Response (0x07) ident 4 len 4
        Destination CID: 64
        Source CID: 64

btmon after patch:
> ACL Data RX: Handle 256 flags 0x02 dlen 16                          #248 [hci0] 103.502970
      L2CAP: Connection Response (0x03) ident 5 len 8
        Destination CID: 65
        Source CID: 65
        Result: Connection pending (0x0001)
        Status: No further information available (0x0000)
> ACL Data RX: Handle 256 flags 0x02 dlen 16                          #249 [hci0] 103.504184
      L2CAP: Connection Response (0x03) ident 5 len 8
        Destination CID: 65
        Source CID: 65
        Result: Connection successful (0x0000)
        Status: No further information available (0x0000)
< ACL Data TX: Handle 256 flags 0x00 dlen 23                          #250 [hci0] 103.504398
      L2CAP: Configure Request (0x04) ident 6 len 15
        Destination CID: 65
        Flags: 0x0000
        Option: Retransmission and Flow Control (0x04) [mandatory]
          Mode: Basic (0x00)
          TX window size: 0
          Max transmit: 0
          Retransmission timeout: 0
          Monitor timeout: 0
          Maximum PDU size: 0
> ACL Data RX: Handle 256 flags 0x02 dlen 16                          #251 [hci0] 103.505472
      L2CAP: Configure Request (0x04) ident 3 len 8
        Destination CID: 65
        Flags: 0x0000
        Option: Maximum Transmission Unit (0x01) [mandatory]
          MTU: 1480
< ACL Data TX: Handle 256 flags 0x00 dlen 18                          #252 [hci0] 103.505689
      L2CAP: Configure Response (0x05) ident 3 len 10
        Source CID: 65
        Flags: 0x0000
        Result: Success (0x0000)
        Option: Maximum Transmission Unit (0x01) [mandatory]
          MTU: 1480
> ACL Data RX: Handle 256 flags 0x02 dlen 15                          #254 [hci0] 103.509165
      L2CAP: Configure Response (0x05) ident 6 len 7
        Source CID: 65
        Flags: 0x0000
        Result: Failure - unknown options (0x0003)
        04                                               .
< ACL Data TX: Handle 256 flags 0x00 dlen 12                          #255 [hci0] 103.509426
      L2CAP: Configure Request (0x04) ident 7 len 4
        Destination CID: 65
        Flags: 0x0000
< ACL Data TX: Handle 256 flags 0x00 dlen 12                          #257 [hci0] 103.511870
      L2CAP: Connection Request (0x02) ident 8 len 4
        PSM: 1 (0x0001)
        Source CID: 66
> ACL Data RX: Handle 256 flags 0x02 dlen 14                          #259 [hci0] 103.514121
      L2CAP: Configure Response (0x05) ident 7 len 6
        Source CID: 65
        Flags: 0x0000
        Result: Success (0x0000)

Signed-off-by: Florian Dollinger <dollinger.florian@gmx.de>
Co-developed-by: Florian Dollinger <dollinger.florian@gmx.de>
Reviewed-by: Luiz Augusto Von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
jenkins-tessares pushed a commit that referenced this issue Apr 30, 2021
This mostly reverts commit 99bca61 ("MIPS: pci-legacy: use generic
pci_enable_resources"). Fixes regressions such as:
  ata_piix 0000:00:0a.1: can't enable device: BAR 0 [io  0x01f0-0x01f7] not
	claimed
  ata_piix: probe of 0000:00:0a.1 failed with error -22

The only changes from the strict revert are to fix checkpatch errors:
  ERROR: spaces required around that '=' (ctx:VxV)
  #33: FILE: arch/mips/pci/pci-legacy.c:252:
  +	for (idx=0; idx < PCI_NUM_RESOURCES; idx++) {
 	        ^

  ERROR: do not use assignment in if condition
  #67: FILE: arch/mips/pci/pci-legacy.c:284:
  +	if ((err = pcibios_enable_resources(dev, mask)) < 0)

Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
matttbe pushed a commit that referenced this issue Aug 2, 2021
Our syzcaller report a NULL pointer dereference:

BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 116e95067 P4D 116e95067 PUD 1080b5067 PMD 0
Oops: 0010 [#1] SMP KASAN
CPU: 7 PID: 592 Comm: a.out Not tainted 5.13.0-next-20210629-dirty #67
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-p4
RIP: 0010:0x0
Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
RSP: 0018:ffff888114e779b8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 1ffff110229cef39 RCX: ffffffffaa67e1aa
RDX: 0000000000000000 RSI: ffff88810a58ee00 RDI: ffff8881233180b0
RBP: ffffffffac38e9c0 R08: ffffffffaa67e17e R09: 0000000000000001
R10: ffffffffb91c5557 R11: fffffbfff7238aaa R12: ffff88810a58ee00
R13: ffff888114e77aa0 R14: 0000000000000000 R15: ffff8881233180b0
FS:  00007f946163c480(0000) GS:ffff88839f1c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 00000001099c1000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __lookup_slow+0x116/0x2d0
 ? page_put_link+0x120/0x120
 ? __d_lookup+0xfc/0x320
 ? d_lookup+0x49/0x90
 lookup_one_len+0x13c/0x170
 ? __lookup_slow+0x2d0/0x2d0
 ? reiserfs_schedule_old_flush+0x31/0x130
 reiserfs_lookup_privroot+0x64/0x150
 reiserfs_fill_super+0x158c/0x1b90
 ? finish_unfinished+0xb10/0xb10
 ? bprintf+0xe0/0xe0
 ? __mutex_lock_slowpath+0x30/0x30
 ? __kasan_check_write+0x20/0x30
 ? up_write+0x51/0xb0
 ? set_blocksize+0x9f/0x1f0
 mount_bdev+0x27c/0x2d0
 ? finish_unfinished+0xb10/0xb10
 ? reiserfs_kill_sb+0x120/0x120
 get_super_block+0x19/0x30
 legacy_get_tree+0x76/0xf0
 vfs_get_tree+0x49/0x160
 ? capable+0x1d/0x30
 path_mount+0xacc/0x1380
 ? putname+0x97/0xd0
 ? finish_automount+0x450/0x450
 ? kmem_cache_free+0xf8/0x5a0
 ? putname+0x97/0xd0
 do_mount+0xe2/0x110
 ? path_mount+0x1380/0x1380
 ? copy_mount_options+0x69/0x140
 __x64_sys_mount+0xf0/0x190
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

This is because 'root_inode' is initialized with wrong mode, and
it's i_op is set to 'reiserfs_special_inode_operations'. Thus add
check for 'root_inode' to fix the problem.

Link: https://lore.kernel.org/r/20210702040743.1918552-1-yukuai3@huawei.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
jenkins-tessares pushed a commit that referenced this issue Aug 26, 2022
To set the panel orientation property with quirk, we need the mode size
provided by EDID. This info is available after EDID is read by dc_link_detect()
and updated by amdgpu_dm_update_connector_after_detect(). The detection
happens at driver load in amdgpu_dm_initialize_drm_device() and,
therefore, we can get modes and set panel orientation before
drm_dev_register() to avoid DRM warns on creating the connector property
after device registration:

[    2.563969] ------------[ cut here ]------------
[    2.563971] WARNING: CPU: 6 PID: 325 at drivers/gpu/drm/drm_mode_object.c:45 drm_mode_object_add+0x72/0x80 [drm]
[    2.563997] Modules linked in: btusb btrtl btbcm btintel btmtk bluetooth rfkill ecdh_generic ecc usbhid crc16 amdgpu(+) drm_ttm_helper ttm agpgart gpu_sched i2c_algo_bit drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm serio_raw sdhci_pci atkbd libps2 cqhci vivaldi_fmap ccp sdhci i8042 crct10dif_pclmul crc32_pclmul hid_multitouch ghash_clmulni_intel aesni_intel crypto_simd cryptd wdat_wdt mmc_core cec xhci_pci sp5100_tco rng_core xhci_pci_renesas serio 8250_dw i2c_hid_acpi i2c_hid btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq dm_mirror dm_region_hash dm_log dm_mod pkcs8_key_parser crypto_user
[    2.564032] CPU: 6 PID: 325 Comm: systemd-udevd Not tainted 5.18.0-amd-staging-drm-next+ #67
[    2.564034] Hardware name: Valve Jupiter/Jupiter, BIOS F7A0105 03/21/2022
[    2.564036] RIP: 0010:drm_mode_object_add+0x72/0x80 [drm]
[    2.564053] Code: f0 89 c3 85 c0 78 07 89 45 00 44 89 65 04 4c 89 ef e8 e2 99 04 f1 31 c0 85 db 0f 4e c3 5b 5d 41 5c 41 5d c3 80 7f 50 00 74 ac <0f> 0b eb a8 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 4c
[    2.564055] RSP: 0018:ffffb2e880413860 EFLAGS: 00010202
[    2.564056] RAX: ffffffffc0ba1440 RBX: ffff99508a860010 RCX: 0000000000000001
[    2.564057] RDX: 00000000b0b0b0b0 RSI: ffff99508c050110 RDI: ffff99508a860010
[    2.564058] RBP: ffff99508c050110 R08: 0000000000000020 R09: ffff99508c292c20
[    2.564059] R10: 0000000000000000 R11: ffff99508c0507d8 R12: 00000000b0b0b0b0
[    2.564060] R13: 0000000000000004 R14: ffffffffc068a4b6 R15: ffffffffc068a47f
[    2.564061] FS:  00007fc69b5f1a40(0000) GS:ffff9953aff80000(0000) knlGS:0000000000000000
[    2.564063] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.564063] CR2: 00007f9506804000 CR3: 0000000107f92000 CR4: 0000000000350ee0
[    2.564065] Call Trace:
[    2.564068]  <TASK>
[    2.564070]  drm_property_create+0xc9/0x170 [drm]
[    2.564088]  drm_property_create_enum+0x1f/0x70 [drm]
[    2.564105]  drm_connector_set_panel_orientation_with_quirk+0x96/0xc0 [drm]
[    2.564123]  get_modes+0x4fb/0x530 [amdgpu]
[    2.564378]  drm_helper_probe_single_connector_modes+0x1ad/0x850 [drm_kms_helper]
[    2.564390]  drm_client_modeset_probe+0x229/0x1400 [drm]
[    2.564411]  ? xas_store+0x52/0x5e0
[    2.564416]  ? kmem_cache_alloc_trace+0x177/0x2c0
[    2.564420]  __drm_fb_helper_initial_config_and_unlock+0x44/0x4e0 [drm_kms_helper]
[    2.564430]  drm_fbdev_client_hotplug+0x173/0x210 [drm_kms_helper]
[    2.564438]  drm_fbdev_generic_setup+0xa5/0x166 [drm_kms_helper]
[    2.564446]  amdgpu_pci_probe+0x35e/0x370 [amdgpu]
[    2.564621]  local_pci_probe+0x45/0x80
[    2.564625]  ? pci_match_device+0xd7/0x130
[    2.564627]  pci_device_probe+0xbf/0x220
[    2.564629]  ? sysfs_do_create_link_sd+0x69/0xd0
[    2.564633]  really_probe+0x19c/0x380
[    2.564637]  __driver_probe_device+0xfe/0x180
[    2.564639]  driver_probe_device+0x1e/0x90
[    2.564641]  __driver_attach+0xc0/0x1c0
[    2.564643]  ? __device_attach_driver+0xe0/0xe0
[    2.564644]  ? __device_attach_driver+0xe0/0xe0
[    2.564646]  bus_for_each_dev+0x78/0xc0
[    2.564648]  bus_add_driver+0x149/0x1e0
[    2.564650]  driver_register+0x8f/0xe0
[    2.564652]  ? 0xffffffffc1023000
[    2.564654]  do_one_initcall+0x44/0x200
[    2.564657]  ? kmem_cache_alloc_trace+0x177/0x2c0
[    2.564659]  do_init_module+0x4c/0x250
[    2.564663]  __do_sys_init_module+0x12e/0x1b0
[    2.564666]  do_syscall_64+0x3b/0x90
[    2.564670]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[    2.564673] RIP: 0033:0x7fc69bff232e
[    2.564674] Code: 48 8b 0d 45 0b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 12 0b 0c 00 f7 d8 64 89 01 48
[    2.564676] RSP: 002b:00007ffe872ba3e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[    2.564677] RAX: ffffffffffffffda RBX: 000055873f797820 RCX: 00007fc69bff232e
[    2.564678] RDX: 000055873f7bf390 RSI: 0000000001155e81 RDI: 00007fc699e4d010
[    2.564679] RBP: 00007fc699e4d010 R08: 000055873f7bfe20 R09: 0000000001155e90
[    2.564680] R10: 000000055873f7bf R11: 0000000000000246 R12: 000055873f7bf390
[    2.564681] R13: 000000000000000d R14: 000055873f7c4cb0 R15: 000055873f797820
[    2.564683]  </TASK>
[    2.564683] ---[ end trace 0000000000000000 ]---
[    2.564696] ------------[ cut here ]------------
[    2.564696] WARNING: CPU: 6 PID: 325 at drivers/gpu/drm/drm_mode_object.c:242 drm_object_attach_property+0x52/0x80 [drm]
[    2.564717] Modules linked in: btusb btrtl btbcm btintel btmtk bluetooth rfkill ecdh_generic ecc usbhid crc16 amdgpu(+) drm_ttm_helper ttm agpgart gpu_sched i2c_algo_bit drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm serio_raw sdhci_pci atkbd libps2 cqhci vivaldi_fmap ccp sdhci i8042 crct10dif_pclmul crc32_pclmul hid_multitouch ghash_clmulni_intel aesni_intel crypto_simd cryptd wdat_wdt mmc_core cec xhci_pci sp5100_tco rng_core xhci_pci_renesas serio 8250_dw i2c_hid_acpi i2c_hid btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq dm_mirror dm_region_hash dm_log dm_mod pkcs8_key_parser crypto_user
[    2.564738] CPU: 6 PID: 325 Comm: systemd-udevd Tainted: G        W         5.18.0-amd-staging-drm-next+ #67
[    2.564740] Hardware name: Valve Jupiter/Jupiter, BIOS F7A0105 03/21/2022
[    2.564741] RIP: 0010:drm_object_attach_property+0x52/0x80 [drm]
[    2.564759] Code: 2d 83 f8 18 74 33 48 89 74 c1 08 48 8b 4f 08 48 89 94 c1 c8 00 00 00 48 8b 47 08 83 00 01 c3 4d 85 d2 75 dd 83 7f 58 01 75 d7 <0f> 0b eb d3 41 80 78 50 00 74 cc 0f 0b eb c8 44 89 ce 48 c7 c7 28
[    2.564760] RSP: 0018:ffffb2e8804138d8 EFLAGS: 00010246
[    2.564761] RAX: 0000000000000010 RBX: ffff99508c1a2000 RCX: ffff99508c1a2180
[    2.564762] RDX: 0000000000000003 RSI: ffff99508c050100 RDI: ffff99508c1a2040
[    2.564763] RBP: 00000000ffffffff R08: ffff99508a860010 R09: 00000000c0c0c0c0
[    2.564763] R10: 0000000000000000 R11: 0000000000000020 R12: ffff99508a860010
[    2.564764] R13: ffff995088733008 R14: ffff99508c1a2000 R15: ffffffffc068a47f
[    2.564765] FS:  00007fc69b5f1a40(0000) GS:ffff9953aff80000(0000) knlGS:0000000000000000
[    2.564766] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.564767] CR2: 00007f9506804000 CR3: 0000000107f92000 CR4: 0000000000350ee0
[    2.564768] Call Trace:
[    2.564769]  <TASK>
[    2.564770]  drm_connector_set_panel_orientation_with_quirk+0x4a/0xc0 [drm]
[    2.564789]  get_modes+0x4fb/0x530 [amdgpu]
[    2.565024]  drm_helper_probe_single_connector_modes+0x1ad/0x850 [drm_kms_helper]
[    2.565036]  drm_client_modeset_probe+0x229/0x1400 [drm]
[    2.565056]  ? xas_store+0x52/0x5e0
[    2.565060]  ? kmem_cache_alloc_trace+0x177/0x2c0
[    2.565062]  __drm_fb_helper_initial_config_and_unlock+0x44/0x4e0 [drm_kms_helper]
[    2.565072]  drm_fbdev_client_hotplug+0x173/0x210 [drm_kms_helper]
[    2.565080]  drm_fbdev_generic_setup+0xa5/0x166 [drm_kms_helper]
[    2.565088]  amdgpu_pci_probe+0x35e/0x370 [amdgpu]
[    2.565261]  local_pci_probe+0x45/0x80
[    2.565263]  ? pci_match_device+0xd7/0x130
[    2.565265]  pci_device_probe+0xbf/0x220
[    2.565267]  ? sysfs_do_create_link_sd+0x69/0xd0
[    2.565268]  really_probe+0x19c/0x380
[    2.565270]  __driver_probe_device+0xfe/0x180
[    2.565272]  driver_probe_device+0x1e/0x90
[    2.565274]  __driver_attach+0xc0/0x1c0
[    2.565276]  ? __device_attach_driver+0xe0/0xe0
[    2.565278]  ? __device_attach_driver+0xe0/0xe0
[    2.565279]  bus_for_each_dev+0x78/0xc0
[    2.565281]  bus_add_driver+0x149/0x1e0
[    2.565283]  driver_register+0x8f/0xe0
[    2.565285]  ? 0xffffffffc1023000
[    2.565286]  do_one_initcall+0x44/0x200
[    2.565288]  ? kmem_cache_alloc_trace+0x177/0x2c0
[    2.565290]  do_init_module+0x4c/0x250
[    2.565291]  __do_sys_init_module+0x12e/0x1b0
[    2.565294]  do_syscall_64+0x3b/0x90
[    2.565296]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[    2.565297] RIP: 0033:0x7fc69bff232e
[    2.565298] Code: 48 8b 0d 45 0b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 12 0b 0c 00 f7 d8 64 89 01 48
[    2.565299] RSP: 002b:00007ffe872ba3e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[    2.565301] RAX: ffffffffffffffda RBX: 000055873f797820 RCX: 00007fc69bff232e
[    2.565302] RDX: 000055873f7bf390 RSI: 0000000001155e81 RDI: 00007fc699e4d010
[    2.565303] RBP: 00007fc699e4d010 R08: 000055873f7bfe20 R09: 0000000001155e90
[    2.565303] R10: 000000055873f7bf R11: 0000000000000246 R12: 000055873f7bf390
[    2.565304] R13: 000000000000000d R14: 000055873f7c4cb0 R15: 000055873f797820
[    2.565306]  </TASK>
[    2.565307] ---[ end trace 0000000000000000 ]---

--

v2:
- call amdgpu_dm_connector_get_modes() instead of ddc_get_modes() (Harry)

Fixes: d77de78 ("amd/display: enable panel orientation quirks")
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
jenkins-tessares pushed a commit that referenced this issue Jan 6, 2023
Hoist the is_removed_spte() check above the "level == goal_level" check
when walking SPTEs during a TDP MMU page fault to avoid attempting to map
a leaf entry if said entry is frozen by a different task/vCPU.

  ------------[ cut here ]------------
  WARNING: CPU: 3 PID: 939 at arch/x86/kvm/mmu/tdp_mmu.c:653 kvm_tdp_mmu_map+0x269/0x4b0
  Modules linked in: kvm_intel
  CPU: 3 PID: 939 Comm: nx_huge_pages_t Not tainted 6.1.0-rc4+ #67
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:kvm_tdp_mmu_map+0x269/0x4b0
  RSP: 0018:ffffc9000068fba8 EFLAGS: 00010246
  RAX: 00000000000005a0 RBX: ffffc9000068fcc0 RCX: 0000000000000005
  RDX: ffff88810741f000 RSI: ffff888107f04600 RDI: ffffc900006a3000
  RBP: 060000010b000bf3 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 000ffffffffff000 R12: 0000000000000005
  R13: ffff888113670000 R14: ffff888107464958 R15: 0000000000000000
  FS:  00007f01c942c740(0000) GS:ffff888277cc0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000000117013006 CR4: 0000000000172ea0
  Call Trace:
   <TASK>
   kvm_tdp_page_fault+0x10c/0x130
   kvm_mmu_page_fault+0x103/0x680
   vmx_handle_exit+0x132/0x5a0 [kvm_intel]
   vcpu_enter_guest+0x60c/0x16f0
   kvm_arch_vcpu_ioctl_run+0x1e2/0x9d0
   kvm_vcpu_ioctl+0x271/0x660
   __x64_sys_ioctl+0x80/0xb0
   do_syscall_64+0x2b/0x50
   entry_SYSCALL_64_after_hwframe+0x46/0xb0
   </TASK>
  ---[ end trace 0000000000000000 ]---

Fixes: 63d28a2 ("KVM: x86/mmu: simplify kvm_tdp_mmu_map flow when guest has to retry")
Cc: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Robert Hoo <robert.hu@linux.intel.com>
Message-Id: <20221213033030.83345-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
matttbe pushed a commit that referenced this issue Jun 29, 2023
Assertions reports are split into two parts, the exact file and location
of the condition and then the stack trace printed from
btrfs_assertfail(). This means all the stack traces report the same line
and this is what's typically reported by various tools, making it harder
to distinguish the reports.

  [403.2467] assertion failed: refcount_read(&block_group->refs) == 1, in fs/btrfs/block-group.c:4259
  [403.2479] ------------[ cut here ]------------
  [403.2484] kernel BUG at fs/btrfs/messages.c:259!
  [403.2488] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
  [403.2493] CPU: 2 PID: 23202 Comm: umount Not tainted 6.2.0-rc4-default+ #67
  [403.2499] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
  [403.2509] RIP: 0010:btrfs_assertfail+0x19/0x1b [btrfs]
  ...
  [403.2595] Call Trace:
  [403.2598]  <TASK>
  [403.2601]  btrfs_free_block_groups.cold+0x52/0xae [btrfs]
  [403.2608]  close_ctree+0x6c2/0x761 [btrfs]
  [403.2613]  ? __wait_for_common+0x2b8/0x360
  [403.2618]  ? btrfs_cleanup_one_transaction.cold+0x7a/0x7a [btrfs]
  [403.2626]  ? mark_held_locks+0x6b/0x90
  [403.2630]  ? lockdep_hardirqs_on_prepare+0x13d/0x200
  [403.2636]  ? __call_rcu_common.constprop.0+0x1ea/0x3d0
  [403.2642]  ? trace_hardirqs_on+0x2d/0x110
  [403.2646]  ? __call_rcu_common.constprop.0+0x1ea/0x3d0
  [403.2652]  generic_shutdown_super+0xb0/0x1c0
  [403.2657]  kill_anon_super+0x1e/0x40
  [403.2662]  btrfs_kill_super+0x25/0x30 [btrfs]
  [403.2668]  deactivate_locked_super+0x4c/0xc0

By making btrfs_assertfail a macro we'll get the same line number for
the BUG output:

  [63.5736] assertion failed: 0, in fs/btrfs/super.c:1572
  [63.5758] ------------[ cut here ]------------
  [63.5782] kernel BUG at fs/btrfs/super.c:1572!
  [63.5807] invalid opcode: 0000 [#2] PREEMPT SMP KASAN
  [63.5831] CPU: 0 PID: 859 Comm: mount Tainted: G      D            6.3.0-rc7-default+ #2062
  [63.5868] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
  [63.5905] RIP: 0010:btrfs_mount+0x24/0x30 [btrfs]
  [63.5964] RSP: 0018:ffff88800e69fcd8 EFLAGS: 00010246
  [63.5982] RAX: 000000000000002d RBX: ffff888008fc1400 RCX: 0000000000000000
  [63.6004] RDX: 0000000000000000 RSI: ffffffffb90fd868 RDI: ffffffffbcc3ff20
  [63.6026] RBP: ffffffffc081b200 R08: 0000000000000001 R09: ffff88800e69fa27
  [63.6046] R10: ffffed1001cd3f44 R11: 0000000000000001 R12: ffff888005a3c370
  [63.6062] R13: ffffffffc058e830 R14: 0000000000000000 R15: 00000000ffffffff
  [63.6081] FS:  00007f7b3561f800(0000) GS:ffff88806c600000(0000) knlGS:0000000000000000
  [63.6105] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [63.6120] CR2: 00007fff83726e10 CR3: 0000000002a9e000 CR4: 00000000000006b0
  [63.6137] Call Trace:
  [63.6143]  <TASK>
  [63.6148]  legacy_get_tree+0x80/0xd0
  [63.6158]  vfs_get_tree+0x43/0x120
  [63.6166]  do_new_mount+0x1f3/0x3d0
  [63.6176]  ? do_add_mount+0x140/0x140
  [63.6187]  ? cap_capable+0xa4/0xe0
  [63.6197]  path_mount+0x223/0xc10

This comes at a cost of bloating the final btrfs.ko module due all the
inlining, as long as assertions are compiled in. This is a must for
debugging builds but this is often enabled on release builds too.

Release build:

   text    data     bss     dec     hex filename
1251676   20317   16088 1288081  13a791 pre/btrfs.ko
1260612   29473   16088 1306173  13ee3d post/btrfs.ko

DELTA: +8936

CC: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: David Sterba <dsterba@suse.com>
matttbe pushed a commit that referenced this issue Oct 31, 2023
With latest sync from net-next tree, bpf-next has a bpf selftest failure:
  [root@arch-fb-vm1 bpf]# ./test_progs -t setget_sockopt
  ...
  [   76.194349] ============================================
  [   76.194682] WARNING: possible recursive locking detected
  [   76.195039] 6.6.0-rc7-g37884503df08-dirty #67 Tainted: G        W  OE
  [   76.195518] --------------------------------------------
  [   76.195852] new_name/154 is trying to acquire lock:
  [   76.196159] ffff8c3e06ad8d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: ip_sock_set_tos+0x19/0x30
  [   76.196669]
  [   76.196669] but task is already holding lock:
  [   76.197028] ffff8c3e06ad8d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_listen+0x21/0x70
  [   76.197517]
  [   76.197517] other info that might help us debug this:
  [   76.197919]  Possible unsafe locking scenario:
  [   76.197919]
  [   76.198287]        CPU0
  [   76.198444]        ----
  [   76.198600]   lock(sk_lock-AF_INET);
  [   76.198831]   lock(sk_lock-AF_INET);
  [   76.199062]
  [   76.199062]  *** DEADLOCK ***
  [   76.199062]
  [   76.199420]  May be due to missing lock nesting notation
  [   76.199420]
  [   76.199879] 2 locks held by new_name/154:
  [   76.200131]  #0: ffff8c3e06ad8d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_listen+0x21/0x70
  [   76.200644]  #1: ffffffff90f96a40 (rcu_read_lock){....}-{1:2}, at: __cgroup_bpf_run_filter_sock_ops+0x55/0x290
  [   76.201268]
  [   76.201268] stack backtrace:
  [   76.201538] CPU: 4 PID: 154 Comm: new_name Tainted: G        W  OE      6.6.0-rc7-g37884503df08-dirty #67
  [   76.202134] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
  [   76.202699] Call Trace:
  [   76.202858]  <TASK>
  [   76.203002]  dump_stack_lvl+0x4b/0x80
  [   76.203239]  __lock_acquire+0x740/0x1ec0
  [   76.203503]  lock_acquire+0xc1/0x2a0
  [   76.203766]  ? ip_sock_set_tos+0x19/0x30
  [   76.204050]  ? sk_stream_write_space+0x12a/0x230
  [   76.204389]  ? lock_release+0xbe/0x260
  [   76.204661]  lock_sock_nested+0x32/0x80
  [   76.204942]  ? ip_sock_set_tos+0x19/0x30
  [   76.205208]  ip_sock_set_tos+0x19/0x30
  [   76.205452]  do_ip_setsockopt+0x4b3/0x1580
  [   76.205719]  __bpf_setsockopt+0x62/0xa0
  [   76.205963]  bpf_sock_ops_setsockopt+0x11/0x20
  [   76.206247]  bpf_prog_630217292049c96e_bpf_test_sockopt_int+0xbc/0x123
  [   76.206660]  bpf_prog_493685a3bae00bbd_bpf_test_ip_sockopt+0x49/0x4b
  [   76.207055]  bpf_prog_b0bcd27f269aeea0_skops_sockopt+0x44c/0xec7
  [   76.207437]  __cgroup_bpf_run_filter_sock_ops+0xda/0x290
  [   76.207829]  __inet_listen_sk+0x108/0x1b0
  [   76.208122]  inet_listen+0x48/0x70
  [   76.208373]  __sys_listen+0x74/0xb0
  [   76.208630]  __x64_sys_listen+0x16/0x20
  [   76.208911]  do_syscall_64+0x3f/0x90
  [   76.209174]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
  ...

Both ip_sock_set_tos() and inet_listen() calls lock_sock(sk) which
caused a dead lock.

To fix the issue, use sockopt_lock_sock() in ip_sock_set_tos()
instead. sockopt_lock_sock() will avoid lock_sock() if it is in bpf
context.

Fixes: 878d951 ("inet: lock the socket in ip_sock_set_tos()")
Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231027182424.1444845-1-yonghong.song@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants