Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: devicetree node sdio1 is missing. #13

Closed
wants to merge 88 commits into from

Conversation

chainsx
Copy link

@chainsx chainsx commented Feb 3, 2024

Tested: After adding the sdio1 node on the 6.6 kernel, the WiFi module can function properly.

root@licheepi-4a:~# lsmod
Module                  Size  Used by
lz4hc                  12288  0
lz4hc_compress         28672  1 lz4hc
lz4                    12288  0
lz4_compress           45056  1 lz4
binfmt_misc            24576  1
rtw88_8723ds           12288  0
rtw88_8723d            45056  1 rtw88_8723ds
rtw88_sdio             24576  1 rtw88_8723ds
rtw88_core            212992  2 rtw88_8723d,rtw88_sdio
mac80211             1454080  2 rtw88_sdio,rtw88_core
libarc4                12288  1 mac80211
cfg80211             1126400  2 rtw88_core,mac80211
rfkill                 28672  3 cfg80211
fuse                  200704  1
ip_tables              36864  0
root@licheepi-4a:~# ifconfig -a
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.8.148  netmask 255.255.255.0  broadcast 192.168.8.255
        inet6 fe80::6f7b:4495:eb1a:2904  prefixlen 64  scopeid 0x20<link>
        ether 5e:ae:c9:a9:69:c3  txqueuelen 1000  (Ethernet)
        RX packets 151  bytes 19022 (19.0 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 125  bytes 10351 (10.3 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 13  base 0xa000

eth1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 5e:ae:c9:a9:69:c4  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 14  base 0x6000

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

wlan0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 2c:05:47:c3:45:73  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

root@licheepi-4a:~# uname -a
Linux licheepi-4a 6.6.14-current-thead #2 SMP Mon Jan 29 18:51:28 CST 2024 riscv64 riscv64 riscv64 GNU/Linux

However, the current system still has the following issues:

After inserting the SD card, the following error message will appear, but it does not affect the usage and may cause the speed of appearance of the SD device to slow down..

[  463.682592] mmc1: Tuning failed, falling back to fixed sampling clock
[  463.682662] sdhci-dwcmshc ffe7090000.mmc: tuning failed: -11
[  463.682756] mmc1: tuning execution failed: -5
[  463.682808] mmc1: error -5 whilst initialising SD card
[  473.918913] mmc1: Timeout waiting for hardware cmd interrupt.
[  473.918944] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[  473.918951] mmc1: sdhci: Sys addr:  0x00000001 | Version:  0x00000005
[  473.918959] mmc1: sdhci: Blk size:  0x00007040 | Blk cnt:  0x00000000
[  473.918966] mmc1: sdhci: Argument:  0x00000000 | Trn mode: 0x00000010
[  473.918972] mmc1: sdhci: Present:   0x03ff0000 | Host ctl: 0x00000017
[  473.918977] mmc1: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
[  473.918982] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x0000000f
[  473.918986] mmc1: sdhci: Timeout:   0x00000004 | Int stat: 0x00000000
[  473.918991] mmc1: sdhci: Int enab:  0x00000020 | Sig enab: 0x00000020
[  473.918996] mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[  473.919002] mmc1: sdhci: Caps:      0x3f69c881 | Caps_1:   0x08008177
[  473.919007] mmc1: sdhci: Cmd:       0x00000102 | Max curr: 0x00191919
[  473.919012] mmc1: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0x00edc87f
[  473.919018] mmc1: sdhci: Resp[2]:   0x325b5900 | Resp[3]:  0x00400e00
[  473.919025] mmc1: sdhci: Host ctl2: 0x0000300b
[  473.919030] mmc1: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000000ce8220
[  473.919036] mmc1: sdhci: ============================================
[  474.011045] mmc1: Tuning failed, falling back to fixed sampling clock
[  474.011147] sdhci-dwcmshc ffe7090000.mmc: tuning failed: -11
[  474.011199] mmc1: tuning execution failed: -5
[  474.081228] mmc1: new high speed SDHC card at address aaaa
[  474.085470] mmcblk1: mmc1:aaaa SC32G 29.7 GiB
[  474.105659]  mmcblk1: p1

esmil and others added 30 commits January 29, 2024 14:38
Allow the generic gpio-ranges property so GPIOs can be mapped to their
corresponding pin. This way control of GPIO on pins that are already used
by other peripherals can be denied and basic pinconf can be done on pin
controllers that support it.

Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
(cherry picked from commit 0e6e3c6 gpio/for-next)
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Add bindings for the pin controllers on the T-Head TH1520 RISC-V SoC.

Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Add pinctrl driver for the T-Head TH1520 RISC-V SoC.

Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Add nodes for pin controllers on the T-Head TH1520 RISC-V SoC.

Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Add gpio-ranges properties to the TH1520 device tree, so user space can
change basic pinconf settings for GPIOs and are not allowed to use pads
already used by other functions.

Adjust number of GPIOs available for the different controllers.

Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Adjust labels for the TH1520 GPIO controllers such that GPIOs can be
referenced by the names used by the documentation. Eg.

GPIO0_X  -> <&gpio0 X Y>
GPIO1_X  -> <&gpio1 X Y>
GPIO2_X  -> <&gpio2 X Y>
GPIO3_X  -> <&gpio3 X Y>
GPIO4_X  -> <&gpio4 X Y>
AOGPIO_X -> <&aogpio X Y>

Remove labels for the parent GPIO devices that shouldn't need to be
referenced.

Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Add pinctrl settings for UART0 used as the default debug console on
both the Lichee Pi 4A and BeagleV Ahead boards.

Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Add nodes for the 5 user controllable LEDs on the BeagleV Ahead board.

Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Add compatible value for the T-Head TH1520 dwcmshc controller.

Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Drew Fustini <dfustini@baylibre.com>
Link: https://lore.kernel.org/r/20231109-th1520-mmc-v5-1-018bd039cf17@baylibre.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
(cherry picked from commit edee955 mmc-next)
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Expose __sdhci_execute_tuning() so that it can be called from the
mmc host controller drivers.

In the sdhci-of-dwcmshc driver, sdhci_dwcmshc_th1520_ops sets
platform_execute_tuning to th1520_execute_tuning(). That function has
to manipulate phy registers before tuning can be performed. To avoid
copying the code verbatim from __sdhci_execute_tuning() into
th1520_execute_tuning(), make it possible for __sdhci_execute_tuning()
to be called from sdhci-of-dwcmshc.

Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Drew Fustini <dfustini@baylibre.com>
Link: https://lore.kernel.org/r/20231109-th1520-mmc-v5-2-018bd039cf17@baylibre.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
(cherry picked from commit 9cc811a mmc-next)
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Add support for the mmc controller in the T-Head TH1520 with the new
compatible "thead,th1520-dwcmshc". Implement custom sdhci_ops for
set_uhs_signaling, reset, voltage_switch, and platform_execute_tuning.

Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Drew Fustini <dfustini@baylibre.com>
Link: https://lore.kernel.org/r/20231109-th1520-mmc-v5-3-018bd039cf17@baylibre.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
(cherry picked from commit 43658a5 mmc-next)
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Enable the mmc controller driver and dma controller driver needed for
T-Head TH1520 based boards, like the LicheePi 4A and BeagleV-Ahead, to
boot from eMMC storage.

Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Drew Fustini <dfustini@baylibre.com>
Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Link: https://lore.kernel.org/r/20231206-th1520_mmc_dts-v8-1-69220e373e8f@baylibre.com
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Add node for the fixed reference clock used for emmc and sdio nodes.
Add emmc node for the 1st dwcmshc instance which is typically connected
to an eMMC device. Add sdio0 node for the 2nd dwcmshc instance which is
typically connected to microSD slot. Add sdio1 node for the 3rd dwcmshc
instance which is typically connected to an SDIO WiFi module. The node
names are based on Table 1-2 C910/C906 memory map in the TH1520 System
User Manual.

Link: https://git.beagleboard.org/beaglev-ahead/beaglev-ahead/-/tree/main/docs
Signed-off-by: Drew Fustini <dfustini@baylibre.com>
Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20231206-th1520_mmc_dts-v8-2-69220e373e8f@baylibre.com
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Add emmc node properties for the eMMC device and add sdio0 node
properties for the microSD slot. Set the frequency for the sdhci
reference clock.

Signed-off-by: Drew Fustini <dfustini@baylibre.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20231206-th1520_mmc_dts-v8-3-69220e373e8f@baylibre.com
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Add emmc node properties for the eMMC device and add sdio0 node
properties for the microSD slot. Set the frequency for the sdhci
reference clock.

Signed-off-by: Drew Fustini <dfustini@baylibre.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20231206-th1520_mmc_dts-v8-4-69220e373e8f@baylibre.com
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
This way GPIO can be denied on on pins already used by other devices and
basic pin configuration (pull-up, pull-down etc.) can be done on through
the userspace GPIO API.

Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
T-HEAD SoCs such as the TH1520 contain a PWM controller used
to control the LCD backlight, fan and so on.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20231005130519.3864-2-jszhang@kernel.org
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
T-HEAD SoCs such as the TH1520 contain a PWM controller used
to control the LCD backlight, fan and so on. Add driver for it.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20231005130519.3864-3-jszhang@kernel.org
[esmil: remove .owner field]
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
[esmil: remove status = "disabled", drop existing aonsys_clk]
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
[esmil: add fan pinctrl]
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
snps dwmac 3.70a also supports setting pbl related properties, such as
"snps,pbl", "snps,txpbl", "snps,rxpbl" and "snps,no-pbl-x8".

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230827091710.1483-2-jszhang@kernel.org
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Add documentation to describe T-HEAD dwmac.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230827091710.1483-3-jszhang@kernel.org
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Add dwmac glue driver to support the dwmac on the T-HEAD TH1520 SoC.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230827091710.1483-4-jszhang@kernel.org
[esmil: rename plat->interface -> plat->mac_interface,
        use devm_stmmac_probe_config_dt()]
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
T-HEAD TH1520 platform's USB has a wrapper module around
the DesignWare USB3 DRD controller. Add binding information doc for
it.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230927164222.3505-2-jszhang@kernel.org
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Adds TH1520 Glue layer to support USB controller on T-HEAD TH1520 SoC.
There is a DesignWare USB3 DRD core in TH1520 SoCs, the dwc3 core is
the child of this USB wrapper module device.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230927164222.3505-3-jszhang@kernel.org
[esmil: use devm_of_platform_populate(), remove MODULE_ALIAS()]
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
RevySR and others added 15 commits January 29, 2024 14:38
thead-gcc: v2.8.0
mainline-gcc: v13.2

Signed-off-by: Han Gao <gaohan@iscas.ac.cn>
Signed-off-by: Han Gao <gaohan@iscas.ac.cn>
Commit 434ed33 ("memfd: improve userspace warnings for missing
exec-related flags") attempted to make these warnings more useful (so
they would work as an incentive to get users to switch to specifying
these flags -- as intended by the original MFD_NOEXEC_SEAL patchset).
Unfortunately, it turns out that even INFO-level logging is too extreme
to enable by default and alternative solutions to the spam issue (such
as doing more extreme rate-limiting per-task) are either too ugly or
overkill for something as simple as emitting a log as a developer aid.

Given that the flags are new and there is no harm to not specifying them
(after all, we maintain backwards compatibility) we can just drop the
warnings for now until some time in the future when most programs have
migrated and distributions start using vm.memfd_noexec=1 (where failing
to pass the flag would result in unexpected errors for programs that use
executable memfds).

Link: https://lkml.kernel.org/r/20230912-memfd-reduce-spam-v2-1-7d92a4964b6a@cyphar.com
Fixes: 434ed33 ("memfd: improve userspace warnings for missing exec-related flags")
Fixes: 2562d67 ("revert "memfd: improve userspace warnings for missing exec-related flags".")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Reported-by: Damian Tometzki <dtometzki@fedoraproject.org>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Cc: Daniel Verkamp <dverkamp@chromium.org>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Han Gao <gaohan@iscas.ac.cn>
Signed-off-by: Han Gao <gaohan@iscas.ac.cn>
Fix warning in light_set_target() during boot that cpu_pll1_foutpostdiv
was already disabled. Only call clk_disable_unprepare() for
LIGHT_CPU_PLL0_FOUTPOSTDIV if clk_set_parent() returns non-zero value.

Signed-off-by: Drew Fustini <drew@pdp7.com>
…tifier_call

Removed unused variable 'val' from panic_cpufreq_notifier_call().

Signed-off-by: Drew Fustini <drew@pdp7.com>
Signed-off-by: Haaland Chen <haaland@milkv.io>
Write a C version of memcpy() which uses the biggest data size allowed,
without generating unaligned accesses.

The procedure is made of three steps:
First copy data one byte at time until the destination buffer is aligned
to a long boundary.
Then copy the data one long at time shifting the current and the next u8
to compose a long at every cycle.
Finally, copy the remainder one byte at time.

On a BeagleV, the TCP RX throughput increased by 45%:

before:

$ iperf3 -c beaglev
Connecting to host beaglev, port 5201
[  5] local 192.168.85.6 port 44840 connected to 192.168.85.48 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  76.4 MBytes   641 Mbits/sec   27    624 KBytes
[  5]   1.00-2.00   sec  72.5 MBytes   608 Mbits/sec    0    708 KBytes
[  5]   2.00-3.00   sec  73.8 MBytes   619 Mbits/sec   10    451 KBytes
[  5]   3.00-4.00   sec  72.5 MBytes   608 Mbits/sec    0    564 KBytes
[  5]   4.00-5.00   sec  73.8 MBytes   619 Mbits/sec    0    658 KBytes
[  5]   5.00-6.00   sec  73.8 MBytes   619 Mbits/sec   14    522 KBytes
[  5]   6.00-7.00   sec  73.8 MBytes   619 Mbits/sec    0    621 KBytes
[  5]   7.00-8.00   sec  72.5 MBytes   608 Mbits/sec    0    706 KBytes
[  5]   8.00-9.00   sec  73.8 MBytes   619 Mbits/sec   20    580 KBytes
[  5]   9.00-10.00  sec  73.8 MBytes   619 Mbits/sec    0    672 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   736 MBytes   618 Mbits/sec   71             sender
[  5]   0.00-10.01  sec   733 MBytes   615 Mbits/sec                  receiver

after:

$ iperf3 -c beaglev
Connecting to host beaglev, port 5201
[  5] local 192.168.85.6 port 44864 connected to 192.168.85.48 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   109 MBytes   912 Mbits/sec   48    559 KBytes
[  5]   1.00-2.00   sec   108 MBytes   902 Mbits/sec    0    690 KBytes
[  5]   2.00-3.00   sec   106 MBytes   891 Mbits/sec   36    396 KBytes
[  5]   3.00-4.00   sec   108 MBytes   902 Mbits/sec    0    567 KBytes
[  5]   4.00-5.00   sec   106 MBytes   891 Mbits/sec    0    699 KBytes
[  5]   5.00-6.00   sec   106 MBytes   891 Mbits/sec   32    414 KBytes
[  5]   6.00-7.00   sec   106 MBytes   891 Mbits/sec    0    583 KBytes
[  5]   7.00-8.00   sec   106 MBytes   891 Mbits/sec    0    708 KBytes
[  5]   8.00-9.00   sec   106 MBytes   891 Mbits/sec   28    433 KBytes
[  5]   9.00-10.00  sec   108 MBytes   902 Mbits/sec    0    591 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.04 GBytes   897 Mbits/sec  144             sender
[  5]   0.00-10.01  sec  1.04 GBytes   894 Mbits/sec                  receiver

And the decreased CPU time of the memcpy() is observable with perf top.
This is the `perf top -Ue task-clock` output when doing the test:

before:

Overhead  Shared O  Symbol
  42.22%  [kernel]  [k] memcpy
  35.00%  [kernel]  [k] __asm_copy_to_user
   3.50%  [kernel]  [k] sifive_l2_flush64_range
   2.30%  [kernel]  [k] stmmac_napi_poll_rx
   1.11%  [kernel]  [k] memset

after:

Overhead  Shared O  Symbol
  45.69%  [kernel]  [k] __asm_copy_to_user
  29.06%  [kernel]  [k] memcpy
   4.09%  [kernel]  [k] sifive_l2_flush64_range
   2.77%  [kernel]  [k] stmmac_napi_poll_rx
   1.24%  [kernel]  [k] memset

Compared with Matteo's original series, Jisheng made below changes:
1. adopt Emil's change to fix boot failure when build with clang
2. add corresponding changes to purgatory
3. always build optimized string.c rather than only build when optimize
for performance
4. implement unroll support when src & dst are both aligned to keep
the same performance as assembly version. After disassembling, I found
that the unroll version looks something like below, so it acchieves
the "unroll" effect as asm version but in C programming language:
	ld	t2,0(a5)
	ld	t0,8(a5)
	ld	t6,16(a5)
	ld	t5,24(a5)
	ld	t4,32(a5)
	ld	t3,40(a5)
	ld	t1,48(a5)
	ld	a1,56(a5)
	sd	t2,0(a6)
	sd	t0,8(a6)
	sd	t6,16(a6)
	sd	t5,24(a6)
	sd	t4,32(a6)
	sd	t3,40(a6)
	sd	t1,48(a6)
	sd	a1,56(a6)
And per my testing, unrolling more doesn't help performance, so
the "c" version only unrolls by using 8 GP regs rather than 16
ones as asm version.
5. Add proper __pi_memcpy and __pi___memcpy alias
6. more performance numbers.

Jisheng's commit msg:
Use the benchmark program from [1], I got below results on TH1520,
CV1800B and JH7110 platforms.

*TH1520 platform (I fixed cpu freq at 750MHZ):

Before the patch:

Random memcpy (bytes/ns):
   __memcpy 32K: 0.52 64K: 0.43 128K: 0.38 256K: 0.35 512K: 0.31 1024K: 0.22 avg 0.34
memcpy_call 32K: 0.41 64K: 0.35 128K: 0.33 256K: 0.31 512K: 0.28 1024K: 0.20 avg 0.30

Aligned medium memcpy (bytes/ns):
   __memcpy 8B: 0.46 16B: 0.61 32B: 0.84 64B: 0.89 128B: 3.31 256B: 3.44 512B: 3.51
memcpy_call 8B: 0.18 16B: 0.26 32B: 0.50 64B: 0.90 128B: 1.57 256B: 2.31 512B: 2.92

Unaligned medium memcpy (bytes/ns):
   __memcpy 8B: 0.19 16B: 0.18 32B: 0.25 64B: 0.30 128B: 0.33 256B: 0.35 512B: 0.36
memcpy_call 8B: 0.16 16B: 0.22 32B: 0.39 64B: 0.70 128B: 1.11 256B: 1.46 512B: 1.81

Large memcpy (bytes/ns):
   __memcpy 1K: 3.57 2K: 3.85 4K: 3.75 8K: 3.98 16K: 4.03 32K: 4.06 64K: 4.40
memcpy_call 1K: 3.13 2K: 3.75 4K: 3.99 8K: 4.29 16K: 4.40 32K: 4.46 64K: 4.63

After the patch:

Random memcpy (bytes/ns):
   __memcpy 32K: 0.32 64K: 0.28 128K: 0.26 256K: 0.24 512K: 0.22 1024K: 0.17 avg 0.24
memcpy_call 32K: 0.39 64K: 0.34 128K: 0.32 256K: 0.30 512K: 0.27 1024K: 0.20 avg 0.29

Aligned medium memcpy (bytes/ns):
   __memcpy 8B: 0.20 16B: 0.22 32B: 0.25 64B: 2.43 128B: 3.19 256B: 3.36 512B: 3.55
memcpy_call 8B: 0.18 16B: 0.24 32B: 0.46 64B: 0.88 128B: 1.53 256B: 2.30 512B: 2.92

Unaligned medium memcpy (bytes/ns):
   __memcpy 8B: 0.22 16B: 0.29 32B: 0.49 64B: 0.51 128B: 0.87 256B: 1.08 512B: 1.27
memcpy_call 8B: 0.12 16B: 0.21 32B: 0.40 64B: 0.70 128B: 1.10 256B: 1.46 512B: 1.80

Large memcpy (bytes/ns):
   __memcpy 1K: 3.63 2K: 3.66 4K: 3.78 8K: 3.87 16K: 3.96 32K: 4.11 64K: 4.40
memcpy_call 1K: 3.32 2K: 3.68 4K: 3.99 8K: 4.17 16K: 4.25 32K: 4.48 64K: 4.60

As can be seen, the unaligned medium memcpy performance is improved by
about 252%, I.E got 3.5x speed of original's. The performance of other
style mempcy is kept the same as original's.

And since the TH1520 supports HAVE_EFFICIENT_UNALIGNED_ACCESS, we can
optimize the memcpy futher without taking care of alignment at all.
Random memcpy (bytes/ns):
              __memcpy 32K: 0.35 64K: 0.31 128K: 0.28 256K: 0.25 512K: 0.23 1024K: 0.17 av
g 0.25
           memcpy_call 32K: 0.40 64K: 0.35 128K: 0.33 256K: 0.31 512K: 0.27 1024K: 0.20 av
g 0.30

Aligned medium memcpy (bytes/ns):
              __memcpy 8B: 0.21 16B: 0.23 32B: 0.27 64B: 3.34 128B: 3.42 256B: 3.50 512B:
3.58
           memcpy_call 8B: 0.18 16B: 0.24 32B: 0.46 64B: 0.88 128B: 1.53 256B: 2.31 512B:
2.92

Unaligned medium memcpy (bytes/ns):
              __memcpy 8B: 0.20 16B: 0.23 32B: 0.28 64B: 3.05 128B: 2.70 256B: 2.82 512B:
2.88
           memcpy_call 8B: 0.16 16B: 0.21 32B: 0.38 64B: 0.70 128B: 1.11 256B: 1.50 512B:
1.81

Large memcpy (bytes/ns):
              __memcpy 1K: 3.62 2K: 3.71 4K: 3.76 8K: 3.92 16K: 3.96 32K: 4.12 64K: 4.40
           memcpy_call 1K: 3.11 2K: 3.66 4K: 4.02 8K: 4.16 16K: 4.34 32K: 4.47 64K: 4.62

As can be seen, the unaligned medium memcpy is improved by 700%, I.E 8x
speed of original's.

*CV1800B platform:

Before the patch:

Random memcpy (bytes/ns):
   __memcpy 32K: 0.21 64K: 0.10 128K: 0.08 256K: 0.07 512K: 0.06 1024K: 0.06 avg 0.08
memcpy_call 32K: 0.19 64K: 0.10 128K: 0.08 256K: 0.07 512K: 0.06 1024K: 0.06 avg 0.08

Aligned medium memcpy (bytes/ns):
   __memcpy 8B: 0.26 16B: 0.36 32B: 0.48 64B: 0.51 128B: 2.01 256B: 2.44 512B: 2.73
memcpy_call 8B: 0.10 16B: 0.18 32B: 0.33 64B: 0.59 128B: 0.90 256B: 1.21 512B: 1.47

Unaligned medium memcpy (bytes/ns):
   __memcpy 8B: 0.11 16B: 0.12 32B: 0.15 64B: 0.16 128B: 0.16 256B: 0.17 512B: 0.17
memcpy_call 8B: 0.10 16B: 0.12 32B: 0.21 64B: 0.34 128B: 0.50 256B: 0.66 512B: 0.77

Large memcpy (bytes/ns):
   __memcpy 1K: 2.90 2K: 2.91 4K: 3.00 8K: 3.04 16K: 3.03 32K: 2.89 64K: 2.52
memcpy_call 1K: 1.62 2K: 1.74 4K: 1.80 8K: 1.83 16K: 1.84 32K: 1.78 64K: 1.54

After the patch:

Random memcpy (bytes/ns):
   __memcpy 32K: 0.15 64K: 0.08 128K: 0.06 256K: 0.06 512K: 0.05 1024K: 0.05 avg 0.07
memcpy_call 32K: 0.19 64K: 0.10 128K: 0.08 256K: 0.07 512K: 0.06 1024K: 0.06 avg 0.08

Aligned medium memcpy (bytes/ns):
   __memcpy 8B: 0.11 16B: 0.11 32B: 0.14 64B: 1.15 128B: 1.62 256B: 2.06 512B: 2.40
memcpy_call 8B: 0.10 16B: 0.18 32B: 0.33 64B: 0.59 128B: 0.90 256B: 1.21 512B: 1.47

Unaligned medium memcpy (bytes/ns):
   __memcpy 8B: 0.11 16B: 0.12 32B: 0.21 64B: 0.32 128B: 0.43 256B: 0.52 512B: 0.59
memcpy_call 8B: 0.10 16B: 0.12 32B: 0.21 64B: 0.34 128B: 0.50 256B: 0.66 512B: 0.77

Large memcpy (bytes/ns):
   __memcpy 1K: 2.56 2K: 2.71 4K: 2.78 8K: 2.81 16K: 2.80 32K: 2.68 64K: 2.51
memcpy_call 1K: 1.62 2K: 1.74 4K: 1.80 8K: 1.83 16K: 1.84 32K: 1.78 64K: 1.54

We get similar performance improvement as TH1520. And since CV1800B also
supports HAVE_EFFICIENT_UNALIGNED_ACCESS, so the performance can be
improved futher:
Random memcpy (bytes/ns):
   __memcpy 32K: 0.15 64K: 0.08 128K: 0.07 256K: 0.06 512K: 0.05 1024K: 0.05 avg 0.07
memcpy_call 32K: 0.19 64K: 0.10 128K: 0.08 256K: 0.07 512K: 0.06 1024K: 0.06 avg 0.08

Aligned medium memcpy (bytes/ns):
   __memcpy 8B: 0.13 16B: 0.14 32B: 0.15 64B: 1.55 128B: 2.01 256B: 2.36 512B: 2.58
memcpy_call 8B: 0.10 16B: 0.18 32B: 0.33 64B: 0.59 128B: 0.90 256B: 1.21 512B: 1.47

Unaligned medium memcpy (bytes/ns):
   __memcpy 8B: 0.13 16B: 0.14 32B: 0.15 64B: 1.06 128B: 1.26 256B: 1.39 512B: 1.46
memcpy_call 8B: 0.10 16B: 0.12 32B: 0.21 64B: 0.34 128B: 0.50 256B: 0.66 512B: 0.77

Large memcpy (bytes/ns):
   __memcpy 1K: 2.65 2K: 2.76 4K: 2.80 8K: 2.82 16K: 2.81 32K: 2.68 64K: 2.51
memcpy_call 1K: 1.63 2K: 1.74 4K: 1.80 8K: 1.84 16K: 1.84 32K: 1.78 64K: 1.54

Now the unaligned medium memcpy is running at 8.6x speed of original's!

*JH7110 (I fixed cpufreq at 1.5GHZ)

Before the patch:
Random memcpy (bytes/ns):
   __memcpy 32K: 0.45 64K: 0.40 128K: 0.36 256K: 0.33 512K: 0.33 1024K: 0.31 avg 0.36
memcpy_call 32K: 0.43 64K: 0.38 128K: 0.34 256K: 0.31 512K: 0.31 1024K: 0.29 avg 0.34

Aligned medium memcpy (bytes/ns):
   __memcpy 8B: 0.42 16B: 0.55 32B: 0.65 64B: 0.72 128B: 2.91 256B: 3.36 512B: 3.65
memcpy_call 8B: 0.16 16B: 0.36 32B: 0.67 64B: 1.14 128B: 1.70 256B: 2.26 512B: 2.71

Unaligned medium memcpy (bytes/ns):
   __memcpy 8B: 0.17 16B: 0.18 32B: 0.19 64B: 0.19 128B: 0.19 256B: 0.20 512B: 0.20
memcpy_call 8B: 0.16 16B: 0.20 32B: 0.36 64B: 0.62 128B: 0.94 256B: 1.28 512B: 1.52

Large memcpy (bytes/ns):
   __memcpy 1K: 3.62 2K: 3.82 4K: 3.90 8K: 3.95 16K: 3.97 32K: 1.33 64K: 1.33
memcpy_call 1K: 2.93 2K: 3.14 4K: 3.25 8K: 3.31 16K: 3.19 32K: 1.27 64K: 1.28

After the patch:

Random memcpy (bytes/ns):
   __memcpy 32K: 0.26 64K: 0.24 128K: 0.23 256K: 0.22 512K: 0.22 1024K: 0.21 avg 0.23
memcpy_call 32K: 0.42 64K: 0.38 128K: 0.34 256K: 0.31 512K: 0.31 1024K: 0.29 avg 0.34

Aligned medium memcpy (bytes/ns):
   __memcpy 8B: 0.17 16B: 0.17 32B: 0.18 64B: 1.94 128B: 2.56 256B: 3.04 512B: 3.36
memcpy_call 8B: 0.17 16B: 0.36 32B: 0.65 64B: 1.12 128B: 1.73 256B: 2.37 512B: 2.91

Unaligned medium memcpy (bytes/ns):
   __memcpy 8B: 0.17 16B: 0.24 32B: 0.41 64B: 0.63 128B: 0.85 256B: 1.00 512B: 1.14
memcpy_call 8B: 0.16 16B: 0.22 32B: 0.38 64B: 0.65 128B: 0.99 256B: 1.35 512B: 1.61

Large memcpy (bytes/ns):
   __memcpy 1K: 3.43 2K: 3.59 4K: 3.67 8K: 3.72 16K: 3.73 32K: 1.28 64K: 1.28
memcpy_call 1K: 3.21 2K: 3.46 4K: 3.60 8K: 3.68 16K: 3.51 32K: 1.27 64K: 1.28

As can be seen, the unaligned medium memcpy performance is improved by
about 470%, I.E 5.7x speed of original's. The performance of other
style mempcy is kept the same as original's.

Link:https://github.com/ARM-software/optimized-routines/blob/master/string/bench/memcpy.c [1]

Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Co-developed-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
When the destination buffer is before the source one, or when the
buffers doesn't overlap, it's safe to use memcpy() instead, which is
optimized to use a bigger data size possible.

Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
The generic memset is defined as a byte at time write. This is always
safe, but it's slower than a 4 byte or even 8 byte write.

Write a generic memset which fills the data one byte at time until the
destination is aligned, then fills using the largest size allowed,
and finally fills the remaining data one byte at time.

Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Han Gao <gaohan@iscas.ac.cn>
RevySR pushed a commit that referenced this pull request Feb 6, 2024
[ Upstream commit b16904f ]

With latest upstream llvm18, the following test cases failed:

  $ ./test_progs -j
  #13/2    bpf_cookie/multi_kprobe_link_api:FAIL
  #13/3    bpf_cookie/multi_kprobe_attach_api:FAIL
  #13      bpf_cookie:FAIL
  torvalds#77      fentry_fexit:FAIL
  torvalds#78/1    fentry_test/fentry:FAIL
  torvalds#78      fentry_test:FAIL
  torvalds#82/1    fexit_test/fexit:FAIL
  torvalds#82      fexit_test:FAIL
  torvalds#112/1   kprobe_multi_test/skel_api:FAIL
  torvalds#112/2   kprobe_multi_test/link_api_addrs:FAIL
  [...]
  torvalds#112     kprobe_multi_test:FAIL
  torvalds#356/17  test_global_funcs/global_func17:FAIL
  torvalds#356     test_global_funcs:FAIL

Further analysis shows llvm upstream patch [1] is responsible for the above
failures. For example, for function bpf_fentry_test7() in net/bpf/test_run.c,
without [1], the asm code is:

  0000000000000400 <bpf_fentry_test7>:
     400: f3 0f 1e fa                   endbr64
     404: e8 00 00 00 00                callq   0x409 <bpf_fentry_test7+0x9>
     409: 48 89 f8                      movq    %rdi, %rax
     40c: c3                            retq
     40d: 0f 1f 00                      nopl    (%rax)

... and with [1], the asm code is:

  0000000000005d20 <bpf_fentry_test7.specialized.1>:
    5d20: e8 00 00 00 00                callq   0x5d25 <bpf_fentry_test7.specialized.1+0x5>
    5d25: c3                            retq

... and <bpf_fentry_test7.specialized.1> is called instead of <bpf_fentry_test7>
and this caused test failures for #13/torvalds#77 etc. except torvalds#356.

For test case torvalds#356/17, with [1] (progs/test_global_func17.c)), the main prog
looks like:

  0000000000000000 <global_func17>:
       0:       b4 00 00 00 2a 00 00 00 w0 = 0x2a
       1:       95 00 00 00 00 00 00 00 exit

... which passed verification while the test itself expects a verification
failure.

Let us add 'barrier_var' style asm code in both places to prevent function
specialization which caused selftests failure.

  [1] llvm/llvm-project#72903

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231127050342.1945270-1-yonghong.song@linux.dev
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Feb 19, 2024
[ Upstream commit b16904f ]

With latest upstream llvm18, the following test cases failed:

  $ ./test_progs -j
  #13/2    bpf_cookie/multi_kprobe_link_api:FAIL
  #13/3    bpf_cookie/multi_kprobe_attach_api:FAIL
  #13      bpf_cookie:FAIL
  torvalds#77      fentry_fexit:FAIL
  torvalds#78/1    fentry_test/fentry:FAIL
  torvalds#78      fentry_test:FAIL
  torvalds#82/1    fexit_test/fexit:FAIL
  torvalds#82      fexit_test:FAIL
  torvalds#112/1   kprobe_multi_test/skel_api:FAIL
  torvalds#112/2   kprobe_multi_test/link_api_addrs:FAIL
  [...]
  torvalds#112     kprobe_multi_test:FAIL
  torvalds#356/17  test_global_funcs/global_func17:FAIL
  torvalds#356     test_global_funcs:FAIL

Further analysis shows llvm upstream patch [1] is responsible for the above
failures. For example, for function bpf_fentry_test7() in net/bpf/test_run.c,
without [1], the asm code is:

  0000000000000400 <bpf_fentry_test7>:
     400: f3 0f 1e fa                   endbr64
     404: e8 00 00 00 00                callq   0x409 <bpf_fentry_test7+0x9>
     409: 48 89 f8                      movq    %rdi, %rax
     40c: c3                            retq
     40d: 0f 1f 00                      nopl    (%rax)

... and with [1], the asm code is:

  0000000000005d20 <bpf_fentry_test7.specialized.1>:
    5d20: e8 00 00 00 00                callq   0x5d25 <bpf_fentry_test7.specialized.1+0x5>
    5d25: c3                            retq

... and <bpf_fentry_test7.specialized.1> is called instead of <bpf_fentry_test7>
and this caused test failures for #13/torvalds#77 etc. except torvalds#356.

For test case torvalds#356/17, with [1] (progs/test_global_func17.c)), the main prog
looks like:

  0000000000000000 <global_func17>:
       0:       b4 00 00 00 2a 00 00 00 w0 = 0x2a
       1:       95 00 00 00 00 00 00 00 exit

... which passed verification while the test itself expects a verification
failure.

Let us add 'barrier_var' style asm code in both places to prevent function
specialization which caused selftests failure.

  [1] llvm/llvm-project#72903

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231127050342.1945270-1-yonghong.song@linux.dev
Signed-off-by: Sasha Levin <sashal@kernel.org>
@RevySR RevySR force-pushed the th1520-lts branch 2 times, most recently from 5eae844 to a2d2689 Compare February 23, 2024 08:37
@RevySR RevySR force-pushed the th1520-lts branch 2 times, most recently from 19010cc to c10506d Compare March 10, 2024 10:26
@RevySR RevySR force-pushed the th1520-lts branch 2 times, most recently from 89f4fa5 to 422afc2 Compare March 30, 2024 12:11
RevySR pushed a commit that referenced this pull request May 16, 2024
vhost_worker will call tun call backs to receive packets. If too many
illegal packets arrives, tun_do_read will keep dumping packet contents.
When console is enabled, it will costs much more cpu time to dump
packet and soft lockup will be detected.

net_ratelimit mechanism can be used to limit the dumping rate.

PID: 33036    TASK: ffff949da6f20000  CPU: 23   COMMAND: "vhost-32980"
 #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253
 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3
 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e
 #3 [fffffe00003fced0] do_nmi at ffffffff8922660d
 #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663
    [exception RIP: io_serial_in+20]
    RIP: ffffffff89792594  RSP: ffffa655314979e8  RFLAGS: 00000002
    RAX: ffffffff89792500  RBX: ffffffff8af428a0  RCX: 0000000000000000
    RDX: 00000000000003fd  RSI: 0000000000000005  RDI: ffffffff8af428a0
    RBP: 0000000000002710   R8: 0000000000000004   R9: 000000000000000f
    R10: 0000000000000000  R11: ffffffff8acbf64f  R12: 0000000000000020
    R13: ffffffff8acbf698  R14: 0000000000000058  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594
 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470
 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6
 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605
 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558
 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124
 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07
 #12 [ffffa65531497b68] printk at ffffffff89318306
 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765
 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun]
 #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun]
 torvalds#16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net]
 torvalds#17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost]
 torvalds#18 [ffffa65531497f10] kthread at ffffffff892d2e72
 torvalds#19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f

Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors")
Signed-off-by: Lei Chen <lei.chen@smartx.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
@chainsx chainsx closed this May 24, 2024
RevySR pushed a commit that referenced this pull request Jul 29, 2024
commit be346c1 upstream.

The code in ocfs2_dio_end_io_write() estimates number of necessary
transaction credits using ocfs2_calc_extend_credits().  This however does
not take into account that the IO could be arbitrarily large and can
contain arbitrary number of extents.

Extent tree manipulations do often extend the current transaction but not
in all of the cases.  For example if we have only single block extents in
the tree, ocfs2_mark_extent_written() will end up calling
ocfs2_replace_extent_rec() all the time and we will never extend the
current transaction and eventually exhaust all the transaction credits if
the IO contains many single block extents.  Once that happens a
WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in
jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to
this error.  This was actually triggered by one of our customers on a
heavily fragmented OCFS2 filesystem.

To fix the issue make sure the transaction always has enough credits for
one extent insert before each call of ocfs2_mark_extent_written().

Heming Zhao said:

------
PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error"

PID: xxx  TASK: xxxx  CPU: 5  COMMAND: "SubmitThread-CA"
  #0 machine_kexec at ffffffff8c069932
  #1 __crash_kexec at ffffffff8c1338fa
  #2 panic at ffffffff8c1d69b9
  #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2]
  #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2]
  #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2]
  #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2]
  #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2]
  #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2]
  #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2]
#10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2]
#11 dio_complete at ffffffff8c2b9fa7
#12 do_blockdev_direct_IO at ffffffff8c2bc09f
#13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2]
#14 generic_file_direct_write at ffffffff8c1dcf14
#15 __generic_file_write_iter at ffffffff8c1dd07b
torvalds#16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2]
torvalds#17 aio_write at ffffffff8c2cc72e
torvalds#18 kmem_cache_alloc at ffffffff8c248dde
torvalds#19 do_io_submit at ffffffff8c2ccada
torvalds#20 do_syscall_64 at ffffffff8c004984
torvalds#21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba

Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz
Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RevySR pushed a commit that referenced this pull request Jul 29, 2024
[ Upstream commit 5c0b485 ]

syzkaller triggered the warning [0] in udp_v4_early_demux().

In udp_v[46]_early_demux() and sk_lookup(), we do not touch the refcount
of the looked-up sk and use sock_pfree() as skb->destructor, so we check
SOCK_RCU_FREE to ensure that the sk is safe to access during the RCU grace
period.

Currently, SOCK_RCU_FREE is flagged for a bound socket after being put
into the hash table.  Moreover, the SOCK_RCU_FREE check is done too early
in udp_v[46]_early_demux() and sk_lookup(), so there could be a small race
window:

  CPU1                                 CPU2
  ----                                 ----
  udp_v4_early_demux()                 udp_lib_get_port()
  |                                    |- hlist_add_head_rcu()
  |- sk = __udp4_lib_demux_lookup()    |
  |- DEBUG_NET_WARN_ON_ONCE(sk_is_refcounted(sk));
                                       `- sock_set_flag(sk, SOCK_RCU_FREE)

We had the same bug in TCP and fixed it in commit 871019b ("net:
set SOCK_RCU_FREE before inserting socket into hashtable").

Let's apply the same fix for UDP.

[0]:
WARNING: CPU: 0 PID: 11198 at net/ipv4/udp.c:2599 udp_v4_early_demux+0x481/0xb70 net/ipv4/udp.c:2599
Modules linked in:
CPU: 0 PID: 11198 Comm: syz-executor.1 Not tainted 6.9.0-g93bda33046e7 #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:udp_v4_early_demux+0x481/0xb70 net/ipv4/udp.c:2599
Code: c5 7a 15 fe bb 01 00 00 00 44 89 e9 31 ff d3 e3 81 e3 bf ef ff ff 89 de e8 2c 74 15 fe 85 db 0f 85 02 06 00 00 e8 9f 7a 15 fe <0f> 0b e8 98 7a 15 fe 49 8d 7e 60 e8 4f 39 2f fe 49 c7 46 60 20 52
RSP: 0018:ffffc9000ce3fa58 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8318c92c
RDX: ffff888036ccde00 RSI: ffffffff8318c2f1 RDI: 0000000000000001
RBP: ffff88805a2dd6e0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0001ffffffffffff R12: ffff88805a2dd680
R13: 0000000000000007 R14: ffff88800923f900 R15: ffff88805456004e
FS:  00007fc449127640(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc449126e38 CR3: 000000003de4b002 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
PKRU: 55555554
Call Trace:
 <TASK>
 ip_rcv_finish_core.constprop.0+0xbdd/0xd20 net/ipv4/ip_input.c:349
 ip_rcv_finish+0xda/0x150 net/ipv4/ip_input.c:447
 NF_HOOK include/linux/netfilter.h:314 [inline]
 NF_HOOK include/linux/netfilter.h:308 [inline]
 ip_rcv+0x16c/0x180 net/ipv4/ip_input.c:569
 __netif_receive_skb_one_core+0xb3/0xe0 net/core/dev.c:5624
 __netif_receive_skb+0x21/0xd0 net/core/dev.c:5738
 netif_receive_skb_internal net/core/dev.c:5824 [inline]
 netif_receive_skb+0x271/0x300 net/core/dev.c:5884
 tun_rx_batched drivers/net/tun.c:1549 [inline]
 tun_get_user+0x24db/0x2c50 drivers/net/tun.c:2002
 tun_chr_write_iter+0x107/0x1a0 drivers/net/tun.c:2048
 new_sync_write fs/read_write.c:497 [inline]
 vfs_write+0x76f/0x8d0 fs/read_write.c:590
 ksys_write+0xbf/0x190 fs/read_write.c:643
 __do_sys_write fs/read_write.c:655 [inline]
 __se_sys_write fs/read_write.c:652 [inline]
 __x64_sys_write+0x41/0x50 fs/read_write.c:652
 x64_sys_call+0xe66/0x1990 arch/x86/include/generated/asm/syscalls_64.h:2
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x4b/0x110 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7fc44a68bc1f
Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 e9 cf f5 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 3c d0 f5 ff 48
RSP: 002b:00007fc449126c90 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000004bc050 RCX: 00007fc44a68bc1f
RDX: 0000000000000032 RSI: 00000000200000c0 RDI: 00000000000000c8
RBP: 00000000004bc050 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000032 R11: 0000000000000293 R12: 0000000000000000
R13: 000000000000000b R14: 00007fc44a5ec530 R15: 0000000000000000
 </TASK>

Fixes: 6acc9b4 ("bpf: Add helper to retrieve socket in BPF")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240709191356.24010-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
RevySR pushed a commit that referenced this pull request Sep 21, 2024
[ Upstream commit a699781 ]

A sysfs reader can race with a device reset or removal, attempting to
read device state when the device is not actually present. eg:

     [exception RIP: qed_get_current_link+17]
  #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
  #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
 #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
 #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
 torvalds#16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
 torvalds#17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

 crash> struct net_device.state ffff9a9d21336000
    state = 5,

state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
The device is not present, note lack of __LINK_STATE_PRESENT (0b10).

This is the same sort of panic as observed in commit 4224cfd
("net-sysfs: add check for netdevice being present to speed_show").

There are many other callers of __ethtool_get_link_ksettings() which
don't have a device presence check.

Move this check into ethtool to protect all callers.

Fixes: d519e17 ("net: export device speed and duplex via sysfs")
Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.