rtw_8821au: Connection breaks after a while #205

Closed
stkw0 opened this issue Jun 23, 2024 · 80 comments

Comments

@stkw0

stkw0 commented Jun 23, 2024

Sometimes my Wi-Fi connection suddenly drops. When it happens, dmesg repeatedly shows the following message: "rtw_8821au 1-8:1.0: MAC has not been powered on yet". No matter what I try, it seems the only fix is rebooting the computer.
This message also appears when I first boot the computer, but the adapter connects properly that first time.
It may also be important to note that before using the rtw88 source I used the aircrack-ng/rtl8812au driver. With that driver I also had random disconnections (maybe due to the AP?), but restarting iwd or NetworkManager recovered the connection. With rtw88, restarting them no longer recovers it.

Module: rtw_8821au
Hardware: ID 2357:0120 TP-Link Archer T2U PLUS [RTL8821AU]
Linux: 6.9.5

@dubhater
Collaborator

How long does it usually take to lose the connection like that?

Please attach the full journalctl output from a boot where your connection broke.

@stkw0
Author

stkw0 commented Jun 23, 2024

From a couple of hours (say ~3h) to 1-2 days. I don't use systemd. I will add the relevant logs once it happens again.

Thank you.

@stkw0
Author

stkw0 commented Jun 24, 2024

Here is the log from the last time the connection broke:

[36259.610821] random: crng reseeded on system resumption
[36259.610832] PM: suspend exit
[36259.642000] Generic FE-GE Realtek PHY r8169-0-600:00: attached PHY driver (mii_bus:phy_addr=r8169-0-600:00, irq=MAC)
[36259.717371] Loading firmware: rtw88/rtw8821a_fw.bin
[36259.717517] rtw_8821au 1-8:1.0: Firmware version 42.4.0, H2C version 0
[36259.759666] 00000000: 29 81 00 7c 01 00 01 00 4c 00 04 00 10 00 00 00  )..|....L.......
[36259.759671] 00000010: 25 26 26 27 27 27 2e 2e 2e 2e 2e ff ff ff ff ff  %&&'''..........
[36259.759672] 00000020: ff ff 1d 1b 19 19 1b 19 17 16 15 15 17 16 16 16  ................
[36259.759674] 00000030: fd ff ff ff ff ff 10 ff ff ff ff ff ff ff ff ff  ................
[36259.759675] 00000040: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[36259.759677] 00000050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[36259.759678] 00000060: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[36259.759680] 00000070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[36259.759681] 00000080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[36259.759683] 00000090: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[36259.759684] 000000a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[36259.759685] 000000b0: ff ff ff ff ff ff ff ff ba 27 1e 00 01 00 00 08  .........'......
[36259.759687] 000000c0: ff 09 00 ff 00 00 00 55 00 ff ff ff ff ff ff ff  .......U........
[36259.759688] 000000d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[36259.759690] 000000e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[36259.759691] 000000f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[36259.759693] 00000100: 57 23 20 01 ff ff 03 98 48 27 20 4c 84 0a 03 52  W# .....H' L...R
[36259.759694] 00000110: 65 61 6c 74 65 6b 20 18 03 38 30 32 2e 31 31 61  ealtek ..802.11a
[36259.759696] 00000120: 63 20 57 4c 41 4e 20 41 64 61 70 74 65 72 20 00  c WLAN Adapter .
[36259.759697] 00000130: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[36259.759698] 00000140: ff ff ff ff ff ff ff 0f ff ff ff ff ff ff ff ff  ................
[36259.759700] 00000150: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[36259.759701] 00000160: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[36259.759702] 00000170: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[36259.759703] 00000180: ff ff ff ff ff ff ff ff 83 ab 99 2d 03 93 98 a0  ...........-....
[36259.759705] 00000190: fc 8c 00 11 9b c4 00 ff ff ff ff ff ff ff ff ff  ................
[36259.759706] 000001a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[36259.759708] 000001b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[36259.759709] 000001c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[36259.759710] 000001d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[36259.759711] 000001e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[36259.759713] 000001f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[36259.810289] r8169 0000:06:00.0 enp6s0: Link is Down
[36259.812234] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[36260.352182] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[36261.372967] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[36261.907218] wlan0: authenticate with 10:50:72:2f:39:35 (local address=98:48:27:20:4c:84)
[36262.081267] wlan0: send auth to 10:50:72:2f:39:35 (try 1/3)
[36262.083125] wlan0: authenticated
[36262.085015] wlan0: associate with 10:50:72:2f:39:35 (try 1/3)
[36262.087171] wlan0: RX AssocResp from 10:50:72:2f:39:35 (capab=0x11 status=0 aid=6)
[36262.092191] wlan0: associated
[36262.151352] wlan0: Limiting TX power to 23 (23 - 0) dBm as advertised by 10:50:72:2f:39:35
[36263.035092] ata5: link is slow to respond, please be patient (ready=0)
[36264.030098] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[36264.031208] sd 4:0:0:0: [sdc] Starting disk
[36264.032200] ata5.00: configured for UDMA/133
[36319.610307] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000068000 engine 03 [IFB] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel 2 [00bfb9a000 X[2976]]
[36319.643464] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000068000 engine 03 [IFB] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel 2 [00bfb9a000 X[2976]]
[36777.234567] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000068000 engine 03 [IFB] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel 2 [00bfb9a000 X[2976]]
[37832.508261] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000068000 engine 03 [IFB] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel 2 [00bfb9a000 X[2976]]
[37832.541419] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000068000 engine 03 [IFB] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel 2 [00bfb9a000 X[2976]]
[38275.786428] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000068000 engine 03 [IFB] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel 2 [00bfb9a000 X[2976]]
[38329.737498] wlan0: disconnect from AP 10:50:72:2f:39:35 for new auth to 10:50:72:2f:39:31
[38329.800541] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[38330.335880] wlan0: authenticate with 10:50:72:2f:39:31 (local address=98:48:27:20:4c:84)
[38330.444110] wlan0: send auth to 10:50:72:2f:39:31 (try 1/3)
[38330.447348] wlan0: authenticated
[38330.447920] wlan0: associate with 10:50:72:2f:39:31 (try 1/3)
[38330.453675] wlan0: RX ReassocResp from 10:50:72:2f:39:31 (capab=0x411 status=0 aid=6)
[38330.458176] wlan0: associated
[38463.948846] wlan0: disconnect from AP 10:50:72:2f:39:31 for new auth to 10:50:72:2f:39:35
[38464.001962] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[38464.541377] wlan0: authenticate with 10:50:72:2f:39:35 (local address=98:48:27:20:4c:84)
[38464.714467] wlan0: send auth to 10:50:72:2f:39:35 (try 1/3)
[38464.716286] wlan0: authenticated
[38464.717166] wlan0: associate with 10:50:72:2f:39:35 (try 1/3)
[38464.719309] wlan0: RX ReassocResp from 10:50:72:2f:39:35 (capab=0x11 status=0 aid=5)
[38464.724093] wlan0: associated
[38464.736072] wlan0: Limiting TX power to 23 (23 - 0) dBm as advertised by 10:50:72:2f:39:35
[38811.532263] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000068000 engine 03 [IFB] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel 2 [00bfb9a000 X[2976]]
[38811.565235] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000068000 engine 03 [IFB] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel 2 [00bfb9a000 X[2976]]
[38813.971004] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000068000 engine 03 [IFB] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel 2 [00bfb9a000 X[2976]]
[39907.529515] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000068000 engine 03 [IFB] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel 2 [00bfb9a000 X[2976]]
[39907.562552] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000068000 engine 03 [IFB] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel 2 [00bfb9a000 X[2976]]
[40478.563592] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000068000 engine 03 [IFB] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel 2 [00bfb9a000 X[2976]]
[40959.541325] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000068000 engine 03 [IFB] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel 2 [00bfb9a000 X[2976]]
[40959.574468] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000068000 engine 03 [IFB] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel 2 [00bfb9a000 X[2976]]
[41070.043780] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000068000 engine 03 [IFB] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel 2 [00bfb9a000 X[2976]]
[41741.542724] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000068000 engine 03 [IFB] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel 2 [00bfb9a000 X[2976]]
[41741.575683] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000068000 engine 03 [IFB] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel 2 [00bfb9a000 X[2976]]
[43038.223867] wlan0: disconnect from AP 10:50:72:2f:39:35 for new auth to 10:50:72:2f:39:31
[43038.284851] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[43038.815360] wlan0: authenticate with 10:50:72:2f:39:31 (local address=98:48:27:20:4c:84)
[43038.933215] wlan0: send auth to 10:50:72:2f:39:31 (try 1/3)
[43038.938685] wlan0: authenticated
[43038.939114] wlan0: associate with 10:50:72:2f:39:31 (try 1/3)
[43038.944417] wlan0: RX ReassocResp from 10:50:72:2f:39:31 (capab=0x411 status=0 aid=7)
[43038.949300] wlan0: associated
[43093.767547] rtw_8821au 1-8:1.0: failed to get tx report from firmware
[43094.575609] rtw_8821au 1-8:1.0: failed to get tx report from firmware
[43096.559608] rtw_8821au 1-8:1.0: failed to get tx report from firmware
[43098.751643] rtw_8821au 1-8:1.0: failed to get tx report from firmware
[43102.318251] rtw_8821au 1-8:1.0: failed to send h2c command
[43102.555949] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[43103.047680] rtw_8821au 1-8:1.0: failed to get tx report from firmware
[43103.648155] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[43118.877138] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[43144.104385] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[43189.328248] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[43274.552931] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[43439.778400] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[43745.005699] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[44050.232440] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[44355.459071] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[44660.685905] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[44965.912856] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[45271.139777] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[45576.367454] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[45881.597447] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[46186.824281] rtw_8821au 1-8:1.0: MAC has not been powered on yet
[46433.317237] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000068000 engine 03 [IFB] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel 2 [00bfb9a000 X[2976]]

@dubhater
Collaborator

It looks like the firmware stops working, no idea why.

"MAC has not been powered on yet" is printed when the chip is being powered on.

Instead of rebooting, have you tried reloading rtw_8821au? Have you tried unplugging the device?

Also, I see that your computer was suspended. Does the problem happen if the computer stays awake the whole time?

@dubhater changed the title from "Connection breaks after a while" to "rtw_8821au: Connection breaks after a while" on Jun 24, 2024

@stkw0
Author

stkw0 commented Jun 29, 2024

I waited ~50 hours without suspending it, and it didn't fail. Then I suspended it twice, a few hours apart, without failures. I don't know whether some changes I made to the kernel (for other reasons) could affect this issue. I haven't had this issue again in the past 5 days.

Will report back if at some point I have more clues :/

@dubhater
Collaborator

Did you update rtw88 since your original report? I pushed some changes recently.

@stkw0
Author

stkw0 commented Jun 29, 2024

No. I updated now, if it happens again I will report back.
I also found it weird that it automatically switches from the 2.4 GHz to the 5 GHz network (this being a desktop). I don't know if that could be related.

@dubhater
Collaborator

It must be switching because the signal strength varies over time. The switching could be related. If you can give the 2.4 GHz and 5 GHz networks separate SSIDs, you could try to make it switch from one to the other in a loop using nmcli/iwctl. Maybe give it a few seconds after each switch.
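
The switching test suggested above could be sketched roughly as follows. This is only an illustration: `switch_loop`, `NMCLI` and `PAUSE` are names made up for this sketch, and it assumes the two bands already exist as separate NetworkManager connection profiles.

```shell
# Sketch: alternate between two NetworkManager connection profiles.
# NMCLI and PAUSE are overridable so the loop can be dry-run (NMCLI=echo PAUSE=0).
NMCLI=${NMCLI:-nmcli}
PAUSE=${PAUSE:-10}   # seconds to wait after each switch

# switch_loop PROFILE_A PROFILE_B ROUNDS
# Bringing a profile up on the adapter replaces whichever one was active before.
switch_loop() {
    local first=$1 second=$2 rounds=$3 i=1
    while [ "$i" -le "$rounds" ]; do
        "$NMCLI" connection up "$first" || return 1
        sleep "$PAUSE"
        "$NMCLI" connection up "$second" || return 1
        sleep "$PAUSE"
        i=$((i + 1))
    done
}
```

It would be called as, for example, `switch_loop "Home-2.4" "Home-5" 100`, where both profile names are placeholders. The loop stops at the first failed activation, which is a convenient moment to look at dmesg.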

@stkw0
Author

stkw0 commented Jun 30, 2024

For now there are no problems. If I have time I will try to build a test script. Thank you

@stkw0
Author

stkw0 commented Jul 4, 2024

I had some disconnects, but now it seems to reconnect correctly without hanging forever. I guess this issue can be closed as it could not be reproduced.

@tratum

tratum commented Jul 8, 2024

I was having the same problem. It went away on its own when I rebooted my system, but today I am having the same problem again.

@dubhater
Collaborator

dubhater commented Jul 8, 2024

I'm trying to reproduce it now:

for i in {001..100}; do nmcli connection down 64e4328c-6606-4648-93bc-247763c3bc5a; sleep 10; nmcli connection up 64e4328c-6606-4648-93bc-247763c3bc5a; sleep 10; done

@dubhater
Collaborator

dubhater commented Jul 8, 2024

Still works.

@dubhater
Collaborator

dubhater commented Jul 8, 2024

By the way, are either of you using KDE Plasma and its NetworkManager applet?

@stkw0
Author

stkw0 commented Jul 8, 2024

I am and also using iwd backend instead of wpa_supplicant

@stkw0
Author

stkw0 commented Jul 8, 2024

It happened again just now. Running rmmod rtw_8821au and then modprobe again fixed the issue. Here is the log of the failure and the recovery. I haven't pulled new commits from this repository since last time. Nothing changed except that I updated to Linux 6.9.8:

[ 7860.564657] wlan0: Limiting TX power to 30 (30 - 0) dBm as advertised by 10:50:72:2f:39:35
[ 8613.130077] wlan0: disconnect from AP 10:50:72:2f:39:35 for new auth to 10:50:72:2f:39:31
[ 8613.731694] wlan0: authenticate with 10:50:72:2f:39:31 (local address=98:48:27:20:4c:84)
[ 8613.839173] wlan0: send auth to 10:50:72:2f:39:31 (try 1/3)
[ 8613.844410] wlan0: authenticated
[ 8613.846222] wlan0: associate with 10:50:72:2f:39:31 (try 1/3)
[ 8613.851052] wlan0: RX ReassocResp from 10:50:72:2f:39:31 (capab=0x411 status=0 aid=3)
[ 8613.855522] wlan0: associated
[ 8623.976358] rtw_8821au 1-8:1.0: write register 0x8c4 failed with -71
[ 8623.976478] rtw_8821au 1-8:1.0: read register 0x848 failed with -71
[ 8623.976596] rtw_8821au 1-8:1.0: write register 0x848 failed with -71
[ 8623.976718] rtw_8821au 1-8:1.0: read register 0xc00 failed with -71
[ 8623.976838] rtw_8821au 1-8:1.0: read register 0x8b0 failed with -71
[ 8623.976956] rtw_8821au 1-8:1.0: write register 0x8b0 failed with -71
[ 8627.011948] wlan0: disconnect from AP 10:50:72:2f:39:31 for new auth to 10:50:72:2f:39:35
[ 8627.516349] rtw_8821au 1-8:1.0: failed to get tx report from firmware
[ 8627.623288] wlan0: authenticate with 10:50:72:2f:39:35 (local address=98:48:27:20:4c:84)
[ 8627.796034] wlan0: send auth to 10:50:72:2f:39:35 (try 1/3)
[ 8628.300319] rtw_8821au 1-8:1.0: failed to get tx report from firmware
[ 8628.836342] wlan0: send auth to 10:50:72:2f:39:35 (try 2/3)
[ 8629.340327] rtw_8821au 1-8:1.0: failed to get tx report from firmware
[ 8629.860645] wlan0: send auth to 10:50:72:2f:39:35 (try 3/3)
[ 8630.364338] rtw_8821au 1-8:1.0: failed to get tx report from firmware
[ 8630.884403] wlan0: authentication with 10:50:72:2f:39:35 timed out
[ 8670.516554] rtw_8821au 1-8:1.0: rtw8821a_power_off: bailing because RTW_FLAG_POWERON
[ 8681.133490] usbcore: deregistering interface driver rtw_8821au
[ 8681.154267] rtw_8821au 1-8:1.0: rtw8821a_power_off: bailing because RTW_FLAG_POWERON
[ 8681.295984] usb 1-8: reset high-speed USB device number 3 using xhci_hcd
[ 8685.031857] Loading firmware: rtw88/rtw8821a_fw.bin
[ 8685.032059] rtw_8821au 1-8:1.0: Firmware version 42.4.0, H2C version 0
[ 8685.073861] 00000000: 29 81 00 7c 01 00 01 00 4c 00 04 00 10 00 00 00  )..|....L.......
[ 8685.073863] 00000010: 25 26 26 27 27 27 2e 2e 2e 2e 2e ff ff ff ff ff  %&&'''..........
[ 8685.073864] 00000020: ff ff 1d 1b 19 19 1b 19 17 16 15 15 17 16 16 16  ................
[ 8685.073865] 00000030: fd ff ff ff ff ff 10 ff ff ff ff ff ff ff ff ff  ................
[ 8685.073866] 00000040: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 8685.073867] 00000050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 8685.073868] 00000060: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 8685.073869] 00000070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 8685.073870] 00000080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 8685.073870] 00000090: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 8685.073871] 000000a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 8685.073872] 000000b0: ff ff ff ff ff ff ff ff ba 27 1e 00 01 00 00 08  .........'......
[ 8685.073873] 000000c0: ff 09 00 ff 00 00 00 55 00 ff ff ff ff ff ff ff  .......U........
[ 8685.073874] 000000d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 8685.073874] 000000e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 8685.073875] 000000f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 8685.073876] 00000100: 57 23 20 01 ff ff 03 98 48 27 20 4c 84 0a 03 52  W# .....H' L...R
[ 8685.073877] 00000110: 65 61 6c 74 65 6b 20 18 03 38 30 32 2e 31 31 61  ealtek ..802.11a
[ 8685.073878] 00000120: 63 20 57 4c 41 4e 20 41 64 61 70 74 65 72 20 00  c WLAN Adapter .
[ 8685.073879] 00000130: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 8685.073880] 00000140: ff ff ff ff ff ff ff 0f ff ff ff ff ff ff ff ff  ................
[ 8685.073880] 00000150: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 8685.073881] 00000160: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 8685.073882] 00000170: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 8685.073883] 00000180: ff ff ff ff ff ff ff ff 83 ab 99 2d 03 93 98 a0  ...........-....
[ 8685.073884] 00000190: fc 8c 00 11 9b c4 00 ff ff ff ff ff ff ff ff ff  ................
[ 8685.073884] 000001a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 8685.073885] 000001b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 8685.073886] 000001c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 8685.073887] 000001d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 8685.073888] 000001e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 8685.073888] 000001f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 8685.075057] usbcore: registered new interface driver rtw_8821au
[ 8687.195869] wlan0: authenticate with 10:50:72:2f:39:35 (local address=98:48:27:20:4c:84)
[ 8687.367864] wlan0: send auth to 10:50:72:2f:39:35 (try 1/3)
[ 8687.369748] wlan0: authenticated
[ 8687.370819] wlan0: associate with 10:50:72:2f:39:35 (try 1/3)
[ 8687.372767] wlan0: RX AssocResp from 10:50:72:2f:39:35 (capab=0x11 status=0 aid=4)
[ 8687.377249] wlan0: associated
[ 8687.466278] wlan0: Limiting TX power to 30 (30 - 0) dBm as advertised by 10:50:72:2f:39:35

@dubhater
Collaborator

I pushed something that may help. Maybe it won't. Please pull and test.

@tratum

tratum commented Jul 11, 2024

Here's something silly that works for me whenever my connection breaks
For Fedora

sudo systemctl restart NetworkManager
sudo systemctl restart NetworkManager.service
sudo reboot

@dubhater
Collaborator

> I pushed something that may help. Maybe it won't. Please pull and test.

Well, I ran into the disconnection problem again yesterday, and today too. I'm thinking it's somehow caused by my torrent client. @stkw0 and @tratum were you downloading or uploading a lot of Linux ISOs when the connection died? :)

When the connection died yesterday, qBittorrent was showing over 5 GiB downloaded and about the same uploaded. Today it showed 22 GiB downloaded and 7 uploaded.

I tried to trigger the disconnection using iperf3, but it downloaded and uploaded a lot with no issues.

I was wrong earlier, the firmware doesn't die. Everything keeps working, except the driver doesn't receive anything from the chip anymore. I can see it transmitting probe requests on channels 48 and 149, so it's switching the channel and transmitting fine.

@tratum

tratum commented Jul 13, 2024

@dubhater, I mean I was in the process of downloading the Windows ISO to set up a dual-boot configuration. However, I encountered frequent network disconnections randomly even before initiating the download. Additionally, upon switching to the Windows operating system, I faced another issue where I was unable to establish a connection to my Wi-Fi network even in the Windows OS.

I've been thinking about it, and I don't think the torrent client is the root cause of the disconnections

@tratum

tratum commented Jul 13, 2024

I'm happy to share that for now the issues I was experiencing with frequent disconnects and WiFi interruptions have been resolved. Here are the detailed system specifications for my current system:

         .';:cccccccccccc:;,.
      .;cccccccccccccccccccccc;.      OS: Fedora Linux 40 (Workstation Edition) x86_64 
    .:cccccccccccccccccccccccccc:.     Host: TUF Gaming FX505DT_FX505DT 1.0 
  .;ccccccccccccc;.:dddl:.;ccccccc;.     Kernel: 6.9.8-200.fc40.x86_64 
 .:ccccccccccccc;OWMKOOXMWd;ccccccc:.     Shell: bash 5.2.26 
.:ccccccccccccc;KMMc;cc;xMMc:ccccccc:.     DE: GNOME 46.3.1 
,cccccccccccccc;MMM.;cc;;WW::cccccccc,     WM: Mutter 
:cccccccccccccc;MMM.;cccccccccccccccc:      Terminal: gnome-terminal 
:ccccccc;oxOOOo;MMM0OOk.;cccccccccccc:     CPU: AMD Ryzen 7 3750H with Radeon Vega Mobile Gfx (8) @ 2.300GHz 
cccccc:0MMKxdd:;MMMkddc.;cccccccccccc;     GPU: NVIDIA GeForce GTX 1650 Mobile / Max-Q 
ccccc:XM0';cccc;MMM.;cccccccccccccccc'      GPU: AMD ATI Radeon Vega Series / Radeon Vega Mobile Series 
ccccc;MMo;ccccc;MMW.;ccccccccccccccc;     Memory: 7958MiB / 30007MiB 
ccccc;0MNc.ccc.xMMd:ccccccccccccccc;     
cccccc;dNMWXXXWM0::cccccccccccccc:,      
cccccccc;.:odl:.;cccccccccccccc:,.       
:cccccccccccccccccccccccccccc:'.         
.:cccccccccccccccccccccc:;,..            
  '::cccccccccccccc::;,.                

@stkw0
Author

stkw0 commented Jul 13, 2024

> were you downloading or uploading a lot of Linux ISOs when the connection died? :)

I had qbittorrent open, but it was not using much bandwidth. If it's related to that, maybe the problem is more about opening and closing connections (DHT and so on) than bandwidth.

@dubhater
Collaborator

Next time it happens, before you do anything else, please gather some information with these simple steps:

  1. Mount debugfs: `# mount -t debugfs none /sys/kernel/debug`
  2. Prepare this command but don't run it yet: `# cat /sys/kernel/debug/ieee80211/phy0/rtw88/{mac_{0..2},mac_{4..7},bb_{8,9},bb_{a..f}} > registers.txt` On your system it may not be phy0.
  3. When you see the LED blinking, press enter to run the command. That's when the chip is definitely powered on, because it's transmitting probe requests while scanning.

If registers.txt is filled mostly with eaeaeaea eaeaeaea eaeaeaea eaeaeaea it means you missed (the chip was powered off) and need to run cat again.
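
The steps above could be wrapped in a small helper that does not hard-code the phy number and that flags a dump taken while the chip was off. This is a sketch under assumptions: the function names and the `DEBUGFS_ROOT` variable are invented for illustration, and the "more than half of the words are eaeaeaea" threshold is an arbitrary reading of "filled mostly".

```shell
# Sketch: dump the rtw88 debugfs registers without assuming phy0.
DEBUGFS_ROOT=${DEBUGFS_ROOT:-/sys/kernel/debug/ieee80211}

# Print the first phy directory that contains an rtw88 subdirectory.
find_rtw88_dir() {
    local d
    for d in "$DEBUGFS_ROOT"/phy*/rtw88; do
        if [ -d "$d" ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1
}

# dump_registers DIR OUTFILE: concatenate the same files as the cat command above.
dump_registers() {
    local dir=$1 out=$2 name
    : > "$out"
    for name in mac_0 mac_1 mac_2 mac_4 mac_5 mac_6 mac_7 \
                bb_8 bb_9 bb_a bb_b bb_c bb_d bb_e bb_f; do
        cat "$dir/$name" >> "$out"
    done
}

# Succeed if more than half of the 8-hex-digit words in FILE are "eaeaeaea",
# i.e. the dump was probably taken while the chip was powered off.
dump_looks_powered_off() {
    local file=$1 total bad
    total=$(grep -oiE '\b[0-9a-f]{8}\b' "$file" | wc -l)
    bad=$(grep -oiE '\beaeaeaea\b' "$file" | wc -l)
    [ "$total" -gt 0 ] && [ $((bad * 2)) -gt "$total" ]
}
```

A possible way to use it, as root and while the LED is blinking: `dir=$(find_rtw88_dir) && dump_registers "$dir" registers.txt`, then repeat the dump for as long as `dump_looks_powered_off registers.txt` succeeds. If more than one rtw88 adapter is plugged in, the first match may not be the one you want.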

@stkw0
Author

stkw0 commented Jul 28, 2024

It happened again, but it seems far less common now (still using rtw88 commit 5db1508). rmmod and modprobe work around the problem. I could not gather the requested information since I didn't have debugfs enabled. I will enable it now and update to the latest master.
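
For reference, the reload workaround can be sketched as a pair of shell functions. The function names, the overridable `RMMOD`/`MODPROBE` variables and the idea of counting error lines first are assumptions made for this sketch. The two error strings are the ones that appear in the logs earlier in this thread.

```shell
# Sketch: count firmware-communication errors, then reload the driver.
MODULE=${MODULE:-rtw_8821au}
RMMOD=${RMMOD:-rmmod}          # overridable for a dry run (RMMOD=echo)
MODPROBE=${MODPROBE:-modprobe}

# Read kernel log text on stdin and print the number of matching error lines.
count_fw_errors() {
    grep -cE 'failed to get tx report from firmware|failed to send h2c command'
}

# Unload and reload the module; needs root when run for real.
reload_driver() {
    "$RMMOD" "$MODULE" && "$MODPROBE" "$MODULE"
}
```

For example, `dmesg | count_fw_errors` prints how many such lines the kernel log currently holds, and `reload_driver` performs the same rmmod/modprobe sequence described above. Reloading also discards the chip state, so gather any requested debug information first.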

@stkw0
Author

stkw0 commented Aug 8, 2024

> Next time it happens, before you do anything else, please gather some information with these simple steps:
>
> 1. Mount debugfs: `# mount -t debugfs none /sys/kernel/debug`
>
> 2. Prepare this command but don't run it yet: `# cat /sys/kernel/debug/ieee80211/phy0/rtw88/{mac_{0..2},mac_{4..7},bb_{8,9},bb_{a..f}} > registers.txt` On your system it may not be phy0.
>
> 3. When you see the LED blinking, press enter to run the command. That's when the chip is definitely powered on, because it's transmitting probe requests while scanning.
>
> If registers.txt is filled mostly with eaeaeaea eaeaeaea eaeaeaea eaeaeaea it means you missed (the chip was powered off) and need to run cat again.

I tried to follow these instructions, but the device changed from phy5 (on my machine) to phy6 when I reloaded the module. Is there a way to pin it to a predictable number? Also, would it be fine to use tail -f so I don't miss anything?

@dubhater
Collaborator

dubhater commented Aug 8, 2024

I don't think you can make it more predictable. tail -f doesn't work for this case.

@dubhater
Collaborator

dubhater commented Aug 9, 2024

I got another idea. Please apply this patch and let me know if you see the error message "rtw_usb: probably just ran out of RX URBs" when the connection dies:

diff --git a/drivers/net/wireless/realtek/rtw88/usb.c b/drivers/net/wireless/realtek/rtw88/usb.c
index bf55360f9daf..149a200ffe19 100644
--- a/drivers/net/wireless/realtek/rtw88/usb.c
+++ b/drivers/net/wireless/realtek/rtw88/usb.c
@@ -671,6 +671,9 @@ static void rtw_usb_read_port_complete(struct urb *urb)
 		}
 		if (skb)
 			dev_kfree_skb_any(skb);
+		rtwusb->skipped_resubmit++;
+		if (rtwusb->skipped_resubmit >= RTW_USB_RXCB_NUM)
+			pr_err_once("rtw_usb: probably just ran out of RX URBs\n");
 	}
 }
 
diff --git a/drivers/net/wireless/realtek/rtw88/usb.h b/drivers/net/wireless/realtek/rtw88/usb.h
index 86697a5c0103..85bcb09b7997 100644
--- a/drivers/net/wireless/realtek/rtw88/usb.h
+++ b/drivers/net/wireless/realtek/rtw88/usb.h
@@ -82,6 +82,7 @@ struct rtw_usb {
 	struct rx_usb_ctrl_block rx_cb[RTW_USB_RXCB_NUM];
 	struct sk_buff_head rx_queue;
 	struct work_struct rx_work;
+	int skipped_resubmit;
 };
 
 static inline struct rtw_usb_tx_data *rtw_usb_get_tx_data(struct sk_buff *skb)

I would test it myself but my RTL8812AU just died and I don't want to reload the driver until I'm sure I don't need any more information from it.

@dubhater
Collaborator

Haha, after RTL8812AU died I plugged RTL8811AU and it also died a few hours later. Only when I'm not trying to make it happen...

I got the register contents from both. I confirmed that rtw88 is not even receiving messages from the firmware (this is the cause of the "failed to get tx report from firmware" errors).

If you haven't started yet, here is a better patch which shows the error code:

diff --git a/drivers/net/wireless/realtek/rtw88/usb.c b/drivers/net/wireless/realtek/rtw88/usb.c
index bf55360f9daf..4dbcc276a76c 100644
--- a/drivers/net/wireless/realtek/rtw88/usb.c
+++ b/drivers/net/wireless/realtek/rtw88/usb.c
@@ -664,7 +664,6 @@ static void rtw_usb_read_port_complete(struct urb *urb)
 		case -ECOMM:
 		case -EOVERFLOW:
 		case -EINPROGRESS:
-			break;
 		default:
 			rtw_err(rtwdev, "status %d\n", urb->status);
 			break;

If this is indeed the right direction, you will see something like rtw_8821au 1-8:1.0: status -XYZ. The connection will break when this appears for the fourth time.
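
To see at a glance whether those status lines have appeared, and how often, something like the following could be piped after dmesg. It is a rough sketch: the function name is made up, and the pattern assumes the message looks like the example above (driver name, USB path, then "status" and a numeric code).

```shell
# Sketch: summarise "status <code>" messages from rtw88 USB drivers.
# Reads kernel log text on stdin and prints "<code> <count>" per distinct code.
summarize_urb_status() {
    grep -oE 'rtw_[0-9a-z_]+ [^ ]+: status -?[0-9]+' |
        awk '{ count[$NF]++ } END { for (code in count) print code, count[code] }' |
        sort
}
```

Usage would be `dmesg | summarize_urb_status`. The counts can then be compared against the "fourth time" mentioned above.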

@stkw0
Author

stkw0 commented Aug 11, 2024

In Linux 6.10.3 it seems that this patch is already applied? I will try with the latest Linux kernel and the latest master of this repo.

@plumbeo

plumbeo commented Nov 7, 2024

I've been testing the new changes: no issues, and the speed seems to be a little better than before, even with TCP transfers. I'm seeing a lot of "allocating new RX skb"/"freeing excess RX skb" messages in the logs though.

@stkw0
Author

stkw0 commented Nov 13, 2024

The current state is much better. Still, it disconnects from time to time, and when it does, dmesg shows a lot of "failed to allocate rx_skb" and "failed to get tx report from firmware" messages.

Also, it failed while I was using qbittorrent and compiling firefox with LTO, so the system was under disk and network I/O, CPU and memory pressure at the same time

@dubhater
Collaborator

@stkw0 If you are seeing that exact message, with lowercase rx followed by underscore, you are not running the latest code. The problem is fixed now. You should pull and recompile.

@stkw0
Author

stkw0 commented Nov 16, 2024

I updated to the latest and now I get the "allocating new" and "freeing excess" messages

[ 1641.081770] rtw_8821au 1-8:1.0: allocating new RX skb
[ 1641.081809] rtw_8821au 1-8:1.0: freeing excess RX skb
[ 1736.708519] rtw_8821au 1-8:1.0: allocating new RX skb
[ 1736.708560] rtw_8821au 1-8:1.0: freeing excess RX skb

@dubhater
Collaborator

@stkw0 Do they show up a lot?

@stkw0
Author

stkw0 commented Nov 17, 2024

More or less, there are some bursts, then it keeps calm for a while. See attached log of dmesg | grep rtw_8821au.
dmesg_grep_rtw_8821au.txt
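
To put a number on "bursts", the two messages can be bucketed by minute of uptime. This is a sketch, not a polished tool: the function name is invented, and it assumes dmesg's default `[seconds.microseconds]` timestamps (not `dmesg -T`).

```shell
# Sketch: count RX skb messages per minute of uptime.
# Reads kernel log text on stdin and prints "<minute> alloc=<n> free=<n>".
rx_skb_per_minute() {
    awk '
        /allocating new RX skb|freeing excess RX skb/ {
            ts = $0
            sub(/^\[ */, "", ts)      # drop "[" and padding before the timestamp
            sub(/\].*/, "", ts)       # drop everything after the timestamp
            minute = int(ts / 60)
            kind = ($0 ~ /allocating/) ? "alloc" : "free"
            n[minute, kind]++
            seen[minute] = 1
        }
        END {
            for (m in seen)
                printf "%d alloc=%d free=%d\n", m, n[m, "alloc"], n[m, "free"]
        }
    ' | sort -n
}
```

With `dmesg | rx_skb_per_minute`, a quiet adapter yields few or no lines, while a burst shows up as one minute with large counts.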

@dubhater
Collaborator

@stkw0 Do you see a lot of "allocating new RX skb" if you apply the patch below?

diff --git a/usb.c b/usb.c
index 9dafc75..d1cce9c 100644
--- a/usb.c
+++ b/usb.c
@@ -623,13 +623,7 @@ static void rtw_usb_rx_handler(struct rtw_usb *rtwusb)
 			rx_desc += next_pkt;
 		} while (rx_skb->data + urb_len > rx_desc + pkt_desc_sz);
 
-		if (skb_queue_len(&rtwusb->rx_free_queue) >=
-		    RTW_USB_RX_SKB_NUM - RTW_USB_RXCB_NUM) {
-			rtw_err(rtwdev, "freeing excess RX skb\n");
-			dev_kfree_skb_any(rx_skb);
-		} else {
 			skb_queue_tail(&rtwusb->rx_free_queue, rx_skb);
-		}
 	}
 
 	if (limit == 200)

@stkw0
Author

stkw0 commented Nov 27, 2024

It has been running for over 7h now. I only saw three "allocating new RX skb" messages a few minutes after starting qbittorrent, but it seems pretty stable. I will report back if there is any news, but it looks very promising :)
Thank you very much for your hard work!

@PinkFreud

PinkFreud commented Nov 28, 2024

I just started testing this driver yesterday after dusting off my old Edimax EW-7811UTC and connecting it to a sbc w/ a RISC-V cpu.

It seems to be fairly stable for the moment - scanning works and returns available networks, connecting to one of my networks w/ WPA3 works, etc.

Transferring a 1GB file over the wireless link caused a few messages to appear, though:

[18504.458219] rtw_8821au 1-1.3:1.0: allocating new RX skb
[18504.463591] rtw_8821au 1-1.3:1.0: allocating new RX skb
[18504.468961] rtw_8821au 1-1.3:1.0: allocating new RX skb
[18504.474251] rtw_8821au 1-1.3:1.0: freeing excess RX skb
[18504.479527] rtw_8821au 1-1.3:1.0: freeing excess RX skb
[18504.484771] rtw_8821au 1-1.3:1.0: freeing excess RX skb
[18504.511701] rtw_8821au 1-1.3:1.0: allocating new RX skb
[18504.517037] rtw_8821au 1-1.3:1.0: allocating new RX skb
[18504.522359] rtw_8821au 1-1.3:1.0: freeing excess RX skb
[18504.527633] rtw_8821au 1-1.3:1.0: freeing excess RX skb
[18505.103092] rtw_8821au 1-1.3:1.0: allocating new RX skb
[18505.108443] rtw_8821au 1-1.3:1.0: allocating new RX skb
[18505.113747] rtw_8821au 1-1.3:1.0: allocating new RX skb
[18505.119020] rtw_8821au 1-1.3:1.0: allocating new RX skb
[18505.124305] rtw_8821au 1-1.3:1.0: freeing excess RX skb
[18505.129569] rtw_8821au 1-1.3:1.0: freeing excess RX skb
[18505.134815] rtw_8821au 1-1.3:1.0: freeing excess RX skb
[18505.140075] rtw_8821au 1-1.3:1.0: freeing excess RX skb
[18505.202344] rtw_8821au 1-1.3:1.0: allocating new RX skb
[18505.207654] rtw_8821au 1-1.3:1.0: allocating new RX skb
[18505.213003] rtw_8821au 1-1.3:1.0: freeing excess RX skb
[18505.218327] rtw_8821au 1-1.3:1.0: freeing excess RX skb

So far, they appear harmless - the connection is stable.

@dubhater
Copy link
Collaborator

@PinkFreud You should try this too: #205 (comment)

@PinkFreud

@PinkFreud You should try this too: #205 (comment)

I copied the same file after recompiling w/ the patch applied to usb.c. No skb messages are appearing so far.

@PinkFreud

[ 8410.130350] rtw_8821au 1-1.3:1.0: allocating new RX skb

Well, I received one. :) Looks like these are becoming much more rare now - and no 'freeing excess...' message accompanying it, either.

@plumbeo

plumbeo commented Nov 28, 2024

@stkw0 Do you see a lot of "allocating new RX skb" if you apply the patch below?

diff --git a/usb.c b/usb.c
index 9dafc75..d1cce9c 100644
--- a/usb.c
+++ b/usb.c
@@ -623,13 +623,7 @@ static void rtw_usb_rx_handler(struct rtw_usb *rtwusb)
 			rx_desc += next_pkt;
 		} while (rx_skb->data + urb_len > rx_desc + pkt_desc_sz);
 
-		if (skb_queue_len(&rtwusb->rx_free_queue) >=
-		    RTW_USB_RX_SKB_NUM - RTW_USB_RXCB_NUM) {
-			rtw_err(rtwdev, "freeing excess RX skb\n");
-			dev_kfree_skb_any(rx_skb);
-		} else {
 			skb_queue_tail(&rtwusb->rx_free_queue, rx_skb);
-		}
 	}
 
 	if (limit == 200)

I'm testing this patch, but doesn't this change allow the number of allocated skbs to grow indefinitely? On my Raspberry Pi 3B I just had a streak of 17 "allocating new RX skb" messages within a second. If that happens a few more times, won't the driver eventually end up using a lot of memory?

@dubhater
Collaborator

dubhater commented Dec 3, 2024

It has been running for over 7 hours now. I only saw three "allocating new RX skb" messages, a few minutes after starting qbittorrent, and it seems pretty stable. I will report back if there is any news, but it looks very promising :) Thank you very much for your hard work!

@stkw0 Have you seen any more since then?

@plumbeo Yes, that patch removes the limit. 17 certainly seems excessive. I will have to find another solution.

@stkw0
Author

stkw0 commented Dec 3, 2024

Not as many, just a couple from time to time. Most of the messages are a deauth followed by a re-auth with the Wi-Fi AP.

The log of rtw_8821au messages:

[   65.727204] rtw_8821au 1-8:1.0: allocating new RX skb
[ 1202.964378] rtw_8821au 1-8:1.0: allocating new RX skb
[ 1492.276928] rtw_8821au 1-8:1.0: allocating new RX skb
[41983.673595] rtw_8821au 1-8:1.0: failed to get tx report from firmware
[41988.473651] rtw_8821au 1-8:1.0: failed to get tx report from firmware
[41989.969670] rtw_8821au 1-8:1.0: failed to get tx report from firmware
[41993.033710] rtw_8821au 1-8:1.0: failed to get tx report from firmware
[41997.441783] rtw_8821au 1-8:1.0: failed to get tx report from firmware
[52996.353002] Loading firmware: rtw88/rtw8821a_fw.bin
[52999.878905] Loading firmware: rtw88/rtw8821a_fw.bin
[52999.879032] rtw_8821au 1-8:1.0: Firmware version 42.4.0, H2C version 0
[53000.430404] rtw_8821au 1-8:1.0: allocating new RX skb
[53000.430414] rtw_8821au 1-8:1.0: allocating new RX skb
[53000.430419] rtw_8821au 1-8:1.0: allocating new RX skb
[53001.496325] rtw_8821au 1-8:1.0: allocating new RX skb
[53007.854908] rtw_8821au 1-8:1.0: allocating new RX skb
[53012.885783] rtw_8821au 1-8:1.0: allocating new RX skb
[53012.885819] rtw_8821au 1-8:1.0: allocating new RX skb
[53012.885837] rtw_8821au 1-8:1.0: allocating new RX skb
[53301.652565] rtw_8821au 1-8:1.0: allocating new RX skb
[83763.898960] rtw_8821au 1-8:1.0: allocating new RX skb
[111342.895055] rtw_8821au 1-8:1.0: failed to get tx report from firmware
[133633.256292] rtw_8821au 1-8:1.0: firmware failed to leave lps state

@plumbeo

plumbeo commented Dec 4, 2024

@dubhater I've been using a slightly modified version of your latest patch:

 
        rx_skb = skb_dequeue(&rtwusb->rx_free_queue);
        if (!rx_skb) {
-               rtw_err(rtwdev, "allocating new RX skb\n");
+               rtw_err(rtwdev, "allocating new RX skb, current length %u\n", skb_queue_len(&rtwusb->rx_queue));
 
                rx_skb = alloc_skb(RTW_USB_MAX_RECVBUF_SZ, priority);
        }

I had a quick look at the code, and using rx_queue here seemed to make sense.

In any case it tends to happen infrequently and only once per loading of the driver, probably because when it gets to 16 or so it's enough to guarantee there are free RX skbs available, for example in this run:

Nov 28 22:20:10 riverside kernel: IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
Nov 28 22:25:42 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 5
Nov 28 22:26:58 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 6
Nov 28 22:30:17 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 7
Nov 28 22:30:17 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 8
Nov 28 22:39:41 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 9
Nov 28 22:41:07 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 10
Nov 28 22:41:07 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 11
Nov 28 22:41:07 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 12
Nov 28 22:41:07 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 13
Nov 28 22:41:07 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 14
Nov 28 22:41:07 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 15
Nov 28 22:41:07 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 16
Dec 01 17:45:06 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 17
Dec 01 17:45:06 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 18

but then yesterday this happened, apparently at random (I wasn't using the Raspberry Pi at the time):

Dec 02 00:40:31 riverside kernel: IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
Dec 02 00:41:52 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 5
[...]
Dec 02 00:41:52 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 14
[...]
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 15
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 16
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 17
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 18
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 19
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 20
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 21
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 22
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 23
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 24
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 25
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 26
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 27
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 28
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 29
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 30
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 31
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 32
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 33
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 34
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 35
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 36
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 37
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 38
Dec 03 23:11:34 riverside kernel: rtw_8821au 1-1.5:1.0: allocating new RX skb, current length 39

I suppose a possible strategy would be to allocate 4 skbs, then allocate more without freeing until there are 16 or so, and then start aggressively freeing them after that. That way users who never need to allocate new skbs would not waste memory, users who need a few more skbs wouldn't suffer from fragmentation, and there would be some flexibility to handle a sudden need for more skbs without holding on to memory indefinitely.

@dubhater
Collaborator

dubhater commented Dec 4, 2024

@plumbeo Yes, that's probably what I will do. What was it like before you applied the patch?

@plumbeo

plumbeo commented Dec 4, 2024

@plumbeo Yes, that's probably what I will do. What was it like before you applied the patch?

Thousands of messages, enough that after a while I had to comment out the rtw_err calls because they were spamming the logs. This is an example of a couple of hours of testing: log.txt.

The new patch certainly helps a lot to reduce the churn.

@dubhater
Collaborator

dubhater commented Dec 4, 2024

@plumbeo That's interesting. What kind of computer is it, and what kernel are you using?

@plumbeo

plumbeo commented Dec 4, 2024

@dubhater It's a Raspberry Pi 3B with 1 GB of RAM, running a Debian armhf-based OS with kernel 5.15.92. Both the kernel and userland are 32-bit.

@dubhater
Collaborator

dubhater commented Dec 4, 2024

@stkw0 What about your computer? It's something suitable for compiling Firefox, I guess. :)

@stkw0
Author

stkw0 commented Dec 4, 2024

It's getting old but still capable, yes :) It's an i7-6700 CPU with 32 GB of RAM. Currently running 6.12.1-gentoo-x86_64.
In fact, the problem happened most aggressively when using qbittorrent while compiling. Now I can stress the system just as much without any visible breakage.

@dubhater
Collaborator

dubhater commented Dec 4, 2024

@plumbeo Would it be possible to try kernel 6.9 or newer? Kernel 6.9 introduced a new type of workqueue, which this driver uses for RX processing when it's available. I wonder if it makes a difference.

@plumbeo

plumbeo commented Dec 5, 2024

@plumbeo Would it be possible to try kernel 6.9 or newer? Kernel 6.9 introduced a new type of workqueue, which this driver uses for RX processing when it's available. I wonder if it makes a difference.

Unfortunately I have to use a downstream kernel and only the LTS releases are supported, so I'll have to wait till 6.12 is made available. No idea when that will be though.

@stkw0
Author

stkw0 commented Dec 13, 2024

I had one uncommon breakage.
I was running HEAD at 611874c + the patch shared in this issue.
Linux 6.12.1

To resolve it I had to rmmod && modprobe. The uptime was almost 4 days. It may also matter that I don't power off the computer; I suspend to disk and then resume it.

[272872.045435] rtw_8821au 1-8:1.0: allocating new RX skb
[272873.252657] rtw_8821au 1-8:1.0: write register 0x6c4 failed with -71
[272873.252807] rtw_8821au 1-8:1.0: write register 0x6c8 failed with -71
[272873.252967] rtw_8821au 1-8:1.0: read register 0xc24 failed with -71
[272873.253164] rtw_8821au 1-8:1.0: write register 0xc24 failed with -71
[272873.253328] rtw_8821au 1-8:1.0: read register 0xc28 failed with -71
[272873.881792] rtw_8821au 1-8:1.0: failed to get tx report from firmware
[272880.070294] rtw_8821au 1-8:1.0: failed to send h2c command
[272880.681874] rtw_8821au 1-8:1.0: failed to get tx report from firmware

@dubhater
Collaborator

@stkw0 Please open a new issue. I don't think it's related to this one.

@stkw0
Author

stkw0 commented Dec 13, 2024

Okay. I will try updating everything first and see if it reproduces.

@dubhater
Collaborator

Fixed in kernel 6.14.
