rtw_8821au: Connection breaks after a while #205
Comments
How long does it usually take to lose the connection like that? Please attach the full journalctl output from a boot where your connection broke. |
From a couple of hours (say ~3h) to 1-2 days. I don't use systemd. I will add the relevant logs once it happens again. Thank you. |
Here is the log from the last time the connection broke:
|
It looks like the firmware stops working, no idea why. "MAC has not been powered on yet" is printed when the chip is being powered on. Instead of rebooting, have you tried reloading rtw_8821au? Have you tried unplugging the device? Also, I see that your computer was suspended. Does the problem happen if the computer stays awake the whole time? |
I waited ~50 hours without suspending it, and it didn't fail. Then I suspended it twice, a few hours apart, without failures. I don't know if some changes that I made to the kernel (for other reasons) could affect this issue. For the past 5 days I haven't had this issue again. Will report back if at some point I have more clues :/ |
Did you update rtw88 since your original report? I pushed some changes recently. |
No. I updated now, if it happens again I will report back. |
It must be switching because the signal strength varies over time. The switching could be related. If you can give the 2.4 GHz and 5 GHz networks separate SSIDs, you could try to make it switch from one to the other in a loop using nmcli/iwctl. Maybe give it a few seconds after each switch. |
For now there are no problems. If I have time I will try to build a test script. Thank you |
I had some disconnects, but now it seems to reconnect correctly without hanging forever. I guess this issue can be closed, as it could not be reproduced. |
I was having the same problem. It went away on its own when I rebooted my system, but today it is happening again. |
I'm trying to reproduce it now: for i in {001..100}; do nmcli connection down 64e4328c-6606-4648-93bc-247763c3bc5a; sleep 10; nmcli connection up 64e4328c-6606-4648-93bc-247763c3bc5a; sleep 10; done |
Still works. |
By the way, are either of you using KDE Plasma and its NetworkManager applet? |
I am and also using iwd backend instead of wpa_supplicant |
It happened again now. Running rmmod rtw_8821au and then modprobe again fixed the issue. Here is the log of the failure and the recovery. I haven't pulled new commits from this repository since last time. Nothing changed except that I updated to Linux 6.9.8:
|
I pushed something that may help. Maybe it won't. Please pull and test. |
Here's something silly that works for me whenever my connection breaks
|
Well, I ran into the disconnection problem again yesterday, and today too. I'm thinking it's somehow caused by my torrent client. @stkw0 and @tratum were you downloading or uploading a lot of Linux ISOs when the connection died? :) When the connection died yesterday, qBittorrent was showing over 5 GiB downloaded and about the same uploaded. Today it showed 22 GiB downloaded and 7 uploaded. I tried to trigger the disconnection using iperf3, but it downloaded and uploaded a lot with no issues. I was wrong earlier, the firmware doesn't die. Everything keeps working, except the driver doesn't receive anything from the chip anymore. I can see it transmitting probe requests on channels 48 and 149, so it's switching the channel and transmitting fine. |
@dubhater, I mean I was in the process of downloading the Windows ISO to set up a dual-boot configuration. However, I encountered frequent network disconnections randomly even before initiating the download. Additionally, upon switching to the Windows operating system, I faced another issue where I was unable to establish a connection to my Wi-Fi network even in the Windows OS. I've been thinking about it, and I don't think the torrent client is the root cause of the disconnections |
I'm happy to share that for now the issues I was experiencing with frequent disconnects and WiFi interruptions have been resolved. Here are the detailed system specifications for my current system:
|
I had qbittorrent opened, but it was not transmitting a high amount of bandwidth. If it's related with that, maybe the problem is more about opening and closing connections (the DHT and so) than bandwidth. |
Next time it happens, before you do anything else, please gather some information with these simple steps:
If registers.txt is filled mostly with |
It happened again, but it seems far less common now (still using rtw88 commit 5db1508). rmmod & modprobe work around the problem. I could not gather the information requested since I didn't have debugfs enabled. Will do it now and update to latest master. |
I tried to follow these instructions, but it changed from phy5 (on my machine) to phy6 when reloading the module. Is there a way to fix it to have a predictable number? Also, would it be fine if I use |
I don't think you can make it more predictable. |
I got another idea. Please apply this patch and let me know if you see the error message "rtw_usb: probably just ran out of RX URBs" when the connection dies:
diff --git a/drivers/net/wireless/realtek/rtw88/usb.c b/drivers/net/wireless/realtek/rtw88/usb.c
index bf55360f9daf..149a200ffe19 100644
--- a/drivers/net/wireless/realtek/rtw88/usb.c
+++ b/drivers/net/wireless/realtek/rtw88/usb.c
@@ -671,6 +671,9 @@ static void rtw_usb_read_port_complete(struct urb *urb)
}
if (skb)
dev_kfree_skb_any(skb);
+ rtwusb->skipped_resubmit++;
+ if (rtwusb->skipped_resubmit >= RTW_USB_RXCB_NUM)
+ pr_err_once("rtw_usb: probably just ran out of RX URBs\n");
}
}
diff --git a/drivers/net/wireless/realtek/rtw88/usb.h b/drivers/net/wireless/realtek/rtw88/usb.h
index 86697a5c0103..85bcb09b7997 100644
--- a/drivers/net/wireless/realtek/rtw88/usb.h
+++ b/drivers/net/wireless/realtek/rtw88/usb.h
@@ -82,6 +82,7 @@ struct rtw_usb {
struct rx_usb_ctrl_block rx_cb[RTW_USB_RXCB_NUM];
struct sk_buff_head rx_queue;
struct work_struct rx_work;
+ int skipped_resubmit;
};
static inline struct rtw_usb_tx_data *rtw_usb_get_tx_data(struct sk_buff *skb)
I would test it myself but my RTL8812AU just died and I don't want to reload the driver until I'm sure I don't need any more information from it. |
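As a rough illustration of what the counter in the patch above is meant to detect, here is a small userspace model in C. It is not driver code: `struct rx_model` and its helper functions are hypothetical stand-ins for the relevant part of `struct rtw_usb`, and the value 4 for `RTW_USB_RXCB_NUM` is an assumption made for the sketch. The idea is that every completion which does not resubmit its URB permanently removes one URB from circulation, so after `RTW_USB_RXCB_NUM` such completions nothing can be received anymore.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value for this sketch; the real constant is defined in usb.h. */
#define RTW_USB_RXCB_NUM 4

/* Hypothetical stand-in for the relevant part of struct rtw_usb. */
struct rx_model {
	int in_flight;        /* RX URBs currently submitted to the USB core */
	int skipped_resubmit; /* completions that did not resubmit their URB */
};

static void rx_model_init(struct rx_model *m)
{
	m->in_flight = RTW_USB_RXCB_NUM;
	m->skipped_resubmit = 0;
}

/*
 * Model one URB completion. When resubmit is false the URB is dropped,
 * which is the path the patch instruments. Returns true once the
 * "probably just ran out of RX URBs" condition would be reached.
 */
static bool rx_model_complete(struct rx_model *m, bool resubmit)
{
	if (m->in_flight == 0)
		return true; /* no URBs left, RX is already dead */

	if (resubmit)
		return false; /* URB goes straight back to the USB core */

	m->in_flight--;
	m->skipped_resubmit++;
	assert(m->in_flight + m->skipped_resubmit == RTW_USB_RXCB_NUM);
	return m->skipped_resubmit >= RTW_USB_RXCB_NUM;
}

/* The driver can only receive while at least one URB is in flight. */
static bool rx_model_can_receive(const struct rx_model *m)
{
	return m->in_flight > 0;
}
```

In this model, transmitting is unaffected by the lost URBs, which would match the observation that the chip keeps sending probe requests while the driver receives nothing.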
Haha, after RTL8812AU died I plugged RTL8811AU and it also died a few hours later. Only when I'm not trying to make it happen... I got the register contents from both. I confirmed that rtw88 is not even receiving messages from the firmware (this is the cause of the "failed to get tx report from firmware" errors). If you haven't started yet, here is a better patch which shows the error code:
diff --git a/drivers/net/wireless/realtek/rtw88/usb.c b/drivers/net/wireless/realtek/rtw88/usb.c
index bf55360f9daf..4dbcc276a76c 100644
--- a/drivers/net/wireless/realtek/rtw88/usb.c
+++ b/drivers/net/wireless/realtek/rtw88/usb.c
@@ -664,7 +664,6 @@ static void rtw_usb_read_port_complete(struct urb *urb)
case -ECOMM:
case -EOVERFLOW:
case -EINPROGRESS:
- break;
default:
rtw_err(rtwdev, "status %d\n", urb->status);
break;
If this is indeed the right direction, you will see something like |
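For readers less familiar with C `switch` semantics, the effect of removing that single `break` can be sketched in userspace as follows. The two helper functions are hypothetical and model only the three cases visible in the diff context; the real handler may list more status codes above them. Without the `break`, the three cases fall through into `default`, so their status is logged instead of being silently ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: would this URB status be logged before the patch? */
static bool status_is_logged_before(int status)
{
	assert(status <= 0); /* URB status is 0 or a negative errno */

	switch (status) {
	case -ECOMM:
	case -EOVERFLOW:
	case -EINPROGRESS:
		break; /* silently ignored */
	default:
		return true; /* rtw_err(rtwdev, "status %d\n", ...) */
	}
	return false;
}

/* Hypothetical helper: would this URB status be logged after the patch? */
static bool status_is_logged_after(int status)
{
	assert(status <= 0);

	switch (status) {
	case -ECOMM:
	case -EOVERFLOW:
	case -EINPROGRESS:
		/* fall through: these cases now reach the logging branch */
	default:
		return true;
	}
}
```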
In Linux 6.10.3 it seems that this patch is already applied? I will try with the latest Linux kernel and the latest master of this repo. |
I've been testing the new changes, no issues and the speed seems to be a little better than before even with TCP transfers. I'm seeing a lot of |
The current state is much better. Still, from time to time it disconnects and when it does dmesg shows a lot of Also, it failed while I was using qbittorrent and compiling firefox with LTO, so the system was under disk and network I/O, CPU and memory pressure at the same time |
@stkw0 If you are seeing that exact message, with lowercase rx followed by underscore, you are not running the latest code. The problem is fixed now. You should pull and recompile. |
I updated to the latest and now I get the "allocating new" and "freeing excess" messages
|
@stkw0 Do they show up a lot? |
More or less, there are some bursts, then it keeps calm for a while. See attached log of |
@stkw0 Do you see a lot of "allocating new RX skb" if you apply the patch below?
diff --git a/usb.c b/usb.c
index 9dafc75..d1cce9c 100644
--- a/usb.c
+++ b/usb.c
@@ -623,13 +623,7 @@ static void rtw_usb_rx_handler(struct rtw_usb *rtwusb)
rx_desc += next_pkt;
} while (rx_skb->data + urb_len > rx_desc + pkt_desc_sz);
- if (skb_queue_len(&rtwusb->rx_free_queue) >=
- RTW_USB_RX_SKB_NUM - RTW_USB_RXCB_NUM) {
- rtw_err(rtwdev, "freeing excess RX skb\n");
- dev_kfree_skb_any(rx_skb);
- } else {
skb_queue_tail(&rtwusb->rx_free_queue, rx_skb);
- }
}
if (limit == 200) |
Been running for over 7h now. I only saw three "allocating new RX skb" messages a few minutes after starting qbittorrent, but it seems pretty stable. I will report back if there is any news, but it looks very promising :) |
I just started testing this driver yesterday after dusting off my old Edimax EW-7811UTC and connecting it to a sbc w/ a RISC-V cpu. It seems to be fairly stable for the moment - scanning works and returns available networks, connecting to one of my networks w/ WPA3 works, etc. Transferring a 1GB file over the wireless link caused a few messages to appear, though:
So far, they appear harmless - the connection is stable. |
@PinkFreud You should try this too: #205 (comment) |
I copied the same file after recompiling w/ the patch applied to usb.c. No skb messages are appearing so far. |
Well, I received one. :) Looks like these are becoming much more rare now - and no 'freeing excess...' message accompanying it, either. |
I'm testing this patch, but doesn't this change allow the number of allocated skbs to grow indefinitely? On my Raspberry Pi 3B I just had a streak of 17 |
@stkw0 Have you seen any more since then? @plumbeo Yes, that patch removes the limit. 17 certainly seems excessive. I will have to find another solution. |
Not as much, just a couple from time to time. Most messages are a deauth followed by a re-auth with the wifi AP. The log of rtw_8821au messages:
|
@dubhater I've been using a slightly modified version of your latest patch:
I rapidly checked the code and using In any case it tends to happen infrequently and only once per loading of the driver, probably because when it gets to 16 or so it's enough to guarantee there are free RX skbs available, for example in this run:
but then yesterday this happened, apparently at random (I wasn't using the Raspberry Pi at the time):
I suppose a possible strategy would be to allocate 4 skbs, then allocate more without freeing until there are 16 or so, and then start aggressively freeing them after that. That way users who never need to allocate new skbs would not waste memory, users who need a few more skbs wouldn't suffer from fragmentation, and there would be some flexibility to handle any sudden need for more skbs without holding on to memory indefinitely. |
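The strategy described above can be sketched as a small userspace model in C. This is only an illustration under stated assumptions, not the driver's implementation: `struct skb_pool`, `pool_get()`, `pool_put()` and the two constants are hypothetical names, and the numbers 4 and 16 are taken from the comment rather than from the code.

```c
#include <assert.h>
#include <stdbool.h>

/* Numbers from the comment above; all names here are hypothetical. */
#define POOL_INITIAL  4  /* skbs allocated up front */
#define POOL_SOFT_MAX 16 /* at or below this, returned skbs are kept */

struct skb_pool {
	int allocated; /* skbs currently owned by the driver (free + in use) */
	int free;      /* skbs waiting in the free queue */
};

static void pool_init(struct skb_pool *p)
{
	p->allocated = POOL_INITIAL;
	p->free = POOL_INITIAL;
}

/* Take an skb for RX. Returns true if a new one had to be allocated. */
static bool pool_get(struct skb_pool *p)
{
	if (p->free > 0) {
		p->free--;
		return false;
	}
	p->allocated++; /* corresponds to "allocating new RX skb" */
	return true;
}

/* Return an skb after RX processing. Returns true if it was freed. */
static bool pool_put(struct skb_pool *p)
{
	assert(p->free < p->allocated); /* something must be in use */

	if (p->allocated > POOL_SOFT_MAX) {
		p->allocated--; /* corresponds to "freeing excess RX skb" */
		return true;
	}
	p->free++; /* keep it: no churn while the pool is small */
	return false;
}
```

In this model a burst that needs 20 skbs at once grows the pool to 20, and returning them frees only the 4 above the soft maximum, so the pool settles at 16 rather than shrinking back to 4 or growing without bound.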
@plumbeo Yes, that's probably what I will do. What was it like before you applied the patch? |
Thousands of messages, enough that after a while I had to comment the The new patch certainly helps a lot to reduce the churn. |
@plumbeo That's interesting. What kind of computer is it, and what kernel are you using? |
@dubhater it's a Raspberry Pi 3B, with 1 GB of RAM, using a Debian armhf-based os with kernel 5.15.92. 32 bit kernel and userland. |
@stkw0 What about your computer? It's something suitable for compiling Firefox, I guess. :) |
It's getting old but still capable, yes :) It's an i7-6700 CPU with 32 GB of RAM. Currently running 6.12.1-gentoo-x86_64. |
@plumbeo Would it be possible to try kernel 6.9 or newer? Kernel 6.9 introduced a new type of workqueue, which this driver uses for RX processing when it's available. I wonder if it makes a difference. |
Unfortunately I have to use a downstream kernel and only the LTS releases are supported, so I'll have to wait till 6.12 is made available. No idea when that will be though. |
I had one uncommon breakage. To resolve it I had to rmmod && modprobe. The uptime was almost 4 days. It may also matter that I don't power off the computer but suspend it to disk and then resume it.
|
@stkw0 Please open a new issue. I don't think it's related to this one. |
Okay. Will try updating everything first and see if it reproduces |
Fixed in kernel 6.14. |
Sometimes my wifi connection suddenly disconnects. When it happens, dmesg shows the following message repeatedly: "rtw_8821au 1-8:1.0: MAC has not been powered on yet". No matter what I try, it seems to be resolved only by rebooting the computer. This message also appears the first time I boot up the computer, but it connects properly the first time.
It may also be important to note that before using the rtw88 source I used the aircrack-ng/rtl8812au driver. With that driver I also had random disconnections (maybe due to the AP?), but after restarting iwd or NetworkManager it recovered the connection. This does not happen now.
Module: rtw_8821au
Hardware: ID 2357:0120 TP-Link Archer T2U PLUS [RTL8821AU]
Linux: 6.9.5