IPv6 network stops working after a while #61
Comments
I also have the same problem, "IPv6 network stops working after a while". In my case, I'm using OpenWrt as a company broadband router. It runs dhcp6d for the LAN connection in stateless+stateful mode. My ISP provides IPv6 connectivity via DHCPv6 on the WAN connection. I suspect that odhcp6c doesn't process the DHCP lease expiry when it enters this code block: Lines 524 to 569 in 53f07e9
With a quick source code review, I have noticed that For odhcp6c developers: is there a recommended way to mitigate this on the user side? |
Sorry, I may have been wrong in my previous post. I have noticed that there are 4 time values in the DHCPv6 response from the server, obtained with
It seems that odhcp6c uses |
I also posted my issue directly to Teltonika, and according to them it appears that my ISP didn't send out periodic RAs, so my default route, which had an expiry time of 65535 seconds, expired. They also referred to this old workaround. So it seems the workaround was to use the fakeroute option somehow |
Hi, I'm affected by an issue with at least the same effect (after a while like a day or so, my IPv6 access breaks). In my logs, I can see something like |
Hi, |
I investigated that a bit further and found out that now my IPv6 access works for 5-10 seconds, then it breaks for a few seconds and then the cycle repeats. When that happens, the error message above appears in my log. |
In my case, I decided to use a workaround until a proper fix is available. It uses crontab like this:
At minute 45 of every hour, the odhcp6c process will receive This is useful only if the IPv6 connection reliably works until its lease expires. My problem is that the IPv6 route's expiry isn't renewed after the DHCPv6 renew transaction.
root@OpenWrt:~# ip -6 r
default from 2409:250:XXXX:YYYY::/56 via fe80::226:bff:fe49:c2c0 dev wan metric 512
2409:250:XXXX:YYYY::/64 dev br-lan metric 1024
2409:250:XXXX:YYYY::/60 dev br-lan metric 256 expires 2035sec
unreachable 2409:250:XXXX:YYYY::/56 dev lo metric 2147483647 The route |
For me only SIGUSR2 worked that way. SIGUSR1 did nothing for some reason. But still, why doesn't it renew on T1? |
Same here, T1 and T2 are 150 & 240, preferred lifetime for prefix is 300. Further elements in openwrt/openwrt#13086 (comment) |
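For anyone following the timer discussion, here is a minimal sketch of how the four values relate, using the numbers just reported (T1 150, T2 240, preferred lifetime 300). The valid lifetime of 600 is an assumption made up for illustration, and the `PrefixLease` class and its helpers are hypothetical names, not anything from odhcp6c. The event descriptions follow the RENEW/REBIND behaviour described in RFC 8415.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixLease:
    t1: int         # seconds until the client should send RENEW
    t2: int         # seconds until the client should fall back to REBIND
    preferred: int  # preferred lifetime of the delegated prefix
    valid: int      # valid lifetime of the delegated prefix


def timeline(lease):
    """Return (seconds_after_reply, event) pairs in chronological order."""
    events = [
        (lease.t1, "send RENEW to the original server"),
        (lease.t2, "send REBIND to any server"),
        (lease.preferred, "prefix becomes deprecated"),
        (lease.valid, "prefix becomes invalid"),
    ]
    return sorted(events)


def has_ordered_timers(lease):
    """True when T1 <= T2 <= preferred <= valid and T1 is non-zero.

    T1/T2 of 0 leave the timing to the client; this sketch simply
    reports such leases as not ordered.
    """
    return 0 < lease.t1 <= lease.t2 <= lease.preferred <= lease.valid


# valid=600 is an assumed value; the other three come from the comment above.
lease = PrefixLease(t1=150, t2=240, preferred=300, valid=600)
for seconds, event in timeline(lease):
    print(f"+{seconds:>4}s  {event}")
```

With these numbers T1 and T2 are 0.5 and 0.8 times the preferred lifetime, which matches the values RFC 8415 recommends to servers, so a working client should renew well before the prefix is deprecated.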
I've come across the same issue on NTT FLET'S CROSS.
Every 8 hours, I get the following log: |
Can't tell for sure, using 23.05.0-rc3 for 6 hours now with apparently no issues. |
What I understood in my case is that there is another DHCPv6-enabled client (or proxy) on the same LAN. This is not a DHCPv6 client-side problem of the OpenWrt device. I suspect that another router-like device, such as a business phone unit, may dispatch DHCPv6 Check if you are interested in: サクサのひかり電話オフィス収容ユニットとIPv6の共存模索 (Exploring how Saxa's Hikari Denwa office gateway unit can coexist with IPv6) | mixiユーザー(id:2416887)の日記 (diary of mixi user id:2416887)
As a networking fundamental, the router handles Layer 2 data. ↓ OpenWrt acts as the DHCPv6 client ↓ Then, after X hours, the SAXA unit acts as a DHCPv6 client and absorbs all incoming packets from the WAN. As a workaround for this issue, one idea is to implement a ping client and a watchdog timer that restarts the DHCPv6 client. https://github.com/HiraokaHyperTools/openwrt-watchngn |
No, that's not the case in my home network. There is no other DHCPv6 client besides the OpenWRT router, and I also do not have a contract like Hikari Denwa. Even without a Hikari Denwa contract, NTT still allocates me a /56 IPv6-PD. My home network looks like this: NTT 10G-ONU->OpenWRT router->Switch Hub->AP/PC/NAS... I've noticed that this issue only occurs with OpenWRT devices that connect to FLET'S CROSS, while FLET'S NEXT doesn't seem to have a similar problem, perhaps because its Valid Lifetime lasts as long as a month. |
Just subscribed to NTT 10G Cross with Plala and I am getting similar behavior with OpenWrt... Has anyone found a good fix or workaround since last time? It takes less than a minute to recover by itself, but everything is interrupted in the meantime, which is quite inconvenient, especially when you are in the middle of a meeting... :D I can see these logs every time I get an outage: That seems to be the same problem, doesn't it? |
For me sending |
Thanks for your reply, PF4Public. I will set that up in crontab then and see if it gets better. |
Running "kill -SIGUSR2 |
Sending the SIGUSR2 signal to odhcp6c makes it send a RELEASE and restart the state machine by sending SOLICIT messages, while SIGUSR1 makes it send a RENEW message. |
Thanks a lot for the script. Just to be sure I am doing things right: I would just drop the script file in "/etc/hotplug.d/iface/", maybe set the execute bit, and that's it? It will be executed (at next boot? on interface status change?) and will add "kill -SIGUSR1 $(pgrep odhcp6c)" to crontab to be run every hour? Is that it? |
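To keep the two signals straight, here is a small sketch of the mapping described above. The `signal_odhcp6c` helper and the `ODHCP6C_SIGNALS` table are hypothetical names made up for illustration; the PIDs are assumed to come from something like `pgrep odhcp6c`, and the signal meanings are only as reported in this thread.

```python
import os
import signal

# Meaning of each signal as described in this thread.
ODHCP6C_SIGNALS = {
    "renew": signal.SIGUSR1,    # send a RENEW message
    "restart": signal.SIGUSR2,  # send RELEASE, then start over with SOLICIT
}


def signal_odhcp6c(pids, action, kill=os.kill):
    """Send the signal for `action` to every PID and return the signal used.

    `kill` is injectable so the helper can be exercised without a
    running odhcp6c process.
    """
    sig = ODHCP6C_SIGNALS[action]
    for pid in pids:
        kill(pid, sig)
    return sig
```

Calling `signal_odhcp6c(pids, "renew")` would be the gentler option, since "restart" gives the lease back before asking for a new one.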
@JesusArmy, or you can just add the following line in the Scheduled Tasks from the System menu.
You will see this log every hour.
|
Thank you both, that seems to be working great so far. I have not noticed any disconnection for the last 8h :) |
I found that OpenWrt package repo has isc-dhcp-client. Perhaps we can use it to replace odhcp6c. However its size is close to 1MB, which is much larger than odhcp6c. |
Thanks for sharing this. I'm using a mini-pc with a lot of storage and memory, so the size is not a problem. |
Looking forward to hearing some good news :) |
I have been running that since last week, with quite some success, but I am still getting issues occasionally. It used to be many outages a day, and now I seem to get one or two at most. If I want to investigate what's happening, should I change the log level to something more verbose, or do I need to run a 24/7 tcpdump to capture what's happening at that time? Today the only thing I see in the logs when that happens is a bunch of "no route" logs like this:
|
Just wondering, are you also getting the quite low MTU of 1280 on your map-e interface ? |
FYI, it's been 48h since I increased the TX queue length from 1,000 to 10,000 on both physical eth interfaces (from LuCI) and the map-e interface (from the terminal), and so far so good. Zero disconnections in 2 days. :) |
Glad to hear it. |
Yep, it was running every hour, but I was still getting some additional disconnections. And I even got up to 3 within the same hour at some point before changing the TX queue length. There is still the possibility that just restarting the network stack would have been enough to fix that issue, and that the TX queue modification is actually not related at all to getting a stable link... 😇 |
Not sure but I've never come across "No default route present, overriding ra_lifetime," and my network connection is solid as a rock.
|
Hi missing 223. |
I have the same issue running the latest version (OpenWrt 23.05.4 r24012-d8dd03c46f), but my refresh values are much lower (T1:600 T2:960 (IA_PD-prefix 2a02:678:640:e800::/56 pltime:1200 vltime:3600)). But what I noticed is that I've got 2 odhcp6c processes running:
Now I configure wan6 to request a /56 network (the default I get from my ISP). As you can see above, only process 5682 is using this value (-P56), while the other (29379) is using the normal default (-P0). I can also see 2 different DHCPv6 solicit & advertise pairs (differences marked bold): I also get only one DHCPv6 request & reply: After this I get another DHCPv6 request & reply as above. To me it looks like I lose my IPv6 connection after the second request & reply. |
@cre8ivejp @JesusArmy |
I'm using odhcp6c in an NTT environment (FLET'S Hikari Cross, フレッツ光クロス), but due to the above problem, I was running renew (SIGUSR1) via crontab. As missing233 said, it's possible that NTT has fixed something. |
I have also experienced the symptom that IPv6 stops working after a while, and I have done some analysis. Using tcpdump to capture a packet trace and Wireshark to look at it, I can see that the problem begins when odhcp6c sends the first RENEW message, 12 hours after the prefix delegation lease began which is the T1 time value from the original REPLY message from my ISP. The RENEW message looks good to me but the REPLY message from my ISP contains an error status code shown in Wireshark as "NoPrefixAvail". The reply contains an option "Identity Association for Prefix Delegation"(25) which contains a sub option "IA Prefix"(26) which has this error status code set on it. The problem appears to be that odhcp6c does not even look for error codes there, it thinks it got a good response and is happy. However the ISP is no longer routing traffic for me so my IPv6 is broken. The response did actually update the "valid lifetime" and "preferred lifetime" to zero, and odhcp6c does update its internal state to record this, so next time the renewal fails differently and odhcp6c realises that it needs to get a new lease from scratch. So 12 hours after IPv6 broke it actually starts working again. Similarly if I send SIGUSR1 to force renewal while it's working then it breaks, if I do it while it is broken then it starts working again. I have tested a patch to fix this problem, and will upload it as a pull request. (Lucky I have a MacBook which is ARM architecture, so using a Debian docker container I can easily compile an ARM binary that works on my router, a Unifi UCG-Max that is using Debian 11 under the hood). |
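To illustrate where that status code sits, here is a minimal Python sketch (not the odhcp6c C code and not the actual patch) that walks the nested options of a REPLY and reports a failed renewal when an IA Prefix carries a non-zero status. The option codes and field layouts follow RFC 8415; the function names are hypothetical, and the input is assumed to be the options portion of the message, after the 4-byte message type and transaction ID header.

```python
import struct

# Option and status codes from RFC 8415.
OPT_STATUS_CODE = 13
OPT_IA_PD = 25
OPT_IAPREFIX = 26
STATUS_NO_PREFIX_AVAIL = 6


def iter_options(buf):
    """Yield (code, body) for each option: 2-byte code, 2-byte length, body."""
    off = 0
    while off + 4 <= len(buf):
        code, length = struct.unpack_from("!HH", buf, off)
        off += 4
        yield code, buf[off:off + length]
        off += length


def ia_prefix_statuses(options):
    """Yield every status code found inside IA Prefix sub-options of IA_PD."""
    for code, body in iter_options(options):
        if code != OPT_IA_PD:
            continue
        # IA_PD body: IAID (4), T1 (4), T2 (4), then sub-options.
        for sub, sub_body in iter_options(body[12:]):
            if sub != OPT_IAPREFIX:
                continue
            # IA Prefix body: preferred (4), valid (4), prefix length (1),
            # prefix (16), then sub-options.
            for inner, inner_body in iter_options(sub_body[25:]):
                if inner == OPT_STATUS_CODE and len(inner_body) >= 2:
                    yield struct.unpack_from("!H", inner_body)[0]


def renew_failed(options):
    """True if any IA Prefix in the reply carries a non-success status."""
    return any(status != 0 for status in ia_prefix_statuses(options))


def opt(code, body):
    """Build one option; only used to construct the example below."""
    return struct.pack("!HH", code, len(body)) + body


# A reply shaped like the one described: zero lifetimes plus NoPrefixAvail.
status = opt(OPT_STATUS_CODE, struct.pack("!H", STATUS_NO_PREFIX_AVAIL))
prefix = opt(OPT_IAPREFIX, struct.pack("!IIB", 0, 0, 56) + bytes(16) + status)
reply_options = opt(OPT_IA_PD, struct.pack("!III", 1, 0, 0) + prefix)
print(renew_failed(reply_options))
```

The point of the sketch is only that a status code nested two levels down is easy to miss if the client checks status codes at the top level or directly under IA_PD.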
Check for error status code in the IA Prefix option in replies to RENEW messages. This fixes a problem where odhcp6c thinks that renewal succeeded, when actually the upstream router is no longer routing this prefix for us. See openwrt#61 (comment)
@achims311 do you have any updates on your situation? It looks very similar to mine, except that the error you get in the reply to your renew is slightly different, which is actually handled by the following code to trigger sending a new request: Lines 1588 to 1593 in ffbb2d5
Are you saying that this new request fails to work? Can you share the tcpdump for this part too? And is it only a momentary outage or a persisting outage? For reference the tcpdump output for my own case of renewal failure is this:
22:07:58.607583 IP6 (flowlabel 0x4c641, hlim 1, next-header UDP (17) payload length: 140) fe80::2a70:4eff:fe6e:50b7.546 > ff02::1:2.547: [bad udp cksum 0xc6b9 -> 0x54c4!] dhcp6 renew (xid=4b4d32 (elapsed-time 0) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list SNTP-servers NTP-server AFTR-Name opt_67) (client-ID hwaddr type 1 28704e6e50b7) (server-ID vid 0000058361633a37) (Client-FQDN) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix 2407:5400:3102:be00::/56 pltime:0 vltime:0)))
22:07:58.702855 IP6 (class 0xc0, hlim 64, next-header UDP (17) payload length: 137) fe80::ae78:d1ff:fe32:985b.547 > fe80::2a70:4eff:fe6e:50b7.546: [udp sum ok] dhcp6 reply (xid=4b4d32 (client-ID hwaddr type 1 28704e6e50b7) (server-ID vid 0000058361633a37) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix 2407:5400:3102:be00::/56 pltime:0 vltime:0 (status-code NoPrefixAvail)))) |
I have a Teltonika RUTX11 mobile internet router. It has an issue that makes it lose IPv6 connectivity after a while.
If I kill the odhcp6c process, a new process is spawned and IPv6 connectivity is restored for a while, but within 1-3 days it goes down again.
I have logged a ticket with Teltonika, but they haven't been able to pinpoint the issue. I have tried starting the process with -v, but it doesn't seem to give any more details in the logs.
The process is started with these arguments:
odhcp6c -v -s /lib/netifd/dhcpv6.script -P0 -t120 qmimux0
Is there anything I can do to enable more logging to pinpoint the issue?
The /lib/netifd/dhcpv6.script file is quite massive: