IPv6 network stops working after a while #61

Open
hartmark opened this issue Oct 17, 2020 · 36 comments

Comments

@hartmark

I have a Teltonika RUTX11 mobile internet router. It has an issue that makes the router lose IPv6 connectivity after a while.

root@Teltonika-RUTX11:~# ping6 ftp.sunet.se
PING ftp.sunet.se (2001:6b0:19::165): 56 data bytes
ping6: sendto: Permission denied

If I kill the odhcp6c process, a new process is spawned and IPv6 connectivity is restored for a while, but within about 1-3 days it goes down again.

I have opened a ticket with Teltonika, but they haven't been able to pinpoint where the issue is. I have tried starting the process with -v, but it doesn't seem to give any more details in the logs.

The process is started with these arguments:
odhcp6c -v -s /lib/netifd/dhcpv6.script -P0 -t120 qmimux0

Is there anything I can do to enable more logging to pinpoint the issue?

The /lib/netifd/dhcpv6.script file is quite massive:

#!/bin/sh
[ -z "$2" ] && echo "Error: should be run by odhcpc6c" && exit 1
. /lib/functions.sh
. /lib/netifd/netifd-proto.sh

setup_interface () {
	local device="$1"
	local prefsig=""
	local addrsig=""

	# Apply IPv6 / ND configuration
	HOPLIMIT=$(cat /proc/sys/net/ipv6/conf/$device/hop_limit)
	[ -n "$RA_HOPLIMIT" -a -n "$HOPLIMIT" ] && [ "$RA_HOPLIMIT" -gt "$HOPLIMIT" ] && echo "$RA_HOPLIMIT" > /proc/sys/net/ipv6/conf/$device/hop_limit
	[ -n "$RA_MTU" ] && [ "$RA_MTU" -ge 1280 ] && echo "$RA_MTU" > /proc/sys/net/ipv6/conf/$device/mtu 2>/dev/null
	[ -n "$RA_REACHABLE" ] && [ "$RA_REACHABLE" -gt 0 ] && echo "$RA_REACHABLE" > /proc/sys/net/ipv6/neigh/$device/base_reachable_time_ms
	[ -n "$RA_RETRANSMIT" ] && [ "$RA_RETRANSMIT" -gt 0 ] && echo "$RA_RETRANSMIT" > /proc/sys/net/ipv6/neigh/$device/retrans_time_ms

	proto_init_update "*" 1

	# Merge RA-DNS
	for radns in $RA_DNS; do
		local duplicate=0
		for dns in $RDNSS; do
			[ "$radns" = "$dns" ] && duplicate=1
		done
		[ "$duplicate" = 0 ] && RDNSS="$RDNSS $radns"
	done

	for dns in $RDNSS; do
		proto_add_dns_server "$dns"
	done

	for radomain in $RA_DOMAINS; do
		local duplicate=0
		for domain in $DOMAINS; do
			[ "$radomain" = "$domain" ] && duplicate=1
		done
		[ "$duplicate" = 0 ] && DOMAINS="$DOMAINS $radomain"
	done

	for domain in $DOMAINS; do
		proto_add_dns_search "$domain"
	done

	for prefix in $PREFIXES; do
		proto_add_ipv6_prefix "$prefix"
		prefsig="$prefsig ${prefix%%,*}"
		local entry="${prefix#*/}"
		entry="${entry#*,}"
		entry="${entry#*,}"
		local valid="${entry%%,*}"

		if [ -z "$RA_ADDRESSES" -a -z "$RA_ROUTES" -a \
				-z "$RA_DNS" -a "$FAKE_ROUTES" = 1 ]; then
			RA_ROUTES="::/0,$SERVER,$valid,4096"
		fi
	done

	for prefix in $USERPREFIX; do
		proto_add_ipv6_prefix "$prefix"
	done

	# Merge addresses
	for entry in $RA_ADDRESSES; do
		local duplicate=0
		local addr="${entry%%/*}"
		for dentry in $ADDRESSES; do
			local daddr="${dentry%%/*}"
			[ "$addr" = "$daddr" ] && duplicate=1
		done
		[ "$duplicate" = "0" ] && ADDRESSES="$ADDRESSES $entry"
	done

	for entry in $ADDRESSES; do
		local addr="${entry%%/*}"
		entry="${entry#*/}"
		local mask="${entry%%,*}"
		entry="${entry#*,}"
		local preferred="${entry%%,*}"
		entry="${entry#*,}"
		local valid="${entry%%,*}"

		proto_add_ipv6_address "$addr" "$mask" "$preferred" "$valid" 1
		addrsig="$addrsig $addr/$mask"

		if [ -z "$RA_ADDRESSES" -a -z "$RA_ROUTES" -a \
				-z "$RA_DNS" -a "$FAKE_ROUTES" = 1 ]; then
			RA_ROUTES="::/0,$SERVER,$valid,4096"
		fi

		# RFC 7278
		if [ "$mask" -eq 64 -a -z "$PREFIXES" -a -n "$EXTENDPREFIX" ]; then
			proto_add_ipv6_prefix "$addr/$mask,$preferred,$valid"

			local raroutes=""
			for route in $RA_ROUTES; do
				local prefix="${route%%/*}"
				local entry="${route#*/}"
				local pmask="${entry%%,*}"
				entry="${entry#*,}"
				local gw="${entry%%,*}"

				[ -z "$gw" -a "$mask" = "$pmask" ] && {
					case "$addr" in
						"${prefix%*::}"*) continue;;
					esac
				}
				raroutes="$raroutes $route"
			done
			RA_ROUTES="$raroutes"
		fi
	done

	for entry in $RA_ROUTES; do
		local duplicate=$NOSOURCEFILTER
		local addr="${entry%%/*}"
		entry="${entry#*/}"
		local mask="${entry%%,*}"
		entry="${entry#*,}"
		local gw="${entry%%,*}"
		entry="${entry#*,}"
		local valid="${entry%%,*}"
		entry="${entry#*,}"
		local metric="${entry%%,*}"

		for xentry in $RA_ROUTES; do
			local xprefix="${xentry%%,*}"
			xentry="${xentry#*,}"
			local xgw="${xentry%%,*}"

			[ -n "$gw" -a -z "$xgw" -a "$addr/$mask" = "$xprefix" ] && duplicate=1
		done

		if [ -z "$gw" -o "$duplicate" = 1 ]; then
			proto_add_ipv6_route "$addr" "$mask" "$gw" "$metric" "$valid"
		else
			for prefix in $PREFIXES $ADDRESSES; do
				local paddr="${prefix%%,*}"
				proto_add_ipv6_route "$addr" "$mask" "$gw" "$metric" "$valid" "$paddr"
			done
		fi
	done

	proto_add_data
	[ -n "$CER" ] && json_add_string cer "$CER"
	[ -n "$PASSTHRU" ] && json_add_string passthru "$PASSTHRU"
	[ -n "$ZONE" ] && json_add_string zone "$ZONE"
	proto_close_data

	proto_send_update "$INTERFACE"

	MAPTYPE=""
	MAPRULE=""

	if [ -n "$MAPE" -a -f /lib/netifd/proto/map.sh ]; then
		MAPTYPE="map-e"
		MAPRULE="$MAPE"
	elif [ -n "$MAPT" -a -f /lib/netifd/proto/map.sh -a -f /proc/net/nat46/control ]; then
		MAPTYPE="map-t"
		MAPRULE="$MAPT"
	elif [ -n "$LW4O6" -a -f /lib/netifd/proto/map.sh ]; then
		MAPTYPE="lw4o6"
		MAPRULE="$LW4O6"
	fi

	[ -n "$ZONE" ] || ZONE=$(fw3 -q network $INTERFACE 2>/dev/null)

	if [ "$IFACE_MAP" != 0 -a -n "$MAPTYPE" -a -n "$MAPRULE" ]; then
		[ -z "$IFACE_MAP" -o "$IFACE_MAP" = 1 ] && IFACE_MAP=${INTERFACE}_4
		json_init
		json_add_string name "$IFACE_MAP"
		json_add_string ifname "@$INTERFACE"
		json_add_string proto map
		json_add_string type "$MAPTYPE"
		json_add_string _prefsig "$prefsig"
		[ "$MAPTYPE" = lw4o6 ] && json_add_string _addrsig "$addrsig"
		json_add_string rule "$MAPRULE"
		json_add_string tunlink "$INTERFACE"
		[ -n "$ZONE_MAP" ] || ZONE_MAP=$ZONE
		[ -n "$ZONE_MAP" ] && json_add_string zone "$ZONE_MAP"
		[ -n "$ENCAPLIMIT_MAP" ] && json_add_string encaplimit "$ENCAPLIMIT_MAP"
		[ -n "$IFACE_MAP_DELEGATE" ] && json_add_boolean delegate "$IFACE_MAP_DELEGATE"
		json_close_object
		ubus call network add_dynamic "$(json_dump)"
	elif [ -n "$AFTR" -a "$IFACE_DSLITE" != 0 -a -f /lib/netifd/proto/dslite.sh ]; then
		[ -z "$IFACE_DSLITE" -o "$IFACE_DSLITE" = 1 ] && IFACE_DSLITE=${INTERFACE}_4
		json_init
		json_add_string name "$IFACE_DSLITE"
		json_add_string ifname "@$INTERFACE"
		json_add_string proto "dslite"
		json_add_string peeraddr "$AFTR"
		json_add_string tunlink "$INTERFACE"
		[ -n "$ZONE_DSLITE" ] || ZONE_DSLITE=$ZONE
		[ -n "$ZONE_DSLITE" ] && json_add_string zone "$ZONE_DSLITE"
		[ -n "$ENCAPLIMIT_DSLITE" ] && json_add_string encaplimit "$ENCAPLIMIT_DSLITE"
		[ -n "$IFACE_DSLITE_DELEGATE" ] && json_add_boolean delegate "$IFACE_DSLITE_DELEGATE"
		json_close_object
		ubus call network add_dynamic "$(json_dump)"
	elif [ "$IFACE_464XLAT" != 0 -a -f /lib/netifd/proto/464xlat.sh ]; then
		[ -z "$IFACE_464XLAT" -o "$IFACE_464XLAT" = 1 ] && IFACE_464XLAT=${INTERFACE}_4
		json_init
		json_add_string name "$IFACE_464XLAT"
		json_add_string ifname "@$INTERFACE"
		json_add_string proto "464xlat"
		json_add_string tunlink "$INTERFACE"
		json_add_string _addrsig "$addrsig"
		[ -n "$ZONE_464XLAT" ] || ZONE_464XLAT=$ZONE
		[ -n "$ZONE_464XLAT" ] && json_add_string zone "$ZONE_464XLAT"
		[ -n "$IFACE_464XLAT_DELEGATE" ] && json_add_boolean delegate "$IFACE_464XLAT_DELEGATE"
		json_close_object
		ubus call network add_dynamic "$(json_dump)"
	fi

	# TODO: $SNTP_IP $SIP_IP $SNTP_FQDN $SIP_DOMAIN
}

teardown_interface() {
	proto_init_update "*" 0
	proto_send_update "$INTERFACE"
}

case "$2" in
	bound)
		teardown_interface "$1"
		setup_interface "$1"
	;;
	informed|updated|rebound)
		setup_interface "$1"
	;;
	ra-updated)
		[ -n "$ADDRESSES$RA_ADDRESSES$PREFIXES$USERPREFIX" ] && setup_interface "$1"
	;;
	started|stopped|unbound)
		teardown_interface "$1"
	;;
esac

# user rules
[ -f /etc/odhcp6c.user ] && . /etc/odhcp6c.user "$@"

exit 0
@kenjiuno

I also have the same problem: the IPv6 network stops working after a while.

In my case, I'm using OpenWrt as a company's broadband router. It runs dhcp6d on the LAN side in stateless+stateful mode.

My ISP provides IPv6 connectivity via DHCPv6 on the WAN connection.

I suspect that odhcp6c does not handle DHCPv6 lease expiry once it enters this code block:

odhcp6c/src/odhcp6c.c

Lines 524 to 569 in 53f07e9

while (!signal_usr2 && !signal_term) {
	// Renew Cycle
	// Wait for T1 to expire or until we get a reconfigure
	res = dhcpv6_poll_reconfigure();
	odhcp6c_signal_process();
	if (res > 0) {
		script_call("updated", 0, false);
		continue;
	}

	// Handle signal, if necessary
	if (signal_usr1)
		signal_usr1 = false; // Acknowledged

	if (signal_usr2 || signal_term)
		break; // Other signal type

	// Send renew as T1 expired
	res = dhcpv6_request(DHCPV6_MSG_RENEW);
	odhcp6c_signal_process();
	if (res > 0) { // Renew was succesfull
		// Publish updates
		script_call("updated", 0, false);
		continue; // Renew was successful
	}

	odhcp6c_clear_state(STATE_SERVER_ID); // Remove binding
	odhcp6c_clear_state(STATE_SERVER_ADDR);

	size_t ia_pd_len, ia_na_len;
	odhcp6c_get_state(STATE_IA_PD, &ia_pd_len);
	odhcp6c_get_state(STATE_IA_NA, &ia_na_len);

	if (ia_pd_len == 0 && ia_na_len == 0)
		break;

	// If we have IAs, try rebind otherwise restart
	res = dhcpv6_request(DHCPV6_MSG_REBIND);
	odhcp6c_signal_process();
	if (res > 0)
		script_call("rebound", 0, true);
	else
		break;
}

After a quick source code review, it looks like kill -SIGUSR1 <PID> or similar (SIGUSR2 too?) is the only way to update the DHCPv6 lease.

For the odhcp6c developers: is there a recommended way to mitigate this on the user side?

@kenjiuno

Sorry, I may have been wrong in my previous post.

I noticed that there are 4 time values in the DHCPv6 response from the server, captured with tcpdump -n -vv -i eth0.2 udp portrange 546-547:

00:15:21.535949 IP6 (class 0xb8, hlim 255, next-header UDP (17) payload length: 206) fe80::226:bff:fe49:c2c0.547 > fe80::a451:abff:fe7e:5d18.546: [udp sum ok] dhcp6 reply (xid=17778b (client-ID hwaddr type 1 a651ab7e5d18) (server-ID hwaddr type 1 000d5ec4c34c) (SIP-servers-address XXXX:XXXX:XXXX:XXXX::X) (DNS-server XXXX:XXXX:XXXX::X XXXX:XXXX:XXXX:X::X) (DNS-search-list flets-west.jp. iptvf.jp.) (IA_PD IAID:1 T1:7200 T2:10800 (IA_PD-prefix XXXX:XXX:XXXX:XXXX::/56 pltime:12600 vltime:14400)) (SNTP-servers XXXX:XXXX:XXX::X XXXX:XXXX:XXX::X))

Item                 Period
T1 (RENEW)           7,200 secs → 2 hrs
T2 (REBIND)          10,800 secs → 3 hrs
Preferred lifetime   12,600 secs → 3.5 hrs
Valid lifetime       14,400 secs → 4 hrs

It seems that odhcp6c uses the valid lifetime instead of T1/T2.
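
To compare the four timers on other setups, here is a minimal sketch that pulls them out of one line of tcpdump output and converts them to hours. The helper names dhcp6_field and secs_to_hours are made up for illustration, and the sample line is a shortened copy of the IA_PD part of the capture above.

```shell
#!/bin/sh
# Hypothetical helper: print the number that follows "<name>:" in the
# tcpdump line given as $2 (name is e.g. T1, T2, pltime or vltime).
dhcp6_field() {
	printf '%s\n' "$2" | sed -n "s/.*[ (]$1:\([0-9][0-9]*\).*/\1/p"
}

# Hypothetical helper: convert seconds to hours with one decimal place.
secs_to_hours() {
	awk -v s="$1" 'BEGIN { printf "%.1f\n", s / 3600 }'
}

# Shortened IA_PD part of the DHCPv6 reply quoted in this comment.
line='(IA_PD IAID:1 T1:7200 T2:10800 (IA_PD-prefix XXXX:XXX:XXXX:XXXX::/56 pltime:12600 vltime:14400))'

for f in T1 T2 pltime vltime; do
	v=$(dhcp6_field "$f" "$line")
	echo "$f=${v}s ($(secs_to_hours "$v") h)"
done
```

With the values above this prints 2.0, 3.0, 3.5 and 4.0 hours, matching the table.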

@hartmark
Author

I also reported my issue directly to Teltonika. According to them, my ISP didn't send out periodic RAs, so my default route, which had an expiry time of 65535 seconds, expired.

They also referred to this old workaround:
openwrt/openwrt@8691d75

So it seems the workaround was to use the fakeroute option somehow.
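
If the explanation is an expired default route, a quick check is whether the IPv6 routing table still contains one. The sketch below is only an illustration: has_default_route is a made-up name and the sample route text is invented (fe80::1 and the interface names are placeholders). On the router you would pipe the output of ip -6 route show into it instead.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the routing table text on stdin
# contains at least one line starting with "default ".
has_default_route() {
	grep -q '^default '
}

# Invented sample tables, one with and one without a default route.
with_default='default via fe80::1 dev wan  metric 512  expires 65000sec
2001:db8:1::/64 dev br-lan  metric 1024'
without_default='2001:db8:1::/64 dev br-lan  metric 1024'

for table in "$with_default" "$without_default"; do
	if printf '%s\n' "$table" | has_default_route; then
		echo "default route present"
	else
		echo "default route missing"
	fi
done
```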

@eriktews

Hi, I'm affected by an issue with at least the same effect (after a while like a day or so, my IPv6 access breaks). In my logs, I can see something like Server returned IA_PD status 'No Binding (Who are you? Do I know you?)' Do you think this is the same bug as the one described here?

@hartmark
Author

Hi,
I'm not so knowledgeable in this area but it seems to be the same issue.

@eriktews

> Hi, I'm affected by an issue with at least the same effect (after a while like a day or so, my IPv6 access breaks). In my logs, I can see something like Server returned IA_PD status 'No Binding (Who are you? Do I know you?)' Do you think this is the same bug as the one described here?

I investigated that a bit further and found out that now my IPv6 access works for 5-10 seconds, then it breaks for a few seconds and then the cycle repeats. When that happens, the error message above appears in my log.

@kenjiuno

kenjiuno commented Jan 3, 2022

In my case, I decided to use a workaround until a proper fix is available.

It is a crontab entry like:

45 * * * * kill -SIGUSR2 `pidof odhcp6c`

At minute 45 of every hour, the odhcp6c process receives SIGUSR2 and then performs the IPv6 release/renew transactions.

This is only useful if the IPv6 connection reliably works until its lease expires.

My problem is that the IPv6 route's expiry isn't refreshed after the DHCPv6 renew transaction.

root@OpenWrt:~# ip -6 r
default from 2409:250:XXXX:YYYY::/56 via fe80::226:bff:fe49:c2c0 dev wan  metric 512
2409:250:XXXX:YYYY::/64 dev br-lan  metric 1024
2409:250:XXXX:YYYY::/60 dev br-lan  metric 256  expires 2035sec
unreachable 2409:250:XXXX:YYYY::/56 dev lo  metric 2147483647

The route 2409:250:XXXX:YYYY::/60 dev br-lan metric 256 will expire in 2035 sec without being renewed.
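
To spot routes like this before they run out, here is a rough sketch that lists every route whose remaining lifetime is below a limit. The function name expiring_routes and the 3600 second limit are arbitrary choices for illustration. The sample is the table above, and on a live system the input would come from ip -6 r.

```shell
#!/bin/sh
# Hypothetical helper: read `ip -6 route` text on stdin and print
# "<seconds left> <prefix> <device>" for each route whose "expires"
# value is below the limit given as $1 (in seconds).
expiring_routes() {
	awk -v limit="$1" '
	{
		dev = ""; ttl = ""
		for (i = 1; i <= NF; i++) {
			if ($i == "dev") dev = $(i + 1)
			if ($i == "expires") { ttl = $(i + 1); sub(/sec$/, "", ttl) }
		}
		if (ttl != "" && ttl + 0 < limit + 0) print ttl, $1, dev
	}'
}

routes='default from 2409:250:XXXX:YYYY::/56 via fe80::226:bff:fe49:c2c0 dev wan  metric 512
2409:250:XXXX:YYYY::/64 dev br-lan  metric 1024
2409:250:XXXX:YYYY::/60 dev br-lan  metric 256  expires 2035sec
unreachable 2409:250:XXXX:YYYY::/56 dev lo  metric 2147483647'

printf '%s\n' "$routes" | expiring_routes 3600
```

Routes without an "expires" field are skipped, so only the /60 on br-lan is reported here.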

@PF4Public

PF4Public commented Jun 14, 2022

> With quick source code review, I have noticed that kill -SIGUSR1 <PID> or such (SIGUSR2 too?) will be only way to update dhcpv6 lease.

For me only SIGUSR2 worked that way. SIGUSR1 did nothing for some reason.

But still, why doesn't it renew on T1?

@emss-github

emss-github commented Jul 9, 2023

> Hi, I'm affected by an issue with at least the same effect (after a while like a day or so, my IPv6 access breaks). In my logs, I can see something like Server returned IA_PD status 'No Binding (Who are you? Do I know you?)' Do you think this is the same bug as the one described here?
>
> I investigated that a bit further and found out that now my IPv6 access works for 5-10 seconds, then it breaks for a few seconds and then the cycle repeats. When that happens, the error message above appears in my log.

Same here: T1 and T2 are 150 and 240, and the preferred lifetime for the prefix is 300.
What further information is required to solve this issue, please?

Further details in openwrt/openwrt#13086 (comment)

@missing233

missing233 commented Sep 17, 2023

I've come across the same issue on NTT FLET'S CROSS.
The lifetimes are:

T1: 7200
T2: 10800
Preferred Lifetime: 12600
Valid Lifetime: 14400

Every 8 hours, I get the following log:
'daemon.notice netifd: Interface 'wan6' has lost the connection'.
Are you running into the same thing?

@emss-github

Can't tell for sure, using 23.05.0-rc3 for 6 hours now with apparently no issues.

@kenjiuno

kenjiuno commented Oct 3, 2023

> Every 8 hours, I get the following log:
> 'daemon.notice netifd: Interface 'wan6' has lost the connection'.
> Are you running into the same thing?

What I found in my case is that there is another DHCPv6-enabled client (or proxy) on the same LAN.

This is not a problem in the DHCPv6 client on the OpenWrt device.

I suspect that another router-like device, such as a business phone unit, may be sending DHCPv6 Solicit packets.

See this post if you are interested:

Exploring coexistence of the SAXA Hikari Denwa Office accommodation unit with IPv6 | diary of mixi user (id:2416887)

While looking into the SAXA accommodation unit, I found that
even though it is IPv4-only, for some reason it goes and acquires IPv6.
And the IPv6 it acquires is never used anywhere; apparently all of its communication is done over IPv4.
When I told a SAXA engineer about this,
he said, "Ah, that may be the reason it can't coexist with IPv6."

As a fundamental networking idea, the router handles Layer 2 data.

↓ OpenWrt acts as the DHCPv6 client

image

↓ Then, after X hours, the SAXA unit acts as a DHCPv6 client and absorbs all incoming packets from the WAN.

image

As a workaround for this issue, one idea is to implement a ping client and a watchdog timer that restarts the DHCPv6 client.

https://github.com/HiraokaHyperTools/openwrt-watchngn
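
The ping plus watchdog idea can be sketched in a few lines of shell. This is not how openwrt-watchngn is implemented; it is only an illustration, and the function names, the 60 second interval and the choice of SIGUSR2 (taken from the cron workaround earlier in this thread) are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: run the probe command given after the first
# argument and print the new consecutive-failure count
# (0 after a success, previous count + 1 after a failure).
next_failure_count() {
	count=$1
	shift
	if "$@" >/dev/null 2>&1; then
		echo 0
	else
		echo $((count + 1))
	fi
}

# Hypothetical helper: succeeds once the failure count ($1) has
# reached the threshold ($2).
should_restart() {
	[ "$1" -ge "$2" ]
}

# Not called here. Probes $1 once a minute and signals odhcp6c after
# $2 consecutive failed probes.
watchdog_loop() {
	target=$1
	threshold=$2
	fails=0
	while sleep 60; do
		fails=$(next_failure_count "$fails" ping6 -c 1 "$target")
		if should_restart "$fails" "$threshold"; then
			kill -SIGUSR2 $(pidof odhcp6c)
			fails=0
		fi
	done
}
```

A call such as watchdog_loop ftp.sunet.se 3 would restart the client after three failed probes in a row. Requiring several failures avoids reacting to a single lost ping.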

@missing233

missing233 commented Oct 3, 2023

No, that's not the case in my home network. There is no DHCPv6 client other than the OpenWrt router, and I don't have a Hikari Denwa contract either. Even without a Hikari Denwa contract, NTT still allocates me a /56 IPv6-PD.

My home network looks like this: NTT 10G-ONU -> OpenWrt router -> Switch Hub -> AP/PC/NAS...

I've noticed that this issue only occurs with OpenWrt devices connected to FLET'S CROSS, while FLET'S NEXT doesn't seem to have a similar problem, perhaps because its Valid Lifetime is as long as a month.
Additionally, NEC routers do not exhibit this issue, whether it's a regular model sold in stores or NTT's XG-100NE (HGW).

@JesusArmy

I just subscribed to NTT 10G Cross with Plala and I am getting similar behavior with OpenWrt...

Has anyone found a good fix or workaround since last time? It takes less than a minute to recover by itself, but everything is interrupted in the meantime, which is quite inconvenient, especially when you are in the middle of a meeting... :D

I see these logs every time I get an outage:
Sun Mar 17 16:33:29 2024 daemon.warn odhcp6c[19403]: Server returned IA_PD status 'No Binding '
Sun Mar 17 16:55:05 2024 daemon.warn odhcpd[1460]: No default route present, overriding ra_lifetime!

That seems to be the same problem, doesn't it?

@PF4Public

For me sending SIGUSR2 works as mentioned above. I've got an impression that this behaviour was supposedly fixed in a newer openwrt version, but I cannot verify that as my hardware is too old.

@JesusArmy

Thanks for your reply PF4Public. I will set that up in crontab then and see if it gets better.
For info, I am running a pretty recent version of OpenWrt (23.05.2), since I am just using a normal x86 PC to run it. So it seems that it may not be totally fixed for now...

@JesusArmy

Running "kill -SIGUSR2 `pidof odhcp6c`" on my OpenWrt version did bring the internet down, and this time it did not recover by itself... I may try the alternative option with SIGUSR1 then... :)

@missing233

> Running "kill -SIGUSR2 `pidof odhcp6c`" on my OpenWrt version did bring the internet down, and this time it did not recover by itself... I may try the alternative option with SIGUSR1 then... :)

Sending SIGUSR2 to odhcp6c makes it send a RELEASE and restart the state machine by sending SOLICIT messages, while SIGUSR1 makes it send a RENEW message.
Try this script: https://gist.github.com/missing233/3dafb6ee549ed2271c20bd700b88a9cd
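
Separately from the gist, the difference between the two signals can be captured in a small wrapper. This is a sketch with made-up function and action names, not the contents of the linked script.

```shell
#!/bin/sh
# Hypothetical helper: map an action name to the signal described above.
signal_for_action() {
	case "$1" in
		renew)   echo SIGUSR1 ;;  # send a RENEW, keep the binding
		restart) echo SIGUSR2 ;;  # send a RELEASE, then start over with SOLICIT
		*)       return 1 ;;
	esac
}

# Hypothetical helper: send the matching signal to every running
# odhcp6c process. Not called here.
signal_odhcp6c() {
	sig=$(signal_for_action "$1") || { echo "unknown action: $1" >&2; return 1; }
	pids=$(pidof odhcp6c) || { echo "odhcp6c is not running" >&2; return 1; }
	kill "-$sig" $pids
}
```

signal_odhcp6c renew would then be the gentle option, and signal_odhcp6c restart the one that briefly drops connectivity.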

@JesusArmy

Thanks a lot for the script. Just to be sure I am doing things right: I would just drop the script file in "/etc/hotplug.d/iface/", maybe set the execute bit, and that's it? It will be executed (at next boot? on interface status change?) and will add "kill -SIGUSR1 $(pgrep odhcp6c)" to be run every hour by crontab? Is that it?

@cre8ivejp

cre8ivejp commented Mar 19, 2024

> Thanks a lot for the script. Just to be sure I am doing things right: I would just drop the script file in "/etc/hotplug.d/iface/", maybe set the execute bit, and that's it? It will be executed (at next boot? on interface status change?) and will add "kill -SIGUSR1 $(pgrep odhcp6c)" to be run every hour by crontab? Is that it?

@JesusArmy, or you can just add the following line in the Scheduled Tasks from the System menu.
I've been running this for 2 months now with no issues. It will renew every hour.

0 * * * * kill -SIGUSR1 $(pgrep odhcp6c)

You will see this log every hour.

Mon Mar 18 21:00:00 2024 cron.err crond[919]: USER root pid 220201 cmd kill -SIGUSR1 $(pgrep odhcp6c)

@JesusArmy

Thank you both, that seems to be working great so far. I have not noticed any disconnection for the last 8h :)

@missing233

missing233 commented Mar 19, 2024

I found that OpenWrt package repo has isc-dhcp-client. Perhaps we can use it to replace odhcp6c. However its size is close to 1MB, which is much larger than odhcp6c.

@cre8ivejp

Thanks for sharing this. I'm using a mini-pc with a lot of storage and memory, so the size is not a problem.
I'll try that and see how it goes.

@missing233

Looking forward to hearing some good news :)

@JesusArmy

I have been running that since last week, with quite some success, but I am still getting issues occasionally. It used to be many outages a day, and now I seem to get one or two at most.

If I want to investigate what's happening, should I change the log level to something more verbose, or do I need to run tcpdump 24/7 to capture what's happening at that time?

Today the only thing I see in the logs when that happens is a bunch of "no route" entries like this:

Mon Mar 25 13:28:11 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:28:20 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:28:21 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:28:22 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:28:23 2024 daemon.info dnsmasq-dhcp[1]: DHCPREQUEST(eth1) x.x.x.136 aa:aa:aa:aa:aa:aa
Mon Mar 25 13:28:23 2024 daemon.info dnsmasq-dhcp[1]: DHCPACK(eth1) x.x.x.136 aa:aa:aa:aa:aa:aa SOPAAD-PW04SSWC
Mon Mar 25 13:28:41 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:28:42 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:28:43 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:28:43 2024 daemon.info dnsmasq-dhcp[1]: DHCPREQUEST(eth1) x.x.x.136 aa:aa:aa:aa:aa:aa
Mon Mar 25 13:28:43 2024 daemon.info dnsmasq-dhcp[1]: DHCPACK(eth1) x.x.x.136 aa:aa:aa:aa:aa:aa SOPAAD-PW04SSWC
Mon Mar 25 13:28:55 2024 daemon.info dnsmasq-dhcp[1]: DHCPREQUEST(eth1) x.x.x.90 bb:bb:bb:bb:bb:bb
Mon Mar 25 13:28:55 2024 daemon.info dnsmasq-dhcp[1]: DHCPACK(eth1) x.x.x.90 cc:cc:cc:cc:cc:cc Pixel-7a
Mon Mar 25 13:28:55 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:30:45 2024 daemon.info dnsmasq-dhcp[1]: DHCPREQUEST(eth1) x.x.x.136 aa:aa:aa:aa:aa:aa
Mon Mar 25 13:30:45 2024 daemon.info dnsmasq-dhcp[1]: DHCPACK(eth1) x.x.x.136 aa:aa:aa:aa:aa:aa SOPAAD-PW04SSWC
Mon Mar 25 13:30:45 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:30:46 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:30:47 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
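
Before reaching for tcpdump, one cheap way to see when these warnings cluster is to count them per minute. The sketch below assumes one log entry per line with the timestamp layout shown above; the function name is made up, and the sample is a short excerpt of this log.

```shell
#!/bin/sh
# Hypothetical helper: read syslog text on stdin and print
# "<HH:MM> <count>" for the odhcpd "No default route present" warnings.
no_default_route_per_minute() {
	awk '/odhcpd\[[0-9]+\]: No default route present/ {
		split($4, t, ":")
		n[t[1] ":" t[2]]++
	}
	END { for (m in n) print m, n[m] }' | sort
}

log='Mon Mar 25 13:28:11 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:28:20 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:28:23 2024 daemon.info dnsmasq-dhcp[1]: DHCPREQUEST(eth1) x.x.x.136 aa:aa:aa:aa:aa:aa
Mon Mar 25 13:30:45 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!'

printf '%s\n' "$log" | no_default_route_per_minute
```

Comparing the busy minutes with the times of the hourly cron job or of odhcp6c messages may show whether the outages line up with lease events.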

@JesusArmy

Just wondering, are you also getting the quite low MTU of 1280 on your map-e interface?
I have also manually increased the TX queue length of the eth and map-e interfaces from 1000 to 10000, since I noticed some drops. Not sure if that was the reason or not...

@JesusArmy

FYI, it's been 48h since I increased the TX queue length from 1,000 to 10,000 on both physical eth interfaces (from LuCI) and the map-e interface (from the terminal), and so far so good. Zero disconnections in 2 days. :)

@cre8ivejp

cre8ivejp commented Mar 28, 2024

> FYI, it's been 48h since I increased the TX queue length from 1,000 to 10,000 on both physical eth interfaces (from LuCI) and the map-e interface (from the terminal), and so far so good. Zero disconnections in 2 days. :)

Glad to hear it.
I didn't need to change the TX queue length here, though.
Did you check if the cronjob runs correctly every hour without errors?

@JesusArmy

Yep, it was running every hour, but I was still getting some additional disconnections. I even got up to 3 within the same hour at some point before changing the TX queue length.

There is still the possibility that just restarting the network stack would have been enough to fix the issue, and that the TX queue modification is actually not related to getting a stable link at all... 😇

@missing233

missing233 commented Mar 28, 2024

@cre8ivejp @JesusArmy

Not sure, but I've never come across "No default route present, overriding ra_lifetime", and my network connection is rock solid.
I'm on V6Plus Fixed IP; maybe you could take a look at my config files for reference:

/etc/config/network:

config interface 'lan'
        option device 'br-lan'
        option proto 'static'
        option ipaddr '192.168.1.1'
        option netmask '255.255.255.0'
        option ip6assign '64'
        option ip6hint '01'
        option ip6ifaceid 'eui64'

config interface 'wan6'
        option device 'eth1'
        option proto 'dhcpv6'
        option reqaddress 'try'
        option reqprefix 'auto'
        option noclientfqdn '1'

config interface 'wan'
        option proto 'ipip6'
        option peeraddr '<secret>'
        option ip4ifaddr '<secret>'
        option ip6addr '<secret>'
        option tunlink 'wan6'
        option encaplimit 'ignore'
        option mtu '1460'
        option ip6assign '64'
        option ip6ifaceid '::<secret>'
        option ip6weight '1'

/etc/config/dhcp:

config dnsmasq
        option domainneeded '1'
        option localise_queries '1'
        option rebind_protection '1'
        option rebind_localhost '1'
        option local '/lan/'
        option domain 'lan'
        option expandhosts '1'
        option cachesize '0'
        option authoritative '1'
        option readethers '1'
        option leasefile '/tmp/dhcp.leases'
        option localservice '1'
        option ednspacket_max '1232'
        option noresolv '1'
        option localuse '1'
        list server '127.0.0.1#7874'
        option sequential_ip '1'

config dhcp 'lan'
        option interface 'lan'
        option start '2'
        option limit '255'
        option leasetime '12h'
        option dhcpv4 'server'
        option dhcpv6 'server'
        option ra 'server'
        list ra_flags 'managed-config'
        list ra_flags 'other-config'

config odhcpd 'odhcpd'
        option maindhcp '0'
        option leasefile '/tmp/hosts/odhcpd'
        option leasetrigger '/usr/sbin/odhcpd-update'
        option loglevel '4'

@JesusArmy

Hi missing233.
Thanks a lot for sharing your config. Connection has been very stable without any disconnect for the last 2 weeks. I believe that I will keep it as is for now then. :)

@achims311

I have the same issue running the latest version (OpenWrt 23.05.4 r24012-d8dd03c46f), but my refresh values are much lower (T1:600 T2:960 (IA_PD-prefix 2a02:678:640:e800::/56 pltime:1200 vltime:3600)).

But what I noticed is I've got 2 odhcp6c processes running:

root@OpenWrt:~# ps | grep odhcp6c
5682 root 848 S odhcp6c -s /lib/netifd/dhcpv6.script -Ntry -P56 -t120 pppoe-wan
6256 root 1144 R grep odhcp6c
29379 root 848 S odhcp6c -s /lib/netifd/dhcpv6.script -P0 -t120 pppoe-wan
root@OpenWrt:~#
I can also see that I have 3 wan* interfaces.
2 are the normal wan and wan6.
On top of those I have a wan_6: "Protocol: Virtual dynamic interface (DHCPv6 client)"

I configured wan6 to request a /56 prefix (the default I get from my ISP). As you can see above, only process 5682 is using this value (-P56), while the other (29379) is using the normal default (-P0).
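
To make the duplicate-client observation easier to check, here is a minimal Python sketch that groups odhcp6c processes by interface. It assumes the BusyBox `ps` column layout shown above (PID, user, VSZ, state, command) and that the interface is the last argument; the function name `odhcp6c_clients` is made up for illustration and is not part of odhcp6c or OpenWrt.

```python
import re
from collections import defaultdict

# Sample copied from the ps output above.
PS_OUTPUT = """\
5682 root 848 S odhcp6c -s /lib/netifd/dhcpv6.script -Ntry -P56 -t120 pppoe-wan
6256 root 1144 R grep odhcp6c
29379 root 848 S odhcp6c -s /lib/netifd/dhcpv6.script -P0 -t120 pppoe-wan
"""


def odhcp6c_clients(ps_text):
    """Group odhcp6c processes by interface (assumed to be the last argument)."""
    by_iface = defaultdict(list)
    for line in ps_text.splitlines():
        fields = line.split()
        # fields: PID, USER, VSZ, STAT, then the command and its arguments
        if len(fields) < 6 or fields[4] != "odhcp6c":
            continue
        pid = int(fields[0])
        iface = fields[-1]
        # Value of the -P option as text, or None when the option is absent
        prefix_opt = next(
            (arg[2:] for arg in fields[5:] if re.fullmatch(r"-P\d+", arg)), None
        )
        by_iface[iface].append((pid, prefix_opt))
    return dict(by_iface)


clients = odhcp6c_clients(PS_OUTPUT)
for iface, procs in clients.items():
    if len(procs) > 1:
        print(f"{iface}: {len(procs)} odhcp6c instances: {procs}")
```

With the sample above, both instances are reported on pppoe-wan, one with -P56 and one with -P0.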

I can also see 2 different dhcp6 solicit & advertise pairs (besides the xid, the difference is that only the first solicit carries an IA_NA option):
20:54:52.411887 IP6 (flowlabel 0x580de, hlim 1, next-header UDP (17) payload length: 139) fe80::1c66:a4e5:4466:d020.546 > ff02::1:2.547: [udp sum ok] dhcp6 solicit (xid=c0e771 (elapsed-time 0) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list SNTP-servers NTP-server AFTR-Name opt_67 opt_94 opt_95 opt_96 opt_82) (client-ID hwaddr type 1 02f4b76f716c) (reconfigure-accept) (Client-FQDN) (IA_NA IAID:1 T1:0 T2:0) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix ::/56 pltime:0 vltime:0)))
20:54:52.412789 IP6 (class 0xe0, hlim 255, next-header UDP (17) payload length: 121) fe80::4201:7aff:fe01:3d80.547 > fe80::1c66:a4e5:4466:d020.546: [udp sum ok] dhcp6 advertise (xid=c0e771 (server-ID hwaddr type 1 40017a013d80) (client-ID hwaddr type 1 02f4b76f716c) (IA_PD IAID:1 T1:600 T2:960 (IA_PD-prefix 2a02:678:640:e800::/56 pltime:1200 vltime:3600)) (DNS-server 2a02:678:0:195:218:2:32:38 2a02:678:0:195:218:24:0:2))
20:54:54.371258 IP6 (flowlabel 0x580de, hlim 1, next-header UDP (17) payload length: 123) fe80::1c66:a4e5:4466:d020.546 > ff02::1:2.547: [udp sum ok] dhcp6 solicit (xid=ad4f4b (elapsed-time 0) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list SNTP-servers NTP-server AFTR-Name opt_67 opt_94 opt_95 opt_96 opt_82) (client-ID hwaddr type 1 02f4b76f716c) (reconfigure-accept) (Client-FQDN) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix ::/56 pltime:0 vltime:0)))
20:54:54.372147 IP6 (class 0xe0, hlim 255, next-header UDP (17) payload length: 121) fe80::4201:7aff:fe01:3d80.547 > fe80::1c66:a4e5:4466:d020.546: [udp sum ok] dhcp6 advertise (xid=ad4f4b (server-ID hwaddr type 1 40017a013d80) (client-ID hwaddr type 1 02f4b76f716c) (IA_PD IAID:1 T1:600 T2:960 (IA_PD-prefix 2a02:678:640:e800::/56 pltime:1200 vltime:3600)) (DNS-server 2a02:678:0:195:218:2:32:38 2a02:678:0:195:218:24:0:2))

I also see only one dhcp6 request & reply:
20:54:55.911228 IP6 (flowlabel 0x580de, hlim 1, next-header UDP (17) payload length: 135) fe80::1c66:a4e5:4466:d020.546 > ff02::1:2.547: [udp sum ok] dhcp6 request (xid=fafe3f (elapsed-time 0) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list SNTP-servers NTP-server AFTR-Name opt_67 opt_94 opt_95 opt_96) (client-ID hwaddr type 1 02f4b76f716c) (server-ID hwaddr type 1 40017a013d80) (reconfigure-accept) (Client-FQDN) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix 2a02:678:640:xxxx::/56 pltime:1200 vltime:3600)))
20:54:55.913809 IP6 (class 0xe0, hlim 255, next-header UDP (17) payload length: 121) fe80::4201:7aff:fe01:3d80.547 > fe80::1c66:a4e5:4466:d020.546: [udp sum ok] dhcp6 reply (xid=fafe3f (server-ID hwaddr type 1 40017a013d80) (client-ID hwaddr type 1 02f4b76f716c) (IA_PD IAID:1 T1:600 T2:960 (IA_PD-prefix 2a02:678:640:xxxx::/56 pltime:1200 vltime:3600)) (DNS-server 2a02:678:0:195:218:2:32:38 2a02:678:0:195:218:24:0:2))
and after some time (looks like t2???):
21:04:55.990104 IP6 (flowlabel 0x580de, hlim 1, next-header UDP (17) payload length: 131) fe80::1c66:a4e5:4466:d020.546 > ff02::1:2.547: [udp sum ok] dhcp6 renew (xid=edd84a (elapsed-time 0) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list SNTP-servers NTP-server AFTR-Name opt_67 opt_94 opt_95 opt_96) (client-ID hwaddr type 1 02f4b76f716c) (server-ID hwaddr type 1 40017a013d80) (Client-FQDN) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix 2a02:678:640:xxxx::/56 pltime:0 vltime:0)))
21:04:55.991030 IP6 (class 0xe0, hlim 255, next-header UDP (17) payload length: 72) fe80::4201:7aff:fe01:3d80.547 > fe80::1c66:a4e5:4466:d020.546: [udp sum ok] dhcp6 reply (xid=edd84a (server-ID hwaddr type 1 40017a013d80) (client-ID hwaddr type 1 02f4b76f716c) (IA_PD IAID:1 T1:60 T2:120 (status-code NoBinding)))
and the respective log:
Mon Sep 9 21:04:55 2024 daemon.warn odhcp6c[5682]: Server returned IA_PD status 'No Binding (NO-BINDING)'

After this I get another dhcp request & reply as above.

To me it looks like I lose my IPv6 connection after the second request & reply.
But this is something I still need to check (it will take more time to validate).
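
As a quick check on the "looks like t2???" guess, the elapsed time between the first reply and the renew can be computed from the timestamps in the capture and compared with the advertised timers. This is only an arithmetic sketch using the values quoted above; the helper name `nearest_timer` is hypothetical.

```python
from datetime import datetime

# Timestamps and timers copied from the capture above.
reply_at = datetime.strptime("20:54:55.913809", "%H:%M:%S.%f")
renew_at = datetime.strptime("21:04:55.990104", "%H:%M:%S.%f")
timers = {"T1": 600, "T2": 960}

elapsed = (renew_at - reply_at).total_seconds()


def nearest_timer(elapsed_s, timers):
    """Return the name of the timer whose value is closest to elapsed_s."""
    return min(timers, key=lambda name: abs(timers[name] - elapsed_s))


closest = nearest_timer(elapsed, timers)
print(f"elapsed {elapsed:.1f}s, closest to {closest}")
```

On these numbers the renew goes out roughly 600 seconds after the reply, which matches T1:600 rather than T2:960.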

@missing233

@cre8ivejp @JesusArmy
Have you checked the wan6 status after removing the forced renew crontab command? I recently removed the command, and wan6 hasn't lost connection since (it's been connected for over a day now). Maybe NTT has fixed their DHCPv6 server?

@BombardierBeetle

I'm using odhcp6c in an NTT environment (フレッツ光クロス, FLET'S Hikari Cross), and because of the above problem I was running a renew (SIGUSR1) via crontab.
It has been about 24 hours since I deleted the crontab entry, and the problem does not seem to occur.

As missing233 said, it's possible that NTT has fixed something.

@lukedd

lukedd commented Dec 1, 2024

I have also experienced the symptom that IPv6 stops working after a while, and I have done some analysis.

Using tcpdump to capture a packet trace and Wireshark to look at it, I can see that the problem begins when odhcp6c sends the first RENEW message. That happens 12 hours after the prefix delegation lease began, which is the T1 time value from the original REPLY message from my ISP.

The RENEW message looks good to me but the REPLY message from my ISP contains an error status code shown in Wireshark as "NoPrefixAvail". The reply contains an option "Identity Association for Prefix Delegation"(25) which contains a sub option "IA Prefix"(26) which has this error status code set on it.

The problem appears to be that odhcp6c does not even look for error codes there, so it thinks it got a good response and is happy. However, the ISP is no longer routing traffic for me, so my IPv6 is broken.

The response did actually update the "valid lifetime" and "preferred lifetime" to zero, and odhcp6c does update its internal state to record this, so next time the renewal fails differently and odhcp6c realises that it needs to get a new lease from scratch. So 12 hours after IPv6 broke, it actually starts working again. Similarly, if I send SIGUSR1 to force renewal while it's working, it breaks; if I do it while it is broken, it starts working again.
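
To illustrate where that status code sits on the wire, here is a small Python sketch that walks an IA_PD option and reports a non-success status code both directly under IA_PD and nested inside an IA Prefix sub-option. It is not odhcp6c's code and not the patch mentioned below; the option numbers and field layouts are taken from RFC 8415, the helper names are invented, and the sample bytes are built by hand from the values in this comment rather than taken from a real capture.

```python
import ipaddress
import struct

# Option and status code numbers from RFC 8415.
OPT_STATUS_CODE = 13
OPT_IA_PD = 25
OPT_IAPREFIX = 26
STATUS_NAMES = {0: "Success", 3: "NoBinding", 6: "NoPrefixAvail"}


def option(code, payload):
    """Encode one DHCPv6 option: 2-byte code, 2-byte length, payload."""
    return struct.pack("!HH", code, len(payload)) + payload


def iter_options(buf):
    """Yield (code, payload) for each option in an options buffer."""
    off = 0
    while off + 4 <= len(buf):
        code, length = struct.unpack_from("!HH", buf, off)
        yield code, buf[off + 4:off + 4 + length]
        off += 4 + length


def status_errors(options, where):
    """Collect non-success Status Code options at one nesting level."""
    errors = []
    for code, data in iter_options(options):
        if code == OPT_STATUS_CODE and len(data) >= 2:
            status = struct.unpack_from("!H", data)[0]
            if status != 0:
                errors.append((where, STATUS_NAMES.get(status, str(status))))
    return errors


def ia_pd_errors(ia_pd_payload):
    """Check the IA_PD level and every nested IA Prefix for error statuses."""
    sub_options = ia_pd_payload[12:]  # skip IAID, T1, T2 (4 bytes each)
    errors = status_errors(sub_options, "IA_PD")
    for code, data in iter_options(sub_options):
        if code == OPT_IAPREFIX:
            # skip preferred (4), valid (4), prefix length (1), prefix (16)
            errors += status_errors(data[25:], "IA Prefix")
    return errors


# Hand-built reply fragment: IA_PD -> IA Prefix (lifetimes 0) -> NoPrefixAvail
status = option(OPT_STATUS_CODE, struct.pack("!H", 6))
prefix = ipaddress.IPv6Address("2407:5400:3102:be00::").packed
ia_prefix = option(OPT_IAPREFIX, struct.pack("!IIB", 0, 0, 56) + prefix + status)
ia_pd = option(OPT_IA_PD, struct.pack("!III", 1, 0, 0) + ia_prefix)

errors = []
for code, data in iter_options(ia_pd):
    if code == OPT_IA_PD:
        errors += ia_pd_errors(data)
print(errors)
```

A client that only inspects status codes directly under IA_PD would see nothing wrong in this fragment; the error only shows up when the IA Prefix sub-options are inspected too.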

I have tested a patch to fix this problem, and will upload it as a pull request. (Luckily I have a MacBook, which is ARM architecture, so using a Debian docker container I can easily compile an ARM binary that works on my router, a Unifi UCG-Max that is using Debian 11 under the hood.)

lukedd added a commit to lukedd/odhcp6c that referenced this issue Dec 1, 2024
Check for error status code in the IA Prefix option in replies to RENEW
messages.

This fixes a problem where odhcp6c thinks that renewal succeeded, when
actually the upstream router is no longer routing this prefix for us.

See openwrt#61 (comment)
@lukedd

lukedd commented Dec 20, 2024

@achims311 do you have any updates on your situation? It looks very similar to mine, except that the error you get in the reply to your renew is slightly different, which is actually handled by the following code to trigger sending a new request:

odhcp6c/src/dhcpv6.c

Lines 1588 to 1593 in ffbb2d5

		case DHCPV6_NoBinding:
			switch (orig) {
			case DHCPV6_MSG_RENEW:
			case DHCPV6_MSG_REBIND:
				if ((*ret > 0) && !handled_status_codes[code])
					*ret = dhcpv6_request(DHCPV6_MSG_REQUEST);

Are you saying that this new request fails to work? Can you share the tcpdump for this part too? And is it only a momentary outage or a persisting outage?

For reference the tcpdump output for my own case of renewal failure is this:

22:07:58.607583 IP6 (flowlabel 0x4c641, hlim 1, next-header UDP (17) payload length: 140) fe80::2a70:4eff:fe6e:50b7.546 > ff02::1:2.547: [bad udp cksum 0xc6b9 -> 0x54c4!] dhcp6 renew (xid=4b4d32 (elapsed-time 0) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list SNTP-servers NTP-server AFTR-Name opt_67) (client-ID hwaddr type 1 28704e6e50b7) (server-ID vid 0000058361633a37) (Client-FQDN) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix 2407:5400:3102:be00::/56 pltime:0 vltime:0)))

22:07:58.702855 IP6 (class 0xc0, hlim 64, next-header UDP (17) payload length: 137) fe80::ae78:d1ff:fe32:985b.547 > fe80::2a70:4eff:fe6e:50b7.546: [udp sum ok] dhcp6 reply (xid=4b4d32 (client-ID hwaddr type 1 28704e6e50b7) (server-ID vid 0000058361633a37) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix 2407:5400:3102:be00::/56 pltime:0 vltime:0 (status-code NoPrefixAvail))))
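
For anyone comparing their own traces, a rough Python sketch like the following can flag dhcp6 replies that carry a non-success status code in verbose tcpdump text output. It assumes the line format shown in this thread; the regular expressions and the function name `failed_replies` are illustrative only, and the sample lines are the replies quoted above.

```python
import re

MSG_RE = re.compile(r"dhcp6 (\w+) \(xid=([0-9a-f]+)")
STATUS_RE = re.compile(r"\(status-code (\w+)\)")

# Reply lines copied from the captures in this thread.
LINES = [
    "20:54:55.913809 IP6 (class 0xe0, hlim 255, next-header UDP (17) payload length: 121) "
    "fe80::4201:7aff:fe01:3d80.547 > fe80::1c66:a4e5:4466:d020.546: [udp sum ok] "
    "dhcp6 reply (xid=fafe3f (server-ID hwaddr type 1 40017a013d80) "
    "(client-ID hwaddr type 1 02f4b76f716c) (IA_PD IAID:1 T1:600 T2:960 "
    "(IA_PD-prefix 2a02:678:640:xxxx::/56 pltime:1200 vltime:3600)) "
    "(DNS-server 2a02:678:0:195:218:2:32:38 2a02:678:0:195:218:24:0:2))",
    "21:04:55.991030 IP6 (class 0xe0, hlim 255, next-header UDP (17) payload length: 72) "
    "fe80::4201:7aff:fe01:3d80.547 > fe80::1c66:a4e5:4466:d020.546: [udp sum ok] "
    "dhcp6 reply (xid=edd84a (server-ID hwaddr type 1 40017a013d80) "
    "(client-ID hwaddr type 1 02f4b76f716c) (IA_PD IAID:1 T1:60 T2:120 "
    "(status-code NoBinding)))",
    "22:07:58.702855 IP6 (class 0xc0, hlim 64, next-header UDP (17) payload length: 137) "
    "fe80::ae78:d1ff:fe32:985b.547 > fe80::2a70:4eff:fe6e:50b7.546: [udp sum ok] "
    "dhcp6 reply (xid=4b4d32 (client-ID hwaddr type 1 28704e6e50b7) "
    "(server-ID vid 0000058361633a37) (IA_PD IAID:1 T1:0 T2:0 "
    "(IA_PD-prefix 2407:5400:3102:be00::/56 pltime:0 vltime:0 "
    "(status-code NoPrefixAvail))))",
]


def failed_replies(lines):
    """Return (timestamp, xid, status) for replies with a non-success status."""
    found = []
    for line in lines:
        msg = MSG_RE.search(line)
        if not msg or msg.group(1) != "reply":
            continue
        for status in STATUS_RE.findall(line):
            if status.lower() != "success":
                found.append((line.split()[0], msg.group(2), status))
    return found


failures = failed_replies(LINES)
for timestamp, xid, status in failures:
    print(f"{timestamp} xid={xid} status={status}")
```

The text match does not distinguish a status code directly under IA_PD (the NoBinding case) from one nested inside IA_PD-prefix (the NoPrefixAvail case); telling those apart requires looking at the parenthesis nesting or at the packet in Wireshark.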
