DHCPv6 Rebind with expired lease Reply "invalid state" #437

Open
ColinMcInnes opened this issue Jan 18, 2025 · 13 comments

Comments

@ColinMcInnes
Contributor

ColinMcInnes commented Jan 18, 2025

dhcpcd 10.1.10 on linux

dhcpcd --rebind on DHCPv6, server responds with valid Reply, but dhcpcd rejects it.

dhcpcd[3644]: eth0: invalid state for DHCP6 type REPLY6 (7)

https://support.huawei.com/enterprise/en/doc/EDOC1100306163/d427e938/introduction-to-dhcpv6-messages

> In response to a Solicit, Request, Renew, or Rebind message received from a DHCPv6 client. In this situation, the Reply message contains IPv6 addresses and configuration parameters.

Reply is a valid response to a Rebind request, so I will investigate further to see what specifically it didn't like.

EDIT: Setup is when RA was not seen, so we used --rebind --ia_na to request an address. This succeeded, we got a lease, but then when it comes to renew, the lack of RA seems to prevent dhcp6 from attempting a renew, and rebind attempts fail.

@ColinMcInnes
Contributor Author

control command: dhcpcd --rebind eth0
(DHCPv4 triggers RECONFIGURE)
eth0: rebinding prior DHCPv6 lease
eth0: multicasting REBIND6 (xid 0x08f036), next in 11.0 seconds
(receives REPLY6)
eth0: invalid state for DHCP6 type REPLY6 (7)

From some debug I added, looks like after the Rebind is sent, dhcpcd drops into INIT state, which is an invalid state to be processing a Reply in. I will start with examining the state machine logic.

I think this might be due to my router, I bet the lack of RA is messing with things. RSs time out, maybe that drops it back into INIT state? A --renew attempt prints out "no IPv6 Routers Available" and then it doesn't try anything else.

@ColinMcInnes ColinMcInnes changed the title DHCPv6 Rebind Reply "invalid state" DHCPv6 Rebind Reply "invalid state" when IPv6 router isn't responding Jan 18, 2025
@ColinMcInnes
Contributor Author

Looks like the lack of IPv6 router info prevents the renew from occurring, even though the lease was successfully bound.

Couple of changes might help:

  • Rebind/INIT will allow Reply
  • If we've previously seen a dhcp server, allow renew attempt

@rsmarples
Member

I cannot replicate this.

control command: dhcpcd --rebind -6 vioif1
vioif1: executing: /libexec/dhcpcd-run-hooks RECONFIGURE
vioif1: rebinding prior DHCPv6 lease
vioif1: multicasting REBIND6 (xid 0x21cb79), next in 10.3 seconds
vioif1: IAID 00:73:78:01
vioif1: delaying IPv6 router solicitation for 0.8 seconds
vioif1: reading lease: /var/db/dhcpcd/vioif1.lease6
vioif1: confirming prior DHCPv6 lease
vioif1: delaying CONFIRM6 (xid 0x55dcf9), next in 1.0 seconds
vioif1: wrong xid 0x21cb79 (expecting 0x55dcf9) from fe80::f00b:a4ff:fe15:16a3
vioif1: soliciting an IPv6 router
vioif1: sending Router Solicitation
vioif1: multicasting CONFIRM6 (xid 0x55dcf9), next in 1.0 seconds
vioif1: REPLY6 received from fe80::f00b:a4ff:fe15:16a3
vioif1: adding address fd78::1001/128
vioif1: pltime 3000 seconds, vltime 4000 seconds
vioif1: renew in 946, rebind in 1946, expire in 3946 seconds
vioif1: executing: /libexec/dhcpcd-run-hooks REBOOT6
vioif1: sending Router Solicitation
vioif1: sending Router Solicitation
vioif1: sending Router Solicitation
vioif1: no IPv6 Routers available

@ColinMcInnes
Contributor Author

ColinMcInnes commented Jan 18, 2025

control command: dhcpcd --rebind -6 eth0
executing: /libexec/dhcpcd-run-hooks RECONFIGURE
eth0: rebinding prior DHCPv6 lease
eth0: multicasting REBIND6 (xid 0x08f036), next in 11.0 seconds
eth0: IAID 48:07:bf:5b
eth0: delaying IPv6 router solicitation for 0.3 seconds
eth0: reading lease: /var/lib/dhcpcd/eth0.lease6
eth0: discarding expired lease
eth0: invalid state for DHCP6 type REPLY6 (7)
eth0: soliciting an IPv6 router
eth0: sending Router Solicitation
eth0: sending Router Solicitation
eth0: sending Router Solicitation
eth0: multicasting REBIND6 (xid 0x08f036), next in 22.0 seconds
eth0: invalid state for DHCP6 type REPLY6 (7)
eth0: sending Router Solicitation
eth0: no IPv6 Routers available

Maybe the expired lease message is a clue? It shouldn't be expired: it was given an initial pltime of 604800, and the post-rebind reply had the same. That might be why I'm not sending out a CONFIRM?

@ColinMcInnes
Contributor Author

I think the expired lease is the key, I was just able to reproduce your log if I did a rebind shortly after an ia_na request.

control command: dhcpcd --rebind -6 --ia_na eth0
eth0: soliciting a DHCPv6 lease
eth0: delaying SOLICIT6 (xid 0x595a67), next in 1.1 seconds
eth0: IAID 48:07:bf:5b
eth0: delaying IPv6 router solicitation for 1.0 seconds
eth0: reading lease: /var/lib/dhcpcd/eth0.lease6
eth0: soliciting a DHCPv6 lease
eth0: delaying SOLICIT6 (xid 0x87193a), next in 1.1 seconds
eth0: soliciting an IPv6 router
eth0: sending Router Solicitation
eth0: multicasting SOLICIT6 (xid 0x87193a), next in 1.0 seconds
eth0: ADV 2001:192:168:229::119/128 from fe80::2efa:a2ff:fe90:e275
eth0: multicasting REQUEST6 (xid 0x5cab6e), next in 1.1 seconds
eth0: REPLY6 received from fe80::2efa:a2ff:fe90:e275
eth0: adding address 2001:192:168:229::119/128
eth0: pltime 604800 seconds, vltime 2592000 seconds
eth0: renew in 259200, rebind in 388800, expire in 2592000 seconds
eth0: writing lease: /var/lib/dhcpcd/eth0.lease6
eth0: sending Router Solicitation
eth0: sending Router Solicitation
eth0: sending Router Solicitation
eth0: no IPv6 Routers available
control command: dhcpcd --rebind -6 eth0
eth0: rebinding prior DHCPv6 lease
eth0: multicasting REBIND6 (xid 0x084eec), next in 10.7 seconds
eth0: IAID 48:07:bf:5b
eth0: delaying IPv6 router solicitation for 1.0 seconds
eth0: reading lease: /var/lib/dhcpcd/eth0.lease6
eth0: confirming prior DHCPv6 lease
eth0: delaying CONFIRM6 (xid 0xa71e49), next in 1.1 seconds
eth0: wrong xid 0x084eec (expecting 0xa71e49) from fe80::2efa:a2ff:fe90:e275
eth0: soliciting an IPv6 router
eth0: sending Router Solicitation
eth0: multicasting CONFIRM6 (xid 0xa71e49), next in 1.1 seconds
eth0: REPLY6 received from fe80::2efa:a2ff:fe90:e275
eth0: adding address 2001:192:168:229::119/128
eth0: pltime 604800 seconds, vltime 2592000 seconds
eth0: renew in 259169, rebind in 388769, expire in 2591969 seconds
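
Comparing the two Replies in this log: each timer after the CONFIRM is 31 seconds lower than after the original REQUEST (259200 vs 259169, 388800 vs 388769, 2592000 vs 2591969), which is consistent with the timers being counted from when the lease was first acquired. A small sketch of that arithmetic, with a hypothetical helper name that is not taken from dhcpcd:

```c
#include <stdint.h>

/*
 * Hypothetical helper for illustration only.
 * Seconds left on a lease timer after `elapsed` seconds have passed,
 * clamped at zero so an overdue timer never wraps around.
 */
static uint32_t
timer_remaining(uint32_t original, uint32_t elapsed)
{
	return elapsed >= original ? 0 : original - elapsed;
}
```

With the values above, `timer_remaining(259200, 31)` gives the 259169 second renew timer shown in the second Reply.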

@ColinMcInnes
Contributor Author

Does it use monotonic time to determine lease expired?

The only thing I can think of is that the original lease was prior to TimeOfDay being updated from epoch.

@ColinMcInnes
Contributor Author

I think I have a cause. DHCPv4 drops into EXPIRE. DHCPv6 is supposed to drop into dhcp6_startdiscoinform()

But for whatever reason it's not. I think when it discards the lease and sets the bytes to zero, it's not returning an error on the lease, which makes dhcp6 think it can try the rebind again, but the state has been reset to "INIT". So a Reply without a Solicit is invalid.
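
To make the failure mode concrete, here is a minimal sketch of a check that only accepts a Reply while a transaction is waiting for one. The state names and the function are hypothetical and simplified; they are not dhcpcd's identifiers, and Rapid Commit (a Reply answering a Solicit) is left out for brevity.

```c
#include <stdbool.h>

/* Hypothetical state names for illustration; not dhcpcd's identifiers. */
enum d6_state {
	D6_INIT,
	D6_SOLICIT,
	D6_REQUEST,
	D6_CONFIRM,
	D6_RENEW,
	D6_REBIND,
	D6_BOUND
};

/* A Reply is only meaningful while a transaction is waiting for one. */
static bool
reply_expected(enum d6_state state)
{
	switch (state) {
	case D6_REQUEST:
	case D6_CONFIRM:
	case D6_RENEW:
	case D6_REBIND:
		return true;
	default:
		/* INIT, SOLICIT and BOUND have no outstanding request. */
		return false;
	}
}
```

Under this model, a Rebind sent from `D6_REBIND` followed by a reset to `D6_INIT` means the server's Reply arrives in a state where `reply_expected()` is false, which matches the "invalid state" message.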

@ColinMcInnes ColinMcInnes changed the title DHCPv6 Rebind Reply "invalid state" when IPv6 router isn't responding DHCPv6 Rebind with expired lease Reply "invalid state" Jan 18, 2025
@ColinMcInnes
Contributor Author

Couple of notes for patch I'm going to work on.

Jan 18 15:37:35 dhcpcd[3638]: eth0: discarding expired lease expire: 2592000 now: 1737236255 acquired: 140 diff 1737236115
Lease file verification is using time()... so it thinks the lease is expired when it isn't. I can fix that.
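
For illustration, the arithmetic in that log line can be sketched as below. This is a hypothetical helper, not dhcpcd's actual code; it only assumes that "acquired" and "now" are both wall-clock seconds and that "expire" is the lease lifetime.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not dhcpcd's actual code: a lease is treated as
 * expired when more wall-clock seconds have elapsed since it was
 * acquired than the lease lifetime allows.
 */
static bool
lease_expired(int64_t acquired, int64_t now, int64_t expire)
{
	return now - acquired > expire;
}
```

With the logged values, `lease_expired(140, 1737236255, 2592000)` is true because the elapsed time is 1737236115 seconds: the lease was acquired while the clock still read 140 seconds past the epoch, so the later clock step makes it look decades old.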

The other is expired lease file handling in dhcpv6. We should be triggering a discovery if the lease file is expired, and I don't think it's currently correctly doing that, because the RA isn't there and it's not aware that the user requested it after initial bootup. Not sure how best to handle that, maybe flag it so when we check for FORCED_IA it could also trigger init?

Something for me to work on on Monday.

@ColinMcInnes
Contributor Author

If I fix the lease file time to be compared to monotonic, then it bypasses this whole issue and works the same way your log does. Renews, and exit-hooks with REBOOT6.

I may split this into two issues, as the timing one is a simple fix, the other will require more thought, maybe set a flag that the system had previously had a successful IA_NA without RA, so try it again.

@rsmarples
Member

> If I fix the lease file time to be compared to monotonic, then it bypasses this whole issue and works the same way your log does. Renews, and exit-hooks with REBOOT6.
>
> I may split this into two issues, as the timing one is a simple fix.

I disagree. File time is based on the wall clock, not the monotonic clock.
If the lease file is expired we should enter DISCOVER not CONFIRM or REBIND.
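
A minimal sketch of that rule, using hypothetical names that do not come from dhcpcd. How a still-valid lease is split between CONFIRM and REBIND is an assumption made only for the example.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not dhcpcd's identifiers. */
enum d6_start { D6_START_DISCOVER, D6_START_CONFIRM, D6_START_REBIND };

/*
 * An expired lease always restarts discovery. For a valid lease, the
 * CONFIRM vs REBIND choice here is an assumption for this sketch.
 */
static enum d6_start
choose_start(bool lease_expired, bool rebind_requested)
{
	if (lease_expired)
		return D6_START_DISCOVER;
	return rebind_requested ? D6_START_REBIND : D6_START_CONFIRM;
}
```

The point of the sketch is only the first branch: once the lease is judged expired, neither CONFIRM nor REBIND is reachable.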

@ColinMcInnes
Contributor Author

> I disagree. File time is based on the wall clock, not the monotonic clock. If the lease file is expired we should enter DISCOVER not CONFIRM or REBIND.

Good point.

Ok, I'll look into the state further. I think it's checking for FORCED_IA, and not seeing it, nor RA, so it's dropping back into rebind attempt. So maybe "user previously requested IA and it was successful" could be used as a flag to at least attempt DISCOVER state again.
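
Roughly what I have in mind, as a sketch only. The struct, its fields, and the function are hypothetical and are not existing dhcpcd code; `prior_user_ia` is the proposed new flag, and `ra_seen` is a simplification of the real RA handling.

```c
#include <stdbool.h>

/* Hypothetical inputs for illustration; not dhcpcd's data structures. */
struct d6_hints {
	bool ra_seen;       /* a Router Advertisement was received */
	bool forced_ia;     /* user explicitly asked for an IA, e.g. --ia_na */
	bool prior_user_ia; /* proposed: an earlier user-requested IA succeeded */
};

/* Decide whether to attempt DHCPv6 discovery after an expired lease. */
static bool
should_discover(const struct d6_hints *h)
{
	return h->ra_seen || h->forced_ia || h->prior_user_ia;
}
```

With no RA and no forced IA on the command line, only the remembered `prior_user_ia` flag would let discovery start again.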

@rsmarples
Member

I don't suppose this patch fixes the issue?

--- a/src/dhcpcd.c
+++ b/src/dhcpcd.c
@@ -1315,10 +1315,9 @@ if_reboot(struct interface *ifp, int argc, char **argv)
        script_runreason(ifp, "RECONFIGURE");
        dhcpcd_initstate1(ifp, argc, argv, 0);
 #ifdef INET
+       // This just expires the old config if changing from
+       // static to dynamic or the other way around
        dhcp_reboot_newopts(ifp, oldopts);
-#endif
-#ifdef DHCP6
-       dhcp6_reboot(ifp);
 #endif
        dhcpcd_prestartinterface(ifp);

@ColinMcInnes
Contributor Author

Yes, now rebind just triggers a reconfigure, expires the ipv6 lease, then tries the RS cycle and times out after no RA.

I will look into a different way to deal with the forced IA_NA situation.
