strongswan: dhcp plugin not issuing addresses to clients after upgrading to 24.10-rc4 #25801
Comments
No idea what the DHCP issue is about (you probably have to provide more details, config, logs, packet captures). But that the updown script is only called for IPv6 makes sense if no IPv4 address is assigned to the client. |
Yeah, I'm trying to understand what's going wrong; I have some digging to do myself. At this point I can't tell whether I'm the only one impacted (= I did something stupid) or whether others are seeing it. I'm currently working to distill my setup to a minimal reproducible case (I couldn't manage it yesterday). I've just added the config that I'm looking at right now. |
I notice that 23.05 used Strongswan 5.9.10, but 24.10 uses 5.9.14; 5.9.14 added an eBPF-based DHCP packet filter; let me see if compiling 5.9.13 works around this. |
The packet filtering as such hasn't changed on Linux, it was just refactored to a different file so it can be shared with the |
Configs and logs are up in the original post -- changelog added to bottom of post. Hmm, if the fix landed in 5.9.11 then 23.05 can't possibly be using 5.9.10 🤔 |
Oh, sorry, haven't checked there. That log is from after you changed stuff in the config you posted, right? So there does seem to be an offer, but it does not hit the socket. Maybe some firewall filtering? Or maybe it hits a different interface (i.e. not the one bound to)? |
Yes, to bind
Trying to see if anything on the firewall side has changed but I'm not seeing anything atm. |
Could be, was that used on 23.05 as well? Not sure how well e.g. |
Yes, no change on that front. The vlans on that bridge are for DSA. The firewall and network config between the two releases as far as I can tell are the same (and I'm testing this on two different devices of the same model, but unfortunately I upgraded them both to 24.10-rc4 before I realised that this was a problem.) |
Do you have a pcap file that contains the DHCPOFFER? Maybe our packet filter drops it for some reason. Edit: Hm, I wonder if the packet is actually prefixed with the VLAN tag when we receive it on the socket. |
Ah. I see a difference in the pcap between the two configs I used. The logs from dnsmasq/strongswan are the same, but the pcap is different. In both captures, I use Initial config:
Modified config:
|
Can you bind to Edit: By the way, I found some information on how to bind packet sockets to an interface. Might be worth a shot in case the current approach isn't working anymore. |
Modified config:
tcpdump shows nothing if I listen on the VLAN that is for my LAN interface; when dumping on br-lan instead, the source IP is my public WAN address (!!).
(times don't match because I captured the logs while tcpdump-ing br-lan.xxxx, ran a second time to capture br-lan directly.) It's late here, I'll continue tomorrow. Thanks for helping me out, @tobiasbrunner |
Interesting choice of source address. Why is the WAN interface even attached to the bridge interface? Anyway, what you could try, if you are able to build a new version of strongSwan, is maybe only bind the send socket and let the packet socket listen on any interface. To do so, replace Alternatively, the following patch uses a different approach to bind the socket to an interface: diff --git a/src/libcharon/network/pf_handler.c b/src/libcharon/network/pf_handler.c
index 43ef432ba607..60fc7ff6b426 100644
--- a/src/libcharon/network/pf_handler.c
+++ b/src/libcharon/network/pf_handler.c
@@ -225,6 +225,30 @@ METHOD(pf_handler_t, destroy, void,
free(this);
}
+/**
+ * Bind the given packet socket to a named device
+ */
+static bool bind_packet_socket_to_device(int fd, char *iface)
+{
+ struct sockaddr_ll addr = {
+ .sll_family = AF_PACKET,
+ .sll_ifindex = if_nametoindex(iface),
+ };
+
+ if (!addr.sll_ifindex)
+ {
+ DBG1(DBG_CFG, "unable to bind socket to '%s': not found", iface);
+ return FALSE;
+ }
+ if (bind(fd, (struct sockaddr*)&addr, sizeof(addr)) == -1)
+ {
+ DBG1(DBG_CFG, "binding socket to '%s' failed: %s",
+ iface, strerror(errno));
+ return FALSE;
+ }
+ return TRUE;
+}
+
/**
* Setup capturing via AF_PACKET socket
*/
@@ -247,7 +271,7 @@ static bool setup_internal(private_pf_handler_t *this, char *iface,
this->name, strerror(errno));
return FALSE;
}
- if (iface && !bind_to_device(this->receive, iface))
+ if (iface && !bind_packet_socket_to_device(this->receive, iface))
{
return FALSE;
} |
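For anyone who wants to try the bind()-based approach from the patch above outside of strongSwan, here is a minimal standalone sketch. The function names `bind_packet_socket` and `open_bound_packet_socket` are hypothetical and not part of strongSwan. Like the patch, it only fills in `sll_family` and `sll_ifindex`. Opening an AF_PACKET socket normally needs CAP_NET_RAW, so the second helper is expected to fail when run unprivileged.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Bind an already opened AF_PACKET socket to a named interface.
 * Returns 0 on success, -1 on failure with errno set (ENODEV if the
 * interface name cannot be resolved to an index).
 */
int bind_packet_socket(int fd, const char *iface)
{
	struct sockaddr_ll addr;

	memset(&addr, 0, sizeof(addr));
	addr.sll_family = AF_PACKET;
	addr.sll_ifindex = if_nametoindex(iface);

	if (!addr.sll_ifindex)
	{
		errno = ENODEV;
		return -1;
	}
	return bind(fd, (struct sockaddr*)&addr, sizeof(addr));
}

/*
 * Open a packet socket for IPv4 traffic and bind it to the interface.
 * Returns the file descriptor, or -1 on failure with errno set.
 */
int open_bound_packet_socket(const char *iface)
{
	int fd, saved;

	fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_IP));
	if (fd < 0)
	{
		return -1;
	}
	if (bind_packet_socket(fd, iface) < 0)
	{
		saved = errno;	/* close() may overwrite errno */
		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

Calling `open_bound_packet_socket("br-lan")` as root and checking the return value is a quick way to see whether this binding style works on a given interface, independent of charon.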
I am not sure, but I believe this is how the hardware switch is wired up - wan and lan ports share the same underlying hardware. Or this could be a route table thing, not sure.
I tried this patch as you suggested: diff --git "a/src/libcharon/plugins/dhcp/dhcp_socket.c" "b/src/libcharon/plugins/dhcp/dhcp_socket.c"
index d144e2795..192912142 100644
--- "a/src/libcharon/plugins/dhcp/dhcp_socket.c"
+++ "b/src/libcharon/plugins/dhcp/dhcp_socket.c"
@@ -873,7 +873,8 @@ dhcp_socket_t *dhcp_socket_create()
return NULL;
}
- this->pf_handler = pf_handler_create("DHCP", iface, receive_dhcp, this,
+ DBG1(DBG_CFG, "creating handler without filter");
+ this->pf_handler = pf_handler_create("DHCP", NULL, receive_dhcp, this,
&dhcp_filter);
if (!this->pf_handler)
{ And verified the debug message is printed at startup. Modified config:
Modified config:
If we were listening on the bridge (without the VLAN) then the eBPF would need to account for the 802.1q vlan ethertype and the offsets would be shifted? Listening on the vlan should not have the tags though (pcap shows the packet has the IPv4 ethertype). (edit: the Linux BPF seems to use the first byte of the packet to be the IP header, not the ethernet frame, according to your #ifdef.) Next, I tried your attached patch to modify pf_handler.c to use bind() instead. I added another DBG1 to make sure I'm updating the right binary (in this case libcharon.so) after seeing the output. Using
(I'm guessing I should stop testing this config...?) Using
No banana either 🙏 |
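To illustrate the offset question raised above (would an 802.1Q tag shift the position of the IP header?), here is a small sketch that inspects a full Ethernet frame as it would appear in a pcap. The function name `ipv4_offset_in_frame` is made up for this example. It is not what strongSwan's filter does: as noted in the comment, the Linux filter there starts at the IP header, and on Linux the VLAN tag is often delivered out of band rather than inline, so treat this purely as an illustration of the arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETHERTYPE_IPV4	0x0800
#define SKETCH_ETHERTYPE_8021Q	0x8100
#define SKETCH_ETH_HDR_LEN	14	/* dst MAC + src MAC + EtherType */
#define SKETCH_VLAN_TAG_LEN	4	/* TPID + TCI */

/*
 * Return the byte offset of the IPv4 header inside an Ethernet frame,
 * skipping a single 802.1Q tag if present. Returns -1 if the frame is
 * too short or does not carry IPv4.
 */
int ipv4_offset_in_frame(const uint8_t *frame, size_t len)
{
	size_t offset = SKETCH_ETH_HDR_LEN;
	uint16_t type;

	if (len < SKETCH_ETH_HDR_LEN)
	{
		return -1;
	}
	type = (uint16_t)((frame[12] << 8) | frame[13]);

	if (type == SKETCH_ETHERTYPE_8021Q)
	{
		/* the real EtherType follows the 4-byte VLAN tag */
		offset += SKETCH_VLAN_TAG_LEN;
		if (len < offset)
		{
			return -1;
		}
		type = (uint16_t)((frame[16] << 8) | frame[17]);
	}
	return type == SKETCH_ETHERTYPE_IPV4 ? (int)offset : -1;
}
```

An untagged IPv4 frame yields 14 and a tagged one yields 18, which is the four-byte shift a frame-level filter would have to account for.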
I've diagnosed (hur hur, that's an overstatement, BPF was a fun distraction and I was fumbling around a lot) this to the point that: Using
Applying this patch gives me an IP address: diff --git "a/src/libcharon/network/pf_handler.c" "b/src/libcharon/network/pf_handler.c"
index 43ef432ba..11245615f 100644
--- "a/src/libcharon/network/pf_handler.c"
+++ "b/src/libcharon/network/pf_handler.c"
@@ -175,14 +175,20 @@ static cached_iface_t *find_interface(private_pf_handler_t *this, int fd,
}
if (ioctl(fd, SIOCGIFNAME, &req) == 0 &&
- ioctl(fd, SIOCGIFHWADDR, &req) == 0 &&
- req.ifr_hwaddr.sa_family == ARPHRD_ETHER)
+ ioctl(fd, SIOCGIFHWADDR, &req) == 0)
{
idx = find_least_used_cache_entry(this);
this->ifaces[idx].if_index = ifindex;
memcpy(this->ifaces[idx].if_name, req.ifr_name, IFNAMSIZ);
- memcpy(this->ifaces[idx].hwaddr, req.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
+ if (req.ifr_hwaddr.sa_family == ARPHRD_ETHER)
+ {
+ memcpy(this->ifaces[idx].hwaddr, req.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
+ }
+ else
+ {
+ memset(this->ifaces[idx].hwaddr, 0, ETHER_ADDR_LEN);
+ }
this->ifaces[idx].used = 1;
return &this->ifaces[idx];
} Using
Using
|
Thanks for investigating this further.
Not sure if that's really necessary because I doubt many DHCP servers are using IP options (they are rare for IPv4 to begin with).
Good catch! This part of the code is actually "new", i.e. was not part of the I've pushed a slightly different fix to the
Is that the default or a special config? What happens if you set
I feel this should work (with the loopback patch) if you'd remove the interface from the
Makes sense as packets from |
Let me check it out and compile it and give it a shot, thanks.
I'm sorry. I think I was a bit imprecise with that comment. I meant that dnsmasq is on the same host as strongswan; in that sense, dnsmasq is localhost relative to charon. To be precise, my dnsmasq.conf currently has these network settings set:
among others (which I think are uninteresting wrt this conversation.) Why is it that when I set the
Let me try this on the weekend.
It depends on how dnsmasq sends the reply or if this is a bug somewhere else in the stack. |
Thanks.
I see. Even though dnsmasq probably binds its socket to the configured interface, it could just be the kernel delivering the response on
At least it would allow binding both sockets (or only one). |
Same problem here. Getting a "DHCP DISCOVER timed out" although dnsmasq does offer an IP. I am on 24.10, too. |
It seems to work, thanks. I'm not sure what kind of MAC address is going to be copied, hence I did the more conservative
Correct; but it feels messy to me. I'm not sure if this is a one-off thing or if Linux has been doing this since forever. I'm also not sure how this would play with scenarios in which we may use e.g. RADIUS to decide which DHCP server to get addresses from. For example, I may want to assign different groups of users different IP address ranges and they have to get their addresses from different DHCP servers (not sure if that's possible; I'm pulling examples out of thin air.) In this specific case, allowing addresses from either ethernet/loopback seems to work here. @DocMAX do you want to try the patch for yourself too? |
Thanks for testing. I'll wait for some additional feedback from @DocMAX (see here) and then will line it up for the next release.
That's already the default address if nothing else is configured (see e.g.
I'm pretty sure it has been like that forever. This is, of course, only a problem when running the DHCP server on the same host as the DHCP client, which is kind of a special setup. For instance, the I guess binding to an interface is more useful when the server is not running on the same host. But having the option to configure the interface for the receive socket separately might allow more generic configs (e.g. without having to know the subnet's local broadcast address) even when running on the same host. I've pushed a small change that implements this to the branch. Let me know what you think.
That's currently not possible. But you can let the RADIUS server assign the virtual IP directly. |
Patch from strongswan/strongswan@a7341cf doesn't work with this config. |
Do you actually have the patch applied correctly? And please post the log. |
This is the build process for OpenWrt. "patching file src/libcharon/network/pf_handler.c" returns no error, so I think it's applied correctly. The patch file is https://github.com/strongswan/strongswan/commit/a7341cf23bc2fd01af7eff729894ff2a7014263b.patch And the ipsec log is:
And dnsmasq log is:
|
Thanks for the details. As far as I can tell, this looks fine. But just to clarify, the settings you posted above are the only place where the |
Like @lowjoel, my config used to work on v23.05. Nothing changed here. I'll try to tinker a bit more, but right now I have no idea what to do. strongswan.conf looks like this:
|
Another thing to check is to bump the logging for NET because that's what the pf_handler messages will appear as AFAIK. I also added logging in the patches I made to make sure the right binary is loaded. To be clear, pf_handler changes need an updated |
OK, it works. My fault: I didn't flash the correct ROM file. I applied the last 2 patches of the "dhcp-receive" strongswan branch. |
@tobiasbrunner, since @DocMAX has tested it and the second patch looks reasonable to me (I haven't had a chance to test), perhaps you can merge it first and then I can open PRs to queue these for OpenWrt 24.10. Or else @DocMAX can open the PRs too, that's fine. |
Great, thanks for testing.
Only the first one is necessary to fix the issue (i.e. to make the existing configuration work again). The other one actually had no effect at all. Because some testing showed that binding packet sockets via |
Fixes openwrt#25801. Adds the following commits to fix DHCP behaviour on Strongswan 5.9.14: - strongswan/strongswan@abbf9d2 - strongswan/strongswan@00d8c36 - strongswan/strongswan@a50ed30 Signed-off-by: Joel Low <joel@joelsplace.sg>
After upgrading to OpenWrt 24.10-rc4, using the exact same swanctl configuration as on 23.05, my clients are no longer getting a DHCP address. This is tested on both the Android strongSwan app and the Windows IPsec VPN. IPv6 static addresses are still being assigned.
I can tell that the updown scripts in /etc/hotplug.d/ipsec are only being called with IPv6 events, i.e. PLUTO_VERB='up-client-v6' and PLUTO_VERB='down-client-v6'. I'm not sure if there's some configuration I'd need to change or if there's something more nefarious going on here.
The logs show that the DISCOVER is sent, dnsmasq received it and returned the OFFER:
I'm attaching my config here for now while I dig into this; I haven't yet isolated which part of the config is responsible or where I should be looking more closely.
/etc/swanctl/conf.d/users.conf
/etc/strongswan.d/charon/dhcp.conf
I did a tcpdump on the affected interface. I observe the DISCOVER but there's no response. I then changed the server setting in charon/dhcp.conf from the broadcast address for the subnet to the default 255.255.255.255 and bound to the interface using the interface directive instead. This time I got an offer, but nothing beyond that.
The logs show that the DHCP conversation then looks like this:
Edit 1: added the connection and dhcp plugin config.
Edit 2: clarified that this affects both Android and Windows clients
Edit 3: tcpdump shows DHCP Discover is being sent but no response
Edit 4: switching to global broadcast address and binding to interface instead gives DISCOVER and OFFER but transaction stops there.
Edit 5: added relevant logs