Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mwan3: mwan3track via default routing table and use procd from mwan3track & mwan3rtmon #13169

Merged
merged 12 commits into from
Oct 21, 2020

Conversation

aaronjg
Copy link
Contributor

@aaronjg aaronjg commented Aug 19, 2020

Maintainer: @feckert
Compile tested: 19.07.3-ipq806x and snapshot-7673df79 MT76x8
Run tested: 19.07.3 and snapshot - 7673df7

Description:
This P/R has a few fixes for the 19.07.3

  1. Use the default routing tables for all mwan3track outgoing requests. This fixes some issues on the 5.4.x kernel where packets are not routed properly unless they go through a routing table that has the gateway. This also allows us to remove some hacks that were in place for ipv6 mwan3track
  2. Comply with source based routing by copying over the default routes from the main routing table rather than creating them anew.
  3. Use procd to manage the mwan3rtmon and mwan3track scripts.

This will fix many of the issues with tracking packets originating from the wrong interface or with the wrong IP address. Fixes #8139, fixes #10712, fixes #11406 and fixes #12836,

Also some other open issues. Fixes #13003 and fixes #13200

@aaronjg
Copy link
Contributor Author

aaronjg commented Aug 19, 2020

Under my testing, with the 5.4.x kernel, and mwan3 2.9.0, if the default policy action for 0.0.0.0/0 is 'unreachable', then when an interface is down mwan3track will not be able to send out pings (or other tracking packets). It is important that this is merged before the OpenWRT 20.x.x stable release.

Not sure what the version bump should be on this. From the user perspective, it should be seamless, but it is a decent sized change internally, and adds dependencies on libcap-bin and iptables-mod-extra. Also, libcap-bin is not available as a separate package on 19.07.3, so if this were to be made available on the 19.x release, it would require backporting 4cb51d9 to 19.x.

To test on 19.x, I recompiled libcap with:
CONFIG_PACKAGE_libcap-bin=y

@aaronjg aaronjg changed the title [WIP] Mwan3track via default routing table and use procd from mwan3track & mwan3rtmon mwan3: [WIP] Mwan3track via default routing table and use procd from mwan3track & mwan3rtmon Aug 19, 2020
@feckert feckert changed the title mwan3: [WIP] Mwan3track via default routing table and use procd from mwan3track & mwan3rtmon [WIP] mwan3: mwan3track via default routing table and use procd from mwan3track & mwan3rtmon Aug 19, 2020
@feckert feckert self-requested a review August 19, 2020 13:46
@feckert feckert self-assigned this Aug 19, 2020
@feckert feckert added the bug label Aug 19, 2020
@aaronjg aaronjg force-pushed the mwan3-owner-procd branch 5 times, most recently from 7959799 to 229d923 Compare August 20, 2020 01:00
@openwrtdiy
Copy link

An error occurred during compilation, luci-app-mwan3 could not be found

kokang@virtual-machine:~/openwrt-19.07$ make menuconfig
Collecting package info: done
Collecting target info: done
tmp/.config-package.in:108142:error: recursive dependency detected!
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
tmp/.config-package.in:108142: symbol PACKAGE_iptables is selected by PACKAGE_mwan3
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
tmp/.config-package.in:114086: symbol PACKAGE_mwan3 is selected by PACKAGE_luci-app-mwan3
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
tmp/.config-package.in:73569: symbol PACKAGE_luci-app-mwan3 depends on PACKAGE_libcap
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
tmp/.config-package.in:65163: symbol PACKAGE_libcap is selected by PACKAGE_ip-full
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
tmp/.config-package.in:113997: symbol PACKAGE_ip-full is selected by PACKAGE_libreswan
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
tmp/.config-package.in:127684: symbol PACKAGE_libreswan depends on PACKAGE_iptables

Your configuration changes were NOT saved.

@aaronjg
Copy link
Contributor Author

aaronjg commented Aug 20, 2020

kokang@virtual-machine:~/openwrt-19.07$ make menuconfig

This requires the libcap-bin package, which is not available on 19.x at this time. See this:

#13169 (comment)

If you want to build the whole 19.07 image, you need to comment out the dependency on the separate libcap-bin package dependency.

@jamesmacwhite
Copy link
Contributor

@openwrt-diy For quick and dirty testing, to save a recompile, I found you can use the packages from snapshot: https://downloads.openwrt.org/snapshots/packages and install libcap and libcap-bin with opkg manually on 19.07. Make sure to replace libcap as well, otherwise things will be broken. A clean recompile is of course better, but to shortcut testing it does work on 19.07. There doesn't appear to be any kernel dependency so you get away with it.

Unless the libcap-bin change is backported to 19.07, it creates a bit of a problem for having this in 19.07 currently.

@openwrtdiy
Copy link

@openwrt-diy For quick and dirty testing, to save a recompile, I found you can use the packages from snapshot: https://downloads.openwrt.org/snapshots/packages and install libcap and libcap-bin with opkg manually on 19.07. Make sure to replace libcap as well, otherwise things will be broken. A clean recompile is of course better, but to shortcut testing it does work on 19.07. There doesn't appear to be any kernel dependency so you get away with it.

Unless the libcap-bin change is backported to 19.07, it creates a bit of a problem for having this in 19.07 currently.

I tested last night on openwrt-19.07, and I removed and replaced mwan3 and libcap version numbers. During compilation, mwan3 manually associates libcap and libcap bin, and the compilation process is very smooth. The result is: mwan3 page display interface is all closed.

I will compile the test again in an hour.

@aaronjg
Copy link
Contributor Author

aaronjg commented Aug 21, 2020

The result is: mwan3 page display interface is all closed.

Please look through the mwan3 configuration guide. I think there may be an issue with your setup.
https://openwrt.org/docs/guide-user/network/wan/multiwan/mwan3

:!: IMPORTANT: :!: Before doing anything with mwan3 (installing or configuring), ensure that each WAN interface is working and that the default routing table is correctly configured for each WAN connection.

@aaronjg aaronjg force-pushed the mwan3-owner-procd branch 5 times, most recently from 94282b4 to 91a113f Compare August 21, 2020 13:40
@aaronjg
Copy link
Contributor Author

aaronjg commented Aug 21, 2020

Did you add the default routes? From here: #13145 (comment), it looked like you only have default routes on the wan and wanb. This would explain why the other interfaces can't access the internet.

As I said before, your set up can be accomplished with just the main routing table and does not require mwan3. If you are interested in helping to test this P/R, could you please add some load balancing or failover policies?

@openwrtdiy
Copy link

Did you add the default routes? From here: #13145 (comment), it looked like you only have default routes on the wan and wanb. This would explain why the other interfaces can't access the internet.

As I said before, your set up can be accomplished with just the main routing table and does not require mwan3. If you are interested in helping to test this P/R, could you please add some load balancing or failover policies?

Load balancing or fail over strategy, I tested on the current openwrt and mwan3 versions without any problems.

@aaronjg aaronjg force-pushed the mwan3-owner-procd branch from 91a113f to 22ca579 Compare August 22, 2020 17:14
@aaronjg
Copy link
Contributor Author

aaronjg commented Aug 22, 2020

I just updated this P/R with a better solution to the 5.4.x issue. Rather than depending on iptables-mod-extra and libcapsh-bin, I wrote a small helper library to handle this.

The helper library will intercept calls to socket and then bind to the device, set the fwmark to the openwrt default and bind to the source ip address.

This provides a consistent method for binding to a device rather than relying on various packages potentially buggy implementations. For example: #8139 and #12836

This helper issue also allows for more tracking methods to be added even if they do not have a command line option to bind to device, such as iperf3 (eg #13050).

I've tested with ping, nping, arping and httping and all work well.

I reordered the commits so this commit is adjacent to the mwan3 owner commit. If this solution works well, I will squash these two commits together to eliminate the mwan3 owner commit.

@jamesmacwhite
Copy link
Contributor

@aaronjg Thanks for this!

I recently experimented with L2TP and it does seem pretty broken currently even under 19.07.x. I believe the commit here to use default routing table helps the situation a lot.

1f6c49c

I tested out an L2TP offering here: https://www.aa.net.uk/broadband/l2tp-service/ and they happened to have to some guidance on setting it up with OpenWrt: https://support.aa.net.uk/L2TP_Client:_OpenWRT, so I gave it a try.

This uses a combination of PPP and DHCPv6. With mwan3 enabled is seems to causes issues with being able to get the IPv6 prefix delegated across the tunnel, when it's not enabled DHCPv6 works OK. It would seem mwan3 may be blocking the DHCPv6 related traffic somewhere.

@openwrtdiy
Copy link

openwrtdiy commented Aug 23, 2020

@aaronjg Thanks for this!

I recently experimented with L2TP and it does seem pretty broken currently even under 19.07.x. I believe the commit here to use default routing table helps the situation a lot.

1f6c49c

I tested out an L2TP offering here: https://www.aa.net.uk/broadband/l2tp-service/ and they happened to have to some guidance on setting it up with OpenWrt: https://support.aa.net.uk/L2TP_Client:_OpenWRT, so I gave it a try.

This uses a combination of PPP and DHCPv6. With mwan3 enabled is seems to causes issues with being able to get the IPv6 prefix delegated across the tunnel, when it's not enabled DHCPv6 works OK. It would seem mwan3 may be blocking the DHCPv6 related traffic somewhere.

@jamesmacwhite

There are many problems in using L2TP / PPTP and wireguard at the same time. I can only choose one of them now.

1f6c49c also has problems: when one of Wan and wanb interfaces is offline, L2TP / PPTP and wireguard will automatically connect to the online interface, which will cause the risk of abnormal IP of user terminal.

22ca579 also has a problem: when L2TP / PPTP interface is offline, mwan3 status page is displayed online!

@jamesmacwhite
Copy link
Contributor

Potentially you could use static routes and metrics to make sure Wireguard or L2TP connects via a specific WAN to control that problem. I do exactly this, to ensure Wireguard/VPN traffic only goes through a specific WAN, rather my backup WAN which is data limited. I have noticed the routing table tends to sometimes get random routes across different WANs added, which can cause issues.

Having mwan3track use the default routing table and the source routing change would improve the situation a lot though. A lot of the time I think the problem with VPNs is due to the lack of gateway causing mwan3 to mark the interface as dead.

aaronjg and others added 12 commits October 16, 2020 09:54
Signed-off-by: Aaron Goodman <aaronjg@stanford.edu>
Signed-off-by: Aaron Goodman <aaronjg@stanford.edu>
handle creation of routing tables in mwan3rtmon to avoid race
conditions and potentially missing routes

handle ipv6 routes that have expiry

update directly connected ipset when routes are added or deleted

add fall through rules so that the default routing table is not
used if no rule in the interface-specific routing table matches

add option to comply with mwan3 source based routing

get default route parameters from main routing table

Signed-off-by: Aaron Goodman <aaronjg@stanford.edu>
Signed-off-by: Aaron Goodman <aaronjg@stanford.edu>
correctly terminate interface status checks with new lines so that
interface status does not get confused when one interface is a prefix
of another interface.

Signed-off-by: Aaron Goodman <aaronjg@stanford.edu>
start all mwan3mon and mwan3track instances on mwan3 start
if an interface is down when mwan3track starts, it waits
for a signal from the hotplug script to start

procd can then handle stopping all of the scripts when mwan3
is halted

Signed-off-by: Aaron Goodman <aaronjg@stanford.edu>
also use global IPv4_REGEX environment variable as consistent IPv4 regex

Signed-off-by: Aaron Goodman <aaronjg@stanford.edu>
Rather than using a special mwan3 user to manage mwan3track's tracking
packets, this commit implements a small helper library to bind to
device and to set a fwmark so that the tracking packets can be routed
out of the correct interface.

This provides a consistent method for binding to a device rather than
relying on various packages potentially buggy implementations. For
example: openwrt#8139 and openwrt#12836

This helper issue also allows for more tracking methods to be added
even if they do not have a command line option to bind to device,
such as iperf3 (eg  openwrt#13050).

Signed-off-by: Aaron Goodman <aaronjg@stanford.edu>
Use "mwan3 use" to wrap a command with interface bindings so that you can
avoid the mwan3 rules and test behavior on a specific interface.

eg "mwan3 use wan ping -c1 1.1.1.1"

Additional binding arguments to the command will have their system
calls intercepted and ignored.

eg "mwan3 use wan ping -c1 -I tun0 1.1.1.1" will use the
device associated with "wan", rather than "tun0".

Signed-off-by: Aaron Goodman <aaronjg@stanford.edu>
With the new wrapper code, we can override the broken binding behavior of
iputils ping v20101006.

Signed-off-by: Aaron Goodman <aaronjg@stanford.edu>
- reduce duplicate logging code
- simplify nping track code
- simplify ping result parsing

Signed-off-by: Aaron Goodman <aaronjg@stanford.edu>
Signed-off-by: James White <james@jmwhite.co.uk>
@aaronjg
Copy link
Contributor Author

aaronjg commented Oct 16, 2020

Thanks @wackejohn . I just force pushed a fix for that. I also changed "mwan3 wrap" to "mwan3 use" per luizluca's suggestion (argument should tell what is being done, not how it is being done), and rebased off of master.

CI appears to be broken at the moment with all current P/Rs failing Test Builds. However I did not change any of the C code between this and the last push, so this should not block.

@feckert
Copy link
Member

feckert commented Oct 21, 2020

@luizluca regarding #13655 (comment) thanks for your detailed comments. But from my point of view I would like to follow the LD_PRELOAD approach. This has been in development and testing for 2 months now. Therefore I would like to add it to the mwan3 now, as it has advantages from my point of view.

@feckert feckert merged commit 9485b94 into openwrt:master Oct 21, 2020
@feckert
Copy link
Member

feckert commented Oct 21, 2020

I thank @jamesmacwhite and @wackejohn for testing and especially @aaronjg for his contribution to the improvement of the mwan3 👍

@luizluca
Copy link
Contributor

@luizluca regarding #13655 (comment) thanks for your detailed comments. But from my point of view I would like to follow the LD_PRELOAD approach. This has been in development and testing for 2 months now. Therefore I would like to add it to the mwan3 now, as it has advantages from my point of view.

That's fine, it's your call. Anyway, if anything does sideways, we have already a second strategy to try. Thanks for your hard work!

@ptpt52
Copy link
Contributor

ptpt52 commented Oct 22, 2020

@aaronjg @feckert I found some bug:

change wan interface track_ip from 1.1.1.1 to 1.2.3.4(expected it can not accessed)
and expected it go down but not.

@ptpt52
Copy link
Contributor

ptpt52 commented Oct 22, 2020

in fact any changes on mwan3 interface track_ip (time count interval ... ) would not apply

@ptpt52
Copy link
Contributor

ptpt52 commented Oct 22, 2020

and mwan3 use not work as expected

@ptpt52
Copy link
Contributor

ptpt52 commented Oct 22, 2020

mwan3 use wan ping -c1 1.1.1.1
/usr/sbin/mwan3: line 155: use: not found

@ptpt52
Copy link
Contributor

ptpt52 commented Oct 22, 2020

when mwan3track get ifup event it should reload all config from mwan3

@aaronjg
Copy link
Contributor Author

aaronjg commented Oct 22, 2020

when mwan3track get ifup event it should reload all config from mwan3

Chen, what is the use case for this? This behavior changed when we moved mwan3 to use OpenWRT's procd service framework. Other packages require "/etc/init.d/ [reload|restart]" to apply configuration changes. Changes to "/etc/config/mwan3" will cause flash writes, so should be minimized. Are you scripting changes to this? What is the objective? Perhaps there is another way that this can be done.

@ptpt52
Copy link
Contributor

ptpt52 commented Oct 22, 2020 via email

@feckert
Copy link
Member

feckert commented Oct 22, 2020

The configuration is actually not re-read when I change it and call /sbin/reload_config.
I think we have to extend the /etc/init.d/mwan3 with the following code lines.

reload_service() {
       stop
       sleep 1
       start
}

@aaronjg
Copy link
Contributor Author

aaronjg commented Oct 22, 2020

I just change config via web luci and not work as expected

That's definitely a common use case :)

@feckert Yes, that appears to be correct. Potentially we could also be a bit smarter about the reload and send a signal to the mwan3track process, but let's try to get something in ASAP to fix this.

Is the sleep 1 needed? I think the stop process will wait until all the procd processes are done before it returns.

@ptpt52
Copy link
Contributor

ptpt52 commented Oct 23, 2020

but for now mwan3track get the ifup event it do not reload the mwan3 config

@wulfy23
Copy link
Contributor

wulfy23 commented Oct 23, 2020

warning: abstract blabber herein...

init.d > /tmp/state < > *track >< ubus monitor || call reload_config ?

was pretty impressed with the mwan codebase while poking around... great work guys... it is a tad unusual with multiple procd processes within init.d and any watch|lock|state routines ...

above is a random observation from my perspective... ( from memory, lol, without properly looking at the code again... i.e. use state to have track talk to procd more directly ... or something along those lines anyway ), the goal being more of a 'flush/sync' than a stop-start... or at least a 'pre-stop' cleaner flush and dump... or something like that anyway...

2c

@aaronjg
Copy link
Contributor Author

aaronjg commented Oct 25, 2020

warning: abstract blabber herein...

Sorry, I'm not sure I follow.That command does not work for me. Could you explain your proposed solution?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment