Weighted loadbalancing (client steering), WG peer config push to clients #87

Merged
merged 8 commits into from
Jan 9, 2024

@DasSkelett DasSkelett commented Aug 24, 2022

In-progress work to add weighted loadbalancing between workers/gateways to wgkex. This allows steering clients between gateways for better load distribution. Relatedly, parts of the WireGuard config (endpoint address+port, pubkey, gateway wg interface address) are now pushed to clients in the exchange API response.

See #39 and #49 and freifunkMUC/site-ffm#142

Overview

Each worker periodically scans for the number of connected peers for each interface/domain, and publishes the value through MQTT.
Each worker publishes its WireGuard PubKey, WireGuard listening port, WireGuard interface address and externally resolvable hostname/public IP address (read from config) through MQTT.

The broker stores the metrics and worker data. When a client hits the exchange endpoint, the broker selects the best worker using a simple weighting algorithm.
The PubKey, interface address and external address+port for this worker are returned in the response to the client, like:

{
  "Endpoint": {
    "Address": "gw04.ext.ffmuc.net",
    "Port": "40011",
    "AllowedIPs": [
      "fe80::27c:16ff:fec0:6c74"
    ],
    "PublicKey": "TszFS3oFRdhsJP3K0VOlklGMGYZy+oFCtlaghXJqW2g="
  }
}
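
On the client side, a response like the one above could be consumed roughly as follows. This is only a sketch: the `GatewayEndpoint` class and `parse_exchange_response` function are hypothetical names for illustration and are not part of wgkex or the Gluon client package. The field names are taken from the example response.

```python
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GatewayEndpoint:
    """Hypothetical container for the gateway data returned by the broker."""

    address: str
    port: int
    allowed_ips: Tuple[str, ...]
    public_key: str


def parse_exchange_response(body: str) -> GatewayEndpoint:
    """Parse the JSON body of an exchange response into a GatewayEndpoint."""
    endpoint = json.loads(body)["Endpoint"]
    return GatewayEndpoint(
        address=endpoint["Address"],
        # The example response carries the port as a string, so convert it.
        port=int(endpoint["Port"]),
        allowed_ips=tuple(endpoint["AllowedIPs"]),
        public_key=endpoint["PublicKey"],
    )


example_body = """
{
  "Endpoint": {
    "Address": "gw04.ext.ffmuc.net",
    "Port": "40011",
    "AllowedIPs": ["fe80::27c:16ff:fec0:6c74"],
    "PublicKey": "TszFS3oFRdhsJP3K0VOlklGMGYZy+oFCtlaghXJqW2g="
  }
}
"""

gateway = parse_exchange_response(example_body)
print(f"{gateway.address}:{gateway.port}")
```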

The weighting algorithm

Based on the total number of connected peers, the target (should-be) value for each worker is calculated as target = (worker_weight / sum_of_weights) * total_peers.
Then the difference from the actual number of connections is calculated: diff = actual_peers - target.
The worker list is then sorted by diff, and the worker with the lowest value is chosen (usually below 0, i.e. "clients missing").
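
A minimal sketch of the selection described above, not the code from this PR. The `Worker` class, the `pick_worker` function and the example numbers are made up for illustration. Taking the minimum is equivalent to sorting by diff and picking the first entry; how ties are broken is an assumption here (the first worker in the list wins).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    """Hypothetical view of what the broker knows about one worker."""

    name: str
    weight: int
    connected_peers: int


def pick_worker(workers: List[Worker]) -> Worker:
    """Return the worker whose peer count is furthest below its target."""
    if not workers:
        raise ValueError("no workers available")
    total_weight = sum(w.weight for w in workers)
    if total_weight <= 0:
        raise ValueError("sum of weights must be positive")
    total_peers = sum(w.connected_peers for w in workers)

    def diff(worker: Worker) -> float:
        # target = (worker_weight / sum_of_weights) * total_peers
        target = (worker.weight / total_weight) * total_peers
        # diff = actual_peers - target; negative means "clients missing"
        return worker.connected_peers - target

    return min(workers, key=diff)


workers = [
    Worker(name="gw01", weight=2, connected_peers=100),  # target 80, diff +20
    Worker(name="gw02", weight=1, connected_peers=20),   # target 40, diff -20
]
print(pick_worker(workers).name)
```

With these numbers gw02 is 20 peers below its target, so the next client is steered there.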

MQTT topics

Publishing keys broker->worker: wireguard/{domain}/{worker}
Publishing metrics worker->broker: wireguard-metrics/{domain}/{worker}/connected_peers
Publishing worker status: wireguard-worker/{worker}/status
Publishing worker data: wireguard-worker/{worker}/{domain}/data
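
The topic layout above can be expressed as format strings, sketched below. The constant names and the `parse_connected_peers_topic` helper are hypothetical, and the sketch assumes that domain and worker names never contain a `/`. `+` is the standard MQTT single-level wildcard, so filling it into a template yields a subscription pattern.

```python
from typing import Tuple

# Topic templates as listed above; the constant names are illustrative only.
TOPIC_WORKER_WG_DATA = "wireguard/{domain}/{worker}"
TOPIC_CONNECTED_PEERS = "wireguard-metrics/{domain}/{worker}/connected_peers"
TOPIC_WORKER_STATUS = "wireguard-worker/{worker}/status"
TOPIC_WORKER_DATA = "wireguard-worker/{worker}/{domain}/data"


def parse_connected_peers_topic(topic: str) -> Tuple[str, str]:
    """Split a connected_peers metrics topic into (domain, worker)."""
    parts = topic.split("/")
    if (
        len(parts) != 4
        or parts[0] != "wireguard-metrics"
        or parts[3] != "connected_peers"
    ):
        raise ValueError(f"not a connected_peers topic: {topic!r}")
    return parts[1], parts[2]


# Subscription pattern matching the metrics of every domain and worker.
all_metrics = TOPIC_CONNECTED_PEERS.format(domain="+", worker="+")
print(all_metrics)

# "example_domain" and "gw04" are placeholder names.
topic = TOPIC_CONNECTED_PEERS.format(domain="example_domain", worker="gw04")
print(parse_connected_peers_topic(topic))
```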

Other cleanup

  • README additions & build fixes #89
  • The tests now use the absolute import paths wgkex.{worker,config,common,broker}.*, which allows running most tests with python3 -m unittest as well. The exception is netlink_test.py, where the mocking does not fully work yet.
    This required changes in the BUILD files, and the mocking code.
  • The config is now loaded and required in many more places. To avoid performance regressions from constant disk re-reads and re-parsing in load_config(), the config system has been refactored.
    The config is read, parsed and converted into a Config object only once, at first use.
    The Config class is now used as primary access to the config values, instead of using the bare dict.
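
The "parse once at first use" idea can be sketched as below. This is not the refactored wgkex config code: the `Config` fields, the `get_config` name and the use of JSON (chosen here only to keep the example standard-library-only) are assumptions for illustration.

```python
import functools
import json
import os
import tempfile
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Config:
    """Hypothetical typed view of the config; the fields are examples only."""

    domains: Tuple[str, ...]
    external_name: str

    @classmethod
    def from_dict(cls, raw: dict) -> "Config":
        return cls(
            domains=tuple(raw["domains"]),
            external_name=raw["external_name"],
        )


@functools.lru_cache(maxsize=None)
def get_config(path: str) -> Config:
    """Read and parse the file on the first call, then serve the cached object."""
    with open(path, encoding="utf-8") as handle:
        return Config.from_dict(json.load(handle))


# Demonstration with a throwaway file holding placeholder values.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(
        {"domains": ["example_domain"], "external_name": "gw04.ext.ffmuc.net"},
        tmp,
    )
    path = tmp.name

first = get_config(path)
os.unlink(path)            # the file is gone ...
second = get_config(path)  # ... but no second disk read is needed
print(first is second)
```

Callers then use attributes such as `get_config(path).external_name` instead of indexing into a bare dict.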

TODO:

  • More testing, multiple interfaces
  • Code cleanup & quality improvements & documentation
  • Unit tests
  • Split parts into separate PRs for easier review

Possible after merge in future iterations:

  • Set client.suppress_exceptions on worker to avoid crashes when errors are raised.
  • Monitor performance and make sure there are no performance regressions with all the new loops and dicts
  • Write the client code for this in gluon-mesh-wireguard-vxlan, so the clients actually choose the returned gateway. This can be done afterwards, though.
  • Sign JSON responses, verify signatures in client code

Closes #39

@DasSkelett DasSkelett force-pushed the loadbalancing branch 10 times, most recently from ff80afd to 80a963f Compare August 28, 2022 15:34

DasSkelett commented Sep 7, 2022

Ugh, I might have to redesign the worker background threads for metrics etc.
Judging from eclipse/paho.mqtt.python#354, the library is not thread-safe: calling client.publish simultaneously from multiple threads causes a deadlock.

Although at the bottom it says that version 1.6 has improved in this regard.
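
One conventional way to keep background threads from calling publish at the same time is to funnel every call through a single lock, sketched below. This is not what the PR ended up doing, and whether it avoids the deadlock from the linked issue is untested. `SerializedPublisher` is a hypothetical wrapper, and `fake_publish` stands in for the real client's publish method so the example runs without an MQTT broker.

```python
import threading
from typing import Callable, List, Tuple


class SerializedPublisher:
    """Hypothetical wrapper that lets only one thread publish at a time."""

    def __init__(self, publish: Callable[[str, str], None]) -> None:
        self._publish = publish
        self._lock = threading.Lock()

    def publish(self, topic: str, payload: str) -> None:
        # Hold the lock for the whole call so publishes never overlap.
        with self._lock:
            self._publish(topic, payload)


sent: List[Tuple[str, str]] = []


def fake_publish(topic: str, payload: str) -> None:
    # Stand-in for the MQTT client's publish call.
    sent.append((topic, payload))


publisher = SerializedPublisher(fake_publish)


def metrics_loop(worker: str) -> None:
    # Each background thread publishes 50 placeholder metric values.
    for count in range(50):
        publisher.publish(
            f"wireguard-metrics/example_domain/{worker}/connected_peers",
            str(count),
        )


threads = [
    threading.Thread(target=metrics_loop, args=(f"gw0{i}",)) for i in range(4)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(len(sent))
```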

awlx commented Oct 10, 2022

In the future we should add another JSON response field called Signature to verify that the reply originated from an authoritative broker.

@DasSkelett DasSkelett force-pushed the loadbalancing branch 4 times, most recently from e0503fb to 120655a Compare October 30, 2022 15:40
@DasSkelett DasSkelett changed the title [WIP] Weighted loadbalancing, config push to clients Weighted loadbalancing, config push to clients Nov 20, 2022
@DasSkelett DasSkelett marked this pull request as ready for review November 20, 2022 14:11
@GoliathLabs

@DasSkelett would it be possible to improve the test coverage?

@awlx awlx left a comment

I think we should test it soon.

grische commented Dec 5, 2023

@DasSkelett can you check the conflicts?

@DasSkelett DasSkelett force-pushed the loadbalancing branch 3 times, most recently from 4e61b52 to 413bcaf Compare December 17, 2023 21:27
@DasSkelett DasSkelett force-pushed the loadbalancing branch 4 times, most recently from 7c85e5f to 0a65399 Compare January 7, 2024 16:31
* Workers publish their number of connected peers per domain
* Workers publish their status, i.e. up or down
* The new /api/v2/exchange endpoint returns a predetermined gateway endpoint for clients
* This gateway is chosen based on weighted loadbalancing between online workers/gateways
* Fetch worker data through netlink and publish with MQTT:
  * Read worker pubkey, port and link address from interface.
  * Publish it together with the external domain / address (read from the config file) via MQTT to the broker.
@DasSkelett

Even more bugs fixed from the old code, pretty happy with it now. A test deploy of the broker on one of our hosts was fine.

@DasSkelett DasSkelett merged commit 11213a5 into freifunkMUC:main Jan 9, 2024
5 checks passed
@DasSkelett DasSkelett deleted the loadbalancing branch January 9, 2024 18:24
@DasSkelett DasSkelett changed the title Weighted loadbalancing, config push to clients Weighted loadbalancing (client steering), WG peer config push to clients Jan 16, 2024
grische added a commit to grische/site-ffm that referenced this pull request Mar 15, 2024
The new version of ffmuc-mesh-vpn-wireguard-vxlan supports load-balancing
of clients using wgkex.

See freifunkMUC/wgkex#87 for details.
grische added a commit to grische/site-ffm that referenced this pull request Mar 27, 2024
The new version of ffmuc-mesh-vpn-wireguard-vxlan supports load-balancing
of clients using wgkex.

For details, see
- freifunkMUC/wgkex#87
- freifunk-gluon/community-packages#100
- freifunk-gluon/community-packages#101
- freifunk-gluon/community-packages#102
github-actions bot pushed a commit to freifunkMUC/site-ffm that referenced this pull request Mar 27, 2024
The new version of ffmuc-mesh-vpn-wireguard-vxlan supports load-balancing
of clients using wgkex.

For details, see
- freifunkMUC/wgkex#87
- freifunk-gluon/community-packages#100
- freifunk-gluon/community-packages#101
- freifunk-gluon/community-packages#102

(cherry picked from commit fc42990)
grische added a commit to grische/site-ffm that referenced this pull request Apr 6, 2024
The new version of ffmuc-mesh-vpn-wireguard-vxlan supports load-balancing
of clients using wgkex.

For details, see
- freifunkMUC/wgkex#87
- freifunk-gluon/community-packages#100
- freifunk-gluon/community-packages#101
- freifunk-gluon/community-packages#102
Development

Successfully merging this pull request may close these issues.

Make communication bi-directional between daemon-worker
4 participants