Merge branch 'main' into dependabot/pip/baselines/fjord/tqdm-4.66.3
tanertopal authored May 12, 2024
2 parents e2b6b59 + d66b324 commit 1b9a098
Showing 62 changed files with 1,481 additions and 537 deletions.
52 changes: 52 additions & 0 deletions .github/workflows/docker-serverapp.yml
@@ -0,0 +1,52 @@
name: Build docker ServerApp image

on:
  workflow_dispatch:
    inputs:
      flwr-version:
        description: "Version of Flower."
        required: true
        type: string

permissions:
  contents: read

jobs:
  build-serverapp-images:
    name: Build images
    uses: ./.github/workflows/_docker-build.yml
    # run only on default branch when using it with workflow_dispatch
    if: github.ref_name == github.event.repository.default_branch
    strategy:
      fail-fast: false
      matrix:
        image: [
          {
            py-version: "3.8",
            tags: "${{ github.event.inputs.flwr-version }}-py3.8-ubuntu22.04"
          },
          {
            py-version: "3.9",
            tags: "${{ github.event.inputs.flwr-version }}-py3.9-ubuntu22.04"
          },
          {
            py-version: "3.10",
            tags: "${{ github.event.inputs.flwr-version }}-py3.10-ubuntu22.04"
          },
          {
            py-version: "3.11",
            # those are two tags <version>-py3.11-ubuntu22.04 and <version> separated by a \n
            tags: "${{ github.event.inputs.flwr-version }}-py3.11-ubuntu22.04\n${{ github.event.inputs.flwr-version }}"
          },
        ]
    with:
      namespace-repository: flwr/serverapp
      file-dir: src/docker/serverapp
      build-args: |
        FLWR_VERSION=${{ github.event.inputs.flwr-version }}
        PYTHON_VERSION=${{ matrix.image.py-version }}
        UBUNTU_VERSION=ubuntu22.04
      tags: ${{ matrix.image.tags }}
    secrets:
      dockerhub-user: ${{ secrets.DOCKERHUB_USERNAME }}
      dockerhub-token: ${{ secrets.DOCKERHUB_TOKEN }}
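The tag matrix above can be read as a small function of the Flower version. The following sketch is plain Python and not part of the workflow; `serverapp_tags` is a hypothetical helper name, and it assumes the reusable `_docker-build.yml` workflow treats the newline-separated `tags` value as a list of tags.

```python
from typing import Dict, List

PY_VERSIONS = ["3.8", "3.9", "3.10", "3.11"]
UBUNTU_VERSION = "ubuntu22.04"


def serverapp_tags(flwr_version: str) -> Dict[str, List[str]]:
    """Map each Python version in the matrix to the image tags it would receive."""
    tags: Dict[str, List[str]] = {}
    for py_version in PY_VERSIONS:
        entry = [f"{flwr_version}-py{py_version}-{UBUNTU_VERSION}"]
        if py_version == "3.11":
            # The 3.11 entry additionally carries the bare version tag.
            entry.append(flwr_version)
        tags[py_version] = entry
    return tags


if __name__ == "__main__":
    for py_version, entry in serverapp_tags("1.9.0").items():
        print(py_version, entry)
```

Only the Python 3.11 image receives the bare `<version>` tag, so pulling `flwr/serverapp:<version>` would resolve to that build.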
3 changes: 2 additions & 1 deletion .github/workflows/release-nightly.yml
@@ -48,7 +48,8 @@ jobs:
       matrix:
         images: [
           { repository: "flwr/superlink", file-dir: "src/docker/superlink" },
-          { repository: "flwr/supernode", file-dir: "src/docker/supernode" }
+          { repository: "flwr/supernode", file-dir: "src/docker/supernode" },
+          { repository: "flwr/serverapp", file-dir: "src/docker/serverapp" }
         ]
     with:
       namespace-repository: ${{ matrix.images.repository }}
2 changes: 1 addition & 1 deletion README.md
@@ -143,7 +143,7 @@ Other [examples](https://github.com/adap/flower/tree/main/examples):
 - [PyTorch: From Centralized to Federated](https://github.com/adap/flower/tree/main/examples/pytorch-from-centralized-to-federated)
 - [Vertical FL](https://github.com/adap/flower/tree/main/examples/vertical-fl)
 - [Federated Finetuning of OpenAI's Whisper](https://github.com/adap/flower/tree/main/examples/whisper-federated-finetuning)
-- [Federated Finetuning of Large Language Model](https://github.com/adap/flower/tree/main/examples/fedllm-finetune)
+- [Federated Finetuning of Large Language Model](https://github.com/adap/flower/tree/main/examples/llm-flowertune)
 - [Federated Finetuning of a Vision Transformer](https://github.com/adap/flower/tree/main/examples/vit-finetune)
 - [Advanced Flower with TensorFlow/Keras](https://github.com/adap/flower/tree/main/examples/advanced-tensorflow)
 - [Advanced Flower with PyTorch](https://github.com/adap/flower/tree/main/examples/advanced-pytorch)
72 changes: 67 additions & 5 deletions datasets/flwr_datasets/partitioner/natural_id_partitioner.py
@@ -17,19 +17,52 @@

 from typing import Dict

+import numpy as np
+from tqdm import tqdm
+
 import datasets
+from flwr_datasets.common.typing import NDArrayInt
 from flwr_datasets.partitioner.partitioner import Partitioner


 class NaturalIdPartitioner(Partitioner):
-    """Partitioner for dataset that can be divided by a reference to id in dataset."""
+    """Partitioner for a dataset that can be divided by a column with partition ids.
+
+    Parameters
+    ----------
+    partition_by: str
+        The name of the column that contains the unique values of partitions.
+
+    Examples
+    --------
+    "flwrlabs/shakespeare" dataset
+    >>> from flwr_datasets import FederatedDataset
+    >>> from flwr_datasets.partitioner import NaturalIdPartitioner
+    >>>
+    >>> partitioner = NaturalIdPartitioner(partition_by="character_id")
+    >>> fds = FederatedDataset(dataset="flwrlabs/shakespeare",
+    ...                        partitioners={"train": partitioner})
+    >>> partition = fds.load_partition(0)
+
+    "sentiment140" (aka Twitter) dataset
+    >>> from flwr_datasets import FederatedDataset
+    >>> from flwr_datasets.partitioner import NaturalIdPartitioner
+    >>>
+    >>> partitioner = NaturalIdPartitioner(partition_by="user")
+    >>> fds = FederatedDataset(dataset="sentiment140",
+    ...                        partitioners={"train": partitioner})
+    >>> partition = fds.load_partition(0)
+    """

     def __init__(
         self,
         partition_by: str,
     ):
         super().__init__()
         self._partition_id_to_natural_id: Dict[int, str] = {}
+        self._natural_id_to_partition_id: Dict[str, int] = {}
+        self._partition_id_to_indices: Dict[int, NDArrayInt] = {}
         self._partition_by = partition_by

     def _create_int_partition_id_to_natural_id(self) -> None:
@@ -42,6 +75,33 @@ def _create_int_partition_id_to_natural_id(self) -> None:
             zip(range(len(unique_natural_ids)), unique_natural_ids)
         )

+    def _create_natural_id_to_int_partition_id(self) -> None:
+        """Create a mapping from unique client ids from dataset to int indices.
+
+        Natural ids come from the column specified in `partition_by`. This object is
+        the inverse of `self._partition_id_to_natural_id`. This method assumes that
+        `self._partition_id_to_natural_id` already exists.
+        """
+        self._natural_id_to_partition_id = {
+            value: key for key, value in self._partition_id_to_natural_id.items()
+        }
+
+    def _create_partition_id_to_indices(self) -> None:
+        natural_id_to_indices = {}  # type: ignore
+        natural_ids = np.array(self.dataset[self._partition_by])
+
+        for index, natural_id in tqdm(
+            enumerate(natural_ids), desc="Generating partition_id_to_indices"
+        ):
+            if natural_id not in natural_id_to_indices:
+                natural_id_to_indices[natural_id] = []
+            natural_id_to_indices[natural_id].append(index)
+
+        self._partition_id_to_indices = {
+            self._natural_id_to_partition_id[natural_id]: indices
+            for natural_id, indices in natural_id_to_indices.items()
+        }
+
     def load_partition(self, partition_id: int) -> datasets.Dataset:
         """Load a single partition corresponding to a single `partition_id`.
@@ -60,17 +120,19 @@ def load_partition(self, partition_id: int) -> datasets.Dataset:
         """
         if len(self._partition_id_to_natural_id) == 0:
             self._create_int_partition_id_to_natural_id()
+            self._create_natural_id_to_int_partition_id()

-        return self.dataset.filter(
-            lambda row: row[self._partition_by]
-            == self._partition_id_to_natural_id[partition_id]
-        )
+        if len(self._partition_id_to_indices) == 0:
+            self._create_partition_id_to_indices()
+
+        return self.dataset.select(self._partition_id_to_indices[partition_id])

     @property
     def num_partitions(self) -> int:
         """Total number of partitions."""
         if len(self._partition_id_to_natural_id) == 0:
             self._create_int_partition_id_to_natural_id()
+            self._create_natural_id_to_int_partition_id()
         return len(self._partition_id_to_natural_id)

     @property
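The change above replaces a per-partition `dataset.filter` scan with a single pass that groups row indices by natural id, so that later `load_partition` calls reduce to `dataset.select`. A minimal, library-free sketch of that grouping follows. `group_indices_by_natural_id` is a hypothetical helper, and it assigns partition ids in sorted order of the natural ids, which is an assumption made here for determinism rather than something the diff shows.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def group_indices_by_natural_id(
    natural_ids: Sequence[Hashable],
) -> Dict[int, List[int]]:
    """Return {partition_id: [row indices]} built in one pass over the column."""
    natural_id_to_indices: Dict[Hashable, List[int]] = defaultdict(list)
    for index, natural_id in enumerate(natural_ids):
        natural_id_to_indices[natural_id].append(index)

    # Assumption: partition ids follow the sorted order of the natural ids.
    ordered_ids = sorted(natural_id_to_indices, key=str)
    return {
        partition_id: natural_id_to_indices[natural_id]
        for partition_id, natural_id in enumerate(ordered_ids)
    }


if __name__ == "__main__":
    column = ["romeo", "juliet", "romeo", "hamlet", "juliet"]
    print(group_indices_by_natural_id(column))
```

The grouping costs one pass over the column regardless of how many partitions are loaded afterwards, whereas the removed `filter` call rescanned the whole dataset for every partition.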
1 change: 1 addition & 0 deletions datasets/pyproject.toml
@@ -58,6 +58,7 @@ datasets = "^2.14.6"
 pillow = { version = ">=6.2.1", optional = true }
 soundfile = { version = ">=0.12.1", optional = true }
 librosa = { version = ">=0.10.0.post2", optional = true }
+tqdm = "^4.66.1"

 [tool.poetry.dev-dependencies]
 isort = "==5.13.2"
6 changes: 6 additions & 0 deletions doc/source/conf.py
@@ -123,6 +123,12 @@
 # The full name is still at the top of the page
 add_module_names = False

+# Customizations for the sphinx_copybutton extension
+# Omit prompt text when copying code blocks
+copybutton_prompt_text = "$ "
+# Copy all lines when line continuation character is detected
+copybutton_line_continuation_character = "\\"
+

 def find_test_modules(package_path):
     """Go through the python files and exclude every *_test.py file."""
74 changes: 74 additions & 0 deletions doc/source/how-to-authenticate-supernodes.rst
@@ -0,0 +1,74 @@
Authenticate SuperNodes
=======================

Flower has built-in support for authenticated SuperNodes that you can use to verify the identities of each SuperNode connecting to a SuperLink.
Flower node authentication works similarly to how GitHub SSH authentication works:

* SuperLink (server) stores a list of known (client) node public keys
* Using ECDH, both SuperNode and SuperLink independently derive a shared secret
* Shared secret is used to compute the HMAC value of the message sent from SuperNode to SuperLink as a token
* SuperLink verifies the token
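The last two steps of the list above can be sketched with Python's standard library. The sketch assumes the ECDH exchange has already produced the same `shared_secret` bytes on both sides and uses SHA-256 as the HMAC digest. Both the digest choice and the helper names `compute_token` and `verify_token` are illustrative assumptions, not Flower's actual implementation.

```python
import hashlib
import hmac


def compute_token(shared_secret: bytes, message: bytes) -> bytes:
    """SuperNode side: HMAC of the outgoing message, keyed by the shared secret."""
    return hmac.new(shared_secret, message, hashlib.sha256).digest()


def verify_token(shared_secret: bytes, message: bytes, token: bytes) -> bool:
    """SuperLink side: recompute the HMAC and compare in constant time."""
    expected = hmac.new(shared_secret, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, token)


if __name__ == "__main__":
    # Placeholder bytes standing in for the ECDH-derived secret.
    secret = b"placeholder-secret-derived-via-ecdh"
    msg = b"serialized request bytes"
    token = compute_token(secret, msg)
    print(verify_token(secret, msg, token))         # untouched message
    print(verify_token(secret, msg + b"x", token))  # tampered message
```

Because the token depends on both the message and the shared secret, a node whose public key is unknown to the SuperLink cannot produce a token that verifies.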

We recommend checking out the complete `code example <https://github.com/adap/flower/tree/main/examples/flower-client-authentication>`_ demonstrating federated learning with Flower in an authenticated setting.

.. note::
    This guide covers a preview feature that might change in future versions of Flower.

.. note::
    For increased security, node authentication can only be used when encrypted connections (SSL/TLS) are enabled.

Enable node authentication in :code:`SuperLink`
-----------------------------------------------

To enable node authentication, first you need to configure SSL/TLS connections to secure the SuperLink<>SuperNode communication. You can find the complete guide
`here <https://flower.ai/docs/framework/how-to-enable-ssl-connections.html>`_.
After configuring secure connections, you can enable client authentication in a long-running Flower :code:`SuperLink`.
Use the following terminal command to start a Flower :code:`SuperLink` that has both secure connections and node authentication enabled:

.. code-block:: bash

    flower-superlink \
        --certificates certificates/ca.crt certificates/server.pem certificates/server.key \
        --require-client-authentication ./keys/client_public_keys.csv ./keys/server_credentials ./keys/server_credentials.pub

Let's break down the :code:`--require-client-authentication` flag:

1. The first argument is a path to a CSV file storing all known node public keys. You need to store all known node public keys that are allowed to participate in a federation in one CSV file (:code:`.csv`).

   A valid CSV file storing known node public keys should list the keys in OpenSSH format, separated by commas and without any comments. For an example, refer to our code sample, which contains a CSV file with two known node public keys.

2. The second and third arguments are paths to the server's private and public keys. For development purposes, you can generate a private and public key pair using :code:`ssh-keygen -t ecdsa -b 384`.

.. note::
    In Flower 1.9, there is no support for dynamically removing, editing, or adding known node public keys to the SuperLink.
    To change the set of known nodes, you need to shut the server down, edit the CSV file, and start the server again.
    Support for dynamically changing the set of known nodes is on the roadmap to be released in Flower 1.10 (ETA: June).
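As a rough illustration of the CSV layout described above, the following sketch reads a comma-separated list of public keys and applies a loose shape check. It assumes each field holds one OpenSSH public key in `<key-type> <base64-blob>` form with no trailing comment. `read_known_public_keys` and `looks_like_openssh_key` are hypothetical helpers, not Flower APIs, and the key blobs in the usage example are placeholders rather than real keys.

```python
import csv
import io
from typing import List


def read_known_public_keys(csv_text: str) -> List[str]:
    """Return every non-empty comma-separated field as one public key string."""
    keys: List[str] = []
    for row in csv.reader(io.StringIO(csv_text)):
        for field in row:
            key = field.strip()
            if key:
                keys.append(key)
    return keys


def looks_like_openssh_key(key: str) -> bool:
    """Loose shape check: '<key-type> <base64-blob>' with no trailing comment."""
    parts = key.split()
    return len(parts) == 2 and parts[0].startswith(("ssh-", "ecdsa-"))


if __name__ == "__main__":
    # Placeholder blobs; a real file would contain full base64-encoded keys.
    sample = "ecdsa-sha2-nistp384 AAAAplaceholder1,ecdsa-sha2-nistp384 AAAAplaceholder2\n"
    for key in read_known_public_keys(sample):
        print(looks_like_openssh_key(key), key)
```

A key copied together with its `user@host` comment would fail the shape check, which matches the guide's requirement that the file contain no comments.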


Enable node authentication in :code:`SuperNode`
-------------------------------------------------

Similar to the long-running Flower server (:code:`SuperLink`), you can easily enable node authentication in the long-running Flower client (:code:`SuperNode`).
Use the following terminal command to start an authenticated :code:`SuperNode`:

.. code-block:: bash

    flower-client-app client:app \
        --root-certificates certificates/ca.crt \
        --server 127.0.0.1:9092 \
        --authentication-keys ./keys/client_credentials ./keys/client_credentials.pub

The :code:`--authentication-keys` flag expects two arguments: a path to the node's private key file and a path to the node's public key file. For development purposes, you can generate a private and public key pair using :code:`ssh-keygen -t ecdsa -b 384`.


Security notice
---------------

The system's security relies on the credentials of the SuperLink and each SuperNode. Therefore, it is imperative to safeguard and safely store the credentials to avoid security risks such as Public Key Infrastructure (PKI) impersonation attacks.
The node authentication mechanism also involves human interaction, so please ensure that all of the communication is done in a secure manner, using trusted communication methods.


Conclusion
----------

You should now have learned how to start a long-running Flower server (:code:`SuperLink`) and client (:code:`SuperNode`) with node authentication enabled. You should also understand why the private key matters and why it must be stored safely to minimize security risks.
72 changes: 24 additions & 48 deletions doc/source/how-to-enable-ssl-connections.rst
@@ -1,14 +1,14 @@
 Enable SSL connections
 ======================

-This guide describes how to a SSL-enabled secure Flower server can be started and
-how a Flower client can establish a secure connections to it.
+This guide describes how an SSL-enabled secure Flower server (:code:`SuperLink`) can be started and
+how a Flower client (:code:`SuperNode`) can establish a secure connection to it.

 A complete code example demonstrating a secure connection can be found
 `here <https://github.com/adap/flower/tree/main/examples/advanced-tensorflow>`_.

-The code example comes with a README.md file which will explain how to start it. Although it is
-already SSL-enabled, it might be less descriptive on how. Stick to this guide for a deeper
+The code example comes with a :code:`README.md` file which explains how to start it. Although it is
+already SSL-enabled, it might be less descriptive on how it does so. Stick to this guide for a deeper
 introduction to the topic.


@@ -19,7 +19,6 @@ Using SSL-enabled connections requires certificates to be passed to the server a
 the purpose of this guide we are going to generate self-signed certificates. As this can become
 quite complex we are going to ask you to run the script in
 :code:`examples/advanced-tensorflow/certificates/generate.sh`
-
 with the following command sequence:

 .. code-block:: bash
@@ -29,67 +28,44 @@ with the following command sequence:
 This will generate the certificates in :code:`examples/advanced-tensorflow/.cache/certificates`.

-The approach how the SSL certificates are generated in this example can serve as an inspiration and
-starting point but should not be taken as complete for production environments. Please refer to other
+The approach for generating SSL certificates in the context of this example can serve as an inspiration and
+starting point, but it should not be used as a reference for production environments. Please refer to other
 sources regarding the issue of correctly generating certificates for production environments.
+For non-critical prototyping or research projects, it might be sufficient to use the self-signed certificates generated using
+the scripts mentioned in this guide.

-In case you are a researcher you might be just fine using the self-signed certificates generated using
-the scripts which are part of this guide.
-

-Server
-------
-
-We are now going to show how to write a sever which uses the previously generated scripts.
-
-.. code-block:: python
+Server (SuperLink)
+------------------

-    from pathlib import Path
-    import flwr as fl
+Use the following terminal command to start a server (SuperLink) that uses the previously generated certificates:

-    # Start server
-    fl.server.start_server(
-        server_address="0.0.0.0:8080",
-        config=fl.server.ServerConfig(num_rounds=4),
-        certificates=(
-            Path(".cache/certificates/ca.crt").read_bytes(),
-            Path(".cache/certificates/server.pem").read_bytes(),
-            Path(".cache/certificates/server.key").read_bytes(),
-        )
-    )
-
-When providing certificates, the server expects a tuple of three certificates. :code:`Path` can be used to easily read the contents of those files into byte strings, which is the data type :code:`start_server` expects.
+.. code-block:: bash

+    flower-superlink --certificates certificates/ca.crt certificates/server.pem certificates/server.key

-Client
-------
+When providing certificates, the server expects three paths: the CA certificate, the server certificate, and the server private key.

-We are now going to show how to write a client which uses the previously generated scripts:

-.. code-block:: python
+Client (SuperNode)
+------------------

-    from pathlib import Path
-    import flwr as fl
+Use the following terminal command to start a client (SuperNode) that uses the previously generated certificates:

-    # Define client somewhere
-    client = MyFlowerClient()
+.. code-block:: bash

-    # Start client
-    fl.client.start_client(
-        "localhost:8080",
-        client=client.to_client(),
-        root_certificates=Path(".cache/certificates/ca.crt").read_bytes(),
-    )
+    flower-client-app client:app \
+        --root-certificates certificates/ca.crt \
+        --server 127.0.0.1:9092

-When setting :code:`root_certificates`, the client expects the PEM-encoded root certificates as a byte string.
-We are again using :code:`Path` to simplify reading those as byte strings.
+When setting :code:`--root-certificates`, the client expects a file path to PEM-encoded root certificates.


 Conclusion
 ----------

-You should now have learned how to generate self-signed certificates using the given script, start a
-SSL-enabled server, and have a client establish a secure connection to it.
+You should now have learned how to generate self-signed certificates using the given script, start an
+SSL-enabled server and have a client establish a secure connection to it.


 Additional resources