Code for "Flow: Per-Instance Personalized Federated Learning" (NeurIPS 2023). Flow addresses statistical heterogeneity in federated learning by creating a dynamic personalized model for each input instance via a routing mechanism.

Flow: Per-Instance Personalized Federated Learning

Flow addresses statistical heterogeneity in federated learning by creating a dynamic personalized model for each input instance via a routing mechanism. Our contributions are threefold:

(a) We propose Flow, a per-instance and per-client personalization approach that creates personalized models via dynamic routing, improving both the accuracy of the personalized model and the generalizability of the global model.

(b) We derive convergence analyses for both the global and personalized models, showing how the routing policy influences convergence rates under across- and within-client heterogeneity.

(c) We empirically demonstrate the superiority of Flow in both generalization and personalized accuracy on various vision and language tasks in cross-device FL settings.

Download Datasets

  1. Stackoverflow NWP:

./dataloaders contains a modified data loader for the Stackoverflow dataset, which is hosted on the TensorFlow Federated library.

The ./dataloaders/stackoverflow directory contains the list of all available clients used for training, validation, and testing. This dataloader is for the Stackoverflow Next Word Prediction task with a non-convex RNN model.

  2. Stackoverflow LR:

./dataloaders contains a modified data loader for the Stackoverflow LR dataset.

The ./dataloaders/stackoverflow directory contains the list of all available clients used for training, validation, and testing. This dataloader is for the Stackoverflow Tag Prediction task with a convex logistic regression baseline.

  3. EMNIST:

./dataloaders contains a modified data loader for the EMNIST dataset, which is hosted on the TensorFlow Federated library.

The ./dataloaders/emnist directory contains the list of all available clients used for training, validation, and testing. This dataloader is for the EMNIST Image Classification task with a non-convex CNN baseline.

  4. Shakespeare:

./dataloaders contains a modified data loader for the Shakespeare dataset, which is hosted on the TensorFlow Federated library.

The ./dataloaders/shakespeare directory contains the list of all available clients used for training, validation, and testing. This dataloader is for the Shakespeare Next Character Prediction task with a non-convex RNN baseline.

  5. CIFAR-100 (and CIFAR-10):

./dataloaders contains a modified data loader for the CIFAR-100 dataset, which is hosted on the TensorFlow Federated library.

The ./dataloaders/cifar100 directory contains the list of all available clients used for training, validation, and testing. This dataloader is for the CIFAR-100 Image Classification task with a non-convex CNN baseline.
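Each dataloader directory above describes which clients belong to the training, validation, and testing splits. As a hedged illustration of that idea only (the helper below is hypothetical and not code from this repository), a client-level split can be sketched as:

```python
import random

def split_clients(client_ids, val_frac=0.1, test_frac=0.1, seed=0):
    """Split a list of federated client IDs into train/validation/test groups,
    mirroring the per-client splits the dataloader directories describe."""
    ids = list(client_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for reproducibility
    n_val = int(len(ids) * val_frac)
    n_test = int(len(ids) * test_frac)
    return ids[n_val + n_test:], ids[:n_val], ids[n_val:n_val + n_test]

train, val, test = split_clients([f"client_{i}" for i in range(100)])
```

Note that the split is over clients, not examples: each client's local dataset stays intact, which is what makes the evaluation federated.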

Models

Each dataset/task has a corresponding model in ./models. Each model file contains a model definition and its forward-pass method. The models in ./models and the dataloaders in ./dataloaders are referenced from the client trainer files and server aggregator files inside the ./trainers directory.
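The dataset-to-model mapping can be pictured with a small registry sketch. The module names below are illustrative placeholders, not the actual file names in ./models:

```python
# Hypothetical sketch of how a trainer might look up the model module
# for a dataset; the real module names in ./models may differ.
MODEL_REGISTRY = {
    "stackoverflow_nwp": "models.stackoverflow_rnn",
    "stackoverflow_lr": "models.stackoverflow_logreg",
    "emnist": "models.emnist_cnn",
    "shakespeare": "models.shakespeare_rnn",
    "cifar100": "models.cifar_cnn",
}

def model_module_for(dataset: str) -> str:
    """Return the model module name registered for a dataset."""
    try:
        return MODEL_REGISTRY[dataset]
    except KeyError:
        raise ValueError(f"Unknown dataset: {dataset!r}")
```

A registry like this keeps the trainer files free of per-dataset conditionals.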

Run Baselines and Flow

Each baseline has its own client-side trainer in ./trainers.

Our approach, Flow, is implemented in ./trainers in the same way as its baselines.

The entry point of each standalone federated-training simulation is the client file inside ./trainers.

First, create a checkpoint and results directory under ./results, with a subdirectory named after the method (e.g., ./results/<dataset_name>/<method_name> to store the results of the FedAvg baseline run from ./trainers/client_fedavg.py). This directory stores the intermediate client/server results.

From the root directory, run the following command to start a simulation:

python3 ./trainers/client_<baseline name>.py --dataset <dataset name>

<dataset name> can be one of "stackoverflow_nwp", "stackoverflow_lr", "emnist", "shakespeare", "cifar100", or "synthetic".

One can change the hyperparameters for any trainer/dataset combination in the ./configs/hyperparameters.py file.
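A hyperparameter file of this kind is typically a per-dataset lookup table. The keys and values below are illustrative assumptions, not the actual contents of ./configs/hyperparameters.py:

```python
# Hypothetical sketch of a per-dataset hyperparameter table; every key and
# value here is an example, not a setting taken from the repository.
HYPERPARAMETERS = {
    "emnist": {
        "clients_per_round": 10,
        "local_epochs": 1,
        "batch_size": 20,
        "client_lr": 0.1,
        "server_lr": 1.0,
    },
}

def get_hparams(dataset: str) -> dict:
    # Return a copy so callers cannot mutate the shared table.
    return dict(HYPERPARAMETERS[dataset])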

Single Client Tests

Flow is built on the Flower framework for PyTorch.

Because of the way Flower prints logs to the terminal and handles errors/exceptions, debugging some parts of the code can be difficult. The easiest solution is to run the same functions for a single client.

Hence, in each client_<baseline trainer name>.py file, one can comment out the fl.simulation.start_simulation call and uncomment the single-client execution code below it. ./trainers/client_fedavg.py contains such code chunks for all the datasets; similar single-client tests can be added to the other baseline trainers.
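The debugging pattern described above can be sketched as follows. DummyClient is a stand-in written for this example; the repository's real clients implement Flower's NumPyClient interface, whose fit method takes parameters and a config and returns updated parameters, an example count, and a metrics dict:

```python
# Hedged sketch of the single-client debugging pattern: bypass
# fl.simulation.start_simulation and call one client's fit() directly,
# so exceptions and print() output surface normally in the terminal.
class DummyClient:
    def fit(self, parameters, config):
        # Local training would happen here; this stand-in just echoes
        # the parameters and reports a placeholder metric.
        updated = [p * 1.0 for p in parameters]
        return updated, len(parameters), {"loss": 0.0}

client = DummyClient()
params, num_examples, metrics = client.fit([0.5, -0.5], {"epochs": 1})
```

Because nothing runs inside Flower's simulation engine here, a debugger or a plain stack trace works exactly as it would in any ordinary Python script.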
