diff --git a/examples/advanced/job_api/tf/README.md b/examples/advanced/job_api/tf/README.md
index 24561a6289..964c03d0af 100644
--- a/examples/advanced/job_api/tf/README.md
+++ b/examples/advanced/job_api/tf/README.md
@@ -7,9 +7,8 @@ All examples in this folder are based on using [TensorFlow](https://tensorflow.o
 
 ## Simulated Federated Learning with CIFAR10 Using Tensorflow
 
-This example shows `Tensorflow`-based classic Federated Learning
-algorithms, namely FedAvg and FedOpt on CIFAR10
-dataset. This example is analogous to [the example using `Pytorch`
+This example demonstrates TensorFlow-based federated learning algorithms on the CIFAR-10 dataset.
+This example is analogous to [the example using `Pytorch`
 backend](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/cifar10/cifar10-sim)
 on the same dataset, where same experiments
 were conducted and analyzed. You should expect the same
@@ -21,7 +20,7 @@ client-side training logics (details in file
 and the new
 [`FedJob`](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/job_config/api.py)
 APIs were used to programmatically set up an
-`nvflare` job to be exported or ran by simulator (details in file
+NVFlare job to be exported or run by simulator (details in file
 [`tf_fl_script_runner_cifar10.py`](tf_fl_script_runner_cifar10.py)),
 alleviating the need of writing job config files, simplifying
 development process.
@@ -50,10 +49,7 @@ described below at once:
 bash ./run_jobs.sh
 ```
 The CIFAR10 dataset will be downloaded when running any experiment for
-the first time. `Tensorboard` summary logs will be generated during
-any experiment, and you can use `Tensorboard` to visualize the
-training and validation process as the experiment runs. Data split
-files, summary logs and results will be saved in a workspace
+the first time. Data split files, summary logs and results will be saved in a workspace
 directory, which defaults to `/tmp` and can be configured by setting
 `--workspace` argument of the `tf_fl_script_runner_cifar10.py`
 script.
@@ -65,12 +61,8 @@ script.
 > `export TF_FORCE_GPU_ALLOW_GROWTH=true && export
 > TF_GPU_ALLOCATOR=cuda_malloc_asyncp`
 
-The set-up of all experiments in this example are kept the same as
-[the example using `Pytorch`
-backend](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/cifar10/cifar10-sim). Refer
-to the `Pytorch` example for more details. Similar to the Pytorch
-example, we here also use Dirichelet sampling on CIFAR10 data labels
-to simulate data heterogeneity among data splits for different client
+We use Dirichlet sampling (implementation from [FedMA](https://github.com/IBM/FedMA)) on
+CIFAR10 data labels to simulate data heterogeneity among data splits for different client
 sites, controlled by an alpha value, ranging from 0 (not including 0) to 1.
 A high alpha value indicates less data heterogeneity, i.e., an alpha
 value equal to 1.0 would result in homogeneous data distribution
diff --git a/examples/advanced/job_api/tf/run_jobs.sh b/examples/advanced/job_api/tf/run_jobs.sh
index aa41e1424c..8ee0f7dee9 100755
--- a/examples/advanced/job_api/tf/run_jobs.sh
+++ b/examples/advanced/job_api/tf/run_jobs.sh
@@ -25,7 +25,7 @@ GPU_INDX=0
 WORKSPACE=/tmp
 
 # Run centralized training job
-python ./tf_fl_script_executor_cifar10.py \
+python ./tf_fl_script_runner_cifar10.py \
     --algo centralized \
     --n_clients 1 \
     --num_rounds 25 \
@@ -39,7 +39,7 @@ python ./tf_fl_script_executor_cifar10.py \
 
 # Run FedAvg with different alpha values
 for alpha in 1.0 0.5 0.3 0.1; do
-    python ./tf_fl_script_executor_cifar10.py \
+    python ./tf_fl_script_runner_cifar10.py \
        --algo fedavg \
        --n_clients 8 \
        --num_rounds 50 \
@@ -53,7 +53,7 @@ done
 
 
 # Run FedOpt job
-python ./tf_fl_script_executor_cifar10.py \
+python ./tf_fl_script_runner_cifar10.py \
     --algo fedopt \
     --n_clients 8 \
     --num_rounds 50 \
@@ -65,7 +65,7 @@ python ./tf_fl_script_executor_cifar10.py \
 
 
 # Run FedProx job.
-python ./tf_fl_script_executor_cifar10.py \
+python ./tf_fl_script_runner_cifar10.py \
     --algo fedprox \
     --n_clients 8 \
     --num_rounds 50 \
@@ -77,11 +77,11 @@ python ./tf_fl_script_executor_cifar10.py \
 
 
 # Run scaffold job
-python ./tf_fl_script_executor_cifar10.py \
+python ./tf_fl_script_runner_cifar10.py \
     --algo scaffold \
     --n_clients 8 \
     --num_rounds 50 \
     --batch_size 64 \
     --epochs 4 \
     --alpha 0.1 \
-    --gpu $GPU_INDX
\ No newline at end of file
+    --gpu $GPU_INDX
diff --git a/examples/getting_started/tf/README.md b/examples/getting_started/tf/README.md
index 5d4e7de334..032cf6d8dc 100644
--- a/examples/getting_started/tf/README.md
+++ b/examples/getting_started/tf/README.md
@@ -1,18 +1,13 @@
 # Getting Started with NVFlare (TensorFlow)
 [![TensorFlow Logo](https://upload.wikimedia.org/wikipedia/commons/a/ab/TensorFlow_logo.svg)](https://tensorflow.org/)
 
-We provide several examples to quickly get you started using NVFlare's Job API.
+We provide several examples to help you quickly get started with NVFlare.
 All examples in this folder are based on using [TensorFlow](https://tensorflow.org/) as the model training framework.
 
 ## Simulated Federated Learning with CIFAR10 Using Tensorflow
 
-This example shows `Tensorflow`-based classic Federated Learning
-algorithms, namely FedAvg and FedOpt on CIFAR10
-dataset. This example is analogous to [the example using `Pytorch`
-backend](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/cifar10/cifar10-sim)
-on the same dataset, where same experiments
-were conducted and analyzed. You should expect the same
-experimental results when comparing this example with the `Pytorch` one.
+This example demonstrates TensorFlow-based federated learning algorithms,
+FedAvg and FedOpt, on the CIFAR-10 dataset.
 
 In this example, the latest Client APIs were used to implement
 client-side training logics (details in file
@@ -20,7 +15,7 @@ client-side training logics (details in file
 and the new
 [`FedJob`](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/job_config/api.py)
 APIs were used to programmatically set up an
-`nvflare` job to be exported or ran by simulator (details in file
+NVFlare job to be exported or run by simulator (details in file
 [`tf_fl_script_runner_cifar10.py`](tf_fl_script_runner_cifar10.py)),
 alleviating the need of writing job config files, simplifying
 development process.
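The READMEs above describe setting up an NVFlare job programmatically with the `FedJob` API instead of writing job config files. A minimal sketch of that pattern is shown below; it assumes NVFlare's `FedJob`, `FedAvg` controller, and `ScriptRunner` executor wrapper are importable as written (import paths and defaults can differ between NVFlare releases), and the client script path is a placeholder — see `tf_fl_script_runner_cifar10.py` for the actual setup used by these examples.

```python
# Sketch only: programmatic job setup with the FedJob API. Assumes FedJob,
# FedAvg, and ScriptRunner are available under the import paths below
# (they may differ across NVFlare releases).
from nvflare.app_common.workflows.fedavg import FedAvg
from nvflare.job_config.api import FedJob
from nvflare.job_config.script_runner import ScriptRunner

n_clients = 2
num_rounds = 5

job = FedJob(name="cifar10_tf_fedavg")

# Server side: the FedAvg controller orchestrates the training rounds.
job.to(FedAvg(num_clients=n_clients, num_rounds=num_rounds), "server")

# Client side: each site runs the TensorFlow training script.
for i in range(n_clients):
    executor = ScriptRunner(script="src/cifar10_tf_fl.py")  # placeholder script path
    job.to(executor, f"site-{i + 1}")

# Export a job config folder, or run the job directly in the simulator.
job.export_job("/tmp/nvflare/jobs/job_config")
job.simulator_run("/tmp/nvflare/jobs/workdir", gpu="0")
```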
@@ -49,10 +44,7 @@ described below at once:
 bash ./run_jobs.sh
 ```
 The CIFAR10 dataset will be downloaded when running any experiment for
-the first time. `Tensorboard` summary logs will be generated during
-any experiment, and you can use `Tensorboard` to visualize the
-training and validation process as the experiment runs. Data split
-files, summary logs and results will be saved in a workspace
+the first time. Data split files, summary logs and results will be saved in a workspace
 directory, which defaults to `/tmp` and can be configured by setting
 `--workspace` argument of the `tf_fl_script_runner_cifar10.py`
 script.
@@ -64,12 +56,8 @@ script.
 > `export TF_FORCE_GPU_ALLOW_GROWTH=true && export
 > TF_GPU_ALLOCATOR=cuda_malloc_asyncp`
 
-The set-up of all experiments in this example are kept the same as
-[the example using `Pytorch`
-backend](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/cifar10/cifar10-sim). Refer
-to the `Pytorch` example for more details. Similar to the Pytorch
-example, we here also use Dirichelet sampling on CIFAR10 data labels
-to simulate data heterogeneity among data splits for different client
+We use Dirichlet sampling (implementation from [FedMA](https://github.com/IBM/FedMA)) on
+CIFAR10 data labels to simulate data heterogeneity among data splits for different client
 sites, controlled by an alpha value, ranging from 0 (not including 0)
 to 1. A high alpha value indicates less data heterogeneity, i.e., an
 alpha value equal to 1.0 would result in homogeneous data distribution
@@ -111,11 +99,11 @@ for alpha in 1.0 0.5 0.3 0.1; do
 done
 ```
 
-## 2. Results
+## 3. Results
 
 Now let's compare experimental results.
 
-### 2.1 Centralized training vs. FedAvg for homogeneous split
+### 3.1 Centralized training vs. FedAvg for homogeneous split
 
 Let's first compare FedAvg with homogeneous data split (i.e. `alpha=1.0`) and centralized training. As can be seen from the figure
 and table below, FedAvg can achieve similar performance to
@@ -129,7 +117,7 @@ no difference in data distributions among different clients.
 
 ![Central vs. FedAvg](./figs/fedavg-vs-centralized.png)
 
-### 2.2 Impact of client data heterogeneity
+### 3.2 Impact of client data heterogeneity
 
 Here we compare the impact of data heterogeneity by varying the
 `alpha` value, where lower values cause higher heterogeneity. As can
@@ -145,7 +133,7 @@ as data heterogeneity becomes higher.
 
 ![Impact of client data heterogeneity](./figs/fedavg-diff-alphas.png)
 
- 
+
 
 > [!NOTE]
 > More examples can be found at https://nvidia.github.io/NVFlare.
diff --git a/examples/getting_started/tf/nvflare_tf_getting_started.ipynb b/examples/getting_started/tf/nvflare_tf_getting_started.ipynb
index 61afb4f870..a5524f7968 100644
--- a/examples/getting_started/tf/nvflare_tf_getting_started.ipynb
+++ b/examples/getting_started/tf/nvflare_tf_getting_started.ipynb
@@ -254,7 +254,7 @@
     "The `FedJob` is used to define how controllers and executors are placed within a federated job using the `to(object, target)` routine.\n",
     "\n",
     "Here we use a TensorFlow `BaseFedJob`, where we can define the job name and the initial global model.\n",
-    "The `BaseFedJob` automatically configures components for model persistence, model selection, and TensorBoard streaming for convenience."
+    "The `BaseFedJob` automatically configures components for model persistence and model selection for convenience."
    ]
   },
   {
diff --git a/examples/getting_started/tf/run_jobs.sh b/examples/getting_started/tf/run_jobs.sh
index b6bf3c1f8d..495402c196 100755
--- a/examples/getting_started/tf/run_jobs.sh
+++ b/examples/getting_started/tf/run_jobs.sh
@@ -50,38 +50,3 @@ for alpha in 1.0 0.5 0.3 0.1; do
        --workspace $WORKSPACE
 done
 
-
-
-# Run FedOpt job
-python ./tf_fl_script_runner_cifar10.py \
-    --algo fedopt \
-    --n_clients 8 \
-    --num_rounds 50 \
-    --batch_size 64 \
-    --epochs 4 \
-    --alpha 0.1 \
-    --gpu $GPU_INDX \
-    --workspace $WORKSPACE
-
-
-# Run FedProx job.
-python ./tf_fl_script_runner_cifar10.py \
-    --algo fedprox \
-    --n_clients 8 \
-    --num_rounds 50 \
-    --batch_size 64 \
-    --epochs 4 \
-    --fedprox_mu 1e-5 \
-    --alpha 0.1 \
-    --gpu $GPU_INDX
-
-
-# Run scaffold job
-python ./tf_fl_script_runner_cifar10.py \
-    --algo scaffold \
-    --n_clients 8 \
-    --num_rounds 50 \
-    --batch_size 64 \
-    --epochs 4 \
-    --alpha 0.1 \
-    --gpu $GPU_INDX
diff --git a/examples/hello-world/hello-tf/README.md b/examples/hello-world/hello-tf/README.md
index 88049a4f9c..53e862c5d9 100644
--- a/examples/hello-world/hello-tf/README.md
+++ b/examples/hello-world/hello-tf/README.md
@@ -48,7 +48,7 @@ In scenarios where multiple clients are involved, you have to prevent
 TensorFlow by setting the following flags.
 
 ```bash
-TF_FORCE_GPU_ALLOW_GROWTH=true TF_GPU_ALLOCATOR=cuda_malloc_async
+TF_FORCE_GPU_ALLOW_GROWTH=true TF_GPU_ALLOCATOR=cuda_malloc_async python3 fedavg_script_runner_tf.py
 ```
 
 If you possess more GPUs than clients, a good strategy is to run one client on each GPU.
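The READMEs above simulate data heterogeneity by Dirichlet sampling over CIFAR10 labels, with the alpha value controlling how skewed each client's label distribution is (the examples use the FedMA implementation linked above). The NumPy sketch below only illustrates that idea; the function name and defaults are illustrative and are not the examples' actual splitting code.

```python
import numpy as np


def dirichlet_split(labels: np.ndarray, n_clients: int, alpha: float, seed: int = 0):
    """Partition sample indices across clients using a per-class Dirichlet prior.

    Lower alpha -> more skewed (heterogeneous) label distributions per client;
    alpha close to 1.0 -> splits that are much closer to homogeneous.
    Illustrative sketch only, not the FedMA code used by the examples.
    """
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        rng.shuffle(cls_idx)
        # Draw this class's share for each client from Dirichlet(alpha).
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        # Turn the shares into split points over this class's shuffled indices.
        split_points = (np.cumsum(proportions)[:-1] * len(cls_idx)).astype(int)
        for client_id, shard in enumerate(np.split(cls_idx, split_points)):
            client_indices[client_id].extend(shard.tolist())
    return [np.array(idx) for idx in client_indices]


# Example: 8 clients; alpha=0.1 gives highly skewed splits, alpha=1.0 is much closer to uniform.
labels = np.random.randint(0, 10, size=50_000)  # stand-in for the CIFAR10 training labels
splits = dirichlet_split(labels, n_clients=8, alpha=0.1)
print([len(s) for s in splits])
```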