This example uses NVIDIA FLARE to train an image classifier with federated averaging (FedAvg), using TensorFlow as the deep learning training framework.
NOTE: This example uses the MNIST handwritten digits dataset and loads the data within the trainer code.
See the Hello TensorFlow example documentation page for details on this example.
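FedAvg combines the clients' locally trained models into a new global model each round. In its standard formulation, the server averages the client parameters, weighting each client by its number of training samples. The following is a minimal pure-Python sketch of that weighting step. It assumes each model is flattened into a list of floats, and the function name `fedavg` and the client data are hypothetical. It illustrates the technique and is not NVFlare's aggregator code.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]], num_samples: Sequence[int]) -> List[float]:
    """Return the sample-weighted average of the clients' parameter vectors.

    Assumption: each model is a flat list of floats (a real TensorFlow model
    holds a list of weight arrays, one per layer).
    """
    if not client_weights or len(client_weights) != len(num_samples):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, n in zip(client_weights, num_samples):
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        # Each client contributes in proportion to its share of the total data.
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


# Two hypothetical clients: the second holds three times as much data.
print(fedavg([[1.0, 2.0], [3.0, 4.0]], [100, 300]))  # [2.5, 3.5]
```

Because the second client holds 75% of the data, the averaged parameters land three quarters of the way toward its values.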
To run this example with the FLARE API, you can follow the hello_world notebook, or you can quickly get started with the following:
Follow the Installation instructions to install NVFlare.
Install additional requirements (if you already have a specific version of nvflare installed in your environment, you may want to remove nvflare from the requirements to avoid reinstalling it):
pip3 install tensorflow
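To decide whether nvflare needs to be dropped from the requirements, you can first check what is already installed. `pip3 show nvflare` reports the same information. Below is a small sketch that uses only the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the packages this example depends on.
for pkg in ("nvflare", "tensorflow"):
    v = installed_version(pkg)
    print(f"{pkg}: {v if v is not None else 'not installed'}")
```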
Run the script using the job API to create the job and run it with the simulator:
python3 fedavg_script_runner_tf.py
You can find the running logs and results inside the simulator's workspace:
$ ls /tmp/nvflare/jobs/workdir
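The `ls` command above shows only the top level of the workspace. To see every file the simulator wrote, including the ones in nested client and server folders, you can walk the directory recursively. Here is a minimal sketch using `pathlib`. It assumes the default workspace path shown above, and the function name `list_workspace` is hypothetical.

```python
from pathlib import Path
from typing import List


def list_workspace(root: str = "/tmp/nvflare/jobs/workdir") -> List[str]:
    """Return every file under root as a sorted list of paths relative to root.

    Returns an empty list if the workspace does not exist yet.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(str(p.relative_to(base)) for p in base.rglob("*") if p.is_file())


if __name__ == "__main__":
    for rel in list_workspace():
        print(rel)
```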
For running with GPUs, we recommend using the NVIDIA TensorFlow Docker container.
If you choose to run the example on GPUs, note that by default TensorFlow attempts to allocate all available GPU memory at startup. When multiple clients share a GPU, you must prevent this by setting the following environment variables:
TF_FORCE_GPU_ALLOW_GROWTH=true TF_GPU_ALLOCATOR=cuda_malloc_async python3 fedavg_script_runner_tf.py
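If you prefer to set these variables from a Python entry point instead of the shell, they must be in place before TensorFlow initializes the GPU in the process that runs training. Setting them before importing TensorFlow is the safe choice. Here is a minimal sketch. The names `TF_GPU_ENV` and `configure_tf_gpu_env` are hypothetical, and the variable values are the ones given above.

```python
import os

# TensorFlow GPU memory settings from the instructions above.
TF_GPU_ENV = {
    "TF_FORCE_GPU_ALLOW_GROWTH": "true",       # grow GPU memory on demand
    "TF_GPU_ALLOCATOR": "cuda_malloc_async",   # use CUDA's async allocator
}


def configure_tf_gpu_env(overwrite: bool = False) -> dict:
    """Set the TF GPU variables in os.environ and return the values in effect.

    Existing values (e.g. set in the shell) are kept unless overwrite=True.
    """
    for key, value in TF_GPU_ENV.items():
        if overwrite or key not in os.environ:
            os.environ[key] = value
    return {key: os.environ[key] for key in TF_GPU_ENV}


if __name__ == "__main__":
    print(configure_tf_gpu_env())
    # import tensorflow as tf  # import TensorFlow only after the call above
```

By default, the helper keeps values already exported in the shell, so a command-line setting still takes precedence.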
If you have at least as many GPUs as clients, a good strategy is to run one client on each GPU.
This can be achieved with the -gpu argument of the nvflare simulator command, e.g., nvflare simulator -n 2 -gpu 0,1 [job].
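When scripting runs, you can build the one-client-per-GPU command programmatically. The sketch below uses only the -n and -gpu options from the example above. It assigns the first n GPU ids to the n clients. The function name `simulator_command` and the job path `jobs/hello-tf` are hypothetical placeholders.

```python
import shlex
from typing import List, Sequence


def simulator_command(job: str, n_clients: int, gpus: Sequence[int]) -> List[str]:
    """Build an `nvflare simulator` command that gives each client its own GPU.

    Uses only the -n and -gpu options; `job` is the path to the job folder.
    """
    if n_clients < 1:
        raise ValueError("n_clients must be at least 1")
    if len(gpus) < n_clients:
        raise ValueError("one-client-per-GPU needs at least as many GPUs as clients")
    # Take the first n_clients GPU ids so each client maps to exactly one GPU.
    gpu_list = ",".join(str(g) for g in gpus[:n_clients])
    return ["nvflare", "simulator", "-n", str(n_clients), "-gpu", gpu_list, job]


cmd = simulator_command("jobs/hello-tf", n_clients=2, gpus=[0, 1, 2])
print(shlex.join(cmd))  # nvflare simulator -n 2 -gpu 0,1 jobs/hello-tf
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues.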