Address VDR report #2335

Merged (6 commits, Feb 2, 2024)
2 changes: 1 addition & 1 deletion docs/flare_overview.rst
@@ -26,7 +26,7 @@ Built for productivity
FLARE is designed for maximum productivity, providing a range of tools to enhance user experience and research efficiency at different stages of the development process:

- **FLARE Client API:** Enables users to transition seamlessly from ML/DL to FL with just a few lines of code changes.
- **Simulator CLI:** Allows users to simulate federated learning or computing jobs in multi-thread settings within a single computer, offering quick response and debugging. The same job can be deployed directly to production.
- **Simulator CLI:** Allows users to simulate federated learning or computing jobs in multi-process settings within a single computer, offering quick response and debugging. The same job can be deployed directly to production.
- **POC CLI:** Facilitates the simulation of federated learning or computing jobs in multi-process settings within one computer. Different processes represent server, clients, and an admin console, providing users with a realistic sense of the federated network. It also allows users to simulate project deployment on a single host.
- **Job CLI:** Permits users to create and submit jobs directly in POC or production environments.
- **FLARE API:** Enables users to run jobs directly from Python code or notebooks.
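The "few lines of code changes" promise of the FLARE Client API can be sketched as follows. Note that ``flare`` below is a minimal local stand-in written for illustration, not the real ``nvflare.client`` package; it only mirrors the init/receive/send call shape described above.

```python
# Illustrative sketch of the Client API usage pattern (assumed shape,
# not the real nvflare.client package).

class _StubFlare:
    """Stand-in for nvflare.client: hands out a model and collects the update."""
    def __init__(self, global_weights):
        self._weights = global_weights
        self.received = None

    def init(self):
        pass

    def receive(self):
        # In real FLARE, this blocks until the server sends the global model.
        return {"weights": dict(self._weights)}

    def send(self, model):
        # In real FLARE, this ships the locally trained model back to the server.
        self.received = model

flare = _StubFlare({"w": 1.0})

def local_train(weights):
    # Plain local training step -- the existing ML/DL code stays unchanged.
    return {k: v + 0.5 for k, v in weights.items()}

# The "few lines of change": wrap existing training with receive/send.
flare.init()
input_model = flare.receive()
new_weights = local_train(input_model["weights"])
flare.send({"weights": new_weights})
print(flare.received["weights"]["w"])  # 1.5
```

The same local training function runs unmodified; only the receive/send wrapper is federated-learning specific.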
13 changes: 11 additions & 2 deletions docs/getting_started.rst
@@ -66,6 +66,11 @@ Installation
.. note::
The server and client versions of nvflare must match; cross-version compatibility is not supported.

Supported Operating Systems
---------------------------
- Linux
- OSX (Note: some optional dependencies are not compatible, such as tenseal and openmined.psi)

Python Version
--------------

@@ -120,7 +125,6 @@ You may find that the pip and setuptools versions in the venv need updating:
(nvflare-env) $ python3 -m pip install -U pip
(nvflare-env) $ python3 -m pip install -U setuptools


Install Stable Release
----------------------

@@ -130,6 +134,11 @@ Stable releases are available on `NVIDIA FLARE PyPI <https://pypi.org/project/nv

$ python3 -m pip install nvflare

.. note::

In addition to the dependencies installed with nvflare, many of our example applications require additional packages.
Make sure to install from any requirements.txt files before running the examples.
See :github_nvflare_link:`nvflare/app_opt <nvflare/app_opt>` for modules and components with optional dependencies.

.. _containerized_deployment:

@@ -213,7 +222,7 @@ Production mode is secure with TLS certificates - depending the choice the deplo

- HA or non-HA
- Local or remote
- On-premise or on cloud
- On-premise or on cloud (See :ref:`cloud_deployment`)

In non-HA, secure, local mode (all clients and the server running on the same host), production mode is very similar to POC mode, except that it is secure.

11 changes: 7 additions & 4 deletions docs/index.rst
@@ -48,18 +48,21 @@ and simulation to real-world production deployment. Some of the key components
- **Management tools** for secure provisioning and deployment, orchestration, and management
- **Specification-based API** for extensibility

Learn more in the :ref:`FLARE Overview <flare_overview>`, :ref:`What's New <whats_new>`, and the
:ref:`User Guide <user_guide>` and :ref:`Programming Guide <programming_guide>`.
Learn more about FLARE features in the :ref:`FLARE Overview <flare_overview>` and :ref:`What's New <whats_new>`.

Getting Started
===============
For first-time users and FL researchers, FLARE provides the :ref:`FL Simulator <fl_simulator>` that allows you to build, test, and deploy applications locally.
The :ref:`Getting Started <getting_started>` guide covers installation and walks through an example application using the FL Simulator.
Additional examples can be found in :ref:`Examples Applications <example_applications_algorithms>`, which showcase different federated learning workflows and algorithms on various machine learning and deep learning tasks.

FLARE for Users
===============
If you want to learn how to interact with the FLARE system, please refer to the :ref:`User Guide <user_guide>`.
When you are ready for a secure, distributed deployment, the :ref:`Real World Federated Learning <real_world_fl>` section covers the tools and process
required to deploy and operate a secure, real-world FLARE project.

FLARE for Developers
====================
When you're ready to build your own application, the :ref:`Programming Best Practices <best_practices>`, :ref:`FAQ<faq>`, and
:ref:`Programming Guide <programming_guide>` give an in depth look at the FLARE platform and APIs.
When you're ready to build your own application, the :ref:`Programming Guide <programming_guide>`, :ref:`Programming Best Practices <best_practices>`, :ref:`FAQ<faq>`, and :ref:`API Reference <apidocs/modules>`
give an in depth look at the FLARE platform and APIs.
25 changes: 18 additions & 7 deletions docs/programming_guide/experiment_tracking.rst
@@ -37,6 +37,13 @@ provided examples, the Receiver is on the FL server, but it could also be on the
- Server-side experiment tracking can also organize different clients' results into different experiment runs so they can be easily
compared side-by-side.

.. note::

This page covers experiment tracking using :class:`LogWriters <nvflare.app_common.tracking.log_writer.LogWriter>`,
which are configured and used with :ref:`executor` or :ref:`model_learner` on the FLARE-side code.
However, if using the Client API, refer to :ref:`client_api` and :ref:`nvflare.client.tracking` for adding experiment tracking to your custom training code.


**************************************
Tools, Sender, LogWriter and Receivers
**************************************
@@ -60,9 +67,9 @@ where the actual experiment logs are recorded. The components that receive
these logs are called Receivers based on :class:`AnalyticsReceiver <nvflare.app_common.widgets.streaming.AnalyticsReceiver>`.
The receiver component leverages the experiment tracking tool and records the logs during the experiment run.

In a normal setting, we would have pairs of sender and receivers, such as:
In a normal setting, we would have pairs of sender and receivers, with some provided implementations in :mod:`nvflare.app_opt.tracking`:

- TBWriter <-> TBReceiver
- TBWriter <-> TBAnalyticsReceiver
- MLflowWriter <-> MLflowReceiver
- WandBWriter <-> WandBReceiver
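The writer/receiver decoupling behind these pairs can be sketched with a toy example. The names below are hypothetical stand-ins; the real pairs (e.g. ``MLflowWriter`` and ``MLflowReceiver`` in :mod:`nvflare.app_opt.tracking`) stream records over FLARE's event mechanism rather than a local queue.

```python
# Toy sketch of the writer -> receiver decoupling described above.
# ToyWriter/ToyReceiver are illustrative names, not real FLARE classes.

import queue

event_stream = queue.Queue()  # stands in for FLARE's event/streaming layer

class ToyWriter:
    """Writer side: mimics the tracking tool's logging API."""
    def log_metric(self, key, value, step):
        # Instead of writing locally, forward the record to the stream.
        event_stream.put({"key": key, "value": value, "step": step})

class ToyReceiver:
    """Receiver side: records the streamed logs with the actual tool."""
    def __init__(self):
        self.records = []

    def drain(self):
        while not event_stream.empty():
            self.records.append(event_stream.get())

writer = ToyWriter()
writer.log_metric("accuracy", 0.91, step=1)
writer.log_metric("accuracy", 0.93, step=2)

receiver = ToyReceiver()
receiver.drain()
print(len(receiver.records))  # 2
```

The key point is that the training code only ever talks to the writer; where the logs end up is decided entirely by which receiver is configured.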

@@ -94,13 +101,11 @@ There are three things to consider for developing a custom experiment tracking t
Data Type
=========

Currently, the supported data types are metrics, params, and text. If you require other data types, may sure you add
the type to :class:`AnalyticsDataType <nvflare.apis.analytix.AnalyticsDataType>`.
Currently, the supported data types are listed in :class:`AnalyticsDataType <nvflare.apis.analytix.AnalyticsDataType>`, and other data types can be added as needed.
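Adding a new data type amounts to extending the enum. The sketch below is purely illustrative: the real enum is :class:`AnalyticsDataType <nvflare.apis.analytix.AnalyticsDataType>`, and the members shown here are representative, not an exact copy of its contents.

```python
# Sketch of extending an analytics data-type enum with a custom member.
# ToyAnalyticsDataType is a hypothetical stand-in for the real enum.

from enum import Enum

class ToyAnalyticsDataType(Enum):
    METRIC = "METRIC"
    PARAMETER = "PARAMETER"
    TEXT = "TEXT"
    # A custom type would be added alongside the existing members:
    HISTOGRAM = "HISTOGRAM"

# Writers and receivers can then dispatch on the new member.
print([m.name for m in ToyAnalyticsDataType])
```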

Writer
======

Implement LogWriter interface with the API syntax. For each tool, we mimic the API syntax of the underlying tool,
Implement :class:`LogWriter <nvflare.app_common.tracking.log_writer.LogWriter>` interface with the API syntax. For each tool, we mimic the API syntax of the underlying tool,
so users can use what they are familiar with without learning a new API.
For example, for Tensorboard, TBWriter uses add_scalar() and add_scalars(); for MLflow, the syntax is
log_metric(), log_metrics(), log_parameter(), and log_parameters(); for W&B, the writer just has log().
@@ -109,7 +114,7 @@ The data collected with these calls will all send to the AnalyticsSender to deli
Receiver
========

Implement AnalyticsReceiver interface and determine how to represent different sites' logs. In all three implementations
Implement :class:`AnalyticsReceiver <nvflare.app_common.widgets.streaming.AnalyticsReceiver>` interface and determine how to represent different sites' logs. In all three implementations
(Tensorboard, MLflow, WandB), each site's log is represented as one run. Depending on the individual tool, the implementation
can be different. For example, for both Tensorboard and MLflow, we create different runs for each client and map to the
site name. In the WandB implementation, we have to leverage multiprocess and let each run in a different process.
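The site-to-run bookkeeping described above can be sketched with a plain dict. ``ToyRunReceiver`` is an illustrative stand-in, not the real :class:`AnalyticsReceiver <nvflare.app_common.widgets.streaming.AnalyticsReceiver>` implementation.

```python
# Sketch of receiver-side bookkeeping: each site's streamed logs are
# grouped into that site's own "run", keyed by site name.
# ToyRunReceiver is a hypothetical illustration, not a FLARE class.

from collections import defaultdict

class ToyRunReceiver:
    def __init__(self):
        self.runs = defaultdict(list)  # site name -> list of log records

    def save(self, record, record_origin):
        # record_origin carries the originating site's name.
        self.runs[record_origin].append(record)

receiver = ToyRunReceiver()
receiver.save({"accuracy": 0.90}, "site-1")
receiver.save({"accuracy": 0.88}, "site-2")
receiver.save({"accuracy": 0.92}, "site-1")

print(sorted(receiver.runs))         # ['site-1', 'site-2']
print(len(receiver.runs["site-1"]))  # 2
```

A concrete receiver would replace the dict with calls into the tracking tool, e.g. one Tensorboard run directory or one MLflow run per site.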
@@ -121,13 +126,19 @@ Examples Overview
The :github_nvflare_link:`experiment tracking examples <examples/advanced/experiment-tracking>`
illustrate how to leverage different writers and receivers. All examples are based upon the hello-pt example.

TensorBoard
===========
The example in the "tensorboard" directory shows how to use the Tensorboard Tracking Tool (for both the
sender and receiver). See :ref:`tensorboard_streaming` for details.

MLflow
======
Under the "mlflow" directory, the "hello-pt-mlflow" job shows how to use MLflow for tracking with both the MLflow sender
and receiver. The "hello-pt-tb-mlflow" job shows how to use the Tensorboard Sender, while the receiver is MLflow.
See :ref:`experiment_tracking_mlflow` for details.

Weights & Biases
================
Under the :github_nvflare_link:`wandb <examples/advanced/experiment-tracking/wandb>` directory, the
"hello-pt-wandb" job shows how to use Weights and Biases for experiment tracking with
the WandBWriter and WandBReceiver to log metrics.
33 changes: 20 additions & 13 deletions docs/user_guide/nvflare_cli/fl_simulator.rst
@@ -49,7 +49,7 @@ Command examples
Run a single NVFlare app
========================

This command will run the same ``hello-numpy-sag`` app on the server and 8 clients using 1 thread. The client names will be site-1, site-2, ... , site-8:
This command will run the same ``hello-numpy-sag`` app on the server and 8 clients using 1 process. The client names will be site-1, site-2, ... , site-8:

.. code-block:: python

@@ -829,22 +829,29 @@ application run.
status = run_simulator(args)
sys.exit(status)

****************************
Threads, Clients, and Events
****************************
******************************
Processes, Clients, and Events
******************************

Specifying threads
==================
The simulator ``-t`` option provides the ability to specify how many threads to run the simulator with.
Specifying number of processes
==============================
The simulator ``-t`` option provides the ability to specify how many processes to run the simulator with.

When you run the simulator with ``-t 1``, there is only one client active and running at a time, and the clients will be running in
turn. This is to enable the simulation of large number of clients using a single machine with limited resources.
.. note::

The ``-t`` and ``--threads`` options are named for the simulator's original implementation, in which clients ran in separate threads.
However, each client now actually runs in a separate process. This distinction does not affect the user experience.

- N = number of clients (``-n``)
- T = number of processes (``-t``)

Note that if you have fewer threads than the number of clients, ClientRunner/learner object will go thorugh setup and
teardown in every round.
When running the simulator with fewer processes than clients (T < N), the simulator will need to swap clients in and out of the available processes, so some clients run sequentially as processes become available.
This also causes the ClientRunner/learner objects to go through setup and teardown in every round.
Using T < N is only needed when simulating a large number of clients on a single machine with limited resources.

With ``-t=num_client``, the simulator will run the number of clients in separate threads at the same time. Each
client will always be running in memory with no swap_in / swap_out, but it will require more resources available.
In most cases, run the simulator with the same number of processes as clients (T = N). The simulator will then run all clients in separate processes at the same time. Each
client always stays in memory with no swap-in/out, but this requires more resources to be available.
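The T < N batching behaviour can be sketched with a few lines of arithmetic. This is an illustration of the scheduling idea only, not the simulator's actual swap-in/out implementation.

```python
# Sketch of how N clients share T simulator processes within one round:
# clients execute in batches of at most T, so with T < N some clients
# run sequentially. Illustration only, not the real scheduler.

def simulate_round(clients, num_procs):
    """Return the batches in which clients would execute in one round."""
    return [clients[i:i + num_procs] for i in range(0, len(clients), num_procs)]

clients = [f"site-{i}" for i in range(1, 9)]  # N = 8

batches = simulate_round(clients, num_procs=8)  # T = N: one batch, no swapping
print(len(batches))  # 1

batches = simulate_round(clients, num_procs=3)  # T < N: swap-in/out needed
print(len(batches))  # 3, i.e. ceil(8 / 3) sequential batches
print(batches[0])    # ['site-1', 'site-2', 'site-3']
```

With T = N every client stays resident for the whole round; with T < N each swap boundary is where the ClientRunner/learner setup and teardown occur.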

For the dataset / tensorboard initialization, you could make use of EventType.SWAP_IN and EventType.SWAP_OUT
in the application.
4 changes: 3 additions & 1 deletion examples/advanced/experiment-tracking/wandb/README.md
@@ -26,7 +26,9 @@ export PYTHONPATH=${PWD}/..
Import the W&B Python SDK and log in:

```
wandb.login()
python3
>>> import wandb
>>> wandb.login()
```

Provide your API key when prompted.
2 changes: 1 addition & 1 deletion examples/hello-world/step-by-step/README.md
@@ -7,7 +7,7 @@ To run the notebooks in each example, please make sure you first set up a virtua

These step-by-step example series are aimed to help users quickly get started and learn about FLARE.
For consistency, each example in the series uses the same dataset- CIFAR10 for image data and the HIGGS dataset for tabular data.
The examples will build upon previous ones to showcase different features, workflows, or APIs, allowing users to gain a comprehensive understanding of FLARE functionalities. See the README in each directory for more details about each series.
The examples build upon previous ones to showcase different features, workflows, or APIs, allowing users to gain a comprehensive understanding of FLARE functionalities. (Note: each example is self-contained, so going through them in order is recommended but not required.) See the README in each directory for more details about each series.

## Common Questions

4 changes: 2 additions & 2 deletions examples/hello-world/step-by-step/cifar10/cse/cse.ipynb
@@ -180,9 +180,9 @@
"id": "48271064",
"metadata": {},
"source": [
"For additional resources, see other examples for SAG with CSE using the [ModelLearner](../sag_model_learner/sag_model_learner.ipynb), [Executor](../sag_executor/sag_executor.ipynb), and [Hello-Numpy](https://github.com/NVIDIA/NVFlare/tree/main/examples/hello-world/hello-numpy-cross-val).\n",
"For additional resources, see other examples for SAG with CSE using the [ModelLearner](../sag_model_learner/sag_model_learner.ipynb) and [Executor](../sag_executor/sag_executor.ipynb). [Hello-Numpy](https://github.com/NVIDIA/NVFlare/tree/main/examples/hello-world/hello-numpy-cross-val) also demonstrates how to run cross-site evaluation using the previous training results.\n",
"\n",
"Also the ability to run Cross-site Evaluation without having to re-run training will be added in the near future."
"Next we will look at the [cyclic](../cyclic/cyclic.ipynb) example, which shows the cyclic workflow for the Cyclic Weight Transfer algorithm."
]
},
{
@@ -140,7 +140,10 @@
"id": "48271064",
"metadata": {},
"source": [
"As an additional resource, also see the [hello-cyclic](../../../../hello-world/hello-cyclic/README.md) for a Tensorflow Executor implementation using the MNIST dataset."
"As an additional resource, also see the [hello-cyclic](../../../../hello-world/hello-cyclic/README.md) for a Tensorflow Executor implementation using the MNIST dataset.\n",
"\n",
"While this example focused on the server-controlled cyclic workflow, now we will introduce the idea of client-controlled workflows.\n",
"The next [cyclic_ccwf](../cyclic_ccwf/cyclic_ccwf.ipynb) example is a client-controlled version of the cyclic workflow."
]
},
{
@@ -145,7 +145,9 @@
"cell_type": "markdown",
"id": "9bef3134",
"metadata": {},
"source": []
"source": [
"Lastly, we have the [swarm](../swarm/swarm.ipynb) example, which covers swarm learning and client-controlled cross-site evaluation workflows."
]
}
],
"metadata": {
5 changes: 4 additions & 1 deletion examples/hello-world/step-by-step/cifar10/sag/sag.ipynb
@@ -262,7 +262,10 @@
"id": "b055bde7-432d-4e6b-9163-b5ab7ede7b73",
"metadata": {},
"source": [
"The job should be running in the simulator mode. We are done with the training. "
"The job should be running in the simulator mode. We are done with the training. \n",
"\n",
"The next 5 examples will use the same ScatterAndGather workflow, but will demonstrate different execution APIs and features.\n",
"In the next example [sag_deploy_map](../sag_deploy_map/sag_deploy_map.ipynb), we will learn about the deploy_map configuration for deployment of apps to different sites."
]
}
],
@@ -261,7 +261,10 @@
"id": "0af8036f-1f94-426d-8eb7-6e8b9be70a7e",
"metadata": {},
"source": [
"The job should be running in the simulator mode. We are done with the training. "
"The job should be running in the simulator mode. We are done with the training. \n",
"\n",
"In the next example [sag_model_learner](../sag_model_learner/sag_model_learner.ipynb), we will illustrate how to use the Model Learner API instead of the Client API,\n",
"and highlight why and when to use it."
]
}
],
@@ -222,7 +222,12 @@
"id": "48271064",
"metadata": {},
"source": [
"For additional resources, take a look at the various other executors with different use cases in the app_common, app_opt, and examples folder."
"For additional resources, take a look at the various other executors with different use cases in the app_common, app_opt, and examples folder.\n",
"\n",
"In the previous examples we have covered each of the Execution API types: the Client API, Model Learner, and Executor.\n",
"Now we will be using the Client API in future examples to highlight other features and workflows.\n",
"\n",
"Next we have the [sag_mlflow](../sag_mlflow/sag_mlflow.ipynb) example, which shows how to enable MLflow experiment tracking logs."
]
},
{
@@ -197,7 +197,10 @@
"id": "b19da336",
"metadata": {},
"source": [
"As an additional resource, see the [CIFAR10 Real World Example](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/cifar10/cifar10-real-world) for creating a secure workspace for HE using provisioning instead of POC mode."
"As an additional resource, see the [CIFAR10 Real World Example](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/cifar10/cifar10-real-world) for creating a secure workspace for HE using provisioning instead of POC mode.\n",
"\n",
"Now we will begin to take a look at other workflows besides ScatterAndGather.\n",
"First we have the [cse](../cse/cse.ipynb) example, which shows the server-controlled cross-site evaluation workflow."
]
}
],
@@ -183,12 +183,12 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e69c9ed2-359a-4f97-820f-25e9323a4e92",
"cell_type": "markdown",
"id": "58037d1e",
"metadata": {},
"outputs": [],
"source": []
"source": [
"Next we will look at the [sag_he](../sag_he/sag_he.ipynb) example, which demonstrates how to enable homomorphic encryption using the POC -he mode."
]
}
],
"metadata": {
@@ -204,7 +204,9 @@
"id": "48271064",
"metadata": {},
"source": [
"As an additional resource, also see the [CIFAR10 examples](../../../../advanced/cifar10/README.md) for a comprehensive implementation of a PyTorch ModelLearner."
"As an additional resource, also see the [CIFAR10 examples](../../../../advanced/cifar10/README.md) for a comprehensive implementation of a PyTorch ModelLearner.\n",
"\n",
"In the next example [sag_executor](../sag_executor/sag_executor.ipynb), we will illustrate how to use the Executor API for more specific use cases."
]
},
{
@@ -664,9 +664,8 @@
"\n",
"If you would like to see another example of federated statistics calculations and configurations, please checkout [federated_statistics](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/federated-statistics) and [fed_stats with spleen_ct_segmentation](https://github.com/NVIDIA/NVFlare/tree/main/integration/monai/examples/spleen_ct_segmentation_sim)\n",
"\n",
"Let's move on to the next example and see how can we train the image classifier using pytorch with CIFAR10 data.\n",
"\n",
"\n"
"Let's move on to the next examples and see how we can train the image classifier using PyTorch with CIFAR10 data.\n",
"First we will look at the [sag](../sag/sag.ipynb) example, which illustrates how to use the ScatterAndGather workflow for FedAvg with the Client API.\n"
]
}
],