Update execution api documentation and docstrings #2305

Merged (2 commits, Jan 23, 2024)
4 changes: 2 additions & 2 deletions docs/getting_started.rst
@@ -22,14 +22,14 @@ Clone NVFLARE repo to get examples, switch main branch (latest stable branch)

$ git clone https://github.com/NVIDIA/NVFlare.git
$ cd NVFlare
- $ git switch main
+ $ git switch 2.4


Note on branches:

* The `main <https://github.com/NVIDIA/NVFlare/tree/main>`_ branch is the default (unstable) development branch

- * The 2.0, 2.1, 2.2, and 2.3 etc. branches are the branches for each major release and minor patches
+ * The 2.1, 2.2, 2.3, and 2.4 etc. branches are the branches for each major release and minor patches


Quick Start with Simulator
4 changes: 2 additions & 2 deletions docs/index.rst
@@ -21,7 +21,7 @@ NVIDIA FLARE
glossary

NVIDIA FLARE (NVIDIA Federated Learning Application Runtime Environment) is a domain-agnostic, open-source, extensible SDK that allows
- researchers and data scientists to adaptexisting ML/DL workflows (PyTorch, RAPIDS, Nemo, TensorFlow) to a federated paradigm; and enables
+ researchers and data scientists to adapt existing ML/DL workflows (PyTorch, RAPIDS, Nemo, TensorFlow) to a federated paradigm; and enables
platform developers to build a secure, privacy preserving offering for a distributed multi-party collaboration.

NVIDIA FLARE is built on a componentized architecture that gives you the flexibility to take federated learning workloads from research
@@ -34,7 +34,7 @@ and simulation to real-world production deployment. Some of the key components
- **Management tools** for secure provisioning and deployment, orchestration, and management
- **Specification-based API** for extensibility

- Learn more in the :ref:`FLARE Overview <flare_overview>`, :ref:`Key Features <key_features>`, :ref:`What's New <whats_new>`, and the
+ Learn more in the :ref:`FLARE Overview <flare_overview>`, :ref:`What's New <whats_new>`, and the
:ref:`User Guide <user_guide>` and :ref:`Programming Guide <programming_guide>`.

Getting Started
2 changes: 1 addition & 1 deletion docs/programming_guide.rst
@@ -36,7 +36,7 @@ Please refer to :ref:`application` for more details.
:maxdepth: 1

programming_guide/workflows_and_controllers
- programming_guide/fl_clients
+ programming_guide/execution_api_type
programming_guide/shareable
programming_guide/data_exchange_object
programming_guide/fl_context
10 changes: 6 additions & 4 deletions docs/programming_guide/controllers/controllers.rst
@@ -73,7 +73,9 @@ The Controller's Task Manager manages the task's lifecycle:

.. note::

-    In NVIDIA FLARE 2.0, the underlying communication is by gRPC: the client always initiates communication by sending
-    a request to the server and receiving a response. When we say "server sends task to the client", it is only
-    conceptual. With gRPC, the client sends the "ask for next task" request to the server, and the server responds with
-    the task data.
+    In NVIDIA FLARE, the underlying communication is facilitated through gRPC:
+    the client always initiates communication by sending a request to the server and receiving a response.
+    When referring to the scenario where the "server sends a task to the client,"
+    it is important to note that this is a conceptual representation.
+    In reality, with gRPC, the client initiates the interaction by sending a "request for the next task" to the server,
+    and the server responds by providing the task data.
@@ -26,7 +26,7 @@ Two changes are needed:

The updated file should look like the following:

- .. literalinclude:: ../resources/init_weights_1_config_fed_server.json
+ .. literalinclude:: ../../resources/init_weights_1_config_fed_server.json
:language: json


@@ -7,7 +7,7 @@ of NVIDIA FLARE with a Server aggregating results from Clients that have produce

At the core, the control_flow of :class:`nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather` is a for loop:

- .. image:: ../resources/fed_sag_round.png
+ .. image:: ../../resources/fed_sag_round.png
:height: 400px

Trainer
89 changes: 89 additions & 0 deletions docs/programming_guide/execution_api_type.rst
@@ -0,0 +1,89 @@
.. _execution_api_type:

##################
Execution API Type
##################

In the FLARE system, a federated learning algorithm is defined in a Job format
(for details, please refer to :ref:`job`).
A Job consists of multiple "workflows" and "executors."

The simplified job execution flow is as follows:

- The workflow schedules a task for the FL clients.
- Each FL client performs the received task and sends the result back.
- The workflow receives the results and determines if it is done.
- If it is not done, it schedules a new task.
- If it is done, it proceeds to the next workflow in the Job.

Users need to adapt their local training logic into FLARE's task execution
abstractions to make their training federated.
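The scheduling loop described above can be sketched in plain Python. This is a toy simulation of the control flow only; ``run_workflow`` and the client callables are illustrative stand-ins, not NVFlare APIs.

```python
# Toy simulation of the job execution flow described above. All names
# (run_workflow, clients, etc.) are illustrative stand-ins, not NVFlare APIs.

def run_workflow(num_rounds, clients):
    """One workflow: schedule a task each round, gather results, aggregate."""
    global_value = 0
    for _ in range(num_rounds):
        # The workflow schedules a task; each FL client performs it
        # and sends its result back.
        results = [client(global_value) for client in clients]
        # The workflow receives the results and aggregates them.
        global_value = sum(results) // len(results)
    # Once done, control proceeds to the next workflow in the Job.
    return global_value

clients = [lambda v, delta=d: v + delta for d in (1, 2, 3)]
print(run_workflow(num_rounds=2, clients=clients))  # 4
```

In a real Job, "determines if it is done" may depend on convergence or round count; the fixed ``num_rounds`` here is only the simplest case.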

We offer various levels of abstraction for writing task execution code,
catering to use cases that span from complete customizability to easy user adaptation.

Below is a general overview of the key ideas and use cases for each type:

**Client API**

The :ref:`client_api` provides the most straightforward way to write FL code,
and can easily be used to convert centralized code with minimal code changes.
The Client API uses the :class:`FLModel<nvflare.app_common.abstract.fl_model.FLModel>`
object for data transfer and supports common tasks such as train, validate, and submit_model.
Additionally, options for using decorators or PyTorch Lightning are also available.

We recommend that users start with the Client API, and consider the other types for more specific cases as required.
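The Client API pattern can be sketched as follows. ``FLModel`` and the receive/send loop here are toy stand-ins so the example is self-contained; the real counterparts live in ``nvflare.client`` and ``nvflare.app_common.abstract.fl_model``.

```python
# Sketch of the Client API usage pattern with toy stand-ins (not real
# NVFlare objects): receive a model, run local training, send the result.

from dataclasses import dataclass, field

@dataclass
class FLModel:  # stand-in for nvflare's FLModel data container
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)

def local_train(params):
    # Existing centralized training code stays as-is; only the
    # receive/send hookup around it changes.
    return {k: v + 1 for k, v in params.items()}

def run_client(rounds):
    """Mimics the loop: receive a model, train, send the update back."""
    sent = []
    for incoming in rounds:  # stands in for receiving a model per round
        updated = local_train(incoming.params)
        sent.append(FLModel(params=updated, metrics={"loss": 0.5}))  # "send"
    return sent

out = run_client([FLModel(params={"w": 1})])
print(out[0].params)  # {'w': 2}
```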

**ModelLearner**

The :ref:`model_learner` is designed to simplify writing learning logic by
minimizing FLARE-specific concepts.
The :class:`ModelLearner<nvflare.app_common.abstract.model_learner.ModelLearner>`
defines familiar learning functions for training and validation,
and uses the :class:`FLModel<nvflare.app_common.abstract.fl_model.FLModel>`
object for transferring learning information.
The ModelLearner also contains several convenient capabilities,
such as lifecycle and logging information.

The ModelLearner is best used when working with standard machine learning code
that can fit well into the train and validate methods and can be easily adapted
to the ModelLearner subclass and method structure.
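The subclass-and-method structure can be illustrated with toy stand-ins; the classes below are not the real ``nvflare.app_common.abstract.model_learner.ModelLearner``, only a sketch of its shape.

```python
# Toy sketch of the ModelLearner shape: familiar train/validate methods
# exchanging a model object. These classes are illustrative stand-ins.

class ModelLearner:  # stand-in base class
    def train(self, model):
        raise NotImplementedError
    def validate(self, model):
        raise NotImplementedError

class MyLearner(ModelLearner):
    def train(self, model):
        # Standard ML training code plugs in here.
        model["weights"] = [w + 1 for w in model["weights"]]
        return model
    def validate(self, model):
        return {"accuracy": 0.9}  # illustrative metric only

learner = MyLearner()
trained = learner.train({"weights": [1, 2]})
print(trained["weights"])  # [2, 3]
```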

**Executor**

The :ref:`executor` is the most flexible type for defining custom logic and tasks:
with a custom executor and controller, any form of computation can be performed.
However, Executors must deal directly with FLARE-specific communication concepts
such as :class:`Shareable<nvflare.apis.shareable.Shareable>`, :class:`DXO<nvflare.apis.dxo.DXO>`,
and :class:`FLContext<nvflare.apis.fl_context.FLContext>`.
As a result, many higher-level APIs are built on top of Executors in order to
abstract these concepts away for easier user adaptation.

Overall, writing an Executor is most useful when implementing tasks and logic
that do not fit within the structure of higher-level APIs or other predefined Executors.

**3rd-Party System Integration**

There are cases where users have a pre-existing ML/DL training system
infrastructure that cannot be easily adapted to the FLARE client.

The :ref:`3rd_party_integration` pattern allows for a seamless integration
between the FLARE system and a third-party external training system.

With the use of the :mod:`FlareAgent <nvflare.client.flare_agent>` and
:mod:`TaskExchanger <nvflare.app_common.executors.task_exchanger>`,
users can easily enable any 3rd-party system to receive tasks and submit results back to the server.
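The receive-task/submit-result shape of this pattern can be sketched as below; ``Agent`` is a mock illustrating the interaction only, not the real ``FlareAgent`` API.

```python
# Toy sketch of the 3rd-party integration pattern: an external training
# system repeatedly fetches a task, runs it, and submits the result back.
# Agent is a stand-in, not nvflare.client.flare_agent.FlareAgent.

class Agent:
    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.results = []
    def get_task(self):
        return self.tasks.pop(0) if self.tasks else None
    def submit_result(self, result):
        self.results.append(result)

agent = Agent(tasks=[{"name": "train", "value": 10}])
while (task := agent.get_task()) is not None:
    # The pre-existing training system does its real work here.
    agent.submit_result({"name": task["name"], "value": task["value"] * 2})
print(agent.results)  # [{'name': 'train', 'value': 20}]
```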

Please use the following chart to decide which abstraction to use:

.. image:: ../resources/task_execution_decision_chart.png

For more details about each type, refer to each page below.

.. toctree::
:maxdepth: 1

execution_api_type/3rd_party_integration
execution_api_type/client_api
execution_api_type/model_learner
execution_api_type/executor