Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workflow communication APIs and Simplified ML Algorithms #2250

Closed
wants to merge 42 commits into from

Conversation

chesterxgchen
Copy link
Collaborator

@chesterxgchen chesterxgchen commented Dec 29, 2023

Description

We like to make writing a new Federated DL/ML Algorithm workflow simpler without introducing the NVFLARE specific constructs. Similar to the Client API that makes data scientist easy to transition from DL/ML to FL by just modify a few lines of the code. We like to let data scientists write different Fed Algorithms by just a making few lines of changes needed for communication, the rest of workflow logic should be independent of NVFLARE communications.
Instead of writing a NVFLARE controller for different algorithms, we can simply write the workflow logic, separate the controller and workflow APIs.

Proposal

Introduce an Workflow Communication API that responsible only for communication such as broadcast_and_wait(). User doesn't need to deal with controller APIs.
For example:

from nvflare.app_common.workflows import wf_comm as flare

class FedAvg(Controller):
    def __init__(
            self,
            min_clients: int,
            num_rounds: int,
            output_path: str,
            start_round: int = 1,
            stop_cond: str = None,
            model_selection_rule: str = None,
    ):
        super(FedAvg, self).__init__()
        
        <skip init code>

 
    def run(self):
        start = self.start_round
        end = self.start_round + self.num_rounds

        model = self.init_model()
        for current_round in range(start, end):
            self.current_round = current_round
            sag_results = self.scatter_and_gather(model, current_round)
            aggr_result = self.aggr_fn(sag_results)
            model = update_model(model, aggr_result)
            self.select_best_model(model)

        self.save_model(self.best_model, self.output_path)
        self.logger.info("end Fed Avg Workflow\n \n")

Where scatter_and_gather is merely call broadcast_and_wait()

scatter_and_gather(self, model: FLModel, current_round):
 
       msg_payload = {
            MIN_RESPONSES: self.min_clients,
            CURRENT_ROUND: current_round,
            NUM_ROUNDS: self.num_rounds,
            START_ROUND: self.start_round,
            DATA: model,
        }

        # (2) broadcast and wait
        results = self.broadcast_and_wait(msg_payload)
        return results

We also created a new Controller ( old Controller will be renamed to MPCommunicator (multi-part communicator)) , which communicate to clients
We demonstrate that user can easily write different Workflows (FedAvg and Kaplan-Meier) using the same Controller class

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Quick tests passed locally by running ./runtest.sh.
  • In-line docstrings updated.
  • Documentation updated.

@chesterxgchen chesterxgchen requested review from yanchengnv, holgerroth, YuanTingHsieh and ZiyueXu77 and removed request for yanchengnv December 29, 2023 03:39
nvflare/app_common/workflows/wf_controller.py Outdated Show resolved Hide resolved
nvflare/app_common/workflows/wf_controller.py Outdated Show resolved Hide resolved
nvflare/app_common/workflows/wf_controller.py Outdated Show resolved Hide resolved
nvflare/app_common/workflows/wf_controller.py Outdated Show resolved Hide resolved
@chesterxgchen chesterxgchen changed the title Controller Workflow APIs and Simplified ML Algorithms Controller Workflow communication APIs and Simplified ML Algorithms Dec 29, 2023
@chesterxgchen chesterxgchen changed the title Controller Workflow communication APIs and Simplified ML Algorithms Workflow communication APIs and Simplified ML Algorithms Dec 29, 2023
@IsaacYangSLA
Copy link
Collaborator

/build

… into some utils class

2. Refactoring the WFController class into BaseWFController class and WFController class. The BaseWFController has majority of the logics and implement ControllerSpec. But not implement Responder, which is needed for Server-side controller.  WFController Inherited both BaseWFController and Controller. This allows BaseWFController be able to be used on the client-side controller logics without implementing control_flow() method
@SYangster
Copy link
Collaborator

cleanup, fix server config json parsing with original controllers

todo in follow up PRs:

  • add client-side controller
  • refactor to rename original Controller to MPCommunicator (MultiParty)
  • convert Holger’s FedAvg class to use new controller and add examples

@SYangster
Copy link
Collaborator

/build

Copy link
Collaborator

@YuanTingHsieh YuanTingHsieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's clarify on what each class should do.

nvflare/app_common/utils/fl_model_utils.py Show resolved Hide resolved
nvflare/app_common/workflows/fed_avg_pt.py Outdated Show resolved Hide resolved
nvflare/app_common/workflows/fed_avg.py Outdated Show resolved Hide resolved
nvflare/app_common/wf_comm/wf_communicator.py Show resolved Hide resolved
nvflare/private/fed/server/server_json_config.py Outdated Show resolved Hide resolved
nvflare/app_common/wf_comm/wf_comm_api.py Show resolved Hide resolved
nvflare/app_common/wf_comm/__init__.py Outdated Show resolved Hide resolved
nvflare/app_common/wf_comm/__init__.py Show resolved Hide resolved
@SYangster SYangster force-pushed the wl_controller branch 2 times, most recently from 9ea89ec to 23faeb2 Compare February 24, 2024 00:52
@yanchengnv
Copy link
Collaborator

yanchengnv commented Feb 26, 2024

I'm not sure about many of these changes. It seems to be more complex than I imagined. Does it require the user to do any config about the communicator? It's also not clear how a workflow can have its own communicator object.
I need more time to review this PR.

@SYangster
Copy link
Collaborator

SYangster commented Feb 26, 2024

I'm not sure about many of these changes. It seems to be more complex than I imagined. Does it require the user to do any config about the communicator? It's also not clear how a workflow can have its own communicator object. I need more time to review this PR.

Yes the communicator object can be configured in rare cases, but it is not necessary as it will default to the WFCommunicator. Also currently clarifying with the team whether we want each workflow to share a single Communicator or use its own.

config_fed_server.conf:

workflows = [
     {
        id = "fed_avg"
        communicator {
           # default communicator
           path = "nvflare.app_common.wf_comm.wf_communicator.WFCommunicator"
           args {}
        }
        controller {
           # subclass of WFController
           path = "nvflare.app_opt.pt.fed_avg_pt.PTFedAvg"
           args {
              min_clients = 2
              num_rounds = 2
              output_path = "/tmp/nvflare/fedavg/mode.pth"
           }
        }
     }
     
# is the same as

workflows = [
     {
        id = "fed_avg"
        # subclass of WFController, default WFCommunicator automatically used
        path = "nvflare.app_opt.pt.fed_avg_pt.PTFedAvg"
        args {
           min_clients = 2
           num_rounds = 2
           output_path = "/tmp/nvflare/fedavg/mode.pth"
        }
     }

@SYangster
Copy link
Collaborator

changed to alternative design #2390

@SYangster SYangster closed this Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants