In process Client API Executor Part 1 #2248

Merged
merged 13 commits into NVIDIA:main from client_api_memory on Feb 26, 2024

Conversation

@chesterxgchen (Collaborator) commented Dec 28, 2023

Description

The Client API currently only supports running the training code in a sub-process. This works well when third-party code needs to run from a command line.
Many data scientists, however, also want to run training directly in-process, since that makes debugging in an IDE debugger much easier. In-process training is also simpler to configure and set up, and it avoids inter-process communication issues.

The proposal

This PR proposes an additional, alternative in-process (in-memory) implementation of the executor for the Client API.

  • The Client API itself stays the same; nothing changes (only the configuration is different).
  • To call the user's training code directly, we require the user to provide a function that can be called directly, such as run(), train(), or sub_main(**kwargs). The user specifies which function to call by giving its path in the form 'module.function_name'. For example:

 fn_path = "cifar10_fl.sub_main"

indicates that the function "sub_main" is located in the cifar10_fl.py module. A hypothetical sketch of such a function follows.
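As an illustration only, a task function wired for the Client API might look like the sketch below. The module name (cifar10_fl), the function signature, and the exact nvflare.client calls are assumptions based on the existing sub-process examples, not code from this PR.

    # cifar10_fl.py -- hypothetical task module; API usage is illustrative only
    import nvflare.client as flare
    from nvflare.app_common.abstract.fl_model import FLModel, ParamsType

    def sub_main(batch_size=4, dataset_path="/tmp/nvflare/data/cifar10", num_workers=2):
        """Task function the in-process executor can call directly via the path "cifar10_fl.sub_main"."""
        flare.init()  # initialize the Client API, same as in the sub-process case
        while flare.is_running():
            input_model = flare.receive()   # global model delivered by the executor
            params = input_model.params     # start local training from these weights
            # ... run the local training loop using batch_size, dataset_path, num_workers ...
            flare.send(FLModel(params=params, params_type=ParamsType.FULL))  # return the local update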

User Experience

The user experience is the same as in the previous sub-process case: the code changes needed to go from ML/DL to FL are identical.
Only the configuration changes, to point at the in-memory executor:

 executor {
   path = "nvflare.app_opt.pt.client_api_mem_executor.PTClientAPIMemExecutor"
   args {
     task_fn_path = "cifar10_fl.sub_main"
     task_fn_args {
       batch_size = 6
       dataset_path = "/tmp/nvflare/data/cifar10"
       num_workers = 2
     }
     params_transfer_type = "DIFF"
     train_with_evaluation = true
     result_pull_interval = 0.5
     log_pull_interval = 0.1
   }
 }

None of the pipe and corresponding pipe-handling component configurations are needed anymore.
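To make the configuration concrete, here is a rough, hypothetical sketch of how a 'module.function_name' path plus task_fn_args could be resolved and invoked in-process; it is not the actual executor code.

    # Illustrative only: resolving "module.function_name" and calling it with task_fn_args.
    import importlib

    def resolve_and_run(task_fn_path: str, task_fn_args: dict):
        module_name, _, fn_name = task_fn_path.rpartition(".")  # "cifar10_fl.sub_main" -> ("cifar10_fl", "sub_main")
        module = importlib.import_module(module_name)           # import cifar10_fl
        task_fn = getattr(module, fn_name)                      # look up sub_main
        task_fn(**task_fn_args)                                 # sub_main(batch_size=6, dataset_path=..., num_workers=2)

    # Usage mirroring the configuration above:
    resolve_and_run(
        "cifar10_fl.sub_main",
        {"batch_size": 6, "dataset_path": "/tmp/nvflare/data/cifar10", "num_workers": 2},
    )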

The implementation

  • Main approach:
    A new InProcessClientAPIExecutor is implemented. It leverages the new DataBus and EventManager. The executor's main thread subscribes to both the LOCAL_RESULT and LOG_DATA topics to receive callbacks when results are available, and the Client API registers a callback on the GLOBAL_RESULT topic to receive the global model.
  • Threading model:
    One thread runs the client's training code, while the executor's main thread sends the global model as soon as the model Shareable is received and then waits for the local model update. A generic sketch of this hand-off is shown below.
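The following is a generic sketch of the publish/subscribe hand-off described above, written with a toy in-memory bus; it is not the actual NVFlare DataBus/EventManager API.

    # Toy illustration of the described hand-off (NOT the NVFlare DataBus API):
    # the executor's main thread publishes the global model and waits, while the
    # training thread receives it via a topic callback and publishes the local result.
    import queue
    import threading

    subscribers = {}  # topic -> list of callbacks

    def subscribe(topic, callback):
        subscribers.setdefault(topic, []).append(callback)

    def publish(topic, data):
        for cb in subscribers.get(topic, []):
            cb(data)

    received_global = queue.Queue()
    local_results = queue.Queue()

    # Register callbacks before anything is published:
    subscribe("GLOBAL_RESULT", received_global.put)  # Client API side
    subscribe("LOCAL_RESULT", local_results.put)     # executor side

    def training_thread():
        global_model = received_global.get()          # wait for the global model callback data
        local_update = {"round": global_model["round"], "weights": global_model["weights"]}
        publish("LOCAL_RESULT", local_update)         # hand the local update back to the executor

    threading.Thread(target=training_thread, daemon=True).start()

    publish("GLOBAL_RESULT", {"round": 1, "weights": [0.0]})  # executor: send the global model
    print(local_results.get())                                # executor: block until the local update arrives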

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Quick tests passed locally by running ./runtest.sh.
  • In-line docstrings updated.
  • Documentation updated.

@chesterxgchen marked this pull request as draft December 28, 2023 05:25
@chesterxgchen marked this pull request as ready for review December 28, 2023 05:44
@chesterxgchen marked this pull request as draft December 28, 2023 05:44
@yanchengnv (Collaborator) commented:
This PR is too big for this stage of 2.4. We could consider it for 2.4.1.

@chesterxgchen marked this pull request as ready for review December 28, 2023 18:46
@chesterxgchen changed the title from "Client API with MemPipe Executor" to "In memory Client API Executor" on Jan 3, 2024
@chesterxgchen changed the title from "In memory Client API Executor" to "In process Client API Executor" on Jan 12, 2024
@chesterxgchen force-pushed the client_api_memory branch 2 times, most recently from 1e64db4 to 3ba6a11 on January 13, 2024 18:43
@chesterxgchen (Collaborator, Author) commented:
/build

@chesterxgchen (Collaborator, Author) commented:
/build

@chesterxgchen changed the title from "In process Client API Executor" to "In process Client API Executor Part 1" on Jan 16, 2024
@chesterxgchen (Collaborator, Author) commented Jan 16, 2024:

To reduce the PR size, I have split the work into three parts:

  1. Message bus (the changes are still in this PR, but they duplicate the DataBus PR, #2285). The message bus PR is independent of the Client API PR; once it is merged, this PR can be rebased and its file count will shrink (28 - 8 = 20 files, of which 8 are test-related, so the real code changes besides tests are 12 files).
  2. The main Client API changes: Part 1, this PR.
  3. The job templates and related example changes: Part 2 of the Client API work.

@chesterxgchen marked this pull request as draft January 21, 2024 05:55
@chesterxgchen marked this pull request as ready for review January 22, 2024 20:13
2) fix example
code formatting
add queue.task_done()
1) add message bus
2) hide task func wrapper class
3) rename executor package
4) clean up some code
update meta info
remove unused code
optimize import
fix message_bus
import order change
rename the executor from ClientAPIMemExecutor to InProcessClientAPIExecutor
1) remove thread_pool
2) further loosen coupling between executor and client_api implementation
formatting
add unit tests
avoid duplicated constant TASK_NAME definition
split PR into two parts (besides message bus)
this is part 1: only remove the example and job template changes

1. Replace MemPipe (Queues) with callbacks via EventManager
2. Simplified overall logic
3. Note: the param conversion doesn't quite work yet (needs a fix later)
4. Removed some tests that are now invalid; more unit tests will need to be added later

fix task_name is None bug

add a few unit tests
code format
update to conform with new databus changes
@SYangster (Collaborator) commented:
Cleanup; add additional support for a main function with CLI args.

Will add and enhance examples in a follow-up PR.

@SYangster (Collaborator) commented:
/build

@YuanTingHsieh (Collaborator) left a comment:
Mostly good, we just need to make changes following recent PRs in main regarding api.py

@YuanTingHsieh (Collaborator) left a comment:
Mostly good, minor comments

@YuanTingHsieh (Collaborator) left a comment:
Thanks, LGTM; we can think about these logging levels later.

@SYangster (Collaborator) commented:
/build

@SYangster merged commit 81caf29 into NVIDIA:main on Feb 26, 2024
16 checks passed