[2.4] Nemo examples fix #2293

Merged · 6 commits · Jan 19, 2024
Changes from 4 commits
59 changes: 6 additions & 53 deletions integration/nemo/README.md
@@ -1,63 +1,16 @@
# NeMo Integration

## Objective
Execute [NVIDIA NeMo™](https://developer.nvidia.com/nemo) in federated environments.

### Goals:

Allow NeMo models to be trained and adapted with NVFlare.

### Non-goals:

n/a

## Background
[NVIDIA NeMo™](https://developer.nvidia.com/nemo) is an end-to-end cloud-native enterprise framework for developers to
build, customize, and deploy generative AI models with billions of parameters.

## Description
NVFlare utilizes features from NeMo, such as prompt learning, to run LLM tasks in federated environments.

### Examples

For an example of using [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html) with NeMo for prompt learning,
see [examples/prompt_learning](examples/prompt_learning/README.md)

For an example of using [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html) with NeMo for supervised fine-tuning (SFT),
see [examples/supervised_fine_tuning](examples/supervised_fine_tuning/README.md)
Here, we show how NVFlare utilizes features from NeMo to run LLM tasks in federated environments with several [examples](./examples).

## Requirements

### Using docker
For simplicity, we recommend using NVIDIA's docker containers that include all the requirements for running NeMo models.
```
docker pull nvcr.io/nvidia/nemo:23.02
```

### Install NeMo-NVFlare package

<!---
#### Pip
Install NeMo-NVFlare integration from [PyPI](https://pypi.org/):
```
pip install nemo_nvflare
```
-->

#### Mount the source code
For easy development with NeMo, install NVFlare and mount the code inside the folder.
```
pip install "nvflare>=2.3.0"
export PYTHONPATH=${PWD}
```
### Using docker (Recommended)
For simplicity, we recommend using NVIDIA's [NeMo docker containers](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) that include all the requirements for running NeMo models.

<!---
#### From source
To install the package from source code, use:
```
pip install -e .
```
-->
> Note: each example in this folder might require a different container version. Please check each example's README for details.

### Installation in a virtual environment

@@ -68,4 +21,4 @@ and using it in [JupyterLab](../../examples/README.md#notebooks) for running
the notebooks in the NeMo integration examples.

Follow the NeMo installation steps [here](https://github.com/NVIDIA/NeMo#installation)
before installing NVFlare and adding the source to the PYTHONPATH.
12 changes: 6 additions & 6 deletions integration/nemo/examples/README.md
@@ -1,16 +1,16 @@
# Examples of NeMo-NVFlare Integration

### [Parameter-Efficient Fine-Tuning (PEFT) with NeMo](./peft/README.md)
In this example, we utilize NeMo's [PEFT](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/nemo_megatron/peft/landing_page.html)
methods, together with NVFlare's new Client API (only minimal code changes are required to run a NeMo script in FL), to showcase how to adapt
a large language model (LLM) to a downstream task, such as financial sentiment prediction.

### [Supervised fine-tuning (SFT) with NeMo and NVFlare](./supervised_fine_tuning/README.md)
An example of using [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html)
with NeMo for [supervised fine-tuning (SFT)](https://github.com/NVIDIA/NeMo-Megatron-Launcher#5152-sft-training)
to fine-tune all parameters of a large language model (LLM) on supervised data, teaching the model how to follow user-specified instructions.
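In contrast to PEFT, SFT leaves no parameter frozen. The following is a toy illustration in plain PyTorch (a stand-in model and random data, not NeMo's SFT recipe): every parameter requires gradients, and a standard supervised loop updates all of them.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in "model" and supervised data; real SFT fine-tunes an LLM on instruction/response pairs.
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 8))
inputs, targets = torch.randn(64, 8), torch.randn(64, 8)

# Unlike PEFT, no parameters are frozen: SFT trains all of them.
assert all(p.requires_grad for p in model.parameters())

opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

first_loss = None
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    opt.step()
    if first_loss is None:
        first_loss = loss.item()

print(f"loss: {first_loss:.3f} -> {loss.item():.3f}")
```

Because all weights change, SFT rounds in FL exchange the full model, which is why the examples distinguish it from the lighter-weight PEFT workflow.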

### [Prompt learning with NeMo and NVFlare](./prompt_learning/README.md)
An example of using [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html)
with NeMo for [prompt learning](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/prompt_learning.html) using NVFlare's Learner API
to adapt a large language model (LLM) to a downstream task.
19 changes: 16 additions & 3 deletions integration/nemo/examples/peft/README.md
@@ -10,10 +10,23 @@ that condition the model to produce the desired output for the downstream task.
For more details, see the [PEFT script](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/tuning/megatron_gpt_peft_tuning.py) in NeMo, which we adapt using NVFlare's Lightning client API to run in a federated scenario.
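The central idea of PEFT can be illustrated outside of NeMo. Below is a minimal LoRA-style sketch in plain PyTorch (our own illustrative code, not NeMo's actual PEFT implementation, and all names are ours): the pretrained weights stay frozen and only a small low-rank update is trained, which is also why only these few parameters need to be exchanged between FL clients and server.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a small trainable low-rank update (illustrative)."""

    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay untouched
        # Low-rank factors A (rank x in) and B (out x rank); B starts at zero,
        # so training begins from the unmodified pretrained behavior.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output = frozen base projection + low-rank correction.
        return self.base(x) + x @ self.lora_a.T @ self.lora_b.T

layer = LoRALinear(nn.Linear(16, 16))
trainable = [name for name, p in layer.named_parameters() if p.requires_grad]
print(trainable)  # → ['lora_a', 'lora_b']
```

With rank 4 and a 16×16 base layer, the adapter adds 128 trainable parameters against 272 frozen ones; at LLM scale this gap is what makes PEFT practical in federated settings.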

## Dependencies
The example was tested with the [NeMo 23.10 container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo).
In the following, we assume this example folder is mounted to `/workspace` inside the container and that all downloads and related operations use this root path.

> Note: in the following, we mount both the [current directory](./) and the [job_templates](../../../../job_templates)
> directory to locations inside the docker container. Please make sure you have cloned the full NVFlare repo.

Start the docker container using
```
DOCKER_IMAGE="nvcr.io/nvidia/nemo:23.10"
docker run --gpus="device=all" --network=host --ipc=host -it --rm -v ${PWD}/../../../../job_templates:/job_templates -v ${PWD}:/workspace -w /workspace ${DOCKER_IMAGE} /bin/bash
```

For easy experimentation with NeMo, install NVFlare and mount the code inside the [nemo_nvflare](./nemo_nvflare) folder.
```
pip install nvflare==2.4.0rc7
export PYTHONPATH=${PYTHONPATH}:/workspace/nemo_nvflare
```

## Examples
### 1. Federated PEFT using a 345 million parameter GPT model
15 changes: 15 additions & 0 deletions integration/nemo/examples/peft/nemo_nvflare/__init__.py
@@ -0,0 +1,15 @@
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .peft_model import PEFTmodel
34 changes: 34 additions & 0 deletions integration/nemo/examples/peft/nemo_nvflare/utils.py
@@ -0,0 +1,34 @@
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import numpy as np
import torch


def convert_global_to_ckpt(global_model_filepath: str, ckpt_path: str):
"""Helper function to convert global models saved by NVFlare to NeMo ckpt format"""

nvflare_ckpt = torch.load(global_model_filepath)
if "train_conf" in nvflare_ckpt:
print("Loaded NVFlare global checkpoint with train_conf", nvflare_ckpt["train_conf"])

assert (
"model" in nvflare_ckpt
), f"Expected global model to contain a 'model' key but it only had {list(nvflare_ckpt.keys())}"
global_weights = nvflare_ckpt["model"]

torch.save({"state_dict": global_weights}, ckpt_path)

print(f"Saved NeMo ckpt with {len(global_weights)} entries to {ckpt_path}")

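Assuming the checkpoint layout shown above (NVFlare stores the global weights under a "model" key, while NeMo expects them under "state_dict"), the conversion can be exercised end to end with a dummy checkpoint. The conversion step is inlined here for illustration, and the file names are hypothetical.

```python
import os
import tempfile

import torch

workdir = tempfile.mkdtemp()

# A dummy NVFlare global checkpoint: the weights live under the "model" key.
dummy_weights = {"layer.weight": torch.zeros(2, 2), "layer.bias": torch.zeros(2)}
global_path = os.path.join(workdir, "FL_global_model.pt")  # hypothetical file name
torch.save({"model": dummy_weights}, global_path)

# The conversion step: NeMo checkpoints keep the weights under "state_dict".
nvflare_ckpt = torch.load(global_path)
ckpt_path = os.path.join(workdir, "global_model.ckpt")
torch.save({"state_dict": nvflare_ckpt["model"]}, ckpt_path)

restored = torch.load(ckpt_path)
print(sorted(restored["state_dict"]))  # → ['layer.bias', 'layer.weight']
```

In the actual example you would call `convert_global_to_ckpt(global_model_filepath, ckpt_path)` from the `utils.py` above instead of the two inlined load/save lines.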