diff --git a/integration/nemo/README.md b/integration/nemo/README.md
index 94a1dd4e7e..4cb4ed604b 100644
--- a/integration/nemo/README.md
+++ b/integration/nemo/README.md
@@ -1,63 +1,16 @@
 # NeMo Integration
 
-## Objective
-Execute [NVIDIA NeMo™](https://developer.nvidia.com/nemo) in federated environments.
-
-### Goals:
-
-Allow NeMo models to be trained and adapted with NVFlare.
-
-### Non-goals:
-
-n/a
-
-## Background
-NVIDIA NeMo™ is an end-to-end cloud-native enterprise framework for developers to 
+[NVIDIA NeMo™](https://developer.nvidia.com/nemo) is an end-to-end cloud-native enterprise framework for developers to 
 build, customize, and deploy generative AI models with billions of parameters.
 
-## Description
-NVFlare utilizes features from NeMo, such as prompt learning to run LLM tasks in federated environments.
-
-### Examples
-
-For an example of using [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html) with NeMo for prompt learning, 
-see [examples/prompt_learning](examples/prompt_learning/README.md) 
-
-For an example of using [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html) with NeMo for supervised fine-tuning (SFT), 
-see [examples/supervised_fine_tuning](examples/supervised_fine_tuning/README.md) 
+Here, we show with several [examples](./examples) how NVFlare uses features from NeMo to run LLM tasks in federated environments.
 
 ## Requirements
 
-### Using docker
-For simplicity, we recommend using NVIDIA's docker containers that include all the requirements for running NeMo models.
-```
-docker pull nvcr.io/nvidia/nemo:23.02
-```
-
-### Install NeMo-NVFlare package
-
-<!---
-#### Pip 
-Install NeMo-NVFlare integration from [PyPI](https://pypi.org/):
-```
-pip install nemo_nvflare
-```
--->
-
-#### Mount the source code
-For easy development with NeMo, install NVFlare and mount the code inside the folder.
-```
-pip install nvflare>=2.3.0
-export PYTHONPATH=${PWD}
-```
+### Using docker (Recommended)
+For simplicity, we recommend using NVIDIA's [NeMo docker containers](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) that include all the requirements for running NeMo models.
 
-<!---
-#### From source
-To install the package from source code, use:
-```
-pip install -e .
-```
--->
+> Note: Each example in this folder might require a different container version. Please check each example's README for details.
 
 ### Installation in a virtual environment
 
@@ -68,4 +21,4 @@ and using it in [JupyterLab](../../examples/README.md#notebooks) for running
 the notebooks in the NeMo integration examples.
 
 Follow the NeMo installation steps [here](https://github.com/NVIDIA/NeMo#installation)
-before installing the NeMo-NVFlare package.
+before installing NVFlare and adding the source to the PYTHONPATH.
diff --git a/integration/nemo/examples/README.md b/integration/nemo/examples/README.md
index 4e7ed42f32..7551091184 100644
--- a/integration/nemo/examples/README.md
+++ b/integration/nemo/examples/README.md
@@ -1,16 +1,16 @@
 # Examples of NeMo-NVFlare Integration
 
 ### [Parameter-Efficient Fine-Tuning (PEFT) with NeMo](./peft/README.md)
-In this example, we utilize NeMo's [PEFT](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/nemo_megatron/peft/landing_page.html)
+In this example, we use NVFlare's new Client API (minimal code changes required to run a NeMo script in FL) together with NeMo's [PEFT](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/nemo_megatron/peft/landing_page.html)
 methods to showcase how to adapt a large language model (LLM) to 
 a downstream task, such as financial sentiment predictions. 
 
-### [Prompt learning with NeMo and NVFlare](./prompt_learning/README.md)
-An example of using [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html) 
-with NeMo for [prompt learning](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/prompt_learning.html) 
-to adapt a large language model (LLM) to a downstream task. 
-
 ### [Supervised fine-tuning (SFT) with NeMo and NVFlare](./prompt_learning/README.md)
 An example of using [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html) 
 with NeMo for [supervised fine-tuning (SFT)](https://github.com/NVIDIA/NeMo-Megatron-Launcher#5152-sft-training) 
 to fine-tune all parameters of a large language model (LLM) on supervised data to teach the model how to follow user specified instructions. 
+
+### [Prompt learning with NeMo and NVFlare](./prompt_learning/README.md)
+An example of using [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html) 
+with NeMo for [prompt learning](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/prompt_learning.html) using NVFlare's Learner API
+to adapt a large language model (LLM) to a downstream task. 
diff --git a/integration/nemo/examples/peft/README.md b/integration/nemo/examples/peft/README.md
index c5f1085cbb..bab36e0487 100644
--- a/integration/nemo/examples/peft/README.md
+++ b/integration/nemo/examples/peft/README.md
@@ -10,17 +10,33 @@ that condition the model to produce the desired output for the downstream task.
 For more details, see the [PEFT script](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/tuning/megatron_gpt_peft_tuning.py) in NeMo, which we adapt using NVFlare's Lightning client API to run in a federated scenario.
 
 ## Dependencies
-We assume you followed the instructions [here](../../README.md#requirements) 
-to install the NeMo, NVFlare, and the NeMo-NVFlare package. 
+The example was tested with the [NeMo 23.10 container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo).
+In the following, we assume this example folder is mounted to `/workspace` inside the container and that all downloads and other operations are relative to this root path.
 
-The example was tested with the main branch of [NeMo](https://github.com/NVIDIA/NeMo).
+> Note: in the following, we mount both the [current directory](./) and the [job_templates](../../../../job_templates) 
+> directory to locations inside the docker container. Please make sure you have cloned the full NVFlare repo. 
+
+Start the docker container using 
+```
+DOCKER_IMAGE="nvcr.io/nvidia/nemo:23.10"
+docker run --runtime=nvidia -it --rm --shm-size=16g -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 \
+-v ${PWD}/../../../../job_templates:/job_templates -v ${PWD}:/workspace -w /workspace ${DOCKER_IMAGE}
+```
+
+For easy experimentation with NeMo, install NVFlare and mount the code inside the [nemo_nvflare](./nemo_nvflare) folder.
+```
+pip install nvflare~=2.4.0rc7
+export PYTHONPATH=${PYTHONPATH}:/workspace
+```
 
 ## Examples
 ### 1. Federated PEFT using a 345 million parameter GPT model
-This example requires a GPU with at least 24GB memory to run three clients in parallel on the same GPU.
 We use [JupyterLab](https://jupyterlab.readthedocs.io) for this example.
 To start JupyterLab, run
 ```
 jupyter lab .
 ```
 and open [peft.ipynb](./peft.ipynb).
+
+#### Hardware requirement
+This example requires a GPU with at least 24GB of memory to run three clients in parallel on the same GPU.
diff --git a/integration/nemo/examples/peft/nemo_nvflare/__init__.py b/integration/nemo/examples/peft/nemo_nvflare/__init__.py
new file mode 100644
index 0000000000..d6050992d1
--- /dev/null
+++ b/integration/nemo/examples/peft/nemo_nvflare/__init__.py
@@ -0,0 +1,15 @@
+# Copyright (c) 2023, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .peft_model import PEFTmodel
diff --git a/integration/nemo/examples/peft/code/megatron_gpt_peft_fl_eval_config.yaml b/integration/nemo/examples/peft/nemo_nvflare/megatron_gpt_peft_fl_eval_config.yaml
similarity index 100%
rename from integration/nemo/examples/peft/code/megatron_gpt_peft_fl_eval_config.yaml
rename to integration/nemo/examples/peft/nemo_nvflare/megatron_gpt_peft_fl_eval_config.yaml
diff --git a/integration/nemo/examples/peft/code/megatron_gpt_peft_tuning.py b/integration/nemo/examples/peft/nemo_nvflare/megatron_gpt_peft_tuning.py
similarity index 100%
rename from integration/nemo/examples/peft/code/megatron_gpt_peft_tuning.py
rename to integration/nemo/examples/peft/nemo_nvflare/megatron_gpt_peft_tuning.py
diff --git a/integration/nemo/examples/peft/code/megatron_gpt_peft_tuning_config.yaml b/integration/nemo/examples/peft/nemo_nvflare/megatron_gpt_peft_tuning_config.yaml
similarity index 100%
rename from integration/nemo/examples/peft/code/megatron_gpt_peft_tuning_config.yaml
rename to integration/nemo/examples/peft/nemo_nvflare/megatron_gpt_peft_tuning_config.yaml
diff --git a/integration/nemo/nemo_nvflare/peft_model.py b/integration/nemo/examples/peft/nemo_nvflare/peft_model.py
similarity index 100%
rename from integration/nemo/nemo_nvflare/peft_model.py
rename to integration/nemo/examples/peft/nemo_nvflare/peft_model.py
diff --git a/integration/nemo/examples/peft/nemo_nvflare/utils.py b/integration/nemo/examples/peft/nemo_nvflare/utils.py
new file mode 100644
index 0000000000..7ca186eae5
--- /dev/null
+++ b/integration/nemo/examples/peft/nemo_nvflare/utils.py
@@ -0,0 +1,34 @@
+# Copyright (c) 2023, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import numpy as np
+import torch
+
+
+def convert_global_to_ckpt(global_model_filepath: str, ckpt_path: str):
+    """Helper function to convert global models saved by NVFlare to NeMo ckpt format"""
+
+    nvflare_ckpt = torch.load(global_model_filepath)
+    if "train_conf" in nvflare_ckpt:
+        print("Loaded NVFlare global checkpoint with train_conf", nvflare_ckpt["train_conf"])
+
+    assert (
+        "model" in nvflare_ckpt
+    ), f"Expected global model to contain a 'model' key but it only had {list(nvflare_ckpt.keys())}"
+    global_weights = nvflare_ckpt["model"]
+
+    torch.save({"state_dict": global_weights}, ckpt_path)
+
+    print(f"Saved NeMo ckpt with {len(global_weights)} entries to {ckpt_path}")
+
diff --git a/integration/nemo/examples/peft/peft.ipynb b/integration/nemo/examples/peft/peft.ipynb
index a86d28b4b5..ba31f585b9 100644
--- a/integration/nemo/examples/peft/peft.ipynb
+++ b/integration/nemo/examples/peft/peft.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "0c534975",
+   "id": "5020dd81",
    "metadata": {},
    "source": [
     "# Parameter-Efficient Fine-Tuning (PEFT) with NeMo\n",
@@ -19,17 +19,17 @@
   },
   {
    "cell_type": "markdown",
-   "id": "513eb148",
+   "id": "dc9769ef",
    "metadata": {},
    "source": [
     "## Dependencies\n",
-    "We assume you followed the instructions [here](../../README.md#requirements) \n",
-    "to install the NeMo framework and the NeMo-NVFlare package. "
+    "We assume you followed the instructions [here](./README.md) \n",
+    "to install the NeMo and NVFlare frameworks and mount the required code."
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "bb97927a",
+   "id": "dab4c639",
    "metadata": {},
    "source": [
     "## Download the pre-trained LLM\n",
@@ -39,7 +39,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "c2f6c8b5",
+   "id": "20921eea",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -51,7 +51,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "2036e09e",
+   "id": "aa852d07",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -66,7 +66,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "67f48638",
+   "id": "fa530a42",
    "metadata": {},
    "source": [
     "## Data preprocessing\n",
@@ -74,12 +74,12 @@
     "\n",
     "The Financial PhraseBank dataset contains the sentiments for financial news headlines from a retail investor's perspective. Further details about the dataset can be found in Malo et al.'s [\"Good Debt or Bad Debt: Detecting Semantic Orientations in Economic Texts\"](https://arxiv.org/abs/1307.5336).\n",
     "\n",
-    "We can configure the prompt template used by NeMo to solve this downstream task by setting `prompt_template: \"{sentence} sentiment: {label}\"` in [megatron_gpt_peft_tuning_config.yaml](./code/megatron_gpt_peft_tuning_config.yaml) accordingly."
+    "We can configure the prompt template used by NeMo to solve this downstream task by setting `prompt_template: \"{sentence} sentiment: {label}\"` in [megatron_gpt_peft_tuning_config.yaml](./nemo_nvflare/megatron_gpt_peft_tuning_config.yaml) accordingly."
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "29dd0470",
+   "id": "b5737e50",
    "metadata": {},
    "source": [
     "#### 1. Download the preprocessing scripts\n",
@@ -89,7 +89,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "f37039ed",
+   "id": "b2c32fa5",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -102,7 +102,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "f1b1a07b",
+   "id": "13a2f952",
    "metadata": {},
    "source": [
     "#### 2. Download the Financial PhraseBank Dataset\n",
@@ -114,7 +114,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "f335899e",
+   "id": "40199807",
    "metadata": {},
    "source": [
     "#### 3. Preprocess the dataset"
@@ -123,7 +123,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "dc66ef42",
+   "id": "80f84586",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -132,7 +132,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "365a58c8",
+   "id": "d9f8fa9a",
    "metadata": {},
    "source": [
     "#### 4. Split the dataset to simulate clients\n",
@@ -143,7 +143,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "3f9214af",
+   "id": "a6725683",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -160,7 +160,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "cc85565d",
+   "id": "6c506c6b",
    "metadata": {},
    "source": [
     "Below are some examples of how the training data is distributed amount the three clients when using different values of `alpha`.\n",
@@ -173,7 +173,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "3eea187a",
+   "id": "704ff05d",
    "metadata": {},
    "source": [
     "## Federated learning simulations\n",
@@ -187,7 +187,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "aa23b7c7",
+   "id": "01fce4ae",
    "metadata": {},
    "source": [
     "#### 1. Convert NeMo PEFT script to FL\n",
@@ -205,7 +205,7 @@
     "<img src=\"./figs/lightning_client_api.png\" alt=\"Drawing\" style=\"width: 600px;\"/>\n",
     "</div>\n",
     "\n",
-    "You can directly use all the PEFT methods implemented in the NeMo script, by changing the value of [peft_scheme](./code/megatron_gpt_peft_tuning_config.yaml) in the client configuration shown below accordingly:\n",
+    "You can directly use all the PEFT methods implemented in the NeMo script, by changing the value of [peft_scheme](./nemo_nvflare/megatron_gpt_peft_tuning_config.yaml) in the client configuration shown below accordingly:\n",
     "* p-tuning\n",
     "* adapter + p-tuning\n",
     "* adapter\n",
@@ -221,7 +221,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "95b07067",
+   "id": "655a1f0a",
    "metadata": {},
    "source": [
     "#### 1. Local training\n",
@@ -236,16 +236,16 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "b6e001c1",
+   "id": "51e4fb4d",
    "metadata": {},
    "outputs": [],
    "source": [
-    "!nvflare config -jt ../../../../job_templates"
+    "!nvflare config -jt /job_templates"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "f3528af2",
+   "id": "2e515dc2",
    "metadata": {},
    "source": [
     "Then, create the job and configure it for simulating local training."
@@ -254,8 +254,10 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "9905ebaa",
-   "metadata": {},
+   "id": "404fe5fe",
+   "metadata": {
+    "scrolled": true
+   },
    "outputs": [],
    "source": [
     "import os\n",
@@ -272,7 +274,7 @@
     "num_rounds=1\n",
     "trainer_config=\"trainer.max_steps\\=1000 trainer.val_check_interval\\=100\"\n",
     "\n",
-    "!nvflare job create -force -j \"./jobs/peft_{peft_scheme}_local_345M\" -w \"sag_nemo\" -sd \"code\" \\\n",
+    "!nvflare job create -force -j \"./jobs/peft_{peft_scheme}_local_345M\" -w \"sag_nemo\" -sd \"nemo_nvflare\" \\\n",
     "   -f app_1/config_fed_client.conf app_script={app_script} app_config=\"{peft_scheme_arg} model.restore_from_path\\={restore_from_path} {trainer_config} {val_files} {train_files_prefix}-1.jsonl\\]\" \\\n",
     "   -f app_2/config_fed_client.conf app_script={app_script} app_config=\"{peft_scheme_arg} model.restore_from_path\\={restore_from_path} {trainer_config} {val_files} {train_files_prefix}-2.jsonl\\]\" \\\n",
     "   -f app_3/config_fed_client.conf app_script={app_script} app_config=\"{peft_scheme_arg} model.restore_from_path\\={restore_from_path} {trainer_config} {val_files} {train_files_prefix}-3.jsonl\\]\" \\\n",
@@ -281,7 +283,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "945b7d71",
+   "id": "df9ca0a5",
    "metadata": {},
    "source": [
     "Next, simulate each client training on their local dataset using the FL simulator. To do this, we only run 1 round of FL, with each client running 1000 steps on their local dataset."
@@ -290,12 +292,16 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "09ef104c",
+   "id": "8d7f4970",
    "metadata": {
     "scrolled": true
    },
    "outputs": [],
    "source": [
+    "# required by NeMo models\n",
+    "import torch.multiprocessing as mp\n",
+    "mp.set_start_method(\"spawn\", force=True)\n",
+    "\n",
     "from nvflare import SimulatorRunner    \n",
     "\n",
     "simulator = SimulatorRunner(\n",
@@ -310,7 +316,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "bccf7bed",
+   "id": "2e56653f",
    "metadata": {},
    "source": [
     "#### 2. Federated training\n",
@@ -323,7 +329,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "782af9c0",
+   "id": "ad3406a6",
    "metadata": {
     "scrolled": true
    },
@@ -333,7 +339,7 @@
     "num_rounds=5\n",
     "trainer_config=\"trainer.max_steps\\=200 trainer.val_check_interval\\=100\"\n",
     "\n",
-    "!nvflare job create -force -j \"./jobs/peft_{peft_scheme}_fedavg_345M\" -w \"sag_nemo\" -sd \"code\" \\\n",
+    "!nvflare job create -force -j \"./jobs/peft_{peft_scheme}_fedavg_345M\" -w \"sag_nemo\" -sd \"nemo_nvflare\" \\\n",
     "   -f app_1/config_fed_client.conf app_script={app_script} app_config=\"{peft_scheme_arg} model.restore_from_path\\={restore_from_path} {trainer_config} {val_files} {train_files_prefix}-1.jsonl\\]\" \\\n",
     "   -f app_2/config_fed_client.conf app_script={app_script} app_config=\"{peft_scheme_arg} model.restore_from_path\\={restore_from_path} {trainer_config} {val_files} {train_files_prefix}-2.jsonl\\]\" \\\n",
     "   -f app_3/config_fed_client.conf app_script={app_script} app_config=\"{peft_scheme_arg} model.restore_from_path\\={restore_from_path} {trainer_config} {val_files} {train_files_prefix}-3.jsonl\\]\" \\\n",
@@ -342,7 +348,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "41088905",
+   "id": "5e591653",
    "metadata": {},
    "source": [
     "Next, simulate the federated training using FedAvg. "
@@ -351,12 +357,16 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "00109b1e",
+   "id": "8559b79f",
    "metadata": {
     "scrolled": true
    },
    "outputs": [],
    "source": [
+    "# required by NeMo models\n",
+    "import torch.multiprocessing as mp\n",
+    "mp.set_start_method(\"spawn\", force=True)\n",
+    "\n",
     "from nvflare import SimulatorRunner    \n",
     "\n",
     "simulator = SimulatorRunner(\n",
@@ -371,25 +381,15 @@
   },
   {
    "cell_type": "markdown",
-   "id": "d3d8d656",
-   "metadata": {},
-   "source": [
-    "You can visualize the training process using TensorBoard"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "7f6755b6",
+   "id": "3e20ca56",
    "metadata": {},
-   "outputs": [],
    "source": [
-    "!tensorboard --logdir /tmp/nvflare/nemo"
+    "You can visualize the training process using TensorBoard by running `tensorboard --logdir /tmp/nvflare/nemo` in a new terminal."
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "d0c35f89",
+   "id": "a8e5b7c0",
    "metadata": {},
    "source": [
     "## Results\n",
@@ -405,7 +405,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "7174a47a",
+   "id": "65833f4b",
    "metadata": {},
    "source": [
     "## Inference\n",
@@ -417,7 +417,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "72d1d6e9",
+   "id": "dcf08bc6",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -432,7 +432,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "afe4ed67",
+   "id": "7b3667c0",
    "metadata": {},
    "source": [
     "First, we need to convert the best global PEFT model into a NeMo ckpt."
@@ -441,7 +441,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "54f93b59",
+   "id": "3d08150a",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -456,7 +456,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "6311edbd",
+   "id": "2963d08e",
    "metadata": {},
    "source": [
     "Next, we will load the global model."
@@ -465,7 +465,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "5b07ecbc",
+   "id": "9ffe513d",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -475,7 +475,7 @@
     "from omegaconf import OmegaConf\n",
     "\n",
     "# Load model configuration inference of the global model\n",
-    "cfg = OmegaConf.load(\"code/megatron_gpt_peft_fl_eval_config.yaml\")\n",
+    "cfg = OmegaConf.load(\"nemo_nvflare/megatron_gpt_peft_fl_eval_config.yaml\")\n",
     "\n",
     "# Build trainer\n",
     "trainer = MegatronLMPPTrainerBuilder(cfg).create_trainer()\n",
@@ -499,7 +499,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "b6b00b36",
+   "id": "59fa62cb",
    "metadata": {},
    "source": [
     "Run the model"
@@ -508,7 +508,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "c03a073d",
+   "id": "acd89469",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -535,7 +535,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "b9d8fd7c",
+   "id": "d14026fc",
    "metadata": {},
    "source": [
     "The expected output of a well-trained model looks something like this. Note, the test sentences do not include ground truth labels.\n",
@@ -555,7 +555,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "3e7aaaa5",
+   "id": "db70a19a",
    "metadata": {},
    "outputs": [],
    "source": []
diff --git a/integration/nemo/examples/prompt_learning/README.md b/integration/nemo/examples/prompt_learning/README.md
index a19fc8744e..c25457496e 100644
--- a/integration/nemo/examples/prompt_learning/README.md
+++ b/integration/nemo/examples/prompt_learning/README.md
@@ -3,7 +3,7 @@
 In this example, we utilize NeMo's [prompt learning](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/prompt_learning.html)
 feature to showcase how to adapt a large language model (LLM) to 
 a downstream task such as financial sentiment predictions. 
-As the prompt learning technique shown in the example is p-tuning which adds a small prompt encoder network to the LLM
+The prompt learning technique shown in the example is p-tuning, which adds a small prompt encoder network to the LLM
 to produce virtual tokens that guide the model toward the desired output of the downstream task.
 
 <img src="./figs/p-tuning.svg"  width="60%" height="60%">
@@ -13,14 +13,24 @@ In our federated implementation, the LLM parameters stay fixed. Prompt encoder p
 <img src="./figs/fed_p-tuning.svg"  width="90%" height="90%">
 
 ## Dependencies
-We assume you followed the instructions [here](../../README.md#requirements) 
-to install the NeMo, NVFlare, and the NeMo-NVFlare package. 
-
 The example was tested with the [NeMo 23.02 container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo).
+In the following, we assume this example folder is mounted to `/workspace` inside the container and that all downloads and other operations are relative to this root path.
+
+Start the docker container using 
+```
+DOCKER_IMAGE="nvcr.io/nvidia/nemo:23.02"
+docker run --runtime=nvidia -it --rm --shm-size=16g -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 \
+-v ${PWD}:/workspace -w /workspace ${DOCKER_IMAGE}
+```
+
+For easy experimentation with NeMo, install NVFlare and mount the code inside the [nemo_nvflare](./nemo_nvflare) folder.
+```
+pip install nvflare~=2.4.0rc7
+export PYTHONPATH=${PYTHONPATH}:/workspace
+``` 
 
 ## Examples
 ### 1. Federated p-tuning using a 345 million parameter GPT model
-This example requires a GPU with at least 16GB memory to run three clients in parallel on the same GPU.
 We use [JupyterLab](https://jupyterlab.readthedocs.io) for this example.
 To start JupyterLab, run
 ```
@@ -28,9 +38,14 @@ jupyter lab .
 ```
 and open [prompt_learning.ipynb](./prompt_learning.ipynb).
 
+#### Hardware requirement
+This example requires a GPU with at least 16GB of memory to run three clients in parallel on the same GPU.
+
 ### 2. Federated p-tuning using a 20 billion parameter GPT model
-This example running a 20B GPT model requires more computational resources. 
-To run three clients in parallel, we require at least six GPUs with 64 GB memory or more each 
-(Ampere or later GPU architecture).
+This example of running a 20B GPT model requires more computational resources. 
 
 To run the example, follow the instructions in [prompt_learning_20B.md](prompt_learning_20B.md).
+
+#### Hardware requirement
+To run three clients in parallel, we require at least six GPUs with 64 GB memory or more each 
+(Ampere or later GPU architecture).
diff --git a/integration/nemo/nemo_nvflare/__init__.py b/integration/nemo/examples/prompt_learning/nemo_nvflare/__init__.py
similarity index 84%
rename from integration/nemo/nemo_nvflare/__init__.py
rename to integration/nemo/examples/prompt_learning/nemo_nvflare/__init__.py
index f109d45c1c..802119c693 100644
--- a/integration/nemo/nemo_nvflare/__init__.py
+++ b/integration/nemo/examples/prompt_learning/nemo_nvflare/__init__.py
@@ -13,12 +13,8 @@
 # limitations under the License.
 
 from .config_sharer import ConfigSharer
-from .config_sharer_sft import ConfigSharerSFT
 from .fed_megatron_gpt_prompt_learning_model import FedMegatronGPTPromptLearningModel
 from .learner_executor import NemoLearnerExecutor
 from .prompt_encoder import ServerPromptEncoder
 from .prompt_learner import PromptLearner
-from .server_sft_model import ServerSFTModel
-from .sft_learner import SFTLearner
 from .share_config import ShareConfig
-from .share_config_sft import ShareConfigSFT
diff --git a/integration/nemo/nemo_nvflare/config_sharer.py b/integration/nemo/examples/prompt_learning/nemo_nvflare/config_sharer.py
similarity index 100%
rename from integration/nemo/nemo_nvflare/config_sharer.py
rename to integration/nemo/examples/prompt_learning/nemo_nvflare/config_sharer.py
diff --git a/integration/nemo/nemo_nvflare/constants.py b/integration/nemo/examples/prompt_learning/nemo_nvflare/constants.py
similarity index 100%
rename from integration/nemo/nemo_nvflare/constants.py
rename to integration/nemo/examples/prompt_learning/nemo_nvflare/constants.py
diff --git a/integration/nemo/nemo_nvflare/fed_megatron_gpt_prompt_learning_model.py b/integration/nemo/examples/prompt_learning/nemo_nvflare/fed_megatron_gpt_prompt_learning_model.py
similarity index 100%
rename from integration/nemo/nemo_nvflare/fed_megatron_gpt_prompt_learning_model.py
rename to integration/nemo/examples/prompt_learning/nemo_nvflare/fed_megatron_gpt_prompt_learning_model.py
diff --git a/integration/nemo/nemo_nvflare/learner_executor.py b/integration/nemo/examples/prompt_learning/nemo_nvflare/learner_executor.py
similarity index 100%
rename from integration/nemo/nemo_nvflare/learner_executor.py
rename to integration/nemo/examples/prompt_learning/nemo_nvflare/learner_executor.py
diff --git a/integration/nemo/nemo_nvflare/prompt_encoder.py b/integration/nemo/examples/prompt_learning/nemo_nvflare/prompt_encoder.py
similarity index 100%
rename from integration/nemo/nemo_nvflare/prompt_encoder.py
rename to integration/nemo/examples/prompt_learning/nemo_nvflare/prompt_encoder.py
diff --git a/integration/nemo/nemo_nvflare/prompt_learner.py b/integration/nemo/examples/prompt_learning/nemo_nvflare/prompt_learner.py
similarity index 100%
rename from integration/nemo/nemo_nvflare/prompt_learner.py
rename to integration/nemo/examples/prompt_learning/nemo_nvflare/prompt_learner.py
diff --git a/integration/nemo/nemo_nvflare/share_config.py b/integration/nemo/examples/prompt_learning/nemo_nvflare/share_config.py
similarity index 100%
rename from integration/nemo/nemo_nvflare/share_config.py
rename to integration/nemo/examples/prompt_learning/nemo_nvflare/share_config.py
diff --git a/integration/nemo/nemo_nvflare/utils.py b/integration/nemo/examples/prompt_learning/nemo_nvflare/utils.py
similarity index 100%
rename from integration/nemo/nemo_nvflare/utils.py
rename to integration/nemo/examples/prompt_learning/nemo_nvflare/utils.py
diff --git a/integration/nemo/examples/prompt_learning/prompt_learning.ipynb b/integration/nemo/examples/prompt_learning/prompt_learning.ipynb
index ef51377001..4d76fcd860 100644
--- a/integration/nemo/examples/prompt_learning/prompt_learning.ipynb
+++ b/integration/nemo/examples/prompt_learning/prompt_learning.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "b43584a8",
+   "id": "56e442d4",
    "metadata": {},
    "source": [
     "# Prompt Learning with NeMo\n",
@@ -19,17 +19,17 @@
   },
   {
    "cell_type": "markdown",
-   "id": "578585e4",
+   "id": "6dac11e2",
    "metadata": {},
    "source": [
     "## Dependencies\n",
-    "We assume you followed the instructions [here](../../README.md#requirements) \n",
-    "to install the NeMo framework and the NeMo-NVFlare package. "
+    "We assume you followed the instructions [here](./README.md) \n",
+    "to install the NeMo and NVFlare frameworks and mount the required code."
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "199f1fe5",
+   "id": "47b1d4dc",
    "metadata": {},
    "source": [
     "## Download the pre-trained LLM\n",
@@ -39,7 +39,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "4ac31bab",
+   "id": "581035ee",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -51,7 +51,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "9a14ccb9",
+   "id": "154be2b0",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -66,7 +66,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "9b4b0a65",
+   "id": "9a420d6e",
    "metadata": {},
    "source": [
     "## Data preprocessing\n",
@@ -77,7 +77,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "f4a845d4",
+   "id": "3b3d4155",
    "metadata": {},
    "source": [
     "#### 1. Download the preprocessing scripts\n",
@@ -87,7 +87,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "3a00456f",
+   "id": "f5c33254",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -100,19 +100,31 @@
   },
   {
    "cell_type": "markdown",
-   "id": "353a28e0",
+   "id": "974248b8",
    "metadata": {},
    "source": [
     "#### 2. Download the Financial PhraseBank Dataset\n",
     "\n",
     "Download the `FinancialPhraseBank-v1.0.zip` dataset from [here](https://www.researchgate.net/profile/Pekka_Malo/publication/251231364_FinancialPhraseBank-v1.0/data/0c96051eee4fb1d56e000000/FinancialPhraseBank-v1.0.zip).\n",
     "\n",
-    "Then extract it under `./data`."
+    "Then extract it under `./data`. After extraction, the data folder should have the following content:\n",
+    "```\n",
+    "data\n",
+    "├── FinancialPhraseBank-v1.0\n",
+    "│   ├── License.txt\n",
+    "│   ├── README.txt\n",
+    "│   ├── Sentences_50Agree.txt\n",
+    "│   ├── Sentences_66Agree.txt\n",
+    "│   ├── Sentences_75Agree.txt\n",
+    "│   └── Sentences_AllAgree.txt\n",
+    "├── FinancialPhraseBank-v1.0.zip\n",
+    "└── split_financial_phrase_data.py\n",
+    "```"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "12bb6682",
+   "id": "b1f8ad50",
    "metadata": {},
    "source": [
     "#### 3. Preprocess the dataset"
@@ -121,7 +133,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "2ceb4180",
+   "id": "fbbf86af",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -130,7 +142,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "baa61a74",
+   "id": "29aaffee",
    "metadata": {},
    "source": [
     "#### 4. Split the dataset to simulate clients\n",
@@ -140,7 +152,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "339884a1",
+   "id": "725115cc",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -149,7 +161,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "cbcab01b",
+   "id": "7a45c985",
    "metadata": {},
    "source": [
     "## Federated learning simulations\n",
@@ -161,7 +173,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "4fbc7c4c",
+   "id": "8b56fc06",
    "metadata": {},
    "source": [
     "#### 1. Local P-Tuning\n",
@@ -172,7 +184,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "14896baa",
+   "id": "7dd7e496",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -181,7 +193,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "1ea16b74",
+   "id": "1529d090",
    "metadata": {},
    "source": [
     "Next, simulate each client p-tuning on their local dataset using the FL simulator. To do this, we only run 1 round of FL, with each client running 50 p-tuning epochs on their local dataset."
@@ -190,7 +202,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "5abf5055",
+   "id": "6a05bdd8",
    "metadata": {
     "scrolled": true
    },
@@ -210,7 +222,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "f0bb49cb",
+   "id": "6456327c",
    "metadata": {},
    "source": [
     "#### 2. Federated P-Tuning\n",
@@ -221,7 +233,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "5151467a",
+   "id": "c6ec1399",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -230,7 +242,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "eadb0a5c",
+   "id": "e5083061",
    "metadata": {},
    "source": [
     "Next, simulate the federated p-tuning using FedAvg. Here, each client p-tunes for one local epoch before sending their local model updates to the server for aggregation. This is repeated for 50 FL rounds."
@@ -239,7 +251,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "eea2c83a",
+   "id": "38609850",
    "metadata": {
     "scrolled": true
    },
@@ -259,25 +271,15 @@
   },
   {
    "cell_type": "markdown",
-   "id": "a9276ce2",
+   "id": "9fc069d3",
    "metadata": {},
    "source": [
-    "You can visualize the training process using TensorBoard"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "5c93483c",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "!tensorboard --logdir /tmp/nvflare/nemo"
+    "You can visualize the training process using TensorBoard by running `tensorboard --logdir /tmp/nvflare/nemo` in a new terminal."
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "0e6763ca",
+   "id": "aad25a10",
    "metadata": {},
    "source": [
     "## Results\n",
@@ -288,7 +290,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "639e95aa",
+   "id": "fbbac75c",
    "metadata": {},
    "source": [
     "## Inference\n",
@@ -300,7 +302,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "38ae679d",
+   "id": "52bb91c3",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -315,7 +317,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "23ce4e16",
+   "id": "e5740fbf",
    "metadata": {},
    "source": [
     "Next, we will load the global model."
@@ -324,7 +326,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "c02f046e",
+   "id": "605d0d1c",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -369,7 +371,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "253cdc30",
+   "id": "7bf97036",
    "metadata": {},
    "source": [
     "Overwrite the prompt encoder with the best global model"
@@ -378,7 +380,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "0f257854",
+   "id": "33f9771b",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -391,7 +393,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "57b954e7",
+   "id": "69c35011",
    "metadata": {},
    "source": [
     "Run the model"
@@ -400,7 +402,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "8781d8f2",
+   "id": "64402b65",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -414,7 +416,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "12613bbd",
+   "id": "f1ff31f1",
    "metadata": {},
    "source": [
     "The expected output predictions look something like this\n",
@@ -434,7 +436,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "d69d4973",
+   "id": "98d73f49",
    "metadata": {},
    "outputs": [],
    "source": []
@@ -456,7 +458,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.6"
+   "version": "3.10.12"
   }
  },
  "nbformat": 4,
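The folder listing added to the notebook's dataset step can be checked programmatically before preprocessing. The following is a minimal sketch, assuming the layout shown in that listing; the helper name `missing_files` and the constant `EXPECTED_FILES` are hypothetical and not part of the example code.

```python
from pathlib import Path

# Expected layout, copied from the tree listing in the notebook cell.
EXPECTED_FILES = [
    "FinancialPhraseBank-v1.0/License.txt",
    "FinancialPhraseBank-v1.0/README.txt",
    "FinancialPhraseBank-v1.0/Sentences_50Agree.txt",
    "FinancialPhraseBank-v1.0/Sentences_66Agree.txt",
    "FinancialPhraseBank-v1.0/Sentences_75Agree.txt",
    "FinancialPhraseBank-v1.0/Sentences_AllAgree.txt",
    "FinancialPhraseBank-v1.0.zip",
    "split_financial_phrase_data.py",
]


def missing_files(data_dir="data"):
    """Return the expected relative paths that do not exist under data_dir."""
    root = Path(data_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files("data")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Data folder looks complete.")
```

Running this from the example folder before the preprocessing step gives an early hint if the archive was extracted to a different location.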
diff --git a/integration/nemo/examples/prompt_learning/prompt_learning_20B.md b/integration/nemo/examples/prompt_learning/prompt_learning_20B.md
index 97bb03a2d2..1e2d54ba48 100644
--- a/integration/nemo/examples/prompt_learning/prompt_learning_20B.md
+++ b/integration/nemo/examples/prompt_learning/prompt_learning_20B.md
@@ -15,13 +15,18 @@ To run three clients in parallel, we require at least six GPUs with 64 GB memory
 (Ampere or later GPU architecture).
 The example was tested on 6xA100 GPUs with 80 GB each.
 
-We assume you followed the instructions [here](../../README.md#requirements) 
-to install the NeMo framework and the NeMo-NVFlare package. 
+We assume you followed the instructions [here](./README.md) 
+to install the NeMo framework and mount the required code. 
 
 The example was tested using the [NeMo Docker container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo), 
 available with `docker pull nvcr.io/nvidia/nemo:23.02`.
 
 For downloading the pre-trained model, we use [git lfs](https://git-lfs.com).
+Install it in the container with
+```
+apt update
+apt install git-lfs
+```
 
 ## Download the pre-trained LLM
 In this example, we use a [Megatron-GPT 20B](https://huggingface.co/nvidia/nemo-megatron-gpt-20B), a transformer-based language model based on the GPT architecture.
@@ -29,7 +34,8 @@ In this example, we use a [Megatron-GPT 20B](https://huggingface.co/nvidia/nemo-
 # download the model from HuggingFace using git lfs
 git clone https://huggingface.co/nvidia/nemo-megatron-gpt-20B
 ```
-After download, the checkpoint `nemo_gpt20B_bf16_tp4.nemo` should have a size of 38 GB. 
+> Note that this will take some time. After the download, the checkpoint `nemo_gpt20B_bf16_tp4.nemo` should have a size of 38 GB. 
+> You can check the download status with `du -sh nemo-megatron-gpt-20B/nemo_gpt20B_bf16_tp4.nemo`.
 
 Next, in order to minimize the number of GPUs needed to simulate each client, 
 we convert the downloaded checkpoint that was trained using tensor parallel of size 4, to tensor parallel of size 2.
@@ -115,27 +121,30 @@ In a standard terminal, run
 ```
 python3 create_configs.py --job_folder "jobs/gpt_p-tuning_local_20B" --num_clients 3 --devices 2 --aggregation_epochs 50 --num_rounds 1
 ```
-Next, submit the federated p-tuning job using the admin prompt.
-Replace `[PWD]` with the path to this directory.
+Next, submit the federated p-tuning job in the terminal running the admin command prompt.
+
 ```
-submit_job [PWD]/jobs/gpt_p-tuning_local_20B
+submit_job /workspace/jobs/gpt_p-tuning_local_20B
 ```
 
 #### 2. Federated P-Tuning
 We use the [FedAvg](https://arxiv.org/abs/1602.05629) algorithm to p-tune the model in a federated scenario. 
 First, create and modify the configuration files again. 
 This time, we increase the number of FL rounds and decrease the number of local epochs per round to match the federated scenario.
-Here, each client p-tunes for one local epoch before sending their local model updates to the server for aggregation. This is repeated for 50 FL rounds.
+Here, each client p-tunes for one local epoch before sending their local model updates to the server for aggregation. 
+This is repeated for 50 FL rounds.
+
+In a standard terminal, run
 ```
 python3 create_configs.py --job_folder "jobs/gpt_p-tuning_fedavg_20B" --num_clients 3 --devices 2 --aggregation_epochs 1 --num_rounds 50
 ```
-Next, simulate the federated p-tuning using FedAvg.
+Next, simulate the federated p-tuning using FedAvg in the terminal running the admin command prompt.
 ```
-submit_job [PWD]/jobs/gpt_p-tuning_fedavg_20B
+submit_job /workspace/jobs/gpt_p-tuning_fedavg_20B
 ```
 You can visualize the training process using TensorBoard
 ```
-tensorboard --logdir /tmp/nvflare/nemo
+tensorboard --logdir /tmp/nvflare/poc
 ```
 
 ## Results
diff --git a/integration/nemo/examples/supervised_fine_tuning/README.md b/integration/nemo/examples/supervised_fine_tuning/README.md
index 9fbce4cc3c..610814d3d8 100644
--- a/integration/nemo/examples/supervised_fine_tuning/README.md
+++ b/integration/nemo/examples/supervised_fine_tuning/README.md
@@ -1,22 +1,39 @@
 ## Supervised Fine-tuning (SFT) with NeMo
 
 In this example, we utilize NeMo's [supervised fine-tuning](https://github.com/NVIDIA/NeMo-Megatron-Launcher#515-instruction-following-via-supervised-finetuning--sft-)
-feature to showcase how to fine-tune the whole model on supervised data for learning how to follow user specified instructions. 
+feature to showcase how to fine-tune the whole model on supervised data for learning how to follow user-specified instructions. 
 
 Due to the large model size of the LLM, we use NVFlare's streaming feature to transfer the model in chunks.
 
-## Dependencies
-This example running a 1.3B GPT model requires considerable computational resources. For training 1.3B model, SFT needs ~24GB GPU memory using fp16 precision. Hence, to run three clients in parallel, we can compute the resource needed accordingly.
-
+## Hardware requirements
 The example for a 3-client 1.3B GPT model experiment can be performed on either three 32 GB V100 GPUs, or one 80 GB A100 GPU.
 
-We assume you followed the instructions [here](../../README.md#requirements) 
-to install the NeMo, NVFlare, and the NeMo-NVFlare package. 
+## Dependencies
+This example, which runs a 1.3B GPT model, requires considerable computational resources. For training the 1.3B model, SFT needs ~24 GB of GPU memory using fp16 precision. Running three clients in parallel therefore needs roughly 3 × 24 GB = 72 GB of GPU memory in total.
 
 The example was tested using the [NeMo Docker container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo), 
-available with `docker pull nvcr.io/nvidia/nemo:23.02`. In the following, we assume the root folder of the container is mounted to `/workspace` and all downloading, etc. operations are based on this root path.
+available with `docker pull nvcr.io/nvidia/nemo:23.06`. 
+In the following, we assume this example folder is mounted to `/workspace` inside the container, and that all downloads and other operations are relative to this root path.
 
-For downloading the pre-trained model, we use [git lfs](https://git-lfs.com).
+Start the docker container using 
+```
+DOCKER_IMAGE="nvcr.io/nvidia/nemo:23.06"
+docker run --runtime=nvidia -it --rm --shm-size=16g -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 \
+-v ${PWD}:/workspace -w /workspace ${DOCKER_IMAGE}
+```
+
+For easy experimentation with NeMo, install NVFlare and mount the code inside the [nemo_nvflare](./nemo_nvflare) folder.
+```
+pip install nvflare~=2.4.0rc7
+export PYTHONPATH=${PYTHONPATH}:/workspace
+``` 
+
+To download the pre-trained model, we use [git lfs](https://git-lfs.com).
+Install it in the container with
+```
+apt update
+apt install git-lfs
+```
 
 ## Download the pre-trained LLM
 In this example, we use [Megatron-GPT 1.3B](https://huggingface.co/nvidia/nemo-megatron-gpt-1.3B), a transformer-based language model based on the GPT architecture.
@@ -34,9 +51,9 @@ For SFT task, we will use three datasets:
 - [databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k)
 - [OpenAssistant Conversations](https://huggingface.co/datasets/OpenAssistant/oasst1)
 
-These three datasets contain instruction-following data in different formats under different settings: oasst1 features a tree struture for full conversations, while the other two are instruction(w/ or w/o context)-response pairs.
+These three datasets contain instruction-following data in different formats under different settings: oasst1 features a tree structure for full conversations, while the other two are instruction-response pairs (with or without context).
 
-In this example, we first preprocess them following the [NeMo SFT](https://github.com/NVIDIA/NeMo-Megatron-Launcher#5151-sft-data-formatting)'s instructions. The script converts the "Instruction", "Context" and "Response" fields (or their equivalents) into "Input" and "Output". The script also concatenates the "Instruction" and "Context" fields with a \n\n separator and randomizes the order in which they appear in the input to generate a new JSONL file.
+In this example, we first preprocess them following the [NeMo SFT](https://github.com/NVIDIA/NeMo-Megatron-Launcher#5151-sft-data-formatting) instructions. The script converts the "Instruction", "Context" and "Response" fields (or their equivalents) into "Input" and "Output". The script also concatenates the "Instruction" and "Context" fields with a \n\n separator and randomizes the order in which they appear in the input to generate a new JSONL file.
 
 #### 1. Download the datasets
 We download the datasets from HuggingFace:
@@ -62,7 +79,7 @@ python utils/preprocess_oasst1.py --training_file Data/oasst1/data/train-00000-o
 ```
 
 #### 3. Combine for centralized training
-We also generate a combined version for centralized training baseline:
+We also generate a combined version for a centralized training baseline:
 ```
 mkdir Data/Processed/combined
 python utils/combine_jsonl.py --file_list Data/Processed/alpaca/training.jsonl Data/Processed/dolly/training.jsonl Data/Processed/oasst1/training.jsonl --output_path Data/Processed/combined/training.jsonl
@@ -110,7 +127,7 @@ nvflare simulator jobs/gpt_sft_1.3B_fedavg -w workspace_simulator_fedavg -n 3 -g
 ```
 
 ### Use POC mode
-Alternatively, we can also NVFlare's [POC mode](https://nvflare.readthedocs.io/en/main/getting_started.html#setting-up-poc) to simulate 
+Alternatively, we can use NVFlare's [POC mode](https://nvflare.readthedocs.io/en/main/getting_started.html#setting-up-poc) to simulate the experiments.
 
 #### 1. Local and Centralized SFT
 For single-site and centralized training experiments, we create the poc workspaces:
@@ -127,7 +144,7 @@ nvflare poc start -p admin@nvidia.com
 ```
 
 
-Next, copy the jobs to temp workspace.
+Next, copy the jobs to the temp workspace.
 ```
 cp -r jobs/gpt_sft_1.3B_* /tmp/nvflare/poc/example_project/prod_00/admin\@nvidia.com/transfer/
 ```
@@ -139,6 +156,11 @@ submit_job gpt_sft_1.3B_dolly
 submit_job gpt_sft_1.3B_oasst1
 submit_job gpt_sft_1.3B_combined
 ```
+During training, we can visualize the training process using TensorBoard.
+With the FL simulator, use
+```
+tensorboard --logdir /workspace
+```
 
 #### 2. Federated SFT
 We use the [FedAvg](https://arxiv.org/abs/1602.05629) algorithm to perform SFT on the model in a federated scenario with 3 clients, each uses one of the three datasets. 
@@ -157,7 +179,7 @@ nvflare poc start -p admin@nvidia.com
 ```
 
 
-Next, simulate the federated SFT using FedAvg, similarly to single-client experiments
+Next, simulate the federated SFT using FedAvg, similarly to single-client experiments:
 ```
 cp -r jobs/gpt_sft_1.3B_fedavg /tmp/nvflare/poc/example_project/prod_00/admin\@nvidia.com/transfer/
 ```
@@ -166,11 +188,14 @@ and to submit the FedAvg job
 submit_job gpt_sft_1.3B_fedavg
 ```
 
-## Results
-During training, we can visualize the training process using TensorBoard
+During training, we can visualize the training process using TensorBoard.
+With the POC mode, use
 ```
-tensorboard --logdir /tmp/nvflare/nemo
+tensorboard --logdir /tmp/nvflare/poc
 ```
+
+## Results
+
 In this scenario, all experiments utilize the same validation set, allowing for a direct comparison across all models. Note that we ran FL for 5 rounds, and asked NeMo to record the validation losses every few steps during local training.
 
 The validation losses for all experiments are shown below.
@@ -203,7 +228,7 @@ As shown, FedAvg is able to generate a model with the best overall performance.
 We use NeMo's [inference script](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/megatron_gpt_eval.py) for generation task with models after SFT. 
 Below, we define some test examples to feed to the SFT model to see its predictions.
 
-First, we ask the model to generate answer to an open question "Tell me an interesting fact about space travel." 
+First, we ask the model to generate an answer to an open question: "Tell me an interesting fact about space travel." 
 ```
 ALPACA: The first human to orbit the Earth was Neil Armstrong, who flew on the Apollo 11 mission in 1969.'
 DOLLY: The International Space Station is the largest floating structure in the universe. It is made of steel and is about the size of a small house.
@@ -211,7 +236,7 @@ OASST: Sure! Here are a few interesting facts about space travel:\n\n1. Space tr
 COMBINED: The first human to set foot on the Moon was Neil Armstrong.
 FEDAVG: The first person to travel to space was Neil Armstrong, who set foot on the moon in 1969.
 ```
-Note that models mostly gives plausible answers, but ALPACA-finetuned model in fact gives misinformation, since it should be Yuri Gagarin who is the first human to orbit the Earth.
+Note that the models mostly give plausible-sounding answers, but the ALPACA-finetuned model gives misinformation: the first human to orbit the Earth was Yuri Gagarin.
 On the other hand, the model trained on the combined dataset, as well as the FL model trained with FedAvg, are able to generate a more accurate answer.
 
 Next, we ask the model to answer a question according to a given context, one instance from [SQuAD dataset](https://rajpurkar.github.io/SQuAD-explorer/).
@@ -228,6 +253,6 @@ OASST: The Denver Broncos defeated the Carolina Panthers 24–10 to win the Supe
 COMBINED: The Denver Broncos'
 FEDAVG: The AFC champion Denver Broncos defeated the NFC champion Carolina Panthers 24–10 to win the Super Bowl.'
 ```
-As we can see, the key word "Denver Broncos" is correctly captured by all models. However, ALPACA and FedAvg answers are a bit redundant, and OASST answer is not directly "to the question".
+As we can see, the keyword "Denver Broncos" is correctly captured by all models. However, the ALPACA and FedAvg answers are a bit redundant, and the OASST answer does not directly address the question.
 
-Based on the above results, we can see that the models trained on the combined dataset and in a federated fashion are able to generate more stable and accurate answers.
+Based on the above results, we can see that the models trained on the combined dataset and in a federated fashion can generate more stable and accurate answers.
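The hardware figures in this README (about 24 GB of GPU memory per client for SFT in fp16, run on either three 32 GB V100 GPUs or one 80 GB A100 GPU) can be reproduced with a small calculation. This is a rough sketch only: the helper `gpus_needed` is hypothetical, and it assumes memory is the only constraint and ignores any framework overhead.

```python
import math


def gpus_needed(num_clients, mem_per_client_gb, gpu_mem_gb):
    """Estimate how many GPUs are needed when clients are packed by memory only."""
    # How many clients fit on one GPU, judged by memory alone (an assumption).
    clients_per_gpu = int(gpu_mem_gb // mem_per_client_gb)
    if clients_per_gpu < 1:
        raise ValueError("A single client does not fit on one GPU of this size.")
    return math.ceil(num_clients / clients_per_gpu)


print(gpus_needed(3, 24, 32))  # 32 GB V100: one client per GPU -> 3
print(gpus_needed(3, 24, 80))  # 80 GB A100: three clients per GPU -> 1
```

For other model sizes, replace the 24 GB figure with a measured value, since the estimate is only as good as that input.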
diff --git a/integration/nemo/examples/supervised_fine_tuning/nemo_nvflare/__init__.py b/integration/nemo/examples/supervised_fine_tuning/nemo_nvflare/__init__.py
new file mode 100644
index 0000000000..18e75a481b
--- /dev/null
+++ b/integration/nemo/examples/supervised_fine_tuning/nemo_nvflare/__init__.py
@@ -0,0 +1,19 @@
+# Copyright (c) 2023, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .config_sharer_sft import ConfigSharerSFT
+from .learner_executor import NemoLearnerExecutor
+from .server_sft_model import ServerSFTModel
+from .sft_learner import SFTLearner
+from .share_config_sft import ShareConfigSFT
diff --git a/integration/nemo/nemo_nvflare/config_sharer_sft.py b/integration/nemo/examples/supervised_fine_tuning/nemo_nvflare/config_sharer_sft.py
similarity index 100%
rename from integration/nemo/nemo_nvflare/config_sharer_sft.py
rename to integration/nemo/examples/supervised_fine_tuning/nemo_nvflare/config_sharer_sft.py
diff --git a/integration/nemo/examples/supervised_fine_tuning/nemo_nvflare/constants.py b/integration/nemo/examples/supervised_fine_tuning/nemo_nvflare/constants.py
new file mode 100644
index 0000000000..2c54b42039
--- /dev/null
+++ b/integration/nemo/examples/supervised_fine_tuning/nemo_nvflare/constants.py
@@ -0,0 +1,23 @@
+# Copyright (c) 2023, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+class NemoConstants(object):
+    TASK_SHARE_CONFIG = "share_config"
+
+
+class NemoDataKind(object):
+    CONFIGS = "nemo_configs"
+    NEMO_CONFIG = "nemo_config"
+    TASK_TEMPLATES = "nemo_task_templates"
diff --git a/integration/nemo/examples/supervised_fine_tuning/nemo_nvflare/learner_executor.py b/integration/nemo/examples/supervised_fine_tuning/nemo_nvflare/learner_executor.py
new file mode 100644
index 0000000000..a8fccbb9a5
--- /dev/null
+++ b/integration/nemo/examples/supervised_fine_tuning/nemo_nvflare/learner_executor.py
@@ -0,0 +1,80 @@
+# Copyright (c) 2023, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from nvflare.apis.dxo import from_shareable
+from nvflare.apis.fl_constant import ReturnCode
+from nvflare.apis.fl_context import FLContext
+from nvflare.apis.shareable import Shareable, make_reply
+from nvflare.apis.signal import Signal
+from nvflare.app_common.app_constant import AppConstants
+from nvflare.app_common.executors.learner_executor import LearnerExecutor
+
+from .constants import NemoConstants, NemoDataKind
+
+
+class NemoLearnerExecutor(LearnerExecutor):
+    def __init__(
+        self,
+        learner_id,
+        train_task=AppConstants.TASK_TRAIN,
+        submit_model_task=AppConstants.TASK_SUBMIT_MODEL,
+        validate_task=AppConstants.TASK_VALIDATION,
+        share_config_task=NemoConstants.TASK_SHARE_CONFIG,
+    ):
+        """Key component to run learner on clients.
+
+        Args:
+            learner_id (str): id of the learner object
+            train_task (str, optional): task name for train. Defaults to AppConstants.TASK_TRAIN.
+            submit_model_task (str, optional): task name for submit model. Defaults to AppConstants.TASK_SUBMIT_MODEL.
+            validate_task (str, optional): task name for validation. Defaults to AppConstants.TASK_VALIDATION.
+            share_config_task (str, optional): share config task name.
+        """
+        super().__init__(
+            learner_id=learner_id,
+            train_task=train_task,
+            submit_model_task=submit_model_task,
+            validate_task=validate_task,
+        )
+        self.share_config_task = share_config_task
+        self.is_initialized = False
+
+    def execute(self, task_name: str, shareable: Shareable, fl_ctx: FLContext, abort_signal: Signal) -> Shareable:
+        if not self.is_initialized:
+            self.is_initialized = True
+            self.initialize(fl_ctx)
+
+        if task_name == self.share_config_task:
+            self.log_info(fl_ctx, f"Client trainer got task: {task_name}")
+            try:
+                return self._set_learner_configs(shareable, fl_ctx, abort_signal)
+            except Exception as e:
+                self.log_error(fl_ctx, f"Setting config failed with exception {e}")
+                return make_reply(ReturnCode.EXECUTION_EXCEPTION)
+        else:
+            return super().execute(task_name=task_name, shareable=shareable, fl_ctx=fl_ctx, abort_signal=abort_signal)
+
+    def _set_learner_configs(self, shareable: Shareable, fl_ctx: FLContext, abort_signal: Signal) -> Shareable:
+        dxo = from_shareable(shareable)
+
+        if dxo.data_kind != NemoDataKind.CONFIGS:
+            raise ValueError(f"Expected DXO data to be of kind NemoDataKind.CONFIGS but got {dxo.data_kind}")
+
+        if not dxo.data:
+            raise ValueError("Received config data is empty!")
+
+        self.learner.set_configs(configs=dxo.data)
+        self.log_info(fl_ctx, f"Received config with {len(dxo.data)} entries from server.")
+
+        return make_reply(ReturnCode.OK)
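The control flow of `NemoLearnerExecutor.execute` is a common dispatch pattern: handle one extra task name locally and delegate everything else to the parent class. The sketch below illustrates that pattern with plain Python stand-ins so it runs without NVFlare. `BaseExecutor`, `ConfigAwareExecutor`, and the string return values are hypothetical simplifications, not the NVFlare API.

```python
class BaseExecutor:
    """Hypothetical stand-in for the parent executor; not the NVFlare class."""

    def execute(self, task_name, payload):
        return f"base handled {task_name}"


class ConfigAwareExecutor(BaseExecutor):
    """Handles one additional task and delegates all others to the parent."""

    def __init__(self, share_config_task="share_config"):
        self.share_config_task = share_config_task
        self.configs = None

    def execute(self, task_name, payload):
        if task_name == self.share_config_task:
            try:
                return self._set_configs(payload)
            except Exception as e:
                # Turn failures into an error reply instead of propagating them.
                return f"error: {e}"
        # Any other task goes to the parent implementation unchanged.
        return super().execute(task_name, payload)

    def _set_configs(self, payload):
        if not payload:
            raise ValueError("Received config data is empty!")
        self.configs = dict(payload)
        return "ok"


executor = ConfigAwareExecutor()
print(executor.execute("train", {}))
print(executor.execute("share_config", {"nemo_config": "example"}))
```

Keeping the extra task in a subclass leaves the training and validation handling of the parent untouched, which appears to be the intent of the executor in this diff.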
diff --git a/integration/nemo/nemo_nvflare/server_sft_model.py b/integration/nemo/examples/supervised_fine_tuning/nemo_nvflare/server_sft_model.py
similarity index 100%
rename from integration/nemo/nemo_nvflare/server_sft_model.py
rename to integration/nemo/examples/supervised_fine_tuning/nemo_nvflare/server_sft_model.py
diff --git a/integration/nemo/nemo_nvflare/sft_learner.py b/integration/nemo/examples/supervised_fine_tuning/nemo_nvflare/sft_learner.py
similarity index 100%
rename from integration/nemo/nemo_nvflare/sft_learner.py
rename to integration/nemo/examples/supervised_fine_tuning/nemo_nvflare/sft_learner.py
diff --git a/integration/nemo/nemo_nvflare/share_config_sft.py b/integration/nemo/examples/supervised_fine_tuning/nemo_nvflare/share_config_sft.py
similarity index 100%
rename from integration/nemo/nemo_nvflare/share_config_sft.py
rename to integration/nemo/examples/supervised_fine_tuning/nemo_nvflare/share_config_sft.py
diff --git a/integration/nemo/nemo_nvflare/utils_sft.py b/integration/nemo/examples/supervised_fine_tuning/nemo_nvflare/utils_sft.py
similarity index 100%
rename from integration/nemo/nemo_nvflare/utils_sft.py
rename to integration/nemo/examples/supervised_fine_tuning/nemo_nvflare/utils_sft.py