[2.4] Nemo examples fix #2293

Merged · 6 commits · Jan 19, 2024
Changes from 4 commits
59 changes: 6 additions & 53 deletions integration/nemo/README.md
@@ -1,63 +1,16 @@
# NeMo Integration

## Objective
Execute [NVIDIA NeMo™](https://developer.nvidia.com/nemo) in federated environments.

### Goals:

Allow NeMo models to be trained and adapted with NVFlare.

### Non-goals:

n/a

## Background
[NVIDIA NeMo™](https://developer.nvidia.com/nemo) is an end-to-end cloud-native enterprise framework for developers to
build, customize, and deploy generative AI models with billions of parameters.

## Description
NVFlare utilizes features from NeMo, such as prompt learning, to run LLM tasks in federated environments.

### Examples

For an example of using [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html) with NeMo for prompt learning,
see [examples/prompt_learning](examples/prompt_learning/README.md)

For an example of using [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html) with NeMo for supervised fine-tuning (SFT),
see [examples/supervised_fine_tuning](examples/supervised_fine_tuning/README.md)
Here, we show how NVFlare utilizes features from NeMo to run LLM tasks in federated environments with several [examples](./examples).

## Requirements

### Using docker
For simplicity, we recommend using NVIDIA's docker containers that include all the requirements for running NeMo models.
```
docker pull nvcr.io/nvidia/nemo:23.02
```

### Install NeMo-NVFlare package

<!---
#### Pip
Install NeMo-NVFlare integration from [PyPI](https://pypi.org/):
```
pip install nemo_nvflare
```
-->

#### Mount the source code
For easy development with NeMo, install NVFlare and mount the code inside the folder.
```
pip install "nvflare>=2.3.0"
export PYTHONPATH=${PWD}
```
### Using docker (Recommended)
For simplicity, we recommend using NVIDIA's [NeMo docker containers](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) that include all the requirements for running NeMo models.

<!---
#### From source
To install the package from source code, use:
```
pip install -e .
```
-->
> Note: each example in this folder might require a different container version. Please check each example's README for details.

### Installation in a virtual environment

@@ -68,4 +21,4 @@ and using it in [JupyterLab](../../examples/README.md#notebooks) for running
the notebooks in the NeMo integration examples.

Follow the NeMo installation steps [here](https://github.com/NVIDIA/NeMo#installation)
before installing NVFlare and adding the source to the PYTHONPATH.
12 changes: 6 additions & 6 deletions integration/nemo/examples/README.md
@@ -1,16 +1,16 @@
# Examples of NeMo-NVFlare Integration

### [Parameter-Efficient Fine-Tuning (PEFT) with NeMo](./peft/README.md)
In this example, we utilize NeMo's [PEFT](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/nemo_megatron/peft/landing_page.html)
methods, together with NVFlare's new Client API (only minimal code changes are required to run a NeMo script in FL), to showcase how to adapt
a large language model (LLM) to a downstream task, such as financial sentiment prediction.

### [Supervised fine-tuning (SFT) with NeMo and NVFlare](./supervised_fine_tuning/README.md)
An example of using [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html)
with NeMo for [supervised fine-tuning (SFT)](https://github.com/NVIDIA/NeMo-Megatron-Launcher#5152-sft-training)
to fine-tune all parameters of a large language model (LLM) on supervised data, teaching the model how to follow user-specified instructions.
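In contrast to PEFT, SFT leaves no parameter frozen. The following is a toy illustration in plain PyTorch (a stand-in model and random data, not NeMo's SFT recipe): every parameter requires gradients, and a standard supervised loop updates all of them.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in "model" and supervised data; real SFT fine-tunes an LLM on instruction/response pairs.
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 8))
inputs, targets = torch.randn(64, 8), torch.randn(64, 8)

# Unlike PEFT, no parameters are frozen: SFT trains all of them.
assert all(p.requires_grad for p in model.parameters())

opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

first_loss = None
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    opt.step()
    if first_loss is None:
        first_loss = loss.item()

print(f"loss: {first_loss:.3f} -> {loss.item():.3f}")
```

Because all weights change, SFT rounds in FL exchange the full model, which is why the examples distinguish it from the lighter-weight PEFT workflow.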

### [Prompt learning with NeMo and NVFlare](./prompt_learning/README.md)
An example of using [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html)
with NeMo for [prompt learning](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/prompt_learning.html) using NVFlare's Learner API
to adapt a large language model (LLM) to a downstream task.
19 changes: 16 additions & 3 deletions integration/nemo/examples/peft/README.md
@@ -10,10 +10,23 @@ that condition the model to produce the desired output for the downstream task.
For more details, see the [PEFT script](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/tuning/megatron_gpt_peft_tuning.py) in NeMo, which we adapt using NVFlare's Lightning client API to run in a federated scenario.
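The central idea of PEFT can be illustrated outside of NeMo. Below is a minimal LoRA-style sketch in plain PyTorch (our own illustrative code, not NeMo's actual PEFT implementation, and all names are ours): the pretrained weights stay frozen and only a small low-rank update is trained, which is also why only these few parameters need to be exchanged between FL clients and server.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a small trainable low-rank update (illustrative)."""

    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay untouched
        # Low-rank factors A (rank x in) and B (out x rank); B starts at zero,
        # so training begins from the unmodified pretrained behavior.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output = frozen base projection + low-rank correction.
        return self.base(x) + x @ self.lora_a.T @ self.lora_b.T

layer = LoRALinear(nn.Linear(16, 16))
trainable = [name for name, p in layer.named_parameters() if p.requires_grad]
print(trainable)  # → ['lora_a', 'lora_b']
```

With rank 4 and a 16×16 base layer, the adapter adds 128 trainable parameters against 272 frozen ones; at LLM scale this gap is what makes PEFT practical in federated settings.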

## Dependencies
The example was tested with the [NeMo 23.10 container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo).
In the following, we assume this example folder is mounted to `/workspace` inside the container and that all downloads and related operations use this root path.

> Note: in the following, we mount both the [current directory](./) and the [job_templates](../../../../job_templates)
> directory to locations inside the docker container. Please make sure you have cloned the full NVFlare repo.

Start the docker container using
```
DOCKER_IMAGE="nvcr.io/nvidia/nemo:23.10"
docker run --gpus="device=all" --network=host --ipc=host -it --rm -v ${PWD}/../../../../job_templates:/job_templates -v ${PWD}:/workspace -w /workspace ${DOCKER_IMAGE} /bin/bash
```

For easy experimentation with NeMo, install NVFlare and mount the code inside the [nemo_nvflare](./nemo_nvflare) folder.
```
pip install nvflare==2.4.0rc7
export PYTHONPATH=${PYTHONPATH}:/workspace/nemo_nvflare
```

## Examples
### 1. Federated PEFT using a 345 million parameter GPT model
15 changes: 15 additions & 0 deletions integration/nemo/examples/peft/nemo_nvflare/__init__.py
@@ -0,0 +1,15 @@
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .peft_model import PEFTmodel
34 changes: 34 additions & 0 deletions integration/nemo/examples/peft/nemo_nvflare/utils.py
@@ -0,0 +1,34 @@
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import numpy as np
import torch


def convert_global_to_ckpt(global_model_filepath: str, ckpt_path: str):
"""Helper function to convert global models saved by NVFlare to NeMo ckpt format"""

nvflare_ckpt = torch.load(global_model_filepath)
if "train_conf" in nvflare_ckpt:
print("Loaded NVFlare global checkpoint with train_conf", nvflare_ckpt["train_conf"])

assert (
"model" in nvflare_ckpt
), f"Expected global model to contain a 'model' key but it only had {list(nvflare_ckpt.keys())}"
global_weights = nvflare_ckpt["model"]

torch.save({"state_dict": global_weights}, ckpt_path)

print(f"Saved NeMo ckpt with {len(global_weights)} entries to {ckpt_path}")

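Assuming the checkpoint layout shown above (NVFlare stores the global weights under a "model" key, while NeMo expects them under "state_dict"), the conversion can be exercised end to end with a dummy checkpoint. The conversion step is inlined here for illustration, and the file names are hypothetical.

```python
import os
import tempfile

import torch

workdir = tempfile.mkdtemp()

# A dummy NVFlare global checkpoint: the weights live under the "model" key.
dummy_weights = {"layer.weight": torch.zeros(2, 2), "layer.bias": torch.zeros(2)}
global_path = os.path.join(workdir, "FL_global_model.pt")  # hypothetical file name
torch.save({"model": dummy_weights}, global_path)

# The conversion step: NeMo checkpoints keep the weights under "state_dict".
nvflare_ckpt = torch.load(global_path)
ckpt_path = os.path.join(workdir, "global_model.ckpt")
torch.save({"state_dict": nvflare_ckpt["model"]}, ckpt_path)

restored = torch.load(ckpt_path)
print(sorted(restored["state_dict"]))  # → ['layer.bias', 'layer.weight']
```

In the actual example you would call `convert_global_to_ckpt(global_model_filepath, ckpt_path)` from the `utils.py` above instead of the two inlined load/save lines.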