Merge branch 'NVIDIA:main' into async

ZiyueXu77 · Feb 27, 2025 · c0b0b8d · c0b0b8d
2 parents 2d74ff9 + 3e0468a
commit c0b0b8d
Show file tree

Hide file tree

Showing 425 changed files with 11,579 additions and 16,837 deletions.
diff --git a/3rdParty/tdigest.LICENSE.txt → 3rdParty/fastdigest.LICENSE.txt b/3rdParty/tdigest.LICENSE.txt → 3rdParty/fastdigest.LICENSE.txt
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -44,9 +44,6 @@ To collaborate efficiently, please read through this section and follow them.
 * [Building documentation](#building-the-documentation)
 * [Signing your work](#signing-your-work)
 
-> Note: 
-  > some package dependencies requires python<version>-dev in local development such as 
-  > python3.12-dev. 
 
 #### Checking the coding style
 We check code style using flake8 and isort.

diff --git a/examples/advanced/bionemo/README.md b/examples/advanced/bionemo/README.md
@@ -7,17 +7,17 @@ This directory contains examples of running BioNeMo in a federated learning envi
 ## Notebooks
 
 In this repo you will find two notebooks under the `task_fitting` and `downstream` folders respectively: 
-1. The [task_fitting](./task_fitting/task_fitting.ipynb) notebook example includes a notebook that shows how to obtain protein learned representations in the form of embeddings using the ESM-1nv pre-trained model. 
-The model is trained with NVIDIA's BioNeMo framework for Large Language Model training and inference.
+1. The [task_fitting](./task_fitting/task_fitting.ipynb) notebook example includes a notebook that shows how to obtain protein-learned representations in the form of embeddings using an ESM-2 pre-trained model. 
+
 2. The [downstream](./downstream/downstream_nvflare.ipynb) notebook example shows three different downstream tasks for fine-tuning a BioNeMo ESM-style model.
 
 ## Requirements
 
 Download and run the [BioNeMo docker container](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/containers/bionemo-framework).
-> **Note:** The examples here were tested with `nvcr.io/nvidia/clara/bionemo-framework:1.8`
+> **Note:** The examples here were tested with `nvcr.io/nvidia/clara/bionemo-framework:2.4`
 
-We recommend following the [Quickstart Guide](https://docs.nvidia.com/bionemo-framework/latest/access-startup.html?highlight=docker) 
-on how to get the BioNeMo container. 
+We recommend following the [User Guide](https://docs.nvidia.com/bionemo-framework/latest/user-guide/)
+on how to get started with BioNeMo 2.
 
 Start the container and Jupyter Lab to run the NVFlare experiments with NVFlare using
 ```commandline

diff --git a/examples/advanced/bionemo/downstream/bionemo_filters.py b/examples/advanced/bionemo/downstream/bionemo_filters.py
@@ -0,0 +1,104 @@
+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import Union
+
+from nvflare.apis.dxo import DXO, DataKind
+from nvflare.apis.dxo_filter import DXOFilter
+from nvflare.apis.fl_context import FLContext
+from nvflare.apis.shareable import Shareable
+
+
+class BioNeMoParamsFilter(DXOFilter):
+    def __init__(self, precision="bf16-mixed"):
+        """Filter to add a prefix to global state dict to avoid key mismatches between global and local state dictionaries.
+        This is needed because of NeMo training framework adding module wrappers depending on the used training precision.
+
+        Args:
+            precision: training precision
+        """
+
+        # support weight and weight_diff data kinds
+        data_kinds = [DataKind.WEIGHTS, DataKind.WEIGHT_DIFF]
+        super().__init__(supported_data_kinds=data_kinds, data_kinds_to_filter=data_kinds)
+
+        self._precision = precision
+        if self._precision == "bf16-mixed":
+            self._prefix = "module.module."
+        else:
+            self._prefix = "module."
+
+    def process_dxo(self, dxo: DXO, shareable: Shareable, fl_ctx: FLContext) -> Union[None, DXO]:
+        """Filter process apply to the Shareable object.
+
+        Args:
+            dxo: data to be processed
+            shareable: that the dxo belongs to
+            fl_ctx: FLContext
+
+        Returns: DXO object with updated state dictionary
+
+        """
+
+        self.log_info(fl_ctx, f"Adding `{self._prefix}` prefix...")
+
+        params = dxo.data
+        new_params = {}
+        for k, v in params.items():
+            new_key = self._prefix + k
+            new_params[new_key] = v
+
+        dxo.data = new_params
+        return dxo
+
+
+class BioNeMoExcludeParamsFilter(DXOFilter):
+    def __init__(self, exclude_vars="head"):
+        """Filter to remove parameters from state dictionary that shouldn't be shared with other party.
+
+        Args:
+            exclude_vars: variables will be excluded if the string is part of a state dictionary key.
+        """
+
+        # support weight and weight_diff data kinds
+        data_kinds = [DataKind.WEIGHTS, DataKind.WEIGHT_DIFF]
+        super().__init__(supported_data_kinds=data_kinds, data_kinds_to_filter=data_kinds)
+
+        self.exclude_vars = exclude_vars
+
+    def process_dxo(self, dxo: DXO, shareable: Shareable, fl_ctx: FLContext) -> Union[None, DXO]:
+        """Filter process apply to the Shareable object.
+
+        Args:
+            dxo: data to be processed
+            shareable: that the dxo belongs to
+            fl_ctx: FLContext
+
+        Returns: DXO object with updated state dictionary
+
+        """
+
+        params = dxo.data
+        new_params = {}
+        for k, v in params.items():
+            if self.exclude_vars not in k:
+                new_params[k] = v
+
+        if len(new_params) < len(params):
+            self.log_info(fl_ctx, f"Excluded {len(params) - len(new_params)} parameters matching '{self.exclude_vars}'")
+        else:
+            raise ValueError(f"State dictionary did not match any exclude keys that matched '{self.exclude_vars}'")
+
+        dxo.data = new_params
+        return dxo