use spawn method
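
The change adds `mp.set_start_method("spawn", force=True)` before each `SimulatorRunner` cell; the in-code comment only says this is "required by NeMo models". Below is a minimal sketch of what that call does. It uses the standard-library `multiprocessing` module so it runs without PyTorch, on the assumption that `torch.multiprocessing` exposes the same `set_start_method` API as a wrapper around it. The helper names `use_spawn` and `second_set_without_force_fails` are hypothetical and are not part of the notebook. One plausible reason for the requirement, not confirmed by this commit, is that CUDA cannot be used in subprocesses created with the `fork` start method.

```python
import multiprocessing as mp


def use_spawn() -> str:
    """Force the 'spawn' start method and return the active method name."""
    # force=True replaces a start method that was already fixed earlier in the
    # same process, for example by a previously executed notebook cell.
    mp.set_start_method("spawn", force=True)
    return mp.get_start_method()


def second_set_without_force_fails() -> bool:
    """Show why force=True matters when a cell is run more than once."""
    mp.set_start_method("spawn", force=True)
    try:
        # Without force, setting the method again raises RuntimeError,
        # even when the requested method is unchanged.
        mp.set_start_method("spawn")
    except RuntimeError:
        return True
    return False


if __name__ == "__main__":
    print(use_spawn())
    print(second_set_without_force_fails())
```

Because the call is repeated in both simulator cells, `force=True` keeps each cell re-runnable on its own regardless of execution order.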

NVIDIA · Jan 20, 2024 · 3f6fc77
1 parent 024a1b7
commit 3f6fc77
Showing 1 changed file with 46 additions and 38 deletions.
diff --git a/integration/nemo/examples/peft/peft.ipynb b/integration/nemo/examples/peft/peft.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "ff99e42d",
+   "id": "5020dd81",
    "metadata": {},
    "source": [
     "# Parameter-Efficient Fine-Tuning (PEFT) with NeMo\n",
@@ -19,7 +19,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "717d32c7",
+   "id": "dc9769ef",
    "metadata": {},
    "source": [
     "## Dependencies\n",
@@ -29,7 +29,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "558d3480",
+   "id": "dab4c639",
    "metadata": {},
    "source": [
     "## Download the pre-trained LLM\n",
@@ -39,7 +39,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "1c5bbaf7",
+   "id": "20921eea",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -51,7 +51,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "9f067cde",
+   "id": "aa852d07",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -66,7 +66,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "8a31665b",
+   "id": "fa530a42",
    "metadata": {},
    "source": [
     "## Data preprocessing\n",
@@ -79,7 +79,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "91593e61",
+   "id": "b5737e50",
    "metadata": {},
    "source": [
     "#### 1. Download the preprocessing scripts\n",
@@ -89,7 +89,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "959ce879",
+   "id": "b2c32fa5",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -102,7 +102,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "1d00be68",
+   "id": "13a2f952",
    "metadata": {},
    "source": [
     "#### 2. Download the Financial PhraseBank Dataset\n",
@@ -114,7 +114,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "0813e466",
+   "id": "40199807",
    "metadata": {},
    "source": [
     "#### 3. Preprocess the dataset"
@@ -123,7 +123,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "905f75cc",
+   "id": "80f84586",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -132,7 +132,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "69d393fd",
+   "id": "d9f8fa9a",
    "metadata": {},
    "source": [
     "#### 4. Split the dataset to simulate clients\n",
@@ -143,7 +143,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "a582628c",
+   "id": "a6725683",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -160,7 +160,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "a15411e3",
+   "id": "6c506c6b",
    "metadata": {},
    "source": [
     "Below are some examples of how the training data is distributed among the three clients when using different values of `alpha`.\n",
@@ -173,7 +173,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "331c965c",
+   "id": "704ff05d",
    "metadata": {},
    "source": [
     "## Federated learning simulations\n",
@@ -187,7 +187,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "eed15958",
+   "id": "01fce4ae",
    "metadata": {},
    "source": [
     "#### 1. Convert NeMo PEFT script to FL\n",
@@ -221,7 +221,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "211e60fc",
+   "id": "655a1f0a",
    "metadata": {},
    "source": [
     "#### 1. Local training\n",
@@ -236,7 +236,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "eb1dd44a",
+   "id": "51e4fb4d",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -245,7 +245,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "4fcf9002",
+   "id": "2e515dc2",
    "metadata": {},
    "source": [
     "Then, create the job and configure it for simulating local training."
@@ -254,7 +254,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "692d1249",
+   "id": "404fe5fe",
    "metadata": {
     "scrolled": true
    },
@@ -283,7 +283,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "6336e19b",
+   "id": "df9ca0a5",
    "metadata": {},
    "source": [
     "Next, simulate each client training on their local dataset using the FL simulator. To do this, we only run 1 round of FL, with each client running 1000 steps on their local dataset."
@@ -292,12 +292,16 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "5dd4879f",
+   "id": "8d7f4970",
    "metadata": {
     "scrolled": true
    },
    "outputs": [],
    "source": [
+    "# required by NeMo models\n",
+    "import torch.multiprocessing as mp\n",
+    "mp.set_start_method(\"spawn\", force=True)\n",
+    "\n",
     "from nvflare import SimulatorRunner    \n",
     "\n",
     "simulator = SimulatorRunner(\n",
@@ -312,7 +316,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "fff894a8",
+   "id": "2e56653f",
    "metadata": {},
    "source": [
     "#### 2. Federated training\n",
@@ -325,7 +329,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "708d4ebb",
+   "id": "ad3406a6",
    "metadata": {
     "scrolled": true
    },
@@ -344,7 +348,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "3bc10072",
+   "id": "5e591653",
    "metadata": {},
    "source": [
     "Next, simulate the federated training using FedAvg. "
@@ -353,12 +357,16 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "c0ad9fde",
+   "id": "8559b79f",
    "metadata": {
     "scrolled": true
    },
    "outputs": [],
    "source": [
+    "# required by NeMo models\n",
+    "import torch.multiprocessing as mp\n",
+    "mp.set_start_method(\"spawn\", force=True)\n",
+    "\n",
     "from nvflare import SimulatorRunner    \n",
     "\n",
     "simulator = SimulatorRunner(\n",
@@ -373,15 +381,15 @@
   },
   {
    "cell_type": "markdown",
-   "id": "43bf418d",
+   "id": "3e20ca56",
    "metadata": {},
    "source": [
     "You can visualize the training process using TensorBoard by running `tensorboard --logdir /tmp/nvflare/nemo` in a new terminal."
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "6af6eef6",
+   "id": "a8e5b7c0",
    "metadata": {},
    "source": [
     "## Results\n",
@@ -397,7 +405,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "76ebc045",
+   "id": "65833f4b",
    "metadata": {},
    "source": [
     "## Inference\n",
@@ -409,7 +417,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "a51503e4",
+   "id": "dcf08bc6",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -424,7 +432,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "bbf79362",
+   "id": "7b3667c0",
    "metadata": {},
    "source": [
     "First, we need to convert the best global PEFT model into a NeMo ckpt."
@@ -433,7 +441,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "4b2523d9",
+   "id": "3d08150a",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -448,7 +456,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "13ccb896",
+   "id": "2963d08e",
    "metadata": {},
    "source": [
     "Next, we will load the global model."
@@ -457,7 +465,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "9951bcfe",
+   "id": "9ffe513d",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -491,7 +499,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "0c87b2c9",
+   "id": "59fa62cb",
    "metadata": {},
    "source": [
     "Run the model"
@@ -500,7 +508,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "bf4d10e4",
+   "id": "acd89469",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -527,7 +535,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "e3fc97f9",
+   "id": "d14026fc",
    "metadata": {},
    "source": [
     "The expected output of a well-trained model looks something like this. Note, the test sentences do not include ground truth labels.\n",
@@ -547,7 +555,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "87a28bb6",
+   "id": "db70a19a",
    "metadata": {},
    "outputs": [],
    "source": []