update 02-02

nitrain · May 29, 2024 · 5993487
1 parent 7fc6887
commit 5993487
Showing 1 changed file with 196 additions and 7 deletions.
diff --git a/book/02-02.ipynb b/book/02-02.ipynb
@@ -454,27 +454,216 @@
     "\n",
     "In the above scenario, we had only a single image as input and a single value as output. However, readers can be arbitrarily combined to return multiple inputs or multiple outputs in whatever format you need. \n",
     "\n",
-    "Let's start with a simpler scenario where we want to perform image segmentation - i.e., predict a label image from another image. Say that our folder looks like this:\n",
+    "Let's take a scenario where we want to perform image segmentation - i.e., predict a label image from another image. Let's also assume that we have information about each image pair in a csv file as before. Say that our folder looks like this:\n",
     "\n",
     "```\n",
     "mydata/\n",
-    "   img1.nii.gz\n",
+    "   participants.csv\n",
+    "   img1-anat.nii.gz\n",
     "   img1-seg.nii.gz\n",
-    "   img2.nii.gz\n",
+    "   img2-anat.nii.gz\n",
     "   img2-seg.nii.gz\n",
     "   ...\n",
     "```\n",
     "\n",
     "\n",
-    "We can create this folder and then map a nitrain dataset using the `ImageReader` class as input and output."
+    "We can create this folder to use as a reference."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 66,
    "metadata": {},
-   "outputs": [],
-   "source": []
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "['img0-anat.nii.gz', 'img0-seg.nii.gz', 'img1-anat.nii.gz', 'img1-seg.nii.gz', 'img2-anat.nii.gz', 'img2-seg.nii.gz', 'img3-anat.nii.gz', 'img3-seg.nii.gz', 'img4-anat.nii.gz', 'img4-seg.nii.gz', 'img5-anat.nii.gz', 'img5-seg.nii.gz', 'img6-anat.nii.gz', 'img6-seg.nii.gz', 'img7-anat.nii.gz', 'img7-seg.nii.gz', 'img8-anat.nii.gz', 'img8-seg.nii.gz', 'img9-anat.nii.gz', 'img9-seg.nii.gz', 'participants.csv']\n"
+     ]
+    }
+   ],
+   "source": [
+    "tmpfolder = TemporaryDirectory()\n",
+    "base_dir = tmpfolder.name\n",
+    "\n",
+    "for i in range(10):\n",
+    "    # create image and segmentation\n",
+    "    img = ants.from_numpy(np.random.randn(100,100))\n",
+    "    seg = img > 0\n",
+    "    \n",
+    "    ants.image_write(img, os.path.join(base_dir, f'img{i}-anat.nii.gz'))\n",
+    "    ants.image_write(seg, os.path.join(base_dir, f'img{i}-seg.nii.gz'))\n",
+    "\n",
+    "dataframe = pd.DataFrame({'labels': list(range(10))})\n",
+    "dataframe.to_csv(os.path.join(base_dir, 'participants.csv'))\n",
+    "\n",
+    "print(sorted(os.listdir(base_dir)))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "From the previous section, we can infer that having an image as input and an image as output can be handled by two `ImageReader` instances, each with its own glob pattern."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 65,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "ANTsImage\n",
+      "\t Pixel Type : float (float32)\n",
+      "\t Components : 1\n",
+      "\t Dimensions : (100, 100)\n",
+      "\t Spacing    : (1.0, 1.0)\n",
+      "\t Origin     : (0.0, 0.0)\n",
+      "\t Direction  : [1. 0. 0. 1.]\n",
+      "\n",
+      "ANTsImage\n",
+      "\t Pixel Type : float (float32)\n",
+      "\t Components : 1\n",
+      "\t Dimensions : (100, 100)\n",
+      "\t Spacing    : (1.0, 1.0)\n",
+      "\t Origin     : (0.0, 0.0)\n",
+      "\t Direction  : [1. 0. 0. 1.]\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "dataset = nt.Dataset(inputs=readers.ImageReader('img*-anat.nii.gz'),\n",
+    "                     outputs=readers.ImageReader('img*-seg.nii.gz'),\n",
+    "                     base_dir=base_dir)\n",
+    "\n",
+    "x, y = dataset[3]\n",
+    "print(x)\n",
+    "print(y)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now we can take this one step further. What if we also want to pass a value to our model as input, in addition to the image? We saw previously that values can be mapped using the `ColumnReader` if they're stored in csv-like files. Therefore, it is intuitive to simply combine the readers together in a list."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 69,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[ANTsImage\n",
+      "\t Pixel Type : float (float32)\n",
+      "\t Components : 1\n",
+      "\t Dimensions : (100, 100)\n",
+      "\t Spacing    : (1.0, 1.0)\n",
+      "\t Origin     : (0.0, 0.0)\n",
+      "\t Direction  : [1. 0. 0. 1.]\n",
+      ", 3]\n",
+      "ANTsImage\n",
+      "\t Pixel Type : float (float32)\n",
+      "\t Components : 1\n",
+      "\t Dimensions : (100, 100)\n",
+      "\t Spacing    : (1.0, 1.0)\n",
+      "\t Origin     : (0.0, 0.0)\n",
+      "\t Direction  : [1. 0. 0. 1.]\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "dataset = nt.Dataset(inputs=[readers.ImageReader('img*-anat.nii.gz'),\n",
+    "                             readers.ColumnReader('labels', base_file='participants.csv')],\n",
+    "                     outputs=readers.ImageReader('img*-seg.nii.gz'),\n",
+    "                     base_dir=base_dir)\n",
+    "\n",
+    "x, y = dataset[3]\n",
+    "print(x)\n",
+    "print(y)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We see that the input from our dataset is not a single image anymore, but an image-value pair. This handles the scenario where we want to pass multiple inputs to our model. We can also create nested inputs if we want! This can be done for both inputs and outputs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 72,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[[ANTsImage\n",
+      "\t Pixel Type : float (float32)\n",
+      "\t Components : 1\n",
+      "\t Dimensions : (100, 100)\n",
+      "\t Spacing    : (1.0, 1.0)\n",
+      "\t Origin     : (0.0, 0.0)\n",
+      "\t Direction  : [1. 0. 0. 1.]\n",
+      ", 3], 3]\n",
+      "[ANTsImage\n",
+      "\t Pixel Type : float (float32)\n",
+      "\t Components : 1\n",
+      "\t Dimensions : (100, 100)\n",
+      "\t Spacing    : (1.0, 1.0)\n",
+      "\t Origin     : (0.0, 0.0)\n",
+      "\t Direction  : [1. 0. 0. 1.]\n",
+      ", ANTsImage\n",
+      "\t Pixel Type : float (float32)\n",
+      "\t Components : 1\n",
+      "\t Dimensions : (100, 100)\n",
+      "\t Spacing    : (1.0, 1.0)\n",
+      "\t Origin     : (0.0, 0.0)\n",
+      "\t Direction  : [1. 0. 0. 1.]\n",
+      "]\n"
+     ]
+    }
+   ],
+   "source": [
+    "dataset = nt.Dataset(inputs=[[readers.ImageReader('img*-anat.nii.gz'),\n",
+    "                              readers.ColumnReader('labels', base_file='participants.csv')],\n",
+    "                             readers.ColumnReader('labels', base_file='participants.csv')],\n",
+    "                     outputs=[readers.ImageReader('img*-seg.nii.gz'),\n",
+    "                              readers.ImageReader('img*-seg.nii.gz')],\n",
+    "                     base_dir=base_dir)\n",
+    "\n",
+    "x, y = dataset[3]\n",
+    "print(x)\n",
+    "print(y)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This demonstrates how nitrain datasets can be used to flexibly build up any type of input-output structure that is needed. By using different readers, it's possible to map data from any local source."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Summary\n",
+    "\n",
+    "This chapter gave a basic introduction to using nitrain datasets. You learned about mapping data that already exists in memory, as well as data that exists locally in folders. Mapping data is done using various readers that nitrain provides. You also saw how any number of inputs and outputs can be mapped in nitrain datasets - with any arbitrary structure.\n",
+    "\n",
+    "In the next chapter, we will see how the same concepts can be applied to data that is stored neither in memory nor locally in folders, but instead on various cloud storage platforms."
+   ]
   }
  ],
  "metadata": {