
Add a tutorial which gives an overview of classy models #485

Closed
375 changes: 375 additions & 0 deletions tutorials/classy_model_overview.ipynb
@@ -0,0 +1,375 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Classy Model Overview\n",
"\n",
"Before reading this, please go over the [Getting Started tutorial](https://classyvision.ai/tutorials/getting_started).\n",
"\n",
"Working with Classy Vision requires models to be instances of `ClassyModel`. A `ClassyModel` is an instance of `torch.nn.Module`, but packed with a lot of extra features! \n",
"\n",
"If your model isn't implemented as a `ClassyModel`, don't fret - you can easily convert it to one in one line.\n",
"\n",
"In this tutorial, we will cover:\n",
"1. Using Classy Models\n",
"1. Getting and setting the state of a model\n",
"1. Heads: Introduction and using Classy Heads\n",
"1. Creating your own Classy Model\n",
"1. Converting any PyTorch model to a Classy Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using Classy Models\n",
"As `ClassyModel`s are also instances of `nn.Module`, they can be treated as any normal PyTorch model."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"from classy_vision.models import build_model\n",
"\n",
"\n",
"model = build_model({\"name\": \"resnet50\"})\n",
"input = torch.ones(10, 3, 224, 224) # a batch of 10 images with 3 channels with dimensions of 224 x 224\n",
"output = model(input)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting and setting the state of a model\n",
"\n",
"Classy Vision provides the functions `get_classy_state()` and `set_classy_state()` to fetch and save the state of the models. These are considered drop-in replacements for the [`torch.nn.Module.state_dict`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module.state_dict) and [`torch.nn.Module.load_state_dict()`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module.load_state_dict) functions and work similarly. For more information, refer to the [docs](https://classyvision.ai/api/models.html#classy_vision.models.ClassyModel)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"state = model.get_classy_state()\n",
"\n",
"model.set_classy_state(state)"
]
},
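{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the returned state is a regular Python dictionary of tensors, it can be checkpointed to disk with `torch.save` and restored later with `torch.load`. The cell below is a minimal sketch of this pattern; the `checkpoint.pt` file name is just an example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# a minimal checkpointing sketch: persist the state returned by get_classy_state()\n",
"# and restore it into a freshly built model (the file name is arbitrary)\n",
"torch.save(model.get_classy_state(), \"checkpoint.pt\")\n",
"\n",
"restored_model = build_model({\"name\": \"resnet50\"})\n",
"restored_model.set_classy_state(torch.load(\"checkpoint.pt\"))"
]
},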
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Heads: Introduction and using Classy Heads\n",
"\n",
"A lot of work in Computer Vision utilizes the concept of re-using a trunk model, like a ResNet 50, and using it for various tasks. This is accomplished by attaching different \"heads\" to the end of the trunk. \n",
"\n",
"Some use cases involve re-training a model trained with a certain head by removing the old head and attaching a new one. This is a special case of fine tuning. If you are interested in fine tuning your models, there's a [tutorial for that as well](https://classyvision.ai/tutorials/fine_tuning)! But first, let's understand the basics.\n",
"\n",
"Normally, attaching heads or changing them requires users to write code and update their model implementations. Classy Vision does all of this work for you - all of this happens under the hood, with no work required by users!\n",
"\n",
"All you need to do is decide which `ClassyHead` you want to attach to your model and where. We will use a simple fully connected head in our example, and attach it to the output of the `block3-2` module of our model. Note that a head can be attached to any module, as long as the name of the module is unique."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from classy_vision.heads import FullyConnectedHead\n",
"\n",
"\n",
"# a resnet 50 model's trunk outputs a tensor of 2048 dimension, which will be the\n",
"# in_plane of out head\n",
"#\n",
"# let's say we want a 100 dimensional output\n",
"#\n",
"# Tip: you can use build_head() as well to create a head instead of initializing the\n",
"# class directly\n",
"head = FullyConnectedHead(unique_id=\"default\", num_classes=100, in_plane=2048)\n",
"\n",
"# let's attach this head to our model\n",
"model.set_heads({\"block3-2\": [head]})\n",
"\n",
"output = model(input)\n",
"assert output.shape == (10, 100)\n",
"\n",
"# let's change the head one more time\n",
"head = FullyConnectedHead(unique_id=\"default\", num_classes=10, in_plane=2048)\n",
"\n",
"model.set_heads({\"block3-2\": [head]})\n",
"\n",
"output = model(input)\n",
"assert output.shape == (10, 10)"
]
},
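{
"cell_type": "markdown",
"metadata": {},
"source": [
"As mentioned in the tip above, heads can also be created from a configuration dictionary via `build_head()`, mirroring how `build_model()` works. The sketch below assumes the fully connected head is registered under the name `\"fully_connected\"` - check the heads registry in your version of Classy Vision."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from classy_vision.heads import build_head\n",
"\n",
"\n",
"# a sketch of the config-based route; this assumes the registered head name is\n",
"# \"fully_connected\"\n",
"head = build_head({\n",
"    \"name\": \"fully_connected\",\n",
"    \"unique_id\": \"default\",\n",
"    \"num_classes\": 10,\n",
"    \"in_plane\": 2048,\n",
"})\n",
"\n",
"model.set_heads({\"block3-2\": [head]})"
]
},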
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Classy Vision supports attaching multiple heads to one or more blocks as well, but that is an advanced concept which this tutorial does not cover. For inquisitive users, here is an example -"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"head_1_1 = FullyConnectedHead(unique_id=\"1_1\", num_classes=10, in_plane=1024)\n",
"head_1_2 = FullyConnectedHead(unique_id=\"1_2\", num_classes=20, in_plane=1024)\n",
"head_2 = FullyConnectedHead(unique_id=\"2\", num_classes=100, in_plane=2048)\n",
"\n",
"# we can attach these heads to our model at different blocks\n",
"model.set_heads({\"block2-2\": [head_1_1, head_1_2], \"block3-2\": [head_2]})\n",
"\n",
"output = model(input)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating your own Classy Model\n",
"\n",
"Please refer to the [dedicated tutorial for this section](https://classyvision.ai/tutorials/classy_model)."
]
},
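{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick taste of what that tutorial covers, the sketch below shows the general shape of a custom Classy Model: subclass `ClassyModel`, implement `forward`, and optionally register it so it can be built from a config. Treat this as an illustrative outline, not a substitute for the dedicated tutorial; the model name and architecture here are arbitrary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch.nn as nn\n",
"\n",
"from classy_vision.models import ClassyModel, register_model\n",
"\n",
"\n",
"# an illustrative sketch of a tiny custom model; the name \"my_tiny_model\" and the\n",
"# architecture are arbitrary choices made for this example\n",
"@register_model(\"my_tiny_model\")\n",
"class MyTinyModel(ClassyModel):\n",
"    def __init__(self):\n",
"        super().__init__()\n",
"        self.linear = nn.Linear(3 * 224 * 224, 10)\n",
"\n",
"    @classmethod\n",
"    def from_config(cls, config):\n",
"        return cls()\n",
"\n",
"    def forward(self, x):\n",
"        return self.linear(x.flatten(start_dim=1))"
]
},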
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Converting any PyTorch model to a Classy Model\n",
"\n",
"Any model can be converted to a Classy Model with a simple function call - `ClassyModel.from_model()`"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"from torchvision.models import resnet18\n",
"from classy_vision.models import ClassyModel\n",
"\n",
"\n",
"model = resnet18()\n",
"classy_model = ClassyModel.from_model(model)\n",
"output = classy_model(input)\n",
"assert output.shape == (10, 1000)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In fact, as soon as a model becomes a Classy Model, it gains all its abilities as well, including the ability to attach heads! Let us inspect the original model to see the modules it comprises."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ResNet(\n",
" (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)\n",
" (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (relu): ReLU(inplace=True)\n",
" (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)\n",
" (layer1): Sequential(\n",
" (0): BasicBlock(\n",
" (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (relu): ReLU(inplace=True)\n",
" (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (1): BasicBlock(\n",
" (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (relu): ReLU(inplace=True)\n",
" (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (layer2): Sequential(\n",
" (0): BasicBlock(\n",
" (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)\n",
" (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (relu): ReLU(inplace=True)\n",
" (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (downsample): Sequential(\n",
" (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)\n",
" (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (1): BasicBlock(\n",
" (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (relu): ReLU(inplace=True)\n",
" (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (layer3): Sequential(\n",
" (0): BasicBlock(\n",
" (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)\n",
" (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (relu): ReLU(inplace=True)\n",
" (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (downsample): Sequential(\n",
" (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)\n",
" (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (1): BasicBlock(\n",
" (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (relu): ReLU(inplace=True)\n",
" (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (layer4): Sequential(\n",
" (0): BasicBlock(\n",
" (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)\n",
" (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (relu): ReLU(inplace=True)\n",
" (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (downsample): Sequential(\n",
" (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)\n",
" (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (1): BasicBlock(\n",
" (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (relu): ReLU(inplace=True)\n",
" (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))\n",
" (fc): Linear(in_features=512, out_features=1000, bias=True)\n",
")"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It seems that the final trunk layer of this model is called `layer4`. Let's try to attach heads here."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# the output of layer4 is 512 dimensional\n",
"head = FullyConnectedHead(unique_id=\"default\", num_classes=10, in_plane=512)\n",
"\n",
"classy_model.set_heads({\"layer4\": [head]})\n",
"\n",
"output = classy_model(input)\n",
"assert output.shape == (10, 10) # it works!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You might be wondering how to figure out the `in_plane` for any module. A simple trick is to try attaching any head and noticing the `Exception` if there is a size mismatch!"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"size mismatch, m1: [10 x 512], m2: [1234 x 10] at ../aten/src/TH/generic/THTensorMath.cpp:136\n"
]
}
],
"source": [
"try:\n",
" head = FullyConnectedHead(unique_id=\"default\", num_classes=10, in_plane=1234)\n",
"\n",
" classy_model.set_heads({\"layer4\": [head]})\n",
"\n",
" output = classy_model(input)\n",
"\n",
"except Exception as e:\n",
" print(e)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The error tells us that the size should be 512."
]
},
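{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you prefer not to rely on the exception, a standard PyTorch forward hook works too: register it on the module of interest and read the channel dimension of its output. The sketch below uses a fresh `resnet18` instance purely for inspection; the hook and variable names are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# an alternative sketch: register a plain PyTorch forward hook on the module of\n",
"# interest and inspect its output shape (the channel dimension is the in_plane)\n",
"inspect_model = resnet18()\n",
"\n",
"def print_output_shape(module, module_input, module_output):\n",
"    print(module_output.shape)  # e.g. torch.Size([10, 512, 7, 7]) -> in_plane is 512\n",
"\n",
"handle = inspect_model.layer4.register_forward_hook(print_output_shape)\n",
"inspect_model(input)\n",
"handle.remove()"
]
},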
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"In this tutorial, we covered how to use Classy Models, how to get and set their state, and how to create our own models (through another tutorial). We also got familiarized with the concept of heads and how they work with Classy Vision. Lastly, we learned how we can easily convert any PyTorch models to Classy Models and unlock all the features they provide.\n",
"\n",
"For more information, refer to our [API docs](https://classyvision.ai/api/)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
3 changes: 3 additions & 0 deletions website/tutorials.json
@@ -6,6 +6,9 @@
},{
"id": "ray_aws",
"title": "Distributed training on AWS"
},{
"id": "classy_model_overview",
"title": "Classy model overview"
},{
"id": "classy_dataset",
"title": "Creating a custom dataset"