"colab": {
"provenance": [],
"gpuType": "T4",
- "authorship_tag": "ABX9TyMNv58YxAIZax3EoF72uIj/",
"include_colab_link": true
},
"kernelspec": {
...
"colab_type": "text"
},
"source": [
- "<a href=\"https://colab.research.google.com/github/EduardoPach/Transformers-Tutorials/blob/grounded-sam-example/Grounding%20DINO/grounded_sam.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+ "<a href=\"https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/Grounding%20DINO/GroundingDINO_with_Segment_Anything.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "kiC9KlHqYm5-",
- "outputId": "f892682a-a2f1-4dda-d1fb-d213b38162b9"
- },
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": [
- "Found existing installation: transformers 4.38.2\n",
- "Uninstalling transformers-4.38.2:\n",
- " Would remove:\n",
- " /usr/local/bin/transformers-cli\n",
- " /usr/local/lib/python3.10/dist-packages/transformers-4.38.2.dist-info/*\n",
- " /usr/local/lib/python3.10/dist-packages/transformers/*\n",
- "Proceed (Y/n)? y\n",
- " Successfully uninstalled transformers-4.38.2\n"
- ]
- }
- ],
+ "cell_type": "markdown",
"source": [
- "!pip uninstall transformers"
- ]
+ "# Combining Grounding DINO with Segment Anything (SAM) for text-based mask generation\n",
+ "\n",
+ "In this notebook, we're going to combine 2 very cool models - [Grounding DINO](https://huggingface.co/docs/transformers/main/en/model_doc/grounding-dino) and [SAM](https://huggingface.co/docs/transformers/en/model_doc/sam). We'll use Grounding DINO to generate bounding boxes based on text prompts, after which we can prompt SAM to generate corresponding segmentation masks for them.\n",
+ "\n",
+ "This is based on the popular [Grounded Segment Anything](https://github.com/IDEA-Research/Grounded-Segment-Anything) project - just with fewer lines of code as the models are now available in the Transformers library. Refer to the [paper](https://arxiv.org/abs/2401.14159) for details.\n",
+ "\n",
+ "<img src=\"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/grounded_sam.png\"\n",
+ "alt=\"drawing\" width=\"900\"/>\n",
+ "\n",
+ "<small> Grounded SAM overview. Taken from the <a href=\"https://github.com/IDEA-Research/Grounded-Segment-Anything\">original repository</a>. </small>\n",
+ "\n",
+ "Author of this notebook: [Eduardo Pacheco](https://huggingface.co/EduardoPacheco) - give him a follow on Hugging Face!\n",
+ "\n",
+ "## Set-up environment\n",
+ "\n",
+ "Let's start by installing 🤗 Transformers from source since Grounding DINO is brand new at the time of writing."
+ ],
+ "metadata": {
+ "id": "Wgj3tUoobAlj"
+ }
},
{
"cell_type": "code",
"source": [
- "!pip install -qq git+https://github.com/huggingface/transformers.git@main"
+ "!pip install --upgrade -q git+https://github.com/huggingface/transformers"
],
"metadata": {
"colab": {
...
{
"cell_type": "markdown",
"source": [
- "## Imports"
+ "## Imports\n",
+ "\n",
+ "Let's start by importing the required libraries."
],
"metadata": {
"id": "7r9dNrDHy2tA"
...
{
"cell_type": "markdown",
"source": [
- "## Result Utils"
+ "## Result Utils\n",
+ "\n",
+ "We'll store the detection results of Grounding DINO in a dedicated Python dataclass.\n",
],
"metadata": {
"id": "A1NxJzCNrnjH"
...
{
"cell_type": "markdown",
"source": [
- "## Plot Utils"
+ "## Plot Utils\n",
+ "\n",
+ "Below, we define some utility functions for drawing Grounding DINO's detection results on top of the image."
],
"metadata": {
"id": "uCzSUQL5lAvE"
...
"    plt.axis('off')\n",
"    if save_name:\n",
"        plt.savefig(save_name, bbox_inches='tight')\n",
- "    plt.show()\n "
+ "    plt.show()"
],
"metadata": {
"id": "Zah3Esewo4P6"
...
{
"cell_type": "markdown",
"source": [
- "## Grounded-SAM"
+ "## Grounded Segment Anything (SAM)\n",
+ "\n",
+ "Now it's time to define the Grounded SAM approach!\n",
+ "\n",
+ "The approach is very simple:\n",
+ "1. use Grounding DINO to detect a given set of texts in the image. The output is a set of bounding boxes.\n",
+ "2. prompt Segment Anything (SAM) with the bounding boxes, for which the model will output segmentation masks.\n",
],
"metadata": {
"id": "fErkFJkmlEMl"
...
"    threshold: float = 0.3,\n",
"    detector_id: Optional[str] = None\n",
") -> List[Dict[str, Any]]:\n",
+ "    \"\"\"\n",
+ "    Use Grounding DINO to detect a set of labels in an image in a zero-shot fashion.\n",
+ "    \"\"\"\n",
"    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
"    detector_id = detector_id if detector_id is not None else \"IDEA-Research/grounding-dino-tiny\"\n",
"    object_detector = pipeline(model=detector_id, task=\"zero-shot-object-detection\", device=device)\n",
...
"    polygon_refinement: bool = False,\n",
"    segmenter_id: Optional[str] = None\n",
") -> List[DetectionResult]:\n",
+ "    \"\"\"\n",
+ "    Use Segment Anything (SAM) to generate masks given an image + a set of bounding boxes.\n",
+ "    \"\"\"\n",
"    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
"    segmenter_id = segmenter_id if segmenter_id is not None else \"facebook/sam-vit-base\"\n",
"\n",
...
{
"cell_type": "markdown",
"source": [
- "### Inference"
+ "### Inference\n",
+ "\n",
+ "Let's showcase Grounded SAM on our favorite image: the cats image from the COCO dataset.\n",
],
"metadata": {
"id": "Yo8cGKdxXWPR"
...
"execution_count": null,
"outputs": []
},
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Let's visualize the results:"
+ ],
+ "metadata": {
+ "id": "-sJrT5xMf_Ad"
+ }
+ },
{
"cell_type": "code",
"source": [
...
]
}
]
- }
+ }