"colab": {
"provenance": [],
"gpuType": "T4",
- "authorship_tag": "ABX9TyMNv58YxAIZax3EoF72uIj/",
"include_colab_link": true
},
"kernelspec": {
...
"colab_type": "text"
},
"source": [
- "<a href=\"https://colab.research.google.com/github/EduardoPach/Transformers-Tutorials/blob/grounded-sam-example/Grounding%20DINO/grounded_sam.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+ "<a href=\"https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/Grounding%20DINO/GroundingDINO_with_Segment_Anything.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "kiC9KlHqYm5-",
- "outputId": "f892682a-a2f1-4dda-d1fb-d213b38162b9"
- },
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": [
- "Found existing installation: transformers 4.38.2\n",
- "Uninstalling transformers-4.38.2:\n",
- " Would remove:\n",
- " /usr/local/bin/transformers-cli\n",
- " /usr/local/lib/python3.10/dist-packages/transformers-4.38.2.dist-info/*\n",
- " /usr/local/lib/python3.10/dist-packages/transformers/*\n",
- "Proceed (Y/n)? y\n",
- " Successfully uninstalled transformers-4.38.2\n"
- ]
- }
- ],
+ "cell_type": "markdown",
"source": [
- "!pip uninstall transformers"
- ]
+ "# Combining Grounding DINO with Segment Anything (SAM) for text-based mask generation\n",
+ "\n",
+ "In this notebook, we're going to combine 2 very cool models - [Grounding DINO](https://huggingface.co/docs/transformers/main/en/model_doc/grounding-dino) and [SAM](https://huggingface.co/docs/transformers/en/model_doc/sam). We'll use Grounding DINO to generate bounding boxes based on text prompts, after which we can prompt SAM to generate corresponding segmentation masks for them.\n",
+ "\n",
+ "This is based on the popular [Grounded Segment Anything](https://github.com/IDEA-Research/Grounded-Segment-Anything) project - just with fewer lines of code as the models are now available in the Transformers library. Refer to the [paper](https://arxiv.org/abs/2401.14159) for details.\n",
+ "\n",
+ "<img src=\"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/grounded_sam.png\"\n",
+ "alt=\"drawing\" width=\"900\"/>\n",
+ "\n",
+ "<small> Grounded SAM overview. Taken from the <a href=\"https://github.com/IDEA-Research/Grounded-Segment-Anything\">original repository</a>. </small>\n",
+ "\n",
+ "Author of this notebook: [Eduardo Pacheco](https://huggingface.co/EduardoPacheco) - give him a follow on Hugging Face!\n",
+ "\n",
+ "## Set-up environment\n",
+ "\n",
+ "Let's start by installing 🤗 Transformers from source since Grounding DINO is brand new at the time of writing."
+ ],
+ "metadata": {
+ "id": "Wgj3tUoobAlj"
+ }
},
{
"cell_type": "code",
"source": [
- "!pip install -qq git+https://github.com/huggingface/transformers.git@main"
+ "!pip install --upgrade -q git+https://github.com/huggingface/transformers"
],
"metadata": {
"colab": {
...
{
"cell_type": "markdown",
"source": [
- "## Imports"
+ "## Imports\n",
+ "\n",
+ "Let's start by importing the required libraries."
],
"metadata": {
"id": "7r9dNrDHy2tA"
...
{
"cell_type": "markdown",
"source": [
- "## Result Utils"
+ "## Result Utils\n",
+ "\n",
+ "We'll store the detection results of Grounding DINO in a dedicated Python dataclass.\n",
],
"metadata": {
"id": "A1NxJzCNrnjH"
...
{
"cell_type": "markdown",
"source": [
- "## Plot Utils"
+ "## Plot Utils\n",
+ "\n",
+ "Below, we define some utility functions for drawing Grounding DINO's detection results on top of the image."
],
"metadata": {
"id": "uCzSUQL5lAvE"
...
"    plt.axis('off')\n",
"    if save_name:\n",
"        plt.savefig(save_name, bbox_inches='tight')\n",
- "    plt.show()\n "
+ "    plt.show()"
],
"metadata": {
"id": "Zah3Esewo4P6"
...
{
"cell_type": "markdown",
"source": [
- "## Grounded-SAM"
+ "## Grounded Segment Anything (SAM)\n",
+ "\n",
+ "Now it's time to define the Grounded SAM approach!\n",
+ "\n",
+ "The approach is very simple:\n",
+ "1. use Grounding DINO to detect a given set of texts in the image. The output is a set of bounding boxes.\n",
+ "2. prompt Segment Anything (SAM) with the bounding boxes, for which the model will output segmentation masks.\n",
],
"metadata": {
"id": "fErkFJkmlEMl"
...
"    threshold: float = 0.3,\n",
"    detector_id: Optional[str] = None\n",
") -> List[Dict[str, Any]]:\n",
+ "    \"\"\"\n",
+ "    Use Grounding DINO to detect a set of labels in an image in a zero-shot fashion.\n",
+ "    \"\"\"\n",
"    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
"    detector_id = detector_id if detector_id is not None else \"IDEA-Research/grounding-dino-tiny\"\n",
"    object_detector = pipeline(model=detector_id, task=\"zero-shot-object-detection\", device=device)\n",
...
"    polygon_refinement: bool = False,\n",
"    segmenter_id: Optional[str] = None\n",
") -> List[DetectionResult]:\n",
+ "    \"\"\"\n",
+ "    Use Segment Anything (SAM) to generate masks given an image + a set of bounding boxes.\n",
+ "    \"\"\"\n",
"    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
"    segmenter_id = segmenter_id if segmenter_id is not None else \"facebook/sam-vit-base\"\n",
"\n",
...
{
"cell_type": "markdown",
"source": [
- "### Inference"
+ "### Inference\n",
+ "\n",
+ "Let's showcase Grounded SAM on our favorite image: the cats image from the COCO dataset.\n",
],
"metadata": {
"id": "Yo8cGKdxXWPR"
...
"execution_count": null,
"outputs": []
},
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Let's visualize the results:"
+ ],
+ "metadata": {
+ "id": "-sJrT5xMf_Ad"
+ }
+ },
{
"cell_type": "code",
"source": [
...
]
}
]
- }
+ }