
Commit 0fff1d0

Created with Colab
1 parent 332522c commit 0fff1d0

File tree

1 file changed

+60
-37
lines changed


Grounding DINO/GroundingDINO_with_Segment_Anything.ipynb

@@ -5,7 +5,6 @@
     "colab": {
       "provenance": [],
       "gpuType": "T4",
-      "authorship_tag": "ABX9TyMNv58YxAIZax3EoF72uIj/",
       "include_colab_link": true
     },
     "kernelspec": {
@@ -25,43 +24,38 @@
     "colab_type": "text"
    },
    "source": [
-    "<a href=\"https://colab.research.google.com/github/EduardoPach/Transformers-Tutorials/blob/grounded-sam-example/Grounding%20DINO/grounded_sam.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+    "<a href=\"https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/Grounding%20DINO/GroundingDINO_with_Segment_Anything.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/"
-    },
-    "id": "kiC9KlHqYm5-",
-    "outputId": "f892682a-a2f1-4dda-d1fb-d213b38162b9"
-   },
-   "outputs": [
-    {
-     "output_type": "stream",
-     "name": "stdout",
-     "text": [
-      "Found existing installation: transformers 4.38.2\n",
-      "Uninstalling transformers-4.38.2:\n",
-      "  Would remove:\n",
-      "    /usr/local/bin/transformers-cli\n",
-      "    /usr/local/lib/python3.10/dist-packages/transformers-4.38.2.dist-info/*\n",
-      "    /usr/local/lib/python3.10/dist-packages/transformers/*\n",
-      "Proceed (Y/n)? y\n",
-      "  Successfully uninstalled transformers-4.38.2\n"
-     ]
-    }
-   ],
+   "cell_type": "markdown",
    "source": [
-    "!pip uninstall transformers"
-   ]
+    "# Combining Grounding DINO with Segment Anything (SAM) for text-based mask generation\n",
+    "\n",
+    "In this notebook, we're going to combine 2 very cool models - [Grounding DINO](https://huggingface.co/docs/transformers/main/en/model_doc/grounding-dino) and [SAM](https://huggingface.co/docs/transformers/en/model_doc/sam). We'll use Grounding DINO to generate bounding boxes based on text prompts, after which we can prompt SAM to generate corresponding segmentation masks for them.\n",
+    "\n",
+    "This is based on the popular [Grounded Segment Anything](https://github.com/IDEA-Research/Grounded-Segment-Anything) project - just with fewer lines of code as the models are now available in the Transformers library. Refer to the [paper](https://arxiv.org/abs/2401.14159) for details.\n",
+    "\n",
+    "<img src=\"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/grounded_sam.png\"\n",
+    "alt=\"drawing\" width=\"900\"/>\n",
+    "\n",
+    "<small> Grounded SAM overview. Taken from the <a href=\"https://github.com/IDEA-Research/Grounded-Segment-Anything\">original repository</a>. </small>\n",
+    "\n",
+    "Author of this notebook: [Eduardo Pacheco](https://huggingface.co/EduardoPacheco) - give him a follow on Hugging\n",
+    " Face!\n",
+    "\n",
+    "## Set-up environment\n",
+    "\n",
+    "Let's start by installing 🤗 Transformers from source since Grounding DINO is brand new at the time of writing."
+   ],
+   "metadata": {
+    "id": "Wgj3tUoobAlj"
+   }
   },
   {
    "cell_type": "code",
    "source": [
-    "!pip install -qq git+https://github.com/huggingface/transformers.git@main"
+    "!pip install --upgrade -q git+https://github.com/huggingface/transformers"
    ],
    "metadata": {
     "colab": {
@@ -87,7 +81,9 @@
   {
    "cell_type": "markdown",
    "source": [
-    "## Imports"
+    "## Imports\n",
+    "\n",
+    "Let's start by importing the required libraries."
    ],
    "metadata": {
     "id": "7r9dNrDHy2tA"
@@ -119,7 +115,9 @@
   {
    "cell_type": "markdown",
    "source": [
-    "## Result Utils"
+    "## Result Utils\n",
+    "\n",
+    "We'll store the detection results of Grounding DINO in a dedicated Python dataclass."
    ],
    "metadata": {
     "id": "A1NxJzCNrnjH"
@@ -164,7 +162,9 @@
   {
    "cell_type": "markdown",
    "source": [
-    "## Plot Utils"
+    "## Plot Utils\n",
+    "\n",
+    "Below, some utility functions are defined as we'll draw the detection results of Grounding DINO on top of the image."
    ],
    "metadata": {
     "id": "uCzSUQL5lAvE"
@@ -211,7 +211,7 @@
     "    plt.axis('off')\n",
     "    if save_name:\n",
     "        plt.savefig(save_name, bbox_inches='tight')\n",
-    "    plt.show()\n"
+    "    plt.show()"
    ],
    "metadata": {
     "id": "Zah3Esewo4P6"
@@ -438,7 +438,13 @@
   {
    "cell_type": "markdown",
    "source": [
-    "## Grounded-SAM"
+    "## Grounded Segment Anything (SAM)\n",
+    "\n",
+    "Now it's time to define the Grounded SAM approach!\n",
+    "\n",
+    "The approach is very simple:\n",
+    "1. use Grounding DINO to detect a given set of texts in the image. The output is a set of bounding boxes.\n",
+    "2. prompt Segment Anything (SAM) with the bounding boxes, for which the model will output segmentation masks."
    ],
    "metadata": {
     "id": "fErkFJkmlEMl"
@@ -453,6 +459,9 @@
     "    threshold: float = 0.3,\n",
     "    detector_id: Optional[str] = None\n",
     ") -> List[Dict[str, Any]]:\n",
+    "    \"\"\"\n",
+    "    Use Grounding DINO to detect a set of labels in an image in a zero-shot fashion.\n",
+    "    \"\"\"\n",
     "    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
     "    detector_id = detector_id if detector_id is not None else \"IDEA-Research/grounding-dino-tiny\"\n",
     "    object_detector = pipeline(model=detector_id, task=\"zero-shot-object-detection\", device=device)\n",
@@ -470,6 +479,9 @@
     "    polygon_refinement: bool = False,\n",
     "    segmenter_id: Optional[str] = None\n",
     ") -> List[DetectionResult]:\n",
+    "    \"\"\"\n",
+    "    Use Segment Anything (SAM) to generate masks given an image + a set of bounding boxes.\n",
+    "    \"\"\"\n",
     "    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
     "    segmenter_id = segmenter_id if segmenter_id is not None else \"facebook/sam-vit-base\"\n",
     "\n",
@@ -518,7 +530,9 @@
   {
    "cell_type": "markdown",
    "source": [
-    "### Inference"
+    "### Inference\n",
+    "\n",
+    "Let's showcase Grounded SAM on our favorite image: the cats image from the COCO dataset."
    ],
    "metadata": {
     "id": "Yo8cGKdxXWPR"
@@ -558,6 +572,15 @@
    "execution_count": null,
    "outputs": []
   },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "Let's visualize the results:"
+   ],
+   "metadata": {
+    "id": "-sJrT5xMf_Ad"
+   }
+  },
   {
    "cell_type": "code",
    "source": [
@@ -642,4 +665,4 @@
    ]
   }
  ]
-}
\ No newline at end of file
+}
