
Commit 723d6e4: add prompt template and example ipython notebooks
1 parent 69337d3
10 files changed: +360 -6 lines changed

.gitignore

Lines changed: 5 additions & 0 deletions
@@ -1,3 +1,8 @@
+# Please refer to the ScanNet official website for the download link
+QSpatial_scannet/download-scannet.py
+QSpatial_scannet/images/
+QSpatial_scannet/scannet_dataset/
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]

QSpatial_scannet/download_and_render_scannet_images.py

Lines changed: 1 addition & 0 deletions
@@ -78,6 +78,7 @@ def export_color_images(self, output_dir, image_size=None, specified_frames=[]):
 
 scannet_id_and_frame = {
     'scene0015_00': ['0'],
+
     'scene0019_00': ['400'],
     'scene0025_00': ['500'],
     'scene0025_02': ['400'],

README.md

Lines changed: 5 additions & 6 deletions
@@ -3,7 +3,7 @@
 
 Q-Spatial Bench is a benchmark designed to measure the **quantitative spatial reasoning** 📏 in large vision-language models.
 
-🔥The paper associated with Q-Spatial Bench is accepted by EMNLP 2024 main track!
+🔥 The paper associated with Q-Spatial Bench is accepted by EMNLP 2024 main track!
 
 - Our paper: *Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models* [[arXiv link](https://arxiv.org/abs/2409.09788)]
 - Project website: [[link]()]
@@ -45,15 +45,14 @@ cd <REPO_ROOT>/QSpatial_scannet
 python download_and_render_scannet_images.py
 ```
 
-## Example Prompt Templates
 
+## Iterate over the Dataset
 
-## Evaluation
-
-In our paper, we measure the performance in success rate by thresholding the maximum ratio between an estimation and a ground truth value. We provide a simple ipython notebook `evaluation_helper.ipynb` to compute the success rate.
-
+We provide an example ipython notebook under `examples/iterate_over_dataset.ipynb`
 
+## Evaluation
 
+We provide an example ipython notebook under `examples/evaluate_success_rate.ipynb`
 
 
 # Citation
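
For reference, the success criterion mentioned in the removed README paragraph (and implemented in the new evaluation notebook below) can be written as a single thresholding rule, with the estimate $\hat{y}$ and the ground truth $y$ converted to a common unit (centimeters):

$$\text{success} = \mathbb{1}\left[\max\left(\frac{\hat{y}}{y},\ \frac{y}{\hat{y}}\right) < \delta\right], \qquad \delta = 2.$$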

examples/evaluate_success_rate.ipynb

Lines changed: 164 additions & 0 deletions
(New notebook file; the cell sources and outputs are reproduced below, with the raw JSON scaffolding omitted.)

Cell 1 (code):

import re
import json
import numpy as np

from pathlib import Path
from datasets import load_dataset

Cell 1 output: environment warnings from Intel MKL (SSE4.2 deprecation notice) and tqdm (IProgress not found); omitted here.

Cell 2 (code):

"""
Simple evaluator that handles
1. Load benchmark data
2. Parse prediction string
3. Evaluate success based on `delta` parameter
"""

class QSpatialEvaluator:
    delta = 2

    def __init__(self, benchmark_split):
        assert benchmark_split in ["QSpatial_plus", "QSpatial_scannet"]
        self.dataset = load_dataset("andrewliao11/Q-Spatial-Bench", split=benchmark_split)

    def evaluate(self, data_ind, vlm_response):

        #### Parse ground truth
        value = self.dataset["answer_value"][data_ind]
        unit = self.dataset["answer_unit"][data_ind]
        ground_truth_value_in_cms = value * self._get_multiplier(unit)

        #### Parse prediction
        # Value
        pattern = r'scalar{([^}]*)}'
        str_inside_scalar_boxes = re.findall(pattern, vlm_response)[-1]
        scalar_list = re.findall(r'\d+\.?\d*', str_inside_scalar_boxes)
        parsed_scalar = np.array(scalar_list).astype(float).mean()

        # Unit
        pattern = r'distance_unit{([^}]*)}'
        str_inside_unit_boxes = re.findall(pattern, vlm_response)
        parsed_unit = str_inside_unit_boxes[-1]

        pred_value_in_cms = parsed_scalar * self._get_multiplier(parsed_unit)
        success = max(pred_value_in_cms / ground_truth_value_in_cms, ground_truth_value_in_cms / pred_value_in_cms) < self.delta

        return dict(
            ground_truth_value_in_cms = ground_truth_value_in_cms,
            pred_value_in_cms = pred_value_in_cms,
            success = success
        )

    def _get_multiplier(self, unit):

        unit = unit.lower()
        if unit in ["meters", "meter", "m", "metre", "metres"]:
            multiplier = 100
        elif unit in ["centimeters", "centimeter", "cm"]:
            multiplier = 1
        elif unit in ["feet", "foot", "ft"]:
            multiplier = 30.48
        elif unit in ["inch", "inches", "in"]:
            multiplier = 2.54
        elif unit in ["mm"]:
            multiplier = 0.1
        else:
            #raise ValueError(f"Unknown unit: {unit}")
            print(f"Unknown unit: {unit}")
            multiplier = 1

        return multiplier

Cell 3 (code):

evaluator = QSpatialEvaluator(benchmark_split="QSpatial_plus")

Cell 4 (code):

# Example VLM responses from GPT-4o
vlm_response = "To determine the minimum distance between the two speckled pattern stool chairs in the image, let's follow these steps:\\n\\n1. **Identify the Stools**: Locate the two speckled pattern stools in the image. They are positioned in front of the couches.\\n\\n2. **Reference Points**: Choose reference points on each stool to measure the distance. The closest points on the stools would be the edges facing each other.\\n\\n3. **Estimate the Distance**: Visually estimate the distance between these two closest points. Given the perspective and the relative size of the stools, we can approximate the distance.\\n\\nConsidering the size of the stools and the space between them, the minimum distance between the two speckled pattern stool chairs is approximately:\\n\\n\\\\scalar{1} \\\\distance_unit{meter}\n"

print(vlm_response)
print("Evaluation:", evaluator.evaluate(data_ind=41, vlm_response=vlm_response))

Cell 4 output: the printed VLM response, followed by

Evaluation: {'ground_truth_value_in_cms': 96.0, 'pred_value_in_cms': 100.0, 'success': True}

Cell 5 (code): empty.

Notebook metadata: kernel "multi_rounds_vlm" (python3), Python 3.10.14, nbformat 4.
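
The notebook stops at a single example. Below is a minimal sketch of how the per-example results could be aggregated into the benchmark's success-rate metric; the `responses` dict mapping dataset indices to raw VLM response strings is a hypothetical placeholder and not part of the notebook.

```python
# Hypothetical aggregation over model outputs; `responses` is a placeholder that
# you would fill with one raw VLM response string per benchmark question.
responses = {41: vlm_response}

results = [evaluator.evaluate(data_ind=i, vlm_response=resp) for i, resp in responses.items()]
success_rate = sum(res["success"] for res in results) / len(results)
print(f"Success rate (delta={QSpatialEvaluator.delta}): {success_rate:.2%}")
```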

examples/iterate_over_dataset.ipynb

Lines changed: 156 additions & 0 deletions
Large diffs are not rendered by default.
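
Since the notebook contents are not rendered in the diff, the following is only a minimal sketch of what iterating over a split might look like. The `question` and `image` column names are assumptions; `answer_value`, `answer_unit`, and the dataset path come from the evaluation notebook above.

```python
# Minimal sketch (assumed column names), not the notebook's actual contents.
from datasets import load_dataset

dataset = load_dataset("andrewliao11/Q-Spatial-Bench", split="QSpatial_plus")
for example in dataset:
    # "question" and "image" are assumed field names; answer_value / answer_unit
    # are the fields read by the evaluation notebook above.
    print(example["question"], example["answer_value"], example["answer_unit"])
```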
(prompt template file; name not rendered)

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+Question: {{ question }}
+Let's think step by step and start by finding good reference objects or object parts in the image.
(prompt template file; name not rendered)

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+Question: {{ question }}
+
+---
+
+Use the following 4 steps sequentially to answer the question:
+
+Step 1 **Analyze the question**
+
+Step 2 **Identify up to 10 reference scales in the image, ranging from large to small sizes, and list them in the specified format**
+- A reference scale must be typical in size.
+- A reference scale can be the dimensions of an object or an object part.
+- A reference scale must NOT be floor tiles or floor planks.
+- Formulate the reference scales using the format: """The [choose from front-to-back, side-to-side, left-to-right, diameter, height (top to bottom edge), or mounting height (bottom edge to floor)] of [object or object part] is approximately [dimension estimate]."""
+
+Step 3 **Propose a robust step-by-step plan to answer the question by using the reference scales in Step 2**
+- A robust step-by-step plan performs the estimation in a coarse-to-fine manner.
+- First, use a reliable and large-sized reference scale as the primary reference for estimation.
+- Then, gradually use a reliable and smaller-sized reference scale for adjustment.
+- Repeat until the estimation is precise enough.
+- When performing visual comparison, be aware of perspective distortion.
+- Do NOT rely on pixel measurements from the images.
+
+Step 4 **Focus on the image and follow the plan in Step 3 to answer the question**

prompt_templates/standard_prompt.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Question: {{ question }}

prompt_templates/system_prompt.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+You will be provided with a question and a 2D image. The question involves measuring the precise distance in 3D space through a 2D image. You will answer the question by providing a numeric answer consisting of a scalar and a distance unit in the format of """\scalar{scalar} \distance_unit{distance unit}""" at the end of your response.

prompt_templates/zero_shot_prompt.txt

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+Question: {{ question }}
+Let's think step by step.
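
The `{{ question }}` placeholders in the new templates look like Jinja-style variables. Below is a minimal sketch of how a template might be rendered and paired with the system prompt; the use of `jinja2` and the chat-style message layout are assumptions, not something this commit shows.

```python
# Hypothetical rendering of the new prompt templates; jinja2 usage and the
# message layout are assumptions made for illustration only.
from pathlib import Path
from jinja2 import Template

system_prompt = Path("prompt_templates/system_prompt.txt").read_text()
template = Template(Path("prompt_templates/zero_shot_prompt.txt").read_text())
user_prompt = template.render(question="What is the height of the chair in the image?")

# A typical chat-style payload for a VLM; attach the image per your VLM API's convention.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]
```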
