added all files
Vikr-182 committed Oct 1, 2023
1 parent 71ae209 commit f97551c
Showing 59 changed files with 19,317 additions and 548 deletions.
README.md: 100 changes (44 additions & 56 deletions)
# Talk2BEV

[**Project Page**](https://llmbev.github.io/talk2bev/) |
[**Paper**](https://llmbev.github.io/talk2bev/assets/pdf/talk2bev.pdf) |
[**ArXiv**]() |
[**Video**](https://www.youtube.com/watch?v=TMht-8SGJ0I)


![Splash Figure](./docs/static/images/talk2bev_teaser-1.png)

## Abstract

We introduce Talk2BEV, a large vision-language model (LVLM) interface for bird’s-eye view (BEV) maps commonly used in autonomous driving.

While existing perception systems for autonomous driving scenarios have largely focused on a pre-defined (closed) set of object categories and driving scenarios, Talk2BEV eliminates the need for BEV-specific training, relying instead on performant pre-trained LVLMs. This enables a single system to cater to a variety of autonomous driving tasks encompassing visual and spatial reasoning, predicting the intents of traffic actors, and decision-making based on visual cues.

We extensively evaluate Talk2BEV on a large number of scene understanding tasks that rely both on the ability to interpret free-form natural language queries and on grounding these queries in the visual context embedded into the language-enhanced BEV map. To enable further research in LVLMs for autonomous driving scenarios, we develop and release Talk2BEV-Bench, a benchmark encompassing 1000 human-annotated BEV scenarios, with more than 20,000 questions and ground-truth responses from the NuScenes dataset.

## Installation

Please run the following commands.

### Setup Talk2BEV

```
git clone https://github.com/llm-bev/talk2bev
```
### Setup LLaVA

```
git clone https://github.com/haotian-liu/LLaVA parent-folder
mv parent-folder/llava ./
rm -rf parent-folder
```

Please download the preprocessed weights for [vicuna-13b](https://huggingface.co/liuhaotian/llava-v1-0719-336px-lora-vicuna-13b-v1.3).

### Setup MiniGPT-4 (optional)

```
git clone https://github.com/Vision-CAIR/MiniGPT-4 parent-folder
mv parent-folder/minigpt4 ./
rm -rf parent-folder
```

Please download the preprocessed weights for Vicuna. After downloading the weights, change the following line in `minigpt4/configs/models/minigpt4.yaml`:

```
16: llama_model: "path-to-llama-preprocessed-weights"
```

Please download the MiniGPT-4 weights [here](https://drive.google.com/file/d/1RY9jV0dyqLX-o38LrumkKRh6Jtaop58R/view) and change the checkpoint path in `eval_configs/minigpt4_eval.yaml`:

```
11: ckpt: 'path-to-prerained_minigpt4_7b-weights'
```

### Setup FastSAM

```
git clone https://github.com/CASIA-IVA-Lab/FastSAM parent-folder
mv parent-folder/FastSAM/fastsam ./
rm -rf parent-folder
```

Download the FastSAM weights from [here](https://drive.google.com/file/d/1m1sjY4ihXBU1fZXdQ-Xdj-mDltW-2Rqv/view).

### Install SAM (optional)

```
pip3 install segment-anything
```

Download the SAM weights from [here](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth).

## Data Preparation

Please download the NuScenes v1.0-trainval dataset. Our dataset consists of 2 parts: Talk2BEV-Base and Talk2BEV-Captions, containing the base folders and captions respectively.

### Download links

Links to the Talk2BEV dataset (_Talk2BEV-Base_ and _Talk2BEV-Captions_) are provided below. The dataset is hosted on Google Drive. Please download the dataset and extract the files to the `data` folder.

| Data Parts | Link |
| --- | --- |
| Talk2BEV-Base | [link]() |
| Talk2BEV-Captions | [link]() |

If you want to generate the dataset from scratch, please follow the process [here](./data/scratch.md). The format for each of the data parts is described in [format](./data/format.md).

## Evaluation

Evaluation on Talk2BEV happens via 2 methods: MCQs (from Talk2BEV-Bench) and spatial operators. We use GPT-4 for our evaluation. Please follow the instructions at [GPT-4](https://platform.openai.com/) and set the API key and organization in your OS environment:

```bash
ORGANIZATION=<your-organization>
API_KEY=<your-api-key>
```
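
For reference, here is a minimal sketch of how these credentials might be consumed, assuming they are exported in the environment and the pre-1.0 `openai` Python package is used; the actual evaluation scripts may read them differently.

```python
# Minimal sketch (assumption): configure the pre-1.0 `openai` client from the
# ORGANIZATION / API_KEY variables set above and issue a single GPT-4 request.
import os

import openai

openai.organization = os.environ["ORGANIZATION"]
openai.api_key = os.environ["API_KEY"]

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Reply with the single letter of the correct option."}],
)
print(response["choices"][0]["message"]["content"])
```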

### Talk2BEV-Bench

TO BE RELEASED

### Evaluating MCQs

To obtain the accuracy for the MCQs, please run the following command:

```bash
python eval_mcq.py
```

This will yield the accuracy for the MCQs.
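
For intuition, the sketch below shows the metric being reported, assuming predicted and ground-truth option labels are available as two parallel lists; the released `eval_mcq.py` may use a different data format.

```python
# Minimal sketch (assumption): MCQ accuracy as the fraction of questions whose
# predicted option matches the ground-truth option.
from typing import Sequence


def mcq_accuracy(predictions: Sequence[str], ground_truth: Sequence[str]) -> float:
    """Return the fraction of predicted options that match the ground truth."""
    assert len(predictions) == len(ground_truth) and len(ground_truth) > 0
    correct = sum(pred == gt for pred, gt in zip(predictions, ground_truth))
    return correct / len(ground_truth)


print(mcq_accuracy(["a", "c", "b", "d"], ["a", "b", "b", "d"]))  # 0.75
```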

### Evaluating Spatial Operators

TO BE RELEASED
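
As a purely illustrative example (not the released pipeline), a spatial operator over a BEV can be as simple as a distance or direction check between object centroids; the sketch below assumes each object is a dictionary with BEV `x`/`y` coordinates in metres.

```python
# Purely illustrative sketch (assumption): a simple BEV spatial operator that
# checks whether object B lies within a given radius of object A, using their
# bird's-eye-view centroids. The released evaluation pipeline may differ.
import math
from typing import Dict


def within_radius(obj_a: Dict[str, float], obj_b: Dict[str, float], radius_m: float) -> bool:
    """Return True if the BEV centroids of obj_a and obj_b are within radius_m metres."""
    return math.hypot(obj_a["x"] - obj_b["x"], obj_a["y"] - obj_b["y"]) <= radius_m


print(within_radius({"x": 0.0, "y": 0.0}, {"x": 3.0, "y": 4.0}, radius_m=10.0))  # True
```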

## Click2Chat

We also allow free-form conversation with the BEV. Please follow the instructions in [Click2Chat](./click2chat/README.md) to chat with the BEV.

## TODO

- [ ] Spatial operators evaluation pipeline
- [ ] Release Talk2BEV-Bench
click2chat/README.md: 26 changes (26 additions & 0 deletions)
# Talk2BEV Click2Chat Interface

Code for:

1. Click2Chat interface

2. JSON generation

## Installation
Please run the following commands.

### Setup Talk2BEV

```
git clone https://github.com/llm-bev/talk2bev
```

## Usage (for Click2Chat Interface)
If using LLaVA:
```
python click2chat/click2chat_llava.py --sam-checkpoint <path-to-sam-checkpoint> --conv-mode <conversation mode, default is llava v1> --model-path <path-llava-model> --gpu-id <gpu num>
```

If using MiniGPT-4:
```
python click2chat/click2chat_minigpt4.py --sam-checkpoint <path-to-sam-checkpoint> --conv-mode <conversation mode, default is llava v1> --model-path <path-llava-model> --gpu-id <gpu num>
```
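
For reference, here is a minimal sketch of how the checkpoint passed via `--sam-checkpoint` can be loaded with the `segment-anything` package (assuming the ViT-H checkpoint linked in the main README); the interface scripts may wrap this differently.

```python
# Minimal sketch (assumption): build a SAM predictor from the ViT-H checkpoint
# and segment around a single clicked point, as a click-based interface would.
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = np.zeros((900, 1600, 3), dtype=np.uint8)  # placeholder RGB camera image (H, W, 3)
predictor.set_image(image)

masks, scores, _ = predictor.predict(
    point_coords=np.array([[800, 450]]),  # clicked (x, y) pixel
    point_labels=np.array([1]),           # 1 = foreground click
    multimask_output=True,
)
print(masks.shape, scores)
```
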
File renamed without changes.
File renamed without changes.