added all files
Vikr-182 committed Oct 1, 2023
1 parent 71ae209 commit f97551c
Showing 59 changed files with 19,317 additions and 548 deletions.
README.md: 100 changes (44 additions & 56 deletions)
# Talk2BEV

[**Project Page**](https://llmbev.github.io/talk2bev/) |
[**Paper**](https://llmbev.github.io/talk2bev/assets/pdf/talk2bev.pdf) |
[**ArXiv**]() |
[**Video**](https://www.youtube.com/watch?v=TMht-8SGJ0I)


![Splash Figure](./docs/static/images/talk2bev_teaser-1.png)

## Abstract

We introduce Talk2BEV, a large vision-language model (LVLM) interface for bird’s-eye view (BEV) maps commonly used in autonomous driving.

While existing perception systems for autonomous driving scenarios have largely focused on a pre-defined (closed) set of object categories and driving scenarios, Talk2BEV eliminates the need for BEV-specific training, relying instead on performant pre-trained LVLMs. This enables a single system to cater to a variety of autonomous driving tasks encompassing visual and spatial reasoning, predicting the intents of traffic actors, and decision-making based on visual cues.

We extensively evaluate Talk2BEV on a large number of scene understanding tasks that rely both on the ability to interpret free-form natural language queries and on grounding these queries in the visual context embedded into the language-enhanced BEV map. To enable further research in LVLMs for autonomous driving scenarios, we develop and release Talk2BEV-Bench, a benchmark encompassing 1000 human-annotated BEV scenarios, with more than 20,000 questions and ground-truth responses from the NuScenes dataset.

## Installation

Please run the following commands.

### Setup Talk2BEV

```
git clone https://github.com/llm-bev/talk2bev
```
### Setup LLaVA

```
git clone https://github.com/haotian-liu/LLaVA parent-folder
mv parent-folder/llava ./
rm -rf parent-folder
```

Please download the preprocessed weights for [vicuna-13b](https://huggingface.co/liuhaotian/llava-v1-0719-336px-lora-vicuna-13b-v1.3).

### Setup MiniGPT-4 (optional)

```
git clone https://github.com/Vision-CAIR/MiniGPT-4 parent-folder
mv parent-folder/minigpt4 ./
rm -rf parent-folder
```

Please download the preprocessed weights for Vicuna. After downloading the weights, change the following line in `minigpt4/configs/models/minigpt4.yaml`:

```
16: llama_model: "path-to-llama-preprocessed-weights"
```

Please download the MiniGPT-4 weights [here](https://drive.google.com/file/d/1RY9jV0dyqLX-o38LrumkKRh6Jtaop58R/view) and change the checkpoint path in `eval_configs/minigpt4_eval.yaml`:

```
11: ckpt: 'path-to-prerained_minigpt4_7b-weights'
```

### Setup FastSAM

```
git clone https://github.com/CASIA-IVA-Lab/FastSAM parent-folder
mv parent-folder/FastSAM/fastsam ./
rm -rf parent-folder
```

Download the FastSAM weights from [here](https://drive.google.com/file/d/1m1sjY4ihXBU1fZXdQ-Xdj-mDltW-2Rqv/view).

### Install SAM (optional)

```
pip3 install segment-anything
```

Download the SAM weights from [here](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth).

## Data Preparation

Please download the NuScenes v1.0-trainval dataset. Our dataset consists of 2 parts: Talk2BEV-Base and Talk2BEV-Captions, containing the base folders and captions respectively.

### Download links

Links to the Talk2BEV dataset (_Talk2BEV-Base_ and _Talk2BEV-Captions_) are provided below. The dataset is hosted on Google Drive. Please download the dataset and extract the files to the `data` folder.

| Data Parts | Link |
| --- | --- |
| Talk2BEV-Base | [link]() |
| Talk2BEV-Captions | [link]() |

If you want to generate the dataset from scratch, please follow the process [here](./data/scratch.md). The format for each of the data parts is described in [format](./data/format.md).

## Evaluation

Evaluation on Talk2BEV happens via 2 methods: MCQs (from Talk2BEV-Bench) and spatial operators. We use GPT-4 for our evaluation. Please follow the instructions at [GPT-4](https://platform.openai.com/) and set the API key and organization in your OS environment:

```bash
ORGANIZATION=<your-organization>
API_KEY=<your-api-key>
```
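
For reference, here is a minimal sketch of how these credentials might be consumed, assuming they are exported in the environment and the pre-1.0 `openai` Python package is used; the actual evaluation scripts may read them differently.

```python
# Minimal sketch (assumption): configure the pre-1.0 `openai` client from the
# ORGANIZATION / API_KEY variables set above and issue a single GPT-4 request.
import os

import openai

openai.organization = os.environ["ORGANIZATION"]
openai.api_key = os.environ["API_KEY"]

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Reply with the single letter of the correct option."}],
)
print(response["choices"][0]["message"]["content"])
```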

### Talk2BEV-Bench

TO BE RELEASED

### Evaluating MCQs

To obtain the accuracy for the MCQs, please run the following command:

```bash
python eval_mcq.py
```

This will yield the accuracy for the MCQs.
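
For intuition, the sketch below shows the metric being reported, assuming predicted and ground-truth option labels are available as two parallel lists; the released `eval_mcq.py` may use a different data format.

```python
# Minimal sketch (assumption): MCQ accuracy as the fraction of questions whose
# predicted option matches the ground-truth option.
from typing import Sequence


def mcq_accuracy(predictions: Sequence[str], ground_truth: Sequence[str]) -> float:
    """Return the fraction of predicted options that match the ground truth."""
    assert len(predictions) == len(ground_truth) and len(ground_truth) > 0
    correct = sum(pred == gt for pred, gt in zip(predictions, ground_truth))
    return correct / len(ground_truth)


print(mcq_accuracy(["a", "c", "b", "d"], ["a", "b", "b", "d"]))  # 0.75
```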

### Evaluating Spatial Operators

TO BE RELEASED
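
As a purely illustrative example (not the released pipeline), a spatial operator over a BEV can be as simple as a distance or direction check between object centroids; the sketch below assumes each object is a dictionary with BEV `x`/`y` coordinates in metres.

```python
# Purely illustrative sketch (assumption): a simple BEV spatial operator that
# checks whether object B lies within a given radius of object A, using their
# bird's-eye-view centroids. The released evaluation pipeline may differ.
import math
from typing import Dict


def within_radius(obj_a: Dict[str, float], obj_b: Dict[str, float], radius_m: float) -> bool:
    """Return True if the BEV centroids of obj_a and obj_b are within radius_m metres."""
    return math.hypot(obj_a["x"] - obj_b["x"], obj_a["y"] - obj_b["y"]) <= radius_m


print(within_radius({"x": 0.0, "y": 0.0}, {"x": 3.0, "y": 4.0}, radius_m=10.0))  # True
```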

## Click2Chat

We also allow free-form conversation with the BEV. Please follow the instructions in [Click2Chat](./click2chat/README.md) to chat with the BEV.

## TODO

- [ ] Spatial operators evaluation pipeline
- [ ] Release Talk2BEV-Bench
click2chat/README.md: 26 changes (26 additions & 0 deletions)
# Talk2BEV Click2Chat Interface

Code for:

1. Click2Chat interface

2. JSON generation

## Installation
Please run the following commands.

### Setup Talk2BEV

```
git clone https://github.com/llm-bev/talk2bev
```

## Usage (for Click2Chat Interface)
If using LLaVA:
```
python click2chat/click2chat_llava.py --sam-checkpoint <path-to-sam-checkpoint> --conv-mode <conversation mode, default is llava v1> --model-path <path-llava-model> --gpu-id <gpu num>
```

If using MiniGPT-4:
```
python click2chat/click2chat_minigpt4.py --sam-checkpoint <path-to-sam-checkpoint> --conv-mode <conversation mode, default is llava v1> --model-path <path-llava-model> --gpu-id <gpu num>
```
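
For reference, here is a minimal sketch of how the checkpoint passed via `--sam-checkpoint` can be loaded with the `segment-anything` package (assuming the ViT-H checkpoint linked in the main README); the interface scripts may wrap this differently.

```python
# Minimal sketch (assumption): build a SAM predictor from the ViT-H checkpoint
# and segment around a single clicked point, as a click-based interface would.
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = np.zeros((900, 1600, 3), dtype=np.uint8)  # placeholder RGB camera image (H, W, 3)
predictor.set_image(image)

masks, scores, _ = predictor.predict(
    point_coords=np.array([[800, 450]]),  # clicked (x, y) pixel
    point_labels=np.array([1]),           # 1 = foreground click
    multimask_output=True,
)
print(masks.shape, scores)
```
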
File renamed without changes.
File renamed without changes.